I've searched all over stackoverflow / google for this, but can't seem to figure it out.
I'm scraping social media links of a given URL page, and the function returns an object with a list of URLs.
When I try to write this data into a different file, it outputs to the file as [object Object] instead of the expected:
[ 'https://twitter.com/#!/101Cookbooks',
'http://www.facebook.com/101cookbooks']
as it does when I console.log() the results.
This is my sad attempt to read and write a file in Node, trying to read each line (the URL) and pass it through a function call, request(line, gotHTML):
fs.readFileSync('./urls.txt').toString().split('\n').forEach(function (line) {
  console.log(line);
  var obj = request(line, gotHTML);
  console.log(obj);
  fs.writeFileSync('./data.json', obj, 'utf-8');
});
For reference, the gotHTML function:
function gotHTML(err, resp, html) {
  var social_ids = [];
  if (err) {
    return console.log(err);
  } else if (resp.statusCode === 200) {
    var parsedHTML = $.load(html);
    parsedHTML('a').map(function (i, link) {
      var href = $(link).attr('href');
      for (var i = 0; i < socialurls.length; i++) {
        if (socialurls[i].test(href) && social_ids.indexOf(href) < 0) {
          social_ids.push(href);
        }
      }
    });
  }
  return social_ids;
}
Building on what deb2fast said, I would also pass in a couple of extra parameters to JSON.stringify() to get it to pretty-print:
fs.writeFileSync('./data.json', JSON.stringify(obj, null, 2) , 'utf-8');
The second param is an optional replacer function, which you don't need in this case, so null works.
The third param is the number of spaces to use for indentation. 2 and 4 seem to be popular choices.
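For example, with the array from the question, the pretty-printed output would be:
const fs = require('fs');
const obj = ['https://twitter.com/#!/101Cookbooks', 'http://www.facebook.com/101cookbooks'];
fs.writeFileSync('./data.json', JSON.stringify(obj, null, 2), 'utf-8');
// data.json now contains:
// [
//   "https://twitter.com/#!/101Cookbooks",
//   "http://www.facebook.com/101cookbooks"
// ]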
obj is an array in your example.
fs.writeFileSync(filename, data[, options]) requires either a String or a Buffer in the data parameter. See the docs.
Try to write the array in a string format:
// writes 'https://twitter.com/#!/101Cookbooks', 'http://www.facebook.com/101cookbooks'
fs.writeFileSync('./data.json', obj.join(',') , 'utf-8');
Or:
// writes ['https://twitter.com/#!/101Cookbooks', 'http://www.facebook.com/101cookbooks']
var util = require('util');
fs.writeFileSync('./data.json', util.inspect(obj) , 'utf-8');
Edit: The reason you see the array in your example is that Node's implementation of console.log doesn't just call toString; it calls util.format (see the console.js source).
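For instance (a quick illustration; the exact inspect formatting varies slightly across Node versions):
var util = require('util');
var obj = ['a', 'b'];
String(obj);      // 'a,b' (what toString gives you)
util.format(obj); // "[ 'a', 'b' ]" (what console.log prints)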
If you're getting [object Object], then use JSON.stringify:
fs.writeFile('./data.json', JSON.stringify(obj) , 'utf-8');
It worked for me.
In my experience JSON.stringify is slightly faster than util.inspect.
I had to save the result object of a DB2 query as a JSON file. The query returned an object of 92k rows, and the conversion took very long to complete with util.inspect, so I ran the following test, writing the same 1000-record object to a file with both methods.
JSON.stringify
fs.writeFile('./data.json', JSON.stringify(obj, null, 2));
Time: 3:57 (3 min 57 sec)
Result's format:
[
{
"PROB": "00001",
"BO": "AXZ",
"CNTRY": "649"
},
...
]
util.inspect
var util = require('util');
fs.writeFile('./data.json', util.inspect(obj, false, 2, false));
Time: 4:12 (4 min 12 sec)
Result's format:
[ { PROB: '00001',
BO: 'AXZ',
CNTRY: '649' },
...
]
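If you want to reproduce the comparison on your own data, a minimal sketch (with obj standing in for your own records; timings will vary with Node version and object shape):
var util = require('util');
console.time('JSON.stringify');
var asJson = JSON.stringify(obj, null, 2);
console.timeEnd('JSON.stringify');
console.time('util.inspect');
var asInspect = util.inspect(obj, false, 2, false);
console.timeEnd('util.inspect');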
Could you try doing JSON.stringify(obj)?
Like this:
var stringify = JSON.stringify(obj);
fs.writeFileSync('./data.json', stringify, 'utf-8');
Just in case anyone else stumbles across this: I use the fs-extra library in Node and write JavaScript objects to a file like this:
const fse = require('fs-extra');
fse.outputJsonSync('path/to/output/file.json', objectToWriteToFile);
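If you prefer the asynchronous variant, fs-extra also provides outputJson, which accepts a spaces option for pretty-printing and, like outputJsonSync, creates any missing parent directories:
const fse = require('fs-extra');
fse.outputJson('path/to/output/file.json', objectToWriteToFile, { spaces: 2 })
  .then(() => console.log('saved'))
  .catch(err => console.error(err));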
Further to #Jim Schubert's and #deb2fast's answers:
To be able to write out large objects (on the order of ~100 MB and up), you'll need to use for...of as shown below and adapt it to your requirements.
const fsPromises = require('fs').promises;
const sampleData = { firstName: "John", lastName: "Doe", age: 50, eyeColor: "blue" };

const writeToFile = async () => {
  // append one top-level key at a time instead of stringifying the whole object
  for (const dataObject of Object.keys(sampleData)) {
    console.log(sampleData[dataObject]);
    await fsPromises.appendFile("out.json", dataObject + ": " + JSON.stringify(sampleData[dataObject]));
  }
};

writeToFile();
Refer to https://stackoverflow.com/a/67699911/3152654 for a full reference on Node.js limits.
Related
So I have a text file test.txt with lines similar to:
08/12/2021
test1
test2
test3
... (some entries)
12/12/2021
test21
test22
test23
... (some entries)
24/12/2021
What should I write next in order to filter the text file and get the lines between the two newest dates?
const fs = require('fs');
fs.watchFile('test.txt', (eventType, filename) => {
  fs.readFile('test.txt', 'utf-8', (err, data) => {
    const arr = data.toString().replace(/\r\n/g, '\n').split('\n');
    ...
The output will be something such as:
test21
test22
test23
... (some entries)
Which are the entries between the two newest dates.
Update:
The text file is constantly having entries appended to it, and the current date is written at the end of each day. I'm now trying to extract the entries between the previous and newest dates for further processing.
I don't know much about JavaScript, but it seems you are looking for something like this one:
Is there any way to find data between two dates which are present in the same string and store it to a JSON object
You can do one thing: find the indexOf the start date and the lastIndexOf the end date, then slice the contents. You can do something like this:
try {
  const data = fs.readFileSync('test.txt', { encoding: 'utf8', flag: 'r' }),
        start = data.indexOf('start date'),
        end = data.lastIndexOf('end date');
  const trimText = data.slice(start, end);
} catch (e) {
  console.log(e);
}
This method will work well for small files. If the file is large, we need to read it asynchronously and check for the start and end dates while reading.
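A minimal sketch of that asynchronous approach, assuming date lines look like dd/mm/yyyy as in the question:
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({ input: fs.createReadStream('test.txt') });
const dateRe = /^\d{2}\/\d{2}\/\d{4}$/;
let current = [];  // entries since the most recent date line
let previous = []; // entries between the two most recent date lines

rl.on('line', (line) => {
  if (dateRe.test(line)) {
    previous = current; // a new date line closes the previous block
    current = [];
  } else if (line.trim() !== '') {
    current.push(line);
  }
});

rl.on('close', () => {
  console.log(previous); // the entries between the two newest dates
});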
const test_data = `08/12/2021
test0
test2
test3
12/12/2021`;
console.log(
test_data.split(/\d+\/\d+\/\d+\n?/g).slice(1, -1)
);
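Applied to the file from the question (assuming, as in the update, that the newest date is the last line written), the last non-empty block the split produces holds the entries between the two newest dates:
const fs = require('fs');
const data = fs.readFileSync('test.txt', 'utf-8');
// split on date lines; empty segments appear before the first and after the last date
const blocks = data.split(/\d+\/\d+\/\d+\r?\n?/g).filter(s => s.trim() !== '');
console.log(blocks[blocks.length - 1]); // entries between the two newest dates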
stream-json noob here. I'm wondering why the below code is running out of memory.
context
I have a large JSON file. The JSON file's structure is something like this:
[
  {"id": 1, "avg_rating": 2},
  {"id": 1, "name": "Apple"}
]
I want to modify it to be
[
  {"id": 1, "avg_rating": 2, "name": "Apple"}
]
In other words, I want to run a reducer function on each element of the values array of the JSON (Object.values(data)) to check whether the same id appears in different entries of the JSON, and if so "merge" them into one entry.
The code I wrote to do this is:
var chunk = [{'id': 1, 'name': 'a'}, {'id': 1, 'avg_rating': 2}];
const result = Object.values(chunk.reduce((j, c) => {
  if (j[c.id]) {
    j[c.id]['avg_rating'] = c.avg_rating;
  } else {
    j[c.id] = { ...c };
  }
  return j;
}, {}));
console.log(result);
The thing is, you cannot try to run this on a large JSON file without running out of memory. So, I need to use JSON streaming.
the streaming code
Looking at the stream-json documentation, I think I need to use a Parser to take in text as a Readable stream of objects and output a stream of data items as Writable buffer/text "things".
The code I can write to do that is:
const {chain} = require('stream-chain');
const {parser} = require('stream-json/Parser');
const {streamValues} = require('stream-json/streamers/StreamValues');
const fs = require('fs');
const pipeline = chain([
  fs.createReadStream('test.json'),
  parser(),
  streamValues(),
  data => {
    var chunk = data.value;
    const result = Object.values(chunk.reduce((j, c) => {
      if (j[c.id]) {
        j[c.id]['avg_rating'] = c.avg_rating;
      } else {
        j[c.id] = { ...c };
      }
      return j;
    }, {}));
    //console.log(result)
    return JSON.stringify(result);
  },
  fs.createWriteStream(fpath)
]);
To create a write stream (since I do want an output JSON file), I just added fs.createWriteStream(filepath) to the pipeline above, but it looks like, while this works on a small sample, it doesn't work for a large JSON file: I get the error "heap out of memory".
attempts to fix
I think the main issue with the code is that the "chunk" philosophy is wrong. If this works by "streaming" the JSON line by line (?), then "chunk" might be trying to hold all the data the program has seen so far, whereas I really only want it to run a reducer function in batches. That puts me kind of back at square one: how would I merge the key-value pairs of a JSON if the id is the same?
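For example, one idea would be to stream the top-level array with stream-json's StreamArray and fold each element into a Map keyed by id, so memory grows with the number of distinct ids rather than with the file size. A minimal sketch of that idea:
const fs = require('fs');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');

const byId = new Map();

fs.createReadStream('test.json')
  .pipe(parser())
  .pipe(streamArray())    // emits {key, value} for each array element
  .on('data', ({value}) => {
    // merge every record sharing an id into a single object
    byId.set(value.id, Object.assign(byId.get(value.id) || {}, value));
  })
  .on('end', () => {
    fs.writeFileSync('merged.json', JSON.stringify([...byId.values()], null, 2));
  });

This still keeps one merged object per id in memory, so it only helps when the number of distinct ids is much smaller than the number of records.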
If the custom data code isn't the problem, then I get the feeling I need to use a Stringer, since I want to edit a stream with custom code and save it back to a file.
However, I can't seem to figure out how Stringer reads data, as the below code throws an error where data is undefined:
const pipeline = chain([
  fs.createReadStream('test.json'),
  parser(),
  data => {
    var chunk = data.value;
    const result = Object.values(chunk.reduce((j, c) => {
      if (j[c.id]) {
        j[c.id]['avg_rating'] = c.avg_rating;
      } else {
        j[c.id] = { ...c };
      }
      return j;
    }, {}));
    console.log(result);
    return JSON.stringify(result);
  },
  stringer(),
  zlib.Gzip(),
  fs.createWriteStream('edited.json.gz')
]);
I would greatly appreciate any advice on this situation or any help diagnosing the problems in my approach.
Thank you!!
While this is certainly an interesting question, I have the liberty to just restructure how the data is scraped, and as such can bypass having to do this at all.
Thanks all!
I'm working with the twitter API and I'm hitting a really confusing issue.
I have the following script:
const Twitter = require('twitter-api-stream')
const twitterCredentials = require('./credentials').twitter
const twitterApi = new Twitter(twitterCredentials.consumerKey, twitterCredentials.consumerSecret, function () {
  console.log(arguments)
})
twitterApi.getUsersTweets('everycolorbot', 1, twitterCredentials.accessToken, twitterCredentials.accessTokenSecret, (error, result) => {
  if (error) {
    console.error(error)
  }
  if (result) {
    console.log(result)         // outputs an array of json objects
    console.log(result.length)  // outputs 3506 for some reason (it's only an array of 1)
    console.log(result[0])      // outputs an opening bracket ('[')
    console.log(result[0].text) // outputs undefined
  }
})
Which is calling the following function to interact with twitter:
TwitterApi.prototype.getUsersTweets = function (screenName, statusCount, userAccessToken, userRefreshToken, cb) {
  var count = statusCount || 10;
  var screenName = screenName || "";
  _oauth.get(
    "https://api.twitter.com/1.1/statuses/user_timeline.json?count=" + count + "&screen_name=" + screenName,
    userAccessToken,
    userRefreshToken,
    cb
  );
};
It seems like I'm getting the result I want. When I log the result itself I get the following output:
[
{
"created_at": "Thu Sep 01 13:31:23 +0000 2016",
"id": 771339671632838656,
"id_str": "771339671632838656",
"text": "0xe07732",
"truncated": false,
...
}
]
Which is great, an array of the tweets limited to 1 tweet.
The problem I'm running into is when I try to access this array.
console.log(result.length)  // outputs 3506 for some reason (it's only an array of 1)
console.log(result[0])      // outputs an opening bracket ('[')
console.log(result[0].text) // outputs undefined
I read back through the API docs for user_timeline, but unless I'm completely missing it, I don't see any mention of special output.
Any ideas?
Update
Thanks @nicematt for pointing out that answer.
Just to elaborate on the solution, I updated my code to this and now I'm getting the result I want:
if (result) {
  let tweet = JSON.parse(result)[0] // parses the JSON and returns the first index
  console.log(tweet.text)           // outputs '0xe07732'
}
Thanks for the help!
Result is a string, and you're indexing into it: result[0] (where 0 is converted to a string) is almost identical to result.charAt(0). This is why result[0] is equal to "[": it's the first character of the string. You forgot to parse the result as JSON data.
JSON.parse(result).length // probably 1
And result.text is undefined since result (a string) behaves like an object (but isn't an instanceof Object), allowing property lookups and getters to happen on it; a string simply has no text property.
I'd show the difference between str[0] and str.charAt(0), too:
str[0]        // same as str['0'], but a getter; 0 is converted to a
              // string (because every key of an object is a string
              // in ECMAScript)
str.charAt(0) // looks up String#charAt and calls it with `this` set
              // to str and argument list: 0
I have a control that returns 2 records:
{
  "value": [
    {
      "ID": 5,
      "Pupil": 1900031265,
      "Offer": false,
    },
    {
      "ID": 8,
      "Pupil": 1900035302,
      "Offer": false,
      "OfferDetail": ""
    }
  ]
}
I need to test via Postman that I have 2 records returned. I've tried various methods I've found here and elsewhere, but with no luck. Using the code below fails to return the expected answer.
responseJson = JSON.parse(responseBody);
var list = responseBody.length;
tests["Expected number"] = list === undefined || list.length === 2;
At this point I'm not sure whether it's the API I'm testing that's at fault or my code. I've tried looping through the items returned, but that isn't working for me either. Could someone advise, please? I'm new to JavaScript, so I expect there's an obvious cause to my problem, but I'm failing to see it. Many thanks.
In Postman, under the Tests section, do the following:
var body = JSON.parse(responseBody);
tests["Count: " + body.value.length] = true;
Correct your JSON and try this:
var test = JSON.parse('{"value": [{"ID": 5,"Pupil": 1900031265,"Offer": false},{"ID": 8,"Pupil": 1900035302,"Offer": false,"OfferDetail": ""}] }')
test.value.length; // 2
So you need to identify the array in the JSON (the part starting with the [ bracket), take its key, and then check the length of the array under that key.
Here's the simplest way I figured it out:
pm.expect(Object.keys(pm.response.json()).length).to.eql(18);
No need to customize any of that to your variables. Just copy, paste, and adjust "18" to whatever number you're expecting.
This is what I did for counting the records:
// parse the response body into a variable
responseJson = JSON.parse(responseBody);
// find the length of the response array
var list = responseJson.length;
console.log(list);
tests["Validate service returns 70 records"] = list === 70;
A more up-to-date version of asserting that there are only 2 objects in the array:
pm.test("Only 2 objects in array", function () {
  pm.expect(pm.response.json().length).to.eql(2);
});
Your response body is an object; you cannot take the length of an object. Try:
var list = responseJson.value.length;
First of all, you should convert the response to JSON and find the value path. value is an array; call its length property to see how many objects it contains, and check against your expected size:
pm.test("Validate value count", function () {
pm.expect(pm.response.json().value.length).to.eq(2);
});
I had a similar problem; what I used to test for a certain number of array members is:
responseJson = JSON.parse(responseBody);
tests["Response Body = []"] = responseJson.length === valueYouAreCheckingFor;
To check what values you're getting, print them and check the Postman console:
console.log(responseJson.length);
Counting records in a JSON array using JavaScript and Insomnia:
// send the request with Insomnia
const response = await insomnia.send();
// parse the JSON
const body = JSON.parse(response.data);
// print the record count to the console
console.log(body.data.records.length);
pm.test("Only 2 objects in array", function (){
var jsonData = pm.response.json();
let event_length = jsonData.data.length;
pm.expect(event_length).to.eql(2);
});
As mentioned in the comments, you should test responseJson.value.length
responseJson = JSON.parse(responseBody);
tests["Expected number"] = typeof responseJson === 'undefined' || responseJson.value.length;
I was facing a similar issue while validating the length of an array inside a JSON response. The below snippet should help you resolve it:
responseJson = JSON.parse(responseBody);
var expected = 2; // the number of records you expect
tests["Expected number"] = responseJson.value.length === expected;
Working Code
pm.test("Verify the number of records",function()
{
var response = JSON.parse(responseBody);
pm.expect(Object.keys(response.value).length).to.eql(5);
});
//Please change the value in to.eql function as per your requirement
//'value' is the JSON notation name for this example and can change as per your JSON
When saving an array of objects as JSON, you need to use the following format in Sample.txt to avoid running into parsing errors:
[{"result":"\"21 inches = 21 inches\"","count":1},{"result":"\"32 inches = 32 inches\"","count":2}]
I'm new to JSON and have been searching for this for the last 4 days. I tried different approaches to storing an array of objects, but with no success. My first and simplest try is like this:
function createData() {
  // original, single JSON object
  var dataToSave = {
    "result": '"' + toLength.innerText + '"',
    "count": counter
  };
  // save into an array:
  var dataArray = { [] }; // No idea how to go ahead..
  var savedData = JSON.stringify(dataToSave);
  // filename is a text file. Inside the file, I want to save each JSON object
  // with a comma in between, so it can be parsed easily and correctly.
  writeToFile(filename, savedData);
}
function readData(data) {
  var dataToRead = JSON.parse(data);
  var message = "Your Saved Conversions : ";
  message += dataToRead.result;
  document.getElementById("savedOutput1").innerText = message;
}
To make an array from your object, you may do
var dataArray = [dataToSave];
To add other elements after that, you may use
dataArray.push(otherData);
When you read it, as data is an array, you can't simply use data.result. You must access the array's items using data[0].result, ... data[i].result...
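Putting that together, a rough sketch of the question's two functions (writeToFile, filename, toLength, counter and the savedOutput1 element are assumed from the question):
var dataArray = []; // accumulate every conversion here

function createData() {
  dataArray.push({
    "result": toLength.innerText,
    "count": counter
  });
  // stringify the whole array, so the file always contains valid JSON
  writeToFile(filename, JSON.stringify(dataArray));
}

function readData(data) {
  var dataToRead = JSON.parse(data); // an array of objects
  var message = "Your Saved Conversions : ";
  for (var i = 0; i < dataToRead.length; i++) {
    message += dataToRead[i].result + " ";
  }
  document.getElementById("savedOutput1").innerText = message;
}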