I'm developing a multi-process application using Node.js. In this application, a parent process will spawn a child process and communicate with it using a JSON-based messaging protocol over a pipe. I've found that large JSON messages may get "cut off", such that a single "chunk" emitted to the data listener on the pipe does not contain the full JSON message. Furthermore, small JSON messages may be grouped in the same chunk. Each JSON message will be delimited by a newline character, and so I'm wondering if there is already a utility that will buffer the pipe read stream such that it emits one line at a time (and hence, for my application, one JSON document at a time). This seems like it would be a pretty common use case, so I'm wondering if it has already been done.
I'd appreciate any guidance anyone can offer. Thanks.
Maybe Pedro's carrier can help you?
Carrier helps you implement newline-terminated protocols over node.js. The client can send you chunks of lines and carrier will only notify you on each completed line.
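A minimal sketch of how carrier is typically wired up (assuming its carry(stream, listener) helper; the child.stdout stream here is just illustrative for your spawned child process):
var carrier = require('carrier');

// carrier calls the listener once per completed newline-terminated line,
// so each call can be parsed as one JSON message.
carrier.carry(child.stdout, function (line) {
    var message = JSON.parse(line);
    // handle the message...
});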
My solution to this problem is to send JSON messages each terminated with some special Unicode character, one that you would never normally get in the JSON string. Call it TERM.
So the sender just does "JSON.stringify(message) + TERM;" and writes it.
The receiver then splits incoming data on TERM and parses the parts with JSON.parse(), which is pretty quick.
The trick is that the last message may not parse, so we simply save that fragment and add it to the beginning of the next message when it comes. Receiving code goes like this:
s.on("data", function (data) {
var info = data.toString().split(TERM);
info[0] = fragment + info[0];
fragment = '';
for ( var index = 0; index < info.length; index++) {
if (info[index]) {
try {
var message = JSON.parse(info[index]);
self.emit('message', message);
} catch (error) {
fragment = info[index];
continue;
}
}
}
});
Where "fragment" is defined somwhere where it will persist between data chunks.
But what is TERM? I have used the unicode replacement character '\uFFFD'. One could also use the technique used by twitter where messages are separated by '\r\n' and tweets use '\n' for new lines and never contain '\r\n'
I find this to be a lot simpler than messing with including lengths and such like.
The simplest solution is to send the length of the JSON data before each message as a fixed-length prefix (4 bytes, say) and have a simple un-framing parser which buffers small chunks or splits bigger ones.
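A minimal sketch of what such length-prefix framing could look like on both sides (the function and variable names here are illustrative, not from any particular library):
// Sender side: prefix each JSON payload with its byte length (4-byte big-endian).
function sendFramed(stream, message) {
    const payload = Buffer.from(JSON.stringify(message));
    const header = Buffer.alloc(4);
    header.writeUInt32BE(payload.length, 0);
    stream.write(Buffer.concat([header, payload]));
}

// Receiver side: buffer chunks and emit only complete frames.
function onFramedData(emit) {
    let buf = Buffer.alloc(0);
    return function (chunk) {
        buf = Buffer.concat([buf, chunk]);
        while (buf.length >= 4) {
            const len = buf.readUInt32BE(0);
            if (buf.length < 4 + len) break; // wait for the rest of the frame
            emit(JSON.parse(buf.slice(4, 4 + len).toString()));
            buf = buf.slice(4 + len);
        }
    };
}

// Usage: s.on('data', onFramedData(msg => console.log(msg)));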
You can try node-binary to avoid writing the parser manually. Look at the scan(key, buffer) documentation example - it does exactly line-by-line reading.
As long as newlines (or whatever delimiter you use) will only delimit the JSON messages and not be embedded in them, you can use the following pattern:
let buf = ''
s.on('data', data => {
    buf += data.toString()
    const idx = buf.indexOf('\n')
    if (idx < 0) { return } // No '\n', no full message
    let lines = buf.split('\n')
    buf = lines.pop() // if ends in '\n' then buf will be empty
    for (let line of lines) {
        // Handle the line
    }
})
Related
So I need to get a lot of data from a text file. When I use fs.createReadStream, copy the data to a variable, and start changing it, \n and \r characters are present and they are messing up my splits and array checking. I tried to write a function that runs over the array and removes them by checking for it (it doesn't work for some reason):
if (arr[i] === '\') { // this throws the error
    // (removing it and stuff)
}
do you have any idea how to remove it?
You can either use String.prototype.replaceAll, or work with split and join:
let data = '\r \n sdfsdf. \r'
data = data.replaceAll(`\r`, '')
data = data.replaceAll(`\n`, '')
Or, with split and join:
let data = '\r \n sdfsdf. \r'
data = data.split(`\r`).join('').split(`\n`).join('')
Note that replaceAll is a newer String.prototype method and exists only in Node v15+ and the latest versions of modern browsers.
See the MDN documentation to check whether it covers your needs.
My problem is that I have to concatenate all the text after the "product_link_href": in a huge series of things (there are 200+ of these so I couldn't post the entire thing) like:
Solved snippet removed for privacy reasons
It's coming from an API; it prints white in Windows PowerShell. The value is response.data and I'm using axios. I think the machine treats it as plain text, because it was green before I selected it with .data. Either way, I still need all the text after "product_link_href": concatenated, in text format and separated by ",".
The code I am using is
axios.get('https://randomapi/' + id + '/json?api_token=examplenotrealapitoken').then(response => {
    console.log(response.data);
});
I tried JSON.parse and stringify but nothing works.
The response from the server is stringified JSON objects that have been concatenated with newline characters "\n".
It is not parseable as a single JSON document, because a bare concatenation of objects is not valid JSON; it would need to be wrapped in an array.
The approach I took is to coerce it to "an array of stringified JSON objects". Since each object is shallow, there is no nesting, so the } character is unambiguously the end of a stringified object.
You can call massiveJSONishString.split('}'), and you get an array of JSON-stringified objects with the trailing } missing on each one.
Then you map over that array, and for each element, add the trailing } that we threw away to array-ify it, and JSON.parse() that string, producing an array of JSON objects.
This is the code you are looking for:
const textArray = res.data.split("}");
const jsonArray = textArray.map(element => {
    try {
        return JSON.parse(`${element}}`);
    } catch (e) {
        return {
            product_link_href: "MALFORMED JSON"
        };
    }
});
// console.log(jsonArray);
const product_link_hrefs = jsonArray.map(obj => obj.product_link_href);
const list = product_link_hrefs.join(", ");
console.log(list);
console.log(`You're welcome!`);
My Setup:
What I'm doing is pulling a very large table from MySQL (40,000+ Rows)
SpoolInfo.request.query("SELECT SpoolNumber AS a, DrawingNumber AS b, LineSize AS c, PipeSpec AS d, CU AS e, SheetNumber AS f, FluidCode AS g FROM Job" + JobNumber + ".SpoolInfo ORDER BY SpoolNumber ASC");
Doing a simple operation to it to format it how it is needed for the application I have to send it to:
if (SpoolInfo.response.data) {
    SpoolInfoRowsObj = {"Cache": []};
    SpoolInfo.response.data.forEach(function (d, di) {
        SpoolInfoRowsObj.Cache.push(_.values(SpoolInfo.response.data[di]));
    });
}
Then storing it back into a Cache table for quicker access:
(This is needed because I can't find a way to make the .forEach loop run faster; it currently takes over 20 seconds.)
UpdateCacheTable1.request = new AP.MySQL.Request();
UpdateCacheTable1.request.execute("UPDATE `Job" + JobNumber + "`.`CacheTable_SpoolInfo` SET `SpoolInfoObj` = '" + JSON.stringify(SpoolInfoRowsObj.Cache) + "' WHERE ID = 1");
The Problem:
When I retrieve that data back later:
CacheSpoolInfo.request.query("SELECT SpoolInfoObj FROM Job" + GetJobNumber.response.data[0].JobNumber + ".CacheTable_SpoolInfo ");
CommRef_.SpoolInfo = CacheSpoolInfo.response.data
And try to use it like this:
....
}else if (t == "SpoolInfo"){
rowsObj = {"Rows":[]};
rowsObj = {"Rows":JSON.parse(CommRef_.SpoolInfo[0].SpoolInfoObj)};
}else{
....
It has a bunch of extra "\" stuff in it for all the special characters like:
"Rows": [
[
"[[\" 1-AII22-84042S01 \",\"1040\r\"],[\"0-A102-27564S01 \",\"110\r\"],.....
]
]
So of course the output is not the same JSON-structured format that I saved.
I kinda thought using JSON.stringify() and JSON.parse() as I did would handle this.
My Question:
How can I handle this so the data I pull back can be used the same way it would have been before I sent it to the DB? In other words, how do I get rid of all the added slashes so JSON can parse it again?
The problem is that MySQL is adding the escapes when you save the string back in, because MySQL has no idea what JSON is and is storing it as a VARCHAR. An easy way to get around this is to do the escaping yourself before you store the string in the database.
For example, if you have a PHP backend, to store you would do
$escapedString = mysqli_escape_string($myJSONString);
// Store $escapedString in db
For retrieval, you would then do
$escapedString = // query code
$myJSONString = stripslashes($escapedString);
How you handle the escape will vary based on your backend architecture. You might also consider using a database that has native support for JSON.
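If the backend is Node itself, the same idea applies; for instance, with the widely used mysql driver you could let placeholder substitution handle the escaping rather than concatenating the JSON into the SQL string. A rough sketch (the connection settings are placeholders and the table name is simplified from the question):
const mysql = require('mysql');

// Placeholder connection settings; substitute your own.
const connection = mysql.createConnection({ /* host, user, password, database */ });

// Let the driver escape the JSON via the '?' placeholder instead of
// building the SQL string by concatenation.
const json = JSON.stringify(SpoolInfoRowsObj.Cache);
connection.query(
    'UPDATE `CacheTable_SpoolInfo` SET `SpoolInfoObj` = ? WHERE ID = 1',
    [json],
    function (err) { if (err) throw err; }
);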
Another option would be to write a JavaScript function that does the stripping for you (there is no native function that does this). It should be doable with some regex. To see how you might go about this, see this existing SO question related to stripping slashes. Note that you'll need to change that a bit, since you only want to remove escape slashes while in that post the OP wants to remove all slashes (which is wrong for JSON since your data might actually contain slashes).
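As a very rough sketch of that do-it-yourself option (assuming the only escaping applied was a backslash inserted before each special character, much like PHP's addslashes; it will not cope with anything fancier):
// Undo simple backslash escaping ("\x" becomes "x"), roughly what PHP's
// stripslashes does, before handing the string to JSON.parse.
function stripSlashes(str) {
    return str.replace(/\\(.)/g, '$1');
}

const rows = JSON.parse(stripSlashes(CacheSpoolInfo.response.data[0].SpoolInfoObj));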
I'm using Node.js to read and parse a file of pairs encoding numbers. I have a file like this:
1561 0506
1204 900
6060 44
And I want to read it as an array, like this:
[[1561,0506],[1204,900],[6060,44]]
For that, I am using a readStream, reading the file as chunks and using native string functions to do the parsing:
fileStream.on("data",function(chunk){
var newLineIndex;
file = file + chunk;
while ((newLineIndex = file.indexOf("\n")) !== -1){
var spaceIndex = file.indexOf(" ");
edges.push([
Number(file.slice(0,spaceIndex)),
Number(file.slice(spaceIndex+1,newLineIndex))]);
file = file.slice(newLineIndex+1);
};
});
That took way too much time, though (4 s for the file I need, on my machine). I see some possible reasons:
Use of strings;
use of "Number";
Dynamic array of arrays.
I've rewritten the algorithm without the built-in string functions, using loops instead, and, to my surprise, it became much slower! Is there any way to make it faster?
Caveat: I have not tested the performance of this solution, but it's complete, so it should be easy to try.
How about using this liner implementation, based on the notes in this question?
Using the liner:
var fs = require('fs')
var liner = require('./liner')

var source = fs.createReadStream('mypathhere')
source.pipe(liner)

liner.on('readable', function () {
    var line
    while (line = liner.read()) {
        var parts = line.split(" ");
        edges.push([Number(parts[0]), Number(parts[1])]);
    }
})
As you can see, I also made each edge an inline constant-sized array, separate from the split parts, which I'm guessing would speed up allocation. You could even try using indexOf(" ") instead of split(" ").
Beyond this you could instrument the code to identify any further bottlenecks.
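Since the './liner' module itself isn't shown above, here is a minimal sketch of what such a line-splitting Transform stream typically looks like (an approximation, not necessarily the exact implementation the answer links to):
const stream = require('stream');

// A Transform stream that buffers incoming chunks and re-emits them one
// line at a time, holding back any trailing partial line.
const liner = new stream.Transform({ objectMode: true });

liner._transform = function (chunk, encoding, done) {
    let data = chunk.toString();
    if (this._lastLineData) data = this._lastLineData + data;

    const lines = data.split('\n');
    this._lastLineData = lines.pop(); // keep the partial line for the next chunk

    lines.forEach(this.push.bind(this));
    done();
};

liner._flush = function (done) {
    if (this._lastLineData) this.push(this._lastLineData);
    this._lastLineData = null;
    done();
};

module.exports = liner;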
I'm trying to think of a way to count the number of lines in a .csv file using JavaScript. Any useful tips or resources someone can direct me to?
Depends what you mean by a line. For simple number of newlines, Robusto's answer is fine.
If you want to know how many rows of CSV data that represents, things may be a little more difficult, as a CSV field may itself contain a newline:
field1,"field
two",field3
...is one row, at least in CSV as defined by RFC4180. (It's one of the aggravating features of CSV that there are so many non-standard variants; the RFC itself was very late to the game.)
So if you need to cope with that case you'll have to essentially parse each field.
A field can be raw, or (necessarily, if it contains \n or ,) quoted, with an embedded " represented as a doubled "". So a regex for one field would be:
"([^"]|"")*"|[^,\n]*
and so for a whole row (assuming it is not empty):
("([^"]|"")*"|[^,\n]*)(,("([^"]|"")*"|[^,\n]*))*\n
and to get the number of those:
var rowsn= csv.match(/(?:"(?:[^"]|"")*"|[^,\n]*)(?:,(?:"(?:[^"]|"")*"|[^,\n]*))*\n/g).length;
If you are lucky enough to be dealing with a variant of CSV that complies with RFC4180's recommendation that there are no " characters in unquoted fields, you can make this a bit more readable. Split on newlines as before and count the number of " characters in each line. If it's an even number, you have a complete line; if it's an odd number you've got a split.
var lines = csv.split('\n');
for (var i = lines.length; i-- > 0;)
    if ((lines[i].match(/"/g) || []).length % 2 === 1)
        lines.splice(i - 1, 2, lines[i - 1] + lines[i]);
var rowsn = lines.length;
To count the number of lines in a document (once you have it as a string in Javascript), simply do:
var lines = csvString.split("\n").length;
You can use '.' to match everything on a line except the newline at the end; it won't count quoted newlines. Use the 'm' (multiline) flag, as well as 'g' for global.
function getLines(s) {
    return s.match(/^(.*)$/mg);
}
alert(getLines(string).length)
If you don't mind skipping empty lines it is simpler, but sometimes you need to keep them for spacing.
function getLines(s) {
    return s.match(/(.+)/g);
}
If you are asking how to count the number of rows in a CSV, then you can use this example:
http://purbayubudi.wordpress.com/2008/11/09/csv-parser-using-javascript/
It takes the CSV file and displays the number of rows in a popup window.
Here is a sample code in Typescript.
FileReader is needed to obtain the contents.
Returning a promise makes it easy to wait for the results of the asynchronous readAsText and onload function.
const countRowsInCSV = async (csvFile: File): Promise<number> => {
  return new Promise((resolve, reject) => {
    try {
      const reader = new FileReader();
      reader.onload = (event: any) => {
        const csvData = event.target.result;
        const rowData = csvData.split('\n');
        resolve(rowData.length);
      };
      reader.readAsText(csvFile);
    } catch (error: any) {
      reject(error);
    }
  });
};
const onChangeFile = async (selectedFile: File) => {
  const totalRows = await countRowsInCSV(selectedFile);
};