Streaming JSON to Redis - javascript

I have a large JSON file that I would like to store in Redis. The problem is that when I parse it, I run out of memory in Node.js.
I increased the heap memory from 1.39 GB to 4 GB, but it still happens, and I believe I am not doing this properly.
After a lot of searching, I found out that streaming is my best bet. The thing is that I am not really fluent with streaming, and I am not sure it would even resolve my problem.
I have read a lot, and the information is scattered. I wanted to ask whether you think this is even approachable, and whether my understanding is correct:
Would I be able to stream a JSON object into Redis?
Do I have to stringify it myself, or will that happen automatically?
Should I stringify my JSON chunk by chunk,
or will streaming into Redis end up producing a string anyway?
I am using the ioredis client to interact with Redis.
I appreciate your help in advance.

If you can guarantee that only one process will be updating that key, you could possibly use SETRANGE. As you parse the file, you can keep a reference to the next offset:
(pseudo-code)
offset = 0
offset = redis.set_range(key, offset, "string")
Then you can load pieces of the file up to Redis without having to load everything into memory at once.
SETRANGE returns the length of the string after it was modified.
This also assumes that you can load pieces of the file contents without having to parse everything as JSON then convert it back to a string. Also assumes that only one process is updating that key -- if multiple processes try to update it, the JSON value can get corrupted.
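For reference, a rough ioredis sketch of that idea; the key name, chunk size, and connection settings are placeholders, and it assumes the raw file bytes are exactly what you want stored:

const fs = require('fs');
const Redis = require('ioredis');

async function streamFileToRedis(filePath, key) {
  const redis = new Redis();          // assumes a local Redis on the default port
  await redis.del(key);               // start from an empty key

  let offset = 0;
  const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 });

  for await (const chunk of stream) {
    // SETRANGE writes the chunk at the current offset and returns the new length,
    // which becomes the offset for the next chunk
    offset = await redis.setrange(key, offset, chunk);
  }

  await redis.quit();
  return offset;                      // total bytes written
}

This way only one chunk of the file is ever held in memory at a time, and the value stored in Redis ends up being the same string you would get from writing the whole file in one SET.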

Related

Memory leak when sending ArrayBuffers with large amounts of binary data over WebRTC in electron

I am currently working on an Electron application that needs to be able to send files over a WebRTC data channel. I am using PeerJS to abstract WebRTC away a bit and to make development easier.
My current implementation uses a FileReader on the sender's side to read a file in 32-kilobyte binary chunks. Each chunk is put into an ArrayBuffer, which then gets sent along with some data to tell the other side exactly what the sender is sending. The receiver then writes the binary data to a file. While the receiver is writing that data, the sender waits for a "file-proceed" signal from the receiver. When the receiver is done, the sender gets notified and sends the next chunk. This goes on until the entire file is sent.
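A simplified sketch of what the sender side does, as described above (the names here are illustrative, not my actual code):

function sendNextChunk(file, offset) {
  const CHUNK_SIZE = 32 * 1024;                      // 32 KB per chunk
  const reader = new FileReader();

  reader.onload = function () {
    // reader.result is an ArrayBuffer with the chunk's bytes
    dataConnection.send({
      t: 'file-chunk',
      name: file.name,
      offset: offset,
      data: reader.result,
    });
    // the next chunk is only sent after the receiver replies with 'file-proceed'
  };

  reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK_SIZE));
}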
This approach works until the files sent over the course of the application's runtime reach about 500 megabytes in total. I believe this is due to a memory leak whose root cause I cannot find. As far as I know I don't keep the objects in memory, and they should be cleared by GC. Another rather unusual thing is that only the recipient of the file suffers from this problem.
There is a lot going on in my application, but this is the part I think is causing the problem (feel free to ask for more code).
This is the part that is supposed to write the ArrayBuffer:
sm.writeChunk = function (arrayBuffer) {
  sm.receivedBytes += sm.receivedFileMeta.chunkSize;
  fs.appendFileSync(sm.downloadsFolder + path.sep + sm.receivedFileMeta.name + '.part',
    new Buffer(arrayBuffer, 'binary'), 'binary', function (err) {
      if (err) {
        console.log(err);
      }
    });
  sm.onAction({t: 'file-progress', percent: sm.receivedBytes / sm.receivedFileMeta.size * 100});
  sm.dataConnection.send({t: 'file-proceed'});
};
sm is an object that holds file-transfer related functions and variables, hence the "sm." everywhere.
I've tried setting the ArrayBuffer to undefined or null, but nothing seems to make the object disappear from memory, not even after the file transfer is completed. A heap snapshot seems to back this up. Removing the fs.appendFileSync call so that nothing is written to disk also makes no difference.
Is there anything I can do to fix this issue? Or is this a problem related to PeerJS? Any help or suggestions are much appreciated!
Well, it appears to be a PeerJS issue after all. If you want to send packets larger than 16 KB, PeerJS will chunk them for you, and the memory problem lies in that chunking. PeerJS chunks at 16 KB, while Electron (actually Chrome) can send 64 KB at a time. This is done for cross-browser compatibility, but since I am targeting Electron only, I changed the PeerJS code to not chunk my 32 KB packets. This resolved the issue.

How to read local client side text files, without using any html input/search elements

The purpose is game programming, as you may have guessed. Why else, right?
How is it actually possible to read in values from a text file, so that those values can then be used in the game? I have searched for hours on this topic.
Example: each token on a line of the text file is read and passed as arguments to the constructor of each object as it is instantiated in a for loop. A common practice. I would suspect it is too expensive to keep that much data in an array at any given time.
In Java this is dead simple using the Scanner object.
Any suggestions are appreciated, thanks. I guess all I am asking is: is it even possible?
As Roland Starke said, the array will probably take up less memory than the objects you construct from it... So it is perfectly fine to have all the information in a JSON file for instance, which you load from your server.
If you want to avoid transferring all the data every time, you would need to use the right caching headers so that the data can be cached by the browser.
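A minimal sketch of that approach, assuming the data is served as a JSON file from your server (the file name and the Enemy class are made up for illustration):

class Enemy {
  constructor(name, hp, speed) {
    this.name = name;
    this.hp = hp;
    this.speed = speed;
  }
}

async function loadEnemies() {
  const response = await fetch('/data/enemies.json'); // served like any other asset
  const defs = await response.json();                 // e.g. [{ "name": "bat", "hp": 3, "speed": 2 }, ...]
  return defs.map((d) => new Enemy(d.name, d.hp, d.speed));
}

With an appropriate Cache-Control header on that file, the browser will not re-download it on every visit.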

Proper Way to Read a Data File in Javascript/Node.js?

I have a flat data file in the form of XML, but there isn't currently a real Windows viewer for the file. I decided to create a simple application with Node-WebKit, just for basic viewing - the data file won't need to be written to by the application.
My problem is that I don't know the proper way to read a large file. The data file is a backup of phone SMSes and MMSes, and the MMS entries contain Base64 image strings where applicable - so the file gets pretty big with large numbers of images (generally around 250 MB). I didn't create or format the original data in the file, so I can't modify its structure.
So, the question is - assuming I already have a way to parse the XML into JavaScript objects, should I,
a) Parse the entire file when the application is first run, storing an array of objects in memory for the duration of the application's lifetime, or
b) Read through the entire file each time I want to extract a conversation (all of the messages with a specific outgoing or incoming number), and only store that data in memory, or
c) Employ some alternate, more efficient, solution that I don't know about yet.
Convert your XML data into an SQLite db. SQLite is NOT memory based by default. Query the db when you need the data, problem solved :)
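A rough sketch of that approach using the better-sqlite3 package (one of several SQLite bindings for Node); the table layout and field names are made up for illustration:

const Database = require('better-sqlite3');
const db = new Database('messages.db');

db.exec(`CREATE TABLE IF NOT EXISTS messages (
  id INTEGER PRIMARY KEY,
  number TEXT,
  direction TEXT,
  body TEXT,
  image_base64 TEXT
)`);

// one-time import, run after parsing the XML into plain message objects
function importMessages(parsedMessages) {
  const insert = db.prepare(
    'INSERT INTO messages (number, direction, body, image_base64) VALUES (?, ?, ?, ?)'
  );
  const insertAll = db.transaction((rows) => {
    for (const m of rows) insert.run(m.number, m.direction, m.body, m.imageBase64 || null);
  });
  insertAll(parsedMessages);
}

// the viewer then only ever loads one conversation into memory
function getConversation(number) {
  return db.prepare('SELECT * FROM messages WHERE number = ? ORDER BY id').all(number);
}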

Passing an ActionScript JPG Byte Array to Javascript (and eventually to PHP)

Our web application has a feature which uses Flash (AS3) to take photos using the user's web cam, then passes the resulting byte array to PHP where it is reconstructed and saved on the server.
However, we need to be able to take this web application offline, and we have chosen Gears to do so. The user takes the app offline, performs his tasks, then when he's reconnected to the server, we "sync" the data back with our central database.
We don't have PHP to interact with Flash anymore, but we still need to allow users to take and save photos. We don't know how to save a JPG that Flash creates in a local database. Our hope was that we could save the byte array, a serialized string, or somehow actually persist the object itself, then pass it back to either PHP or Flash (and then PHP) to recreate the JPG.
We have tried:
- passing the byte array to JavaScript instead of PHP, but JavaScript doesn't seem to be able to do anything with it (the object seems to be stripped of its methods)
- stringifying the byte array in Flash and then passing it to JavaScript, but we always get the same string:
ÿØÿà
Now we are thinking of serializing the string in Flash, passing it to Javascript, then on the return route, passing that string back to Flash which will then pass it to PHP to be reconstructed as a JPG. (whew). Since no one on our team has extensive Flash background, we're a bit lost.
Is serialization the way to go? Is there a more realistic way to do this? Does anyone have any experience with this sort of thing? Perhaps we can build a JavaScript class that mirrors the byte array class in AS?
I'm not sure why you would want to use JavaScript here. Anyway, the string you pasted looks like the beginning of a JPG header. The problem is that a JPG will almost certainly contain NULs (characters with 0 as their value), which will most likely truncate the string (as seems to be the case with the sample you posted). If you want to "stringify" the JPG, the standard approach is to encode it as Base64.
If you want to persist data locally, however, there's a way to do it in Flash. It's simple, but it has some limitations.
You can use a local Shared Object for this. By default there's a 100 KB limit, which is rather inadequate for image files; you could ask the user to allot more space to your app, though. In any case, I'd try to store the image as a JPG, not as raw pixels, since the difference in size is very significant.
Shared Objects will handle serialization / deserialization for you transparently. There are some caveats: not every object can really be serialized; for starters, it has to have a parameterless constructor; DisplayObjects such as Sprites, MovieClips, etc., won't work. It is possible to serialize a ByteArray, however, so you could save your JPGs locally (if the user allows the extra space). You should use AMF3 as the encoding scheme (which is the default, I think); also, you should map the class you're serializing with registerClassAlias to preserve the type of the serialized object (otherwise it will be treated as a plain Object). You only need to do this once in the app life cycle, but it must be done before any read / write of the Shared Object.
Something along the lines of:
registerClassAlias("flash.utils.ByteArray", ByteArray);
I'd use Shared Objects rather than JavaScript. Just keep in mind that you'll most likely have to ask the user to give you more space for storing the images (which seems reasonable enough if you're allowing them to work offline), and that the user could delete the data at any time (just like they could delete their browser's cookies).
Edit
I realize I didn't really pay much attention to the "we have chosen Gears to do so" part of your question.
In that case, you could give the Base64 approach a try to pass the data to JS. From the ActionScript side it's easy (grab one of the many available Base64 encoders/decoders out there), and I assume the Gears API must already have an encoder / decoder available (or at least it shouldn't be hard to find one). At that point you'll probably have to turn that into a Blob and store it to disk (maybe using the Blob API, but I'm not sure as I don't have experience with Gears).
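In a modern browser, decoding the Base64 string back into binary on the JS side could look roughly like this (the function name is just for illustration; Gears itself may need a different storage call):

function base64JpegToBlob(base64) {
  const binary = atob(base64);                 // decode Base64 to a binary string
  const bytes = new Uint8Array(binary.length); // copy it into a typed array
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: 'image/jpeg' });
}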

Is it bad to store JSON on disk?

Mostly I have just used XML files to store config info and to provide elementary data persistence. Now I am building a website where I need to store some XML-type data. However, I am already using JSON extensively throughout the whole thing. Is it bad to store JSON directly instead of XML, or should I store the XML and introduce an XML parser?
Not bad at all. Although there are more XML editors, so if you're going to need to manually edit the files, XML may be better.
The differences between using XML and JSON are:
- It's a lot easier to find an editor that supports a nice way to edit XML. I'm aware of no editors that do this for JSON, but there might be some, I hope :)
- Extreme portability/interoperability - not everything can read JSON natively, whereas pretty much any language/framework these days has XML libraries.
- JSON takes up less space.
- JSON may be faster to process, ESPECIALLY in a JavaScript app where it's the native data format.
- JSON is more human-readable for programmers (this is subjective, but everyone I know agrees).
Now, please notice the common thread: any of the benefits of using pure XML listed above are 100% lost immediately as soon as you store JSON as XML payload.
Therefore, the guidelines are as follows:
- If wide interoperability is an issue and you talk to something that can't read JSON (like a DB that can read XML natively), use XML.
- Otherwise, I'd recommend using JSON.
- NEVER EVER use JSON as an XML payload, unless you must use XML as a transport container due to existing protocol needs AND the cost of encoding and decoding JSON to/from XML is somehow prohibitively high compared to the network/storage loss due to double encoding (I have major trouble imagining a plausible scenario like this, but who knows...).
UPDATED: Removed Unicode bullets as per info in comments
It's just data, like XML. There's nothing about it that would preclude saving it to disk.
Define "bad". They're both just plain-text formats. Knock yourself out.
If you're storing the data as a cache (meaning it was in one format and you had to process it programmatically to "make" it JSON), then I say no problem. As long as the consumer of your JSON reads native JSON, it's standard practice to save cache data to disk or memory.
However, if you're storing a configuration file in JSON which needs human interaction to "process", then I may reconsider. Using JSON for simple key:value pairs is cool, but anything beyond that and the format may be too compact (meaning nested { and [ brackets can be hard to decipher).
One potential issue with JSON, when there is deep nesting, is readability: you may actually see ]]]}], making debugging difficult.
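For what it's worth, a minimal Node sketch of round-tripping JSON to disk; pretty-printing with JSON.stringify(obj, null, 2) mitigates the nested-bracket readability issue mentioned above (the file name and contents here are arbitrary):

const fs = require('fs');

const config = { server: { host: 'localhost', port: 8080 }, features: ['search', 'cache'] };

// write: the third argument to JSON.stringify adds a 2-space indent
fs.writeFileSync('config.json', JSON.stringify(config, null, 2), 'utf8');

// read
const loaded = JSON.parse(fs.readFileSync('config.json', 'utf8'));
console.log(loaded.server.port); // 8080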
