I looked through the documentation of the Mongoose ODM and found the following:
http://mongoosejs.com/docs/querystream.html
What are they used for? What can I do with them?
I am not sure if they are used for streaming docs or for dynamically updating queries...
Regards
Well, it's all about the API.
QueryStream lets you use ReadStream's API, so in order to appreciate QueryStream, you need to know more about ReadStream/WriteStream.
There are many pros:
You can process a large amount of data, which you'll be getting as "chunks", so memory holds one item at a time (it could be a DB document, a DB row, a single line from a file, etc.)
You can pause/resume the stream(s)
You can pipe read->write very easily
The idea is that it gives you a unified API for read and write operations.
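For example, here's a rough sketch of consuming a Mongoose query stream one document at a time; Posts and handleDocument are illustrative names, not anything from your code:

var stream = Posts.find().stream();

stream.on('data', function (doc) {
  stream.pause();                   // stop the flow while we handle this doc
  handleDocument(doc, function () { // hypothetical async handler
    stream.resume();                // pull the next document when done
  });
});

stream.on('error', function (err) {
  console.error(err);
});

stream.on('close', function () {
  console.log('all documents processed');
});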
To answer your question "What can I do with them":
You could do anything with or without Node.js's stream API, but it's definitely clearer and easier to use when there's some sort of standard.
Also, node.js's streams are event-based (based on EventEmitter) so it helps with decoupling.
Edit:
That was more about the aspect of streams. In Mongoose's case, a single chunk contains a document.
To clarify the advantage of the API:
node.js's http.ServerResponse is a writable stream, which means you should be able to stream Mongoose's result set to the browser using a single line:
// 'res' is the http response from your route's callback.
Posts.find().stream().pipe(res);
The point is that it doesn't matter if you're writing to http.ServerResponse, a file or anything else. As long as it implements a writable stream, it should work without changes.
Hope I made it clearer.
Straight to the point, I am running an http server in Node.js managing a hotel's check-in/out info where I write all the JSON data from memory to the same file using "fs.writeFile".
The data usually doesn't exceed 145 kB, but since I need to write it every time I get an update from my database, I end up with data loss/bad JSON format when calls to fs.writeFile happen one right after another.
Currently I have worked around this using "fs.writeFileSync", however I would like to hear about a more sophisticated solution rather than the easy/bad solution of a sync function.
Using fs.promises results in the same error, since again I have to make multiple calls to fs.promises.writeFile.
According to Node's documentation, calling fs.writeFile or fs.promises.writeFile multiple times on the same file without waiting is not safe, and they suggest using a file stream; however, this is not currently an option for me.
To summarize, I need to wait for fs.writeFile to finish normally before attempting any repeated write action, and using the callback is not useful since I don't know a priori when a write action needs to be done.
Thank you very much in advance
I assume you mean you are overwriting or truncating the file while the last write request is still being written. If I were you, I would use the promises API and heed the warning from the documentation:
It is unsafe to use fsPromises.writeFile() multiple times on the same file without waiting for the promise to be settled.
You can await the result in a traditional loop, or very carefully use .then() to "synchronize" your callbacks, but if you're not doing anything else in your event loop except reading from your database and writing to this file, you might as well just use writeFileSync to keep things simple/safe. The asynchronous APIs (callback and Promises) are intended to allow your program to do other things in the meantime; if this is not necessary and the async APIs add troublesome complexity for your code, just use the synchronous APIs. That's true for any node API or library function, not just fs.writeFile.
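For example, a minimal sketch of the .then() chaining idea, where each new write is queued behind the previous one (the file name and data shape are just placeholders):

const fs = require('fs');

// Every call appends to the same promise chain, so a write only starts
// once the previous one has settled and the file is never overwritten mid-write.
let lastWrite = Promise.resolve();

function queueWrite(data) {
  lastWrite = lastWrite
    .then(() => fs.promises.writeFile('hotel-data.json', JSON.stringify(data)))
    .catch((err) => console.error('write failed:', err));
  return lastWrite;
}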
There are also libraries that will perform atomic filesystem operations for you and abstract away the implementation details, but I think these are probably overkill for you unless you describe your use case in more detail. For example, why you're dumping a database to disk as JSON as fast/frequently as you can, rather than keeping things in memory or using event-based incremental updates (e.g. a real, local database with atomicity and consistency guarantees).
Thank you for your response!
Since my app is mainly an HTTP server, yes, I do other things besides simple input/output, although not with a great number of requests. I will review the promises solution again, but the first time I had no luck.
To explain more, I have:
function updateRoom(data) {
  // ...update things in memory...
  writetoDisk();
}
function writetoDisk() {
  fs.writeFile(...);
}
Making writetoDisk an async function and using "await" inside it still does not solve the problem, since updateRoom will call writetoDisk without waiting for it to end.
The ".then" approach cannot be implemented since my updateRoom is being called constantly and dynamically.
If you happen to know a thing or two about async/await, you are more than welcome to explain a bit more to me; thanks again nevertheless!
There's a bit of someone else's code I am trying to add functionality to. It's using WebSockets to communicate with a server which I will most likely not be able to change (the server runs on a $3 micro-controller...)
The pattern used, for instance when uploading data to the server, consists of using global variables, then sending a series of messages on the socket, as well as having an 'onmessage' handler for the response. This seems clumsy, given that it assumes that there is only ever one socket call made at a time (I think the server guarantees that, in fact). The messages sent by the server can be multiple, and even figuring out when the messages are finished is fiddly.
I am thinking of restructuring things so that I have a better handle on them, mostly w.r.t. being able to know when the response has arrived (and finished), moving to patterns like:
function save_file(name, data, callback) {
  // send the upload messages, then invoke callback once the full response has arrived
}
And perhaps at some point I can even turn them into async functions.
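e.g. something roughly like this, just as a sketch that assumes only one request is ever in flight, where socket is the existing global WebSocket and sendUploadMessages/isFinalMessage stand in for the existing protocol logic:

function save_file(name, data) {
  return new Promise(function (resolve, reject) {
    var chunks = [];
    socket.onmessage = function (event) {
      chunks.push(event.data);
      if (isFinalMessage(event.data)) {       // placeholder: protocol-specific "done" check
        socket.onmessage = null;
        resolve(chunks);
      }
    };
    socket.onerror = reject;
    sendUploadMessages(socket, name, data);   // placeholder: existing send logic
  });
}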
So a couple of ideas:
- Is there some kind of identifier that I could find in the WebSocket object that might allow me to better string together request and response?
- Short of that, what is the right pattern? I started using custom events, which lets me tie the whole process together much better, since I can supply a callback by attaching it to the event, but even doing removeEventListener is tricky because I need to keep a reference to every single listener to make sure I can remove them later.
Any advice anyone?
In a recent SO question, I outlined an OOM condition that I'm running into while processing a large number of csv files with millions of records in each.
The more I look into the problem and the more I read up on Node.js, the more convinced I become that the OOM isn't happening because of a memory leak but because I'm not throttling the data input into the system.
The code just blindly sucks in all data, creating a single callback event for each line. The events keep getting added to the main event loop, which eventually becomes so large that it exhausts all available memory.
What are Node's idiomatic patterns for dealing with this scenario? Should I be tying reading of csv files to a blocking queue of some sort that, once full, will block the file reader from parsing more of the data? Are there any good examples dealing with processing of large data sets?
Update: To put this differently and more simply, Node can read input faster than it can write output, and the slack is being stored in memory (queued as events on the event queue). Because there is a lot of slack, memory eventually gets exhausted. So the question is: what's the idiomatic way of throttling the input down to the output's rate?
Your best bet is to set things up as streams and rely on the built-in backpressure semantics to do the throttling. The Streams Handbook has a really good overview of it.
Similar to Unix pipes, the Node stream module's primary composition operator is called .pipe(), and you get a backpressure mechanism for free to throttle writes for slow consumers.
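A rough sketch of that shape (the file name is illustrative, the line splitter is a bare-bones stand-in for a real CSV parser, and the slow writer simulates an async DB insert):

const fs = require('fs');
const { Transform, Writable } = require('stream');

// Bare-bones line splitter; a real CSV parser would go here.
const lineSplitter = new Transform({
  readableObjectMode: true,
  transform(chunk, encoding, callback) {
    this.buffer = (this.buffer || '') + chunk.toString();
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop();             // keep the trailing partial line
    lines.forEach((line) => this.push(line));
    callback();
  },
});

// Deliberately slow consumer: backpressure makes the upstream reader wait
// instead of queueing millions of lines in memory.
const slowDbWriter = new Writable({
  objectMode: true,
  write(line, encoding, callback) {
    setTimeout(callback, 5);               // stand-in for an async DB insert
  },
});

fs.createReadStream('records.csv')
  .pipe(lineSplitter)
  .pipe(slowDbWriter);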
Update
I've not used the readline module for anything other than terminal input before, but reading the docs it looks like it accepts an input stream and an output stream. If you frame your DB writer as a writable stream, you should be able to let readline pipe into it for you internally.
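If the piping route doesn't pan out, here's a rough sketch that throttles manually with readline's pause/resume instead (writeRecordToDb is a stand-in for your actual DB writer):

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('records.csv'),
});

rl.on('line', (line) => {
  rl.pause();                       // stop emitting lines while the write is in flight
  writeRecordToDb(line, (err) => {  // stand-in for the real async DB write
    if (err) throw err;
    rl.resume();                    // ask for the next line only when ready
  });
});

rl.on('close', () => console.log('done'));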
We are trying to pre-cache a large amount of data into IndexedDB on load of our web application. From my performance testing the speed is decent in a desktop browser (e.g. Internet Explorer), where I can insert 10,000 records in around 2 seconds. But for the exact same functionality on the iPad it drops to 30 seconds. That comparison just blew my mind.
Does anyone know of any hints or tricks for inserting large data sets into IndexedDB? I don't know if it is possible at all, but could we build up a copy of an IndexedDB database server-side with all the data prepopulated, then just ship it over to the client and have it store it down to the browser? Is anything along these lines doable?
Thanks
I had problems with massive bulk inserts (100,000-200,000 records). I've solved all my IndexedDB performance problems using the Dexie library. It has this important feature:
Dexie has a kick-ass performance. Its bulk methods take advantage of a not well known feature in IndexedDB that makes it possible to store stuff without listening to every onsuccess event. This speeds up the performance to a maximum.
Dexie: https://github.com/dfahlander/Dexie.js
Some pretty bad IndexedDB performance problems can be caused by a prolonged period of the browser just calling onsuccess callbacks, running into event loop overhead after the work is actually done. The pattern I observed in my app, which was doing this, was that it did a bunch of work and then spent a long time answering thousands of callbacks very inefficiently:
In the profiler trace, the right-hand part was the callbacks on every request. The solution, of course, is to not put a callback on every request, but it was previously unclear to me how to do this.
The way that Dexie.js accomplishes this (for details, see src/dbcore/dbcore-indexeddb.ts) is that it saves the last request (e.g. IDBObjectStore.put, etc) sent and sets an onsuccess callback on that one, which then collects the results from the rest of the requests. Thus, it avoids the callback hell.
Another approach is to use the IDBTransaction.oncomplete event and not worry about the callbacks on the individual requests at all.
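A minimal sketch of that transaction-level approach (the db handle and the 'records' store name are illustrative):

// Bulk insert without per-request onsuccess handlers; only the
// transaction's oncomplete/onerror events are observed.
function bulkPut(db, records) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('records', 'readwrite');
    const store = tx.objectStore('records');
    for (const record of records) {
      store.put(record);            // no callback attached to each request
    }
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}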
(Note: yes, I know how old this question is. I had this problem today and wanted to add something more useful to this question, which is high in Google results.)
How is your data stored in IndexedDB? Is everything in a single object store, or do you use multiple object stores? Do you need all the cached data immediately?
If you only have a single object store, you can start by storing all the data you initially need, commit that transaction, and start a new one for all the rest. This way you can start retrieving the initial data while inserting the rest. IndexedDB is async, so it shouldn't block you.
If you have multiple object stores you can use the same strategy: first fill up the object store you need immediately and delay the others.
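A rough sketch of that two-phase idea (the 'cache' store name and the record arrays are placeholders):

// Store the urgently needed records first; once that transaction commits,
// the app can start reading while the rest is inserted in a second transaction.
function cacheData(db, initialRecords, remainingRecords) {
  const firstTx = db.transaction('cache', 'readwrite');
  const firstStore = firstTx.objectStore('cache');
  initialRecords.forEach((r) => firstStore.put(r));

  firstTx.oncomplete = () => {
    const restTx = db.transaction('cache', 'readwrite');
    const restStore = restTx.objectStore('cache');
    remainingRecords.forEach((r) => restStore.put(r));
  };
}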
Or maybe consider using the AppCache API instead of the IndexedDB API. With that, you can just cache a JavaScript file containing all the JSON objects you want to cache. This is more the case when you don't need a lot of querying on the data.
I'm working on a game prototype and am worried about the following case: the browser does AJAX to Node.js, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?
First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important, because if all you're doing is simply inserting documents, then why does it matter if one finishes before the other? If you're updating documents, then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm making the assumption you're using Mongoose; if you're not, I would.) Instead, what you should be doing is using built-in Mongo operators like $inc which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
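For example, a hedged sketch with Mongoose (the Player model and score field are made up for illustration):

const mongoose = require('mongoose');

const Player = mongoose.model('Player', new mongoose.Schema({
  name: String,
  score: Number,
}));

// $inc is applied atomically on the server, so concurrent requests
// can't clobber each other the way find + modify + save can.
function addPoints(playerId, points) {
  return Player.updateOne({ _id: playerId }, { $inc: { score: points } }).exec();
}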
Does that help at all? If not, please let me know and I will give it another shot.
Mongo has database-wide read/write locks. It gives preference to writes on the same collection first, then fulfills reads. So if, by chance, Bill is writing to the db while Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete, and then he is given all the data (including Bill's).