Straight to the point, I am running an http server in Node.js managing a hotel's check-in/out info where I write all the JSON data from memory to the same file using "fs.writeFile".
The data usually don't exceed 145kB max, however since I need to write them everytime that I get an update from my DataBase, I have data loss/bad JSON format when calls to fs.writeFile happen one after each other immediately.
Currently I have solved this problem using "fs.writeFileSync" however I would like to hear for a more sophisticated solution and not using the easy/bad solution of sync function.
Using fs.promises results in the same error since again I have to make multiple calls to fs.promises.
According to Node's documentation , calling fs.writefile or fs.promises multiple times is not safe and they suggest using a filestream, however this is not currently an option.
To summarize, I need to wait for fs.writeFile to end normally before attempting any repeated write action, and using the callback is not useful since I don't know a priori when a write action needs to be done.
Thank you very much in advance
I assume you mean you are overwriting or truncating the file while the last write request is still being written. If I were you, I would use the promises API and heed the warning from the documentation:
It is unsafe to use fsPromises.writeFile() multiple times on the same file without waiting for the promise to be settled.
You can await the result in a traditional loop, or very carefully use .then() to "synchronize" your callbacks, but if you're not doing anything else in your event loop except reading from your database and writing to this file, you might as well just use writeFileSync to keep things simple/safe. The asynchronous APIs (callback and Promises) are intended to allow your program to do other things in the meantime; if this is not necessary and the async APIs add troublesome complexity for your code, just use the synchronous APIs. That's true for any node API or library function, not just fs.writeFile.
There are also libraries that will perform atomic filesystem operations for you and abstract away the implementation details, but I think these are probably overkill for you unless you describe your use case in more detail. For example, why you're dumping a database to disk as JSON as fast/frequently as you can, rather than keeping things in memory or using event-based incremental updates (e.g. a real, local database with atomicity and consistency guarantees).
thank you for your response!
Since my app is mainly an http server,yes I do other things rather than simply input/output, although with not a great amount of requests. I will review again the promises solution but the first time I had no luck.
To explain more I have a:function updateRoom(data){ ...update things in memory... writetoDisk(); }
and the function writetoDisk(){
fsWriteFile(....)
}
Making the function writetoDisk an async function and implementing "await" inside it still does not solve the problem since the updateRoom function will call the writetoDisk without waiting for it to end.
The ".then" approach can not be implemented since my updateRoom is being called constantly and dynamically .
If you happen to know 1-2 thing about async-await you are more than welcome to explain me a bit more, thanks again nevertheless!
Related
I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
Historically, callbacks were originally the only way to write concurrent code in Node. Eventually, promises and then arrived, which were better, but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's __Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
For the authors of an asynchronous API where almost all calls are accessesing an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. You can imagine this case, you have 1 element is book-container, but inside book-container, it will have book data coming later on UI with API fetch.
const scraperObject = {
url: 'http://book-store.com',
scraper(browser){
let page = browser.newPage();
page.goto(this.url);
page.waitForSelector('.book-container');
page.waitForSelector('.book');
//TODO: save book data after this
});
}
}
With this code snippet, it will run like this
page.goto(this.url) Go to the page with certain URL
page.waitForSelector('.book-container') No async here, so it will try to get .book-container element immediately (of course, it won't be there because the page is possibly still loading due to some network problem)
page.waitForSelector('.book') Similarly, it try to get book data immediately (even though book-container has not been in HTML yet)
To solve this problem, we should have async to WAIT for elements ready in HTML.
const scraperObject = {
url: 'http://book-store.com',
async scraper(browser){
let page = await browser.newPage();
await page.goto(this.url);
await page.waitForSelector('.book-container');
await page.waitForSelector('.book');
//TODO: save book data after this
});
}
}
Explain it again with async/await.
page.goto(this.url) Go to the page with certain URL and wait till the page loaded
page.waitForSelector('.book-container') Wait till .book-container element appears in HTML
page.waitForSelector('.book') Wait till .book element appears in HTML (we can understand that API's data responded)
Can anyone help me understand the function of NodeJS and performance impact for the below scenario.
a. Making the request to Rest API end point "/api/XXX". In this request, i am returning the response triggering the asynchronous function like below.
function update(req, res) {
executeUpdate(req.body); //Asynchronous function
res.send(200);
}
b. In this, I send the response back without waiting for the function to complete and this function executing four mongodb updates of different collection.
Questions:
As I read, the NodeJS works on the single thread, how this
asynchronous function is executing?
If there are multiple requests for same end point, how will be the
performance impact of NodeJS?
How exactly the NodeJS handles the asynchronous function of each
request, because as the NodeJS is runs on the single thread, is there
any possibility of the memory issue?
In short, it depends on what you are doing in your function.
The synchronous functions in node are executed on main thread, thus,
they will not preempt and execute until end of the function or until
return statement is encountered.
The async functions, on the other hand, are removed from main thread,
and will only be executed when async tasks are completed on a
separate worker thread.
There are, I think, two different parts in the answer to your question.
Actual Performance - which includes CPU & memory performance. It also obviously includes speed.
Understanding as the previous poster said, Sync and Async.
In dealing with #1 - actual performance the real only way to test it is to create or use a testing environment on your code. In a rudimentary way based upon the system you are using you can view some of the information in top (linux) or Glances will give you a basic idea of performance, but in order to know exactly what is going on you will need to apply some of the various testing environments or writing your own tests.
Approaching #2 - It is not only sync and async processes you have to understand, but also the ramifications of both. This includes the use of callbacks and promises.
It really all depends on the current process you are attempting to code. For instance, many Node programmers seem to prefer using promises when they make calls to MongoDB, especially when one requires more than one call based upon the return of the cursor.
There is really no written-in-stone formula for when you use sync or async processes. Avoiding callback hell is something all Node programmers try to do. Catching errors etc. is something you always need to be careful about. As I said some programmers will always opt for Promises or Async when dealing with returns of data. The famous Async library coupled with Bluebird are the choice of many for certain scenarios.
All that being said, and remember your question is general and therefore so is my answer, in order to properly know the implications on your performance, in memory, cpu and speed as well as in return of information or passing to the browser, it is a good idea to understand as best as you can sync, async, callbacks, promises and error catching. You will discover certain situations are great for sync (and much faster), while others do require async and/or promises.
Hope this helps somewhat.
Recently, I have been developing web application and I realize that I am not making use of the asynchronous property at all. Hence I am ending up with a lot of nested callbacks.
For example, if the user want to get a file from the server through a particular API, I will have code similar to this,
db.query(<select list of permitted files_names>, function(err, filenames) {
async.each(file_names, function(name, next) {
//open each file to put into array
});
})
This code needs to query database to get a list of file names before looping asynchronously and putting each file content into an array. Finally it will return the finished array to the client.
With the nested callback, and async library, this code is behaving like a synchronous code.
names = db.querySync(//select list of permitted files_names);
for(name in names) {
//open each file to put into array
}
I am better off writing synchronous code like this since it is much neater. My use case might be a little strange but most of my api behaves in similar manner and that makes me think why do I even need asynchronous function?
Can someone please enlighten me if there are any differences between these two codes in term of performance? How do I make use of non-blocking property to enhance the performance in this use case?
If you're writing callback functions you're using by definition using async calls. The callback function fires only when the operation is complete or has errored out. You don't need a fancy library to use these, this is the backbone of how Node's event-loop driven subsystem operates.
Node strongly advises against using "Sync" calls. The Node core only includes a handful as a convenience, they're there as last-resort tools. Many libraries don't even support them so you absolutely must get used to writing async code. In the browser environment, for example, you simply cannot use blocking calls without jamming up the JavaScript runtime and stalling the page.
I prefer using Promises line Bluebird implements to keep code orderly. There are other ways, like the async library, which can help manage otherwise complicated nesting patterns.
Some of the perks include things like Promise.all method runs a series of promises to completion and then triggers a next step, and Promise.map which iterates over a list, running async code for each element, then advancing when the list is complete.
If you're disciplined about organizing your code it's not too bad. Node does require a lot more attention being paid to the order of operations than in a traditional sync-by-default language like Ruby, Python or Java, but you can get used to it. Once you start working with async code rather than fighting it you can often do a ton of work quickly, efficiently, and with a minimum of fuss, in many cases more effectively than in other languages where you must juggle threads plus locking and/or deal with IPC.
Yes, there is a difference in the two codes in terms of performance.
In synchronous code:
names = db.querySync(//select list of permitted files_names);
you are calling the database here to give list of names. Assume , this takes 10 sec. So for this time, nodeJS as it is single threaded gos into blocking state. After 10 sec, it executes the rest of the code . Assume this for loop takes 5 sec and some code takes 5 sec.
for(name in names) {
//open each file to put into array
}
//some code
Therefore it takes a total time of 20 sec.
whereas in Asynchronous code:
db.query(<select list of permitted files_names>, function(err, filenames) {
NodeJs will ask the database to give list of names to a callback. Assume that it takes 10 sec. And immediately it goes into the next step(some code), but not into the blocking state. Assume that some code takes 5 sec.
async.each(file_names, function(name, next) {
//open each file to put into array
});
})
//some code.
After 5 sec, it will check whether it has an i/o operations to be performed. Once the call back is returned. It will execute the function(name, next) {..} for the 5 sec.
So the total time here is 15sec.
In this manner the performance is improved.
If the asynchronous code should be clear and neat then make use of closures & promises.
For ex: Above asynchronous code can be written as
fun = function(err, filenames) {
async.each(file_names, function(name, next) {
//open each file to put into array
}
db.query(<select list of permitted files_names>, fun);
The benefit is simple: By using asynchronous code, the current thread (remember, Node.js is single-threaded) is able to handle other requests while the current request is waiting on something (like a database query) to return.
If you use synchronous code instead, the current thread will block while it waits, and it won't be able to handle other requests in the meantime. In other words, you lose concurrency.
To keep your asynchronous code clean, look into promises (to avoid deeply nested callbacks) and ES7 async/await (to avoid callbacks at all and write asynchronous code that looks just like synchronous code).
I'm wondering if there's a way to cause JavaScript to wait for some variable-length code execution to finish before continuing using events and loops. Before answering with using timeouts, callbacks or referencing this as a duplicate, hear me out.
I want to expose a large API to a web worker. I want this API to feel 'native' in the sense that you can access each member using a getter which gets the information from the other thread. My initial idea was to compile the API and rebuild the entire object on the worker. While this works (and was a really fun project), it's slow at startup and cannot show changes made to the API without it being sent to the worker again after modification. Observers would solve part of this, and web workers transferrable objects would solve all, but they aren't adopted widely yet.
Since worker round-trip calls happen in a matter of milliseconds, I think stalling the thread for a few milliseconds may be an alright solution. Of course I would think about terminating in cases where calls take too long, but I'm trying to create a proof of concept first.
Let's say I want to expose the api object to the worker. I would define a getter for self.api which would fetch the first layer of properties. Each property would then be another getter and the process would continue until the final object is found.
worker.js
self.addEventListener('message', function(event) {
self.dataRecieved = true;
self.data = event.data; // would actually build new getters here
});
Object.defineProperty(self, 'api', {
get: function() {
self.dataRecieved = false;
self.postMessage('request api first-layer properties');
while(!self.dataRecieved);
return self.data; // whatever properties were received from host
}
});
For experimentation, we'll do a simple round-trip with no data processing:
index.html (only JS part)
var worker = new Worker("worker.js");
worker.onmessage = function() {
worker.postMessage();
};
If onmessage would interrupt the loop, the script should theoretically work. Then the worker could access objects like window.document.body.style on the fly.
My question really boils down to: is there a way to guarantee that an event will interrupt an executing code block?
From my understanding of events in JavaScript, I thought they did interrupt the current thread. Does it not because it's executing a blank statement over and over? What if I generated code to be executed and kept doing that until the data returned?
is there a way to guarantee that an event will interrupt an executing code block
As #slebetman suggests in comments, no, not in Javascript running in a browser's web-worker (with one possible exception that I can think of, see suggestion 3. below).
My suggestions, in decreasing order of preference:
Give up the desire to feel "native" (or maybe "local" might be a better term). Something like the infinite while loop that you suggest also seems to be very much fighting agains the cooperative multitasking environment offered by Javascript, including when thinking about a single web worker.
Communication between workers in Javascript is asynchronous. Perhaps it can fail, take longer than just a few milliseconds. I'm not sure what your use case is, but my feeling is that when the project grows, you might want to use those milliseconds for something else.
You could change your defined property to return a promise, and then the caller would do a .then on the response to retrieve the value, just like any other asynchronous API.
Angular Protractor/Webdriver has an API that uses a control flow to simulate a synchronous environment using promises, by always passing promises about. Taking the code from https://stackoverflow.com/a/22697369/1319998
browser.get(url);
var title = browser.getTitle();
expect(title).toEqual('My Title');
By my understanding, each line above adds a promise to the control flow to execute asynchronously. title isn't actually the title, but a promise that resolves to the title for example. While it looks like synchronous code, the getting and testing all happens asynchronously later.
You could implement something similar in the web worker. However, I do wonder whether it will be worth the effort. There would be a lot of code to do this, and I can't help feeling that the main consequence would be that it would end up harder to write code using this, and not easier, as there would be a lot of hidden behaviour.
The only thing that I know of that can be made synchronous in Javascript, is XMLHttpRequest when setting the async parameter to false https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest#Parameters. I wonder if you could come up with some sort of way to request to the server that maintains a connection with the main thread and pass data along that way. I have to say, my instinct is that this is quite an awful idea, and would be much slower than just requesting data from the main thread.
For what I know, there is not something native in JS to do this but it is relatively easy to do something similar. I made one some time ago for myself: https://github.com/xpy/whener/blob/master/whener.js .
You use it like when( condition, callback ) where condition is a function that should return true when your condition is met, and callback is the function that you want to execute at that time.
I'm working on a game prototype and worried about the following case: Browser does AJAX to Node.JS, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?
First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important because if all you're doing is simply inserting documents then why does it matter if one finishes for before the other? If you're updating documents then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm making the assumption you're using mongoose, if you're not, I would) Instead what you should be doing is using built in mongo functions like $inc which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
Does that help at all? If not, please let me know and I will give it another shot.
Mongo has database wide read/write locks. It gives preference to writes of the same collection first then fulfills reads. So, if by chance, you have Bill writing to the db and Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete and then he is given all the data (including Bill's).