Has anyone used Web Workers and async together, or would anyone advise against this?
For example, if I want to send out 1000 async calls and wait for them to finish, awaiting them one by one is effectively sequential, so it is slow. While it doesn't block the main thread, it is slow to await each call in turn.
Could I instead await a single async method that creates 1000 Web Workers and sends out the 1000 fetches in parallel (one fetch per worker)? Each worker would wait for its fetch result and post it back, and the main async method that created the 1000 workers would collect all 1000 results. Once done, the method finishes and the main thread continues from there.
I am not seeing examples of this out there, and I am wondering why. Is this a bad idea? Or is there maybe a framework for it?
Thanks
You don't need workers, since fetch doesn't block the main thread. On top of that, spawning that many workers would add significant overhead.
fetch already does what you want by default; you simply shouldn't await every single call.
You can use Promise.allSettled to convert an array of promises to a single promise of results that you can then await.
const promises = urls.map(url => fetch(url));
const results = await Promise.allSettled(promises);
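If it helps, here's a sketch of how the settled results from the snippet above might then be inspected (the logging is just illustrative):
// Each entry is either { status: 'fulfilled', value: Response }
// or { status: 'rejected', reason: Error }.
for (const result of results) {
  if (result.status === 'fulfilled') {
    console.log('OK:', result.value.status); // HTTP status code of the Response
  } else {
    console.error('Failed:', result.reason);
  }
}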
I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
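As a rough sketch of the difference between the two options (the file name is just an example):
const fs = require('fs');

// Option 1: blocks the whole process until the file has been read.
const dataSync = fs.readFileSync('menu.txt', 'utf8');

// Option 2: the process is free to do other work while the read is pending.
fs.promises.readFile('menu.txt', 'utf8').then(data => {
  console.log('file arrived:', data.length, 'characters');
});
console.log('this line runs before the file has been read');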
Historically, callbacks were the only way to write concurrent code in Node. Eventually, promises and .then() arrived, which were better, but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's *Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
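A tiny sketch of the difference, assuming a page object is already in scope (the URL and selector are placeholders):
// Without await: both calls are started immediately and race each other.
page.goto('https://example.com');   // navigation starts...
page.waitForSelector('.result');    // ...but this runs before navigation has finished

// With await: each step completes before the next one starts.
await page.goto('https://example.com');
await page.waitForSelector('.result');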
For the authors of an asynchronous API where almost all calls are accessing an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
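For instance, a minimal sketch of passing a Node-side variable into evaluate (the selector is just an example):
const selector = '.book'; // lives in the Node process

// Wrong: `selector` is not defined inside the browser context.
// const count = await page.evaluate(() => document.querySelectorAll(selector).length);

// Right: pass it as an argument; it must survive JSON serialization.
const count = await page.evaluate(sel => document.querySelectorAll(sel).length, selector);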
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. Imagine this case: you have a .book-container element, and inside it, the book data only appears on the UI later, once an API fetch has completed.
const scraperObject = {
  url: 'http://book-store.com',
  scraper(browser){
    let page = browser.newPage();
    page.goto(this.url);
    page.waitForSelector('.book-container');
    page.waitForSelector('.book');
    //TODO: save book data after this
  }
}
With this code snippet, it will run like this:
page.goto(this.url): go to the page at the given URL.
page.waitForSelector('.book-container'): no await here, so it tries to find the .book-container element immediately (of course, it may not be there yet because the page may still be loading, e.g. due to network delay).
page.waitForSelector('.book'): similarly, it tries to get the book data immediately (even though .book-container isn't in the HTML yet).
To solve this problem, we should use async/await to WAIT for the elements to be ready in the HTML.
const scraperObject = {
  url: 'http://book-store.com',
  async scraper(browser){
    let page = await browser.newPage();
    await page.goto(this.url);
    await page.waitForSelector('.book-container');
    await page.waitForSelector('.book');
    //TODO: save book data after this
  }
}
Explaining it again, now with async/await:
page.goto(this.url): go to the page at the given URL and wait until the page has loaded.
page.waitForSelector('.book-container'): wait until the .book-container element appears in the HTML.
page.waitForSelector('.book'): wait until the .book element appears in the HTML (which tells us the API's data has arrived).
Does Promise.all() run sequentially or in parallel in JavaScript?
For example:
const promises = [promise1(), promise2(), promise3()]
Promise.all(promises)
.then(data => {
// whatever
});
Does promise1() execute and resolve before moving on to promise2(), or do promise1(), promise2(), and promise3() all run in parallel at the same time? I would assume that, like Node, JavaScript in the browser is single threaded, and thus they don't run in parallel?
JavaScript execution is single threaded. However, asynchronous calls allow you to avoid being blocked while they run. This is particularly useful when making REST API calls. For example, your promise1() can make a REST API call, and before it receives the results, another REST API call can be made from promise2(). This pseudo-parallelism is achieved by not waiting on the API servers to finish: you fire multiple such calls to the same or different API endpoints in parallel, and your code continues executing the parts that don't depend on the promises being resolved.
So yes, promise1(), promise2() and promise3() can be said to be running in parallel in that respect, and there is a chance that promise2() resolves before promise1(), and so on. Promise.all() waits for all the promises provided to it to fulfill, or for at least one of them to reject.
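A small sketch of that pseudo-parallelism, assuming it runs inside an async function (the URLs and doSomethingElse are placeholders):
const p1 = fetch('/api/users');  // request fired, not awaited yet
const p2 = fetch('/api/orders'); // second request fired while the first is still pending

doSomethingElse(); // unrelated work runs before either response has arrived

const [usersRes, ordersRes] = await Promise.all([p1, p2]); // now wait for both to fulfill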
Learn more about JavaScript event loops in this video by Jake Archibald.
Promise.all does not make your promises run in parallel.
Promise.all does not make your promises run at all.
What Promise.all does, is just waiting for all the promises to complete.
The line of code that actually executes things is this one:
const promises = [promise1(), promise2(), promise3()]
Assuming that your promises make HTTP calls:
promise1() is executed -> 1 HTTP call going on
promise2() is executed -> 2 HTTP calls going on
promise3() is executed -> 3 HTTP calls going on
then after a while, in undetermined order, these promises complete.
These 3 HTTP calls could even complete at roughly the same time, but each promise completes on its own. For this reason, you can use Promise.all to ensure that your callback is executed only when all of your promises have completed.
Remember that there's a limit on the number of parallel HTTP connections you can have in your environment; have a look at this: https://stackoverflow.com/a/985704/7327715
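To make that concrete, here's a sketch using timers instead of HTTP calls (each "call" takes one second):
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  console.time('all');
  const p1 = delay(1000); // timer already running
  const p2 = delay(1000); // so is this one
  const p3 = delay(1000); // and this one

  await Promise.all([p1, p2, p3]);
  console.timeEnd('all'); // roughly 1 second total, not 3: the waits overlapped
})();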
I've found out that it's considered best practice for a Node.js / Express.js API endpoint serving a React.js app to use only asynchronous functions for MySQL queries and the like.
But I really don't get how this is supposed to work, or why performance would decrease if I didn't use it.
Please imagine the following code:
API Endpoint "/user/get/1"
It fetches all data for the user with ID 1 and responds with JSON content. If I use async, there seems to be no way to respond with the information gathered by the query, because it isn't fulfilled yet when the function runs to its end.
If I wrap it in a Promise and wait until it's finished, isn't that the same as a synchronous function?
Please describe the difference between awaiting an async function and using a sync function directly.
Thanks for your help!
If I wrap it in a Promise and wait until it's finished, isn't that the same as a synchronous function?
No, it isn't. The difference between synchronous and async functions in JavaScript is precisely that async code is scheduled to run whenever it can instead of immediately, right now. In other words, if you use sync code, your entire Node application will stop everything to grab your data from the database and won't do anything else until it's done. If you use a Promise and async code, instead, while your response won't come until it's done, the rest of the Node app will still continue running while the async code is getting the data, allowing other connections and requests to be made in the meantime.
Using async here isn't about making the one response come faster; it's about allowing other requests to be handled while waiting for that one response.
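For example, here is a minimal sketch of such a route with a promise-based MySQL client (the pool settings, table, and route path are assumptions, not taken from the question):
const express = require('express');
const mysql = require('mysql2/promise'); // promise-based MySQL client

const app = express();
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'app' });

app.get('/user/get/:id', async (req, res) => {
  try {
    // The query is awaited, but only this handler is suspended;
    // Node keeps serving other requests in the meantime.
    const [rows] = await pool.query('SELECT * FROM users WHERE id = ?', [req.params.id]);
    res.json(rows[0] || null);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000);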
Surprisingly, Google had trouble returning results for this question.
I'm wondering how many promises can or should be run in parallel before queuing them and waiting for the next one to finish. I guess it might depend on the user's internet connection, but I figured it was worth asking.
If it's based on the user's ISP/connection type, is there a way to test for the ideal number of promises to send before starting a queue?
Also, I'm talking strictly about the client side, so single-threaded JS.
Example code:
function uploadToServer(requestData){
  return new Promise((resolve, reject) => { /* ... */ });
}

function sendRequests(requestArray){
  var count = 0;
  for(var requestData of requestArray){
    if(count < idealAmount){
      count++;
      uploadToServer(requestData).then(() => count--);
    }else{
      // Logic to wait before attempting to fire event
    }
  }
}
Promises themselves have no particular coded limits. They are just a notification system and you could have millions of them just fine (as long as you had enough memory to hold those Javascript objects).
Now, if a promise represents an underlying asynchronous operation (which they usually do), there could very well be some limits to how many of that specific type of asynchronous operation can be in flight at the same time. For example, at some point you might run into limits on how many requests a single host will accept from you at the same time. Or, you might run into local resource issues with zillions of connections somewhere.
For things like node.js disk I/O operations, the underlying disk I/O sub-system already has a queuing system so that only a small number of operations are actually running at once and the rest are queued.
So, to answer a question about how many concurrent operations you can have, it can only be analyzed and answered in the context of a specific type of asynchronous request and sometimes even a specific type of receiving host.
If you know you're processing a large or potentially large array of requests and you'll be sending a network request for every item in the array, then it is common to code a limit yourself to avoid overwhelming either local resources or the target host resources. This is usually not done with a queue, but rather code that just launches N requests and then as one finishes, it launches the next one and so on. Both the Bluebird and Async libraries have methods for managing this for you. In Bluebird, it's the concurrency option for Promise.map(). I've also hand-coded loops that manage the number of concurrent connections several times myself and here are links to some of that code:
Promise.all consumes all my RAM
Javascript - how to control how many promises access network in parallel
Make several requests to an API that can only handle 20 request a minute
Loop through an api get request with variable URL
Choose proper async method for batch processing for max requests/sec
Nodejs: Async request with a list of URL
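For reference, here is a hand-rolled sketch of that launch-N-then-refill pattern (the names are illustrative, not taken from any of the linked answers):
// Run at most `limit` tasks at once; `tasks` is an array of functions
// that each return a promise.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const i = next++;            // claim the next task index
      results[i] = await tasks[i]();
    }
  }

  // Start `limit` workers; each picks up a new task as soon as it finishes one.
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}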
As #jfried00 mentioned, there can't be any limit on the number of promises running, because there's no such thing as "running" a Promise. Once you call an async function or run code like new Promise(res => something(res)), the operation has already started.
What you can do is limit the number of promise chains being resolved:
// ten promises ago:
let oldPromise = doSomethingAsync();
// and now:
oldPromise.then(() => doSomethingNewAsync());
But actually coding this on your own is gonna dye your hair grey rather quickly as my example has shown - error handling, finding the empty slots and keeping the flow in the right order will be hard.
That said, it is possible, and my framework, Scramjet, which I'll shamelessly plug here, does what you need:
DataStream.from(requestArray)
    .setOptions({maxParallel: 4})
    .unorder(requestData => uploadToServer(requestData))
    .run()
Scramjet will keep 4 promises resolving but won't try to keep order (there are other methods for that) and you can use any function - if it doesn't return a promise, it will work the same as if it did. Here's some more text on unordered transforms in scramjet. You can also peek at the source code if you'd rather do that yourself...
I am relatively new to Node.js. I was wondering: will the await keyword slow down the entire JavaScript/Node.js program?
For example,
If I have many Express routes written in a single server file, and one route handler awaits a promise, will all other routes and asynchronous functions stay halted/paused until the promise is resolved? Or will just that thread be paused?
In such a case, will the await call cause performance issues for the JavaScript program?
No. While await sounds like it is blocking, it is fully asynchronous (non-blocking), as is implied by the async keyword required in the function's signature. It is a (lot) nicer way to use promises, so go ahead and use it in your code.
You also mentioned threads; I suggest you ignore the thread concept while developing Node.js apps and trust the Node.js event loop. Just never use blocking IO calls (which are explicitly marked as such by having 'Sync' in the name).
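A tiny sketch of why await doesn't halt the rest of the program (the timer stands in for any asynchronous operation):
async function handler() {
  console.log('before await');
  const data = await new Promise(resolve => setTimeout(() => resolve(42), 1000));
  console.log('after await:', data); // runs about a second later
}

handler();
console.log('this line runs immediately, while handler() is still awaiting');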
await waits for a promise to be resolved, but since Node is asynchronous in nature, other requests made to the application are unaffected; they will not wait for the previous request's promise to be resolved.
Example:
route 1 -> it awaits while iterating through a million rows and returns the sum in its response
route 2 -> it only returns '1' in its response
Now, if you call route 1 first and then route 2, you'll see that you still get the response from route 2 immediately, and you get route 1's response only once it has completed.
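A rough Express sketch of those two routes (the paths and the helper are made up for illustration):
const express = require('express');
const app = express();

app.get('/route1', async (req, res) => {
  // Hypothetical slow asynchronous work; while it is pending,
  // the event loop is free to handle other requests such as /route2.
  const sum = await sumMillionRowsSomehow(); // placeholder async helper
  res.send({ sum });
});

app.get('/route2', (req, res) => {
  res.send('1'); // responds immediately, even if /route1 is still awaiting
});

app.listen(3000);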