One could simply encapsulate a number of synchronous requests as a single asynchronous request.
The "func" parameter in the code below could, for example, contain multiple synchronous requests in order. This should give you more control over the data than using the DOM as a medium to act on it. (Is there another way? It has been a while since I used JavaScript.)
function asyncModule(func)
{
"use strict";
var t, args;
t = func.timeout === undefined ? 1 : func.timeout;
args = Array.prototype.slice.call(arguments, 1);
setTimeout(function () {
func.apply(null, args);
}, t);
}
Now something must be wrong with my reasoning, because here is what the spec says:
Synchronous XMLHttpRequest outside of workers is in the process of being removed from the web platform as it has detrimental effects to the end user's experience. (This is a long process that takes many years.) Developers must not pass false for the async argument when the JavaScript global environment is a document environment. User agents are strongly encouraged to warn about such usage in developer tools and may experiment with throwing an InvalidAccessError exception when it occurs. # https://xhr.spec.whatwg.org/
I would think you would want to avoid async in requests at all costs and instead wrap sync requests within an async function.
Here is the main question along with the follow up.
Is there something wrong with the example I gave?
If not then:
How is forcing requests to be async the right solution?
It goes without saying that you have freedom to debunk any of my "claims" if they are simply wrong or half truths. I am confused over this, I give you that.
Keep in mind that I am testing JavaScript in the terminal, not in the browser. I used the web server in the Go programming language and everything seems to be working fine. It is not until I test the code in the browser that I get a warning pointing to this spec.
This answer has been edited.
Yes, my reasoning was faulty!
There are two angles to think about.
What does async actually mean in javascript?
Can one async call stall another async call?
Async in JavaScript doesn't mean the script runs as interleaved/alternating processes with more than one call stack. It is more like a global timed defer/postpone command that fully takes over once it gets its chance. This means an async call can be blocking, and the non-blocking "async: true" part is only a "trick" based on how XMLHttpRequest is implemented.
This means a synchronous request encapsulated within setTimeout could be waiting on a failed request and end up blocking other, unrelated async requests, whereas the "async: true" feature only executes based on its state value.
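To make the stalling concrete, here is a minimal sketch that can be run in Node or a browser console. The busyWait helper is a hypothetical stand-in for a synchronous request (it just spins), and the 50 ms and 5 ms figures are arbitrary assumptions. It only illustrates that blocking work wrapped in setTimeout still holds up every other callback on the same thread.

```javascript
// Records the order in which callbacks actually run.
const order = [];

// Hypothetical stand-in for a synchronous request:
// occupies the thread for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread in the meantime
  }
}

const startedAt = Date.now();

// "Async" wrapper around blocking work, in the style of asyncModule.
setTimeout(function () {
  order.push('blocking task start');
  busyWait(50);
  order.push('blocking task end');
}, 0);

// An unrelated callback that asks to run after roughly 5 ms.
let otherRanAfterMs = null;
setTimeout(function () {
  otherRanAfterMs = Date.now() - startedAt;
  order.push('unrelated callback');
}, 5);
```

The unrelated callback cannot run until the blocking task has returned, so it is delayed by the full duration of the busy wait rather than the 5 ms it asked for.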
This means supporting older browsers requires you to chain requests or to use the DOM as a medium when you need to make multiple requests that depend on one another. Ugh...
Lucky for us, JavaScript has threads now (Web Workers). Now we can simply use a worker to get clean encapsulation of multiple correlated requests in sync (or any other background tasks).
In short:
The browser shouldn't have any problem running requests synchronously if they are within a worker. Browsers have yet to become an OS, but they are getting closer.
P.S. This answer is more or less the result of trial and error. I made some test cases in Firefox and observed that async requests do halt other async requests. I am simply extrapolating from that observation. I will not accept my own answer in case I am still missing something.
EDIT (Again..)
Actually, it might be possible to use xhttp.timeout along with xhttp.ontimeout. See Timeout XMLHttpRequest
This means you could recover from bad requests if you abstract setTimeout and use it as a scheduler.
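// Simple example
function runSchedular(s)
{
    setTimeout(function () {
        if (s.ptr < s.callQue.length) {
            // Each queued call receives the scheduler state and returns it;
            // it can reschedule itself by pushing onto the queue.
            s = s.callQue[s.ptr++](s);
        } else {
            // Queue drained: reset and keep polling at the idle interval.
            s.ptr = 0;
            s.callQue = [];
            s.t = 200;
        }
        runSchedular(s);
    }, s.t);
}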
Related
I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
Historically, callbacks were the only way to write concurrent code in Node. Eventually, promises and .then() arrived, which were better but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's *Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
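A small sketch of that ordering difference, using only timers so it runs without Puppeteer. The delay helper and the labels are hypothetical stand-ins for Puppeteer calls; the point is just what happens to ordering with and without await.

```javascript
// Hypothetical stand-in for an async API call: resolves after `ms`
// milliseconds and records when it actually finished.
const delay = (ms, label, log) =>
  new Promise((resolve) => {
    setTimeout(() => {
      log.push(label);
      resolve(label);
    }, ms);
  });

// Without await: both operations start immediately and run concurrently,
// and the function body finishes before either of them.
async function withoutAwait(log) {
  delay(30, 'first', log);
  delay(10, 'second', log);
  log.push('function body done');
}

// With await: each step finishes before the next one starts.
async function withAwait(log) {
  await delay(30, 'first', log);
  await delay(10, 'second', log);
  log.push('function body done');
}
```

In the first version, completion order depends on how long each operation happens to take. In the second, the order is the order the code is written in.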
For the authors of an asynchronous API where almost all calls are accessing an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
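As a rough illustration of what survives a trip across that gap, here is a plain JSON round trip. This is not Puppeteer's actual serialization code, only a sketch of the JSON.stringify constraint described above; roundTrip and the sample object are made up for the example.

```javascript
// Sketch only: approximates "serialize on one side, deserialize on the other".
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const original = {
  title: 'Book',
  tags: ['a', 'b'],
  onClick() {},        // functions are dropped by JSON.stringify
  missing: undefined   // undefined properties are dropped too
};

const copy = roundTrip(original);
// copy is a new, detached object holding only the plain data:
// { title: 'Book', tags: ['a', 'b'] }
```

Anything that is not plain data (functions, DOM nodes, class instances with behaviour) does not make it through intact, which is why values have to be passed as serializable arguments.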
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. Imagine this case: the page has one book-container element, but the book data inside it arrives later, once the UI has fetched it from an API.
const scraperObject = {
    url: 'http://book-store.com',
    scraper(browser) {
        let page = browser.newPage();
        page.goto(this.url);
        page.waitForSelector('.book-container');
        page.waitForSelector('.book');
        //TODO: save book data after this
    }
}
Without await, this snippet runs like this (strictly speaking, browser.newPage() also returns a promise, so page is not yet a usable page object here):
page.goto(this.url) Starts navigating to the URL, but doesn't wait for the page to load
page.waitForSelector('.book-container') There is no await here, so the script doesn't wait for .book-container and moves on immediately (the element probably isn't there yet because the page is still loading over the network)
page.waitForSelector('.book') Similarly, the script moves on without waiting for the book data (even though book-container is not in the HTML yet)
To solve this problem, we need async/await to WAIT until the elements are ready in the HTML.
const scraperObject = {
    url: 'http://book-store.com',
    async scraper(browser) {
        let page = await browser.newPage();
        await page.goto(this.url);
        await page.waitForSelector('.book-container');
        await page.waitForSelector('.book');
        //TODO: save book data after this
    }
}
Here is the same walkthrough with async/await:
page.goto(this.url) Go to the page with the given URL and wait until the page has loaded
page.waitForSelector('.book-container') Wait until the .book-container element appears in the HTML
page.waitForSelector('.book') Wait until a .book element appears in the HTML (which tells us the API has responded with data)
At the risk of getting roasted for not posting code, what is the best way for getting around the 6 concurrent call limitation for ajax requests?
So currently I'm working with an application that can have up to 40 or so ajax requests on page load. For background, most of these requests are for various graphs, hidden behind tabs. There are some requests that can be triggered by the user (such as updating the name of an entity without refreshing the page). This means that, with the limitation on concurrent requests, the user won't be able to change anything until there are only 5 other requests running, and that's an unacceptable user experience.
This may mean that the app is structured badly, but most of the things loading are not required right away.
Anyway, I've looked a bit into fetch() and webworkers but can't find any information on whether these would help get around the limitation.
One solution would be to put resources on different subdomains, but this makes the backend API unnecessarily complicated (and it's a browser issue, not a server issue).
I've considered these approaches:
delay requests until the user actively needs them (IMO this is a bad user experience because they will frequently have to wait a little bit, which is annoying)
create a queuing system that leaves open one spot for user initiated requests (I'm not sure how to implement this, but it should be doable)
restructure the API so that more data is returned per request (this again is mainly a backend solution that feels a little dirty and unRESTful. Also it won't necessarily improve the load time)
chaining calls such as with Multiple Async AJAX Calls Best Practice (however, given there is an unpredictable number of calls on unrelated endpoints, I don't think this is all that practical here)
webworkers? (again, not sure if this could help, since this is used to multithread js)
fetch()? (I can't find info on whether this is subject to the same limitation)
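For the queuing idea in the list above, a minimal sketch might look like the following. createRequestQueue, its parameters, and the default of 6 connections with 1 reserved slot are all assumptions made for illustration. Each task is any function returning a promise (for example a fetch call). It only throttles the page's own background requests and does not change the browser's limit.

```javascript
// Hypothetical helper: runs background tasks with limited concurrency,
// always leaving `reserved` slots free for user-initiated tasks.
function createRequestQueue(maxConcurrent = 6, reserved = 1) {
  let running = 0;
  const pending = []; // background tasks waiting for a free slot

  // Start waiting background tasks while non-reserved slots are free.
  function next() {
    while (pending.length > 0 && running < maxConcurrent - reserved) {
      start(pending.shift());
    }
  }

  function start({ task, resolve, reject }) {
    running++;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        running--;
        next();
      });
  }

  return {
    // Queued: may have to wait for a slot.
    background(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        next();
      });
    },
    // Starts immediately, using the reserved capacity.
    user(task) {
      return new Promise((resolve, reject) => start({ task, resolve, reject }));
    }
  };
}
```

Usage would be along the lines of queue.background(() => fetch('/graphs/1')) for the tab content and queue.user(() => fetch('/entity/name', { method: 'POST' })) for a rename, where both URLs are placeholders. Note that this sketch does not cap user tasks, so several simultaneous user actions can still exceed the reserved capacity.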
This is very much opinion based.
40 requests is not unreasonable but depending on your server and site setup it can take quite a while.
With that many calls I would bundle some of them together in an initializePage=X call. This does involve some server-side work.
Delaying requests is not necessarily bad, depending on your estimated time to deliver. If possible you could present some kind of animation or "expected result" until the response ticks in, to keep the user entertained. The same applies to queuing your requests.
Restructuring your code to return everything in a bundle could also greatly speed up your site if you run a lot of initialization on your server (like security checks).
If performance is still a concern you can look into connections that provide faster results such as EventSource or WebSocket. Such a faster connection also allows for a more flexible approach to chaining. EventSource, for instance, supports events, so you could set several events on a single, bundled request and fire them as the server returns data.
Web workers are not the answer, as the problem here is connection speed and concurrent connection limits.
I don't think we can answer this question directly. Several of the solutions you have mentioned are viable but vary by level of effort. If you are willing to adjust the architecture, you can consider a GraphQL approach, which can wrap the bundling for you. I would also say that you can maintain REST but have a special proxy service that bundles data for you. I'd also say, don't let RESTfulness dictate or force how you develop.
Also, delaying requests until the user needs them seems like the appropriate choice to me. It's the basis for why we have "above the fold" CSS styling and infinite scrolling. Load what is needed right now first and defer the stuff that might not actually matter when it needs to be.
The concurrency limit on AJAX calls comes into the picture when the requests are all made from one thread. If a web worker is used for the AJAX calls there are no issues at all, because each web worker instance is isolated in its own thread, separate from the main thread.
I call this JaxWeb, and I will be pushing a git repo in the coming week where you may find pure JS code that takes care of it. It is still being tested, but it does solve the problem.
Example:
Add the code below to JaxWeb.js
onmessage = function (e) {
var JaxWeb = function (e) {
return {
requestChannel: {},
get_csrf_token: function () {
return this._csrf_token;
},
set_csrf_token: function (_csrf_token = null) {
this._csrf_token = _csrf_token;
},
prepare: (function ( e ) {
this.requestChannel = new XMLHttpRequest();
this.requestChannel.onreadystatechange = function () {
if (this.readyState == 4 && this.status == 200) {
postMessage(JSON.parse(this.responseText));
}
};
this.requestChannel.open(e.data.method, e.data.callname, true);
this.requestChannel.setRequestHeader("X-CSRF-TOKEN", e.data.token);
var postData = '';
if (e.data.data)
postData = JSON.stringify(e.data.data);
this.requestChannel.send(postData);
})(e)
}
};
return JaxWeb(e);
}
Usage:
jaxWebGetServerResponse = function () {
var wk2 = new Worker('path_to_jaxweb_js/JaxWeb.js');
wk2.postMessage({
"callname": '<url end point>',
"method": '<your http method>',
"data": ''
});
wk2.onmessage = function (serverResponse) {
//
//process results
//with data that is received from server
}
};
//Invoke the function
jaxWebGetServerResponse();
Recently, I have been developing web application and I realize that I am not making use of the asynchronous property at all. Hence I am ending up with a lot of nested callbacks.
For example, if the user wants to get a file from the server through a particular API, I will have code similar to this,
db.query(<select list of permitted files_names>, function(err, filenames) {
    async.each(filenames, function(name, next) {
        //open each file to put into array
    });
});
This code needs to query database to get a list of file names before looping asynchronously and putting each file content into an array. Finally it will return the finished array to the client.
With the nested callback and the async library, this code behaves like synchronous code.
names = db.querySync(/* select list of permitted files_names */);
for (const name of names) {
    //open each file to put into array
}
I am better off writing synchronous code like this since it is much neater. My use case might be a little strange, but most of my API behaves in a similar manner, and that makes me wonder why I even need asynchronous functions.
Can someone please enlighten me if there are any differences between these two pieces of code in terms of performance? How do I make use of the non-blocking property to enhance the performance in this use case?
If you're writing callback functions, you're by definition using async calls. The callback function fires only when the operation is complete or has errored out. You don't need a fancy library to use these; this is the backbone of how Node's event-loop driven subsystem operates.
Node strongly advises against using "Sync" calls. The Node core only includes a handful as a convenience; they're there as last-resort tools. Many libraries don't even support them, so you absolutely must get used to writing async code. In the browser environment, for example, you simply cannot use blocking calls without jamming up the JavaScript runtime and stalling the page.
I prefer using Promises like Bluebird implements to keep code orderly. There are other ways, like the async library, which can help manage otherwise complicated nesting patterns.
Some of the perks include Promise.all, which runs a series of promises to completion and then triggers a next step, and Bluebird's Promise.map, which iterates over a list, running async code for each element, then advancing when the list is complete.
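As a sketch of how the question's "query, then open each file" flow might look with native promises: fakeDb and readFile below are hypothetical stand-ins for the real database client and file reads, and loadPermittedFiles is an invented name. Native Promise.all with Array.prototype.map is used here in place of Bluebird's Promise.map.

```javascript
// Hypothetical stand-ins for the database query and the file reads.
const fakeDb = {
  query: async () => ['a.txt', 'b.txt']
};
const readFile = async (name) => `contents of ${name}`;

// Query once, then start all reads at the same time.
// Promise.all resolves with the results in the same order as the input.
async function loadPermittedFiles(db, read) {
  const names = await db.query();
  return Promise.all(names.map((name) => read(name)));
}
```

The function reads top to bottom like the synchronous version, but the thread is free to serve other requests while the query and the reads are in flight.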
If you're disciplined about organizing your code it's not too bad. Node does require a lot more attention being paid to the order of operations than in a traditional sync-by-default language like Ruby, Python or Java, but you can get used to it. Once you start working with async code rather than fighting it you can often do a ton of work quickly, efficiently, and with a minimum of fuss, in many cases more effectively than in other languages where you must juggle threads plus locking and/or deal with IPC.
Yes, there is a difference in the two codes in terms of performance.
In synchronous code:
names = db.querySync(/* select list of permitted files_names */);
you are calling the database here to get the list of names. Assume this takes 10 sec. For this time, Node.js, being single-threaded, goes into a blocking state. After 10 sec, it executes the rest of the code. Assume the for loop takes 5 sec and some code takes 5 sec.
for (const name of names) {
    //open each file to put into array
}
//some code
Therefore it takes a total time of 20 sec.
whereas in Asynchronous code:
db.query(<select list of permitted files_names>, function(err, filenames) {
Node.js will ask the database to pass the list of names to a callback. Assume that this takes 10 sec. It immediately goes on to the next step (some code) without entering a blocking state. Assume that some code takes 5 sec.
    async.each(filenames, function(name, next) {
        //open each file to put into array
    });
})
//some code.
After 5 sec, it will check whether it has any I/O operations to handle. Once the callback has been invoked, it will execute function(name, next) {..} for 5 sec.
So the total time here is 15sec.
In this manner the performance is improved.
If you want the asynchronous code to be clear and neat, make use of closures and promises.
For example, the asynchronous code above can be written as
fun = function(err, filenames) {
    async.each(filenames, function(name, next) {
        //open each file to put into array
    });
};
db.query(<select list of permitted files_names>, fun);
The benefit is simple: By using asynchronous code, the current thread (remember, Node.js is single-threaded) is able to handle other requests while the current request is waiting on something (like a database query) to return.
If you use synchronous code instead, the current thread will block while it waits, and it won't be able to handle other requests in the meantime. In other words, you lose concurrency.
To keep your asynchronous code clean, look into promises (to avoid deeply nested callbacks) and ES2017 async/await (to avoid callbacks altogether and write asynchronous code that looks just like synchronous code).
I'm relatively new to Node.js and JavaScript, so please excuse me if the question below is dumb.
To me, promises for async processing make sense, but I'm not 100% sure about the use of promises when it comes to serial/sequential processing. Let's look at an example (pseudo code):
Objective: Read file, process what was read from the file and send notification
using HTTP post call.
bendUniverseWithoutPromise: function() {
    var data = fs.readFileSync(..); //Read the file
    var result = processData(data);
    this.postNotification(result); //Send the processed result
}
In the above function, processData() cannot run until we've read the file. And we cannot send the notification until we've finished processing.
Let's look at a slightly different version (assuming each of the above method calls returns a promise or we wrap them in a promise):
bendUniverseWithPromise: function() {
return new Promise(function() {
fs.readFileAsync(fileName)
.then(processData(data))
.then(postNotification(result))
})
}
Now, my questions are:
Seeing that we require serial/sequential processing in this instance, how is the promise version better than the non promise version? What is it doing better than the first example? Maybe it is a bad example, but then what would be a good example to demonstrate the differences?
Besides the syntax, the promise version adds a little (only a little) in terms of readability of code and can get quite complicated with nested promises, context (this!) etc.
I do understand that technically, the first method will NOT return until all processing is done and the second will return immediately and the processing, although still sequential (in context of the method), will carry on in the background.
Is there a general rule regarding the use of promises? Are there any patterns and anti patterns?
Thank you in advance.
I will try to answer all four of your points by taking your example further.
Let's say the first operation (file read) is a slow I/O bound operation and takes 900 ms. The processing and notification are CPU bound and I/O bound respectively, taking 50 ms each. (See: What do the terms “CPU bound” and “I/O bound” mean?)
Now, both versions will take the same 1000 ms to complete, but the second example utilizes available resources better, as it is asynchronous. Herein lies the advantage of the promise based version. The first version will make the server completely unresponsive for an entire second, while the second version will only make the server unresponsive during the 50 ms CPU bound processing step.
This hopefully becomes even more lucid when we consider 10 of these requests coming in at the same time. The first example goes through them one at a time, serving request #1 after 1s, #2 after 2s, and so on, finishing after 10s. Its average performance is 1 req/s. The second version would start a file read for request #1, then immediately go on to request #2, spinning up another file read, and so on for all requests. All requests would then finish their reads in around 1s, assuming 100 ms overhead and little or no saturation of disk read bandwidth. The processing would then queue up, taking 500 ms in total for all requests. Lastly we could do the notification posting in parallel, due to it again being I/O-bound. All requests would then in this idealized example be finished in around 1.5 seconds at over 6 req/s, a 6x performance increase. This is exclusively due to the better resourcefulness provided by asynchronicity.
The rule is therefore, always use async/promises when performing I/O bound work.
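The arithmetic behind the idealized 10-request example can be written out as a short calculation. The variable names are made up here; the numbers are the ones assumed above (900 ms read, 50 ms processing, 50 ms notification, 100 ms overhead).

```javascript
const requests = 10;
const readMs = 900;      // I/O bound
const processMs = 50;    // CPU bound
const notifyMs = 50;     // I/O bound
const overheadMs = 100;  // assumed overhead for the concurrent reads

// Synchronous: every request runs start to finish before the next begins.
const syncTotalMs = requests * (readMs + processMs + notifyMs);

// Asynchronous: reads overlap, CPU work queues up, notifications overlap.
const asyncTotalMs = (readMs + overheadMs) + requests * processMs + notifyMs;

const syncRate = requests / (syncTotalMs / 1000);   // requests per second
const asyncRate = requests / (asyncTotalMs / 1000); // requests per second

console.log(syncTotalMs, asyncTotalMs, syncRate, asyncRate.toFixed(2));
```

Only the CPU bound step is multiplied by the number of requests in the asynchronous case, which is where the gain comes from.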
Sidenote:
Your second example is not correct, as there is no data or result variable defined in that scope; the correct version would pass only the functions to then().
bendUniverseWithPromise: function (fileName) {
return fs.readFileAsync(fileName)
.then(processData)
.then(postNotification)
}
First let's correct the asynchronous code:
bendUniverseWithPromise: function(fileName) {
    return fs.readFileAsync(fileName)
        .then(processData)
        .then(postNotification);
}
Now, what you had above was (almost, had it been complete) an anti-pattern: the explicit construction anti-pattern.
As for why you'd want to use the promise version: well, it's asynchronous. It allows other operations to take place while asynchronous (mostly I/O) operations are waited for.
Note: fs.readFileAsync does not exist by default; fs.readFile takes a callback and needs to be "promisified" to get a promise-returning version.
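One way to do the promisifying, as a sketch, is Node's built-in util.promisify (promise libraries offer their own equivalents). The processData and postNotification bodies below are placeholder stand-ins, since the question does not show them.

```javascript
const fs = require('fs');
const { promisify } = require('util');

// Promise-returning version of the callback-based fs.readFile.
const readFileAsync = promisify(fs.readFile);

// Placeholder stand-ins for the real processing and HTTP POST steps.
function processData(data) {
  return data.toUpperCase();
}
function postNotification(result) {
  return `notified: ${result}`;
}

// Each step receives the previous step's result; nothing blocks the thread.
function bendUniverseWithPromise(fileName) {
  return readFileAsync(fileName, 'utf8')
    .then(processData)
    .then(postNotification);
}
```

Returning the chain lets the caller attach its own .then or .catch, so errors from any step propagate to one place.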
I'm wondering if there's a way to cause JavaScript to wait for some variable-length code execution to finish before continuing using events and loops. Before answering with using timeouts, callbacks or referencing this as a duplicate, hear me out.
I want to expose a large API to a web worker. I want this API to feel 'native' in the sense that you can access each member using a getter which gets the information from the other thread. My initial idea was to compile the API and rebuild the entire object on the worker. While this works (and was a really fun project), it's slow at startup and cannot show changes made to the API without it being sent to the worker again after modification. Observers would solve part of this, and web workers transferrable objects would solve all, but they aren't adopted widely yet.
Since worker round-trip calls happen in a matter of milliseconds, I think stalling the thread for a few milliseconds may be an alright solution. Of course I would think about terminating in cases where calls take too long, but I'm trying to create a proof of concept first.
Let's say I want to expose the api object to the worker. I would define a getter for self.api which would fetch the first layer of properties. Each property would then be another getter and the process would continue until the final object is found.
worker.js
self.addEventListener('message', function(event) {
self.dataRecieved = true;
self.data = event.data; // would actually build new getters here
});
Object.defineProperty(self, 'api', {
get: function() {
self.dataRecieved = false;
self.postMessage('request api first-layer properties');
while(!self.dataRecieved);
return self.data; // whatever properties were received from host
}
});
For experimentation, we'll do a simple round-trip with no data processing:
index.html (only JS part)
var worker = new Worker("worker.js");
worker.onmessage = function() {
worker.postMessage();
};
If onmessage would interrupt the loop, the script should theoretically work. Then the worker could access objects like window.document.body.style on the fly.
My question really boils down to: is there a way to guarantee that an event will interrupt an executing code block?
From my understanding of events in JavaScript, I thought they did interrupt the current thread. Does it not interrupt because it's executing a blank statement over and over? What if I generated code to be executed and kept doing that until the data returned?
is there a way to guarantee that an event will interrupt an executing code block
As @slebetman suggests in the comments, no, not in JavaScript running in a browser's web worker (with one possible exception that I can think of; see the last suggestion below).
My suggestions, in decreasing order of preference:
Give up the desire to feel "native" (or maybe "local" might be a better term). Something like the infinite while loop that you suggest also seems to be very much fighting against the cooperative multitasking environment offered by JavaScript, including when thinking about a single web worker.
Communication between workers in JavaScript is asynchronous. It can fail, or take longer than just a few milliseconds. I'm not sure what your use case is, but my feeling is that when the project grows, you might want to use those milliseconds for something else.
You could change your defined property to return a promise, and then the caller would do a .then on the response to retrieve the value, just like any other asynchronous API.
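A rough sketch of that promise-returning approach follows. createRemote, the { id, path } message shape, and the 'api' path are all assumptions invented for illustration; post is whatever function sends a message to the other thread (self.postMessage inside a worker).

```javascript
// Hypothetical helper: turns a message-based channel into promise-returning
// requests by matching each reply to its request through an id.
function createRemote(post) {
  let nextId = 0;
  const waiting = new Map(); // id -> resolve function of the pending promise

  return {
    // Ask the other side for a value; resolves when the reply arrives.
    request(path) {
      return new Promise((resolve) => {
        const id = nextId++;
        waiting.set(id, resolve);
        post({ id, path });
      });
    },
    // Call this from the 'message' event listener with the reply data.
    handleMessage({ id, value }) {
      const resolve = waiting.get(id);
      if (resolve) {
        waiting.delete(id);
        resolve(value);
      }
    }
  };
}
```

Inside the worker this could be wired up as const remote = createRemote((m) => self.postMessage(m)); with self.addEventListener('message', (e) => remote.handleMessage(e.data));, after which callers write remote.request('api').then(...) instead of spinning in a loop. The host page would need a matching handler that looks up the requested path and posts back { id, value }.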
Angular Protractor/Webdriver has an API that uses a control flow to simulate a synchronous environment using promises, by always passing promises about. Taking the code from https://stackoverflow.com/a/22697369/1319998
browser.get(url);
var title = browser.getTitle();
expect(title).toEqual('My Title');
By my understanding, each line above adds a promise to the control flow to execute asynchronously. title isn't actually the title, but a promise that resolves to the title for example. While it looks like synchronous code, the getting and testing all happens asynchronously later.
You could implement something similar in the web worker. However, I do wonder whether it will be worth the effort. There would be a lot of code to do this, and I can't help feeling that the main consequence would be that it would end up harder to write code using this, and not easier, as there would be a lot of hidden behaviour.
The only thing that I know of that can be made synchronous in JavaScript is XMLHttpRequest, when setting the async parameter to false https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest#Parameters. I wonder if you could come up with some way to make requests to a server that maintains a connection with the main thread and passes data along that way. I have to say, my instinct is that this is quite an awful idea, and would be much slower than just requesting data from the main thread.
As far as I know, there is nothing native in JS to do this, but it is relatively easy to do something similar. I made one some time ago for myself: https://github.com/xpy/whener/blob/master/whener.js .
You use it like when( condition, callback ) where condition is a function that should return true when your condition is met, and callback is the function that you want to execute at that time.
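A minimal polling sketch with that shape is below. This is not the linked library's code, just one assumed way such a helper could work; the intervalMs parameter is an addition made for the example.

```javascript
// Polls `condition` on a timer and runs `callback` once when it returns true.
// The thread stays free between checks, unlike a busy while loop.
function when(condition, callback, intervalMs = 10) {
  const timer = setInterval(() => {
    if (condition()) {
      clearInterval(timer); // stop polling before running the callback
      callback();
    }
  }, intervalMs);
  return timer; // lets the caller cancel with clearInterval(timer)
}
```

Because the checks are scheduled through the event loop, message events can still be delivered between them, which is exactly what the empty while loop in the question prevents.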