I'm trying to create a small JavaScript framework that makes it easier to deal with a third-party library. This library is primarily asynchronous, so to establish a connection, for example, you would use the code:
var com = establishConnection("api-key");
com.onsuccess = function(c) {
c.submit("something");
};
What I want is to use my wrapper framework so that I can simply write:
var com = establishConnection("api-key");
com.submit("something");
Obviously, though, I need a way to handle the asynchronous nature of the original library, so that it waits until the connection is established before carrying out the commands. I know I can do something like set a flag to say whether or not the connection is established and then use some kind of looping delay, i.e.
function submit(msg) {
while (!connectionEstablished) {}
// do submit stuff
}
but it seems like such an ugly hack. Does anyone have any advice on nicer ways to do this?
That would be the only way to convert an asynchronous request to a synchronous request.
Yes, it looks horrible, and ugly; because it is.
I don't know the full ins and outs of your library, but I would suggest that you (and your users) embrace asynchronous requests rather than trying to mask them. HTTP requests are asynchronous in JavaScript for a reason: they don't lock up the browser. Synchronous requests completely lock up the browser for the duration of the request. If you're not careful and the HTTP request takes too long, your users will get an alert in most browsers saying that the script has stopped executing and offering them the option to disable the script.
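One way to keep the simple call style without blocking is to queue commands until the connection is ready. The following is only a minimal sketch: wrapConnection and fakeEstablishConnection are hypothetical names, and the fake stands in for the third-party library, of which only onsuccess and submit are known from the question.

```javascript
// Hypothetical wrapper: buffers submit() calls until the connection is ready.
function wrapConnection(rawCom) {
  const pending = [];    // messages submitted before the connection opened
  let connection = null; // set once onsuccess fires

  rawCom.onsuccess = function (c) {
    connection = c;
    // Flush everything that was queued while we were waiting.
    while (pending.length > 0) {
      connection.submit(pending.shift());
    }
  };

  return {
    submit(msg) {
      if (connection) {
        connection.submit(msg);
      } else {
        pending.push(msg);
      }
    },
  };
}

// Stand-in for the third-party library (an assumption, for demonstration only).
const sent = [];
function fakeEstablishConnection(apiKey) {
  const com = {};
  setTimeout(() => {
    if (com.onsuccess) com.onsuccess({ submit: (msg) => sent.push(msg) });
  }, 10);
  return com;
}

const com = wrapConnection(fakeEstablishConnection("api-key"));
com.submit("something");      // queued, returns immediately
com.submit("something else"); // queued behind the first message
```

The caller never waits: submit returns at once, and the queued messages go out in order when onsuccess fires. If callers need results back, submit would have to return a promise instead.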
I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
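As a rough illustration of the two options, here is a small sketch using Node's fs module. The temporary file name and the events array are made up for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a small file to read (the file name is arbitrary).
const file = path.join(os.tmpdir(), `pizza-${process.pid}.txt`);
fs.writeFileSync(file, "pizza is here");

const events = [];

// Option 2: start the read, get a promise back, and keep going.
const asyncRead = fs.promises.readFile(file, "utf8").then((text) => {
  events.push(`async read finished: ${text}`);
  return text;
});
events.push("free to do other work while the async read is in flight");

// Option 1: nothing else in this process runs until the read returns.
const syncText = fs.readFileSync(file, "utf8");
events.push(`sync read finished: ${syncText}`);

asyncRead.then(() => {
  fs.unlinkSync(file); // clean up the temporary file
  console.log(events.join("\n"));
});
```

The asynchronous read is started first, yet its callback runs only after the synchronous code has finished, because the process is free to do other things in the meantime.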
Historically, callbacks were the only way to write concurrent code in Node. Eventually, promises and .then arrived, which were better but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's __Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
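To make the ordering point concrete, here is a small sketch. fakeCall is a hypothetical stand-in for an asynchronous API call (it is not a Puppeteer function), and the delays are arbitrary.

```javascript
// Hypothetical stand-in for an API call that takes `ms` milliseconds.
function fakeCall(name, ms, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve(name);
    }, ms);
  });
}

// With await, each call finishes before the next one starts.
async function sequential() {
  const log = [];
  await fakeCall("goto", 50, log);
  await fakeCall("waitForSelector", 5, log);
  return log; // "goto" first, then "waitForSelector"
}

// Without await, both calls are in flight at once and finish in
// whatever order their timers fire, not in source order.
async function notAwaited() {
  const log = [];
  const first = fakeCall("goto", 50, log);
  const second = fakeCall("waitForSelector", 5, log);
  await Promise.all([first, second]); // only so the log can be inspected afterwards
  return log; // the shorter call finishes first
}
```

The same mechanism that scrambles the order here is what makes Promise.all useful when you do want several operations to run concurrently.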
For the authors of an asynchronous API where almost all calls access an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
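The effect of the gap can be imitated in plain Node. The sketch below is a toy model only: fakeEvaluate is a hypothetical function, not how Puppeteer is implemented. It ships a function's source text and its JSON-serialized arguments through a string, which is enough to reproduce the two surprises described above.

```javascript
// Toy model of page.evaluate: the function's source text and its arguments
// cross the "process gap" as a string, and the result comes back the same way.
function fakeEvaluate(fn, ...args) {
  const wire = JSON.stringify({ source: fn.toString(), args });
  const received = JSON.parse(wire);
  // Rebuilt from source text, so the function has lost its original closure.
  const rebuilt = new Function(`return (${received.source});`)();
  const result = rebuilt(...received.args);
  return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
}

function demo() {
  const selector = ".book"; // exists only on the "Node side"

  // Works: the value is passed explicitly and survives serialization.
  const passed = fakeEvaluate((sel) => `looking for ${sel}`, selector);

  // Fails: the rebuilt function cannot see `selector` from this scope.
  let closureError = null;
  try {
    fakeEvaluate(() => `looking for ${selector}`);
  } catch (err) {
    closureError = err.name;
  }

  // Non-serializable parts of an argument (here, a method) are dropped.
  const stripped = fakeEvaluate((obj) => Object.keys(obj), { id: 1, click() {} });

  return { passed, closureError, stripped };
}
```

Calling demo() shows the explicit argument arriving intact, the closure variable failing with a ReferenceError, and the method silently disappearing from the object.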
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching and crawling. Imagine this case: the page has one book-container element, but the book data inside it only appears later, after the UI fetches it from an API.
const scraperObject = {
url: 'http://book-store.com',
scraper(browser){
let page = browser.newPage();
page.goto(this.url);
page.waitForSelector('.book-container');
page.waitForSelector('.book');
//TODO: save book data after this
}
}
With this code snippet, nothing is awaited. (In practice it fails right away: browser.newPage() returns a promise, so page.goto is not a function.) Conceptually, it would run like this:
page.goto(this.url) Start navigating to the URL, without waiting for the page to load.
page.waitForSelector('.book-container') There is no await here, so the script moves on immediately instead of waiting for the .book-container element (which won't be there yet, because the page is probably still loading over the network).
page.waitForSelector('.book') Similarly, the script moves on without the book data (even though .book-container is not in the HTML yet).
To solve this problem, we need async/await to WAIT until the elements are ready in the HTML.
const scraperObject = {
url: 'http://book-store.com',
async scraper(browser){
let page = await browser.newPage();
await page.goto(this.url);
await page.waitForSelector('.book-container');
await page.waitForSelector('.book');
//TODO: save book data after this
}
}
Here is the same walkthrough with async/await:
page.goto(this.url) Go to the URL and wait until the page has loaded.
page.waitForSelector('.book-container') Wait until the .book-container element appears in the HTML.
page.waitForSelector('.book') Wait until a .book element appears in the HTML (which tells us that the API has responded with data).
The impression I get from people is... All JavaScript functions are synchronous unless used with process.nextTick. When's the best time to use it?
I want to make sure that I don't overuse it in places where I don't need it. At this point, I'm thinking of using it right before something like a database call; however, as I understand it, those calls are asynchronous by default because of the whole "async IO" thing.
Are they to be used only when doing some intensive work within the JavaScript boundaries? Like parsing XML etc?
Btw, there's already a question like this but it seems dead so I raised another one.
I'm thinking to use it right before something like a database call, however, at the same time, as I understand, those calls are asynchronous by default because of the whole "async IO" thing.
Yes. The database driver itself should be natively asynchronous already, so you don't need to use process.nextTick yourself here to "make it asynchronous". The most time-consuming part is the IO and the computations inside the database, so waiting an extra tick just slows things down actually.
Are they to be used only when doing some intensive work within the JavaScript boundaries? Like parsing XML etc?
Yes, exactly. You can use it to prevent large synchronous functions from blocking your application. If you want to parse an XML file, instead of gnawing through it for 3 seconds during which no new connections can be opened, no requests received, and no responses be sent, you would stream the file and parse only small chunks of it every time before using nextTick and allowing other work to be done concurrently.
However, notice that the parser should use nextTick internally and offer an asynchronous API, instead of the caller using nextTick before invoking the parser.
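Here is a rough sketch of that shape: a function that does its work in slices and exposes an asynchronous (promise-returning) API. sumInChunks and its chunk size are hypothetical. Note that the sketch yields with setImmediate rather than process.nextTick: in current Node versions, callbacks queued with process.nextTick run before the event loop moves on to I/O, so a chain of nextTick calls can still starve incoming connections.

```javascript
// Hypothetical example: sum a large array in slices, yielding between slices.
function sumInChunks(numbers, chunkSize = 1000) {
  return new Promise((resolve) => {
    let index = 0;
    let total = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index++) {
        total += numbers[index];
      }
      if (index < numbers.length) {
        // Yield to the event loop so pending I/O and timers can run.
        setImmediate(processChunk);
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}

// The caller just uses the asynchronous API; it does not schedule anything itself.
const numbers = Array.from({ length: 10000 }, (_, i) => i + 1);
sumInChunks(numbers).then((total) => console.log(total));
```

A real parser would do the same thing with chunks of a stream instead of slices of an array.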
This answer makes no claims of being complete, but here are my thoughts:
I can imagine two use cases. The first one is to make sure something is really async. This comes in handy when using EventEmitter. Imagine you want to be able to use all methods of your emitter like this:
const EventEmitter = require('events');
class MyEmitter extends EventEmitter {
aMethod(){
console.log('some sync stuff');
this.emit('aMethodResponse');
return this;
}
}
var myEmitter = new MyEmitter();
myEmitter.aMethod()
.once('aMethodResponse', () => console.log('got response'));
This will simply not work as the event is fired before the listener is established. process.nextTick() makes sure that this won't happen.
aMethod(){
console.log('some sync stuff');
process.nextTick(() => this.emit('aMethodResponse'));
return this;
}
Edit: removed second suggestion because it was simply wrong
If I have an SMTP server (like Haraka) or a web server (like Express) that runs on Node.js, and I have to use a sync function that cannot be converted to async, what would happen?
If I have to make an HTTP request synchronously, will the server hang for all users until the HTTP request is finished, or only for the current user/email? Will the processing of all users be paused?
To understand the answer to your question, you really need to understand the node.js event loop. I highly recommend visiting that SO link and following a few of the links there.
If you must perform a synchronous operation, be aware that yes, a synchronous call will block all other requests on that process. If you want more than one client/remote to be served concurrently, then you'll definitely want to use the cluster module baked into node.js to spawn concurrent processes.
Also, if your synchronous operation is slow enough to impact your QoS, then it'd be a very smart idea to get familiar with process.nextTick and prevent your processes from being completely stalled while waiting for a sync operation to complete.
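The blocking effect is easy to observe in a few lines. In this sketch, blockFor is a hypothetical stand-in for the slow synchronous operation, and the zero-delay timer plays the role of another client's request.

```javascript
const order = [];

// Stand-in for another client's request arriving while we are busy.
setTimeout(() => order.push("other request handled"), 0);

// Hypothetical synchronous operation: spins the CPU for about `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait; nothing else in this process can run
  }
}

blockFor(50);
order.push("sync work finished");
```

Even though the timer was due immediately, its callback can only run after the synchronous work has returned, so "sync work finished" is always recorded first.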
As far as I know, synchronous requests are not blocking all users, but before I do something like that, if I am not absolutely sure, I conduct a few tests. Using futures or other libraries, one can send synchronous requests, which is sub-optimal indeed, but if the developer does not have the rights/possibility to work with all the source-code, then it might be a necessary evil.
If this is done on the client-side (not this case, but let's clarify anyway), then a synchronous request from a browser looks like this:
function Foo(){
var request = new XMLHttpRequest();
request.open("GET", "http://server.com", false);
request.send();
function handleResponse(response) {
console.log(response);
}
handleResponse(request.responseText);
}
One could simply encapsulate a number of synchronous requests as an asynchronous request.
The "func" parameter in the code below could, for example, contain multiple synchronous requests in order. This should give you more control over the data, in contrast to using the DOM as a medium to act on the data. (Is there another way? It has been a while since I used JavaScript.)
function asyncModule(func)
{
"use strict";
var t, args;
t = func.timeout === undefined ? 1 : func.timeout;
args = Array.prototype.slice.call(arguments, 1);
setTimeout(function () {
func.apply(null, args);
}, t);
}
Now something must be wrong with my reasoning, because here is what the spec says:
Synchronous XMLHttpRequest outside of workers is in the process of being removed from the web platform as it has detrimental effects to the end user's experience. (This is a long process that takes many years.) Developers must not pass false for the async argument when the JavaScript global environment is a document environment. User agents are strongly encouraged to warn about such usage in developer tools and may experiment with throwing an InvalidAccessError exception when it occurs. # https://xhr.spec.whatwg.org/
I would think you would want to avoid async in requests at all costs and instead wrap sync requests within an async function.
Here is the main question along with the follow up.
Is there something wrong with the example I gave?
If not then:
How is forcing requests to be async the right solution?
It goes without saying that you have freedom to debunk any of my "claims" if they are simply wrong or half truths. I am confused over this, I give you that.
Keep in mind that I am testing JavaScript in the terminal, not in the browser. I used the web server in the Go programming language and everything seems to be working fine. It is not until I test the code in the browser that I get a hint about this spec.
This answer has been edited.
Yes, my reasoning was faulty!
There are two angles to think about.
What does async actually mean in javascript?
Can one async call stall another async call?
Async in JavaScript doesn't mean the script runs as interleaved/alternating processes with more than one call stack. It is more like a global timed defer/postpone command that fully takes over once it gets its chance. This means an async call can be blocking, and the nonblocking "async: true" part is only a "trick" based on how XMLHttpRequest is implemented.
This means a synchronous request encapsulated within setTimeout could sit waiting on a failed request and end up blocking other, unrelated async requests, whereas the "async: true" feature would only execute based on its state value.
This means older browser support requires you to chain requests or to use the DOM as a medium when you need to do multiple requests that depend on one another. Ugh...
Lucky for us, JavaScript has threads now. Now we can simply use threads to get clean encapsulation of multiple correlated requests in sync (or any other background tasks).
In short:
The browser shouldn't have any problem running a request synchronously if it is within a worker. Browsers have yet to become an OS, but they are getting closer.
P.S. This answer is more or less the result of trial and error. I made some test cases in Firefox and observed that async requests do halt other async requests. I am simply extrapolating from that observation. I will not accept my own answer in case I am still missing something.
EDIT (Again..)
Actually, it might be possible to use xhttp.timeout along with xhttp.ontimeout. See Timeout XMLHttpRequest
This means you could recover from bad requests if you abstract setTimeout and use it as a scheduler.
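// Simple example
// s is the scheduler state: { ptr, callQue, t }
function runSchedular(s)
{
  setTimeout(function() {
    if (s.ptr < s.callQue.length) {
      // Run the next queued call. It returns the (possibly updated) state
      // and handles rescheduling if needed by pushing onto the queue.
      s = s.callQue[s.ptr++](s);
    } else {
      // Queue drained: reset the state and poll every 200 ms.
      s.ptr = 0;
      s.callQue = [];
      s.t = 200;
    }
    runSchedular(s);
  }, s.t);
}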
I'm wondering if there's a way to cause JavaScript to wait for some variable-length code execution to finish before continuing using events and loops. Before answering with using timeouts, callbacks or referencing this as a duplicate, hear me out.
I want to expose a large API to a web worker. I want this API to feel 'native' in the sense that you can access each member using a getter which gets the information from the other thread. My initial idea was to compile the API and rebuild the entire object on the worker. While this works (and was a really fun project), it's slow at startup and cannot show changes made to the API without it being sent to the worker again after modification. Observers would solve part of this, and web workers' transferable objects would solve all of it, but they aren't widely adopted yet.
Since worker round-trip calls happen in a matter of milliseconds, I think stalling the thread for a few milliseconds may be an alright solution. Of course I would think about terminating in cases where calls take too long, but I'm trying to create a proof of concept first.
Let's say I want to expose the api object to the worker. I would define a getter for self.api which would fetch the first layer of properties. Each property would then be another getter and the process would continue until the final object is found.
worker.js
self.addEventListener('message', function(event) {
self.dataRecieved = true;
self.data = event.data; // would actually build new getters here
});
Object.defineProperty(self, 'api', {
get: function() {
self.dataRecieved = false;
self.postMessage('request api first-layer properties');
while(!self.dataRecieved);
return self.data; // whatever properties were received from host
}
});
For experimentation, we'll do a simple round-trip with no data processing:
index.html (only JS part)
var worker = new Worker("worker.js");
worker.onmessage = function() {
worker.postMessage(null); // postMessage requires a message argument; reply with an empty one
};
If onmessage interrupted the loop, the script should theoretically work. Then the worker could access objects like window.document.body.style on the fly.
My question really boils down to: is there a way to guarantee that an event will interrupt an executing code block?
From my understanding of events in JavaScript, I thought they did interrupt the current thread. Do they not interrupt it because it's executing an empty statement over and over? What if I generated code to be executed and kept doing that until the data returned?
is there a way to guarantee that an event will interrupt an executing code block
As @slebetman suggests in the comments, no, not in JavaScript running in a browser's web worker (with one possible exception that I can think of; see suggestion 3 below).
My suggestions, in decreasing order of preference:
1. Give up the desire to feel "native" (or maybe "local" might be a better term). Something like the infinite while loop that you suggest also seems to be very much fighting against the cooperative multitasking environment offered by JavaScript, including when thinking about a single web worker.
Communication between workers in JavaScript is asynchronous. Perhaps it can fail, or take longer than just a few milliseconds. I'm not sure what your use case is, but my feeling is that when the project grows, you might want to use those milliseconds for something else.
You could change your defined property to return a promise, and then the caller would do a .then on the response to retrieve the value, just like any other asynchronous API.
2. Angular Protractor/Webdriver has an API that uses a control flow to simulate a synchronous environment using promises, by always passing promises about. Taking the code from https://stackoverflow.com/a/22697369/1319998:
browser.get(url);
var title = browser.getTitle();
expect(title).toEqual('My Title');
By my understanding, each line above adds a promise to the control flow to execute asynchronously. For example, title isn't actually the title, but a promise that resolves to the title. While it looks like synchronous code, the getting and testing all happens asynchronously later.
You could implement something similar in the web worker. However, I do wonder whether it would be worth the effort. There would be a lot of code to do this, and I can't help feeling that the main consequence would be that code using it would end up harder to write, not easier, as there would be a lot of hidden behaviour.
3. The only thing that I know of that can be made synchronous in JavaScript is XMLHttpRequest, when setting the async parameter to false (https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest#Parameters). I wonder if you could come up with some way to make a request to a server that maintains a connection with the main thread, and pass data along that way. I have to say, my instinct is that this is quite an awful idea, and would be much slower than just requesting data from the main thread.
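To give an idea of what the promise-returning variant from the first suggestion could look like, here is a minimal sketch. createRequester, the message shape ({ id, path } and { id, value }) and the fake host are all assumptions made for the example. In a real worker, post would be self.postMessage and handleReply would be called from the worker's message listener.

```javascript
// Hypothetical helper: turns a fire-and-forget message channel into
// promise-returning requests by tagging each message with an id.
function createRequester(post) {
  const waiting = new Map(); // id -> resolve function
  let nextId = 0;

  return {
    // Ask the other side for a property path such as "document.title".
    get(path) {
      return new Promise((resolve) => {
        const id = nextId++;
        waiting.set(id, resolve);
        post({ id, path });
      });
    },
    // Call this from the message handler when a reply arrives.
    handleReply({ id, value }) {
      const resolve = waiting.get(id);
      if (resolve) {
        waiting.delete(id);
        resolve(value);
      }
    },
  };
}

// Stand-in for the host page (an assumption): looks up the path and replies later.
const hostApi = { document: { title: "My Title" } };
const requester = createRequester((msg) => {
  setTimeout(() => {
    const value = msg.path.split(".").reduce((obj, key) => obj[key], hostApi);
    requester.handleReply({ id: msg.id, value });
  }, 5);
});

requester.get("document.title").then((title) => console.log(title));
```

The ids let several requests be in flight at once without their replies getting mixed up, which a single shared flag like dataRecieved cannot do.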
As far as I know, there is nothing native in JS to do this, but it is relatively easy to build something similar. I made one some time ago for myself: https://github.com/xpy/whener/blob/master/whener.js .
You use it like when( condition, callback ) where condition is a function that should return true when your condition is met, and callback is the function that you want to execute at that time.
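For readers who just want the idea, a helper with that call shape can be sketched with setInterval. This is not the code from the linked library, only a guess at the general approach; the polling interval and the returned cancel function are assumptions.

```javascript
// Hypothetical polling helper in the spirit of when(condition, callback).
function when(condition, callback, intervalMs = 10) {
  const timer = setInterval(() => {
    if (condition()) {
      clearInterval(timer); // stop polling once the condition holds
      callback();
    }
  }, intervalMs);
  return () => clearInterval(timer); // lets the caller give up waiting
}

// Usage: run the callback once the flag flips.
let connectionEstablished = false;
const log = [];
when(() => connectionEstablished, () => log.push("connected"));
setTimeout(() => { connectionEstablished = true; }, 30);
```

Unlike a while loop, the check runs between other tasks, so the code that eventually sets the flag still gets a chance to execute.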