I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data
for (size_t i = 0; i < data.size(); i++)
{
process_data(data[i]);
if (i % 1000 == 0) {
bool is_cancelled = check_if_cancelled();
if (is_cancelled) {
break;
}
}
}
This code basically "runs/processes a query" similar to a SQL query interface:
However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation process would occur in the normal javascript/web application, outside of the service Worker running the wasm.
My question then is what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process so that knows the process has been cancelled so it can exit? Using the worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:
Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.
Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.
Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but also would require a tremendous amount of re-writing the parsing functions so that every function supports this (the actual data parsing is handled in several functions -- filter, search, calculate, group by, sort, etc. etc.
Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.
[Anything else?]
In the bounty then I was wondering three things:
If the above six analyses seem generally valid?
Are there other (perhaps better) approaches I'm missing?
Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (shared buffer as memory). And raise a flag in memory when you want to halt. Not a big fan of this solution (is complex and is supported only in chrome). It also depends on how your query works, and if there are places where the lengthy query can check the flag.
Instead you should probably call the c++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).
What I mean by multiple time is make the query in stages (multiple function cals for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a Webassembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
... init webassembly
onmessage = function(q) {
// query received from main thread.
const result = ... call webassembly(q);
postMessage(result);
}
main.js:
const worker = new Worker("worker.js");
const cancel = false;
const processing = false;
worker.onmessage(function(r) {
// when worker has finished processing the query.
// r is the results of the processing.
processing = false;
if (cancel === true) {
// processing is done, but result is not required.
// instead of showing the results, update that the query was canceled.
cancel = false;
... update UI "cancled".
return;
}
... update UI "results r".
}
function onCancel() {
// Occurs when user clicks on the cancel button.
if (cancel) {
// sanity test - prevent this in UI.
throw "already cancelling";
}
cancel = true;
... update UI "canceling".
}
function onQuery(q) {
if (processing === true) {
// sanity test - prevent this in UI.
throw "already processing";
}
processing = true;
// Send the query to the worker.
// When the worker receives the message it will process the query via webassembly.
worker.postMessage(q);
}
An idea from user experience perspective:
You may create ~two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (it will just mean that in the backend the 2nd worker will run the next query, and when the 1st finishes the cancellation, cancellation will again become immediate).
Shared Thread
Since the worker and the C++ function that it called share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think the a solid option would minimize the amount of time that the thread is blocked by instead initializing one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
C++ function then performs one iteration of the loop, increments the variable to keep track of loop position, and immediately breaks.
int x;
x = 0; // initialized counter at 0
std::vector<JSONObject> data
for (size_t i = x; i < data.size(); i++)
{
process_data(data[i]);
x++ // increment counter
break; // stop function until told to iterate again starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true
// here you can set to false at any time since the thread is not blocked here
worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being iterated until message is received below
worker.onmessage = function(e) {
if (continueOperation) {
worker.expensiveThreadBlockingFunction()
// execute worker function again, ultimately continuing the increment in C++
} {
return false
// or send message to worker to reset C++ counter to prepare for next execution
}
}
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
I have a set of API endpoints in Express. One of them receives a request and starts a long running process that blocks other incoming Express requests.
My goal to make this process non-blocking. To understand better inner logic of Node Event Loop and how I can do it properly, I want to replace this long running function with my dummy long running blocking function that would start when I send a request to its endpoint.
I suppose, that different ways of making the dummy function blocking could cause Node manage these blockings differently.
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
You can use node-webworker-threads.
var Worker, i$, x$, spin;
Worker = require('webworker-threads').Worker;
for (i$ = 0; i$ < 5; ++i$) {
x$ = new Worker(fn$);
x$.onmessage = fn1$;
x$.postMessage(Math.ceil(Math.random() * 30));
}
(spin = function(){
return setImmediate(spin);
})();
function fn$(){
var fibo;
fibo = function(n){
if (n > 1) {
return fibo(n - 1) + fibo(n - 2);
} else {
return 1;
}
};
return this.onmessage = function(arg$){
var data;
data = arg$.data;
return postMessage(fibo(data));
};
}
function fn1$(arg$){
var data;
data = arg$.data;
console.log("[" + this.thread.id + "] " + data);
return this.postMessage(Math.ceil(Math.random() * 30));
}
https://github.com/audreyt/node-webworker-threads
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
function block() {
// not sure why you need that though
while(true);
}
I suppose, that different ways of making the dummy function blocking could cause Node manage these blockings differently.
Not really. I can't think of a "special way" to block the engine differently.
My goal to make this process non-blocking.
If it is really that long running you should really offload it to another thread.
There are short cut ways to do a quick fix if its like a one time thing, you can do it using a npm module that would do the job.
But the right way to do it is setting up a common design pattern called 'Work Queues'. You will need to set up a queuing mechanism, like rabbitMq, zeroMq, etc. How it works is, whenever you get a computation heavy task, instead of doing it in the same thread, you send it to the queue with relevant id values. Then a separate node process commonly called a 'worker' process will be listening for new actions on the queue and will process them as they arrive. This is a worker queue pattern and you can read up on it here:
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html
I would strongly advise you to learn this pattern as you would come across many tasks that would require this kind of mechanism. Also with this in place you can scale both your node servers and your workers independently.
I am not sure what exactly your 'long processing' is, but in general you can approach this kind of problem in two different ways.
Option 1:
Use the webworker-threads module as #serkan pointed out. The usual 'thread' limitations apply in this scenario. You will need to communicate with the Worker in messages.
This method should be preferable only when the logic is too complicated to be broken down into smaller independent problems (explained in option 2). Depending on complexity you should also consider if native code would better serve the purpose.
Option 2:
Break down the problem into smaller problems. Solve a part of the problem, schedule the next part to be executed later, and yield to let NodeJS process other events.
For example, consider the following example for calculating the factorial of a number.
Sync way:
function factorial(inputNum) {
let result = 1;
while(inputNum) {
result = result * inputNum;
inputNum--;
}
return result;
}
Async way:
function factorial(inputNum) {
return new Promise(resolve => {
let result = 1;
const calcFactOneLevel = () => {
result = result * inputNum;
inputNum--;
if(inputNum) {
return process.nextTick(calcFactOneLevel);
}
resolve(result);
}
calcFactOneLevel();
}
}
The code in second example will not block the node process. You can send the response when returned promise resolves.
I'm trying to use Emscripten to write a Software to run in browser but also on other architectures (e.g. Android, PC-standalone app).
The Software structure is something like this:
main_program_loop() {
if (gui.button_clicked()) {
run_async(some_complex_action, gui.text_field.to_string())
}
if (some_complex_action_has_finished())
{
make_use_of(get_result_from_complex_action());
}
}
some_complex_action(string_argument)
{
some_object = read_local(string_argument);
interm_res = simple_computation(some_object);
other_object = expensive_computation(interm_res);
send_remote(some_object.member_var, other_object);
return other_object.member_var;
}
Let's call main_program_loop the GUI or frontend, some_complex_action the intermediate layer, and read_local, send_remode and expensive_computation the backend or lower layer.
Now the frontend and backend would be architecture specific (e.g. for Javascript read_local could use IndexDB, send_remote could use fetch),
but the intermediate layer should make up more then 50% of the code (that's why I do not want to write it two times in two different languages, and instead write it once in C and transpile it to Javascript, for Android I would use JNI).
Problems come in since in Javascript the functions on the lowest layer (fetch etc) run asyncronously (return a promise or require a callback).
One approach I tried was to use promises and send IDs through the intermediate layer
var promises = {};
var last_id = 0;
handle_click() {
var id = Module.ccall('some_complex_action', 'number', ['string'], [text_field.value]);
promises[id].then((result) => make_us_of(result));
}
recv_remote: function(str) {
promises[last_id] = fetch(get_url(str)).then((response) => response.arrayBuffer());
last_id += 1;
return last_id - 1;
}
It works for the simple case of
some_complex_action(char *str)
{
return recv_remote(str);
}
But for real cases it seem to be getting really complicated, maybe impossible. (I tried some approach where I'd given every function a state and every time a backend function finishes, the function is recalled and advances it's state or so, but the code started getting complicated like hell.) To compare, if I was to call some_complex_action from C or Java, I'd just call it in a thread separate from the GUI thread, and inside the thread everything would happen synchronously.
I wished I could just call some_complex_action from an async function and put await inside recv_remote but of cause I can put await only directly in the async function, not in some function called down the line. So that idea did not work out either.
Ideally if somehow I could stop execution of the intermediate Emscripten transpiled code until the backend function has completed, then return from the backend function with the result and continue executing the transpiled code.
Has anyone used Emterpreter and can imagine that it could help me get to my goal?
Any ideas what I could do?
For the past two days I have been working with chrome asynchronous storage. It works "fine" if you have a function. (Like Below):
chrome.storage.sync.get({"disableautoplay": true}, function(e){
console.log(e.disableautoplay);
});
My problem is that I can't use a function with what I'm doing. I want to just return it, like LocalStorage can. Something like:
var a = chrome.storage.sync.get({"disableautoplay": true});
or
var a = chrome.storage.sync.get({"disableautoplay": true}, function(e){
return e.disableautoplay;
});
I've tried a million combinations, even setting a public variable and setting that:
var a;
window.onload = function(){
chrome.storage.sync.get({"disableautoplay": true}, function(e){
a = e.disableautoplay;
});
}
Nothing works. It all returns undefined unless the code referencing it is inside the function of the get, and that's useless to me. I just want to be able to return a value as a variable.
Is this even possible?
EDIT: This question is not a duplicate, please allow me to explain why:
1: There are no other posts asking this specifically (I spent two days looking first, just in case).
2: My question is still not answered. Yes, Chrome Storage is asynchronous, and yes, it does not return a value. That's the problem. I'll elaborate below...
I need to be able to get a stored value outside of the chrome.storage.sync.get function. I -cannot- use localStorage, as it is url specific, and the same values cannot be accessed from both the browser_action page of the chrome extension, and the background.js. I cannot store a value with one script and access it with another. They're treated separately.
So my only solution is to use Chrome Storage. There must be some way to get the value of a stored item and reference it outside the get function. I need to check it in an if statement.
Just like how localStorage can do
if(localStorage.getItem("disableautoplay") == true);
There has to be some way to do something along the lines of
if(chrome.storage.sync.get("disableautoplay") == true);
I realize it's not going to be THAT simple, but that's the best way I can explain it.
Every post I see says to do it this way:
chrome.storage.sync.get({"disableautoplay": true, function(i){
console.log(i.disableautoplay);
//But the info is worthless to me inside this function.
});
//I need it outside this function.
Here's a tailored answer to your question. It will still be 90% long explanation why you can't get around async, but bear with me — it will help you in general. I promise there is something pertinent to chrome.storage in the end.
Before we even begin, I will reiterate canonical links for this:
After calling chrome.tabs.query, the results are not available
(Chrome specific, excellent answer by RobW, probably easiest to understand)
Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference (General canonical reference on what you're asking for)
How do I return the response from an asynchronous call?
(an older but no less respected canonical question on asynchronous JS)
You Don't Know JS: Async & Performance (ebook on JS asynchronicity)
So, let's discuss JS asynchonicity.
Section 1: What is it?
First concept to cover is runtime environment. JavaScript is, in a way, embedded in another program that controls its execution flow - in this case, Chrome. All events that happen (timers, clicks, etc.) come from the runtime environment. JavaScript code registers handlers for events, which are remembered by the runtime and are called as appropriate.
Second, it's important to understand that JavaScript is single-threaded. There is a single event loop maintained by the runtime environment; if there is some other code executing when an event happens, that event is put into a queue to be processed when the current code terminates.
Take a look at this code:
var clicks = 0;
someCode();
element.addEventListener("click", function(e) {
console.log("Oh hey, I'm clicked!");
clicks += 1;
});
someMoreCode();
So, what is happening here? As this code executes, when the execution reaches .addEventListener, the following happens: the runtime environment is notified that when the event happens (element is clicked), it should call the handler function.
It's important to understand (though in this particular case it's fairly obvious) that the function is not run at this point. It will only run later, when that event happens. The execution continues as soon as the runtime acknowledges 'I will run (or "call back", hence the name "callback") this when that happens.' If someMoreCode() tries to access clicks, it will be 0, not 1.
This is what called asynchronicity, as this is something that will happen outside the current execution flow.
Section 2: Why is it needed, or why synchronous APIs are dying out?
Now, an important consideration. Suppose that someMoreCode() is actually a very long-running piece of code. What will happen if a click event happened while it's still running?
JavaScript has no concept of interrupts. Runtime will see that there is code executing, and will put the event handler call into the queue. The handler will not execute before someMoreCode() finishes completely.
While a click event handler is extreme in the sense that the click is not guaranteed to occur, this explains why you cannot wait for the result of an asynchronous operation. Here's an example that won't work:
element.addEventListener("click", function(e) {
console.log("Oh hey, I'm clicked!");
clicks += 1;
});
while(1) {
if(clicks > 0) {
console.log("Oh, hey, we clicked indeed!");
break;
}
}
You can click to your heart's content, but the code that would increment clicks is patiently waiting for the (non-terminating) loop to terminate. Oops.
Note that this piece of code doesn't only freeze this piece of code: every single event is no longer handled while we wait, because there is only one event queue / thread. There is only one way in JavaScript to let other handlers do their job: terminate current code, and let the runtime know what to call when something we want occurs.
This is why asynchronous treatment is applied to another class of calls that:
require the runtime, and not JS, to do something (disk/network access for example)
are guaranteed to terminate (whether in success or failure)
Let's go with a classic example: AJAX calls. Suppose we want to load a file from a URL.
Let's say that on our current connection, the runtime can request, download, and process the file in the form that can be used in JS in 100ms.
On another connection, that's kinda worse, it would take 500ms.
And sometimes the connection is really bad, so runtime will wait for 1000ms and give up with a timeout.
If we were to wait until this completes, we would have a variable, unpredictable, and relatively long delay. Because of how JS waiting works, all other handlers (e.g. UI) would not do their job for this delay, leading to a frozen page.
Sounds familiar? Yes, that's exactly how synchronous XMLHttpRequest works. Instead of a while(1) loop in JS code, it essentially happens in the runtime code - since JavaScript cannot let other code execute while it's waiting.
Yes, this allows for a familiar form of code:
var file = get("http://example.com/cat_video.mp4");
But at a terrible, terrible cost of everything freezing. A cost so terrible that, in fact, the modern browsers consider this deprecated. Here's a discussion on the topic on MDN.
Now let's look at localStorage. It matches the description of "terminating call to the runtime", and yet it is synchronous. Why?
To put it simply: historical reasons (it's a very old specification).
While it's certainly more predictable than a network request, localStorage still needs the following chain:
JS code <-> Runtime <-> Storage DB <-> Cache <-> File storage on disk
It's a complex chain of events, and the whole JS engine needs to be paused for it. This leads to what is considered unacceptable performance.
Now, Chrome APIs are, from ground up, designed for performance. You can still see some synchronous calls in older APIs like chrome.extension, and there are calls that are handled in JS (and therefore make sense as synchronous) but chrome.storage is (relatively) new.
As such, it embraces the paradigm "I acknowledge your call and will be back with results, now do something useful meanwhile" if there's a delay involved with doing something with runtime. There are no synchronous versions of those calls, unlike XMLHttpRequest.
Quoting the docs:
It's [chrome.storage] asynchronous with bulk read and write operations, and therefore faster than the blocking and serial localStorage API.
Section 3: How to embrace asynchronicity?
The classic way to deal with asynchronicity are callback chains.
Suppose you have the following synchronous code:
var result = doSomething();
doSomethingElse(result);
Suppose that, now, doSomething is asynchronous. Then this becomes:
doSomething(function(result) {
doSomethingElse(result);
});
But what if it's even more complex? Say it was:
function doABunchOfThings() {
var intermediate = doSomething();
return doSomethingElse(intermediate);
}
if (doABunchOfThings() == 42) {
andNowForSomethingCompletelyDifferent()
}
Well.. In this case you need to move all this in the callback. return must become a call instead.
function doABunchOfThings(callback) {
doSomething(function(intermediate) {
callback(doSomethingElse(intermediate));
});
}
doABunchOfThings(function(result) {
if (result == 42) {
andNowForSomethingCompletelyDifferent();
}
});
Here you have a chain of callbacks: doABunchOfThings calls doSomething immediately, which terminates, but sometime later calls doSomethingElse, the result of which is fed to if through another callback.
Obviously, the layering of this can get messy. Well, nobody said that JavaScript is a good language.. Welcome to Callback Hell.
There are tools to make it more manageable, for example Promises and async/await. I will not discuss them here (running out of space), but they do not change the fundamental "this code will only run later" part.
Section TL;DR: I absolutely must have the storage synchronous, halp!
Sometimes there are legitimate reasons to have a synchronous storage. For instance, webRequest API blocking calls can't wait. Or Callback Hell is going to cost you dearly.
What you can do is have a synchronous cache of the asynchronous chrome.storage. It comes with some costs, but it's not impossible.
Consider:
var storageCache = {};
chrome.storage.sync.get(null, function(data) {
storageCache = data;
// Now you have a synchronous snapshot!
});
// Not HERE, though, not until "inner" code runs
If you can put ALL your initialization code in one function init(), then you have this:
var storageCache = {};
chrome.storage.sync.get(null, function(data) {
storageCache = data;
init(); // All your code is contained here, or executes later that this
});
By the time code in init() executes, and afterwards when any event that was assigned handlers in init() happens, storageCache will be populated. You have reduced the asynchronicity to ONE callback.
Of course, this is only a snapshot of what storage looks at the time of executing get(). If you want to maintain coherency with storage, you need to set up updates to storageCache via chrome.storage.onChanged events. Because of the single-event-loop nature of JS, this means the cache will only be updated while your code doesn't run, but in many cases that's acceptable.
Similarly, if you want to propagate changes to storageCache to the real storage, just setting storageCache['key'] is not enough. You would need to write a set(key, value) shim that BOTH writes to storageCache and schedules an (asynchronous) chrome.storage.sync.set.
Implementing those is left as an exercise.
Make the main function "async" and make a "Promise" in it :)
async function mainFuction() {
var p = new Promise(function(resolve, reject){
chrome.storage.sync.get({"disableautoplay": true}, function(options){
resolve(options.disableautoplay);
})
});
const configOut = await p;
console.log(configOut);
}
Yes, you can achieve that using promise:
let getFromStorage = keys => new Promise((resolve, reject) =>
chrome.storage.sync.get(...keys, result => resolve(result)));
chrome.storage.sync.get has no returned values, which explains why you would get undefined when calling something like
var a = chrome.storage.sync.get({"disableautoplay": true});
chrome.storage.sync.get is also an asynchronous method, which explains why in the following code a would be undefined unless you access it inside the callback function.
var a;
window.onload = function(){
chrome.storage.sync.get({"disableautoplay": true}, function(e){
// #2
a = e.disableautoplay; // true or false
});
// #1
a; // undefined
}
If you could manage to work this out you will have made a source of strange bugs. Messages are executed asynchronously which means that when you send a message the rest of your code can execute before the asychronous function returns. There is not guarantee for that since chrome is multi-threaded and the get function may delay, i.e. hdd is busy.
Using your code as an example:
var a;
window.onload = function(){
chrome.storage.sync.get({"disableautoplay": true}, function(e){
a = e.disableautoplay;
});
}
if(a)
console.log("true!");
else
console.log("false! Maybe undefined as well. Strange if you know that a is true, right?");
So it will be better if you use something like this:
chrome.storage.sync.get({"disableautoplay": true}, function(e){
a = e.disableautoplay;
if(a)
console.log("true!");
else
console.log("false! But maybe undefined as well");
});
If you really want to return this value then use the javascript storage API. This stores only string values so you have to cast the value before storing and after getting it.
//Setting the value
localStorage.setItem('disableautoplay', JSON.stringify(true));
//Getting the value
var a = JSON.stringify(localStorage.getItem('disableautoplay'));
var a = await chrome.storage.sync.get({"disableautoplay": true});
This should be in an async function. e.g. if you need to run it at top level, wrap it:
(async () => {
var a = await chrome.storage.sync.get({"disableautoplay": true});
})();
Yes I'm aware that asking for a formal memory model in Javascript is a hopeless undertaking, so I'm settling for "All browsers follow these rules" or something.
My problem is the following: I have to send events in a defined interval to a server, but events may be added to my array while doing so, i.e.:
function storeEvent(event) {
// may be called at any time
storedEvents.push(event);
}
function broadcastEvents() {
if (storedEvents.length !== 0) {
var eventString = JSON.stringify(storedEvents);
storedEvents = [];
// send eventString to server
}
window.setTimeout(broadcastEvents, BROADCAST_TIMER);
}
There's an obvious race condition in there and not even think of the missing memory barriers.
What to do? Really missing the Java memory model here..
There isn't any race condition.
All JavaScript code in the browser is single-threaded.
The setTimeout callback will run on the UI thread while it isn't doing anything else.