I would like to implement fd_read from the WASI API by waiting for the user to type some text in an HTML input field, and then continuing with the rest of the WASI calls. I.e., with something like:
fd_read = (fd, iovs, iovsLen, nread) => {
// only care about 'stdin'
if(fd !== STDIN)
return WASI_ERRNO_BADF;
const encoder = new TextEncoder();
const view = new DataView(memory.buffer);
view.setUint32(nread, 0, true);
// create a UInt8Array for each buffer
const buffers = Array.from({ length: iovsLen }, (_, i) => {
const ptr = iovs + i * 8;
const buf = view.getUint32(ptr, true);
const bufLen = view.getUint32(ptr + 4, true);
return new Uint8Array(memory.buffer, buf, bufLen);
});
// get input for each buffer
buffers.forEach(buf => {
const input = waitForUserInput();
buf.set(encoder.encode(input));
view.setUint32(nread, view.getUint32(nread, true) + input.length, true);
});
return WASI_ESUCCESS;
}
The implementation works if the variable input is provided. For example, setting const input = "1\n" passes that string to a scanf call in my C program, and it reads in a value of 1.
However, I'm struggling to "stop" the JavaScript execution while waiting for the input to be provided. I understand that JavaScript is event-driven and can't be "paused" in the traditional sense, but trying to provide the input as a callback/Promise has the problem of the function still executing, causing nothing to get passed to stdin:
buffers.forEach(buf => {
let input;
waitForUserInput().then(value => {
input = value;
});
buf.set(encoder.encode(input));
view.setUint32(nread, view.getUint32(nread, true) + input.length, true);
});
Since input is still waiting to be set, nothing gets encoded in the buffer and stdin just reads a 0.
Is there a way to wait for the input with async/await, or maybe a "hack-y" solution with setTimeout? I know that window.prompt() would stop the execution, but I want the input to be a part of the page. Looking for vanilla JavaScript solutions.
You want to connect asynchronous JavaScript APIs to synchronous WebAssembly APIs. This is a common problem for which WebAssembly itself doesn't yet have a built-in solution, but there are some at the tooling level. In particular, you might want to take a look at Asyncify - I've written a detailed post on how it helps solve those use-cases and how to use it here: https://web.dev/asyncify/
Particularly for WASI, the post also showcases a demo that connects fd_read and other synchronous operations to async APIs from File System Access. You can find a live demo at https://wasi.rreverser.com/ and its code at https://github.com/GoogleChromeLabs/wasi-fs-access.
For example, here is an implementation of the fd_read function you're interested in, which uses async/await to wait for an asynchronous API: https://github.com/GoogleChromeLabs/wasi-fs-access/blob/4c2d29fdfe79abb9b48bd44e296c2019f55d0eec/src/bindings.ts#L449-L461
You should be able to adapt the same approach, the Asyncify tooling, and potentially even the same code to your example using setTimeout or input events.
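As a rough sketch of the JavaScript side only: the names InputQueue, fillIovs and fdReadAsync below are hypothetical and not taken from the linked repository. The idea is that the input field's event handler pushes lines into a queue, and an async fd_read awaits the next line. This only works as a WASI import if the module is built and instantiated with Asyncify, which is not shown here.

```javascript
// Hypothetical helper: queues lines typed by the user and hands them out as Promises.
class InputQueue {
  constructor() {
    this.lines = [];   // lines typed but not yet consumed
    this.waiters = []; // resolve callbacks of pending read() calls
  }
  // Call this from an event handler, e.g. the input field's "change" event.
  push(line) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(line);
    else this.lines.push(line);
  }
  // Resolves with the next line, waiting if none is available yet.
  read() {
    if (this.lines.length > 0) return Promise.resolve(this.lines.shift());
    return new Promise(resolve => this.waiters.push(resolve));
  }
}

// Copies bytes into the WASI iovec buffers and returns the number of bytes written.
// Bytes that do not fit into the buffers are dropped in this sketch.
function fillIovs(memoryBuffer, iovs, iovsLen, bytes) {
  const view = new DataView(memoryBuffer);
  let written = 0;
  for (let i = 0; i < iovsLen && written < bytes.length; i++) {
    const ptr = iovs + i * 8;
    const buf = view.getUint32(ptr, true);
    const bufLen = view.getUint32(ptr + 4, true);
    const chunk = bytes.subarray(written, written + bufLen);
    new Uint8Array(memoryBuffer, buf, bufLen).set(chunk);
    written += chunk.length;
  }
  return written;
}

// Async fd_read body: waits for one line, then reports the byte count via nread.
async function fdReadAsync(queue, memoryBuffer, iovs, iovsLen, nread) {
  const line = await queue.read();
  const bytes = new TextEncoder().encode(line);
  const written = fillIovs(memoryBuffer, iovs, iovsLen, bytes);
  new DataView(memoryBuffer).setUint32(nread, written, true);
  return 0; // WASI success errno
}
```

Note that nread is set to the number of encoded bytes actually copied, not to the string length, since the two differ for non-ASCII input.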
In the non-blocking event loop of JavaScript, is it safe to read and then alter a variable? What happens if two processes want to change a variable nearly at the same time?
Example A:
Process 1: Get variable A (it is 100)
Process 2: Get variable A (it is 100)
Process 1: Add 1 (it is 101)
Process 2: Add 1 (it is 101)
Result: Variable A is 101 instead of 102
Here is a simplified example with an Express route. Let's say the route gets called 1000 times per second:
let counter = 0;
const getCounter = () => {
return counter;
};
const setCounter = (newValue) => {
counter = newValue;
};
app.get('/counter', (req, res) => {
const currentValue = getCounter();
const newValue = currentValue + 1;
setCounter(newValue);
});
Example B:
What if we do something more complex like Array.findIndex() and then Array.splice()? Could it be that the found index has become outdated because another event-process already altered the array?
Process A findIndex (it is 12000)
Process B findIndex (it is 34000)
Process A splice index 12000
Process B splice index 34000
Result: Process B removed the wrong index, should have removed 33999 instead
const veryLargeArray = [
// ...
];
app.get('/remove', (req, res) => {
const id = req.query.id;
const i = veryLargeArray.findIndex(val => val.id === id);
veryLargeArray.splice(i, 1);
});
Example C:
What if we add an async operation into Example B?
const veryLargeArray = [
// ...
];
app.get('/remove', (req, res) => {
const id = req.query.id;
const i = veryLargeArray.findIndex(val => val.id === id);
someAsyncFunction().then(() => {
veryLargeArray.splice(i, 1);
});
});
It was hard to find the right words to describe this question. Please feel free to update the title.
As per @ThisIsNoZaku's link, JavaScript has a 'Run To Completion' principle:
Each message is processed completely before any other message is processed.
This offers some nice properties when reasoning about your program, including the fact that whenever a function runs, it cannot be pre-empted and will run entirely before any other code runs (and can modify data the function manipulates). This differs from C, for instance, where if a function runs in a thread, it may be stopped at any point by the runtime system to run some other code in another thread.
A downside of this model is that if a message takes too long to complete, the web application is unable to process user interactions like click or scroll. The browser mitigates this with the "a script is taking too long to run" dialog. A good practice to follow is to make message processing short and if possible cut down one message into several messages.
Further reading: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop
So, for:
Example A: This works perfectly fine as a sitecounter.
Example B: This works perfectly fine as well, but if many requests happen at the same time then the last request submitted will be waiting quite some time.
Example C: If another call to /remove arrives before someAsyncFunction finishes, then the index found earlier may be stale and the wrong element may be removed. The way to resolve this is to move the index lookup into the .then callback of the async function.
IMO, at the cost of latency, this solves a lot of potentially painful concurrency problems. If you must optimise the speed of your requests, then my advice would be to look into different architectures (additional caching, etc).
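A minimal sketch of the fix for Example C, assuming a hypothetical helper removeById that stands in for the route handler. The lookup and the splice happen in the same synchronous run after the async step, so no other handler can modify the array in between.

```javascript
// Hypothetical stand-in for the /remove handler from Example C.
async function removeById(items, id, someAsyncFunction) {
  await someAsyncFunction();
  // Look up the index only now, in the same synchronous run as the splice.
  const i = items.findIndex(val => val.id === id);
  // Guard against "not found": splice(-1, 1) would remove the last element.
  if (i !== -1) items.splice(i, 1);
  return i !== -1;
}
```

The -1 guard also applies to Example B: if the id is not found there, findIndex returns -1 and splice(-1, 1) silently removes the last element of the array.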
I have a set of API endpoints in Express. One of them receives a request and starts a long running process that blocks other incoming Express requests.
My goal is to make this process non-blocking. To better understand the inner logic of the Node event loop and how to do this properly, I want to replace this long-running function with a dummy long-running blocking function that starts when I send a request to its endpoint.
I suppose that different ways of making the dummy function blocking could cause Node to manage these blockings differently.
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
You can use node-webworker-threads.
var Worker, i$, x$, spin;
Worker = require('webworker-threads').Worker;
for (i$ = 0; i$ < 5; ++i$) {
x$ = new Worker(fn$);
x$.onmessage = fn1$;
x$.postMessage(Math.ceil(Math.random() * 30));
}
(spin = function(){
return setImmediate(spin);
})();
function fn$(){
var fibo;
fibo = function(n){
if (n > 1) {
return fibo(n - 1) + fibo(n - 2);
} else {
return 1;
}
};
return this.onmessage = function(arg$){
var data;
data = arg$.data;
return postMessage(fibo(data));
};
}
function fn1$(arg$){
var data;
data = arg$.data;
console.log("[" + this.thread.id + "] " + data);
return this.postMessage(Math.ceil(Math.random() * 30));
}
https://github.com/audreyt/node-webworker-threads
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
function block() {
// not sure why you need that though
while(true);
}
I suppose that different ways of making the dummy function blocking could cause Node to manage these blockings differently.
Not really. I can't think of a "special way" to block the engine differently.
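For experiments, a bounded busy-wait is usually more practical than an infinite loop, because the server recovers afterwards. This is a small sketch; blockFor is a hypothetical name.

```javascript
// Blocks the event loop for roughly `ms` milliseconds by spinning on the clock.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: no other callback can run while this loop is spinning
  }
}
```

Calling blockFor(5000) inside one route handler should delay every other request by up to five seconds, which makes the blocking easy to observe.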
My goal is to make this process non-blocking.
If it is really that long running you should really offload it to another thread.
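One way to sketch this without a third-party package is Node's built-in worker_threads module (my substitution, not something mentioned above). The helper name runInWorker and the inline fibo workload are assumptions for illustration.

```javascript
const { Worker } = require('worker_threads');

// Runs a CPU-heavy computation in a separate thread and resolves with its result.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    // Worker source passed as a string; { eval: true } tells Node to run it as code.
    const source = `
      const { parentPort, workerData } = require('worker_threads');
      const fibo = n => (n > 1 ? fibo(n - 1) + fibo(n - 2) : 1);
      parentPort.postMessage(fibo(workerData));
    `;
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

The main thread stays free to serve other requests while the worker computes; a route handler can await runInWorker(n) and send the response when it resolves.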
There are shortcut ways to do a quick fix if it's a one-time thing: you can use an npm module that does the job.
But the right way to do it is to set up a common design pattern called 'Work Queues'. You will need a queuing mechanism, such as RabbitMQ, ZeroMQ, etc. How it works: whenever you get a computation-heavy task, instead of doing it in the same thread, you send it to the queue with the relevant id values. Then a separate Node process, commonly called a 'worker' process, listens for new tasks on the queue and processes them as they arrive. This is the work queue pattern and you can read up on it here:
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html
I would strongly advise you to learn this pattern as you would come across many tasks that would require this kind of mechanism. Also with this in place you can scale both your node servers and your workers independently.
I am not sure what exactly your 'long processing' is, but in general you can approach this kind of problem in two different ways.
Option 1:
Use the webworker-threads module as @serkan pointed out. The usual 'thread' limitations apply in this scenario. You will need to communicate with the Worker via messages.
This method should be preferable only when the logic is too complicated to be broken down into smaller independent problems (explained in option 2). Depending on complexity you should also consider if native code would better serve the purpose.
Option 2:
Break down the problem into smaller problems. Solve a part of the problem, schedule the next part to be executed later, and yield to let NodeJS process other events.
For example, consider calculating the factorial of a number.
Sync way:
function factorial(inputNum) {
let result = 1;
while(inputNum) {
result = result * inputNum;
inputNum--;
}
return result;
}
Async way:
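function factorial(inputNum) {
  return new Promise(resolve => {
    let result = 1;
    const calcFactOneLevel = () => {
      // stop at 0 or 1 so that factorial(0) resolves to 1 instead of looping forever
      if (inputNum <= 1) {
        return resolve(result);
      }
      result = result * inputNum;
      inputNum--;
      // setImmediate lets pending I/O run between steps;
      // process.nextTick callbacks run before I/O and would starve the event loop
      setImmediate(calcFactOneLevel);
    };
    calcFactOneLevel();
  });
}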
The code in the second example does not block the Node process. You can send the response when the returned promise resolves.
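Scheduling one callback per multiplication adds a lot of overhead, so in practice it can help to do a batch of work per turn of the event loop. A rough sketch of that idea, using a hypothetical sumChunked function and an arbitrary default batch size:

```javascript
// Sums 1..n in slices of `chunkSize` iterations, yielding between slices.
function sumChunked(n, chunkSize = 10000) {
  return new Promise(resolve => {
    let total = 0;
    let i = 1;
    const step = () => {
      const end = Math.min(i + chunkSize - 1, n);
      for (; i <= end; i++) total += i;
      if (i <= n) {
        setImmediate(step); // yield so other callbacks can run before the next slice
      } else {
        resolve(total);
      }
    };
    step();
  });
}
```

The batch size is a trade-off: larger batches finish sooner overall, smaller batches keep the server more responsive while the computation is running.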
Recently I have been trying to use the Web workers interface to experiment with threads in JavaScript.
I am trying to implement a parallel contains with web workers, following these steps:
Split the initial array to pieces of equal size
Create a web worker for each piece that runs .contains on that piece
When and if the value is found in any of the pieces, it returns true without waiting for all workers to finish.
Here is what I tried:
var MAX_VALUE = 100000000;
var integerArray = Array.from({length: 40000000}, () => Math.floor(Math.random() * MAX_VALUE));
var t0 = performance.now();
console.log(integerArray.includes(1));
var t1 = performance.now();
console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");
var promises = [];
var chunks = [];
while(integerArray.length) {
chunks.push(integerArray.splice(0,10000000));
}
t0 = performance.now();
chunks.forEach(function(element) {
promises.push(createWorker(element));
});
function createWorker(arrayChunk) {
return new Promise(function(resolve) {
var v = new Worker(getScriptPath(function(){
self.addEventListener('message', function(e) {
var value = e.data.includes(1);
self.postMessage(value);
}, false);
}));
v.postMessage(arrayChunk);
v.onmessage = function(event){
resolve(event.data);
};
});
}
firstTrue(promises).then(function(data) {
// `data` has the results, compute the final solution
var t1 = performance.now();
console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");
});
function firstTrue(promises) {
const newPromises = promises.map(p => new Promise(
(resolve, reject) => p.then(v => v && resolve(true), reject)
));
newPromises.push(Promise.all(promises).then(() => false));
return Promise.race(newPromises);
}
// As a worker normally takes another JavaScript file to execute, we convert the function into a URL: http://stackoverflow.com/a/16799132/2576706
function getScriptPath(foo){ return window.URL.createObjectURL(new Blob([foo.toString().match(/^\s*function\s*\(\s*\)\s*\{(([\s\S](?!\}$))*[\s\S])/)[1]],{type:'text/javascript'})); }
On every browser and CPU I tried, it is extremely slow compared to a simple contains on the initial array.
Why is this so slow?
What is wrong with the code above?
References
Waiting for several workers to finish
Wait for the first true returned by promises
Edit: The issue is not about .contains() specifically; it could be other array functions, e.g. .indexOf(), .map(), forEach() etc. Why does splitting the work between web workers take so much longer?
This is a bit of a contrived example so it's hard to help optimize for what you're trying to do specifically but one easily-overlooked and fix-able slow path is copying data to the web-worker. If possible you can use ArrayBuffers and SharedArrayBuffers to transfer data to and from web workers quickly.
You can use the second argument to the postMessage function to transfer ownership of an ArrayBuffer to the web worker. It's important to note that the buffer will no longer be usable by the main thread until it is transferred back by the web worker. SharedArrayBuffers do not have this limitation and can be read by many workers at once, but aren't necessarily supported in all browsers due to a security concern (see MDN for more details).
For example
const arr = new Float64Array(new ArrayBuffer(40000000 * 8));
console.time('posting');
ww.postMessage(arr, [ arr.buffer ]);
console.timeEnd('posting');
takes ~0.1ms to run while
const arr = new Array(40000000).fill(0);
console.time('posting');
ww.postMessage(arr); // a plain Array is not transferable, so it is copied
console.timeEnd('posting');
takes ~10000ms to run. This is JUST to transfer the data in the message, not to run the worker logic itself.
You can read more on the postMessage transferList argument here and transferable types here. It's important to note that the way your example is doing a timing comparison includes the web worker creation time, as well, but hopefully this gives a better idea for where a lot of that time is going and how it can be better worked around.
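A quick way to see the difference between transferring and copying, without timing anything, is to check the buffer after posting. This sketch uses a MessageChannel as a stand-in for the worker (my assumption, so that it runs without a worker script); postMessage on a Worker accepts the same transfer list.

```javascript
const { port1, port2 } = new MessageChannel();

// Transfer: ownership of the underlying ArrayBuffer moves to the receiver.
const transferred = new Float64Array(1000);
port1.postMessage(transferred, [transferred.buffer]);
const lengthAfterTransfer = transferred.byteLength; // 0, the buffer is detached

// Copy: the data is structured-cloned and the sender keeps its own buffer.
const copied = new Float64Array(1000);
port1.postMessage(copied);
const lengthAfterCopy = copied.byteLength; // 8000, still usable by the sender

port1.close();
port2.close();
```

The detached buffer is the price of the fast path: the sender has to get the buffer transferred back before it can read the data again.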
You're doing a lot more work between t0 and t1 compared to a simple contains. These extra steps include:
converting function -> string -> regex -> blob -> object URL
calling new worker -> parses object URL -> JS engine interprets code
sending the web worker data -> serialized on main thread -> deserialized in worker (likely an in-memory struct that's copied, so not super slow)
You're better off creating the thread first, then continuously handing it data. It may not be faster but it won't lock up your UI.
Also, if you're repeatedly searching through the array may I suggest converting it into a map where the key is the array value and the value is the index.
e.g.
array ['apple', 'coconut', 'kiwi'] would be converted to { apple: 0, coconut: 1, kiwi: 2 }
searching through the map would take amortized constant time (fast), whereas searching the array is a linear scan (very slow for large sets).
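A minimal sketch of that conversion using a Map; buildIndex is a hypothetical helper name.

```javascript
// Builds a value -> index lookup so membership checks no longer scan the array.
function buildIndex(values) {
  const index = new Map();
  values.forEach((value, i) => {
    // keep the first occurrence, matching what indexOf would return
    if (!index.has(value)) index.set(value, i);
  });
  return index;
}

const fruits = ['apple', 'coconut', 'kiwi'];
const fruitIndex = buildIndex(fruits);
const hasKiwi = fruitIndex.has('kiwi');         // true
const coconutAt = fruitIndex.get('coconut');    // 1
const hasMango = fruitIndex.has('mango');       // false
```

Building the index is itself a linear pass, so this only pays off when the same array is searched repeatedly.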
I'm trying to use Emscripten to write software that runs in the browser but also on other architectures (e.g. Android, PC standalone app).
The Software structure is something like this:
main_program_loop() {
if (gui.button_clicked()) {
run_async(some_complex_action, gui.text_field.to_string())
}
if (some_complex_action_has_finished())
{
make_use_of(get_result_from_complex_action());
}
}
some_complex_action(string_argument)
{
some_object = read_local(string_argument);
interm_res = simple_computation(some_object);
other_object = expensive_computation(interm_res);
send_remote(some_object.member_var, other_object);
return other_object.member_var;
}
Let's call main_program_loop the GUI or frontend, some_complex_action the intermediate layer, and read_local, send_remote and expensive_computation the backend or lower layer.
Now the frontend and backend would be architecture specific (e.g. for JavaScript read_local could use IndexedDB, send_remote could use fetch),
but the intermediate layer should make up more than 50% of the code (that's why I do not want to write it twice in two different languages, and instead write it once in C and transpile it to JavaScript; for Android I would use JNI).
Problems come in because in JavaScript the functions on the lowest layer (fetch etc.) run asynchronously (they return a promise or require a callback).
One approach I tried was to use promises and send IDs through the intermediate layer
var promises = {};
var last_id = 0;
handle_click() {
var id = Module.ccall('some_complex_action', 'number', ['string'], [text_field.value]);
promises[id].then((result) => make_use_of(result));
}
recv_remote: function(str) {
promises[last_id] = fetch(get_url(str)).then((response) => response.arrayBuffer());
last_id += 1;
return last_id - 1;
}
It works for the simple case of
some_complex_action(char *str)
{
return recv_remote(str);
}
But for real cases it seems to get really complicated, maybe impossible. (I tried an approach where I gave every function a state, and every time a backend function finishes, the function is called again and advances its state, but the code quickly became very complicated.) To compare, if I were to call some_complex_action from C or Java, I'd just call it in a thread separate from the GUI thread, and inside the thread everything would happen synchronously.
I wish I could just call some_complex_action from an async function and put await inside recv_remote, but of course I can put await only directly in the async function, not in some function called further down the line. So that idea did not work out either.
Ideally if somehow I could stop execution of the intermediate Emscripten transpiled code until the backend function has completed, then return from the backend function with the result and continue executing the transpiled code.
Has anyone used Emterpreter and can imagine that it could help me get to my goal?
Any ideas what I could do?
In my Node server I have a variable:
var clicks = 0;
Each time a user clicks in the web app, a websocket event sends a message. On the server:
clicks++;
if (clicks % 10 == 0) {
saveClicks();
}
function saveClicks() {
var placementData = JSON.stringify({'clicks' : clicks});
fs.writeFile( __dirname + '/clicks.json', placementData, function(err) {
});
}
At what rate do I have to start worrying about overwrites? How would I calculate this math?
(I'm looking at creating a MongoDB json object for each click but I'm curious what a native solution can offer).
From the node.js doc for fs.writeFile():
Note that it is unsafe to use fs.writeFile() multiple times on the
same file without waiting for the callback. For this scenario,
fs.createWriteStream() is strongly recommended.
This isn't a math problem to figure out when this might cause a problem - it's just bad code that gives you the chance of a conflict in circumstances that cannot be predicted. The node.js doc clearly states that this can cause a conflict.
To make sure you don't have a conflict, write the code in a different way so a conflict cannot happen.
If you want to make sure that all writes happen in the order of the incoming requests, so that the last request to arrive is always the one that ends up in the file, then you may need to queue your data as it arrives (so order is preserved), write to the file in a way that opens it for exclusive access so no other request can write while a prior request is still writing, and handle contention errors appropriately.
This is an issue that databases mostly do for you automatically so it may be one reason to use a database.
Assuming you weren't using clustering and thus do not have multiple processes trying to write to this file and that you just want to make sure the last value sent is the one written to the file by this process, you could do something like this:
var saveClicks = (function() {
var isWriting = false;
var lastData;
return function() {
// always save most recent data here
lastData = JSON.stringify({'clicks' : clicks});
if (!isWriting) {
writeData(lastData);
}
function writeData(data) {
isWriting = true;
lastData = null;
fs.writeFile(__dirname + '/clicks.json', data, function(err) {
isWriting = false;
if (err) {
// decide what to do if an error occurs
}
// if more data arrived while we were writing this, then write it now
if (lastData) {
writeData(lastData);
}
});
}
}
})();
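If every write should happen, in order, rather than only the latest one, a promise chain is another option. This is a sketch under the same single-process assumption; createSerializedWriter is a hypothetical helper name, and it relies on fs.promises.writeFile.

```javascript
const fs = require('fs');

// Returns a write function that never starts a write before the previous one has finished.
function createSerializedWriter(filePath) {
  let chain = Promise.resolve();
  return function write(data) {
    // Start this write only after all earlier writes have settled.
    const result = chain.then(() => fs.promises.writeFile(filePath, data));
    // Keep the queue usable after a failed write; the caller still sees the error via `result`.
    chain = result.catch(() => {});
    return result;
  };
}
```

A caller would create one writer per file and call it from saveClicks, handling a rejected promise wherever the error should be reported. Unlike the version above, this does not skip intermediate values, so it does more disk writes under load.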
@jfriend00 is definitely right about createWriteStream and already made a point about databases, so pretty much everything has been said, but I would like to emphasize the point about databases because the file-saving approach seems weird to me.
So, use databases.
Not only would this save you the headache of tracking such things, it would also speed things up significantly (remember that in Node all those file reads and writes are handled from a single thread, so if one of them takes ages, it can affect overall performance).
Redis is a perfect solution for storing key-value data, so you can store data like clicks per user in a Redis database, which you'll have to run alongside anyway once you get enough traffic :)
If you're not convinced yet, take a look at this simple benchmark:
Redis:
var async = require('async');
var redis = require("redis"),
client = redis.createClient();
console.time("To Redis");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => client.set("./test", 777, cb), () => {
console.timeEnd("To Redis");
});
To Redis: 5410.383ms
fs:
var async = require('async');
var fs = require('fs');
console.time("To file");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => fs.writeFile("./test", 777, cb), () => {
console.timeEnd("To file");
});
To file: 20344.749ms
And, by the way, you can significantly increase the number of clicks between saves (currently 10) by also running this "click-saver" when the socket disconnects: socket.on('disconnect', ....