How to know if a function definition has been changed in NodeJS?

I'm trying to write a caching system for some time-intensive functions (Network requests/computation heavy) and I need to generate a fingerprint from the functions in order to invalidate the cached results once a developer changes the functions.
I have tried the following approach to generate the fingerprint:
const crypto = require('crypto')

const generateFingerprintOfFunc = (inputFunc) => {
  const cypher = crypto.createHash('sha256');
  cypher.update(inputFunc.toString());
  return cypher.digest('hex');
}
However, the problem with this approach is that the fingerprint won't be changed once the developer changes any of the functions that are called inside the function that is being fingerprinted because that function's definition hasn't really changed.
const foo = () => {
  return bar() + 1;
}

const bar = () => {
  return 1;
}
const fingerprintOfFoo = generateFingerprintOfFunc(foo); // 489290d22f653965a59e2e5fbb7b626535babd660f7f49501fc88c3e7fbc0176
Now I will change the bar function:
const foo = () => {
  return bar() + 1;
}

const bar = () => {
  return 10;
}
const fingerprintOfFoo = generateFingerprintOfFunc(foo); // 489290d22f653965a59e2e5fbb7b626535babd660f7f49501fc88c3e7fbc0176
As you can see the fingerprint has not changed while the return value of the function has.
Why do I need to do this?
I'm trying to generate dynamic and automatic mocks for my expensive functions during development and testing. See this SO Question

I want to know if there is a way for V8 to give me the paths of the files making up a certain function and all of its internals.
No.
It can't. JavaScript is too dynamic.
Consider this example:
let helper;
const foo = () => helper() + 1;
const bar = () => 1;
const baz = () => 2;
helper = bar;
So far, this does the same as your example. Now suppose someone either changed the last line to read helper = baz, or added a new line that does so. In that case, no function's definition has changed, but foo's behavior has! In fact, one can take that same idea to construct an even simpler case:
let global = 42;
const foo = () => global;
If someone changes global (whether statically in the source, or dynamically), foo's return value will change, but no function definition will. In case of dynamic assignment, not even anything in the source will change. And of course such an assignment could depend on arbitrary conditions/circumstances, such as user interaction, time of day, Math.random(), whatever.
In general, the only way (for anyone, including the engine) to figure out what a function will return is to execute it.
Memoization works if developers carefully choose functions that lend themselves to memoization.
An automated system that takes any arbitrary function (without imposing any limitations on what the function does) and memoizes it, or determines whether it will return the same value as last time, without actually executing it, is impossible to create in JavaScript.

What do you mean when you say "changes the function"? As in, you have jest in watch mode and want to only pick up changes when they are fundamental changes? I think the real question is why this is necessary to begin with.
How are you generating dynamic mocks? Does this imply that you are repeatedly creating these mocks in real-time with network requests? Then they aren't really mocks. You are just making real requests and then deciding to temporarily store them somewhere. You could just skip that data-saving step and your resulting testing process would be identical. That's a glorified integration test.
I guess others can disagree, but your unit testing philosophy seems a little flawed. Dynamic mocks imply that you have no control over what situation ends up getting tested. If you are not in full control of your mocks, couldn't you end up testing the exact same case (or edge case) repeatedly? That's a waste of resources and can lead to flawed tests. Covering all of your intended cases would happen by coincidence, as opposed to explicit intent. Your unit tests should be deterministic. Not stochastic.
It seems like you need to solve your testing methodology, as opposed to accepting that your tests are slow and developing strategies on how to work around it.

Related

WASI-libc wait for user input when writing to stdin

I would like to implement fd_read from the WASI API by waiting for the user to type some text in an HTML input field, and then continuing with the rest of the WASI calls. I.e., with something like:
fd_read = (fd, iovs, iovsLen, nread) => {
  // only care about 'stdin'
  if (fd !== STDIN)
    return WASI_ERRNO_BADF;

  const encoder = new TextEncoder();
  const view = new DataView(memory.buffer);
  view.setUint32(nread, 0, true);

  // create a Uint8Array for each buffer
  const buffers = Array.from({ length: iovsLen }, (_, i) => {
    const ptr = iovs + i * 8;
    const buf = view.getUint32(ptr, true);
    const bufLen = view.getUint32(ptr + 4, true);
    return new Uint8Array(memory.buffer, buf, bufLen);
  });

  // get input for each buffer
  buffers.forEach(buf => {
    const input = waitForUserInput();
    buf.set(encoder.encode(input));
    view.setUint32(nread, view.getUint32(nread, true) + input.length, true);
  });

  return WASI_ESUCCESS;
}
The implementation works if the variable input is provided. For example, setting const input = "1\n" passes that string to a scanf call in my C program, and it reads in a value of 1.
However, I'm struggling to "stop" the JavaScript execution while waiting for the input to be provided. I understand that JavaScript is event-driven and can't be "paused" in the traditional sense, but trying to provide the input as a callback/Promise has the problem of the function still executing, causing nothing to get passed to stdin:
buffers.forEach(buf => {
  let input;
  waitForUserInput().then(value => {
    input = value;
  });
  // these lines run before the promise resolves, so input is still undefined:
  buf.set(encoder.encode(input));
  view.setUint32(nread, view.getUint32(nread, true) + input.length, true);
});
Since input is still waiting to be set, nothing gets encoded in the buffer and stdin just reads a 0.
Is there a way to wait for the input with async/await, or maybe a "hack-y" solution with setTimeout? I know that window.Prompt() would stop the execution, but I want the input to be a part of the page. Looking for vanilla JavaScript solutions.
You want to connect asynchronous JavaScript APIs to synchronous WebAssembly APIs. This is a common problem for which WebAssembly itself doesn't yet have a built-in solution, but there are some at the tooling level. In particular, you might want to take a look at Asyncify - I've written a detailed post on how it helps solve those use-cases and how to use it here: https://web.dev/asyncify/
Particularly for WASI, the post also showcases a demo that connects fd_read and other synchronous operations to async APIs from File System Access. You can find live demo at https://wasi.rreverser.com/ and its code at https://github.com/GoogleChromeLabs/wasi-fs-access.
For example, here is an implementation of the fd_read function you're interested in, that uses async-await to wait for asynchronous API: https://github.com/GoogleChromeLabs/wasi-fs-access/blob/4c2d29fdfe79abb9b48bd44e296c2019f55d0eec/src/bindings.ts#L449-L461
You should be able to adapt the same approach, the Asyncify tooling, and potentially even the same code to your example using setTimeout or input events.
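As a minimal sketch of the JavaScript side of that pattern, waitForUserInput can be written as an externally resolved promise: an input-field event handler (provideInput below is an assumed name, wired to whatever page element you use) resolves whichever call is currently waiting, and an Asyncify-wrapped fd_read can then simply await it.

```javascript
// Holds the resolver of the currently pending waitForUserInput() call.
let pendingResolve = null;

// Called from the (Asyncify-wrapped) fd_read implementation.
const waitForUserInput = () =>
  new Promise((resolve) => { pendingResolve = resolve; });

// Called from the page's input handler (assumed name) with the typed text.
const provideInput = (text) => {
  if (pendingResolve) {
    pendingResolve(text);
    pendingResolve = null;
  }
};
```

This only covers the promise plumbing; the Asyncify tooling is still needed to suspend the WebAssembly side across the await.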

Is there any way for a Javascript object to know when it is going out of scope?

In languages such as C++, being able to detect when an object goes out of scope is extremely useful in a wide range of use-cases (e.g., smart pointers, file access, mutexes, profiling). And I'm not talking about memory management vs. garbage collection here, as that is a different topic.
Consider a simple C++ example like this
class Profiler {
    uint64_t m_startTime;
public:
    Profiler() :
        m_startTime(someTimeFunction()) {
    }
    ~Profiler() {
        const uint64_t diff = someTimeFunction() - m_startTime;
        printf("%llu ms have passed\n", diff);
    }
};
We could do something like this
{
    Profiler p; // creation of a Profiler object on the stack
    // Do some calculations
} // <- p goes out of scope and you will get the output message reporting the number of ms that have passed
This clean example demonstrates the power of scoping. As a programmer I don't need to worry about manually calling methods; the rule is simple: the destructor is called once the object goes out of scope and I can make use of that.
In JavaScript, however, there is no way that I'm aware of to mimic this behavior. In the old days, when let and const were not part of the language, this would have been less useful and even dangerous, as one would never really know when a var went out of scope.
But when block scoping was added, I expected the programmer to get a bit of control over when an object goes out of scope. For instance, a special method (like constructor()) that is called when it goes out of scope. But AFAIK this hasn't been done. Is there any reason why this wasn't added?
Now we have to manually call a function, which defeats the whole purpose of block scoping for said use-case.
This would be the Javascript equivalent of the above C++ code:
class Profiler {
  constructor() {
    this.m_time = new Date().getTime();
  }
  report() {
    const diff = new Date().getTime() - this.m_time;
    console.log(`${diff} ms have passed`);
  }
}
And the use-case would be this
{
  const p = new Profiler;
  // Do some calculations
  p.report();
}
Which obviously is a lot less ideal, because if I'm not careful to place that p.report() at the end of the block, the reporting is incorrect.
If there is a way to still do this, please let me know.
[EDIT]
The closest thing I came up with is this 'hackery'. I used async, but obviously that can be left out if all code in the block is synchronous.
// My profiler class
class Profiler {
  constructor() {
    this.m_time = new Date().getTime();
  }
  // unscope() is called when the instance 'goes out of scope'
  unscope() {
    const diff = new Date().getTime() - this.m_time;
    console.log(`took ${diff} ms`);
  }
}

async function Scope(p_instance, p_func) {
  await p_func();
  p_instance.unscope();
}
await Scope(new Profiler(), async () => {
  console.log("start scope");
  await sleep(100);
  await Scope(new Profiler(), async () => {
    await sleep(400);
  });
  await sleep(3000);
  console.log("end scope");
});
console.log("after scope");
which resulted in
start scope
took 401 ms
end scope
took 3504 ms
after scope
Which is what I expected. But again
await Scope(new Profiler(), async () => {
});
is far less intuitive than simply
{
  const p = new Profiler();
}
But at least I'm able to do what I want. If anyone else has a better solution, please let me know.
A destructor would be one way to achieve it, but sadly JavaScript doesn't support destructors. The reason you automatically get the profile report after the object goes out of scope in C++ is that its destructor is called. As of now there is no destructor in JavaScript, so doing it manually, as you showed, is the easiest way to achieve the C++ behavior you want.
Different languages bring different syntax and capabilities. Sometimes you need to adapt to the differences. The C++ code is less error-prone in your case, but I do not think there is an easy way to achieve the wanted behavior.
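A somewhat simpler alternative to the Scope() hackery above (a sketch, not a true destructor) is a try/finally wrapper: the cleanup method is guaranteed to run when the block exits, even if the body throws, which is as close as plain JavaScript gets to RAII today. The scoped name and the unscope() convention are assumptions carried over from the question's own code.

```javascript
// Run body(resource), then always call resource.unscope() on the way out,
// whether the body returned normally or threw.
const scoped = (resource, body) => {
  try {
    return body(resource);
  } finally {
    resource.unscope();
  }
};

class Profiler {
  constructor() { this.m_time = Date.now(); }
  unscope() { console.log(`took ${Date.now() - this.m_time} ms`); }
}

scoped(new Profiler(), (p) => {
  // Do some calculations
});
```

For what it's worth, the TC39 "Explicit Resource Management" proposal standardizes this idea with using declarations and Symbol.dispose, but it is not something you can rely on everywhere yet.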

How to run an infinite blocking process in NodeJS?

I have a set of API endpoints in Express. One of them receives a request and starts a long running process that blocks other incoming Express requests.
My goal is to make this process non-blocking. To better understand the inner logic of the Node event loop and how I can do this properly, I want to replace this long-running function with a dummy long-running blocking function that starts when I send a request to its endpoint.
I suppose that different ways of making the dummy function blocking could cause Node to manage these blockings differently.
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
You can use node-webworker-threads.
var Worker, i$, x$, spin;
Worker = require('webworker-threads').Worker;

for (i$ = 0; i$ < 5; ++i$) {
  x$ = new Worker(fn$);
  x$.onmessage = fn1$;
  x$.postMessage(Math.ceil(Math.random() * 30));
}

(spin = function(){
  return setImmediate(spin);
})();

function fn$(){
  var fibo;
  fibo = function(n){
    if (n > 1) {
      return fibo(n - 1) + fibo(n - 2);
    } else {
      return 1;
    }
  };
  return this.onmessage = function(arg$){
    var data;
    data = arg$.data;
    return postMessage(fibo(data));
  };
}

function fn1$(arg$){
  var data;
  data = arg$.data;
  console.log("[" + this.thread.id + "] " + data);
  return this.postMessage(Math.ceil(Math.random() * 30));
}
https://github.com/audreyt/node-webworker-threads
So, my question is - how can I make a basic blocking process as a function that would run infinitely?
function block() {
  // not sure why you need that though
  while (true);
}
I suppose, that different ways of making the dummy function blocking could cause Node manage these blockings differently.
Not really. I can't think of a "special way" to block the engine differently.
My goal to make this process non-blocking.
If it is really that long-running, you should offload it to another thread.
There are shortcut ways to do a quick fix if it's a one-time thing; you can use an npm module that does the job.
But the right way to do it is to set up a common design pattern called 'work queues'. You will need to set up a queuing mechanism, like RabbitMQ, ZeroMQ, etc. How it works is: whenever you get a computation-heavy task, instead of doing it in the same thread, you send it to the queue with the relevant id values. Then a separate Node process, commonly called a 'worker' process, listens for new actions on the queue and processes them as they arrive. This is the worker queue pattern and you can read up on it here:
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html
I would strongly advise you to learn this pattern as you would come across many tasks that would require this kind of mechanism. Also with this in place you can scale both your node servers and your workers independently.
I am not sure what exactly your 'long processing' is, but in general you can approach this kind of problem in two different ways.
Option 1:
Use the webworker-threads module as #serkan pointed out. The usual 'thread' limitations apply in this scenario. You will need to communicate with the Worker in messages.
This method should be preferable only when the logic is too complicated to be broken down into smaller independent problems (explained in option 2). Depending on complexity you should also consider if native code would better serve the purpose.
Option 2:
Break down the problem into smaller problems. Solve a part of the problem, schedule the next part to be executed later, and yield to let NodeJS process other events.
For example, consider the following example for calculating the factorial of a number.
Sync way:
function factorial(inputNum) {
  let result = 1;
  while (inputNum) {
    result = result * inputNum;
    inputNum--;
  }
  return result;
}
Async way:
function factorial(inputNum) {
  return new Promise(resolve => {
    let result = 1;
    const calcFactOneLevel = () => {
      result = result * inputNum;
      inputNum--;
      if (inputNum) {
        return process.nextTick(calcFactOneLevel);
      }
      resolve(result);
    };
    calcFactOneLevel();
  });
}
The code in the second example will not block the node process. You can send the response when the returned promise resolves.
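A variant of the same chunking idea (a sketch, not part of the original answer) uses setImmediate and processes a batch of iterations per turn: setImmediate yields to pending I/O between chunks, whereas a long chain of process.nextTick callbacks runs before I/O events and can starve them. BigInt is used here so large factorials don't lose precision.

```javascript
// Compute n! in chunks of up to 1000 multiplications per event-loop turn,
// yielding to I/O between chunks via setImmediate.
function factorialChunked(n) {
  return new Promise((resolve) => {
    let result = 1n;
    const step = () => {
      for (let i = 0; i < 1000 && n > 0; i++, n--) {
        result *= BigInt(n);
      }
      if (n > 0) {
        setImmediate(step);   // yield, then continue with the next chunk
      } else {
        resolve(result);
      }
    };
    step();
  });
}
```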

Emscripten sandwiched by asynchronous Javascript Code

I'm trying to use Emscripten to write software that runs in the browser but also on other architectures (e.g. Android, a PC standalone app).
The software's structure is something like this:
main_program_loop() {
  if (gui.button_clicked()) {
    run_async(some_complex_action, gui.text_field.to_string())
  }
  if (some_complex_action_has_finished()) {
    make_use_of(get_result_from_complex_action());
  }
}
some_complex_action(string_argument)
{
  some_object = read_local(string_argument);
  interm_res = simple_computation(some_object);
  other_object = expensive_computation(interm_res);
  send_remote(some_object.member_var, other_object);
  return other_object.member_var;
}
Let's call main_program_loop the GUI or frontend, some_complex_action the intermediate layer, and read_local, send_remote and expensive_computation the backend or lower layer.
Now the frontend and backend would be architecture-specific (e.g. in JavaScript read_local could use IndexedDB and send_remote could use fetch), but the intermediate layer should make up more than 50% of the code. That's why I do not want to write it twice in two different languages; instead I want to write it once in C and transpile it to JavaScript (for Android I would use JNI).
Problems come in because in JavaScript the functions on the lowest layer (fetch etc.) run asynchronously (they return a promise or require a callback).
One approach I tried was to use promises and send IDs through the intermediate layer
var promises = {};
var last_id = 0;

handle_click() {
  var id = Module.ccall('some_complex_action', 'number', ['string'], [text_field.value]);
  promises[id].then((result) => make_use_of(result));
}

recv_remote: function(str) {
  promises[last_id] = fetch(get_url(str)).then((response) => response.arrayBuffer());
  last_id += 1;
  return last_id - 1;
}
It works for the simple case of
some_complex_action(char *str)
{
    return recv_remote(str);
}
But for real cases it seems to be getting really complicated, maybe impossible. (I tried an approach where I'd give every function a state, and every time a backend function finishes, the function is called again and advances its state, but the code started getting complicated like hell.) To compare, if I were to call some_complex_action from C or Java, I'd just call it in a thread separate from the GUI thread, and inside that thread everything would happen synchronously.
I wish I could just call some_complex_action from an async function and put await inside recv_remote, but of course I can put await only directly in the async function, not in some function called further down the line. So that idea did not work out either.
Ideally, I could somehow stop execution of the intermediate Emscripten-transpiled code until the backend function has completed, then return from the backend function with the result and continue executing the transpiled code.
Has anyone used Emterpreter and can imagine that it could help me get to my goal?
Any ideas what I could do?

Node.js and Mutexes

I'm wondering if mutexes/locks are required for data access within Node.js. For example, let's say I've created a simple server. The server provides a couple of protocol methods to add to and remove from an internal array. Do I need to protect the internal array with some type of mutex?
I understand Javascript (and thus Node.js) is single threaded. I'm just not clear on how events are handled. Do events interrupt? If that is the case, my app could be in the middle of reading the array, get interrupted to run an event callback which changes the array, and then continue processing the array which has now been changed by the event callback.
Locks and mutexes are indeed necessary sometimes, even if Node.js is single-threaded.
Suppose you have two files that must have the same content and not having the same content is considered an inconsistent state. Now suppose you need to change them without blocking the server. If you do this:
fs.writeFile('file1', 'content', function (error) {
  if (error) {
    // ...
  } else {
    fs.writeFile('file2', 'content', function (error) {
      if (error) {
        // ...
      } else {
        // ready to continue
      }
    });
  }
});
you fall into an inconsistent state between the two calls, during which another function in the same script may be able to read the two files.
The rwlock module is perfect to handle these cases.
I'm wondering if mutexes/locks are required for data access within Node.js.
Nope! Events are handled the moment there's no other code to run, this means there will be no contention, as only the currently running code has access to that internal array. As a side-effect of node being single-threaded, long computations will block all other events until the computation is done.
I understand Javascript (and thus Node.js) is single threaded. I'm just not clear on how events are handled. Do events interrupt?
Nope, events are not interrupted. For example, if you put a while(true){} into your code, it would stop any other code from being executed, because there is always another iteration of the loop to be run.
If you have a long-running computation, it is a good idea to use process.nextTick, as this will allow it to run when nothing else is running (I'm fuzzy on this, but the example below suggests each callback runs uninterrupted).
If you have any other questions, feel free to stop into #node.js and ask questions. Also, I asked a couple people to look at this and make sure I'm not totally wrong ;)
var count = 0;
var numIterations = 100;

while (numIterations--) {
  process.nextTick(function() {
    count = count + 1;
  });
}

setTimeout(function() {
  console.log(count);
}, 2);

//=> 100
Thanks to AAA_awright of #node.js :)
I was looking for a solution for Node mutexes. Mutexes are sometimes necessary: you could be running multiple instances of your Node application and may want to ensure that only one of them is doing some particular thing. All the solutions I could find were either not cross-process or depended on Redis.
So I made my own solution using file locks: https://github.com/Perennials/mutex-node
Mutexes are definitely necessary for a lot of back end implementations. Consider a class where you need to maintain synchronicity of async execution by constructing a promise chain.
let _ = new WeakMap();

class Foobar {
  constructor() {
    _.set(this, { pc: Promise.resolve() });
  }
  doSomething(x) {
    return new Promise((resolve, reject) => {
      _.get(this).pc = _.get(this).pc.then(() => {
        const y = /* some value gotten asynchronously */ x;
        resolve(y);
      });
    });
  }
}
How can you be sure that a promise is not left dangling via a race condition? It's frustrating that Node hasn't made mutexes native, since JavaScript is so inherently asynchronous and bringing third-party modules into the process space is always a security risk.
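The promise-chain idea above can be isolated into a small reusable async mutex (a sketch, not a standard API): each lock() call runs its critical section only after the previous one has finished, so interleaved async operations cannot race on shared state within a single process.

```javascript
// An async mutex built on a promise chain. lock(fn) runs fn once all
// previously queued critical sections have completed.
class Mutex {
  constructor() {
    this._last = Promise.resolve();
  }
  lock(fn) {
    const run = this._last.then(() => fn());
    // Swallow rejections on the chain so one failing section
    // doesn't wedge every later lock() call.
    this._last = run.catch(() => {});
    return run;
  }
}
```

Note this only serializes work inside one Node process; coordinating across processes still needs file locks, Redis, or a similar external mechanism, as discussed above.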
