How to 'mark' an Object for garbage collection in NodeJS

I have a recursive function:
const fn = async (val) => {
let obj = await make_some_api_call();
// example response {a: 'apple', b : 'bbb', c: 'ccc', d: 'ddd'};
if ('a' in obj) {
const var1 = obj.a;
obj = null;
return fn(var1);
}
}
I want the Object obj to be gc'ed after each run.
I have an Object property value assigned to a local variable (var1). Will setting obj = null force it to be gc'ed in the next cycle?
Meaning after each run will Object obj get gc'ed?
If not, how to achieve this?

Since some commenters have missed the fact that this is potentially recursive code, I want to point out that this answer is written in that recursive context. If it wasn't recursive, then setting obj = null would not be necessary because the object held by obj would be eligible for garbage collection anyway as soon as the function returned.
will setting obj=null force it to be gc'ed in next cycle?
Assuming that no other code, such as code inside of make_some_api_call(), has any persistent references to obj, then setting obj = null will clear your one reference to that object and will make it "eligible" for garbage collection at the point in time where nodejs next runs a garbage collection cycle.
That may or may not be "after each run". You don't really describe what "after each run" means as it pertains to your code, but in any case, when the garbage collector runs is not precisely defined.
Nodejs will run a GC cycle when it thinks it needs to and when it appears to have time to. Garbage collection can sometimes be a bit lazy. If you're busy doing a lot of things, then nodejs will attempt to not get in the way of what your code is doing just to run the GC. It will try to wait until the event loop isn't really doing anything. There are exceptions to this, but to avoid impacting run-time performance, it looks for opportunities to run GC when the event loop is idle.
In your recursive code, you do have an await so assuming that takes some amount of time to resolve its promise, then that next await could be an opportunity for nodejs to run the GC cycle to clean up the obj from the prior recursive call.
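The difference between "eligible" and "collected" can be observed with WeakRef and FinalizationRegistry, which are standard JavaScript globals. This is only a sketch: the `track` helper, the `collected` array and the labels are made up for illustration, and nothing here forces a collection.

```javascript
// Hypothetical helper: observes an object without keeping it alive.
const collected = [];
const registry = new FinalizationRegistry((label) => {
  // Runs some time after the object has been collected, if it runs at all.
  collected.push(label);
});

function track(target, label) {
  registry.register(target, label);
  return new WeakRef(target);
}

let obj = { a: 'apple', b: 'bbb', c: 'ccc', d: 'ddd' };
const weak = track(obj, 'response #1');
const var1 = obj.a; // copying a string out does not keep obj alive
obj = null;         // drops the only strong reference: obj is now eligible

// deref() returns the object until the GC actually reclaims it and
// undefined afterwards. When that happens is up to the engine.
const stillThere = weak.deref() !== undefined;
```

Right after `obj = null` the object is still there. It only disappears once the engine decides to run a collection, which is the point made above.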
I should also point out that code like this can also be written with some sort of loop instead of using recursion and that sometimes simplifies things. For one, it prevents stack-build up. Since there is no complicated local function context or lots of function arguments, this could easily be turned into a while(more) kind of loop with the await and some sort of condition test inside the loop that either sets the more flag or uses break or return to stop the looping when done. If this may recurse many times, just avoiding the stack build-up (which also includes a promise for each async function called recursively) could be beneficial.
Here's some similar pseudo-code that avoids the recursion and automatically reuses the obj variable (dropping the reference to the prior object so it is available for GC):
const fn = async (val) => {
// value for the first one comes from the function argument,
// subsequent iterations get the value from the prior call
let var1 = val;
let obj;
while (true) {
obj = await make_some_api_call(var1);
if (!('a' in obj)) {
// all done
break;
}
// run again using obj.a
var1 = obj.a;
}
}
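To try the loop version without a real API, here is a runnable sketch. The `chain` table, the `calls` counter and the stubbed `make_some_api_call` are assumptions made up for the example; only the loop shape comes from the answer.

```javascript
// Hypothetical stub: walks a fixed chain of keys, standing in for a real API.
const chain = { start: 'x', x: 'y', y: 'z' };
let calls = 0;
async function make_some_api_call(key) {
  calls += 1;
  return key in chain ? { a: chain[key], b: 'bbb' } : { b: 'bbb' };
}

const fn = async (val) => {
  let var1 = val;
  while (true) {
    // `obj` is scoped to a single iteration, so the previous response
    // is no longer referenced once the next iteration starts.
    const obj = await make_some_api_call(var1);
    if (!('a' in obj)) {
      return var1; // all done: return the last value that was followed
    }
    var1 = obj.a;
  }
};
```

Calling `fn('start')` follows the chain until a response has no `a` property. No stack frames or pending promises pile up along the way.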


How to "help" the Garbage Collector of Node JS to free unused variables that are on the Heap?

Should I help the GC to free unused variables ?
The problem is that something eats memory in my program, either an object or an array.
I see the memory getting bigger and bigger only under a certain load stress on the backend (Kubernetes in the cloud, not locally).
The question is, should I set any array that I don't need to undefined once I'm done using it anywhere in my code?
For example:
async emitMessageToKafka(......) {
let someArray: any[] = [];
// manipulations of "someArray"
// Emit to Kafka "someArray"
someArray = undefined; // Correct or not?
// ...
// ...
}
Should I help the GC to free unused variables?
It should be left to the GC to decide whether to free memory to which there is no more (strong) reference.
should I set any array that I don't need to undefined once I'm done using it in any place on my code?
No, not in general.
If there isn't much more time or memory-consuming activity happening in the execution context that has the scope of that variable, then it is useless. For instance, here it would be useless:
async func() {
let someArray = [];
// ... working with someArray ...
await someAsyncFunc(someArray);
someArray = undefined;
return 1 + 2 * 3;
}
The reference to that array is anyway lost when the function executes that return, so there is no gain. Either way, the GC can free the related memory.
However, it could help when the rest of the execution is significant, and will still have that variable in scope, but your code will not use that variable anymore.
For instance:
async func() {
let someArray = [];
// ... working with someArray ...
await someAsyncFunc(someArray);
someArray = undefined;
await someAsyncFunc2();
// ... some more code that doesn't use someArray ...
// ...
}
That could be useful, as it allows the GC to already free the memory for the array (if there are no other references to it) while someAsyncFunc2 has not yet resolved. But in that case, it is better practice to limit the scope of your variable, like so:
async func() {
{
let someArray = [];
// ... working with someArray ...
await someAsyncFunc(someArray);
}
await someAsyncFunc2();
// ... some more code that CANNOT use someArray ...
// ...
}
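A small runnable sketch of the block-scoped version follows. The `summarize` function and the array contents are made up for illustration and stand in for someAsyncFunc and the real data.

```javascript
// Hypothetical async consumer standing in for someAsyncFunc.
async function summarize(items) {
  return items.length;
}

async function func() {
  let size;
  {
    const someArray = [1, 2, 3];
    size = await summarize(someArray);
  }
  // Past the closing brace the binding no longer exists, so the rest of
  // this function cannot refer to the array any more.
  const stillInScope = typeof someArray !== 'undefined';
  return { size, stillInScope };
}
```

Only the small derived value (`size`) survives the block. The array itself is unreachable from the rest of the function without any manual `= undefined`.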

Lifespan of JS closure context objects?

Background
I'm trying to port Elixir's actor model language primitives to JS. I came up with a solution (in JS) to emulate Elixir's receive keyword, using a "receiver" function and a generator.
Here's a simplified implementation and demo to show you the idea.
APIs:
type ActorRef = { send(msg: any): void }
type Receiver = (msg: any) => Receiver
/**
* `spawn` takes an `initializer` and returns an `actorRef`.
* `initializer` is a factory function that should return a `receiver` function.
* `receiver` is called to handle `msg` sent through `actorRef.send(msg)`
*/
function spawn(initializer: () => Receiver): ActorRef
Demo:
function* coroutine(ref) {
let result
while (true) {
const msg = yield result
result = ref.receive(msg)
}
}
function spawn(initializer) {
const ref = {}
const receiver = initializer()
ref.receive = receiver
const gen = coroutine(ref)
gen.next()
function send(msg) {
const ret = gen.next(msg)
const nextReceiver = ret.value
ref.receive = nextReceiver
}
return { send }
}
function loop(state) {
console.log('current state', state)
return function receiver(msg) {
if (msg.type === 'ADD') {
return loop(state + msg.value)
} else {
console.log('unhandled msg', msg)
return loop(state)
}
}
}
function main() {
const actor = spawn(() => loop(42))
actor.send({ type: 'ADD', value: 1 })
actor.send({ type: 'BLAH', value: 1 })
actor.send({ type: 'ADD', value: 1 })
return actor
}
window.actor = main()
Concern
The above model works. However, I'm a bit concerned about the performance impact of this approach, and I'm not clear about the memory impact of all the closure contexts it creates.
function loop(state) {
console.log('current state', state) // `state` lives in the closure context of this call
return function receiver(msg) { // `receiver` holds a reference to that context
if (msg.type === 'ADD') {
return loop(state + msg.value) // creates another context that links to this one???
} else {
console.log('unhandled msg', msg)
return loop(state)
}
}
}
loop is the "initializer" that returns a "receiver". In order to maintain an internal state, I keep it (the state variable) inside the closure context of the "receiver" function.
When it receives a message, the current receiver can modify the internal state, pass it to loop, and so recursively create a new receiver to replace the current one.
Apparently the new receiver also has a new closure context that keeps the new state. It seems to me that this process may create a deep chain of linked context objects that prevents GC?
I know that context objects referenced by closures can be linked under some circumstances. And if they're linked, they are obviously not released before the inner-most closure is released. According to this article, V8 optimization is very conservative in this regard, and the picture doesn't look pretty.
Questions
I'd be very grateful if someone can answer these questions:
1. Does the loop example create deeply linked context objects?
2. What does the lifespan of a context object look like in this example?
3. If the current example does not, can this receiver-creates-receiver mechanism end up creating deeply linked context objects in other situations?
4. If "yes" to question 3, can you please show an example to illustrate such a situation?
Follow-Up 1
A follow-up question to @TJCrowder.
Closures are lexical, so the nesting of them follows the nesting of the source code.
Well said, that's something obvious but I missed 😅
Just wanna confirm my understanding is correct, with an unnecessarily complicated example (pls bear with me).
These two are logically equivalent:
// global context here
function loop_simple(state) {
return msg => {
return loop_simple(state + msg.value)
}
}
// Notations:
// `c` for context, `s` for state, `r` for receiver.
function loop_trouble(s0) { // c0 : { s0 }
// return r0
return msg => { // c1 : { s1, gibberish } -> c0
const s1 = s0 + msg.value
const gibberish = "foobar"
// return r1
return msg => { // c2 : { s2 } -> c1 -> c0
const s2 = s1 + msg.value
// return r2
return msg => {
console.log(gibberish)
// c3 is not created, since there's no closure
const s3 = s2 + msg.value
return loop_trouble(s3)
}
}
}
}
However the memory impact is totally different.
step into loop_trouble, c0 is created holding s0; returns r0 -> c0.
step into r0, c1 is created, holding s1 and gibberish, returns r1 -> c1.
step into r1, c2 is created, holding s2, returns r2 -> c2
I believe in the above case, when r2 (the inner most arrow function) is used as the "current receiver", it's actually not just r2 -> c2, but r2 -> c2 -> c1 -> c0, all three context objects are kept (Correct me if I'm already wrong here).
Question: which case is true?
All three context objects are kept simply because of the gibberish variable that I deliberately put in there.
Or they're kept even if I remove gibberish. In other word, the dependency of s1 = s0 + msg.value is enough to link c1 -> c0.
Follow-Up 2
So the environment record as a "container" is always retained; as for what "content" is included in the container, that might vary across engines, right?
A very naive unoptimized approach could be to blindly include into the "content" all local variables, plus arguments and this, since the spec didn't say anything about optimization.
A smarter approach could be to peek into the nested function and check what exactly is needed, then decide what to include in the content. This is referred to as "promotion" in the article I linked, but that piece of info dates back to 2013 and I'm afraid it might be outdated.
By any chance, do you have more up-to-date information on this topic to share? I'm particularly interested in how V8 implements such a strategy, because my current work relies heavily on the Electron runtime.
Note: This answer assumes you're using strict mode. Your snippet doesn't. I recommend always using strict mode, by using ECMAScript modules (which are automatically in strict mode) or putting "use strict"; at the top of your code files. (I'd have to think more about arguments.callee.caller and other such monstrosities if you wanted to use loose mode, and I haven't below.)
Does the loop example create deeply linked context objects?
Not deeply, no. The inner calls to loop don't link the contexts those calls create to the context where the call to them was made. What matters is where the function loop was created, not where it was called from. If I do:
const r1 = loop(1);
const r2 = r1({type: "ADD", value: 2});
That creates two functions, each of which closes over the context in which it was created. That context is the call to loop. That call context links to the context where loop is declared — global context in your snippet. The contexts for the two calls to loop don't link to each other.
What does the lifespan of context object look like in this example?
Each of them is retained as long as the receiver function referring to it is retained (at least in specification terms). When the receiver function no longer has any references, it and the context are both eligible for GC. In my example above, r1 doesn't retain r2, and r2 doesn't retain r1.
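The independence of the contexts can be checked with a variant of loop that records each state instead of printing it. The extra `log` parameter and the `r2b`/`r3` names are additions for this sketch and are not part of the question's code.

```javascript
// Same shape as the question's loop, but states go into `log`.
function loop(state, log) {
  log.push(state);
  return function receiver(msg) {
    if (msg.type === 'ADD') {
      return loop(state + msg.value, log);
    }
    return loop(state, log);
  };
}

const log = [];
const r1 = loop(1, log);
const r2 = r1({ type: 'ADD', value: 2 });   // new call to loop, state 3
const r2b = r1({ type: 'ADD', value: 10 }); // r1 still sees its own state 1
const r3 = r2({ type: 'ADD', value: 5 });   // r2 sees 3, unaffected by r2b
```

Each receiver closes over the environment of the one loop call that created it. `r1` keeps seeing 1 and `r2` keeps seeing 3, and neither receiver holds a reference to the other.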
If the current example does not, can this receiver-creates-receiver mechanism end up creating deeply linked context objects in other situations?
It's hard to rule everything out, but I wouldn't think so. Closures are lexical, so the nesting of them follows the nesting of the source code.
If "yes" to question 3, can you please show an example to illustrate such situation?
N/A
Note: In the above I've used "context" the same way you did in the question, but it's probably worth noting that what's retained is the environment record, which is part of the execution context created by a call to a function. The execution context isn't retained by the closure, the environment record is. But the distinction is a very minor one, I mention it only because if you're delving into the spec, you'll see that distinction.
Re your Follow-Up 1:
c3 is not created, since there's no closure
c3 is created, it's just that it isn't retained after the end of the call, because nothing closes over it.
Question: which case is true?
Neither. All three contexts (c0, c1, and c2) are kept (at least in specification terms) regardless of whether there's a gibberish variable or an s0 parameter or s1 variable, etc. A context doesn't have to have parameters or variables or any other bindings in order to exist. Consider:
// ge = global environment record
function f1() {
// Environment record for each call to f1: e1(n) -> ge
return function f2() {
// Environment record for each call to f2: e2(n) -> e1(n) -> ge
return function f3() {
// Environment record for each call to f3: e3(n) -> e2(n) -> e1(n) -> ge
};
};
}
const f = f1()();
Even though e1(n), e2(n), and e3(n) have no parameters or variables, they still exist (and in the above they'll have at least two bindings, one for arguments and one for this, since those aren't arrow functions). In the code above e1(n) and e2(n) are both retained as long as f continues to refer to the f3 function created by f1()().
At least, that's how the specification defines it. In theory those environment records could be optimized away, but that's a detail of the JavaScript engine implementation. V8 did some closure optimization at one stage but backed off most of it because (as I understand it) it cost more in execution time than it made up for in memory reduction. But even when they were optimizing, I think it was the contents of the environment records they optimized (removing unused bindings, that sort of thing), not whether they continued to exist. See below, I found a blog post from 2018 indicating that they do leave them out entirely sometimes.
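In specification terms, the chain is what makes outer bindings visible to the innermost function. A minimal sketch, with made-up bindings `a` and `b` added to the f1/f2/f3 example:

```javascript
function f1() {
  const a = 'from f1';      // binding in e1
  return function f2() {
    const b = 'from f2';    // binding in e2, whose outer record is e1
    return function f3() {
      // Neither name is local to f3; both are found by walking
      // outwards through e2 and then e1.
      return [a, b];
    };
  };
}

const f = f1()();
const seen = f();
```

As long as `f` is referenced, both `a` and `b` must stay available to it, however an engine chooses to store them.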
Re Follow-Up 2:
So environment record as a "container" is always retained...
In specification terms, yes; that isn't necessarily what engines literally do.
...as of what "content" is included in the container might vary across engines, right?
Right, all the spec dictates is behavior, not how you achieve it. From the section on environment records linked above:
Environment Records are purely specification mechanisms and need not correspond to any specific artefact of an ECMAScript implementation.
...but that piece of info dates back to 2013 and I'm afraid it might be outdated.
I think so, yes, not least because V8 has changed engines entirely since then, replacing Full-codegen and Crankshaft with Ignition and TurboFan.
By any chance, do you have more up-to-date information on this topic to share?
Not really, but I did find this V8 blog post from 2018 which says they do "elide" context allocation in some cases. So there is definitely some optimization that goes on.

NodeJS: Asynchronous map for blocking jobs

Let's take an example where I have a huge array whose elements are stringified JSON. I want to iterate over this array and convert each string to an object using JSON.parse (which blocks the event loop).
var arr = ["{...}", "{...}", ... ] //input array
Here is the first approach (may keep the event loop blocked for some time):
var newArr = arr.map(function(val){
try{
var obj = JSON.parse(val);
return obj;
}
catch(err){return {};}
});
The second approach was using the async.map method (will this be more efficient compared to the first approach?):
var newArr = [];
async.map(arr,
function(val, done){
try{
var obj = JSON.parse(val);
done(null, obj);
}
catch(err){done(null, {});}
},
function(err, results){
if(!err)
newArr = results;
}
);
If the second approach is the same or almost the same, then what is an efficient way of doing this in node.js?
I came across child processes, will this be a good approach for this problem?
I don't think async.map guarantees a non-blocking handling of a sync function. Though it wraps your function with an asyncify function, I can't find anything in that code that actually makes it non-blocking. It's one of the problems I've encountered with async in the past (but maybe it's improved now)
You could definitely handroll your own solution with child processes, but it might be easier to use something like https://github.com/audreyt/node-webworker-threads
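Use async.map, but hand the callback to setImmediate instead of calling it directly, e.g. setImmediate(done, null, obj), so that each item yields to the event loop.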
I find the async functions quite convenient but not very efficient; if the mapped computation is very fast, calling done via setImmediate only once every 10 times and calling it directly otherwise will run visibly faster. (The setImmediate breaks up the call stack and yields to the event loop, but the setImmediate overhead is non-negligible)
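The "yield only every N items" idea can also be hand-rolled without the async library. The sketch below is one way to do it with plain promises; `mapInChunks`, `parseOrEmpty` and the default chunk size of 10 are made-up names and values, not part of any library.

```javascript
// Same fallback behaviour as the question: bad JSON becomes {}.
function parseOrEmpty(str) {
  try {
    return JSON.parse(str);
  } catch (err) {
    return {};
  }
}

// Applies fn to every item, yielding to the event loop via setImmediate
// after every `chunkSize` items so other callbacks can run in between.
async function mapInChunks(items, fn, chunkSize = 10) {
  const results = new Array(items.length);
  for (let i = 0; i < items.length; i++) {
    results[i] = fn(items[i]);
    if ((i + 1) % chunkSize === 0) {
      await new Promise((resolve) => setImmediate(resolve));
    }
  }
  return results;
}
```

Usage would look like `const newArr = await mapInChunks(arr, parseOrEmpty, 100);`. The parsing itself still runs on the main thread, so this only bounds how long each blocking stretch lasts. It does not make the total work cheaper.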

How do you carry mutating data into callbacks within loops?

I constantly run into problems with this pattern with callbacks inside loops:
while(input.notEnd()) {
input.next();
checkInput(input, (success) => {
if (success) {
console.log(`Input ${input.combo} works!`);
}
});
}
The goal here is to check every possible value of input, and display the ones that pass an asynchronous test after confirmed. Assume the checkInput function performs this test, returning a boolean pass/fail, and is part of an external library and can't be modified.
Let's say input cycles through all combinations of a multi-code electronic jewelry safe, with .next incrementing the combination, .combo reading out the current combination, and checkInput asynchronously checking if the combination is correct. The correct combinations are 05-30-96, 18-22-09, 59-62-53, 68-82-01 and 85-55-85. What you'd expect to see as output is something like this:
Input 05-30-96 works!
Input 18-22-09 works!
Input 59-62-53 works!
Input 68-82-01 works!
Input 85-55-85 works!
Instead, by the time the callback is called, input has already advanced an indeterminate number of times and the loop has likely already terminated, so you're likely to see something like the following:
Input 99-99-99 works!
Input 99-99-99 works!
Input 99-99-99 works!
Input 99-99-99 works!
Input 99-99-99 works!
If the loop has terminated, at least it will be obvious something is wrong. If the checkInput function is particularly fast, or the loop particularly slow, you might get random outputs depending on where input happens to be at the moment the callback checks it.
This is a ridiculously difficult bug to track down if you find your output is completely random, and the hint for me tends to be that you always get the expected number of outputs, they're just wrong.
This is usually when I make up some convoluted solution to try to preserve or pass along the inputs, which works if there is a small number of them, but really doesn't when you have billions of inputs, of which a very small number are successful (hint, hint, combination locks are actually a great example here).
Is there a general purpose solution here, to pass the values into the callback as they were when the function with the callback first evaluated them?
If you want to iterate one async operation at a time, you cannot use a while loop. Asynchronous operations in Javascript are NOT blocking. So, what your while loop does is run through the entire loop calling checkInput() on every value and then, at some future time, each of the callbacks gets called. They may not even get called in the desired order.
So, you have two options here depending upon how you want it to work.
First, you could use a different kind of loop that only advances to the next iteration of the loop when the async operation completes.
Or, second, you could run them all in parallel like you were doing and capture the state of your object uniquely for each callback.
I'm assuming that what you probably want to do is to sequence your async operations (first option).
Sequencing async operations
Here's how you could do that (works in either ES5 or ES6):
function next() {
if (input.notEnd()) {
input.next();
checkInput(input, success => {
if (success) {
// because we are still on the current iteration of the loop
// the value of input is still valid
console.log(`Input ${input.combo} works!`);
}
// do next iteration
next();
});
}
}
next();
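The same sequencing can be written as an ordinary loop if the callback-style checkInput is wrapped in a promise. This is a sketch under the assumption that checkInput calls its callback exactly once with a boolean, as in the question. `checkInputAsync` and `findWorking` are hypothetical helper names.

```javascript
// Wrap the callback-style API in a promise. `checkInput` is passed in
// because its real implementation lives in an external library.
function checkInputAsync(checkInput, input) {
  return new Promise((resolve) => checkInput(input, resolve));
}

// Checks one combination at a time and collects the ones that pass.
async function findWorking(input, checkInput) {
  const found = [];
  while (input.notEnd()) {
    input.next();
    if (await checkInputAsync(checkInput, input)) {
      // input has not advanced while we were waiting, so combo is valid
      found.push(input.combo);
    }
  }
  return found;
}
```

Because the loop is suspended at each await, input.combo still refers to the combination that was just checked when the result arrives.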
Run in parallel, save relevant properties in local scope in ES6
If you wanted to run them all in parallel like your original code was doing, but still be able to reference the right input.combo value in the callback, then you'd have to save that value in a closure (2nd option above). let makes this fairly easy: a let variable declared inside the loop body is block scoped, so each iteration of your while loop gets its own copy. That copy retains its value for when the callback runs and is not overwritten by other iterations of the loop (requires ES6 support for let):
while(input.notEnd()) {
input.next();
// let makes a block scoped variable that will be separate for each
// iteration of the loop
let combo = input.combo;
checkInput(input, (success) => {
if (success) {
console.log(`Input ${combo} works!`);
}
});
}
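The difference between reading input.combo late and capturing it per iteration can be reproduced with stand-ins for the external pieces. Everything below (`makeInput`, the `winners` set, this `checkInput`, and `run`) is a made-up mock for illustration and is not the real library.

```javascript
// Hypothetical stand-ins for the question's `input` and `checkInput`.
function makeInput(combos) {
  let i = -1;
  return {
    notEnd: () => i < combos.length - 1,
    next: () => { i += 1; },
    get combo() { return combos[i]; },
  };
}

const winners = new Set(['05-30-96', '59-62-53']);
function checkInput(input, callback) {
  const ok = winners.has(input.combo); // decided at call time
  setTimeout(() => callback(ok), 0);   // reported later
}

// capture = true reads the per-iteration variable, false reads input.combo.
function run(capture) {
  const input = makeInput(['05-30-96', '11-11-11', '59-62-53', '99-99-99']);
  const seen = [];
  let pending = 0;
  return new Promise((resolve) => {
    while (input.notEnd()) {
      input.next();
      const combo = input.combo; // separate binding for each iteration
      pending += 1;
      checkInput(input, (success) => {
        if (success) seen.push(capture ? combo : input.combo);
        pending -= 1;
        if (pending === 0) resolve(seen);
      });
    }
  });
}
```

With these mocks, `run(false)` reports the last combination for every success, while `run(true)` reports the combinations that were actually checked.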
Run in parallel, save relevant properties in local scope in ES5
In ES5, you could introduce a function scope to solve the same problem that let does in ES6 (make a new scope for each iteration of the loop):
while(input.notEnd()) {
input.next();
// create function scope to save value separately for each
// iteration of the loop
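(function() {
var combo = input.combo;
// plain function and string concatenation, since arrow functions
// and template literals are ES6 features
checkInput(input, function(success) {
if (success) {
console.log('Input ' + combo + ' works!');
}
});
})();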
}
You could use async/await for the asynchronous calls. This lets you wait for checkInput to finish inside the loop.
You can read more about async await here
I believe the snippet below achieves what you are after. I created a MockInput function that mocks the behaviour of your input. Note the async and await keywords in the doAsyncThing function and keep an eye on the console when running it.
Hope this clarifies things.
function MockInput() {
this.currentIndex = 0;
this.list = ["05-30-96", "18-22-09", "59-62-53", "68-82-01", "85-55-85"];
this.notEnd = function(){
return this.currentIndex < this.list.length;
};
this.next = function(){
this.currentIndex++;
};
this.combo = function(){
return this.list[this.currentIndex];
}
}
function checkInput(input){
return new Promise(resolve => {
setTimeout(()=> {
var isValid = input.currentIndex % 2 > 0; // 'random' true or false
resolve( `Input ${input.currentIndex} - ${input.combo()} ${isValid ? 'works!' : 'did not work'}`);
}, 1000);
});
}
async function doAsyncThing(){
var input = new MockInput();
while(input.notEnd()) {
var result = await checkInput(input);
console.log(result);
input.next();
}
console.log('Done!');
}
doAsyncThing();

How is this JavaScript function caching its results?

After reading over it several times, I still don't understand how this example code from page 76 of Stoyan Stefanov's "JavaScript Patterns" works. I'm not a ninja yet. But to me, it reads like it's only storing an empty object:
var myFunc = function (param) {
if (!myFunc.cache[param]) {
var result = {};
// ... expensive operation ...
myFunc.cache[param] = result;
}
return myFunc.cache[param];
};
// cache storage
myFunc.cache = {};
Unless that unseen "expensive operation" is storing back to result, I don't see anything being retained.
Where are the results being stored?
P.S.: I've read Caching the return results of a function from John Resig's Learning Advanced JavaScript, which is a similar exercise, and I get that one. But the code is different here.
You've answered your own question -- the author assumes that the expensive operation will store its result in result.
The cache would otherwise only contain empty objects, as you've noted.
The results are being stored in the object assigned to myFunc.cache. What the code is specifically doing is:
When myFunc gets executed with a param, the function first checks the cache. If there is a value for param in the cache, it returns it. If not, it does the expensive operation and then caches the result (with param as the key), so the next time the function is called with the same param the cache is used.
It says // expensive operation - the inference is that you implement code there which assigns variables into the result var, or sets the result var to another Object (which is the result of an expensive operation)
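To see the pattern actually retain something, the elided step can be filled with a stand-in. The squaring and the `calls` counter below are made up for this sketch and are not from the book.

```javascript
var myFunc = function (param) {
  if (!myFunc.cache[param]) {
    var result = {};
    // Stand-in for the expensive operation: it writes into `result`.
    myFunc.calls += 1;
    result.value = param * param;
    myFunc.cache[param] = result;
  }
  return myFunc.cache[param];
};

// cache storage
myFunc.cache = {};
// counts how often the expensive branch actually ran
myFunc.calls = 0;

var first = myFunc(4);
var second = myFunc(4); // served from the cache
```

Here `first` and `second` are the same object and the expensive branch runs only once, because the second call finds the entry under the key 4 (coerced to the string "4").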
