Trying to Understand Javascript Closures + Memory Leaks - javascript

I've been reading up a lot on closures in Javascript. I come from a more traditional (C, C++, etc.) background and understand call stacks and such, but I am having trouble with memory usage in Javascript. Here's a (simplified) test case I set up:
function updateLater(){
    console.log('timer update');
    var params = new Object();
    for(var y=0; y<1000000; y++){
        params[y] = {'test':y};
    }
}
Alternatively, I've also tried using a closure:
function updateLaterClosure(){
    return (function(){
        console.log('timer update');
        var params = new Object();
        for(var y=0; y<1000000; y++){
            params[y] = {'test':y};
        }
    });
}
Then, I set an interval to run the function...
setInterval(updateLater, 5000); // or var c = updateLaterClosure(); setInterval(c,5000);
The first time the timer runs, the Memory Usage jumps from 50MB to 75MB (according to Chrome's Task Manager). The second time it goes above 100MB. Occasionally it drops back down a little, but never below 75MB.
Check it out yourself: https://local.phazm.com:4435/Streamified/extension/branches/lib/test.html
Clearly, params is not being fully garbage collected, because the memory from the first timer call is not being freed. Yet it isn't adding 25MB on EACH call either, so it is not as if garbage collection NEVER happens; it almost seems as though one instance of "params" is always being kept around. I've tried setting up a sub-closure and other things... no dice.
What is MOST disturbing, though, is that the memory usage trends upwards. It might "just" be 75MB for now, but leave it running for long enough (overnight) and it'll get to 500 MB.
Ideas?
Thanks!

Allocating 25MB triggers a GC. That GC cleans up the previous instance but, of course, not the current one, so you always have one instance around.
GC does not run while the program is idle, so it does not happen between your timer calls and the memory stays around.

That is not even a closure. A closure is created when you return something from a function (a function, array, object, or anything else that can hold references) and it carries with it the local variables of that function.
What you have there is just a very long loop building a very big object, and your memory may simply not be reclaimed as fast as you are building those huge objects.
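To make the distinction concrete, here is a minimal example of an actual closure (illustrative only, not from the original post): the returned function keeps params reachable, so it cannot be collected while counter is still referenced.
function makeCounter() {
    var params = { count: 0 };
    return function () {
        params.count++;      // params is captured by this returned function
        return params.count;
    };
}
var counter = makeCounter(); // params now lives as long as counter does
counter(); // 1
counter(); // 2
In updateLater, by contrast, nothing escapes the function, so params becomes unreachable as soon as the call returns.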

Related

Javascript Garbage Collection. Creating Objects and Vars?

I am currently building a game in Javascript. After some testing, I began to notice occasional lag which could only be caused by the GC kicking in, so I decided to run a profile on it. The result shows that the GC is in fact the culprit:
I read that creating new objects causes a lot of GC. I am wondering if something like this:
var x = [];
creates any garbage as well, since primitive types in Java don't do this. Since there are no explicit types in Javascript, I am unsure. Furthermore, which of these creates the least amount of garbage:
Option 1:
function do() {
    var x = [];
    ...
}
Option 2:
var x = [];
function do() {
    x = [];
    ...
}
Option 3:
function do() {
    x = [];
    ...
}
Or Option 4:
function do() {
    var x = [];
    ...
    delete x;
}
Option 5:
var x = [];
function do() {
    x.length = 0;
    ...
}
The do function is called 30 times a second in my case, and it runs several operations on the array.
I am wondering this because I just made all of my variables global to try to prevent them from being collected by the GC, but it did not change much.
Could you also provide some common examples of things that create a lot of garbage, and some alternatives?
Thank you.
Can you also show the memory timeline? If you have GC issues, they should be blatantly obvious there: you would see a sawtooth wave graph. Whenever the graph drops, that's the GC kicking in, blocking your thread to empty out the trash, and that's the main cause of memory-related freezing.
Example of a sawtooth wave graph (the blue graph is memory):
Generally speaking, which object instantiation you use does not matter that much, since the memory impact of a [] is minimal; what you're interested in is the content of the arrays. But to go through your options:
Option 1: This is generally OK, with one consideration: closures. You should try to avoid closures as much as possible, since they're generally the main cause of GC.
Option 2: Avoid referencing things outside of your scope; it doesn't help memory-wise, and it makes your app a bit slower since it has to walk up the scope chain to find the match. There is no benefit to doing this.
Option 3: Never, ever do this. You always want to declare x somewhere, otherwise you're purposely leaking into the global scope, and it may never be GCed (a short sketch after these options shows what that leak looks like).
Option 4: This is actually an interesting one. Normally delete x does not do anything, since delete only acts on properties of an object. In case you didn't know, delete actually returns a boolean that signifies whether the property was deleted or not, so you can run this example in the Chrome console:
function tmp() {
    var a = 1;
    delete a; // false
    console.log('a=', a); // 1
    b = 2;
    delete b; // true !!!
    console.log('b=', b); // Exception
}
tmp();
What the?! Well, when you write b = 2 (without the var), it's the same thing as writing window.b = 2, so when you delete b, you're really doing delete window.b, which satisfies the "only deletes properties" clause.
Still, DON'T DO THIS!
Option 5: This one actually saves you a tiny, tiny bit of memory, since it doesn't have to GC x itself. However, it does have to GC all the content of x, which is generally much greater in size than x itself, so in practice it won't make a difference.
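As a side note on Option 3, here is a minimal sketch (names are made up) of how an undeclared assignment leaks onto the global object, and how "use strict" turns it into an error instead:
function tick() {
    x = []; // no var: creates a property on the global object that outlives the call
}
tick();
console.log(typeof x); // "object" - the array leaked and stays reachable

function strictTick() {
    'use strict';
    y = []; // throws ReferenceError instead of silently creating a global
}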
This is a fantastic article if you want to learn more about memory profiling + common memory performance pitfalls: http://www.smashingmagazine.com/2012/11/writing-fast-memory-efficient-javascript/

Deleting large Javascript objects when process is running out of memory

I'm a novice to this kind of javascript, so I'll give a brief explanation:
I have a web scraper built in Nodejs that gathers (quite a bit of) data, processes it with Cheerio (basically jQuery for Node), creates an object, then uploads it to mongoDB.
It works just fine, except on larger sites. What appears to be happening is:
I give the scraper an online store's URL to scrape
Node goes to that URL and retrieves anywhere from 5,000 - 40,000 product urls to scrape
For each of these new URLs, Node's request module gets the page source then loads up the data to Cheerio.
Using Cheerio I create a JS object which represents the product.
I ship the object off to MongoDB where it's saved to my database.
As I say, this happens for thousands of URLs and once I get to, say, 10,000 urls loaded I get errors in node. The most common is:
Node: Fatal JS Error: Process out of memory
Ok, here's the actual question(s):
I think this is happening because Node's garbage cleanup isn't working properly. It's possible that, for example, the request data scraped from all 40,000 urls is still in memory, or at the very least the 40,000 created javascript objects may be. Perhaps it's also because the MongoDB connection is made at the start of the session and is never closed (I just close the script manually once all the products are done). This is to avoid opening/closing the connection every single time I log a new product.
To really ensure they're cleaned up properly (once the product goes to MongoDB I don't use it anymore and it can be deleted from memory), can/should I just delete it from memory, simply using delete product?
More so (I'm clearly not across how JS handles objects): if I delete one reference to the object, is it totally wiped from memory, or do I have to delete all of them?
For instance:
var saveToDB = require('./mongoDBFunction.js');

function getData(link){
    request(link, function(data){
        var $ = cheerio.load(data);
        createProduct($);
    });
}

function createProduct($){
    var product = {
        a: 'asadf',
        b: 'asdfsd'
        // there's about 50 lines of data in here in the real products but this is for brevity
    };
    product.name = $('.selector').dostuffwithitinjquery('etc');
    saveToDB(product);
}
// In mongoDBFunction.js
exports.saveToDB = function(item){
    db.products.save(item, function(err){
        console.log("Item was successfully saved!");
        delete item; // Will this completely delete the item from memory?
    });
};
delete in javascript is NOT used to delete variables or free memory. It is ONLY used to remove a property from an object. You may find this article on the delete operator a good read.
You can remove a reference to the data held in a variable by setting the variable to something like null. If there are no other references to that data, then that will make it eligible for garbage collection. If there are other references to that object, then it will not be cleared from memory until there are no more references to it (e.g. no way for your code to get to it).
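As a rough sketch of that advice applied to the code in the question (assuming the save callback really is the last place the object is needed), you would null the reference rather than delete it:
// In mongoDBFunction.js
exports.saveToDB = function (item) {
    db.products.save(item, function (err) {
        console.log("Item was successfully saved!");
        item = null; // drop this reference; if nothing else points at the object,
                     // the GC is free to reclaim it on a later run
    });
};
Whether this makes a measurable difference depends on what else (request buffers, Cheerio documents, pending callbacks) still holds a reference to the object.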
As for what is causing the memory accumulation, there are a number of possibilities and we can't really see enough of your code to know what references could be held onto that would keep the GC from freeing up things.
If this is a single, long running process with no breaks in execution, you might also need to manually run the garbage collector to make sure it gets a chance to clean up things you have released.
Here are a couple of articles on tracking down your memory usage in node.js: http://dtrace.org/blogs/bmc/2012/05/05/debugging-node-js-memory-leaks/ and https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/.
JavaScript has a garbage collector that automatically tracks which variables are "reachable". If a variable is "reachable", its value won't be released.
For example, if you have a global variable var g_hugeArray and you assign it a huge array, you actually have two JavaScript objects here: one is the huge block that holds the array data; the other is a property on the window object, named "g_hugeArray", that points to that data. So the reference chain is: window -> g_hugeArray -> the actual array.
In order to release the actual array, you make it "unreachable". You can break either link in the above chain to achieve this. If you set g_hugeArray to null, you break the link between g_hugeArray and the actual array. This makes the array data unreachable, so it will be released when the garbage collector runs. Alternatively, you can use "delete window.g_hugeArray" to remove the property "g_hugeArray" from the window object. This breaks the link between window and g_hugeArray and also makes the actual array unreachable.
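A small sketch of that chain-breaking, purely for illustration:
var g_hugeArray = new Array(1000000); // window -> g_hugeArray -> the actual array

// Break the g_hugeArray -> array link; the array data becomes unreachable
// and can be reclaimed on the next GC run.
g_hugeArray = null;

// The delete form removes the property itself; note it only succeeds when the
// property is configurable (i.e. created by assignment like window.g_hugeArray = ...,
// not by a top-level var declaration).
delete window.g_hugeArray;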
The situation gets more complicated when you have "closures". A closure is created when you have a local function that references a local variable. For example:
function a() {
    var x = 10;
    var y = 20;
    setTimeout(function() {
        alert(x);
    }, 100);
}
In this case, local variable x is still reachable from the anonymous timeout function even after function "a" has returned. Without the timeout function, both local variables x and y would become unreachable as soon as function a returns, but the existence of the anonymous function changes this. Depending on how the JavaScript engine is implemented, it may choose to keep both x and y (because it doesn't know whether the function will need y until the function actually runs, which happens after function a returns), or, if it is smart enough, it may keep only x. Imagine that both x and y point to big things; this can be a problem. So closures are very convenient, but they are more likely to cause memory issues and can make those issues harder to track down.
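One common workaround, sketched below (illustrative only): clear the large local you no longer need before handing a closure to setTimeout, so the captured scope only keeps what the callback actually uses.
function a() {
    var x = 10;
    var y = new Array(1000000); // imagine this is large

    // ... use y here ...

    y = null; // the callback never mentions y, but clearing it makes the intent
              // explicit and helps engines that capture the whole enclosing scope

    setTimeout(function () {
        alert(x); // only x needs to stay reachable through this closure
    }, 100);
}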
I faced the same problem in my application, with similar functionality. I had been looking for memory leaks or something like that. The memory consumed by my process reached 1.4 GB and depended on the number of links that had to be downloaded.
The first thing I noticed was that after manually running the Garbage Collector, almost all memory was freed. Each page that I downloaded took about 1 MB, was processed and stored in the database.
Then I installed heapdump and looked at a snapshot of the application. You can find more information about memory profiling on the Webstorm blog.
My guess is that while the application is running, the GC does not start. To address this, I began running the application with the --expose-gc flag and running the GC manually as the program executed.
const runGCIfNeeded = (() => {
    let i = 0;
    return function runGCIfNeeded() {
        if (i++ > 200) {
            i = 0;
            if (global.gc) {
                global.gc();
            } else {
                logger.warn('Garbage collection unavailable. Pass --expose-gc when launching node to enable forced garbage collection.');
            }
        }
    };
})();

// run GC check after each iteration
checkProduct(product._id)
    .then(/* ... */)
    .finally(runGCIfNeeded)
Interestingly, if you do not use const, let, var, etc. when you define something in the global scope, it seems to be an attribute of the global object, and deleting it returns true. This could allow it to be garbage collected. I tested it like this and it seems to have the intended impact on my memory usage; please let me know if this is incorrect or if you get drastically different results:
x = [];
process.memoryUsage();
i = 0;
while (i < 1000000) {
    x.push(10.5);
    i++;
}
process.memoryUsage();
delete x;
process.memoryUsage();

Pattern for no-allocation loops in JavaScript?

Say we're writing a browser app where smooth animation is critical. We know garbage collection can block execution long enough to cause a perceptible freeze, so we need to minimize the amount of garbage we create. To minimize garbage, we need to avoid memory allocation while the main animation loop is running.
But that execution path is strewn with loops:
var i = things.length; while (i--) { /* stuff */ }
for (var i = 0, len = things.length; i < len; i++) { /* stuff */ }
And their var statements can allocate memory that the garbage collector may remove, which we want to avoid.
So, what is a good strategy for writing loop constructs in JavaScript that avoid allocating memory each one? I'm looking for a general solution, with pros and cons listed.
Here are three ideas I've come up with:
1.) Declare "top-level" vars for index and length; reuse them everywhere
We could declare app.i and app.length at the top, and reuse them again and again:
app.i = things.length; while (app.i--) { /* stuff */ }
for (app.i = 0; app.i < app.length; app.i++) { /* stuff */ }
Pros: Simple enough to implement. Cons: The performance hit from dereferencing the properties might make it a Pyrrhic victory. You might accidentally misuse/clobber the properties and cause bugs.
2.) If array length is known, don't loop -- unroll
We might be guaranteed that an array has a certain number of elements. If we do know what the length will be in advance, we could manually unroll the loop in our program:
doSomethingWithThing(things[0]);
doSomethingWithThing(things[1]);
doSomethingWithThing(things[2]);
Pros: Efficient. Cons: Rarely possible in practice. Ugly? Annoying to change?
3.) Leverage closures, via the factory pattern
Write a factory function that returns a 'looper', a function that performs an action on the elements of a collection (a la _.each). The looper keeps private references to index and length variables in the closure that is created. The looper must reset i and length each time it's called.
function buildLooper() {
    var i, length;
    return function(collection, functionToPerformOnEach) { /* implement me */ };
}
app.each = buildLooper();
app.each(things, doSomethingWithThing);
Pros: More functional, more idiomatic? Cons: Function calls add overhead. Closure access has been shown to be slower than object lookup.
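For reference, one way the "implement me" body might look (a sketch, not from the original question): i and length live in the factory's closure and are reassigned, never redeclared, on each call.
function buildLooper() {
    var i, length;
    return function (collection, functionToPerformOnEach) {
        length = collection.length; // reset on every call
        for (i = 0; i < length; i++) {
            functionToPerformOnEach(collection[i], i);
        }
    };
}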
And their var statements can allocate memory that the garbage collector may remove, which we want to avoid.
This is slightly misinformed. Simply using var does not allocate memory on the heap. When a function is called, each variable used in the function is allocated in advance on the stack. When the function completes execution, the stack frame is popped and that memory is immediately released.
Where garbage collection-related memory concerns become a problem is when you're allocating objects on the heap. That means any of the following:
Closures
Event listeners
Arrays
Objects
For the most part, anything where typeof foo returns "function" or "object" (or any of the new ES6 typeof return values) will generate an object on the heap. There's probably more that I can't think of right now.
The thing about objects on the heap is that they can refer to other objects on the heap. So for instance:
var x = {};
x.y = {};
delete x;
In the example above, the browser simply can't deallocate the slot for x, because the value contained within it is of variable size. It lives on the heap, where it could then point to other objects (in this case, the object at x.y). Another possibility is that there's a second reference to the same object:
var x = {};
window.foo = x;
delete x;
The browser simply can't remove the object at x from memory, since something else still points at it.
So long story short, don't worry about removing variables, because they work perfectly well and are totally performant. Heap allocations are the real enemy when it comes to garbage collection, but even a few small heap allocations here and there won't hurt most apps.
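If the per-frame loop itself is the worry, one common pattern (a sketch under assumed names, not from the answer above) is to allocate scratch objects once and overwrite them every frame, so the hot path performs no heap allocation at all:
var scratch = { dx: 0, dy: 0 }; // allocated once, up front

function updateFrame(things) {
    for (var i = 0; i < things.length; i++) {
        scratch.dx = things[i].x - things[i].prevX;
        scratch.dy = things[i].y - things[i].prevY;
        // ... use scratch here, but never store it anywhere that outlives the frame
    }
}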

Can I use setTimeout to create a cheap infinite loop?

var recurse = function(steps, data, delay) {
    if(steps == 0) {
        console.log(data.length);
    } else {
        setTimeout(function(){
            recurse(steps - 1, data, delay);
        }, delay);
    }
};
var myData = "abc";
recurse(8000, myData, 1);
What troubles me with this code is that I'm passing a string on 8000 times. Does this result in any kind of memory problem?
Also, if I run this code with node.js, it prints immediately, which is not what I would expect.
If you're worried about the string being copied 8,000 times, don't be, there's only one copy of the string; what gets passed around is a reference.
The bigger question is whether the object created when you call a function (called the "variable binding object" of the "execution context") is retained, because you're creating a closure, which has a reference to the variable object for the context and thus keeps it in memory as long as the closure is still referenced somewhere.
And the answer is: yes, but only until the timer fires, because once it does, nothing references the closure anymore, and so the garbage collector can reclaim them both. So you won't have 8,000 of them outstanding, just one or two. Of course, when and how the GC runs is up to the implementation.
Curiously, just earlier today we had another question on a very similar topic; see my answer there as well.
It prints immediately because the program executes "immediately". On my Intel i5 machine, the whole operation takes 0.07s, according to time node test.js.
For the memory problems, and whether this is a "cheap infinite loop", you'll just have to experiment and measure.
If you want to create an asynchronous loop in node, you could use process.nextTick. It will be faster than setTimeout(func, 1).
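A sketch of the same countdown rewritten with process.nextTick (note that a long chain of nextTick callbacks runs ahead of I/O, so setImmediate is often the friendlier choice in newer Node versions):
function recurseTick(steps, data) {
    if (steps === 0) {
        console.log(data.length);
    } else {
        process.nextTick(function () {
            recurseTick(steps - 1, data);
        });
    }
}
recurseTick(8000, "abc");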
In general Javascript does not support tail call optimization, so writing recursive code normally runs the risk of causing a stack overflow. If you use setTimeout like this, it effectively resets the call stack, so stack overflow is no longer a problem.
Performance will be the problem though, as each call to setTimeout generally takes a fair bit of time (around 10 ms), even if you set delay to 0.
The '1' is 1 millisecond, so it might as well be a for loop; 1 second would be 1000. I recently wrote something similar to check on the progress of a batch of processes on the back end and set a delay of 500. Older browsers wouldn't see any real difference between 1 and about 15 ms, if I remember correctly. I think V8 might actually process faster than that.
I don't think garbage collection will happen to any of the functions until the last iteration is complete, but these newer generations of JS JIT compilers are a lot smarter than the ones I know more about, so it's possible they'll see that nothing is really going on after the timeout and pull those params from memory.
Regardless, even if memory is reserved for every instance of those parameters, it would take a lot more than 8000 iterations to cause a problem.
One way to safeguard against potential problems with more memory-intensive parameters is to pass in an object with the params you want. Then the params will just be a reference to a single place in memory.
So something like:
var recurseParams = { steps: 8000, data: "abc", delay: 100 }; // outside of the function
// define the function
recurse(recurseParams);
// Then inside the function, reference like this:
recurseParams.steps--;

Does this setTimeout create any memory leaks?

Does this code create any memory leaks? Or is there anything wrong with the code?
HTML:
<div id='info'></div>
Javascript:
var count = 0;
function KeepAlive() {
    count++;
    $('#info').html(count);
    var t = setTimeout(KeepAlive, 1000);
}
KeepAlive();
Run a test here:
http://jsfiddle.net/RjGav/
You should probably use setInterval instead:
var count = 0;
function KeepAlive() {
    $('#info').html(++count);
}
var KAinterval = setInterval(KeepAlive, 1000);
You can cancel it if you ever need to by calling clearInterval(KAinterval);.
I think this will leak because the successive references are never released. That is, the first call immediately creates a closure by referencing the function from within itself. When it calls itself again, the new reference is from the instance created on the first iteration, so the first one could again never be released.
You could test this theory pretty easily by changing the interval to something very small and watch the memory in chrome...
(edit) I tested the theory with your fiddle and, actually, I'm wrong: it doesn't leak, at least in Chrome. But that's no guarantee that some other browser (e.g. older IE) is as good at garbage collecting.
But whether or not it leaks, there's no reason not to use setInterval instead.
This should not create a leak, because the KeepAlive function will complete in a timely manner and thus release all variables in that function. Also, in your current code, there is no reason to set the t var as it is unused. If you want to use it to cancel your event, you should declare it in a higher scope.
Other than that, I see nothing "wrong" with your code, but it really depends on what you are trying to do. For example, if you are trying to use this as a precise timer, it will run slower than a regular clock. Thus, you should consider either setting the date on page load and calculating the difference when you need it, or using setInterval as g.d.d.c suggested.
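A small sketch of the "record the start time, compute the difference" idea (variable names are illustrative):
var startTime = Date.now();

function elapsedSeconds() {
    // derived from the clock, so it stays accurate even when timers fire late
    return Math.floor((Date.now() - startTime) / 1000);
}

setInterval(function () {
    $('#info').html(elapsedSeconds());
}, 1000);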
It is good to use the setInterval method, as g.d.d.c mentioned.
Moreover, it is better to store $('#info') in a variable outside the function.
Check out http://jsfiddle.net/RjGav/1/
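Presumably the linked fiddle does something along these lines (a guess, since the fiddle itself isn't shown here): cache the jQuery lookup so the DOM isn't queried on every tick.
var count = 0;
var $info = $('#info'); // looked up once, reused on every call

var KAinterval = setInterval(function () {
    $info.html(++count);
}, 1000);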
