Can I use setTimeout to create a cheap infinite loop? - javascript

var recurse = function(steps, data, delay) {
if(steps == 0) {
console.log(data.length)
} else {
setTimeout(function(){
recurse(steps - 1, data, delay);
}, delay);
}
};
var myData = "abc";
recurse(8000, myData, 1);
What troubles me with this code is that I'm passing a string on 8000 times. Does this result in any kind of memory problem?
Also, If I run this code with node.js, it prints immediately, which is not what I would expect.

If you're worried about the string being copied 8,000 times, don't be, there's only one copy of the string; what gets passed around is a reference.
The bigger question is whether the object created when you call a function (called the "variable binding object" of the "execution context") is retained, because you're creating a closure, and which has a reference to the variable object for the context and thus keeps it in memory as long as the closure is still referenced somewhere.
And the answer is: Yes, but only until the timer fires, because once it does nothing is referencing the closure anymore and so the garbage collector can reclaim them both. So you won't have 8,000 of them outstanding, just one or two. Of course, when and how the GC runs is up to the implementation.
Curiously, just earlier today we had another question on a very similar topic; see my answer there as well.

It prints immediately because the program executes "immediately". On my Intel i5 machine, the whole operation takes 0.07s, according to time node test.js.
For the memory problems, and wether this is a "cheap infinite loop", you'll just have to experiment and measure.
If you want to create an asynchronous loop in node, you could use process.nextTick. It will be faster than setTimeout(func, 1).

In general Javascript does not support tail call optimization, so writing recursive code normally runs the risk of causing a stack overflow. If you use setTimeout like this, it effectively resets the call stack, so stack overflow is no longer a problem.
Performance will be the problem though, as each call to setTimeout generally takes a fair bit of time (around 10 ms), even if you set delay to 0.

The '1' is 1 millisecond. It might as well be a for loop. 1 second is 1000. I recently wrote something similar checking on the progress of a batch of processes on the back end and set a delay of 500. Older browsers wouldn't see any real difference between 1 and about 15ms if I remember correctly. I think V8 might actually process faster than that.
I don't think garbage collection will be happening to any of the functions until the last iteration is complete but these newer generations of JS JIT compilers are a lot smarter than the ones I know more about so it's possible they'll see that nothing is really going on after the timeout and pull those params from memory.
Regardless, even if memory is reserved for every instance of those parameters, it would take a lot more than 8000 iterations to cause a problem.
One way to safeguard against potential problems with more memory intensive parameters is if you pass in an object with the params you want. Then I believe the params will just be a reference to a set place in memory.
So something like:
var recurseParams ={ steps:8000, data:"abc", delay:100 } //outside of the function
//define the function
recurse(recurseParams);
//Then inside the function reference like this:
recurseParams.steps--

Related

Deleting large Javascript objects when process is running out of memory

I'm a novice to this kind of javascript, so I'll give a brief explanation:
I have a web scraper built in Nodejs that gathers (quite a bit of) data, processes it with Cheerio (basically jQuery for Node) creates an object then uploads it to mongoDB.
It works just fine, except for on larger sites. What's appears to be happening is:
I give the scraper an online store's URL to scrape
Node goes to that URL and retrieves anywhere from 5,000 - 40,000 product urls to scrape
For each of these new URLs, Node's request module gets the page source then loads up the data to Cheerio.
Using Cheerio I create a JS object which represents the product.
I ship the object off to MongoDB where it's saved to my database.
As I say, this happens for thousands of URLs and once I get to, say, 10,000 urls loaded I get errors in node. The most common is:
Node: Fatal JS Error: Process out of memory
Ok, here's the actual question(s):
I think this is happening because Node's garbage cleanup isn't working properly. It's possible that, for example, the request data scraped from all 40,000 urls is still in memory, or at the very least the 40,000 created javascript objects may be. Perhaps it's also because the MongoDB connection is made at the start of the session and is never closed (I just close the script manually once all the products are done). This is to avoid opening/closing the connection it every single time I log a new product.
To really ensure they're cleaned up properly (once the product goes to MongoDB I don't use it anymore and can be deleted from memory) can/should I just simply delete it from memory, simply using delete product?
Moreso (I'm clearly not across how JS handles objects) if I delete one reference to the object is it totally wiped from memory, or do I have to delete all of them?
For instance:
var saveToDB = require ('./mongoDBFunction.js');
function getData(link){
request(link, function(data){
var $ = cheerio.load(data);
createProduct($)
})
}
function createProduct($)
var product = {
a: 'asadf',
b: 'asdfsd'
// there's about 50 lines of data in here in the real products but this is for brevity
}
product.name = $('.selector').dostuffwithitinjquery('etc');
saveToDB(product);
}
// In mongoDBFunction.js
exports.saveToDB(item){
db.products.save(item, function(err){
console.log("Item was successfully saved!");
delete item; // Will this completely delete the item from memory?
})
}
delete in javascript is NOT used to delete variables or free memory. It is ONLY used to remove a property from an object. You may find this article on the delete operator a good read.
You can remove a reference to the data held in a variable by setting the variable to something like null. If there are no other references to that data, then that will make it eligible for garbage collection. If there are other references to that object, then it will not be cleared from memory until there are no more references to it (e.g. no way for your code to get to it).
As for what is causing the memory accumulation, there are a number of possibilities and we can't really see enough of your code to know what references could be held onto that would keep the GC from freeing up things.
If this is a single, long running process with no breaks in execution, you might also need to manually run the garbage collector to make sure it gets a chance to clean up things you have released.
Here's are a couple articles on tracking down your memory usage in node.js: http://dtrace.org/blogs/bmc/2012/05/05/debugging-node-js-memory-leaks/ and https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/.
JavaScript has a garbage collector that automatically track which variable is "reachable". If a variable is "reachable", then its value won't be released.
For example if you have a global variable var g_hugeArray and you assign it a huge array, you actually have two JavaScript object here: one is the huge block that holds the array data. Another is a property on the window object whose name is "g_hugeArray" that points to that data. So the reference chain is: window -> g_hugeArray -> the actual array.
In order to release the actual array, you make the actual array "unreachable". you can break either link the above chain to achieve this. If you set g_hugeArray to null, then you break the link between g_hugeArray and the actual array. This makes the array data unreachable thus it will be released when the garbage collector runs. Alternatively, you can use "delete window.g_hugeArray" to remove property "g_hugeArray" from the window object. This breaks the link between window and g_hugeArray and also makes the actual array unreachable.
The situation gets more complicated when you have "closures". A closure is created when you have a local function that reference a local variable. For example:
function a()
{
var x = 10;
var y = 20;
setTimeout(function()
{
alert(x);
}, 100);
}
In this case, local variable x is still reachable from the anonymous time out function even after function "a" has returned. If without the timeout function, then both local variable x and y will become unreachable as soon as function a returns. But the existence of the anonymous function change this. Depending on how the JavaScript engine is implemented, it may choose to keep both variable x and y (because it doesn't know whether the function will need y until the function actually runs, which occurs after function a returns). Or if it is smart enough, it can only keep x. Imagine that if both x and y points to big things, this can be a problem. So closure is very convenient but at times it is more likely to cause memory issues and can make it more difficult to track memory issues.
I faced same problem in my application with similar functionality. I've been looking for memory leaks or something like that. The size of consumed memory my process has reached to 1.4 GB and depends on the number of links that must be downloaded.
The first thing I noticed was that after manually running the Garbage Collector, almost all memory was freed. Each page that I downloaded took about 1 MB, was processed and stored in the database.
Then I install heapdump and looked at the snapshot of the application. More information about memory profiling you can found at Webstorm Blog.
My guess is that while the application is running, the GC does not start. To do this, I began to run application with the flag --expose-gc, and began to run GC manually at the time of implementation of the program.
const runGCIfNeeded = (() => {
let i = 0;
return function runGCIfNeeded() {
if (i++ > 200) {
i = 0;
if (global.gc) {
global.gc();
} else {
logger.warn('Garbage collection unavailable. Pass --expose-gc when launching node to enable forced garbage collection.');
}
}
};
})();
// run GC check after each iteration
checkProduct(product._id)
.then(/* ... */)
.finally(runGCIfNeeded)
Interestingly, if you do not use const, let, var, etc when you define something in the global scope, it seems be an attribute of the global object, and deleting returns true. This could cause it to be garbage collected. I tested it like this and it seems to have the intended impact on my memory usage, please let me know if this is incorrect or if you got drastically different results:
x = [];
process.memoryUsage();
i = 0;
while(i<1000000) {
x.push(10.5);
}
process.memoryUsage();
delete x
process.memoryUsage();

Trying to Understand Javascript Closures + Memory Leaks

I've been reading up a lot on closures in Javascript. I come from a more traditional (C, C++, etc) background and understand call stacks and such, but I am having troubles with memory usage in Javascript. Here's a (simplified) test case I set up:
function updateLater(){
console.log('timer update');
var params = new Object();
for(var y=0; y<1000000; y++){
params[y] = {'test':y};
}
}
Alternatively, I've also tried using a closure:
function updateLaterClosure(){
return (function(){
console.log('timer update');
var params = new Object()
for(var y=0; y<1000000; y++)
{
params[y] = {'test':y};
}
});
}
Then, I set an interval to run the function...
setInterval(updateLater, 5000); // or var c = updateLaterClosure(); setInterval(c,5000);
The first time the timer runs, the Memory Usage jumps form 50MB to 75MB (according to Chrome's Task Manager). The second time it goes above 100MB. Occasionally it drops back down a little, but never below 75MB.
Check it out yourself: https://local.phazm.com:4435/Streamified/extension/branches/lib/test.html
Clearly, params is not being fully garbage collected, because the memory from the first timer call is not being freed... yet, neither is it adding 25MB of memory on EACH call, so it is not as if the garbage collection is NEVER happening... it almost seems as though one instance of "params" is always being kept around. I've tried setting up a sub-closure and other things... no dice.
What is MOST disturbing, though, is that the memory usage trends upwards. It might "just" be 75MB for now, but leave it running for long enough (overnight) and it'll get to 500 MB.
Ideas?
Thanks!
Allocating 25mb causes a GC to happen. This GC cleans up the last instance but of course not the current. So you always have one instance around.
GC does not happen when the program is idle. It does not happen between your timer calls so the memory stays around.
That is not even a closure. A closure is when you return something from a function, like an array, function, object or anything that can contain references, and it carries with it all the local members of that function.
what you have there is just a case of a very long loop that is building a very big object. and maybe your memory does not get reclaimed as fast as you are building the huge objects.

A "too much recursion" error in Firefox only sometimes?

I have a pretty simple thing I'm doing with javascript and basically only sometimes will javascript give me a "too much recursion" error.
The code in question:
if(pageLoad===undefined){
var pageLoad=function(){};
}
var pageLoad_uniqueid_11=pageLoad;
var pageLoad=function(){
pageLoad_uniqueid_11();
pageLoad_uniqueid_12();
};
var pageLoad_uniqueid_12=function(){
alert('pageLoad');
};
$(document).ready(function(){
pageLoad();
});
(yes I know there are better way of doing this. This is difficult to change though, especially because of ASP.Net partial postbacks which aren't shown).
Anyway, when the too much recursion error happens, it continues to happen until I restart Firefox. when I restart Firefox it all works as normal again. How do I fix this?
I've also made a jsbin example
Update
Ok I've found out how to reliably reproduce it in our code, but it doesn't work for the jsbin example. If I create a new tab and go to the same page(have two tabs of the same address) and then refresh the first tab two times then I get this error consistently. We are not using any kind of session or anything else that I can think of that could cause such a problem to only occur in one tab!
Update 2
Not as reliable as I thought, but it definitely only occurs when more than one tab of the same page is open. It'll occur every few reloads of one of the tabs open
I've also updated my code to show an alert when pageLoad(the if statement) is initially undefined and when it is initially defined. Somehow, both alerts are showing up. This code is not duplicated in the rendered page and there is no way that it is being called twice. It is in a top level script element not surrounded by a function or anything! My code ends up looking like
if(pageLoad===undefined){
var pageLoad=function(){};
alert('new');
} else {
alert('old');
}
The code in question -- by itself -- should never result in an infinite recursion issue -- there is no function-statement and all the function objects are eagerly assigned to the variables. (If pageload is first undefined it will be assigned a No-Operation function, see next section.)
I suspect there is additional code/events that is triggering the behavior. One thing that may cause it is if the script/code is triggered twice during a page lifetime. The 2nd time pageload will not be undefined and will keep the original value, which if it is the function that calls the other two functions, will lead to infinite recursion.
I would recommend cleaning up the approach -- and having any issues caused by the complications just disappear ;-) What is the desired intent?
Happy coding.
This is just some additional info for other people trying to look for similar "too much recursion" errors in their code. Looks like firefox (as an example) gets too much recursion at about 6500 stack frames deep in this example: function moose(n){if(n%100 === 0)console.log(n);moose(n+1)};moose(0) . Similar examples can see depths of between 5000 and 7000. Not sure what the determining factors are, but it seems the number of parameters in the function drastically decrease the stack frame depth at which you get a "too much recursion" error. For example, this only gets to 3100:
function moose(n,m,a,s,d,fg,g,q,w,r,t,y,u,i,d){if(n%100 === 0)console.log(n);moose(n+1)};moose(0)
If you want to get around this, you can use setTimeout to schedule iterations to continue from the scheduler (which resets the stack). This obviously only works if you don't need to return something from the call:
function recurse(n) {
if(n%100 === 0)
setTimeout(function() {
recurse(n+1)
},0)
else
recurse(n+1)
}
Proper tail calls in ECMAScript 6 will solve the problem for some cases where you do need to return something from calls like this. Until then, for cases with deep recursion, the only answers are using either iteration, or the setTimeout method I mentioned.
I came across this error. The scenario in my case was different. The culprit code was something like this (which is simple concatenation recessively)
while(row)
{
string_a .= row['name'];
}
I found that JavaScript throws error on 180th recursion. Up till 179 loop, the code runs fine.
The behaviors in Safaris is exactly the same, except that the error it shows is "RangeError: Maximum call stack size exceeded." It throws this error on 180 recursion as well.
Although this is not related to function call but it might help somebody who are stuck with it.
Afaik, this error can also appear if you state a wrong parameter for your ajax request, like
$.getJSON('get.php',{cmd:"1", elem:$('#elem')},function(data) { // ... }
Which then should be
elem:$('#elem').val()
instead.
This will also cause the "too much recursion" issue:
class account {
constructor() {
this.balance = 0; // <-- property: balance
}
set balance( amount ) { // <-- set function is the same name as the property.
this.balance = amount; // <-- AND property: balance (unintended recursion here)
}
}
var acc = new account();
Using unique names is important.
Ok, so why is this happening?
In the set function it isn't actually setting the property to amount, instead it's calling the set function again because in the scope of the set function it is the same syntax for both setting the property AND calling the set function.
Because in that scope this is the same as account and (account OR this).balance = amount can both call the set function OR set the property.
The solution to this is to simply change the name of either the property or the set function in any way (and of course update the rest of the code accordingly).

Does a this setTimeout create any memory leaks?

Does this code create any memory leaks? Or is there anything wrong with the code?
HTML:
<div id='info'></div>
Javascript:
var count = 0;
function KeepAlive()
{
count++;
$('#info').html(count);
var t=setTimeout(KeepAlive,1000);
}
KeepAlive();
Run a test here:
http://jsfiddle.net/RjGav/
You should probably use setInterval instead:
var count = 0;
function KeepAlive() {
$('#info').html(++count);
}
var KAinterval = setInterval(KeepAlive, 1000);
You can cancel it if you ever need to by calling clearInterval(KAinterval);.
I think this will leak because the successive references are never released. That is, the first call immediately creates a closure by referencing the function from within itself. When it calls itself again, the new reference is from the instance created on the first iteration, so the first one could again never be released.
You could test this theory pretty easily by changing the interval to something very small and watch the memory in chrome...
(edit) theory tested with your fiddle, actually, I'm wrong it doesn't leak, at least in Chrome. But that's no guarantee some other browser (e.g. older IE) isn't as good at garbage collecting.
But whether or not it leaks, there's no reason not to use setInterval instead.
This should not create a leak, because the KeepAlive function will complete in a timely manner and thus release all variables in that function. Also, in your current code, there is no reason to set the t var as it is unused. If you want to use it to cancel your event, you should declare it in a higher scope.
Other than that, I see nothing "wrong" with your code, but it really depends on what you are trying to do. For example, if you are trying to use this as a precise timer, it will be slower than a regular clock. Thus, you should either consider either setting the date on page load and calculating the difference when you need it or using setInterval as g.d.d.c suggested.
It is good to have setInterval method like g.d.d.c mentioned.
Moreover, it is better to store $('#info') in a variable outside the function.
Checkout http://jsfiddle.net/RjGav/1/

Javascript Anonymous Functions and Global Variables

I thought I would try and be clever and create a Wait function of my own (I realise there are other ways to do this). So I wrote:
var interval_id;
var countdowntimer = 0;
function Wait(wait_interval) {
countdowntimer = wait_interval;
interval_id = setInterval(function() {
--countdowntimer <=0 ? clearInterval(interval_id) : null;
}, 1000);
do {} while (countdowntimer >= 0);
}
// Wait a bit: 5 secs
Wait(5);
This all works, except for the infinite looping. Upon inspection, if I take the While loop out, the anonymous function is entered 5 times, as expected. So clearly the global variable countdowntimer is decremented.
However, if I check the value of countdowntimer, in the While loop, it never goes down. This is despite the fact that the anonymous function is being called whilst in the While loop!
Clearly, somehow, there are two values of countdowntimer floating around, but why?
EDIT
Ok, so I understand (now) that Javascript is single threaded. And that - sort of - answers my question. But, at which point in the processing of this single thread, does the so called asynchronous call using setInterval actually happen? Is it just between function calls? Surely not, what about functions that take a long time to execute?
There aren't two copies of the variable lying around. Javascript in web browsers is single threaded (unless you use the new web workers stuff). So the anonymous function never has the chance to run, because Wait is tying up the interpreter.
You can't use a busy-wait functions in browser-based Javascript; nothing else will ever happen (and they're a bad idea in most other environments, even where they're possible). You have to use callbacks instead. Here's a minimalist reworking of that:
var interval_id;
var countdowntimer = 0;
function Wait(wait_interval, callback) {
countdowntimer = wait_interval;
interval_id = setInterval(function() {
if (--countdowntimer <=0) {
clearInterval(interval_id);
interval_id = 0;
callback();
}
}, 1000);
}
// Wait a bit: 5 secs
Wait(5, function() {
alert("Done waiting");
});
// Any code here happens immediately, it doesn't wait for the callback
Edit Answering your follow-up:
But, at which point in the processing of this single thread, does the so called asynchronous call using setInterval actually happen? Is it just between function calls? Surely not, what about functions that take a long time to execute?
Pretty much, yeah — and so it's important that functions not be long-running. (Technically it's not even between function calls, in that if you have a function that calls three other functions, the interpreter can't do anything else while that (outer) function is running.) The interpreter essentially maintains a queue of functions it needs to execute. It starts starts by executing any global code (rather like a big function call). Then, when things happen (user input events, the time to call a callback scheduled via setTimeout is reached, etc.), the interpreter pushes the calls it needs to make onto the queue. It always processes the call at the front of the queue, and so things can stack up (like your setInterval calls, although setInterval is a bit special — it won't queue a subsequent callback if a previous one is still sitting in the queue waiting to be processed). So think in terms of when your code gets control and when it releases control (e.g., by returning). The interpreter can only do other things after you release control and before it gives it back to you again. And again, on some browsers (IE, for instance), that same thread is also used for painting the UI and such, so DOM insertions (for instance) won't show up until you release control back to the browser so it can get on with doing its painting.
When Javascript in web browsers, you really need to take an event-driven approach to designing and coding your solutions. The classic example is prompting the user for information. In a non-event-driven world, you could do this:
// Non-functional non-event-driven pseudo-example
askTheQuestion();
answer = readTheAnswer(); // Script pauses here
doSomethingWithAnswer(answer); // This doesn't happen until we have an answer
doSomethingElse();
That doesn't work in an event-driven world. Instead, you do this:
askTheQuestion();
setCallbackForQuestionAnsweredEvent(doSomethingWithAnswer);
// If we had code here, it would happen *immediately*,
// it wouldn't wait for the answer
So for instance, askTheQuestion might overlay a div on the page with fields prompting the user for various pieces of information with an "OK" button for them to click when they're done. setCallbackForQuestionAnswered would really be hooking the click event on the "OK" button. doSomethingWithAnswer would collect the information from the fields, remove or hide the div, and do something with the info.
Most Javascript implementation are single threaded, so when it is executing the while loop, it doesn't let anything else execute, so the interval never runs while the while is running, thus making an infinite loop.
There are many similar attempts to create a sleep/wait/pause function in javascript, but since most implementations are single threaded, it simply doesn't let you do anything else while sleeping(!).
The alternative way to make a delay is to write timeouts. They can postpone an execution of a chunk of code, but you have to break it in many functions. You can always inline functions so it makes it easier to follow (and to share variables within the same execution context).
There are also some libraries that adds some syntatic suggar to javascript making this more readable.
EDIT:
There's an excelent blog post by John Resig himself about How javascript timers work. He pretty much explains it in details. Hope it helps.
Actually, its pretty much guaranteed that the interval function will never run while the loop does as javascript is single-threaded.
There is a reason why no-one has made Wait before (and so many have tried); it simply cannot be done.
You will have to resort to braking up your function into bits and schedule these using setTimeout or setInterval.
//first part
...
setTimeout(function(){
//next part
}, 5000/*ms*/);
Depending on your needs this could (should) be implemented as a state machine.
Instead of using a global countdowntimer variable, why not just change the millisecond attribute on setInterval instead? Something like:
var waitId;
function Wait(waitSeconds)
{
waitId= setInterval(function(){clearInterval(waitId);}, waitSeconds * 1000);
}

Categories