How to free memory of previous stack frame in JavaScript

I have a number of functions calling the next one in a chain, processing a rather large set of data to an equally large set of different data:
function first_step(input_data, second_step_callback)
{
    var result = ... // do some processing
    second_step_callback(result, third_step);
}

function second_step(intermediate_data, third_step_callback)
{
    var result = ... // do some processing
    third_step_callback(result);
}

function third_step(intermediate_data) { }

first_step(huge_data, second_step);
In third_step I am running out of memory (Chrome seems to kill the tab when memory usage reaches about 1.5 GB).
I think that when third_step() is reached, the input_data from first_step() is still retained, because first_step() is still on the call stack, isn't it? At least when the debugger is running, I can see the data.
Obviously I don't need it anymore. In first_step() there is no code after second_step_callback(result, third_step);. Maybe if I could free that memory, my tab might survive processing a data set of this size. Can I do this?

Without seeing a lot more of what you're really doing that is using memory, it's hard for us to tell whether you're just using too much memory or whether you just need to let earlier memory get freed up.
And memory in JavaScript is not "owned" by stack frames, so the premise of the question rests on a bit of a wrong assumption. Memory in JavaScript is garbage collected: data becomes eligible for GC as soon as no live, reachable reference to it remains, and it will be collected the next time the garbage collector gets to run (during JS idle time).
That said, if you have code that makes a succession of nested function calls like your question shows, you can reduce the amount of memory used by doing some of these things:
Clear variables that hold large data (just set them to null) that are no longer needed.
Reduce the use of intermediate variables that hold large data.
Reduce the copying of data.
Reduce string manipulations with intermediate results because each one creates a block of memory that then has to be reclaimed.
Clear the stack by using setTimeout() to run the next step in the chain, allowing the garbage collector a chance to reclaim earlier temporary variables (see the sketch after this list).
Restructure how you process or store the data to fundamentally use less memory.
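As a minimal sketch of the "clear variables" and "clear the stack" suggestions applied to the question's first_step(), with a hypothetical transform() standing in for the elided processing:
function first_step(input_data, second_step_callback)
{
    var result = transform(input_data); // transform() stands in for the real processing
    input_data = null;                  // drop this frame's reference to the large input
    setTimeout(function () {
        // by the time this runs, first_step() has returned and its frame is
        // off the stack, so nothing here pins the original input any longer
        second_step_callback(result, third_step);
    }, 0);
}
Deferring each callback with setTimeout(fn, 0) means every step starts from an empty stack, so no chain of frames accumulates while the data flows through the pipeline.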

Related

Do languages like JS with a copying GC ever store anything on the cpu registers?

I am learning about GCs, and I know there's a thing called HandleScope which 'protects' your local variables from the GC and updates them if a GC heap copy happens. For example, if I have a routine which adds together 2 values and I call it, it may invoke the garbage collector, which will copy the Object that my value is pointing to (or the GC will not even know that the Object the value is pointing to is referenced). A really minimal example:
#include <vector>

Value addValues(Value a, Value b);

std::vector<Value*> gc_vals_with_protection;

Value func(Value a, Value b)
{
    gc_vals_with_protection.push_back(&a); // set protection for a
    gc_vals_with_protection.push_back(&b); // set protection for b
    Value res = addValues(a, b);           // do calculations
    gc_vals_with_protection.pop_back();    // remove protection for b
    gc_vals_with_protection.pop_back();    // remove protection for a
    return res;
}
But this has got me thinking: it will mean that a and b will NEVER be in physical CPU registers, because you have taken their addresses (and CPU registers don't have addresses), which will make calculations on them inefficient. Also, at the beginning of every function, you would have to push back twice to the vector (https://godbolt.org/z/dc6vY1Yc5 for assembly).
I think I may be missing something, as this can't be optimal. Is there any other trick I am missing?
(V8 developer here.)
Do languages like JS with a copying GC ever store anything on the cpu registers?
Yes, of course. Pretty much anything at all that a CPU does involves its registers.
That said, JavaScript objects are generally allocated on the heap anyway, for at least the following reasons:
(1) They are bigger than a register. So registers typically hold pointers to objects on the heap. It's these pointers, not the objects themselves, that Handles are needed for (both to update them, and to inform the GC that there are references to the object in question, so the object must not be freed).
(2) They tend to be much longer-lived than the typical amount of time you can hold something in a register, which is only a couple of machine instructions: since the set of registers is so small, they are reused for something else all the time (regardless of JavaScript or GC etc), so whatever they held before will either be spilled (usually though not necessarily to the stack), or re-read from wherever it originally came from next time it's needed.
(3) They have "pointer identity": JavaScript code like obj1 === obj2 (for objects, not primitives) only works correctly when there is exactly one location where an object is stored. Trying to store objects in registers would imply copying them around, which would break this.
There is certainly some cost to creating Handles; it's faster than adding something to a std::vector though.
Also, when passing Handles from one function to another, the called function doesn't have to re-register anything: Handles can be passed around without having to create new entries in the HandleScope's backing store.
A very important observation is that JavaScript functions don't need Handles for their locals. When executing JavaScript, V8 carefully keeps track of the contents of the stack (i.e. spilled contents of registers), and can walk and update the stack directly. HandleScopes are only needed for C++ code dealing with JS objects, because this technique isn't possible for C++ stack frames (which are controlled by the C++ compiler). Such C++ code is typically not the most critical performance bottleneck of an app; so while its performance certainly matters, some amount of overhead is acceptable.
(Side note: one can "blindly" (i.e. without knowledge about their contents) scan C++ stack frames and do so-called "conservative" (instead of "precise") garbage collection; this comes with its own pros and cons, in particular it makes a moving GC impossible, so is not directly relevant to your question.)
Taking this one step further: sufficiently "hot" functions will get compiled to optimized machine code; this code is the result of careful analysis and hence can be quite aggressive about keeping values (primarily numbers) in registers as long as possible, for example for chains of calculations before the final result is eventually stored in some property of some object.
For completeness, I'll also mention that sometimes, entire objects can be held in registers: this is when the optimizing compiler successfully performs "escape analysis" and can prove that the object never "escapes" to the outside world. A simple example would be:
function silly(a, b) {
    let vector = {x: a, y: b};
    return vector.x + vector.y;
}
When this function gets optimized, the compiler can prove that vector never escapes, so it can skip the allocation and keep a and b in registers (or at least as "standalone values", they might still get spilled to the stack if the function is bigger and needs those registers for something else).

Memory management in browser with each function invocation?

In attempting to determine how to most efficiently perform a series of repetitive steps, I find that I know very little about memory usage in a web browser. I've read about the garbage collector and memory leaks several times before, but that still amounts to very little. The specific scenario in which I am interested is as follows.
A set of data objects is retrieved from indexedDB using a getAll statement over a key range. The data is all text, would never exceed 0.25 MB, and is likely always less than 0.1 MB.
The array of data objects is passed to a function that builds a node in a document fragment and replaces a node in the DOM.
As the user modifies/adds/deletes the data, the state is saved in the database using a get/put sequence, and an add if a new object is built. This is performed on one object at a time, not on the whole array.
When the user is finished, the next set of data is retrieved and the DOM node is replaced again. The users can navigate forward and backward through the packets of data they build.
After the new node is built, the variable that holds the array of data objects is no longer used, and is set to null in an attempt to avoid any potential memory leaks. But I started thinking about what really happens to that data and memory allocation each time the functions are invoked.
Suppose the user rapidly clicks through a series of data packets. Each time the data is retrieved and the new nodes built and replaced in the DOM, is the same memory space being used over and over again? Or does each invocation take up more space, with the memory used by former invocations released later by the GC, such that rapidly clicking through ten nodes temporarily takes up ten nodes' worth of data? I mean this to include all the data involved in these steps, not just the DOM node being replaced.
One reason I ask is that I was considering holding the array of data objects in RAM, updating the objects as the user makes edits/builds more objects, and then just using a put operation to record it in the database rather than a get/put sequence. And that made me wonder what takes place if that array is held in a property of one of the functions. Will the memory allocated for the function property be re-used with each new invocation, or will it take up more space each time as well, until former invocations are released?
I thought that maybe, even when the variables are set to null after the new nodes are built, the GC may take time to release the memory anyway, such that memory is always in use regardless; in that case the code may as well hold one data packet at a time and eliminate the need for a get, waiting for the onsuccess event to update the object and then putting it back again.
Of course, all of this works very quickly with the get/put sequence, but I'd like to understand what is happening regarding memory; and, if setting the variables to null and not holding the array in RAM is really saving nothing, then there is no reason to use get, and the less work done may make it less likely that, after a user works with this tool for an hour or two, there would be a memory issue.
Thank you for considering my rather novice question.
Thank you for the comments. Although they are interesting, they don't really concern what I am asking. I'm not asking about memory leaks, but only commented that the variables are being set to null at the close of the functions. They are set to null because I read that if the references to an area of memory are 'broken', it helps the GC's mark-and-sweep algorithm identify that the particular area of memory may be released. And if some type of reference back to a variable remains, it will at least be pointing to null rather than an area of remaining data. Whether or not that is true, I don't know for sure; but that is what I read. I read many articles on memory leaks, but these three (article 1, article 2, and article 3) I was able to find again easily to provide as examples. In article 3, page 5 shows setting a variable to null as a way to protect against memory leaks by making an object unreachable.
I don't understand why the question would be considered a micro optimization. It's just a novice question about how memory is used in the browser when multiple invocations of functions take place in between GC cycles. And it's just an example of what I've been examining, and does not depend on indexedDB at all, nor whether or not memory leaks really exist in properly coded applications. The description I provided likely made it confusing.
If a function employs local variables to retrieve an array of data objects from an indexedDB object store, and in the transaction.oncomplete handler passes a reference to that area of memory to another function that uses it to build a document fragment and replace a DOM node, what happens to the data? Two functions hold a reference to the array and one to the fragment. If the references are manually broken by setting the variables that pointed to them to null (and even if they are not), the GC will eventually release that memory. But if the user clicks through rapidly, repeatedly invoking these functions in a short interval of time, that is, between GC cycles, will each invocation allocate a new area of memory for the data array and fragment, such that between GC cycles there could be ten sets of data areas held in RAM waiting to be released? And if the array were held in a property of the function that retrieves it, would each invocation re-use that same area of memory, or would the function property simply change its reference to a new area of memory holding the new array, breaking the reference to the area holding the previous one, so that there would still be ten sets of data areas between GC cycles?
As I wrote before, I was wondering if there was a way to re-use the same memory area for each invocation. If that is not possible, and there will always be several sets of data held in RAM waiting to be released between GC cycles, it may be beneficial to retain a reference to the current data set and use it to reduce the amount of work the browser performs to save the current state; which is, in itself, an entirely separate issue, but one that depends on how the memory is used by the browser.
When I observe snapshots of the memory in Firefox developer tools, memory use grows as I step through these data sets, repeatedly retrieving new data and building a new fragment to replace the DOM node; but the amount of that data is relatively small, and it may just build up until the GC cycle runs. From this observation, it appears that each invocation uses a new area of data and breaks the reference to the previous area. Therefore, maintaining a reference to the current data set is not a memory issue, because there are always many such memory areas held in RAM between GC cycles anyway.
Nonetheless, I must have an issue of some sort, because after adding 100 data packets and navigating up and down them with Next/Previous buttons, the memory usage continues to grow, nearly all of it in the domNode section. It starts out at 7 MB total, 6 MB in domNode and 5 MB of that in #document. #document remains at 5 MB, but domNode grows to at least 150 MB as I do nothing but move up and down the records, retrieving data, building and replacing nodes, never editing the data; and there is never more than one node, for it is always a replacement of exactly the same size, because in this test the 100 data packets are identical copies. So: just getAll, build a fragment, replace a DOM node, over and over again.
Thank you.

Can garbage collection happen while the main thread is busy?

Let's say I have a long running loop:
// Let's say this loop takes 10 seconds to execute
for (let i = 0; i <= 1000000; ++i) {
    const garbage = { i };
    // some other code
}
Can the garbage collector run during the loop, or it can only run when the application is idle?
I didn't find any documentation related to this, but because Node.js has the --nouse-idle-notification flag, which in theory disables GC, it makes me think that the GC only runs when the idle notification is sent (when the main thread is not busy).
I am asking this because my loop sometimes has spikes in execution time and want to know if it's possible that the GC might run during the loop, resulting in the lag spike.
V8 developer here. The short answer is that the GC can run at any time and will run whenever it needs to.
Note that the GC is a fairly complex system: it performs several different tasks, and does most of them in incremental steps and/or concurrently with the main thread. In particular, every allocation can trigger a bit of incremental GC work. (Which implies that by very carefully avoiding all allocations, you can construct loops that won't cause GC activity while they run; but it's never the case that loops accumulate garbage that can't get collected -- unless you have a leak in your code of course, where objects are unintentionally being kept reachable.)
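To illustrate that parenthetical, here is a sketch (function names are illustrative) contrasting a loop that allocates on every iteration with one that does not:
function allocatingLoop(n) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
        const wrapper = { i }; // one short-lived object per iteration;
        sum += wrapper.i;      // each allocation can trigger incremental GC work
    }
    return sum;
}

function allocationFreeLoop(n) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
        sum += i; // plain number arithmetic: no heap allocation, no GC pressure
    }
    return sum;
}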
Can the garbage collector run during the loop, or it can only run when the application is idle?
It absolutely can and will run during the loop.
Node.js has the --nouse-idle-notification which in theory disables GC
No, it does not. There is no way to disable GC. That flag disables one particular mechanism for triggering GC activity, but that only means that GC will be triggered by other mechanisms.
the GC only runs when the idle notification is sent (when the main thread is not busy)
No, the idea is to run some extra GC cycles when there is idle time, to save some memory when the application is not busy.
my loop sometimes has spikes in execution time and want to know if it's possible that the GC might run during the loop, resulting in the lag spike
That could be. It could possibly also have to do with optimization or deoptimization of the function. Or it could be something else -- the operating system interrupting your process or assigning it to another CPU core, for example, or hundreds of other reasons. Computers are complex machines ;-)
if you set a variable to null -- garbage collection is done immediately
No, it is not. Garbage collection is never done immediately (at least not in V8).
Conceptually, the garbage collector works in a separate thread so that it does not block the main thread (the UI thread in most cases).
As for your example, there is no problem with the garbage collection thread running in "parallel" with this loop, since the value const garbage = { i } will not be removed as long as it is still referenced.
Also note that there are several generations that the garbage collector passes your values through before removing them completely.

Javascript overhead of repeated allocation

I'm making a pretty simple game just for fun/practice, but I still want to code it well regardless of how simple it is now, in case I want to come back to it, and just to learn.
So, in that context, my question is:
How much overhead is involved in object allocation? And how well does the interpreter already optimize this? I'm going to be repeatedly checking object grid positions, and if an object is still in the same grid square, the grid array is not updated:
if (obj.gridPos == obj.getCurrentGridPos()) {
    // etc
}
But, should I keep an outer "work" point object that the getCurrentGridPos() changes each time or should it return a new point object each time?
Basically, even if the overhead of creating a point object isn't enough to matter in this scenario, which is faster?
EDIT:
This, which will get called for every object each frame?
function getGridPos(x, y) {
    return new Point(Math.ceil(x / 25), Math.ceil(y / 25));
}
or
// outside the frame-by-frame update function looping through every object each frame
tempPoint = new Point();

// each object each frame calls this, passing in tempPoint and checking that value
function makeGridPos(pt, x, y) {
    pt.x = Math.ceil(x / 25);
    pt.y = Math.ceil(y / 25);
}
Between your two code examples that you have now added, I know of no case where the first would be more efficient than the second. So, if you're trying to optimize for performance or memory use, then re-using an existing object will likely be more efficient than creating a new object each time you call the function.
Note: since JS refers to objects by reference, you will have to make sure that your code is not elsewhere hanging on to that object and expecting it to keep its value.
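As a hypothetical illustration of that hazard, here is a variant of the thread's makeGridPos() that returns its reused point:
var scratch = { x: 0, y: 0 };

function makeGridPos(pt, x, y) {
    pt.x = Math.ceil(x / 25);
    pt.y = Math.ceil(y / 25);
    return pt;
}

var a = makeGridPos(scratch, 10, 10);   // a points at scratch
var b = makeGridPos(scratch, 200, 200); // ...and so does b
// a.x is now 8, not 1: a's "value" was silently overwritten by the second call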
Prior answer:
In all programming (regardless of how good the optimizer is), you are always better off caching a result that is calculated from several member variables and used over and over again in the same function, rather than re-calling the function that calculates it each time.
So, if you are calling obj.getCurrentGridPos() more than once, and conditions have not changed such that it might return a different result, then you should cache its value locally (in any language). This is just good programming.
var currentPos = obj.getCurrentGridPos();
And, then use that locally cached value:
if (obj.gridPos == currentPos) {
The interpreter may not be able to do this type of optimization for you because it may not be able to tell whether other operations might cause obj.getCurrentGridPos() to return something different from one call to another, but you the programmer can know that.
One other thing. If obj.getCurrentGridPos() returns an actual object, then you probably don't want to be using == or === to compare objects. That compares ONLY to see if they are literally the same object - it does not compare to see if the two objects have the same properties.
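For example (using plain object literals for illustration):
var p1 = { x: 1, y: 2 };
var p2 = { x: 1, y: 2 };
p1 === p2;                      // false: two distinct objects, despite equal contents
p1 === p1;                      // true: literally the same object
p1.x === p2.x && p1.y === p2.y; // true: compare the properties instead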
This question is VERY difficult to answer because of all the different JavaScript engines out there. The "big 4" browsers all have their own JavaScript engine/interpreter, and each one is going to do its allocation, caching, GC, etc. differently.
The Chrome (and Safari) dev tools have a profiling tab where you can profile memory allocation, timings, etc of your code. This will be a place to start (at least for Chrome and Safari)
I'm not certain if IE or Firefox offer such tools, but I wouldn't be surprised if some third party tools exist for these browsers for testing such things...
Also, for reference -
Chrome uses the V8 JavaScript engine
IE uses the Chakra JavaScript engine
Firefox uses the SpiderMonkey JavaScript engine
Safari uses JavaScriptCore, the engine that's part of WebKit
It's my understanding that garbage collection stops execution on most JS engines. If you're going to be making many objects per iteration through your game loop and letting them go out of scope that will cause slowdown when the garbage collector takes over.
For this kind of situation you might consider making a singleton to pool your objects with a method to recycle them for reuse by deleting all of their properties, resetting their __proto__ to Object.prototype, and storing them in an array. You can then request recycled objects from the pool as needed, only increasing the pool size when it runs dry.
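Here is a minimal sketch of such a pool (illustrative only; it skips the __proto__ reset for simplicity):
var pool = {
    free: [],
    acquire: function () {
        // hand out a recycled object if one is available, otherwise a new one
        return this.free.length > 0 ? this.free.pop() : {};
    },
    release: function (obj) {
        for (var key in obj) {
            delete obj[key]; // strip properties so the object comes back blank
        }
        this.free.push(obj);
    }
};

var p = pool.acquire();
p.x = 3;
p.y = 4;
pool.release(p); // recycled for reuse instead of becoming garbage
Requesting recycled objects from the pool keeps the number of live allocations roughly constant, so the garbage collector has little work to do during the game loop.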
The short answer is to set one current position object and check against itself as you're literally going to use more memory if you create a new object every time you call getCurrentGridPos().
There may be a better place for the "is this a new position" check, since you should only do that check once per iteration.
It seems optimal to update currentGridPos inside a requestAnimationFrame callback, checking it against the new x/y/z positions before updating it, so you can then trigger a changedPosition-type event.
var currentPos = {
    x: 0,
    y: 0,
    z: 0
};

// note: requestAnimationFrame passes a timestamp, not a position, so the
// new position has to be read inside the callback
window.requestAnimationFrame(function () {
    var newPos = obj.getCurrentGridPos();
    if (currentPos.x != newPos.x || currentPos.y != newPos.y || currentPos.z != newPos.z) {
        $.publish("positionChanged"); // tinypubsub https://gist.github.com/661855
    }
});
So, just to be clear: yes, I think you should keep an outer "work" point object that updates every iteration... and while it's updating you could check to see if its position has changed. That is a more intentional way to organize the logic and ensures you don't call getCurrentGridPos more than once per iteration.

javascript memory leak

I've just noticed that some JavaScript I've just written appears to be leaking memory. It's quite a simple piece of code (thanks to jQuery), but I can watch it running in Task Manager and the memory usage is slowly ticking up by between 4 and 40 bytes.
All I'm doing is throwing some data at an ASP.NET MVC controller/action via getJSON:
$(document).ready(function () {
    var olddata = "";
    window.setInterval(function () {
        var options = JSON.stringify({
            orderby: "name"
        });
        var params = {
            options: options,
            data: olddata ? JSON.stringify(olddata) : ""
        };
        $.getJSON("/Home/GetTasks", params, function (json) {
            olddata = json;
            json = null;
        });
        params = null;
        options = null;
    }, 1000);
});
I've bumped up the timer value just to see the problem more readily. I'm obviously doing something wrong here but can't see what.
Should I be cleaning up the getJSON call?
TIA.
How do you know you're actually leaking memory?
At small numbers like 4 and 40 bytes, you could just be seeing heap growth, but some of the new blocks in the heap are "free" and available for future use so while the overall app memory use grows, the memory isn't actually leaking and will be available for future use so it won't grow forever.
If this is the entire extent of your experiment, then I don't see any issues with the code.
There are three function closures here. The $(document).ready() closure lasts the lifetime of your code, but it's just a one-time deal so there should be no issue.
The anonymous function passed to setInterval() keeps the $(document).ready() closure alive. Each call to the setInterval() anonymous function is a new call that gets a new set of local variables and releases its old ones when the prior call runs to completion.
The anonymous function passed to getJSON() creates a closure on the setInterval anonymous function, but that closure should only last until the getJSON function finishes and when it does the setInterval() anonymous function closure should be released.
The only closure that I see that lasts here is the $(document).ready() closure which is something you intend and it's only created once so it should cause no leak.
All the local variables in the getJSON anonymous function are going to be released when it finishes. The only data from the getJSON call that survives is your assignment of:
olddata = json;
But each successive assignment just replaces the data from the previous call, so the previous data is no longer referenced and is available for recycling by the garbage collector.
There are no DOM manipulations here so there is no opportunity for cross or circular references between DOM and JS.
My conclusion is that I don't see anything that will leak. I see plenty of things using temporary memory so I suspect what you're seeing in the process memory usage is just heap growth, but growth in a way that the memory that has grown will eventually get reused. If the browser is also caching the JSON results, you could be seeing memory cache growth too.
Unfortunately, in today's browsers, it's difficult to tell when it's really a memory leak versus a temporary expansion of browser memory used by caching, general heaps, etc... In the extreme, you could set all caches to be very small and run this for a long time (hundreds of thousands of iterations). If it's not a leak, memory use should eventually flatten out. If it is a leak, memory usage should continue to grow relatively linearly.
Disclaimer: the one disclaimer here is that I'm assuming that the jQuery function $.getJSON() doesn't leak itself and always finishes in a way that cleans up the closure it creates, even if the ajax call is not successful.
