I have an issue where the main JavaScript thread is being blocked by garbage collection. The use case is that I am streaming JSON from a server with oboe.js, processing some nodes, and putting the nodes into a WebGL renderer, rendering them while they are still streaming. This works pretty well, except that the garbage collector drops me to 2 fps because it collects blocks of ~9 MB, which takes ~500 ms.
The problem is that I do not know what is being collected and how I can prevent it from being collected. So my question is twofold:
How can I either reduce the size of the blocks the garbage collector collects, or postpone the moment at which it runs?
How can I debug what is being collected?
I found the issue after all. There was a parsing process that spat out a very complex, deeply nested object as its result.
someModule.parse(someString, function (result) {
    // result = complex, deeply nested object
    // process result
});
After the "process result" stage, the result variable became eligible for garbage collection, and it was collected immediately. I cached the result in a longer-lived variable and released it 'manually' later, which solved the problem.
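A minimal sketch of that workaround, with placeholder names (the real code differs):

var cachedResult = null;

someModule.parse(someString, function (result) {
    cachedResult = result;   // the result now outlives the callback, so it is not collected mid-render
    // ... process cachedResult and feed it to the renderer ...
});

// Later, once rendering no longer needs it:
// cachedResult = null;     // release it 'manually'; the GC reclaims it on its own schedule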
Related
In attempting to determine how to most efficiently perform a series of repetitive steps, I find that I know very little about memory usage in a web browser. I've read about the garbage collector and memory leaks several times before, but that is still very little. The specific scenario I am interested in is as follows.
A set of data objects is retrieved from indexedDB using a getAll statement over a key range. The data is all text, would never exceed 0.25 MB, and is likely always less than 0.1 MB.
The array of data objects is passed to a function that builds a node in a document fragment and replaces a node in the DOM.
As the user modifies/adds/deletes the data, the state is saved in the database using a get/put sequence, and an add if a new object is built. This is performed on one object at a time, not on the whole array.
When the user is finished, the next set of data is retrieved and the DOM node is replaced again. The users can navigate forward and backward through the packets of data they build.
After the new node is built, the variable that holds the array of data objects is no longer used and is set to null in an attempt to avoid any potential memory leaks. But I started thinking about what really happens to that data and its memory allocation each time the functions are invoked.
Suppose the user rapidly clicks through a series of data packets. Each time data is retrieved and new nodes are built and swapped into the DOM, is the same memory space used over and over, or does each invocation take up more space, with the memory from earlier invocations released later by the GC, such that rapidly clicking through ten nodes temporarily takes up ten nodes' worth of data? I mean this to include all the data involved in these steps, not just the DOM node being replaced.
One reason I ask is that I was considering holding the array of data objects in RAM, updating the objects as the user makes edits/builds more objects, and then just using a put operation to record them in the database rather than a get/put sequence. That made me wonder what takes place if the array is held in a property of one of the functions. Will the memory behind the function property be reused with each new invocation, or will each invocation take up more space as well, until former invocations are released?
I thought that, even when the variables are set to null after the new nodes are built, the GC may take time to release the memory anyway, such that memory is always in use; in that case the code may as well hold one data packet at a time, eliminating the need for a get and for waiting on its onsuccess event before updating the object and putting it back.
Of course, all of this works very quickly with the get/put sequence, but I'd like to understand what is happening with memory; and if setting the variables to null and not holding the array in RAM really saves nothing, then there is no reason to use get, and doing less work may make it less likely that a memory issue appears after a user works with this tool for an hour or two.
Thank you for considering my rather novice question.
Thank you for the comments. Although they are interesting, they don't really concern what I am asking. I'm not asking about memory leaks; I only commented that the variables are being set to null at the close of the functions. They are set to null because I read that if the references to an area of memory are 'broken', it helps the GC's mark-and-sweep algorithm identify that the particular area of memory may be released. And if some type of reference back to a variable remains, it will at least be pointing to null rather than to an area of remaining data. Whether or not that is true, I don't know for sure; but that is what I read. I read many articles on memory leaks, but these three (article 1, article 2, and article 3) were the ones I could find again easily to provide as examples. In article 3, page 5 shows setting a variable to null as a way to protect against memory leaks by making an object unreachable.
I don't understand why the question would be considered a micro-optimization. It's just a novice question about how memory is used in the browser when multiple invocations of functions take place between GC cycles. It's just an example of what I've been examining; it does not depend on indexedDB at all, nor on whether memory leaks really exist in properly coded applications. The description I provided likely made it confusing.
If a function uses local variables to retrieve an array of data objects from an indexedDB object store, and in the transaction.oncomplete handler passes a reference to that memory to another function that builds a document fragment and replaces a DOM node, what happens to the data? Two functions hold a reference to the array, and one holds a reference to the fragment. If the references are manually broken by setting the variables to null (and even if they are not), the GC will eventually release that memory. But if the user clicks through rapidly, repeatedly invoking these functions in a short interval, that is, between GC cycles, will each invocation allocate a new area of memory for the data array and fragment, such that between GC cycles there could be ten sets of data held in RAM awaiting release? And if the array were held in a property of the function that retrieves it, would each invocation reuse the same area of memory, or would the function property simply re-point to a new area holding the new array, breaking the reference to the previous one, still leaving ten sets of data between GC cycles?
As I wrote before, I was wondering whether there is a way to reuse the same memory area for each invocation (a rough sketch of what I mean follows). If that is not possible, and several sets of data will always be held in RAM awaiting release between GC cycles, it may be beneficial to retain a reference to the current data set and use it to reduce the amount of work the browser performs to save the current state, which is in itself an entirely separate issue but one that depends on how the browser uses memory.
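Something like this, where db, buildFragment, and the store name "packets" are placeholders for my real code:

function loadPacket(range) {
    var store = db.transaction("packets").objectStore("packets");
    store.getAll(range).onsuccess = function (e) {
        // The property simply re-points to the new array; the previous
        // array becomes unreachable and waits for the next GC cycle.
        loadPacket.current = e.target.result;
        buildFragment(loadPacket.current);
    };
}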
When I observe memory snapshots in the Firefox developer tools, memory use grows as I step through these data sets, repeatedly retrieving new data and building a new fragment to replace the DOM node; but the amount of data is relatively small, and it may just build up until the GC cycle runs. From this observation, it appears that each invocation uses a new area of memory and breaks the reference to the previous one. Therefore, maintaining a reference to the current data set is not a memory issue, because many such memory areas are held in RAM between GC cycles anyway.
Nonetheless, I must have an issue of some sort, because after adding 100 data packets and navigating up and down through them with Next / Previous buttons, memory usage continues to grow, nearly all of it in the domNode section. It starts at 7 MB total, 6 MB in domNode and 5 MB of that in #document. #document remains at 5 MB, but domNode grows to at least 150 MB while I do nothing but move up and down the records, retrieving data, building and replacing nodes, never editing the data; and there is never more than one node, for it is always a replacement of exactly the same size, because in this test the 100 data packets are identical copies. So: just getAll, build a fragment, replace a DOM node, over and over again.
Thank you.
I have a number of functions calling the next one in a chain, processing a rather large set of data into an equally large set of different data:
function first_step(input_data, second_step_callback)
{
    var result = ...; // do some processing
    second_step_callback(result, third_step);
}

function second_step(intermediate_data, third_step_callback)
{
    var result = ...; // do some processing
    third_step_callback(result);
}

function third_step(intermediate_data) { }
first_step(huge_data, second_step);
In third_step I am running out of memory (Chrome seems to kill the tab when memory usage reaches about 1.5 GB).
I think that when third_step() is reached, the input_data from first_step() is still retained, because first_step() is still on the call stack, isn't it? At least while the debugger is running, I can see the data.
Obviously I don't need it anymore. In first_step() there is no code after second_step_callback(result, third_step);. Maybe if I could free that memory, my tab might survive processing a data set of this size. Can I do this?
Without seeing a lot more of what you're really doing that is using memory, it's hard for us to tell whether you're just using too much memory or whether you just need to let earlier memory get freed up.
Also, memory in JavaScript is not "owned" by stack frames, so the premise of the question rests on a bit of a wrong assumption. Memory in JavaScript is garbage collected: data becomes eligible for GC when no live, reachable reference to it remains, and it is actually collected the next time the garbage collector runs (during JS idle time).
That said, if you have code that makes a succession of nested function calls like your question shows, you can reduce the amount of memory usage, by doing some of these things:
Clear variables that hold large data and are no longer needed (just set them to null).
Reduce the use of intermediate variables that hold large data.
Reduce the copying of data.
Reduce string manipulations with intermediate results, because each one creates a block of memory that then has to be reclaimed.
Clear the stack by using setTimeout() to run the next step in the chain, giving the garbage collector a chance to reclaim earlier temporary variables (see the sketch after this list).
Restructure how you process or store the data to fundamentally use less memory.
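Here is a minimal sketch combining points 1 and 5, using the function names from the question; transform() is a stand-in for the real processing:

function first_step(input_data, second_step_callback) {
    var result = transform(input_data); // transform() stands in for the real processing
    input_data = null;                  // point 1: drop the large input right away
    // Point 5: leave the current stack before continuing, so this frame
    // (and anything it still referenced) is gone while the next step runs.
    setTimeout(function () {
        second_step_callback(result, third_step);
        result = null;                  // let the intermediate data be collected too
    }, 0);
}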
I don't know too much about the JavaScript garbage collector, just that it attempts to manage references so that unreferenced objects can be periodically purged from memory. I was thinking about something that I thought might improve performance if it was feasible by the language implementers.
It would go something like this. In a file add a line:
"no gc";
This is similar to the "use strict" directive. It would mark everything defined in the file as exempt from garbage collection. I'm thinking this would be used in libraries like jQuery and Underscore: all of the helper methods would be marked and stored in a separate area of memory that the GC does not manage.
While I know this might keep around things that are never used, it would at least isolate them from the periodic GC process. So while we might gobble up some extra memory, we would at least lighten the load of GC processing.
I apologize for the naivety of this suggestion as I have never implemented GC. I am just wondering if this idea is feasible or if JavaScript somehow does this already.
If you want to keep objects around as a cache, you have the global scope.
In the browser, the global scope is window;
hence, if you don't want object X ever to be garbage collected, you can simply write
window.nogc = X;
Since window, being globally scoped, is never garbage collected, its child references won't be garbage collected either, until we explicitly break them.
Garbage collection only runs when the thread is idle, so nothing would be saved: the GC already waits until the system isn't busy.
So no, this isn't possible.
You can ensure that an object does not get collected by referencing it from a GC root, but you cannot ensure that it doesn't get processed by the GC.
The reason is that GC in JS VMs is usually implemented via mark-and-sweep, or a functionally equivalent method. The basic premise is that the GC runs a cycle that goes like this (a toy sketch follows the list):
The GC marks all objects in the heap as being "potentially available for release". This is usually done by a flag toggle that changes the interpretation of the existing marks on objects from "need to keep" to "safe to release", so no actual iteration over the objects occurs at this stage. Telling the GC to "not mark" certain objects would actually require extra operations, not fewer.
The GC starts at the GC roots and traverses the reference graph, changing the mark of objects from "safe to release" to "need to keep". This is the "Mark" phase. GC roots can be global objects, the currently executing stack, pending callbacks on the event loop, etc. The traversal itself can be done in various ways, DFS being the simplest.
The GC goes over all objects in the heap and removes any that are still marked "safe to release". This is the "Sweep" phase. Many optimizations exist in this phase, allowing the GC to free the memory of a whole group of objects in a single operation; nevertheless, it requires at least some iteration, over groups of objects if not over the objects themselves.
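A toy sketch of that cycle over a hand-rolled object graph, where roots is an array of nodes and heap is a Set of all nodes (real VMs flip the meaning of the mark bit instead of resetting every object, as noted above; this is purely illustrative):

function collect(roots, heap) {
    // Start of the cycle: everything is "safe to release".
    heap.forEach(function (node) { node.marked = false; });

    // Mark phase: DFS from the roots flips reachable nodes to "need to keep".
    var stack = roots.slice();
    while (stack.length > 0) {
        var node = stack.pop();
        if (!node.marked) {
            node.marked = true;
            stack.push.apply(stack, node.refs);
        }
    }

    // Sweep phase: drop everything still marked "safe to release".
    heap.forEach(function (node) {
        if (!node.marked) heap.delete(node);
    });
}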
Now here's the issue with a "non-GC" arena: suppose an object in your "non-GC" arena references a regular object. During the "Mark" phase, if that regular object is not marked, it will be released. That means all of your "non-collectible" objects would need to be GC roots in order to keep the regular objects they refer to. And being a GC root offers no performance advantage over being directly referenced by one: both must participate equally in the "Mark" phase.
The only alternative is for your non-collectible objects to be unable to strongly reference collectible objects at all. This can be achieved by maintaining your own binary heap rather than using native JS objects. The binary data is not interpreted as references, so the heap object finishes its marking phase in a single operation. This is what asm.js does: it preallocates a large array to act as its internal heap. To the VM, the entire array counts as one big object, so no garbage collection is done on any data structures encoded within it. The drawback of this method is that you need to encode all of your objects and data structures into the heap's binary format and decode them when you want to use them; with asm.js, the compiler handles this.
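To make the idea concrete, here is a minimal sketch with an invented two-field record layout (real asm.js code is generated by a compiler, not hand-written like this):

var FIELDS = 2;                                  // invented layout: one (x, y) pair per record
var CAPACITY = 1 << 20;                          // room for about a million records
var heap = new Float64Array(CAPACITY * FIELDS);  // a single big object as far as the VM is concerned

function writeRecord(i, x, y) {
    heap[i * FIELDS] = x;
    heap[i * FIELDS + 1] = y;
}

function readRecord(i) {
    // Decoding step: rebuild a temporary object from the binary layout.
    return { x: heap[i * FIELDS], y: heap[i * FIELDS + 1] };
}

// The GC never traverses the records individually: heap is one allocation,
// so the mark phase touches it once, no matter how many records it encodes.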
In JavaScript, if you want an object to be collected, you have to remove all references to it.
Let's say we have an object like this:
var myobj = { obj: {} };
If we want the inner obj to be garbage collected, we have to do this:
myobj["obj"] = null;
delete myobj["obj"];
But if you have defined another variable referencing that object, it won't be collected by those lines. For instance, if you do:
var obj = myobj.obj;
If you really want the object to be collected, you have to clear this variable as well:
obj = null;
(Note that delete only removes object properties; it cannot remove a variable declared with var, so setting the variable to null is what actually drops the reference.)
Then, once no reference remains, the object can be collected.
In other words, if you want to prevent your object from being garbage collected, create a reference to it somewhere and keep it private.
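For example, a closure can hold that private reference (a hedged sketch; the names are mine):

var cache = (function () {
    var kept = null;                          // the private reference, unreachable from outside
    return {
        keep: function (obj) { kept = obj; },     // obj stays alive from now on
        release: function () { kept = null; }    // now the GC may reclaim it
    };
})();

var someObject = { /* anything worth keeping */ };
cache.keep(someObject);    // someObject cannot be collected now
cache.release();           // it can be collected again (if nothing else refers to it)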
I'm having some issues with garbage collection in Chrome. I have some AJAX code that retrieves a large number of objects from a web service (tens of thousands) and then transforms the data into various objects. Shortly after the response is received, the JS hangs for around 7 seconds while Chrome does garbage collection.
I want to delay GC until after my code finishes running. I thought that saving a reference to the original JSON objects returned by the service and disposing of it later would do the trick, but it hasn't had any effect; the GC still runs right after the AJAX response arrives. When I try to take a heap snapshot to verify that this is what's causing the GC, Chrome crashes (something it's really good at doing, I might add...).
A couple related questions:
Does Chrome not use a separate thread for GC?
Is there anything I can do to delay the GC until after my code has finished running?
With the following code, whenever console.log is enabled, the String referenced by o.big does not get garbage collected. As soon as I remove the logging statement, the memory for the big String is freed after the handler function finishes executing.
I am using Firefox 9.0.1 and the memory profiling was done with about:memory.
$(function() {
    var handler = function() {
        var o = {};
        o.big = (new Array(20 * 1024 * 1024)).join("x");
        console.log(o.big);
        delete o.big;
    };
    $("#btn").click(handler);
});
I am fairly new to JavaScript, and it would be great if someone could point out why the String does not get garbage collected when it is used within console.log.
Although I'm not too familiar with how Firefox / Firebug handles console.log(), I assume that the console, by showing the "logged" object, provides a way of examining and interacting with it. That is at least the case in Chrome.
Therefore, the console needs a reference to the object, which is kept in memory and cannot be garbage collected until the console releases it (which may not happen until the page hosting the script is reloaded).
Finally, keep in mind that there is no explicit relationship between the delete operator and garbage collection.
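A small illustration of that last point; the alias variable is mine, for demonstration:

var o = { big: new Array(1024 * 1024).join("x") };
var alias = o.big;   // a second reference to the same string

delete o.big;        // removes the property from o...
// ...but the string itself stays reachable through alias (or through the
// console, once it has been logged), so the garbage collector keeps it.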
Not sure if Firefox keeps a reference to the original string there. I'd argue that console.log() retains a copy, since strings are first-class values in JS.
You can see the string-chars memory usage dropping in about:memory, but heap-unclassified rising. This may be related to https://bugzilla.mozilla.org/show_bug.cgi?id=563700, or it may mean that FF's GC is broken.