When I remove items from an array in JavaScript using the splice method, an array of the removed items is returned.
var a = [{name:'object1'},{name:'object2'},{name:'object3'}];
// a.splice(0,2) -> [{name:'object1'},{name:'object2'}]
// Where do these guys live now? Are they really gone?
Do I then need to call 'delete' on those returned objects to make sure they are taken out of memory? Does the garbage collector just handle this? Can I trust that?
The objects are 'gone' (from your perspective) and the GC will actually free the memory when it deems appropriate. JavaScript does not give you explicit control over garbage collection.
If you're concerned about performance, it's generally better (after profiling, of course) to focus on saving allocations rather than worrying about when exactly things will get GC'd, since that behavior will change depending on which JS engine you're on.
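As a small illustration of the point above (a sketch, not part of the original question): the removed objects stay reachable as long as you keep the array that splice returns.
var a = [{ name: 'object1' }, { name: 'object2' }, { name: 'object3' }];
var removed = a.splice(0, 2); // the two removed objects are still referenced by `removed`
// ... use `removed` if you need it ...
removed = null; // now nothing references them, so they become eligible for collection;
                // the engine frees the memory whenever its GC decides to run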
Related
I have some very large objects that are used intensively, but only occasionally, in my Node.js program. Loading these objects is expensive. In total they occupy more memory than I have in the system.
Is there any way to create a "weak reference" in JavaScript, so that the garbage collector will remove my objects when memory is low, and so that I can check whether an object is still there and reload it if it was garbage collected since my last access?
The particular use case I had in mind was cartographic reprojection and tiling of gigabytes of map imagery.
Is there any way to create a "weak reference" in JavaScript, so that the garbage collector will remove my objects when memory is low?
No, JavaScript does not have that.
I don't think a WeakMap or WeakSet will offer you anything useful here. They don't do what you're asking for. Instead, they let you hold a reference to something without preventing its garbage collection. But if there are no other references to the data, it will be garbage collected right away. So they won't keep the data around for a while like you want. And if you keep any other reference to those objects (to keep them around), they will never become eligible for garbage collection. JavaScript doesn't offer a weak reference that is only garbage collected when memory starts to get full. Something is either eligible for garbage collection or it isn't; if it's eligible, it will be freed in the next GC pass.
It sounds like what you probably want is a memory cache. You could decide how large you want the cache to be and then keep items in the cache based on some strategy. The most common strategy is LRU (least recently used) where you kick an item out of the cache when you reach the cache size limit and you need to load a new item in the cache. With LRU, you keep track of when an item was last used from the cache and you kick out the oldest one. If you are trying to manage the cache to a memory usage size, you will have to have some scheme for estimating the memory usage of your objects in the cache.
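As a rough illustration, here is a minimal LRU cache sketch built on a Map (the class name, the size limit, and the idea of caching tiles are my own assumptions, not from the question):
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // Map remembers insertion order, which we reuse as recency order
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // re-insert so this key becomes the most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value; // least recently used key
      this.map.delete(oldest);                     // dropping it makes the value eligible for GC
    }
  }
}
// Hypothetical usage: cache expensive-to-load map tiles under some key.
const tiles = new LruCache(100);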
Note that many databases will essentially offer you this functionality as a built-in feature since they will usually contain some sort of caching themselves so if you request an item from the database that you have recently requested, it probably comes from a read cache. You don't really say what your objects are so it's hard for us to suggest exactly how they could be used from a database.
I just stumbled upon this jsperf result: http://jsperf.com/delet-is-slow
It shows that using delete is slow in JavaScript, but I am not sure I get why. What is the JavaScript engine doing behind the scenes to make this slow?
I think the question is not why delete itself is slow... the speed of a single delete operation is not worth measuring...
The JS perf link that you show does the following:
Create two arrays of 6 elements each.
Delete at one of the indexes of one array.
Iterate through all the indexes of each array.
The script shows that iterating through an array on which delete was applied is slower than iterating through a normal array.
You should ask yourself: why does delete make an array slow?
The engine internally stores array elements in contiguous memory space and accesses them using a numeric index.
That's what they call a fast access array.
If you delete one of the elements in this ordered and contiguous block, you force the array to mutate into dictionary mode: what was previously the exact location of the item in the array (the index) becomes a key in a dictionary under which the array has to look the element up.
So iterating becomes slow, because you no longer move to the next space in memory; instead you perform a hash lookup over and over again.
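A small sketch of the difference (whether a given engine really switches to a dictionary representation is an implementation detail, so treat this as illustrative):
var dense = [1, 2, 3, 4, 5];
dense.splice(2, 1);   // [1, 2, 4, 5]: the array stays dense and contiguous

var holey = [1, 2, 3, 4, 5];
delete holey[2];      // [1, 2, <hole>, 4, 5]: length is still 5, index 2 is now a hole,
                      // and the engine may fall back to a slower dictionary representation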
You'll get a lot of answers here about micro-optimisation, but delete really does sometimes have severe problems where it becomes incredibly slow in certain scenarios that people should be aware of in JS. These are, to my knowledge, edge cases and may or may not apply to you.
I recommend profiling and benchmarking in different browsers to detect these anomalies.
I honestly don't know the reasons, as I tend to work around them, but I would guess some combination of quirks in the GC (it might be getting invoked too often), brutal rehashing, optimisations for other cases, and weird object structure / bad time complexity.
The cases usually involve moderate to large numbers of keys, for example:
Delete from objects with many keys:
(function () {
    var o = {}, s, i, c = console;
    s = new Date(); for (i = 0; i < 1000000; i += 10) o[i] = i; c.log('Set: ' + (new Date() - s));
    s = new Date(); for (i = 0; i < 50000; i += 10) delete o[i]; c.log('Delete: ' + (new Date() - s));
})();
Chrome:
Set: 21
Delete: 2084
Firefox:
Set: 74
Delete: 2
I have encountered a few variations of this and they are not always easy to reproduce. A signature is that performance usually seems to degrade exponentially. In one case in Firefox, delete inside a for-in loop would degrade to around 3-6 operations per second, whereas deleting while iterating Object.keys would be fine.
I personally tend to think that these cases can be considered bugs. You get massive, disproportionate, asymptotic performance degradation that you can work around in ways that shouldn't change the time or space complexity, or that might even normally make performance moderately worse. This means that, considered as a declarative language, JS gets the implementation/optimisations wrong. Map does not have the same problem with delete, so far as I have seen.
Reasons:
To be sure, you would have to look into the source code or run some profiling.
delete in various scenarios can change speed arbitrarily based on how engines are written and this can change from version to version.
JavaScript objects tend not to be used with large numbers of properties, and delete is called relatively infrequently in everyday usage. They're also used heavily as part of the language itself (they're essentially associative arrays). Virtually everything relies on the object implementation: if you create a function, that creates an object, and even a numeric literal gets wrapped in an object when you call a method on it.
It's quite possible for it to be slow purely because it hasn't been well optimised (either neglect or other priorities) or due to mistakes in implementation.
There are some common possible causes aside from optimisation deficit and mistakes.
Garbage Collection:
A poorly implemented delete function may inadvertently trigger garbage collection excessively.
Garbage collection has to iterate over everything in memory to find out if there are any orphans, traversing variables and references as a graph.
The need to detect circular references can make GC especially expensive. GC without circular reference detection can be done using reference counters and a simple check. Circular reference checking requires traversing the reference graph or another structure for managing references and in either case that's a lot slower than using reference counters.
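For illustration, here is the kind of cycle that plain reference counting could never reclaim but a tracing (mark-and-sweep) collector handles fine:
function makeCycle() {
  var a = {};
  var b = { other: a };
  a.other = b;          // a and b now reference each other
}                       // neither object escapes the function, so both become unreachable
makeCycle();            // a tracing GC can reclaim the pair; pure reference counting could not,
                        // because each object's count never drops to zero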
In normal implementations, rather than freeing and rearranging memory every time something is deleted, things are usually marked as possible to delete with GC deferred to perform in bulk or batches intermittently.
Mistakes can potentially lead to the entire GC being triggered too frequently. That may be because someone put in an instruction to run the GC on every delete, or because of a mistake in its statistical heuristics.
Resizing:
It's quite possible that, for large objects, the memory remapping needed to shrink them is not well optimised. When you have dynamically sized structures inside the engine, it can be expensive to resize them. Engines will also have varying levels of their own memory management on top of what the operating system provides, which significantly complicates things.
Even where an engine manages its own memory efficiently, a delete that removes an object efficiently without the need for a full GC run (it has no circular references) can still trigger the need to rearrange memory internally to fill the gap.
That may involve allocating a block of the new size, copying everything from the old memory to the new memory, and then freeing the old memory. It can also require updating all of the pointers to the old memory in some cases (i.e., where a pointer to a pointer [all roads lead to Rome] wasn't used).
Rehashing:
It may also rehash (needed because the object is an associative array) on deletes. Often engines only rehash on demand, when there are hash collisions, but some may also rehash on deletes.
Rehashing on deletes prevents a problem where you add 10 items to an object, then add a million items, then remove that million, and the object left with the 10 items both takes up more memory than it needs and is slower.
A hash with ten items needs ten slots with an optimal hash, though in practice it will be rounded up to 16 slots. That times the size of a pointer is 16 * 8 bytes, or 128 bytes. When you add the million items it needs on the order of a million (roughly 2^20) slots, which at 8 bytes per slot is about 8 megabytes. If you delete the million keys without rehashing, the object with ten items is taking up 8 megabytes when it only needs 128 bytes. That makes it important to rehash on item removal.
The problem with this is that when you add, you know whether you need to rehash, because there will be a collision. When deleting, you don't know whether you need to rehash or not. It's easy to make a performance mistake with that and rehash every time.
There are a number of strategies for choosing reasonable downhashing intervals, although without making things complicated it can be easy to make a mistake. Simple approaches (e.g., time since last rehash, size of keys versus minimum size, history of collision pairs, tombstone keys, deleting in bulk as part of GC, etc.) tend to work moderately well on average but can also easily get stuck in corner cases. Some engines might switch to a different hash implementation for large objects, such as a nested one, whereas others might try to use a single implementation for everything.
For simple implementations, rehashing tends to work the same as resizing: make an entirely new table and then insert the old entries into it. For rehashing, however, a lot more needs to be done beforehand.
No Bulk:
The delete keyword doesn't let you remove a bunch of things at once from a hash. It's usually much more efficient to delete items from a hash in bulk (most operations on the same structure work better in bulk), but delete only lets you remove keys one by one. That makes it slow by design for cases with multiple deletes on the same object.
Because of this, only a handful of implementations of delete would be comparable in speed to creating a new object and inserting only the items you want to keep (though this doesn't explain why, in some engines, even a single delete is slow in its own right).
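A sketch of the "create a new object and insert the items you want to keep" approach just mentioned (the function name and parameters are illustrative):
function withoutKeys(obj, keysToRemove) {
  var remove = new Set(keysToRemove);
  var result = {};
  for (var key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key) && !remove.has(key)) {
      result[key] = obj[key];
    }
  }
  return result; // the old object becomes garbage in one go instead of shrinking delete by delete
}

var trimmed = withoutKeys({ a: 1, b: 2, c: 3 }, ['b', 'c']); // { a: 1 }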
V8:
Slow delete of object properties in JS in V8
Apparently this was caused by switching out implementations and a problem similar to, but different from, the downsizing problem seen with hashes (downsizing to the flat array / run-through implementation rather than the hash size). At least I was close.
Downsizing is a fairly common problem in programming which results in what is somewhat akin to a game of Jenga.
I don't know too much about the JavaScript garbage collector, just that it attempts to manage references so that unreferenced objects can be periodically purged from memory. I was thinking about something that I thought might improve performance if it was feasible by the language implementers.
It would go something like this. In a file add a line:
"no gc";
This is similar to the "use strict" directive. It would mark everything defined in the file as exempt from garbage collection. I'm thinking this would be used in libraries like jQuery and Underscore: all of the helper methods would be marked and stored in a separate area of memory that is not managed by the GC.
While I know this might end up keeping around stuff that is never used, it would at least isolate it from the periodic GC process. So while we perhaps gobble up some extra memory, we at least lighten the load of GC processing.
I apologize for the naivety of this suggestion as I have never implemented GC. I am just wondering if this idea is feasible or if JavaScript somehow does this already.
If you want to keep objects around as a cache, you have the global scope.
In the browser, the global scope is window,
so if you don't want object X to be garbage collected, you can simply write
window.nogc = X;
Since window is global and is never garbage collected, its child references won't be garbage collected either until we explicitly remove them.
Garbage collection only runs when the thread is free. Nothing would be saved because GC only occurs when the system isn't busy.
So no, this isn't possible.
You can ensure that an object does not get collected by referencing it from a GC root, but not that it doesn't get processed by the GC.
The reason is that GC in JS VMs is usually implemented via Mark-and-Sweep, or a method that is functionally equivalent. The basic premise is that the GC goes through a cycle that goes like this:
The GC marks all objects in the heap as being "potentially available for release". This is usually done by a flag toggle which changes the interpretation of the existing marks on objects from meaning "need to keep" to "safe to release". So no actual iteration over the objects occurs at this stage. Telling the GC to "not mark" certain objects would actually require extra operations, not less.
The GC starts at the GC roots and traverses the reference tree, changing the mark of objects from "safe to release" to "need to keep". This is the "Mark" phase. GC roots can be global objects, the currently executing stack, pending callbacks on the event loop, etc. The tree traversal itself can be done in various ways as well, with DFS being the simplest.
The GC goes over all objects in the heap and removes any that are still marked "safe to release". This is the "Sweep" phase. Many optimizations exist in this phase, which allow the GC to free memory used by a group of objects in a single operation. Nevertheless, this requires at least some level of iteration, over groups of objects if not over the objects themselves.
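A toy sketch of those three phases over an explicit object graph (the heap array, refs field, and marked flag are all illustrative; a real collector works on the VM's internal heap, not on JS objects like this):
function collect(heap, roots) {
  // Phase 1: flip every object to "safe to release" (a single flag toggle per object here).
  heap.forEach(function (obj) { obj.marked = false; });

  // Phase 2 (Mark): traverse from the roots, marking everything reachable as "need to keep".
  function mark(obj) {
    if (!obj || obj.marked) return;
    obj.marked = true;
    (obj.refs || []).forEach(mark); // simple depth-first traversal of references
  }
  roots.forEach(mark);

  // Phase 3 (Sweep): keep only the objects still marked; everything else is released.
  return heap.filter(function (obj) { return obj.marked; });
}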
Now here's the issue with setting a "non-GC" arena: Suppose that an object in your "non-GC" arena was to reference a regular object. During the "Mark" phase, if that object is not marked, it would be released. That means that all of your "non-collectible" objects need to be GC roots in order to "keep" regular objects to which they refer. And being a GC root offers no performance advantage over being directly referenced by a GC root. Both must equally participate in the "Mark" phase.
The only alternative is for your non-collectible object to not be able to strongly reference collectible objects at all. This can be achieved by having your own binary heap, rather than using native JS objects. The binary data would not be interpreted as references, and the heap object would finish its marking phase in a single operation. This is what asm.js does: It preallocates a large array to act as its internal heap. To the VM, the entire array counts as one big object, and therefore no garbage collection is done on any data structures that are encoded within it. Using this method does have the flaw that you need to encode all of your objects and data structures into the heap's binary format, and decode them when you want to use them. When working with asm.js, this is handled by the compiler.
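A crude sketch of that idea using a typed array as a preallocated "heap" (the bump allocator and sizes are purely illustrative; asm.js/wasm toolchains generate this kind of code for you):
var HEAP = new ArrayBuffer(16 * 1024 * 1024); // one big allocation; the GC sees a single object
var HEAP32 = new Int32Array(HEAP);

var next = 0;
function alloc32(count) {       // hand out offsets into the buffer ourselves
  var offset = next;
  next += count;
  return offset;                // an index into HEAP32, not a real pointer
}

var p = alloc32(3);             // "allocate" three 32-bit slots
HEAP32[p] = 1;
HEAP32[p + 1] = 2;
HEAP32[p + 2] = 3;              // none of this data is ever traced by the garbage collector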
In JavaScript, if you want to kill an object, you have to remove all the references to it.
Let's say we have an object like this:
var myobj={ obj : {} };
and we want the obj object to be garbage collected, we have to do this:
myobj["obj"] = null;
delete myobj["obj"];
But if you have defined another variable referencing that object, it won't be killed by those lines. For instance, if you do:
var obj = myobj.obj;
if you really want the object to be released, you have to clear that variable as well:
obj = null;
(Note that delete only removes object properties; it has no effect on a variable declared with var, so setting it to null is what actually drops the reference.)
Then, once there are no references left, the object can be collected.
In other words, if you want to prevent your object from being garbage collected, create a reference to it somewhere and keep it private.
var Obj = function(){}; var X = new Obj();
will X = null properly clear memory?
Also would this be equivalent?
var Obj = function(){};
var X = {};
X.obj = new Obj();
delete(X.obj);
EDIT
It would seem that although deleting X.obj would NOT immediately clear memory, it would help the garbage collection. If I don't delete X.obj, there would still be a pointer to the object and so the GC may not clean it up.
Although I'm picking #delnan's answer, if you're reading this you should definitely also read Benubird's article.
I also notice I accidentally wrote delete(X) originally instead of delete(X.obj) - sorry.
The short answer is that you don't. delete simply removes a reference (and not in the way you try to use it; see the link above. delete is one of those language features few people actually understand), nothing more. The implementation clears memory for you, but it's not your business when (or even whether, strictly speaking; this is why one shouldn't rely on finalizers in GC'd languages that offer them) it does. Note though:
Only objects that can be proven unreachable by all code (i.e. there is no way to access them) can be removed. What holds references to what is usually fairly obvious, at least conceptually. You just need to watch out when dealing with lots of closures, as they may capture more variables than you think (see the sketch after this answer). Also note that circular references are cleaned up properly.
There's a bug in old (but sadly still used) IE versions involving garbage collection of JS event handlers and DOM elements. Google (perhaps even SO) has better material on it than my memory.
On the plus side, that means you won't get dangling pointer bugs or (save of course the aforementioned pitfalls) memory leaks.
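As the sketch promised above, here is how a closure can keep an object reachable longer than you might expect (the names are illustrative):
function makeLogger() {
  var big = new Array(1e6).fill(0); // a large array captured by the closure below
  return function () {
    console.log(big.length);        // as long as this function is reachable, so is `big`
  };
}

var log = makeLogger(); // `big` cannot be collected until `log` itself becomes unreachable
log();
log = null;             // now the closure, and the array it captured, are eligible for GC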
No, that will not clear memory.
Read this:
http://perfectionkills.com/understanding-delete/
No - Javascript runs GC when it feels like it.
The delete operator only deletes the reference, not the object. Any other references would be left out in the open, waiting for the garbage collector.
JavaScript has its own GC, and it will run around and clean things up when nothing refers to them anymore.
I still think it's a good practice to null objects.
Deleting an object also helps the GC because it will see something dangling, and say "I'm going to eat you because you're all alone (and now some cynical laugh)".
You should look at Deleting Objects in JavaScript
Even though there's a GC, you still want to ensure your script is optimized for performance, as people's computers, browsers, and fricken toolbars (and the number of them) will vary.
Generally speaking, memory management in JavaScript is user-agent-specific. The basics of the garbage collector were historically reference counting (modern engines use mark-and-sweep). So, by setting a reference to null (using the delete keyword or by explicit assignment), you can assure yourself that a reference will be cleaned up, IF the object does not have any references that will live outside of its creation scope. That being the case, the GC will have already cleaned up any objects or variables whose scope has ended without your explicitly setting them to null.
There are some things to take care of, though: circular references are easy to create in JS, especially between a DOM element and an object. Care must be taken to clear (or not create in the first place) references to and/or from DOM elements within objects. If you do create a to/from reference involving the DOM, be sure to explicitly clean it up by setting the references to null, both on your object and on the DOM element. Simply setting a parent object to null is not sufficient if there are child objects with references to/from the DOM or localStorage, because those references will live on, and if there was any reference from the child back to the parent, the parent will live on in memory because of that reference.
Web pages can actually leak trash into your memory this way: after you navigate away, the circular references keep objects and DOM elements in memory until you've restarted the browser!
An article on the subject: http://docstore.mik.ua/orelly/webprog/jscript/ch11_03.htm, and another detailed look: http://blogs.msdn.com/b/ericlippert/archive/2003/09/17/53038.aspx
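A sketch of the classic DOM-to-object cycle described above and how to break it explicitly (the element id and property name are made up for illustration):
var el = document.getElementById('myButton'); // assume such an element exists
var model = { element: el, count: 0 };
el.myModel = model;   // DOM node -> object -> DOM node: a circular reference

// When you are done, break the cycle explicitly on both sides:
el.myModel = null;
model.element = null;
el = null;
model = null;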
JavaScript memory is generally handled similarly to Java: there is (or there should be) a garbage collector which will delete an object when there are no references to it. So yes, simply "nullifying" the reference is the only way you should "handle" freeing memory, and the actual freeing is the JS host's part.
In other words, what options do I have to allocate memory in JavaScript?
I know you can allocate memory either globally, or inside function scope. Can I allocate memory dynamically? What does the new operator really mean?
Edit: here's a specific example. How would you implement reading an integer value from the user - n, and then read n integers into an array?
You can't allocate memory; you can create objects. That's what new does.
Now, JavaScript is a queer creature: functions are also objects in JavaScript. This means that you can instantiate pretty much everything using new.
So, the new operator means that a new object is being created.
JavaScript also garbage-collects these variables, just like it happens in Java. So if you know Java, it should be easy for you to draw parallels.
cheers,
jrh
PS: when you create objects, you really are allocating memory; you're just not doing it explicitly. You can allocate an array and make it behave like a memory buffer, but that will degrade JavaScript performance drastically: JavaScript arrays are not in-memory buffers, they are objects too (like everything else).
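For the specific example in the question, here is a minimal sketch assuming a Node.js environment and its built-in readline module (the prompt text is made up; the array simply grows as you push into it, with all allocation handled by the engine):
const readline = require('readline');
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

rl.question('How many integers? ', (answer) => {
  const n = parseInt(answer, 10);
  const numbers = [];                      // the engine allocates and grows this array for us

  rl.on('line', (line) => {
    numbers.push(parseInt(line, 10));      // no manual reallocation needed as it grows
    if (numbers.length === n) {
      console.log('Read:', numbers);
      rl.close();
    }
  });
});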
JavaScript has garbage collection and handles this for you.
However, you can help it by using the delete operator where appropriate.
From the Apple JavaScript Coding Guidelines:
Just as you used the new operator to create an object, you should delete objects when you are finished with them, like this:
delete myObjectVariable;
The JavaScript runtime automatically garbage collects objects when their value is set to null. However, setting an object to null doesn't remove the variable that references the object from memory. Using delete ensures that this memory is reclaimed in addition to the memory used by the object itself. (It is also easier to see places where your allocations and deallocations are unbalanced if you explicitly call delete.)
Steve
Hmmm, it sounds to me like you are coming from a memory-focused language and trying to shoehorn that logic into JS. Yes, JS uses memory (of course), but we have garbage collection to take care of cleaning it all up.
If you are after specifics about the guts of memory allocation, you will have to hunt around for that. But as a rule of thumb, when you use var or new, or declare a new function (or closure), you are gobbling up memory. You can set vars to null to flag them for garbage collection, and you can use the delete keyword too, although few do either of these unless they work server-side (like myself with ASP JScript), where it's important.
JavaScript is really, really friendly — really, too friendly by half!
If you have an array of 3 elements, and you want to add a fourth, you can just act as if that array location already exists:
var arr = ['zero', 'one', 'two'];
// Now you have arr[0], arr[1] and arr[2].
// arr.length is equal to 3.
// to add arr[8]:
arr[8] = 'eight';
// Now you have arr[0] through arr[8]. arr.length is equal to 9.
// arr[3] through arr[7] are empty slots (holes); reading them gives undefined.
So being really specific with memory allocation is unnecessary when adding elements.
No, you don’t need to and can’t allocate memory. The JavaScript interpreter does that automatically.
To answer the title of the question: if you trust MDN, most JavaScript implementations have a heap:
Heap
Objects are allocated in a heap, which is just a name to denote a large, mostly unstructured region of memory.
Several Runtimes Communicating Together
A web worker or a cross-origin iframe has its own stack, heap, and message queue. Two distinct runtimes can only communicate through sending messages via the postMessage method. This method adds a message to the other runtime if the latter listens to message events.
For a deeper dive into memory management, there is also an article here although much of this is implementation specific.
You do not need to manually manage memory in JavaScript. Both a heap and stacks are used under the hood to manage memory, and the details depend on the implementation. Usually, local variables live on the stack and objects live on the heap.
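A small sketch of that usual (implementation-dependent) picture:
function example() {
  var n = 42;               // a local number: typically kept on the stack or in a register
  var obj = { value: n };   // the object itself lives on the heap; `obj` holds a reference to it
  return obj;               // the object outlives the call, so it cannot live on the call stack
}

var kept = example();       // still reachable from here, so the GC keeps it alive
kept = null;                // unreachable now; the GC may reclaim it on a future pass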