This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is there a library for a Set data type in Javascript?
Is there a way to create a JavaScript data structure that mimics a c++ set? I need to perform searches in log(n) time, but can't find anything in the language that serves well. I've seen a couple of questions saying that I should represent the set as an object. Will that work? The key and payload of the array are numbers.
For unordered sets, you'll probably be better off with a hash table implementation. These do O(1) lookups, so long as the hash table doesn't get overloaded.
For ordered, in-memory sets, the standard answers seem to be treaps (good average time, high standard deviation) and red-black trees (poor average time, low standard deviation). These are both O(logn) lookup.
It will have to, in javascript everything is an object.
If you need ordered set (which allows you to loop from the smallest element to the largest element, in the order you define), you can implement your own data structure in JS. I can't give more information on this, since I haven't done this first hand.
If you are satisfied with unordered set, you can implement it as followed:
Define a normalized String representation of the object to be stored in the set. If you need a set with number, just use the string representation of the number. If you need a set of user-defined object, you can pick out the properties that defines the identity of the object, and use them in the normalized String representation.
Create a JS Object to use as set by mapping normalized String representation to the actual object.
Searching can be done by checking the property name with the String representation.
Insertion to the set can be done by: first check whether the String representation is already there by searching, then map the String representation to the actual object if the object is not yet in the set.
Deletion can be done by using delete, with the property name being the String representation of the object to be deleted.
Related
while I was taking online CS class, the topic was about an overview of how Array and memory works, and the teacher used C as an Example as that you cannot simply add extra element to an existing Array while the Array is already full with element. As a beginner developer who started with JavaScript, even though I know JavaScript is a high level language and Array.push() is already a familiar function to me, but it seems like it doesn't fit such context, out of curiosity, I've search through google and StackOverflow, I just don't see people discussing why JavaScript can just add extra elements to an existing Array.
Does JavaScript simply create a new Array with added element and point the variable I've already assigned to to the new Array or something else?
Arrays in javascript are not the same as an array in C. In C you'll allocate a new array with type and a fixed size and if you want to alter the size you'll have to create a new one. For javascript arrays on the other hand if you have a look at the description of arrays over at MDN you'll see this line.
Arrays are list-like objects whose prototype has methods to perform traversal and mutation operations. Neither the length of a JavaScript array nor the types of its elements are fixed. Since an array's length can change at any time, and data can be stored at non-contiguous locations in the array
So while it is called "array" it is more like a list which lets you resize it freely (and also store anything, because javascript).
There is also another question about the semantics of the push/pop methods here on SO that can shine some more light on those: How does the Javascript Array Push code work internally
A JavaScript Array is not like a C array, it's more like a C++ std::vector. It grows in length by dynamically allocating memory when needed.
As stated in the reference documentation for std::vector:
Vectors usually occupy more space than static arrays, because more memory is allocated to handle future growth. This way a vector does not need to reallocate each time an element is inserted, but only when the additional memory is exhausted. [...] Reallocations are usually costly operations in terms of performance.
How do relatively modern languages such as ruby/python/js etc may store multiple data types in arrays and are still able to access any element from the array using its index in O(1) time?
As far as I understand, we do simple mathematics to determine the memory address pointing to any element, and we do so by the index multiplied by the size of each element of the array.
Firstly, neither the Ruby Language Specification nor the Python Language Specification nor the ECMAScript Language Specification prescribe any particular implementation strategy for arrays (or lists as they are called in Python). Every implementor is free to implement them however they wish.
Secondly, lumping them all together doesn't make much sense. For example, in ECMAScript, arrays are really just objects with numeric properties, and actually, those numeric properties aren't even really numeric, they are strings.
Third, they don't really store multiple data types. E.g. Ruby only has one data type: objects. Since everything is an object, everything has the same type, so there is no problem storing objects in arrays.
Fourth, at least the Ruby Language Specification does not actually guarantee that array access is O(1). It is highly likely that a Ruby Implementation which does not provide O(1) access would be rejected by the community, but it would not violate any spec.
Now, of course, any implementor is allowed to be as clever as they want to be. E.g. V8 detects when all values of an array are numbers and then stores the array differently. But that is a private internal implementation detail of V8.
In memory, a heterogeneous array is an array of pointers. Every array element stores the memory address of the item at that position in the array.
Since memory addresses are all the same size, you can find the address of each address by multiplying the array index by the address size, and adding it to the base address of the array.
I've got a list of 100,000 items that live in memory (all of them big ints stored as strings).
The data structure these come in doesn't really matter. Right now they live in an array like so:
const list = ['1','2','3'...'100000'];
Note: The above is just an example - in reality, each entry is an 18 digit string.
I need to check for an object's existence. Currently I'm doing:
const needToCheck = '3';
const doesInclude = list.includes(needToCheck);
However there's a lot of ways I could do this existence check. I need this to be as performant as possible.
A few other avenues I could follow are:
Create a Map with the value being undefined
Create an object ({}) and create the keys of the object as the entries in list, then use hasOwnProperty.
Use a Set()
Use some other sort of data structure (a tree?) due to the fact that these are all numbers. However, due to the fact that these are all 18 digits in length, so maybe that'll be less performant.
I can accept a higher upfront cost to build the data structure to get a bigger speed increase later, as this is for a URL route that will be hit >1MM times a day.
Array.prototype.includes is an O(n) operation, which is not desirable - every time you want to check whether a value exists, you'll have to iterate over much of the collection (perhaps the entire collection).
A Map, Set, or object are better, since checking whether they have a value is an O(1) operation.
A tree is not desirable either, because lookup will necessarily take a number of operations down the tree, which could be an issue if the tree is large and you want to lookup frequently - so the O(1) solution is better.
A Map, while it works, probably isn't appropriate because you just want to see if a value exists - you don't need key-value pairs, just values. A Set is composed of only values (and Set.has is indeed O(1)), so that's the best choice for this situation. An object with keys, while it could work too, might not be a good idea because it may create many unnecessary hidden classes - a Set is more designed towards dynamic values at runtime.
So, the Set approach looks to be the most performant and appropriate choice.
You might also consider the possibility of moving the calculation to the server. 100,000 items isn't necessarily too much, but it's still a surprisingly large amount to see client-side.
Unconventionally, you could also use an object and set each of your 100,000 items as a property because under the hood, the JavaScript Object is implemented with a hash table.
For example,
var numbers = {
"1": 1243213,
"2": 4314121,
"3": 3142123
...
}
You could then very quickly check if an item existed by checking if numbers["1"] === undefined. And not only that, but you can also get the value of of the property at the same time.
However, this method does come with some drawbacks like iterating through the list becoming a lot more complicated (though still possible).
For reference, see https://stackoverflow.com/a/24196259/8250558
My question is why do we need sets in ES6 and what is their actual difference from an array of strings? Can you define an example where a set is more accurate than an array of strings? I get it that maps save you the trouble of messing with objects, but sets just seem to serve no purpose.
The purpose of a Set is to enforce uniqueness. If the value you try to put in there already exists, then you will still just have one entry, not two like you would when pushing to an array. Additionally, trying to check if a set contains a certain element is a quick operation (constant time, aka O(1)), while trying to do the same with an array is slower (linear time, aka O(N)).
So, I just learnt about python's implementation of a hash-table, which is dictionary.
So here are what I understand so far, please correct me if I'm wrong:
A dictionary is basically a structured data that contains key-value pairs.
When we want to search for a key, we can directly call dict[key]. This is possible because python does a certain hash function on the key. The hash results is the index of the value in the dictionary. This way, we can get to the value directly after doing the hash function, instead of iterating through a list.
Python will update the hash-table by increasing the amount of 'buckets' when the hash-table is filled 2/3rd of its maximum size.
Python will always ensure that every 'buckets' will only have 1 entry in it so that the performance on lookup will be optimal, no iterations needed.
My first question is, am I understanding python dictionary correctly?
Second, does the javascript object also has all these 4 features? If not, is there another built-in javascript implementantion of dictionary/hash-table in general?
JavaScript Objects can be used as dictionaries, but see Map for details on a JavaScript Map implementation. Some key takeaways are:
The Object prototype can potentially cause key collisions
Object keys can be Strings or Symbols. Map keys can be any value.
There is no direct means of determining how many "map" entries an Object has, whereas Map.prototype.size tells you exactly how many entries it has.
As a rule of thumb: if you're creating what is semantically a collection (an associative array), use a Map. If you've got different types of values to store, use an Object.