My question is why do we need sets in ES6 and what is their actual difference from an array of strings? Can you define an example where a set is more accurate than an array of strings? I get it that maps save you the trouble of messing with objects, but sets just seem to serve no purpose.
The purpose of a Set is to enforce uniqueness. If the value you try to put in there already exists, then you will still just have one entry, not two like you would when pushing to an array. Additionally, trying to check if a set contains a certain element is a quick operation (constant time, aka O(1)), while trying to do the same with an array is slower (linear time, aka O(N)).
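A minimal sketch of the uniqueness point, using made-up values (`tagsArray`, `tagsSet` and `unique` are illustrative names, not from the question):

```javascript
// Pushing a duplicate into an array keeps both copies.
const tagsArray = ['a', 'b'];
tagsArray.push('a');

// Adding a duplicate to a Set leaves the Set unchanged.
const tagsSet = new Set(['a', 'b']);
tagsSet.add('a');

console.log(tagsArray.length); // 3
console.log(tagsSet.size);     // 2
console.log(tagsSet.has('b')); // true

// A common use: de-duplicate an array by round-tripping through a Set.
const unique = [...new Set(tagsArray)];
console.log(unique); // [ 'a', 'b' ]
```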
I have two 2D arrays of ints that are the same length, but they are very large. I want to find out whether there is at least one difference between the two arrays.
Note: I don't need to find out what the differences are, I just need to return true if there is at least one difference else false.
Right now I'm using two for loops to iterate through the indices and check if arr1[i][j] !== arr2[i][j], but this is taking over 60 seconds worst case due to size.
Is there a better way to make this comparison?
As long as you keep the data structures in question (nested arrays), you can't get any faster than this: in general, you have to check every element.
A few particular optimizations that you may conditionally apply however:
I'm fairly certain that flat arrays are faster to index, and you can use i * width + j to keep your accessors similar, though obviously not identical.
If one of the arrays rarely changes and serves as a reference you check against, hashing will likely give very good results. Store the hash of the reference array each time you change it (again, this must not happen often), and each time you need to check a new array, hash the tested array and compare it against the stored hash. Note that matching hashes can be a false positive for equality, so when the hashes match you still need to compare every element to confirm that the arrays really are equal.
And note that the above aren't either-or, implementing both will give you very good results!
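The two ideas above could be sketched roughly as follows. This assumes every value fits in a signed 32-bit integer (so `Int32Array` is safe) and that all rows have the same width. The function names and the FNV-1a style hash are illustrative choices, not something the answer prescribes.

```javascript
// Flatten a 2D array of ints into one Int32Array (row-major: i * width + j).
function flatten(grid) {
  const height = grid.length;
  const width = height > 0 ? grid[0].length : 0;
  const flat = new Int32Array(height * width);
  for (let i = 0; i < height; i++) {
    for (let j = 0; j < width; j++) {
      flat[i * width + j] = grid[i][j];
    }
  }
  return flat;
}

// Return true as soon as one element differs (early exit).
function hasDifference(a, b) {
  if (a.length !== b.length) return true;
  for (let k = 0; k < a.length; k++) {
    if (a[k] !== b[k]) return true;
  }
  return false;
}

// FNV-1a style 32-bit hash over the elements (illustrative, not cryptographic).
function hashInts(flat) {
  let h = 0x811c9dc5;
  for (let k = 0; k < flat.length; k++) {
    h ^= flat[k];
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Different hashes prove a difference; equal hashes still need the full check.
function differsFromReference(referenceFlat, referenceHash, candidateFlat) {
  if (hashInts(candidateFlat) !== referenceHash) return true;
  return hasDifference(referenceFlat, candidateFlat);
}
```

Hashing the candidate is itself a full pass over its elements, so the hash step mostly helps when a hash can be computed once and kept alongside each array as it is built or updated.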
I've got a list of 100,000 items that live in memory (all of them big ints stored as strings).
The data structure these come in doesn't really matter. Right now they live in an array like so:
const list = ['1', '2', '3', /* ... */ '100000'];
Note: The above is just an example - in reality, each entry is an 18 digit string.
I need to check for an object's existence. Currently I'm doing:
const needToCheck = '3';
const doesInclude = list.includes(needToCheck);
However there's a lot of ways I could do this existence check. I need this to be as performant as possible.
A few other avenues I could follow are:
Create a Map with the value being undefined
Create an object ({}) and create the keys of the object as the entries in list, then use hasOwnProperty.
Use a Set()
Use some other sort of data structure (a tree?), since these are all numbers. However, because they are all 18 digits long, that may be less performant.
I can accept a higher upfront cost to build the data structure to get a bigger speed increase later, as this is for a URL route that will be hit >1MM times a day.
Array.prototype.includes is an O(n) operation, which is not desirable - every time you want to check whether a value exists, you'll have to iterate over much of the collection (perhaps the entire collection).
A Map, Set, or object is better, since checking whether it has a value is an O(1) operation on average.
A tree is not desirable either, because lookup will necessarily take a number of operations down the tree, which could be an issue if the tree is large and you want to lookup frequently - so the O(1) solution is better.
A Map, while it works, probably isn't appropriate because you just want to see if a value exists - you don't need key-value pairs, just values. A Set is composed of only values (and Set.has is indeed O(1)), so that's the best choice for this situation. An object with keys, while it could work too, might not be a good idea because it may create many unnecessary hidden classes - a Set is more designed towards dynamic values at runtime.
So, the Set approach looks to be the most performant and appropriate choice.
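A minimal sketch of the Set approach, with three made-up 18-digit strings standing in for the real 100,000 entries (`idSet` and `isKnownId` are hypothetical names):

```javascript
// Hypothetical sample data standing in for the real 18-digit strings.
const list = ['100000000000000001', '100000000000000002', '100000000000000003'];

// Build the Set once, up front; this is the one-time O(n) cost.
const idSet = new Set(list);

// Each later check is a hash lookup rather than a scan of the array.
function isKnownId(id) {
  return idSet.has(id);
}

console.log(isKnownId('100000000000000003')); // true
console.log(isKnownId('999999999999999999')); // false
```

Keeping the entries as strings also sidesteps precision loss: an 18-digit integer is larger than Number.MAX_SAFE_INTEGER, so it cannot be stored exactly as a regular number.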
You might also consider the possibility of moving the calculation to the server. 100,000 items isn't necessarily too much, but it's still a surprisingly large amount to see client-side.
Unconventionally, you could also use an object and set each of your 100,000 items as a property because under the hood, the JavaScript Object is implemented with a hash table.
For example,
var numbers = {
"1": 1243213,
"2": 4314121,
"3": 3142123,
// ...
};
You could then very quickly check whether an item exists by testing numbers["1"] !== undefined. Not only that, but you also get the value of the property at the same time.
However, this method does come with some drawbacks like iterating through the list becoming a lot more complicated (though still possible).
For reference, see https://stackoverflow.com/a/24196259/8250558
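One caveat with the undefined check, sketched below with the same sample values: a plain object also inherits properties such as toString from Object.prototype, so the test can report keys that were never stored. An own-property check or a prototype-less object avoids that (`bare` is an illustrative name).

```javascript
const numbers = { '1': 1243213, '2': 4314121, '3': 3142123 };

console.log(numbers['1'] !== undefined); // true
console.log(numbers['4'] !== undefined); // false

// Inherited properties pass the undefined check even though they were never stored.
console.log(numbers['toString'] !== undefined); // true

// An own-property check ignores inherited properties.
console.log(Object.prototype.hasOwnProperty.call(numbers, 'toString')); // false

// Alternatively, use an object with no prototype at all.
const bare = Object.assign(Object.create(null), numbers);
console.log(bare['toString'] !== undefined); // false
console.log(bare['2']); // 4314121
```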
So, I just learnt about Python's implementation of a hash table, which is the dictionary.
So here are what I understand so far, please correct me if I'm wrong:
A dictionary is basically a data structure that contains key-value pairs.
When we want to search for a key, we can directly call dict[key]. This is possible because Python applies a hash function to the key. The hash result is the index of the value in the dictionary. This way, we can get to the value directly after computing the hash, instead of iterating through a list.
Python will resize the hash table by increasing the number of 'buckets' when the table is filled to 2/3 of its maximum size.
Python will always ensure that every 'bucket' has only 1 entry in it, so that lookup performance is optimal and no iteration is needed.
My first question is, am I understanding python dictionary correctly?
Second, does the JavaScript object also have all 4 of these features? If not, is there another built-in JavaScript implementation of a dictionary/hash table in general?
JavaScript Objects can be used as dictionaries, but see Map for details on a JavaScript Map implementation. Some key takeaways are:
The Object prototype can potentially cause key collisions
Object keys can be Strings or Symbols. Map keys can be any value.
There is no direct means of determining how many "map" entries an Object has, whereas Map.prototype.size tells you exactly how many entries it has.
As a rule of thumb: if you're creating what is semantically a collection (an associative array), use a Map. If you've got different types of values to store, use an Object.
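The takeaways above can be seen in a short sketch (`asObject`, `asMap` and `keyObj` are illustrative names):

```javascript
const asObject = {};
const asMap = new Map();

// Prototype collision: 'constructor' was never added, yet the object "has" it.
console.log('constructor' in asObject); // true
console.log(asMap.has('constructor'));  // false

// Object keys are coerced to strings, so 1 and '1' collide; Map keeps them apart.
asObject[1] = 'number one';
asObject['1'] = 'string one';
asMap.set(1, 'number one');
asMap.set('1', 'string one');

// Counting entries: an object needs Object.keys, a Map has size.
console.log(Object.keys(asObject).length); // 1
console.log(asMap.size);                   // 2

// Map keys can be any value, including objects.
const keyObj = { id: 7 };
asMap.set(keyObj, 'payload');
console.log(asMap.get(keyObj)); // payload
```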
TL;DR
Besides the convenient Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?
Objects
From what I understand, because JavaScript objects use hash tables to lookup their key -> data pairs, the look-up time, no matter the length of the object is very small.
For example, if I want a really fast dictionary lookup, in the past I've stored dictionary data in JSON (the syntax could be condensed, but that's beside the point) as
"apple" : "apple",
and then used
if (Dictionary.apple) console.log("Yep it's a word!");
And the result returns very quickly regardless of whether my dictionary contains 30,000 words or 300,000.
Arrays
On the other hand, unless I know the number an array item is attached to, I have to loop through the entire array, causing larger lookup times the further the item is down the list.
The one advantage I know of in using an array is access to convenient functions such as slice, but these could probably be created for use with objects.
My Question
So, considering the lookup efficiency of objects, I'd currently choose an object over an array for every situation. But I could easily be wrong about this.
Besides the convenient Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?
You're comparing apples to oranges here. If you need to map from arbitrary string keys to values, as in your example with "apple", then you use an object. (In ES2015, you might alternatively use a Map instance.)
If you have a whole bunch of oranges, and you want to keep them in a list numbered from 0, you put the oranges in an array and index by which (numbered) orange you want.
The process of locating a property on an object is the same whether the object is a plain Object instance or an Array instance. In modern JavaScript runtime environments, it's safe to assume that the process for looking up number-indexed array properties is appropriately optimized to be even faster than the hash lookup for arbitrary string-named properties. That, however, is a completely separate issue from the nature of the work you need to do and the choice of data structure. Either you have a list of things, such that the order of the things in the list is the salient relationship between them, or you have named things that you need to access by those names. The two situations are conceptually different.
One big difference is the order of elements.
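Looping through an object's keys does not give you an order you control: since ES2015, integer-like keys come first in ascending order, followed by string keys in insertion order, and older engines guaranteed no specific order at all.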
Looping through array keys will always give you the same order of elements.
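A small sketch of the ordering difference, assuming an ES2015 or later engine (`obj` and `arr` are illustrative names). The object does not hand its keys back in the order they were written, while the array keeps exactly the order it was given:

```javascript
const obj = {};
obj.b = 1;
obj[2] = 2;
obj.a = 3;
obj[1] = 4;

// Integer-like keys come first (ascending), then string keys in insertion order.
console.log(Object.keys(obj)); // [ '1', '2', 'b', 'a' ]

// An array always iterates from index 0 upward, in the order you stored things.
const arr = ['b', 2, 'a', 1];
console.log(arr.join(' ')); // b 2 a 1
```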
Possible Duplicate:
Is there a library for a Set data type in Javascript?
Is there a way to create a JavaScript data structure that mimics a C++ set? I need to perform searches in O(log n) time, but can't find anything in the language that serves well. I've seen a couple of questions saying that I should represent the set as an object. Will that work? The key and payload of the array are numbers.
For unordered sets, you'll probably be better off with a hash table implementation. These do O(1) lookups, so long as the hash table doesn't get overloaded.
For ordered, in-memory sets, the standard answers seem to be treaps (good average time, high standard deviation) and red-black trees (poor average time, low standard deviation). These are both O(log n) lookup.
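Neither a treap nor a red-black tree is shown here. As a much simpler stand-in that still gives O(log n) lookup over ordered numbers, the sketch below keeps a sorted array and uses binary search. Insertion is O(n) because of the splice, so this only suits sets that are read far more often than they are written. All names are illustrative.

```javascript
// Return the first index whose element is >= target (binary search).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// O(log n) membership test.
function sortedHas(sorted, target) {
  const i = lowerBound(sorted, target);
  return i < sorted.length && sorted[i] === target;
}

// Insert while keeping the array sorted and free of duplicates (O(n) worst case).
function sortedInsert(sorted, value) {
  const i = lowerBound(sorted, value);
  if (sorted[i] !== value) sorted.splice(i, 0, value);
}

const ordered = [];
for (const n of [5, 1, 3, 1, 4]) sortedInsert(ordered, n);

console.log(ordered);               // [ 1, 3, 4, 5 ]
console.log(sortedHas(ordered, 4)); // true
console.log(sortedHas(ordered, 2)); // false
```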
It will have to; in JavaScript, almost everything is an object.
If you need ordered set (which allows you to loop from the smallest element to the largest element, in the order you define), you can implement your own data structure in JS. I can't give more information on this, since I haven't done this first hand.
If you are satisfied with an unordered set, you can implement it as follows:
Define a normalized String representation of the object to be stored in the set. If you need a set of numbers, just use the string representation of the number. If you need a set of user-defined objects, pick out the properties that define the identity of the object and use them in the normalized String representation.
Create a JS Object to use as set by mapping normalized String representation to the actual object.
Searching can be done by checking the property name with the String representation.
Insertion into the set can be done by first checking, via a search, whether the String representation is already there, and then mapping the String representation to the actual object if the object is not yet in the set.
Deletion can be done by using delete, with the property name being the String representation of the object to be deleted.
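The steps above can be sketched as a small class. `KeyedSet`, `toKey` and the point objects are hypothetical names chosen for illustration, and the backing object is created without a prototype so that inherited names such as toString are never mistaken for members.

```javascript
class KeyedSet {
  // toKey turns a value into its normalized String representation.
  constructor(toKey) {
    this.toKey = toKey;
    this.items = Object.create(null); // plain object used as the hash table
  }

  // Search: check whether the property name exists.
  has(value) {
    return this.toKey(value) in this.items;
  }

  // Insert: only map the key to the value if it is not there yet.
  add(value) {
    const key = this.toKey(value);
    if (!(key in this.items)) this.items[key] = value;
    return this;
  }

  // Delete: remove the property with the delete operator.
  delete(value) {
    const key = this.toKey(value);
    if (!(key in this.items)) return false;
    delete this.items[key];
    return true;
  }
}

// A set of points whose identity is defined by x and y.
const points = new KeyedSet((p) => `${p.x},${p.y}`);
points.add({ x: 1, y: 2 }).add({ x: 1, y: 2 }).add({ x: 3, y: 4 });

console.log(Object.keys(points.items).length); // 2
console.log(points.has({ x: 3, y: 4 }));       // true
console.log(points.delete({ x: 3, y: 4 }));    // true
console.log(points.has({ x: 3, y: 4 }));       // false
```

In ES2015 and later, the built-in Set covers the number case directly; a wrapper like this is mainly useful when identity has to be defined by selected properties of an object.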