So, I just learnt about python's implementation of a hash-table, which is dictionary.
So here are what I understand so far, please correct me if I'm wrong:
A dictionary is basically a structured data that contains key-value pairs.
When we want to search for a key, we can directly call dict[key]. This is possible because python does a certain hash function on the key. The hash results is the index of the value in the dictionary. This way, we can get to the value directly after doing the hash function, instead of iterating through a list.
Python will update the hash-table by increasing the amount of 'buckets' when the hash-table is filled 2/3rd of its maximum size.
Python will always ensure that every 'buckets' will only have 1 entry in it so that the performance on lookup will be optimal, no iterations needed.
My first question is, am I understanding python dictionary correctly?
Second, does the javascript object also has all these 4 features? If not, is there another built-in javascript implementantion of dictionary/hash-table in general?
JavaScript Objects can be used as dictionaries, but see Map for details on a JavaScript Map implementation. Some key takeaways are:
The Object prototype can potentially cause key collisions
Object keys can be Strings or Symbols. Map keys can be any value.
There is no direct means of determining how many "map" entries an Object has, whereas Map.prototype.size tells you exactly how many entries it has.
As a rule of thumb: if you're creating what is semantically a collection (an associative array), use a Map. If you've got different types of values to store, use an Object.
Related
How do relatively modern languages such as ruby/python/js etc may store multiple data types in arrays and are still able to access any element from the array using its index in O(1) time?
As far as I understand, we do simple mathematics to determine the memory address pointing to any element, and we do so by the index multiplied by the size of each element of the array.
Firstly, neither the Ruby Language Specification nor the Python Language Specification nor the ECMAScript Language Specification prescribe any particular implementation strategy for arrays (or lists as they are called in Python). Every implementor is free to implement them however they wish.
Secondly, lumping them all together doesn't make much sense. For example, in ECMAScript, arrays are really just objects with numeric properties, and actually, those numeric properties aren't even really numeric, they are strings.
Third, they don't really store multiple data types. E.g. Ruby only has one data type: objects. Since everything is an object, everything has the same type, so there is no problem storing objects in arrays.
Fourth, at least the Ruby Language Specification does not actually guarantee that array access is O(1). It is highly likely that a Ruby Implementation which does not provide O(1) access would be rejected by the community, but it would not violate any spec.
Now, of course, any implementor is allowed to be as clever as they want to be. E.g. V8 detects when all values of an array are numbers and then stores the array differently. But that is a private internal implementation detail of V8.
In memory, a heterogeneous array is an array of pointers. Every array element stores the memory address of the item at that position in the array.
Since memory addresses are all the same size, you can find the address of each address by multiplying the array index by the address size, and adding it to the base address of the array.
I am a beginner to Javascript and learning the concepts. I came across the below code snippet as part of my learning process.
<script>
var data=[];
data.push("100");
data.push(100);
var object=[];
object.string="100";
object.number=100;
data.push(object);
console.log(data);
</script>
The above code snippet defines an Array and pushes a String, Number and an Object into it. I think this violates the definition of an Array which reads as follows:
Array is a container which can hold a fix number of items and these
items should be of the same type.
I would like to know if Array is the right term to be used in this case as the elements are not of the same type.
You understand/read them incorrect (in case of javascript).
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
Arrays are list-like objects whose prototype has methods to perform traversal and mutation operations. Neither the length of a JavaScript array nor the types of its elements are fixed.
And yes, this definition is purely for Javascript. For ex: Java arrays are strictly type based.
The "should be" here most likely refers to the fact that logically homogeneous collections are easier to deal with than heterogeneous collections. If you have an array of numbers, you can do aggregate operations such as summing them easily; if you have a mixed array of stuff, you need to pick it apart one by one and can't easily do anything with it. That's often a sign of a badly structured program.
There's no technical reason why collections must be homogeneous, especially in a dynamically typed language like Javascript (more so in statically typed languages).
TL:DR
Beside the use of the convent Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?
Objects
From what I understand, because JavaScript objects use hash tables to lookup their key -> data pairs, the look-up time, no matter the length of the object is very small.
For example if I want a really fast dictionary look up, in the past I've (and we can condense the syntax but that's besides the point) stored dictionary data in JSON as
"apple" : "apple",
and then used
if (Dictionary.apple) console.log("Yep it's a word!");
And the result return very very fast regardless of whether my dictionary contains 30,000 words or 300,000.
Arrays
On the other hand, unless I know the number an array item is attached to, I have to loop through the entire array, causing larger lookup times the further the item is down the list.
The good thing I know of about using an array is that I get access to convenient functions such as slice, but these could probably be created for use with objects.
My Question
So, considering the lookup efficiency of objects, I'd currently choose an object over an array for every situation. But I could easily be wrong about this.
Beside the use of the convent Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?
You're comparing apples to oranges here. If you need to map from arbitrary string keys to values, as in your example with "apple", then you use an object. (In ES2015, you might alternatively use a Map instance.)
If you have a whole bunch of oranges, and you want to keep them in a list numbered from 0, you put the oranges in an array and index by which (numbered) orange you want.
The process of locating a property on an object is the same whether the object is a plain Object instance or an Array instance. In modern JavaScript runtime environments, it's safe to assume that the process for looking up number-indexed array properties is appropriately optimized to be even faster than the hash lookup for arbitrary string-named properties. That, however, is a completely separate issue from the nature of the work you need to do and the choice of data structure. Either you have a list of things, such that the order of the things in the list is the salient relationship between them, or you have named things that you need to access by those names. The two situations are conceptually different.
One big difference is the order of elements.
Looping through objects keys can't guarantee any specific order.
Looping through array keys will always give you the same order of elements.
Object and asscoiative array in php/js is hash map,
so I worry about the reliability if I store many elements in an array.
I worry data will be overwritten when collision occureed.
What hash is used in php/js, and how to find two collision string to test it?
[...] the array's internal hashing mechanism will convert your string
to an integer it can then use to address a bucket within the array.
PHP's arrays aren't true/real arrays - they are some sort of Linked
HashMap internally. Considering that multiple strings can boild down
to the same hash, each bucket is a list itself. If there are multiple
elements within the same bucket, each key has to be evaluated.
Extracted from PHP associative array's keys (indexes) limitations? (rodneyrehm answer)
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is there a library for a Set data type in Javascript?
Is there a way to create a JavaScript data structure that mimics a c++ set? I need to perform searches in log(n) time, but can't find anything in the language that serves well. I've seen a couple of questions saying that I should represent the set as an object. Will that work? The key and payload of the array are numbers.
For unordered sets, you'll probably be better off with a hash table implementation. These do O(1) lookups, so long as the hash table doesn't get overloaded.
For ordered, in-memory sets, the standard answers seem to be treaps (good average time, high standard deviation) and red-black trees (poor average time, low standard deviation). These are both O(logn) lookup.
It will have to, in javascript everything is an object.
If you need ordered set (which allows you to loop from the smallest element to the largest element, in the order you define), you can implement your own data structure in JS. I can't give more information on this, since I haven't done this first hand.
If you are satisfied with unordered set, you can implement it as followed:
Define a normalized String representation of the object to be stored in the set. If you need a set with number, just use the string representation of the number. If you need a set of user-defined object, you can pick out the properties that defines the identity of the object, and use them in the normalized String representation.
Create a JS Object to use as set by mapping normalized String representation to the actual object.
Searching can be done by checking the property name with the String representation.
Insertion to the set can be done by: first check whether the String representation is already there by searching, then map the String representation to the actual object if the object is not yet in the set.
Deletion can be done by using delete, with the property name being the String representation of the object to be deleted.