This implementation from Geeks for Geeks strangely uses an array to implement BFS.
Link: https://www.geeksforgeeks.org/implementation-graph-javascript/
In the step "add the starting node to the queue", it even uses "startingNode" as an index into the "visited" array, which sounds very off (it would make a bit more sense if it used an object as the type). I was wondering whether they wrote this incorrectly; if not, I'd like someone more experienced to explain it to me.
Thank you!
The traversal needs an efficient way to look up a particular node in the visited set. That can be done efficiently with an array lookup such as if (!visited[neigh]), provided the nodes are identified by small integers. Another option would be using a string key and an object to represent the visited set, but it would likely be less efficient. Referring to the nodes only as objects would complicate things, since there would be no natural key one could use to determine membership in the visited set.
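As a minimal sketch of that idea (this is not the Geeks for Geeks code; the adjacency array and the bfs function below are hypothetical and assume nodes are numbered 0..n-1), the node id can double as the index into the visited array:

```javascript
// Hypothetical graph: index = node id, value = list of neighbor ids.
const adjacency = [
  [1, 2], // node 0
  [3],    // node 1
  [3],    // node 2
  [],     // node 3
];

function bfs(graph, start) {
  // One boolean slot per node; the node id is the array index.
  const visited = new Array(graph.length).fill(false);
  const order = [];
  const queue = [start];
  visited[start] = true;

  while (queue.length > 0) {
    const node = queue.shift();
    order.push(node);
    for (const neigh of graph[node]) {
      if (!visited[neigh]) {
        visited[neigh] = true;
        queue.push(neigh);
      }
    }
  }
  return order;
}

const bfsOrder = bfs(adjacency, 0);
console.log(bfsOrder); // [0, 1, 2, 3]
```

If the nodes were identified by strings instead, a Set or an object keyed by those strings would play the same role as the visited array.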
Each element in a list must have a unique key. So how can I create this key?
I wrote a function which returns keys like this:
QV8938
XN0210
DC7389
DC8376
HA8357
etc., generated randomly. Is it normal to create such keys?
Is it normal to create such keys?
No, and it's not a good idea. Read the Lists and Keys documentation.
There are three separate problems with random keys:
If they're random, you might end up with the same key on more than one item. (You can mitigate this problem with a sufficiently-powerful random generator like ones used for GUIDs, though.)
You'll cause unnecessary work committing to the DOM for something that hasn't actually changed when its key changes.
You'll cause display errors by not re-committing to the DOM when something does change but its key doesn't change.
Instead, derive your keys from the data for the items being displayed, which is always possible, even if it means you have to create artificial keys for them and then reuse those artificial keys.
You might be tempted to use array indexes as keys. That's not a good idea either in almost any case (that post is linked from the documentation above). The only exception is a static array of static items — nothing in it is ever modified, nor does its order ever change; in that one case, using the index as a key is okay.
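A rough sketch of the "artificial keys that you then reuse" idea, in plain JavaScript. The names nextId and withStableKey are made up for illustration; the point is only that the key is assigned once, when the item is created, and then travels with the data rather than being regenerated on every render:

```javascript
// Hypothetical counter used to mint an artificial key once per item.
let nextId = 0;

function withStableKey(item) {
  return { ...item, key: `item-${nextId++}` };
}

// Keys are assigned when the data is created...
const todos = [{ text: "write" }, { text: "review" }].map(withStableKey);

// ...and stay attached to the same item when the list is reordered.
const reversed = [...todos].reverse();

console.log(todos.map((t) => t.key));    // ["item-0", "item-1"]
console.log(reversed.map((t) => t.key)); // ["item-1", "item-0"]
```

Unlike an array index, the key "item-1" still refers to the "review" item after the reversal.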
I've got a list of 100,000 items that live in memory (all of them big ints stored as strings).
The data structure these come in doesn't really matter. Right now they live in an array like so:
const list = ['1', '2', '3', /* ... */ '100000'];
Note: The above is just an example - in reality, each entry is an 18 digit string.
I need to check for an object's existence. Currently I'm doing:
const needToCheck = '3';
const doesInclude = list.includes(needToCheck);
However, there are a lot of ways I could do this existence check. I need this to be as performant as possible.
A few other avenues I could follow are:
Create a Map with the value being undefined
Create an object ({}) and create the keys of the object as the entries in list, then use hasOwnProperty.
Use a Set()
Use some other sort of data structure (a tree?), since these are all numbers. However, because they are all 18 digits long, that may be less performant.
I can accept a higher upfront cost to build the data structure to get a bigger speed increase later, as this is for a URL route that will be hit >1MM times a day.
Array.prototype.includes is an O(n) operation, which is not desirable - every time you want to check whether a value exists, you'll have to iterate over much of the collection (perhaps the entire collection).
A Map, Set, or object are better, since checking whether they have a value is an O(1) operation.
A tree is not desirable either, because each lookup necessarily takes a number of operations down the tree (O(log n) for a balanced tree), which could be an issue if the tree is large and you want to look up values frequently - so the O(1) solution is better.
A Map, while it works, probably isn't appropriate because you just want to see if a value exists - you don't need key-value pairs, just values. A Set is composed of only values (and Set.has is indeed O(1)), so that's the best choice for this situation. An object with keys, while it could work too, might not be a good idea because it may create many unnecessary hidden classes - a Set is more designed towards dynamic values at runtime.
So, the Set approach looks to be the most performant and appropriate choice.
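A minimal sketch of the Set approach, using three made-up 18-digit strings in place of the real 100,000 entries:

```javascript
// Placeholder data: stand-ins for the real 18-digit ID strings.
const list = [
  "100000000000000001",
  "100000000000000002",
  "100000000000000003",
];

// Pay the O(n) construction cost once, up front.
const idSet = new Set(list);

// Each membership check afterwards is a hash lookup rather than a scan.
const found = idSet.has("100000000000000003");
const missing = idSet.has("999999999999999999");

console.log(found, missing); // true false
```

Because the entries are kept as strings, there is no precision concern from 18-digit values exceeding Number.MAX_SAFE_INTEGER.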
You might also consider the possibility of moving the calculation to the server. 100,000 items isn't necessarily too much, but it's still a surprisingly large amount to see client-side.
Unconventionally, you could also use an object and set each of your 100,000 items as a property because under the hood, the JavaScript Object is implemented with a hash table.
For example,
var numbers = {
"1": 1243213,
"2": 4314121,
"3": 3142123
...
}
You could then very quickly check whether an item exists by checking whether numbers["1"] !== undefined. And not only that, but you can also get the value of the property at the same time.
However, this method does come with some drawbacks like iterating through the list becoming a lot more complicated (though still possible).
For reference, see https://stackoverflow.com/a/24196259/8250558
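A small sketch of the object-as-lookup-table idea. One assumption added here that is not in the answer: the table is created with Object.create(null), so that inherited names such as "toString" are not mistaken for entries, and the in operator is used so that a stored value of undefined still counts as present. The hasItem helper is hypothetical.

```javascript
// A prototype-less object, so only our own keys exist on it.
const numbers = Object.create(null);
numbers["1"] = 1243213;
numbers["2"] = 4314121;
numbers["3"] = 3142123;

// Hypothetical helper: existence check that does not depend on the value.
function hasItem(table, key) {
  return key in table;
}

console.log(hasItem(numbers, "1"));        // true
console.log(hasItem(numbers, "42"));       // false
console.log(hasItem(numbers, "toString")); // false, nothing is inherited
console.log(numbers["2"]);                 // 4314121
```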
I'm learning about data structures formally for the first time. To me, some of the traditionally described benefits of linked lists (easier memory allocation and faster insertion into and deletion from the body of the list) seem moot in JS given the way arrays work (like objects with numbered keys).
Can anyone give an example of why I'd want to use a linked list in javascript?
As the comments note, you'd do this if you need constant time insertion/deletion from the list.
There are two ways Array could be reasonably implemented that allow for populating non-contiguous indices:
As an actual C-like contiguous block of memory large enough to contain all the indices used; unpopulated indices would contain a reference to a dummy value so they wouldn't be treated as populated entries (and excess capacity beyond the max index could be left as garbage, since the length says it's not important)
As a hash table keyed by integers (based on a C-like array)
In either case, the cost to insert at the end is going to be amortized O(1), but with spikes of O(n) work done whenever the capacity of the underlying C-like array is exhausted (or the hash load threshold exceeded) and a reallocation is necessary.
If you insert at the beginning or in the middle (and the index in question is in use), you have the same possible work spikes as before, and new problems. No, the data for the existing indices doesn't change. But all the keys above the entry you're forcing in have to be incremented (actually or logically). If it's a plain C-like array implementation, that's mostly just a memmove (modulo the needs of garbage collections and the like). If it's a hash table implementation, you need to essentially read every element (from the highest index to the lowest, which means either sorting the keys or looking up every index below the current length, even if it's a sparse Array), pop it out, and reinsert it with a key value that is one higher. For a big Array, the cost could be enormous. It's possible the implementation in some browsers might do some clever nonsense by using an "offset" that would internally use negative "keys" relative to the offset to avoid the rehash while still inserting before index 0, but it would make all operations more expensive in exchange for making shift/unshift cheaper.
Point is, a linked list written in JavaScript would incur overhead for being JS (which usually runs more slowly than built-ins for similar magnitude work). But (ignoring the possibility of the memory manager itself introducing work spikes), it's predictable:
If you want to insert or delete from the beginning or the end, it's fixed work (one allocation or deallocation, and reference fixups)
If you are iterating and inserting/deleting as you go, aside from the cost of iteration, it's the same fixed work
If it turns out that offsets aren't used to implement shift/unshift in your browser's implementation (with them, shift would usually be cheap, and unshift cheap until you've unshift-ed more than you've shift-ed), then you'd definitely want to use a linked list when working with a FIFO queue of potentially unbounded size
It's wholly possible all browsers use offsets to optimize (avoiding memmove or re-keying under certain conditions, though it can't avoid occasional realloc and memmove or rehashing without wasting memory). I don't know one way or the other what the major browsers do, but if you're trying to write portable code with consistent performance, you probably don't want to assume that they sacrificed general performance to make the shift/unshift case faster with huge Arrays. They might have preferred to make all other operations faster and assumed shift/unshift would only be used with small Arrays where the cost is trivial.
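For the FIFO case mentioned above, a singly linked queue might look like the following sketch. The class name LinkedQueue and its methods are illustrative, not a standard API; each enqueue and dequeue does a fixed amount of work regardless of how long the queue is.

```javascript
class LinkedQueue {
  constructor() {
    this.head = null; // next node to dequeue
    this.tail = null; // most recently enqueued node
    this.size = 0;
  }

  // Append at the tail: one allocation plus reference fixups.
  enqueue(value) {
    const node = { value, next: null };
    if (this.tail) {
      this.tail.next = node;
    } else {
      this.head = node;
    }
    this.tail = node;
    this.size++;
  }

  // Remove from the head: no other element is moved or re-keyed.
  dequeue() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (!this.head) this.tail = null;
    this.size--;
    return value;
  }
}

const queue = new LinkedQueue();
queue.enqueue("a");
queue.enqueue("b");
queue.enqueue("c");
console.log(queue.dequeue(), queue.dequeue(), queue.size); // a b 1
```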
I think there are some legit cases / reasons to prefer linked lists:
Reason 1:
As others already described, insertion and deletion operations run in fixed O(1) time for linked lists. This might be a significant advantage depending on your problem.
Reason 2:
You can do things with linked lists that you can't do with arrays. This is due to the nature of a linked list: every list entry has a reference to its successor (and to its predecessor if it's a doubly linked list).
Example1:
So if you have a linked list of items, you could store a reference to a "currentItem" in a variable. If you need to access the item's neighbors you could just write:
curItem.getNext();
or
curItem.getPrev();
Now you could argue that you could do the same with arrays, where curItem is just the current index. Basically this is true (and in most cases I would use that), but remember that in JavaScript it is possible to skip indices. So if your array looks like this, the index method would not work as easily as you might think:
myArray = [];
myArray[10] = 'a';
myArray[20] = 'b';
If you find yourself in that kind of situation, maybe a linked list is the better choice.
However, if you need random access to the data (which is needed less often than it seems), you would go with arrays almost every time.
Example2:
If you want to "split" your list into 2 separate lists, this is also possible in O(1) time. With arrays you'd need to use slice, which is slower. However, this is only an issue if you work with large datasets and perform this operation often. 20 repetitions of slicing an array of 10 million strings took about 4 seconds on my machine, whereas separating one list into 2 took <1 second (provided you already have a reference to the list element where you want to start the separation, of course!).
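A sketch of such a split on a doubly linked list, assuming plain { value, prev, next } node objects. The helpers makeList, splitBefore and toArray are hypothetical; only splitBefore is the O(1) part, the other two exist to build and inspect the lists.

```javascript
// Build a doubly linked list from an array and return its head node.
function makeList(values) {
  let head = null;
  let prev = null;
  for (const value of values) {
    const node = { value, prev, next: null };
    if (prev) {
      prev.next = node;
    } else {
      head = node;
    }
    prev = node;
  }
  return head;
}

// Cut the list just before `node`: two reference updates, no copying.
function splitBefore(node) {
  if (node.prev) {
    node.prev.next = null;
    node.prev = null;
  }
  return node; // head of the second list
}

// Walk a list and collect its values (O(n), for display only).
function toArray(head) {
  const out = [];
  for (let n = head; n; n = n.next) out.push(n.value);
  return out;
}

const firstHead = makeList(["a", "b", "c", "d"]);
const third = firstHead.next.next; // a reference we already hold
const secondHead = splitBefore(third);

console.log(toArray(firstHead));  // ["a", "b"]
console.log(toArray(secondHead)); // ["c", "d"]
```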
Conclusion:
In some cases you would benefit from a list's nature and its performance. In other cases, you would suffer from its weaknesses (no efficient random access to the data).
I've never used a list in javascript, but similar structures like trees or graphs are used for data representation (in both backend and frontend javascript). So analyzing/learning list implementations in javascript is a good idea for more complex structures.
#noob-in-need I recommend you watch this video about the JavaScript garbage collector: https://www.youtube.com/watch?v=RWmzxyMf2cE
In it he explains why using a linked list can give you finer-grain control over your code's speed (as ShadowRanger discusses in depth) and also prevent unexpected garbage collection slowdowns. Plus it was filmed on talk-like-a-pirate day. :)
This boils down to the very basic differences of array vs linkedlist.
Inserting a new element in an array of elements is expensive, because room has to be created for the new element, and to create room the existing elements need to be shifted. But for a linked list it's just a change of references.
But reading and random access are easier in an array than in a linked list. A linked list does not allow random access: we have to access elements sequentially, starting from the first node. So we cannot do an efficient binary search on a linked list.
Extra memory space for a pointer which is used to store reference for the next is required with each element of the list.
It is surprisingly rare to find use cases where linked lists outperform data structures built on top of arrays:
Arrays tend to be more cache friendly and adding to the back is fast on average (O(n) worst case, but O(1) amortized)
The constant-time benefits of a linked list fall apart once you need to search for the location. If you already hold a reference to the node, you can remove it in O(1), but finding it in the first place is O(n), and the array structure will most likely outperform the linked list.
Still, scenarios exist where linked lists are used and where their constant-time operations shine. Schedulers are a good example because latency is important and guaranteed O(1) becomes a factor. Linked lists are used in the Linux kernel, but since you asked for a JavaScript example, the Node.js runtime uses them for implementing timers:
Timer (design & implementation): lib/internal/timer.js
Linked list implementation: internal/linkedlist.js
Below you can find a simple comparison between Linked List and Array
Ref: https://en.wikipedia.org/wiki/Linked_list#Disadvantages
I'm trying to explain Map (aka hash table, dict) to someone who's new to programming. While the concepts of Array (=list of things) and Set (=bag of things) are familiar to everyone, I'm having a hard time finding a real-world metaphor for Maps (I'm specifically interested in python dicts and Javascript Objects). The often used dictionary/phone book analogy is incorrect, because dictionaries are sorted, while Maps are not - and this point is important to me.
So the question is: what would be a real-world phenomenon or device that behaves like a Map in computing?
I agree with delnan in that the human example is probably too close to that of an object. This works well if you are trying to transition into explaining how objects are implemented in loosely typed languages, however a map is a concept that exists in Java and C# as well. This could potentially be very confusing if they begin to use those languages.
Essentially you need to understand that maps are instant look-ups that rely on a unique set of values as keys. These two things really need to be stressed, so here's a decent yet highly contrived example:
Let's say you're having a party and everyone is supposed to bring one thing. To help the organizer, everyone says what their first name is and what they're bringing. Now let's pretend there are two ways to store this information. The first is by putting it down on a list and the second is by telling someone with an eidetic memory. The contrived part is that he can only identify you through your first name (so he's blind and has a cochlear implant so everyone sounds like a robot, best I can come up with).
List: To add, you just append to the bottom of the list. To back out you just remove yourself from the list. If you want to see whether someone is bringing something and what they're bringing, then you have to scan the entire list until you find them. If you don't find them after scanning, then they're clearly not on the list and not bringing anything. The list would clearly allow duplicates of people with the same first name.
Dictionary (contrived person): You don't append to the end of the list, you just tell him someone's first name and what they're bringing. If you want to know what someone is bringing you just ask by name and he immediately tells you. Likewise, if two people of the same name tell him they're bringing something, he'll think it's the same person just changing what they're bringing. If someone hasn't signed up and you ask about them by name, he'd be confused and ask what you're talking about. Also, you would have to say that when you tell the guy someone is no longer bringing something, he loses all memory of them, so yeah, highly contrived.
You might also want to show why the list is sufficient if you don't care who brings what, but just need to know what all is being brought. Maybe even leave the names off the list, to stress key/value pairs with the dictionary.
Perhaps it would be the analogy of a human being that you're meeting for the first time:
Each person has an unordered set of attributes, and each attribute name is unique and can only have 1 value (like hair=long, eye_color=blue). And you would discover these attributes in no particular order.
So a person can have shoe_size=38, hair_color=brown and eye_color=blue, and when reciting this to someone else (human_dict.get('shoe_size')) you would mention the attributes in no particular order, identifying each only by its attribute name.
I have seen cases where a large list of people were binned according to their last N digits of their identifying number, in order to save on key search. This binning is somewhat similar to hashing, and may help explain it.
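The binning idea can be sketched roughly as follows. The sample people, the two-digit bin width, and the helper names binKey and findPerson are all made up for illustration; a real hash table does something similar with a computed hash instead of trailing digits.

```javascript
// Hypothetical records identified by a numeric string.
const people = [
  { id: "830412", name: "Ann" },
  { id: "771012", name: "Bob" },
  { id: "650977", name: "Cleo" },
];

// The bin is chosen by the last `digits` characters of the id.
function binKey(id, digits) {
  return id.slice(-digits);
}

// Group everyone into bins once.
const bins = new Map();
for (const person of people) {
  const key = binKey(person.id, 2);
  if (!bins.has(key)) bins.set(key, []);
  bins.get(key).push(person);
}

// A search only scans the one bin the id falls into.
function findPerson(id) {
  const bin = bins.get(binKey(id, 2)) || [];
  return bin.find((person) => person.id === id);
}

console.log(bins.get("12").length);     // 2 (Ann and Bob share a bin)
console.log(findPerson("650977").name); // Cleo
console.log(findPerson("000000"));      // undefined
```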
Have you been successful in explaining the array in a logical way? That is, an array is storage where elements are kept at the first position, second position, third position, and so on. "First", "second" and "third" are basically keys.
Now extend it to say that maps are storage where the keys are not necessarily numbers. Let's say they are strings, or even numbers which are not consecutive and have no relationship to each other.
Conversely, you could say that an array A (of int) is a map where index 1 is mapped to A's address, index 2 to the address of A + 4, and so on.
In some restaurants, when you place your order at the counter, they give you a number to identify your order. The numbers:
Don't need to be sorted.
Don't need to be consecutive
The only purpose of the numbers is that they let the staff find your order easily. In the map/hash table/associative array world, the number would be the key and your order the value.
After your order is finished, they can use the same number for another order. So the number is basically the identifier for an order at a certain point in time. This fits the JavaScript Object example, where the properties of an object can change their value.
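The restaurant analogy maps fairly directly onto a JavaScript Map. In this sketch the ticket numbers and dishes are invented; the keys are neither sorted nor consecutive, and a number can be reused once its order is done:

```javascript
const orders = new Map();

// Ticket numbers are just identifiers: unsorted and non-consecutive.
orders.set(57, "burger");
orders.set(12, "salad");
orders.set(93, "pasta");

// Finding an order by its number is a direct lookup, not a scan.
console.log(orders.get(12)); // salad

// Order 57 is picked up, so the number becomes free again...
orders.delete(57);
console.log(orders.has(57)); // false

// ...and can be handed out for a different order later.
orders.set(57, "soup");
console.log(orders.get(57)); // soup
```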
Long story short: I'd like to treat several JavaScript associative arrays as a database (where the arrays are tables). The relations could be represented by special fields inside the arrays. I'm not interested in the persistence aspect of a database; I only want to be able to query the arrays with a SQL-like language and retrieve sets of data in the form of associative arrays.
My question: Is there any JavaScript library that has such features? Otherwise, is there any library that can at least take care of the SQL-like language part?
Thanks
I believe the closest thing to what you need is the jLinq library. It can operate with js objects and arrays much in the same way you would do with a database, but in a slightly different way. You don't really write queries, but use methods to construct them. Overall it's way better I think.
Some googling found this: http://ajaxian.com/archives/two-js-solutions-to-run-sql-like-statements-on-arrays-and-objects which seemed interesting.
Can I ask why you want to do this?
I came across this question while searching something sort of related. Wanted to share with you (9 years later, I do realize) that I have the same want/need, often, where the script I'm working on has a lot of cross-referencing to do between various sources of information. I use PowerShell. Enumerating arrays of objects, from within a loop, which is enumerating other objects, is just bad/slow/horrible.
To date, my solution has been to take all my arrays and then make hashtables from them, where the key/name is a property value that is common across all the arrays (e.g. ObjectId (GUID)), and the value is the entire object from the array. With this, while in my loop which is enumerating Array#1, I can check for the presence of the current item in any of the other arrays simply by checking the existence of the key in the corresponding hashtables. That way there's no enumerating the other array, just direct, equal-effort access to the correct item in the array (but really coming from the newly built hashtable).
So my arrays are just temporary collection buckets, then everything I do from there uses the hashtables, which are just index/lookup tables.
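The same pattern, sketched in JavaScript rather than PowerShell to match the original question. The users and licenses arrays, the objectId property, and the indexBy helper are all hypothetical:

```javascript
// Build a lookup table from an array, keyed by a shared property.
function indexBy(items, keyName) {
  const table = new Map();
  for (const item of items) {
    table.set(item[keyName], item);
  }
  return table;
}

// Two hypothetical "tables" related through objectId.
const users = [
  { objectId: "a1", name: "Ann" },
  { objectId: "b2", name: "Bob" },
];
const licenses = [{ objectId: "b2", plan: "pro" }];

// Index the second array once instead of scanning it inside the loop.
const licensesById = indexBy(licenses, "objectId");

const joined = users.map((user) => ({
  name: user.name,
  plan: licensesById.has(user.objectId)
    ? licensesById.get(user.objectId).plan
    : null,
}));

console.log(joined);
// [ { name: 'Ann', plan: null }, { name: 'Bob', plan: 'pro' } ]
```

This turns a nested loop over both arrays into one pass over each, which is the effect the hashtables have in the PowerShell scripts described above.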
What I was searching for when I stumbled here, was for solutions to keep track of all the different hashtables in the building/planning phase of my scripts.