I'm trying to explain Map (aka hash table, dict) to someone who's new to programming. While the concepts of Array (=list of things) and Set (=bag of things) are familiar to everyone, I'm having a hard time finding a real-world metaphor for Maps (I'm specifically interested in python dicts and Javascript Objects). The often used dictionary/phone book analogy is incorrect, because dictionaries are sorted, while Maps are not - and this point is important to me.
So the question is: what would be a real world phenomena or device that behaves like Map in computing?
I agree with delnan in that the human example is probably too close to that of an object. This works well if you are trying to transition into explaining how objects are implemented in loosely typed languages, however a map is a concept that exists in Java and C# as well. This could potentially be very confusing if they begin to use those languages.
Essentially you need to understand that maps are instant look-ups that rely on a unique set of values as keys. These two things really need to be stressed, so here's a decent yet highly contrived example:
Lets say you're having a party and everyone is supposed to bring one thing. To help the organizer, everyone says what their first name is and what they're bringing. Now lets pretend there are two ways to store this information. The first is by putting it down on a list and the second is by telling someone with a didactic memory. The contrived part is that they can only identify you through you're first name (so he's blind and has a cochlear implant so everyone sounds like a robot, best I can come up with).
List: To add, you just append to the bottom of the list. To back out you just remove yourself from the list. If you want to see who is bringing something and what they're bringing, then you have to scan the entire list until you find them. If you don't find them after scanning, then they're clearly they're not on the list and not bringing anything. The list would clearly allow duplicates of people with the same first name.
Dictionary (contrived person): You don't append to the end of the list, you just tell him someone's first name and what they're bringing. If you want to know what someone is bringing you just ask by name and he immediately tells you. Likewise if two people of the same name tell him they're bringing something, he'll think its the same person just changing what they're bringing. If someone hasn't signed up you would ask by name, but he'd be confused and ask you what you're talking about. Also you would have to say when you tell the guy that someone is no longer bringing something he would lose all memory of them, so yeah highly contrived.
You might also want to show why the list is sufficient if you don't care who brings what, but just need to know what all is being brought. Maybe even leave the names off the list, to stress key/value pairs with the dictionary.
Perhaps it would be the analogy of a human being that your meeting for the first time:
Each person has an unordered amount of attributes, each of these attributes can only have 1 value, which is unique (like hair=long, eye_color=blue). And you would discover these attributes in no particular order.
So for a person she can have a shoesize=38, hair_color=brown and eye_color=blue and when reciting (human_dict.get('shoe_size')) this to someone else you would mention the attributes in no particular order except by attribute name.
I have seen cases where a large list of people were binned according to their last N digits of their identifying number, in order to save on key search. This binning is somewhat similar to hashing, and may help explain it.
Are you successful in explaining the array in a logical way..that array is a storage where elements are kept at first position. second position , third position....first,second.third are basically keys...
Now extend it to say maps are storage where are keys are not necessarily numbers..lets say they are strings...or even numbers which are not consecutive or have any relationship
Conversely lets say in array A(of int) are maps where index 1 is mapped to A's address, 2 to the address of A + 4 and so on....
In some restaurants when you make your order in the counter, they give you a number to identify your order. The numbers :
Don't need to be sorted.
Don't need to be consecutive
The only idea of the numbers is that they can find your order easily. In the map/hash table/associative array world the number would be the key and your order the value.
After you finish your order they can use the same number for another order. So the number is basically the identifier for an order at certain point in time, this would fit the Javascript Object example where the properties of the objects can change their value.
Related
Lets say I have a relation E = mgh. And I have four textboxes on my site. Given three values, I can compute the value of the fourth variable by rearranging the variables by hand. But how to do so programmatically, efficiently, without redundant code.
Now the worst way of doing (which I have painfully done in the past in a different language) is manually checking which of the four variables is not inputted in the textboxes and then use a precomputed relation like g = E/mh to get its value. But this is bad because as the formula becomes more and more complex with more variable or different powers, it is tiresome and impractical to do by hand.
So what I'm looking for, is a Proof of Concept for JavaScript which can take a certain no of values, and a relation relating those values to find out the value of the missing variable on its own. Also note: the relation may not be a simple relation it might be complex like E=(m^1/2)(g^1/3)(h^4)
An Example of this might be this site where I can plug in some values and it will auto-compute the missing value on its own. I don't know what underlying structure they might be using, as they might have all the relations precomputed by hand.
EDIT
For all those out there, who feel this question being like asking for a Library recommendation, it is not. I'm looking for a minimal example which can do the following:
Using Array's/ Objects transform a given relation into another. (Generally referred in mathematical terms as rearranging the terms)
Now Using the above relation compute without using eval()
I know what to do for the second, i.e take the new expression, find the variable and then put the value at the place of the variable in the string.
Then evaluate the expression. Like ("y=x+3*e") -> (y="3+3*30")
How do I make the first step possible?
(OR) Do you have a better approach for this problem.
I have an Excel and there is an example of how it looks
enter image description here
I am using Pentaho, with the purpose of creating a new row(related to) in which I will show if a person has a relation with another one, I will consider that two-person are related if they have the same Dirección (address). For instance, María Isabel Hevilla Castro and Miguel Manceras Fernández live in the same place, then in relation to of María Isabel Hevilla Castro it will be Miguel Manceras Fernández and on the contrary, in Miguel Manceras Fernández it will be María IsabelHevilla Castro.
I have tried to solve this using a Javascript modified value, but I'm just beginning to learn Javascript and I don't know how to solve this problem.
Could somebody help me, or give me a clue.
If your addresses are clean you can do this with a self-join on Dirección.
The idea is that you sort by Dirección, then duplicate the stream, rename the name field to something else (Nombre2 or Related_to) and inner join them by Dirección. This will result in records for every combination that has the same Dirección, including the person themselves. That is fixed by filtering the rows, keeping only the ones where Nombre is not equal to Nombre2.
The basic flow can be extended with cleanup of address fields (Calculator step can do similarity scores) beforehand or extra processing afterwards for the related_to field.
This is likely better accomplished using a loop in something like Python, R, or Javascript as you already mentioned.
Pentaho is fundamentally designed to process data on a row-by-row basis. There aren't that many functions in Pentaho that allow you to do analysis across a column of data.
If you have to use Pentaho for this rather than something like Python or Javascript, then I'd suggest sorting on the Direccion column, and then using the Analytic query step to analyze across rows. This will probably only work if you have a maximum of two people per address, but this might get you where you need to go.
This implementation from Geeks for Geeks strangely utilize array to implement bfs.
Link: https://www.geeksforgeeks.org/implementation-graph-javascript/
In the step "add the starting node to the queue", it even assigns "startingNode" as an index of the "visited" array, which sounds very off(it would make a bit more sense if it used object as the type). I was wondering if they incorrectly wrote this or if not, I'd like someone more experienced to explain this to me.
Thank you!
The traversal needs an efficient way to lookup a particular node in the visited set. That can be done efficiently with an array lookup such as if (!visited[neigh]). Another option would be using a string key and an object to represent the visited set, but it would likely be less efficient. Referring to the nodes as objects only would complicate things since there would be no natural key one could use to determine membership in the visited set.
The question is to implement a web service that can read a 10GB file and store all distinct words & their occurrences. The requirements needs to be solved in O(n) or better complexity. The next part of the question is to write all client side code to allow search based on keypress.
How do I approach this problem? What would you suggest, are the main sub-headings?Do we need to use some sort of in-memory caching? Can 1 computer handle searching 10GB of data? Is there an approximation I should consider for distinct words based on Language (For example, in Cracking the coding interview I read there are about 600,000 distinct words in a language). How do I handle scalability of a system built this way? I really need help structuring my thoughts! Thanks in advance!
You shouldn't be using JavaScript for this. Pretty much any language will have better performance.
But, setting that aside, let's answer the question. What you'll want to do is create a Set and iterate through all words. Given the size of the data, you'll probably want to split it into chunks beforehand or at read time.
Just adding the key to the Set every time will suffice, as set only contains unique elements.
Alternatively, if you have 10+GB of RAM, just put the whole thing into an array and cast it to a set. Then you'll be able to read the unique values. It'll take quite a while, though.
I have a use case where I need to do complicated string matching on records of which there are about 5.1 Million of. When I say complicated string matching, I mean using library to do fuzzy string matching. (http://blog.bripkens.de/fuzzy.js/demo/)
The database we use at work is SAP Hana which is excellent for retrieving and querying because it's in memory so I would like to avoid pulling data out of there and re-populating it in memory on the application layer but at the same time I cannot take advantages of the libraries (there is an API for fuzzy matching in the DB but it's not comprehensive enough for us).
What is the middle ground here? If I do pre-processing and associate words in the DB with certain keywords the user might search for I can cut down the overhead but are there any best practises that are employed when It comes to this ?
If it matters. The list is a list of Billing Descriptors (that show up on CC statements) therefore, the user will search these descriptors to find out which companies the descriptor belongs too.
Assuming your "billing descriptor" is a single column, probably of type (N)VARCHAR I would start with a very simple SAP HANA fuzzy search, e.g.:
SELECT top 100 SCORE() AS score, <more fields>
FROM <billing_documents>
WHERE CONTAINS(<bill_descr_col>, <user_input>, FUZZY(0.7))
ORDER BY score DESC;
Maybe this is already good enough when you want to apply your js library on the result set. If not, I would start to experiment with the similarCalculationMode option, like 'similarcalculationmode=substringsearch' etc. And I would always have a look at the response times, they can be higher when using some of the options.
Only if response times are to high, or many active concurrent users are using your query, I would try to create a fuzzy search index on your search column. If you need more search options, you can also create a fullext index.
But that all really depends on you use case, the values you want to compare etc.
There is a very comprehensive set of features and options for different use cases, check help.sap.com/hana/SAP_HANA_Search_Developer_Guide_en.pdf.
In a project we did a free style search on several address columns (name, surname, company name, post code, street) and we got response times of 100-200ms on ca 6 Mio records WITHOUT using any special indexes.