Smart autocomplete with tensorflow.js - javascript

Is it possible to build some sort of autocomplete select which suggests values based on previous inputs?
Let’s say I have an airport select with 5k items. I would like to give the user the top 5 values to select from, based on their previous preferences.

Deep learning approach:
Basically what you want is an RNN powered word-level prediction model. Such models are used in text generation. You can refer to one example here.
These models specifically take in a word ( t ) and predict the next word ( t1 ). Then taking t1 it predicts t2. This cycle goes on until the max number of words is reached.
If I had to train such an RNN model on a single sentence like
Hello world from TensorFlow
Then the feature-label pairs would look like:
X, Y
Hello, world
world, from
from, TensorFlow
The training sentences could be built from the user's previous inputs.
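As a rough illustration of how those feature-label pairs could be produced, here is a minimal sketch. The function name nextWordPairs is made up for this example, and splitting on whitespace is an assumption; a real pipeline would need a proper tokenizer.

```javascript
// Build (input word, next word) training pairs from a single sentence.
// nextWordPairs is a hypothetical helper name, not part of TensorFlow.js.
function nextWordPairs(sentence) {
  // Assumption: words are separated by whitespace.
  const words = sentence.trim().split(/\s+/);
  const pairs = [];
  for (let i = 0; i + 1 < words.length; i++) {
    pairs.push([words[i], words[i + 1]]);
  }
  return pairs;
}

console.log(nextWordPairs("Hello world from TensorFlow"));
```

Each pair would still have to be encoded as numbers (for example, word indices) before it can be fed to a model.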
Things you will require in building such a model:
Lots of user data.
A ranking algorithm to get the top N predictions
Probably, a GPU or TPU. CPUs might prove to be very slow and might take days.
Non-Deep learning approach:
You can use Bayes' theorem, or simply check which word has the highest probability of following a specific word. For example, the word "morning" will have a high probability after the word "good".
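A minimal sketch of the "which value most often follows this one" idea, using plain counts instead of a model. The names buildFollowerCounts and topSuggestions are invented for this example, and it assumes the user's history is available as an ordered array of previously chosen values.

```javascript
// Count how often each value follows another in the user's history.
// history: ordered array of previously selected values (assumed format).
function buildFollowerCounts(history) {
  const counts = new Map();
  for (let i = 0; i + 1 < history.length; i++) {
    const current = history[i];
    const next = history[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const row = counts.get(current);
    row.set(next, (row.get(next) || 0) + 1);
  }
  return counts;
}

// Return up to n values that most often followed `current`.
function topSuggestions(counts, current, n = 5) {
  const row = counts.get(current);
  if (!row) return []; // nothing seen after this value yet
  return [...row.entries()]
    .sort((a, b) => b[1] - a[1]) // highest count first
    .slice(0, n)
    .map(([value]) => value);
}

const counts = buildFollowerCounts(
  ["good", "morning", "good", "morning", "good", "night"]
);
console.log(topSuggestions(counts, "good")); // "morning" ranks before "night"
```

For the airport select, the same counting could be applied to sequences of chosen airports; dividing each count by the row total gives the conditional probability.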
Tip:
Mostly, you will also need to handle custom user words which you may not add in your dictionary. It could be the name of a person. So while collecting data, you need to tag such words into a common word like .

Related

Managing and searching through millions of integers/coordinates in Javascript

I have an array of RGB value integers/whole numbers, ranging from 0 to 255. Eight Different lists.
For example one list has 8,349,042 values in it. OR 2,783,014 RGB sets of three.
The core objective is the user selects a pixel in an image.
That pixel's (R,G,B) value is grabbed and searched for within these lists. It exists in one of these lists, as all the lists together contain all the possible RGB combinations (16,777,216)
I'm trying to figure out what's the best way to store and search through these values.
again: these values don't change, they are hard coded lists(see Bullet Point below), with a known range.
The search query would be at minimum 3 times every event which would be every 10-30 seconds or so if the user was spamming.
OR Best case scenario, if the storage and search technique is fast enough I would like to: run the search on every pixel in an image (of maybe 800 x 600 or smaller resolutions) to have more data to play with. If I run into any memory limitations, I plan to work with it and use it as restrictions for my game design.
I used Javascript to generate these lists, going through and assigning each value based on how close it was to a base color.
[maybe unimportant how I made these lists]
I first assigned black and white RGB lists based on hard numeric limitations, then the rest of the RGB values were looped through and assigned to their closest base colors, red, yellow, cyan, green, blue and magenta. If there was a tie in distance I gave it to currently shortest list to try to keep it somewhat balanced. I may try to optimize this later and generate a new list, but not during runtime, just raw data.
I saved the results in hard text, and they are currently stored as a text that I can dump into large array.
At first I was trying to store the data as a JSON file along with my scripts, but I struggled to read the data and save it to an array. I ran into issues with fetch and async: the array was not available where I needed it, and console.log(arr) printed undefined, I'm guessing because the data wasn't loaded yet.
I can just paste it hard coded into the array but it's ginormous and I know there has to be a better way.
Also, hearing about differences in arrays vs sets vs objects
and different searching techniques within them.
Most of them seemed to be more tactful for multi variable arrays like name and age and location databases.
Since my data is all numeric I was thinking it could be a more bit/byte based approach?
I was reading some things on trees and hashs, bit crunching encryption?
Trees seemed nice for quicker searching as I could try to assign each branch of the tree to each R, G and B of the value, but I would also need to figure out how to convert my Giant single list of numbers into that, it also could be just the search type and that may depend on how I store the data.
I also struggled to understand the difference between front end and back end. I believe everything I've tried would be considered front end as I'm only testing my code in a browser.
I was pointed towards Node for backend but got lost in trying to get the console to run things.
I'm willing to give any of these things a try but I don't want to go down a path and find out it can't do what I want, or not optimally enough, like a server burden, or user burden with waiting too long or unable to do things because of user data security, requiring the user to do something more than just load the game, permissives wise.
So I'm hoping someone can give me suggestions on what I should pursue, so I can knuckle down and have a better focus on what I need to learn to be well versed and best tackle the problem.
EDIT: Simplified question: In Javascript, I have an array of 2 million (x,y,z) numbers. What's the best way to search that array for a specific (x,y,z) value?
Would it be better to store the data in a different format than an array for constant searches?
I'm not certain I've understood the overall goal here but have a suggestion to consider if you are trying to assign a predetermined value to some (almost random as it is picked as a pixel colour from an image) rgb number set.
I assume the list/dictionary you have made allocates some value to each rgb number set and that it can be regenerated or reformatted if needed.
There are a maximum of 16,777,216 rgb values (256^3). Javascript arrays can have up to 2^32-1 (almost 4.3 billion) elements. Therefore the suggestion is to reformat your dictionary to be a 3-dimensional array where each dimension is indexed 0-255. The array can be declared and assigned as an array literal in a regular js script (text file) like colourDictionary= [[[val0]..[val255]]..[[]..[]],...[[val0]..[val255]]..[[]..[]]]; and each value accessed in constant time using the pixel values as indices: colourDictionary[r][g][b]
To be useful without writing lines to cater for missing values, your gaps (you mention a list of 8,349,042, around half the available number combinations) could be filled with the values of nearest neighbours.
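A minimal sketch of the constant-time lookup idea. Note that it swaps the nested 3-dimensional array for a single flat Uint8Array (one byte per colour, about 16 MB), which is a variation on the suggestion rather than the same structure. The names lookup, rgbIndex, assignColour and listOf are invented for this example, as is the convention that list ids are 1-8 with 0 meaning "unassigned".

```javascript
// One byte per possible colour: 256 * 256 * 256 = 16,777,216 entries.
// Assumption: each entry holds a list id from 1 to 8; 0 means unassigned.
const lookup = new Uint8Array(256 * 256 * 256);

// Pack (r, g, b), each 0-255, into a single index from 0 to 16,777,215.
function rgbIndex(r, g, b) {
  return (r << 16) | (g << 8) | b;
}

// Record which list a colour belongs to (done once, when building the table).
function assignColour(r, g, b, listId) {
  lookup[rgbIndex(r, g, b)] = listId;
}

// Constant-time search: which list does this pixel's colour belong to?
function listOf(r, g, b) {
  return lookup[rgbIndex(r, g, b)];
}

assignColour(255, 0, 0, 3); // hypothetical: pure red lives in list 3
console.log(listOf(255, 0, 0));
```

A lookup like this is a single array read, so running it for every pixel of an 800 x 600 image (480,000 reads) should be cheap. The table could be shipped as a binary file and loaded into the typed array instead of being pasted in as text.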
Apologies if I've missed the point.

Pairing multiple images in javascript (with multiple options for each)

Edit: Since I'm just planning my approach to the project, I have no code to show for it. My question is more about how I could approach the problem (arrays, objects, or smarter ways to do it) than about the actual code. Since I'm pretty new, I'm looking for a "way in": how I could tackle the project and which tutorials to look at.
(Since I'm pretty new to coding, I'm trying to solidify the basics of html/css/js by building my own projects. With this project my problem is finding my way in: I struggle with finding the correct mechanics, so I can't find any tutorials because I don't know where to start.)
The project
I'm trying to build a project which pairs 3 images ("background", "base" and "addition"), starting from one which is selected.
For example: I select an image with the "background" woods and it will pair it with a small river as "base" and foxes as "addition".
If the user doesn't like the pairing, they can click a button and it will pair the same selected image (in this case the woods) with two other compatible images (for example: a mossy stone and squirrels).
However, it wouldn't pair my selected image (in this case the woods) with a desert floor and dolphins.
The user could also start by selecting an "addition", for example a crab, which will be paired with a beach and smaller pebbles.
The problem
I'm pretty sure that the main focus with this project should be on arrays.
I thought I could make 3 arrays (one for the background, one for the base and one for the addition) and from there just pair random values from the different arrays. However, should I make 3 arrays for every possible combination, or is there a way I could filter the pairs I want from the pairs I don't want?
Another approach I thought of was to let every image in the array be an object, so that I can assign attributes (for example: climate, color, sea level..) to the objects and filter them based on that.
In this case however I'm struggling with the code to first filter the correct pairings and then select a random pairing from those which were selected.
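A minimal sketch of the "filter first, then pick at random" step for the object approach. Everything here is illustrative: the image list, the role and climate attribute names, and the functions pickRandom and pairFor are all made up for this example, and matching on a single attribute is an assumption.

```javascript
// Hypothetical data: each image is an object with a role and one attribute.
const images = [
  { name: "woods", role: "background", climate: "forest" },
  { name: "beach", role: "background", climate: "coast" },
  { name: "small river", role: "base", climate: "forest" },
  { name: "mossy stone", role: "base", climate: "forest" },
  { name: "pebbles", role: "base", climate: "coast" },
  { name: "foxes", role: "addition", climate: "forest" },
  { name: "squirrels", role: "addition", climate: "forest" },
  { name: "crab", role: "addition", climate: "coast" },
];

// Pick one element at random; `random` can be swapped out for testing.
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Given the selected image, fill the two other roles with compatible images.
function pairFor(selected, allImages, random = Math.random) {
  const result = { [selected.role]: selected };
  for (const role of ["background", "base", "addition"]) {
    if (role === selected.role) continue;
    // Step 1: keep only images of this role that match the selection.
    const candidates = allImages.filter(
      (img) => img.role === role && img.climate === selected.climate
    );
    // Step 2: choose one of the compatible candidates at random.
    result[role] = candidates.length ? pickRandom(candidates, random) : null;
  }
  return result;
}

console.log(pairFor(images[0], images)); // woods + a forest base + a forest addition
```

Calling pairFor again with the same selection gives the "try another pairing" button. Compatibility could be extended to several attributes by changing the filter condition.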
Every bit of help is appreciated,
thank you!

Algorithms: Aggregate of substrings to determine relevant information

I am trying to do an aggregate algorithm that will get the most important elements in a text based on user highlights.
Imagine you have a text having n words where you have the ability to select k continuous words from the text as a "relevant highlight", where 1<=k<=n.(k is a substring of n)
Assuming we select anywhere from 10 to 10000 of these k highlights, is there any algorithm that can determine the most important information?
Consider that many of the highlights would overlap and we need to take that into account. I am also preferably looking for a solution in javascript since it's for a chrome extension.
This is NOT for a class, this is for a personal project concerning crowd-based summarization.
Suppose that each user highlights some stretches of text and that you know what those highlights are. You could sum, for each word in the text, how many people highlighted it. One thing you could calculate is, for some fixed k and N, a total of k stretches using at most N words in all, such that the sum of the number of times those N words were highlighted is a maximum.
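The per-word counting step could look roughly like this. It is a sketch that assumes each highlight is given as an inclusive [start, end] pair of word indices; the function name highlightCounts is invented for this example. It uses a difference array so that each highlight costs constant time regardless of its length.

```javascript
// Count, for every word position, how many highlights cover it.
// highlights: array of [start, end] word indices, both inclusive (assumed format).
function highlightCounts(wordCount, highlights) {
  // diff[i] holds the change in coverage when moving from word i-1 to word i.
  const diff = new Array(wordCount + 1).fill(0);
  for (const [start, end] of highlights) {
    diff[start] += 1;
    diff[end + 1] -= 1;
  }
  const counts = [];
  let running = 0;
  for (let i = 0; i < wordCount; i++) {
    running += diff[i];
    counts.push(running);
  }
  return counts;
}

console.log(highlightCounts(6, [[0, 2], [1, 3], [2, 2]]));
```

With thousands of overlapping highlights this stays linear in the number of highlights plus the number of words.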
You can do this with dynamic programming, working left to right within the text. For each point in the text and each possible allowed combination of (# highlights, # total words highlighted, whether current word is highlighted) you work out the score for the best answer terminating at that point satisfying those constraints. You can work out the best answer at each point by using the best answers for the previous word - consider the possible scores you get if you take any one of the existing best answers and either extend a current highlight, if that last word was highlighted, or start a new highlight. At the end you track the best answer for the whole text back from right to left.
This gives you a summary in the form of the best section of k stretches to highlight, using at most N words to pick up as many of the words highlighted by users as possible. No doubt there are variations on this for different scores or for different highlighting constraints - it might be easier to compute the best combination of k stretches, where each stretch is of at most M characters.
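Here is one way the dynamic program described above might be sketched. It assumes the per-word highlight counts are already available as an array; the function name bestStretches and its parameter names are made up for this example. The state is (stretches used, words used, whether the current word is selected), and the table costs O(n * k * N) time and memory, so this is only meant for modest sizes.

```javascript
// Choose at most maxStretches contiguous stretches, using at most maxWords
// words in total, maximising the summed highlight counts of the chosen words.
function bestStretches(counts, maxStretches, maxWords) {
  const n = counts.length, K = maxStretches, W = maxWords;
  const size = (K + 1) * (W + 1) * 2;
  const at = (j, w, s) => (j * (W + 1) + w) * 2 + s;

  let prev = new Float64Array(size).fill(-Infinity);
  prev[at(0, 0, 0)] = 0; // before the first word nothing is selected
  const from = []; // from[i][state] = "selected" flag of the predecessor state

  for (let i = 0; i < n; i++) {
    const cur = new Float64Array(size).fill(-Infinity);
    const back = new Int8Array(size).fill(-1);
    for (let j = 0; j <= K; j++) {
      for (let w = 0; w <= W; w++) {
        // Word i not selected: keep the better of the two predecessors.
        for (const ps of [0, 1]) {
          const v = prev[at(j, w, ps)];
          if (v > cur[at(j, w, 0)]) { cur[at(j, w, 0)] = v; back[at(j, w, 0)] = ps; }
        }
        if (j === 0 || w === 0) continue;
        // Word i selected: extend the running stretch, or start a new one.
        let best = prev[at(j, w - 1, 1)] + counts[i], p = 1;
        const start = prev[at(j - 1, w - 1, 0)] + counts[i];
        if (start > best) { best = start; p = 0; }
        if (best > -Infinity) { cur[at(j, w, 1)] = best; back[at(j, w, 1)] = p; }
      }
    }
    from.push(back);
    prev = cur;
  }

  // Find the best final state.
  let score = -Infinity, j = 0, w = 0, s = 0;
  for (let jj = 0; jj <= K; jj++)
    for (let ww = 0; ww <= W; ww++)
      for (const ss of [0, 1])
        if (prev[at(jj, ww, ss)] > score) { score = prev[at(jj, ww, ss)]; j = jj; w = ww; s = ss; }

  // Track the answer back from right to left.
  const selected = new Array(n).fill(false);
  for (let i = n - 1; i >= 0; i--) {
    const ps = from[i][at(j, w, s)];
    if (s === 1) { selected[i] = true; w -= 1; if (ps === 0) j -= 1; }
    s = ps;
  }

  // Convert the selected flags into [start, end] stretches (inclusive).
  const stretches = [];
  for (let i = 0; i < n; i++) {
    if (!selected[i]) continue;
    const last = stretches[stretches.length - 1];
    if (last && last[1] === i - 1) last[1] = i; else stretches.push([i, i]);
  }
  return { score, stretches };
}

console.log(bestStretches([0, 3, 4, 0, 0, 5, 1, 0], 2, 3));
```

For the sample counts, two stretches and three words pick up words 1-2 and word 5. For a long text it would probably be worth keeping only the two most recent score layers, as done here, and compressing the backpointer table.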

String similarity [duplicate]

I'm building a website that should collect various news feeds, and I would like the texts to be compared for similarity. What I need is some sort of news text similarity algorithm.
I know that php has the similar_text function, but I am not sure how good it is, and I need it for javascript.
Could anyone point me to an example, a plugin, or any instructions on how this is possible, or at least tell me where to start investigating?
There's a javascript implementation of the Levenshtein distance metric, which is often used for text comparisons. If you want to compare whole articles or headlines though you might be better off looking at intersections between the sets of words that make up the text (and frequencies of those words) rather than just string similarity measures.
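For reference, both ideas can be sketched in a few lines. This is a minimal illustration, not a particular library's implementation: levenshtein is the standard two-row dynamic program, and wordOverlap is a simple Jaccard score on the sets of words. The function names are made up for this example, and the \w+ tokenizer only handles ASCII letters, digits and underscores.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Jaccard overlap of word sets: shared words divided by all distinct words.
function wordOverlap(a, b) {
  const wordsA = new Set(a.toLowerCase().match(/\w+/g) || []);
  const wordsB = new Set(b.toLowerCase().match(/\w+/g) || []);
  let shared = 0;
  for (const word of wordsA) if (wordsB.has(word)) shared++;
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union;
}

console.log(levenshtein("kitten", "sitting"));
console.log(wordOverlap("the cat sat", "the cat ran"));
```

Levenshtein is quadratic in the string lengths, so for whole articles the word-based score is likely to be both faster and more meaningful.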
The question whether two texts are similar is a philosophical one as long as you don't specify exactly what it should mean. Consider the Strings "house" and "mouse". Seen from a semantic level they are not very similar, but they are very similar regarding their "physical appearance", because only one letter is different (and in this case you could go by Levenshtein distance).
To decide about similarity you need an appropriate text representation. You could – for instance – extract and count all n-grams and compare the two resulting frequency-vectors using a similarity measure as e.g. cosine similarity. Or you could stem the words to their root form after having removed all stopwords, sum up their occurrences and use this as input for a similarity measure.
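A minimal sketch of the n-gram variant, assuming character n-grams and raw counts as the frequency vector (word n-grams or weighted counts would work the same way). The names ngramCounts and cosineSimilarity are invented for this example.

```javascript
// Count all character n-grams of a text (lowercased).
function ngramCounts(text, n) {
  const counts = new Map();
  const s = text.toLowerCase();
  for (let i = 0; i + n <= s.length; i++) {
    const gram = s.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Cosine similarity of two sparse frequency vectors stored as Maps.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [gram, count] of a) {
    normA += count * count;
    if (b.has(gram)) dot += count * b.get(gram);
  }
  for (const count of b.values()) normB += count * count;
  if (normA === 0 || normB === 0) return 0; // an empty vector matches nothing
  return dot / Math.sqrt(normA * normB);
}

const a = ngramCounts("house", 2); // ho, ou, us, se
const b = ngramCounts("mouse", 2); // mo, ou, us, se
console.log(cosineSimilarity(a, b)); // 3 shared bigrams out of 4 on each side
```

The result ranges from 0 (no n-gram in common) to 1 (same n-gram distribution), independent of text length.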
There are plenty of approaches and papers about that topic, e.g. this one about short texts. In any case: the higher the abstraction level at which you want to decide whether two texts are similar, the more difficult it will get. I think your question is a non-trivial one (and hence my answer rather abstract) ... ;-)

Can anyone recommend a well performing interface to allow the user to organize a large number of items in HTML?

Currently for "group" management you can click the name of the group in a list of available groups and it will take you to a page with two side by side multi-select list boxes. Between the two list boxes are Add and Remove buttons. You select all the "users" from the left list and click 'Add' and they will appear in the right list, and vice versa. This works fairly well for a small amount of data.
The problem arises when you start having thousands of users. Not only is it difficult and time consuming to search through (despite having a 'filter' at the top that will narrow results based on a string), but you will eventually reach a point where the number of list items exceeds what your computer can handle and the whole browser starts to lag horrendously.
Is there a better interface idea for managing this? Or are there any well known tricks to make it perform better and/or be easier to use when there are many 'items' in the lists?
Implement an Ajax function that hooks on keydown and checks the characters the user has typed into the search/filter box so far (server-side). When the search results drop below 50, push those to the browser for display.
Alternatively, you can use a jQuery UI Autocomplete plugin, and set the minimum number of characters to 3 to trigger the search. This will limit the number of list items that are pushed to the browser.
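The filtering rule from both suggestions (wait for a few characters, and only send results once the match count is small) could be sketched as a plain function like the one below. It is written so it could run server-side behind the Ajax call; the function name filterUsers, the option names and the returned shape are all assumptions for this example, not part of jQuery UI.

```javascript
// Return matches only when the query is long enough and the result set is small.
// users: array of user names (assumed format).
function filterUsers(users, query, { minChars = 3, maxResults = 50 } = {}) {
  const q = query.trim().toLowerCase();
  if (q.length < minChars) {
    return { total: null, items: [] }; // too short: do not search at all
  }
  const matches = users.filter((name) => name.toLowerCase().includes(q));
  return {
    total: matches.length,
    // Only push the items to the browser once there are fewer than maxResults.
    items: matches.length < maxResults ? matches : [],
  };
}

const users = Array.from({ length: 200 }, (_, i) => "user" + i);
console.log(filterUsers(users, "user19"));
```

On the client, the keydown handler would typically be debounced so that a request is only sent after the user pauses typing.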
I would get away from using the native list box in your browser and implement a solution in HTML/CSS using lists or tables (depending on your needs). Then you can use JavaScript and AJAX to pull only the subset of data you need. Watch the user's actions and pull the next 50 records before they actually get to them. That will give the illusion of all of the records being loaded at runtime.
The iPhone does this kind of thing to preserve memory for its TableViews. I would take that idea and apply it to your case.
I'd say you hit the nail on the head with the word 'filter'. I'm not the hugest fan of parallel multi-selects like the one you are describing, but that is almost beside the point: whatever UX element you use, you are going to run into a problem given thousands of items. Thus, filtering. Filtering with a search string is a fine solution, but I suspect searching by name is not the fastest way to get to the users that the admin here wants. What else do you know about the users? How are they grouped?
For example, if these users were students at a highschool, we would know some meta-data about them: What grade are they in? How old are they? What stream of studies are they in? What is their GPA? ... providing filtering on these pieces of metadata is one way of limiting the number of students available at a time. If you have too many to start with, and it is causing performance problems, consider just limiting them, have a button to load more, and only show 100 at a time.
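A rough sketch of metadata filtering combined with "show 100 at a time, load more on demand". The function name pageOf, the filter format (an object of exact-match fields) and the student fields are assumptions made for this example.

```javascript
// Return one page of the items that match every field in `filters`.
// filters: e.g. { grade: 9 } (exact-match fields, assumed format).
function pageOf(items, filters, page, pageSize = 100) {
  const matching = items.filter((item) =>
    Object.entries(filters).every(([key, value]) => item[key] === value)
  );
  const start = page * pageSize;
  return {
    items: matching.slice(start, start + pageSize),
    hasMore: start + pageSize < matching.length, // drives the "load more" button
  };
}

// Hypothetical data: 250 students alternating between grades 9 and 10.
const students = Array.from({ length: 250 }, (_, i) => ({
  name: "student" + i,
  grade: i % 2 === 0 ? 9 : 10,
}));

const firstPage = pageOf(students, { grade: 9 }, 0);
console.log(firstPage.items.length, firstPage.hasMore);
```

With real data the filtering would more likely happen in the database query, with the page number and filters passed along in the request.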
Update: the last point here is essentially what Jesse is proposing below.
