Storing large array in MongoDB - javascript

I'm working on a little side project which has a search capability. I'm using typeahead.js attached to a REST API built with Express and MongoDB. I'm wondering what the best approach is to two problems I have. I'm primarily a front-end guy just starting out with Node and MongoDB. Here are the two issues I need help with, but first a little background to better understand them.
The site I'm building allows you to upload videos. You can add tags to these videos. When searching for a video I want to be able to search through these tags using typeahead.js, just like YouTube.
So here are the issues.
1 - I have a "tags" collection in MongoDB. When uploading a video I take the tags for that video and add them to this collection which I'll use for predictive searching. As time progresses this collection should have plenty of tags to search through. The issue I'm having is how to insert only the unique tags (the ones that don't already exist). For example say I want to insert the following document into MongoDB:
{
tags: "tag1, tag2, tag3, tag4, tag5, tag6, tag7, tag8"
}
The collection already has "tag1, tag2, tag4 and tag7", so I only want to insert tag3, tag5, tag6 and tag8. My question is what the best approach would be to do this. Should I first query the collection, parse through it and compare each tag, separate the ones that don't exist and then "append" them to the collection? The issue I see with this is that, again, as time progresses this will be a lot to parse through. So I'm not sure what the best approach here is.
2 - Would storing all of the tags in a simple array in a collection be the best approach? In time this array will be EXTREMELY large. Again I'm not a database guy, so I don't have a great understanding of how to approach an issue like this.
As always any and all help is much appreciated.

Since MongoDB isn't designed around joins, I would store the tags in each video document, a la myVideo.tags = ['sports', 'baseball', 'pitcher']. Then, to power your autosuggest, I would periodically map/reduce across the videos collection and output the set of active tags to a separate tags collection. You could even compute a popularity score and store something like {tag: 'baseball', score: 156} when the 'baseball' tag is used in 156 videos, and use that score to sort your autosuggest results so that more popular tags are shown first. For example, when the user types 'ba', 'baseball' is listed before 'baking' because it's the more likely completion, even though it comes second alphabetically.
Here's an example of exactly this straight out of the mongodb cookbook.
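As a rough illustration of the scoring idea, here is a minimal in-memory sketch in plain JavaScript. It is not the cookbook's map/reduce job: the sample videos, the function names countTags and suggest, and the tie-breaking rule are all assumptions made for the example.

```javascript
// Hypothetical sample data: each video document carries its own tags array.
const videos = [
  { title: "Opening day", tags: ["sports", "baseball", "pitcher"] },
  { title: "Sourdough basics", tags: ["baking", "bread"] },
  { title: "World series recap", tags: ["sports", "baseball"] },
];

// Count how many videos use each tag (the "reduce" step, done in memory here).
function countTags(videoDocs) {
  const counts = new Map();
  for (const video of videoDocs) {
    // A Set guards against a tag being listed twice on the same video.
    for (const tag of new Set(video.tags)) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return Array.from(counts, ([tag, score]) => ({ tag, score }));
}

// Return tags starting with the typed prefix, most popular first.
// Ties are broken alphabetically (an assumption, not part of the answer).
function suggest(tagDocs, prefix, limit = 5) {
  const needle = prefix.toLowerCase();
  return tagDocs
    .filter((doc) => doc.tag.toLowerCase().startsWith(needle))
    .sort((a, b) => b.score - a.score || a.tag.localeCompare(b.tag))
    .slice(0, limit)
    .map((doc) => doc.tag);
}

const tagDocs = countTags(videos);
console.log(suggest(tagDocs, "ba")); // [ 'baseball', 'baking' ]
```

In a real deployment the counting would presumably run inside the database and the result would be written to the tags collection; the sorting logic for the autosuggest stays the same.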
To point 2 in your question, nope. Never store an unbounded-length set of data as an array within a mongodb document. There's a maximum document size (currently 16MB), so anything that will just grow and grow over time must be a collection of distinct documents.
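For point 1, one way to avoid reading the whole collection first is to store one document per tag and upsert each tag. The sketch below only builds the operation objects in the shape the Node.js driver's bulkWrite accepts; the field names tag and score, the lower-casing, and the helper names are assumptions.

```javascript
// Split a comma-separated tag string into trimmed, lower-cased, unique tags.
// Lower-casing is an assumption; drop it if tags should be case-sensitive.
function parseTags(tagString) {
  const tags = tagString
    .split(",")
    .map((tag) => tag.trim().toLowerCase())
    .filter((tag) => tag.length > 0);
  return Array.from(new Set(tags));
}

// Build one upsert per tag: an existing tag gets its score incremented,
// a missing tag is created. No prior query of the collection is needed.
function buildTagUpserts(tags) {
  return tags.map((tag) => ({
    updateOne: {
      filter: { tag: tag },
      update: { $inc: { score: 1 } },
      upsert: true,
    },
  }));
}

const ops = buildTagUpserts(parseTags("tag1, tag2, tag2, tag3"));
console.log(ops.length); // 3

// With the Node.js driver these could be sent in one round trip, for example:
//   await db.collection("tags").bulkWrite(ops);
```

A unique index on the tag field (for example createIndex({ tag: 1 }, { unique: true })) would likely be a sensible companion, so that concurrent uploads cannot create duplicates.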

Related

AWS most efficient way of finding datapoints with a particular tag (string value in a list)

I have a very newbie question on AWS. Let's say that I am running a store that offers 1.000.000 different products. Each of the products has its own row in DynamoDB in a table named products. Now I would like to attach a list of tags to each product, for example ['football', 'outdoor', 'sport' ... ], so that when the customer searches for "sport", products with that tag show up in the results.
I am thinking of the best way to approach this in order to offer fast but also cost-efficient searches, I have so far thought of 2 viable options:
Option 1: Include a tags field for each product that takes a list of tags.
'product_3' -> ['football', 'outdoor', 'sport']
Option 2: Create a new table where the key is each tag and includes a field that takes a list of products instead.
'sport' -> ['product_1', product_3, ... ]
I am inclined to go with option 2 since it feels like it will give the faster search, but I want to double check that I haven't made any wrong assumptions or missed any other superior option.
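To make the difference between the two layouts concrete, here is a small in-memory sketch in JavaScript that turns the Option 1 shape into the Option 2 shape. It does not touch DynamoDB; the sample data and the function name buildTagIndex are made up for illustration.

```javascript
// Hypothetical product-to-tags data in the shape of Option 1.
const productTags = {
  product_1: ["sport", "indoor"],
  product_2: ["kitchen"],
  product_3: ["football", "outdoor", "sport"],
};

// Invert it into the shape of Option 2: tag -> list of product ids.
function buildTagIndex(tagsByProduct) {
  const index = new Map();
  for (const [productId, tags] of Object.entries(tagsByProduct)) {
    for (const tag of tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(productId);
    }
  }
  return index;
}

const tagIndex = buildTagIndex(productTags);
console.log(tagIndex.get("sport")); // [ 'product_1', 'product_3' ]
```

With Option 1 a tag search has to look at every product, while with Option 2 it is a single key lookup, at the cost of keeping the inverted list up to date on every tag change.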
It would also be great to have an infrastructure that worked with word2vec so that products related to the search word also show up, even if they are not identical string values.
DynamoDB is a powerful and very useful database, but it is not designed for search. My suggestion would be to use the correct tool for the job.
The pattern that I've used successfully multiple times is to use DynamoDB Streams and Lambda to replicate a table to an Elasticsearch index.
You can then keep that string set of tags on each item in DynamoDB and manage them there. Your nominal read, when you know the item's hash key, can be done against DynamoDB. When you want to search, you hit Elasticsearch and get all of the benefit and flexibility it provides for searching. One of those benefits is really good pagination compared to DynamoDB's API, as well as the ability to sort based on other attributes.

Is it better to store this information in an array or database?

I am creating a list and each item contains 3 properties: Caption, link, and image name. The list will be a "Trending now" list, where a related picture about the article is shown, a caption to the article, and a link to that article. Would it be best to store this information in an array or a database? With an array I could update the list, by adding 3 new properties, and removing the last 3. With a database, I could make a form where I submit the 3 properties and it'll update on its own without me touching the code. Would it be better to make this system in a Javascript array, or database? Wouldn't it be better to make it into an array for faster speeds? The list will have 10 items, each item has 3 properties.
For a list of 10 items, you can definitely go with a simple Array. If you need to store a larger amount of data, then try localStorage.
Whichever solution you use, keep in mind that it will always be processed and stored in the browser.
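The "add a new item, drop the oldest" update described in the question could look roughly like this with a plain array. The property names caption, link and image, and the helper addTrendingItem, are assumptions for the sketch.

```javascript
const MAX_ITEMS = 10;

// Put the new item first and keep only the newest maxItems entries.
// Returns a new array instead of modifying the one passed in.
function addTrendingItem(list, item, maxItems = MAX_ITEMS) {
  return [item, ...list].slice(0, maxItems);
}

// Simulate adding twelve articles to an initially empty list.
let trending = [];
for (let i = 1; i <= 12; i++) {
  trending = addTrendingItem(trending, {
    caption: "Article " + i,
    link: "/articles/" + i,
    image: "article-" + i + ".jpg",
  });
}

console.log(trending.length); // 10
console.log(trending[0].caption); // Article 12
```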
Razor - your question touches many principles of programming. Being a beginner myself, I remember having had exactly those questions not too long ago.
That is why I answer in that 'beginner's' spirit:
If this were a web application, then your 'trending now' list might be written as a ul list with li items in a section of an index.html file, with CSS to style the list and your page.
Then you might use javascript, jQuery, d3.js etc. or other languages such as php to access and do something with data from those html elements.
pseudo-code example to get a value from an element:
var collectedValue = $("#your_element_id").val();
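To get values into an array you would loop over your items, for example (the list id is a placeholder):
var my_array = [];
// collect the text of every li element in the list
$("#your_list_id li").each(function () {
  my_array.push($(this).text());
});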
Next you would have to decide how to store and retrieve values and or arrays.
This can be done client side, more or less meaning: it stays with the browser you are working in. Or server side, meaning more or less your browser interacts with a server on an 'other' computer to save and retrieve data.
Speed is an issue when you have huge data sets, otherwise it is not really an issue; certainly not for a short list. In my thinking speed is a question for commercial applications.
If you use a database, then you have to learn the database language and manage the database, access to it, and so on. It is not overly complex with MySQL and PHP (that is how I started), but you would have to learn it.
In html/css/javascript solutions others have pointed out 'JSON', which is often used for purposes such as yours.
'localStorage' is very straightforward, but has its limitations. Both of these are easy to understand.
To start, I myself worked through simple database examples about mysql/php
Then I improved my html/css experience. Then I started gaining experience with javascript.
Best would be if you would enable yourself to answer your questions yourself:
learn principles of html/css to present your trending now list
learn principles of javascript to manipulate your elements
Set up a server on your computer to learn how to interact with server side ( with MAMP or such packages)
learn principles of mysql/php
learn about storage options client or server side
It is fun, takes a while, and your preferences and programming solutions will depend on your abilities in the various languages.
I hope this answer is not perceived as being too simplistic or condescending; your question seemed to imply that such 'beginner's talk' might be helpful to you.

SQL joining a large amount of tables and comparing queries to return a specific result

I am working with a database that was handed down to me. It has approximately 25 tables, and a very buggy query system that hasn't worked correctly for a while. I figured, instead of trying to bug test the existing code, I'd just start over from scratch. I want to say before I get into it, "I'm not asking anyone to build the code for me". I'm not that lazy, all I want to know is, what would be the best way to lay out the code? The existing query uses "JOIN" to combine the results of all the tables in one variable, and spits it into the query. I have been told in other questions displaying this code, that it's just too much, and far too many bugs to try to single out what is causing the break.
What would be the most efficient way to query these tables that reference each other?
Example: Person chooses car year, make, model. PHP then gathers that information, and queries the SQL database to find what parts have matching year, vehicle id's, and parts compatible. It then uses those results to pull parts that have matching car model id's, OR vehicle id's (because the database was built very sloppily), and compares all the different tables to produce: Parts, descriptions, prices, part number, sku number, any retailer notes, wheelbase, drive-train compatibility, etc.
I've been working on this for two weeks, and I'm approaching my deadline with little to no progress. I'm about to scrap their database, and just do data entry for a week, and rebuild their mess if it would be easier, but if I can use the existing pile of crap they've given me, and save some time, I would prefer it.
Would it be easier to do a couple queries and compare the results, then use those results to query for more results, and do it step by step like that, or is one huge query comparing everything at once more efficient?
Should I use JOIN and pull all the tables at once and compare, or pass the input into individual variables, and pass the PHP into javascript on the client side to save server load? Would it be simpler to break the code up so I can identify the breaking points, or would using one long string decrease query time, and server loads? This is a very complex question, but I just want to make sure there aren't too many responses asking for clarification on trivial areas. I'm mainly seeking the best advice possible on how to handle this complicated situation.
Rebuild the database, then write a PHP import script to bring over the data.

Best way in Javascript to check if string is inside an huge txt/csv?

My real-world problem is: users of my mobile app type their city and I have to make sure it really exists, and that it is correctly written (case-insensitive, so these are correct: New York, NEW york, new york. This is not correct: newyork)
There are online APIs that work quite well (Google Geocode API for example) but:
After a very small number of requests, you have to pay (2.500/day right now)
Users must be connected to the internet
That's why I thought that an offline, local solution would be better. There are many websites (like Maxmind) where you can download a list containing every city in the world. I could embed this huge txt/csv right inside my application and do a string search locally (it's a big file, ok, but not that big. It's just a one-time download of something like 30-40MB of uncompressed .txt)
I'm trying to avoid jQuery at all costs and I don't want to use any PHP/MySQL solutions (even if fulltext indexes could be handy), that's why I'm trying to do all this just using javascript.
Given a string as input, let's say "city3", what's the best/fastest way to check if it's inside an (external) huge list like:
city1,
city2,
city3,
city4,
[...]
After solving this (big) problem: if there are no exact matches, is there a way to search for the correct city without freezing the device for 10 minutes?
In the example above, let's say the user types "cit y3" or "cyty3" or "cìty3": can any JS function tell them that they might be looking for "city3"? Is this kind of search too slow in this scenario?
Thanks
If speed is an issue then I would recommend loading the data into a JavaScript object and performing an in-memory search rather than repeatedly scanning a big blob of text in a file.
Try formatting the data into JSON with the city names as keys, that will give you good search performance.
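A minimal sketch of that in-memory lookup, assuming the city names have already been loaded into an array. The sample list, the normalization rules and the edit-distance fallback for near misses are illustrative choices, not something the answer prescribes.

```javascript
// Hypothetical small sample; the real list would come from the downloaded file.
const cityList = ["New York", "Newark", "Rome", "Paris"];

// Lower-case, strip accents and collapse repeated spaces so that
// "NEW  york" and "new york" map to the same key.
function normalize(name) {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ");
}

// Normalized name -> original spelling, for constant-time exact lookups.
const cityIndex = new Map(cityList.map((city) => [normalize(city), city]));

function findCity(input) {
  return cityIndex.get(normalize(input)) || null;
}

// Levenshtein distance computed with two rolling rows.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fallback for typos: closest city within maxDistance edits, or null.
function suggestCity(input, maxDistance = 2) {
  const needle = normalize(input);
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const [key, city] of cityIndex) {
    const distance = editDistance(needle, key);
    if (distance < bestDistance) {
      best = city;
      bestDistance = distance;
    }
  }
  return best;
}

console.log(findCity("NEW york")); // New York
console.log(findCity("newyork")); // null
console.log(suggestCity("newyork")); // New York
```

The exact lookup stays fast regardless of list size. The fallback scans every entry, so on a full world list it would probably need to be narrowed first, for instance to cities of similar length or with the same first letter.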
A workaround is to create a database, either SQL or NoSQL, and query it from your JavaScript code using jQuery's JSON functions.
For a SQL database, a good choice would be either MySQL or MariaDB, an enhanced, drop-in replacement for MySQL.
In this solution you will probably need a backend such as PHP to fetch the data from your database and convert it to JSON, and then load it in your JavaScript using the jQuery library with the $.getJSON function.
For a NoSQL database, a good choice would be MongoDB.
In this solution you can fetch your data from JavaScript, also with the $.getJSON function, provided the database is exposed through an HTTP interface.
Example for MongoDB Provided Here
If you don't want to use a database, I think you can do this:
- First, instead of using one big file, split it into several files (you can write a script for this and run it just once to split the big file). In one file put the cities that start with, for example, "aa", in the second file the cities that start with "ab", and so on.
- Then, for each city, check its first letters and search only inside the matching file.
For example, if you need to search for the city "Ahmedabad", it will only search in the file with cities that start with "ah". This is probably not the best solution, since you end up with 421 files instead of 1, but the search will be faster.
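The splitting idea can be sketched in memory, with one bucket per two-letter prefix standing in for one file per prefix. The function names and the sample cities are assumptions for illustration.

```javascript
// Group lower-cased city names by their first prefixLength letters.
// Each bucket plays the role of one of the split files.
function bucketByPrefix(cities, prefixLength = 2) {
  const buckets = {};
  for (const city of cities) {
    const name = city.toLowerCase();
    const key = name.slice(0, prefixLength);
    (buckets[key] = buckets[key] || []).push(name);
  }
  return buckets;
}

// Only the bucket matching the input's prefix is searched.
function existsInBuckets(buckets, input, prefixLength = 2) {
  const needle = input.toLowerCase();
  const bucket = buckets[needle.slice(0, prefixLength)] || [];
  return bucket.includes(needle);
}

const buckets = bucketByPrefix(["Ahmedabad", "Ahvaz", "Aberdeen", "Rome"]);
console.log(Object.keys(buckets)); // [ 'ah', 'ab', 'ro' ]
console.log(existsInBuckets(buckets, "ahmedabad")); // true
console.log(existsInBuckets(buckets, "Ahmedabat")); // false
```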

EnsureIndex for likes in MongoDB

I am creating a network that allows users to create posts and like them.
After asking on Stack Overflow, I've understood how to structure my database:
A collection which includes a document for each post.
A collection which includes a document for each like; each of these documents holds a reference to the post it belongs to.
When I want to get ALL likes for a post, I can query the like collection looking for the reference to that post.
So far so good. But assuming I'll have millions of documents in the like collection, I wondered how I could query and search among them without it taking too long.
I was advised to use ensureIndex; in this case, I have to ensure an index on the field which contains the reference to a post.
But when do I have to create this index? Is it enough to create it once (for example when I set up my database), after which MongoDB maintains it by default, or do I have to do it during the application's lifetime? Thank you.
"But assuming I'll have millions of documents in the like collection, I wondered how I could query and search among them without it taking too long."
I assume you would most likely want to do a count on the likes, as an example?
You can't do that quickly at that scale; instead, you use optimizations to work around it. A count over millions of rows can get a bit slow.
A typical approach in SQL databases is a counter, where you update the parent row with a running total of its children.
The same applies to MongoDB.
You would aggregate important data up to the parent document.
If you actually need to query the likes to show some of the users who liked a post, then you limit how many likes you fetch. Google+ and other networks tend to limit the number of likes they show to about 1,000.
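A rough sketch of the counter idea: each new like produces one like document plus an increment on a counter stored in the parent post. The code only builds plain objects; the field names postId, userId, createdAt and likeCount, and the helper name buildLikeWrites, are assumptions.

```javascript
// Build the two writes for a new like: the like document itself and an
// increment of a counter kept on the parent post.
function buildLikeWrites(postId, userId) {
  return {
    likeDocument: { postId: postId, userId: userId, createdAt: new Date() },
    postFilter: { _id: postId },
    postUpdate: { $inc: { likeCount: 1 } },
  };
}

const writes = buildLikeWrites("post42", "user7");
console.log(writes.postUpdate); // { '$inc': { likeCount: 1 } }

// With the Node.js driver these could be applied roughly as:
//   await db.collection("likes").insertOne(writes.likeDocument);
//   await db.collection("posts").updateOne(writes.postFilter, writes.postUpdate);
// and a capped list of likers could be read with:
//   await db.collection("likes").find({ postId: "post42" }).limit(1000).toArray();
```

Reading likeCount from the post document then replaces the count over the likes collection, and the limited find is only needed when individual likers have to be displayed.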
"I was advised to use ensureIndex"
Adding indexes to a database does help with actually searching for documents.
"But when do I have to create this index? Is it enough to create it once?"
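Yes, MongoDB will maintain the index itself as documents are added and removed. You only need to ensure it once (newer MongoDB versions name the call createIndex rather than ensureIndex).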
