AWS most efficient way of finding datapoints with a particular tag (string value in a list) - javascript

I have a very newbie question on AWS. Let's say that I am running a store that offers 1,000,000 different products. Each product has its own row in DynamoDB in a table named products. Now I would like to attach a list of tags to each product, for example ['football', 'outdoor', 'sport' ... ], so that when a customer searches for sport, products with that tag show up in the results.
I am thinking about the best way to approach this in order to offer fast but also cost-efficient searches. So far I have thought of 2 viable options:
Option 1: Include a tags field for each product that takes a list of tags.
'product_3' -> ['football', 'outdoor', 'sport']
Option 2: Create a new table where the key is each tag and includes a field that takes a list of products instead.
'sport' -> ['product_1', 'product_3', ... ]
I am inclined to go with option 2 since it feels like it will give the faster search, but I want to double check that I haven't made any wrong assumptions or missed a superior option.
It would also be great to have an infrastructure that works with word2vec, so that products related to the search word also show up even if their tags are not identical string values.

DynamoDB is a powerful and very useful database, but it is not designed for search. My suggestion would be to use the correct tool for the job.
The pattern that I've used successfully multiple times is to use DynamoDB Streams and Lambda to replicate a table to an Elasticsearch index.
You can then keep that string set of tags on each item in DynamoDB and manage them there. Your normal read, when you know the item's hash key, can be done against DynamoDB. When you want to search, you hit Elasticsearch and get all of the benefit and flexibility it provides for searching. Those benefits include much better pagination than DynamoDB's API offers, as well as the ability to sort on other attributes.
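As a rough sketch of the replication step, the function below turns a DynamoDB Streams event into a request body for the Elasticsearch bulk API. The key attribute name (id) and the index name (products) are assumptions made for illustration, the attribute converter only covers the common types, and the stream is assumed to be configured to include new images.

```javascript
// Sketch only: the key attribute "id" and index "products" are assumptions.
const INDEX_NAME = 'products';

// Convert one DynamoDB attribute value ({S: ...}, {N: ...}, {SS: [...]}, ...) to plain JS.
function fromAttr(attr) {
  if ('S' in attr) return attr.S;
  if ('N' in attr) return Number(attr.N);
  if ('BOOL' in attr) return attr.BOOL;
  if ('NULL' in attr) return null;
  if ('SS' in attr) return attr.SS.slice();
  if ('NS' in attr) return attr.NS.map(Number);
  if ('L' in attr) return attr.L.map(fromAttr);
  if ('M' in attr) return fromImage(attr.M);
  throw new Error('Unsupported attribute type: ' + Object.keys(attr).join(','));
}

// Convert a whole item image (attribute name -> attribute value) to a plain object.
function fromImage(image) {
  const out = {};
  for (const [name, attr] of Object.entries(image)) out[name] = fromAttr(attr);
  return out;
}

// Build an Elasticsearch bulk body (newline-delimited JSON) from a stream event.
function toBulkBody(event) {
  const lines = [];
  for (const record of event.Records) {
    const id = fromAttr(record.dynamodb.Keys.id);
    if (record.eventName === 'REMOVE') {
      lines.push(JSON.stringify({ delete: { _index: INDEX_NAME, _id: id } }));
    } else {
      // INSERT or MODIFY: (re)index the full new image of the item.
      lines.push(JSON.stringify({ index: { _index: INDEX_NAME, _id: id } }));
      lines.push(JSON.stringify(fromImage(record.dynamodb.NewImage)));
    }
  }
  // The bulk API expects the body to end with a newline.
  return lines.length ? lines.join('\n') + '\n' : '';
}
```

A Lambda handler subscribed to the stream would send the returned string to the cluster's _bulk endpoint. Authentication, retries and partial-failure handling are left out here.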

Related

How to store product addons in Firestore DB?

So, I have a product saved to my Firestore DB, but I haven't figured out how to manage product addons. For example, let's say the item is a pizza and the person wants extra cheese or extra tomatoes. I'm not sure of the best way to map this info in my DB. Let's say the following is my product entry (things like added timestamps removed for simplicity's sake):
{
itemName (String)
itemBasePrice (number)
itemStock (number)
availableAt (array (of stores))
}
What is the best way for me to map product addons like extra cheese, pepperoni, etc.? Should I make a DB entry for each addon? I feel like it's expensive to make 100 calls to the DB to add one item to your basket.
Based on the description you provided, I think nested fields or arrays are the best option for this use case. I suggest you refer to this documentation and blog on how to structure data and use arrays. If more information needs to be tracked for each addon, you can consider using sub-collections. There are pros and cons to each way of structuring the data. It really depends on how you are planning to use and query it.
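To make the nested-array suggestion concrete, here is a minimal sketch of what such a product document could look like once read into JavaScript. The addons array, its field names, and the choice to store prices in cents are assumptions for illustration, not something given in the question.

```javascript
// Hypothetical product document: the "addons" array and its fields are assumptions.
// Prices are kept in cents to avoid floating-point rounding.
const pizza = {
  itemName: 'Margherita',
  itemBasePrice: 850,
  itemStock: 20,
  availableAt: ['store_1', 'store_2'],
  addons: [
    { id: 'extra_cheese', label: 'Extra cheese', price: 150 },
    { id: 'extra_tomatoes', label: 'Extra tomatoes', price: 100 },
    { id: 'pepperoni', label: 'Pepperoni', price: 200 },
  ],
};

// Price of one basket line: base price plus every selected addon.
function priceWithAddons(product, selectedAddonIds) {
  let total = product.itemBasePrice;
  for (const id of selectedAddonIds) {
    const addon = product.addons.find((a) => a.id === id);
    if (!addon) throw new Error('Unknown addon: ' + id);
    total += addon.price;
  }
  return total;
}
```

Because the addons travel with the product document, adding an item to the basket needs one read of that document rather than one read per addon.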

Designing SQLite table for Elements that have custom numbers of custom fields

I have an interesting situation where I'm working with posts. I don't know how the user will want to structure the posts. It would either be one block of text, or structured in an a-> b -> c structure where a, b, and c are all text blocks, and if represented as a table, there would be an unknown number of columns and unknown number of rows.
Outside of the post data, there is the possibility of adding custom attributes to the post. Most of these would be shorter text strings, but an unknown number of them.
Understanding that a json object would probably be the simplest solution, I have to fit this into a self-serving db. SQLite seems to be the current accepted solution for Redwoodjs, the framework I'm building out of. How would I go about storing this kind of data within Redwoodjs using the prisma.js that it comes with?
Edit: The text blocks need to be separate when displaying the post and able to be referenced separately. There is another part of the project that will link to each text block specifically. The user would be choosing how many columns there are before entering any posts (configured in settings), but the rows would have to be updated dynamically. Closest example I can think of is like a test management software where you have precondition, execution steps, and expected results across the top for columns, and each additional step is a row.
Well, there are two routes that you could take. If possible, use a NoSQL database such as MongoDB, which Prisma has support for. There you would be able to create a JSON-like structure with as many or as few paragraphs as you like.
If that is not possible, a workaround is to store the stringified JSON data in a text field and parse it when you read it back, since SQLite has no dedicated JSON column type. This is not the optimal solution, so if possible use the first one.
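The workaround could look roughly like this. The row shape (id, title, blocksJson) is hypothetical, and blocksJson stands for whatever text column you define, for example a String field in the Prisma schema. Each post is modelled as rows of text blocks, one block per configured column.

```javascript
// Hypothetical row shape: { id, title, blocksJson }, where blocksJson is a TEXT column.

// Serialize a post's grid of text blocks before writing it to the database.
function toRow(post) {
  return { id: post.id, title: post.title, blocksJson: JSON.stringify(post.rows) };
}

// Parse the stored string back into rows of text blocks after reading.
function fromRow(row) {
  const rows = JSON.parse(row.blocksJson);
  if (!Array.isArray(rows)) throw new Error('blocksJson must contain an array of rows');
  return { id: row.id, title: row.title, rows };
}

// Look up a single block so other parts of the app can link to it.
function getBlock(post, rowIndex, columnIndex) {
  const row = post.rows[rowIndex];
  return row === undefined ? undefined : row[columnIndex];
}
```

The trade-off is that the database cannot query or index individual blocks, so any reference to a specific block has to be resolved in application code after parsing.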

What is the best way to do complicated string search on 5M records ? Application layer or DB layer?

I have a use case where I need to do complicated string matching on about 5.1 million records. By complicated string matching, I mean using a library to do fuzzy string matching. (http://blog.bripkens.de/fuzzy.js/demo/)
The database we use at work is SAP HANA, which is excellent for retrieving and querying because it's in memory, so I would like to avoid pulling data out of there and re-populating it in memory on the application layer. At the same time, if the data stays in the database I cannot take advantage of the libraries (there is an API for fuzzy matching in the DB, but it's not comprehensive enough for us).
What is the middle ground here? If I do pre-processing and associate words in the DB with certain keywords the user might search for, I can cut down the overhead, but are there any best practices that are employed when it comes to this?
If it matters, the list is a list of billing descriptors (the ones that show up on CC statements), so the user will search these descriptors to find out which company a descriptor belongs to.
Assuming your "billing descriptor" is a single column, probably of type (N)VARCHAR, I would start with a very simple SAP HANA fuzzy search, e.g.:
SELECT top 100 SCORE() AS score, <more fields>
FROM <billing_documents>
WHERE CONTAINS(<bill_descr_col>, <user_input>, FUZZY(0.7))
ORDER BY score DESC;
Maybe this is already good enough if you then apply your JS library to the result set. If not, I would start to experiment with the similarCalculationMode option, e.g. 'similarcalculationmode=substringsearch'. I would also always keep an eye on the response times, which can be higher with some of the options.
Only if response times are too high, or many concurrent users are running your query, would I try to create a fuzzy search index on your search column. If you need more search options, you can also create a fulltext index.
But that all really depends on your use case, the values you want to compare, etc.
There is a very comprehensive set of features and options for different use cases, check help.sap.com/hana/SAP_HANA_Search_Developer_Guide_en.pdf.
In one project we did a free-style search on several address columns (name, surname, company name, post code, street) and got response times of 100-200 ms on about 6 million records WITHOUT using any special indexes.
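The two-stage idea (let the database narrow 5 million rows down to about a hundred candidates, then apply your own matching in the application layer) could be sketched as follows. The Levenshtein-based similarity is only a stand-in for whatever fuzzy library you actually use, and the column name descr is hypothetical.

```javascript
// Edit distance between two strings, computed with a single rolling row.
function levenshtein(a, b) {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = prev[0]; // value for (i-1, j-1)
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const saved = prev[j]; // value for (i-1, j), becomes the next diagonal
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(prev[j] + 1, prev[j - 1] + 1, diagonal + cost);
      diagonal = saved;
    }
  }
  return prev[b.length];
}

// Case-insensitive similarity in [0, 1]; 1 means identical.
function similarity(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Re-rank the candidate rows returned by the database, best match first.
// "column" names the descriptor field in each row (hypothetical: "descr").
function rerank(rows, userInput, column) {
  return rows
    .map((row) => ({ row, score: similarity(String(row[column]), userInput) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.row);
}
```

Since only the TOP 100 rows leave the database, the application-side pass stays cheap no matter how large the table is.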

How to query in couchdb with multiple key combinations using javascript without writing separate view for each combination?

I am trying to fetch documents from CouchDB based on certain specific filters through JavaScript. For example, I need to get the list of employees from a db where the key can be city, age, state, gross income, gender, or a combination of two or more such keys.
The problem I am facing is that as the number of possible keys increases, the number of views I need to write also increases drastically. I want to avoid writing so many views. Is that possible?
In addition to checking out Matt's suggestion about couchdb-lucene, you might also look into list functions: they're quite useful when you have a small set of basic view queries that will reduce the number of records fetched to a manageable level and you want to do a bunch of ad-hoc queries that further filter those records.
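The ad-hoc filtering step can be sketched as a single predicate that accepts any combination of fields, so one basic view plus this filter covers what would otherwise need many views. This is plain JavaScript rather than an actual list function, and it assumes each view row carries the employee fields in row.value. The filter format (exact values, or { min, max } for numeric ranges) is invented for the example.

```javascript
// Does a document satisfy every requested filter?
// A filter value is either an exact value or a { min, max } range for numbers.
function matchesFilters(doc, filters) {
  return Object.entries(filters).every(([field, wanted]) => {
    const value = doc[field];
    if (wanted !== null && typeof wanted === 'object') {
      if (typeof value !== 'number') return false;
      if (wanted.min !== undefined && value < wanted.min) return false;
      if (wanted.max !== undefined && value > wanted.max) return false;
      return true;
    }
    return value === wanted;
  });
}

// Apply the predicate to rows already fetched from one basic view.
// Assumes the view emits the employee fields as the row value.
function filterRows(rows, filters) {
  return rows.filter((row) => matchesFilters(row.value, filters));
}
```

Inside a real list function, the same predicate would be applied to each row obtained from getRow(), with the filters read from the request's query parameters.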

Storing large array in MongoDB

I'm working on a little side project which has a search capability. I'm using typeahead.js attached to a REST API built with Express and MongoDB. I'm wondering what the best approach is to two problems I have. I'm primarily a front-end guy just starting out with Node and MongoDB. Here are the two issues I need help with, but first a little background to better understand them.
The site I'm building allows you to upload videos. You can add tags to these videos. When searching for a video I want to be able to search through these tags using the typeahead.js. Just like YouTube.
So here are the issues.
1 - I have a "tags" collection in MongoDB. When uploading a video I take the tags for that video and add them to this collection which I'll use for predictive searching. As time progresses this collection should have plenty of tags to search through. The issue I'm having is how to insert only the unique tags (the ones that don't already exist). For example say I want to insert the following document into MongoDB:
{
tags: "tag1, tag2, tag3, tag4, tag5, tag6, tag7, tag8"
}
The collection already has "tag1, tag2, tag4 and tag7", so I only want to insert 3, 5, 6 and 8. My question is what the best approach to this would be. Should I first query the collection, parse through it and compare each tag, separate the ones that don't exist and then "append" them to the collection? The issue I see with this is that, again, as time progresses this will be a lot to parse through. So I'm not sure what the best approach here is.
2 - Would storing all of the tags in a simple array in a collection be the best approach? In time this array will be EXTREMELY large. Again, I'm not a database guy, so I don't have a great understanding of how to approach an issue like this.
As always any and all help is much appreciated.
Since MongoDB is not built around joins, I would store the tags in each video document, a la myVideo.tags = ['sports', 'baseball', 'pitcher']. Then, to power your autosuggest, I would periodically map/reduce across the videos collection and output the set of active tags to a separate tags collection. You could even compute a popularity score and store something like {tag: 'baseball', score: 156} for the case where the 'baseball' tag was used in 156 videos, and use that to sort your autosuggest results so that more popular tags are shown earlier. For example, when the user types 'ba', 'baseball' is listed before 'baking' because it is the more likely completion, even though it comes second alphabetically.
Here's an example of exactly this straight out of the mongodb cookbook.
To point 2 in your question, nope. Never store an unbounded-length set of data as an array within a mongodb document. There's a maximum document size (currently 16MB), so anything that will just grow and grow over time must be a collection of distinct documents.
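To illustrate the shape of that periodic aggregation and the popularity-sorted autosuggest, here is a plain JavaScript sketch that works on in-memory arrays. It is not the MongoDB map/reduce job itself. The function names are made up for the example, and in practice the counting would run inside the database and write {tag, score} documents to the tags collection.

```javascript
// Count how many videos use each tag; returns documents like {tag: 'baseball', score: 156}.
function countTags(videos) {
  const counts = new Map();
  for (const video of videos) {
    // A Set ensures a tag repeated within one video is counted once.
    for (const tag of new Set(video.tags || [])) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return Array.from(counts, ([tag, score]) => ({ tag, score }));
}

// Autosuggest: tags starting with the prefix, most popular first,
// falling back to alphabetical order when scores are equal.
function suggest(tagDocs, prefix, limit = 5) {
  const p = prefix.toLowerCase();
  return tagDocs
    .filter((t) => t.tag.toLowerCase().startsWith(p))
    .sort((a, b) => b.score - a.score || a.tag.localeCompare(b.tag))
    .slice(0, limit)
    .map((t) => t.tag);
}
```

This also sidesteps the uniqueness problem from point 1: the tags collection is rebuilt from the videos, so there is no need to check each new tag against the existing ones at upload time.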
