Importance of 'Type Tables' (like a BOOK_TYPE table) when defining models using ORM (like ActiveRecord) for web APIs - javascript

By "type table", I'm talking about things like a type column in a books table that maps to a book_types table listing the possible types.
Most implementations I see just define a string column and use validation to compare it against a set of types: ['hardcover', 'softcover', 'ebook'].
Is it worth it to create another model, BookType, and establish a one-to-many relationship with the Book table?

The reason tables like this are used in relational models is to avoid data duplication in the database. Each type string will exist on a single row in the type table; this means if you ever need to change that string, you don't have to update every row that uses it. It might not seem likely right now, or likely in every case, but not setting it up in a normalised manner could prove painful at some later point.
There are potentially other benefits:
You might save space by having an integer foreign key ID on each row instead of a large string.
If it would be useful to have a short code alongside a longer descriptive field for each type, a separate type table will save space and make it easier to maintain the description field in the same way as described above.
Such a descriptive field might well be helpful if someone needs to use the data for reporting or business intelligence purposes later on down the line, in cases where a single-word type isn't already clear-cut.
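If you do go the normalised route, a minimal sketch of the one-to-many setup using Sequelize (a JavaScript ORM, chosen here only because the question is tagged javascript; the model and column names are illustrative):

const { Sequelize, DataTypes } = require('sequelize');
const sequelize = new Sequelize('sqlite::memory:');

// The lookup table: one row per type string.
const BookType = sequelize.define('BookType', {
  name: { type: DataTypes.STRING, allowNull: false, unique: true },
});

// Each book carries only an integer BookTypeId foreign key.
const Book = sequelize.define('Book', {
  title: { type: DataTypes.STRING, allowNull: false },
});

BookType.hasMany(Book);
Book.belongsTo(BookType);

// Renaming a type now touches one row in the type table, not every book:
// await BookType.update({ name: 'paperback' }, { where: { name: 'softcover' } });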

Related

Designing SQLite table for Elements that have custom numbers of custom fields

I have an interesting situation where I'm working with posts. I don't know how the user will want to structure the posts. It would either be one block of text, or structured in an a -> b -> c hierarchy where a, b, and c are all text blocks; if represented as a table, there would be an unknown number of columns and an unknown number of rows.
Outside of the post data, there is the possibility of adding custom attributes to the post. Most of these would be shorter text strings, but an unknown number of them.
Understanding that a JSON object would probably be the simplest solution, I have to fit this into a self-hosted DB. SQLite seems to be the currently accepted solution for RedwoodJS, the framework I'm building with. How would I go about storing this kind of data within RedwoodJS using the Prisma client it comes with?
Edit: The text blocks need to be separate when displaying the post and able to be referenced separately. There is another part of the project that will link to each text block specifically. The user would be choosing how many columns there are before entering any posts (configured in settings), but the rows would have to be updated dynamically. Closest example I can think of is like a test management software where you have precondition, execution steps, and expected results across the top for columns, and each additional step is a row.
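For concreteness, one plausible JSON shape for a post like that (all field names here are invented for illustration):

// A post with user-configured columns and dynamically added rows,
// plus an arbitrary set of custom attributes.
const post = {
  attributes: { priority: 'high', reviewer: 'Ana' }, // unknown number of short strings
  columns: ['Precondition', 'Execution steps', 'Expected results'],
  rows: [
    ['User is logged in', 'Click save', 'Form is persisted'],
    ['Form has errors', 'Click save', 'Validation message is shown'],
  ],
};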
Well, there are two routes you could take. If possible, use a NoSQL database such as MongoDB, which Prisma has support for. There you would be able to create a JSON-like structure with as many or as few paragraphs as you would like.
If that is not possible, there is a workaround: since SQLite does not support a JSON column type, you can store the stringified JSON data in a text field and then parse it when reading. This is not the optimal solution, so use the first one if possible.
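A minimal sketch of that workaround with the Prisma client (the Post model and its content field are assumptions for illustration):

// schema.prisma (excerpt):
// model Post {
//   id      Int    @id @default(autoincrement())
//   content String // stringified JSON
// }

const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();

async function savePost(blocks) {
  // Serialize the nested block structure into a plain text column.
  return prisma.post.create({ data: { content: JSON.stringify(blocks) } });
}

async function loadPost(id) {
  const post = await prisma.post.findUnique({ where: { id } });
  // Parse the text column back into the original structure.
  return JSON.parse(post.content);
}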

How to compare every element in a row in Pentaho

I have an Excel file; an example screenshot was attached to the original question.
I am using Pentaho, with the purpose of creating a new field (Related_to) in which I will show whether a person has a relation with another one. I will consider that two people are related if they have the same Dirección (address). For instance, María Isabel Hevilla Castro and Miguel Manceras Fernández live in the same place, so the Related_to of María Isabel Hevilla Castro will be Miguel Manceras Fernández and, conversely, that of Miguel Manceras Fernández will be María Isabel Hevilla Castro.
I have tried to solve this using a Modified Java Script Value step, but I'm just beginning to learn JavaScript and I don't know how to solve this problem.
Could somebody help me, or give me a clue?
If your addresses are clean you can do this with a self-join on Dirección.
The idea is that you sort by Dirección, then duplicate the stream, rename the name field to something else (Nombre2 or Related_to) and inner join them by Dirección. This will result in records for every combination that has the same Dirección, including the person themselves. That is fixed by filtering the rows, keeping only the ones where Nombre is not equal to Nombre2.
The basic flow can be extended with cleanup of address fields (Calculator step can do similarity scores) beforehand or extra processing afterwards for the related_to field.
This is likely better accomplished using a loop in something like Python, R, or JavaScript, as you already mentioned.
Pentaho is fundamentally designed to process data on a row-by-row basis. There aren't that many functions in Pentaho that allow you to do analysis across a column of data.
If you have to use Pentaho for this rather than something like Python or Javascript, then I'd suggest sorting on the Direccion column, and then using the Analytic query step to analyze across rows. This will probably only work if you have a maximum of two people per address, but this might get you where you need to go.
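If you do drop to plain JavaScript outside Pentaho, the whole job is a few lines of grouping; a sketch, using the field names from the question:

// rows: [{ Nombre: '...', Direccion: '...' }, ...]
function addRelatedTo(rows) {
  // Collect the names that share each address.
  const byAddress = new Map();
  for (const row of rows) {
    if (!byAddress.has(row.Direccion)) byAddress.set(row.Direccion, []);
    byAddress.get(row.Direccion).push(row.Nombre);
  }
  // For each person, list everyone else at the same address.
  return rows.map((row) => ({
    ...row,
    Related_to: byAddress
      .get(row.Direccion)
      .filter((name) => name !== row.Nombre)
      .join(', '),
  }));
}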

Basic question concerning indexedDB object store data structure for efficiency in edits versus retrieval of data

I am trying to determine which is the most efficient way to layout data in an indexedDB object store for editing and retrieving. I'm fairly new to these concepts, but the processes of editing and retrieving appear to be opposites in terms of efficiency.
The data I'm dealing with consists of sets of template forms, in which each form, although very similar, can have different numbers of certain pieces of information and different amounts of data for each element. Roughly, though, each form would generally have between 50 and 75 separate data elements, some a boolean value only and others paragraphs of text.
I was planning on saving the data for each form as one entry in an object store. Then I started considering that all the data for a single form has to be retrieved and re-written to the object store for an edit to a single data element. That seemed inefficient, and it appeared better to have multiple entries (keys) for each section of data, or even for each individual data element, in a single form.
That may be better for saving edits to individual data elements in a form, but when the data is retrieved to display an existing form, a variable number of keys would have to be retrieved instead of one. So, it appears that what makes saving edits to individual form elements more efficient makes retrieval of all the data for an entire form less efficient.
So, my question is which is more difficult for the browser? To extract and replace a larger object of data for a single key, or many individual extractions of smaller objects comprising the same amount of total data?
Lots of edits will be made, but there will also be a lot of moving back and forth between different forms; so it doesn't appear that one of these processes will take place much more often than the other, which rules out simply choosing the layout that makes the most frequent operation the most efficient.
I could complicate the code and the data structure in the object store and gain nothing. Or, perhaps, a small, fixed number of section keys per form would split the difference, so to speak.
Thank you for considering my very novice question.
Added Description
I'm not trying to complicate things, but just to understand better. Please assume there are 1,000 properties total per form. If a user edits only one property on a form, then either the object storing the 1,000 properties has to be retrieved with a get and held in a variable, the one property value edited, and the object put back; or a new object has to be built from all the data in the form and then put back as an overwrite of the existing object. Either way, a large object has to be moved and held somewhere outside the database. There is no way to just alter the one value in the database without pulling out all the data.
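For the one-large-object layout, that read-modify-write cycle amounts to something like this sketch (db is an open IDBDatabase; the store and property names are made up):

const tx = db.transaction('forms', 'readwrite');
const store = tx.objectStore('forms'); // assume it was created with keyPath: 'id'
const req = store.get(formId);
req.onsuccess = () => {
  const form = req.result;      // deserialize the whole ~1,000-property object
  form.someProperty = newValue; // change one value in memory
  store.put(form);              // reserialize and write the whole object back
};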
Do the steps of locating the key, deserializing the object, and reserializing the object, and whatever else is involved consume a considerable portion of the total time regardless of the size of the actual data object? Or would there be a significant difference between editing a key holding a single-property object versus editing one property in a key holding an object of 1,000 properties?
If there is a considerable difference, then on the editing side it would save time, because I'm almost never going to save all 1,000 properties at once, but only as they are completed, such as in an onchange event. However, on the retrieve-and-display side, a cursor or an index would presumably be opened on those 1,000 individual keys to retrieve the data and populate the HTML. That, I assume, would take more time than retrieving and holding one large object and using it to populate the HTML. So there is a trade-off between making saves faster and making retrieval of all the form data at once faster.
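For comparison, gathering a per-property layout back together might look like this sketch (assuming one record per property, keyed by [formId, propertyName]; all names invented):

const tx = db.transaction('formProps', 'readonly');
const store = tx.objectStore('formProps');
// An array upper bound sorts after every [formId, string] key.
const range = IDBKeyRange.bound([formId], [formId, []]);
const props = {};
store.openCursor(range).onsuccess = (e) => {
  const cursor = e.target.result;
  if (cursor) {
    props[cursor.key[1]] = cursor.value; // one small record per property
    cursor.continue();
  } else {
    render(props); // all ~1,000 records gathered; render() is a placeholder
  }
};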
I like the one-large-object approach because it is simpler to code and follow, but I just wanted to understand whether there is any visible difference to the user between the two. Thank you.
After working with this for a while longer, I realized I should have considered that, since these are both asynchronous processes, saving is less of a concern for user experience than retrieval. Populating the screen with new data quickly is more important than saving it quickly.

What is the best way to do complicated string search on 5M records? Application layer or DB layer?

I have a use case where I need to do complicated string matching on about 5.1 million records. By complicated string matching, I mean using a library to do fuzzy string matching (http://blog.bripkens.de/fuzzy.js/demo/).
The database we use at work is SAP HANA, which is excellent for retrieving and querying because it's in-memory, so I would like to avoid pulling data out of there and re-populating it in memory at the application layer; but at the same time I cannot take advantage of the libraries (there is an API for fuzzy matching in the DB, but it's not comprehensive enough for us).
What is the middle ground here? If I do pre-processing and associate words in the DB with certain keywords the user might search for, I can cut down the overhead, but are there any best practices that are employed when it comes to this?
If it matters: the list is a list of Billing Descriptors (the ones that show up on CC statements), so the user will search these descriptors to find out which companies they belong to.
Assuming your "billing descriptor" is a single column, probably of type (N)VARCHAR, I would start with a very simple SAP HANA fuzzy search, e.g.:
SELECT top 100 SCORE() AS score, <more fields>
FROM <billing_documents>
WHERE CONTAINS(<bill_descr_col>, <user_input>, FUZZY(0.7))
ORDER BY score DESC;
Maybe this is already good enough when you want to apply your JS library to the result set. If not, I would start to experiment with the similarCalculationMode option, like 'similarcalculationmode=substringsearch' etc. And I would always keep an eye on the response times; they can be higher when using some of the options.
Only if response times are too high, or many concurrent users are running your query, would I try to create a fuzzy search index on your search column. If you need more search options, you can also create a fulltext index.
But that all really depends on your use case, the values you want to compare, etc.
There is a very comprehensive set of features and options for different use cases, check help.sap.com/hana/SAP_HANA_Search_Developer_Guide_en.pdf.
In one project we did a free-style search on several address columns (name, surname, company name, post code, street) and got response times of 100-200 ms on about 6 million records WITHOUT using any special indexes.
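If you do post-process the HANA result set with a JavaScript library, the shape of it is roughly this sketch, where fuzzyScore stands in for whatever scoring call your chosen library (e.g. fuzzy.js) exposes, and BILL_DESCR is a placeholder column name:

// rows: the result set of the CONTAINS query above.
function rerank(rows, userInput, fuzzyScore) {
  return rows
    .map((row) => ({ ...row, jsScore: fuzzyScore(row.BILL_DESCR, userInput) }))
    .sort((a, b) => b.jsScore - a.jsScore);
}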

Easiest, Most Efficient way of associating data from different tables Without Associations

This is something that has intrigued me a lot recently. It's a general SQL/Relational Database problem coming from a guy who prefers Mongo.
What I want is to associate data from different tables in the most efficient, easiest way possible, without using associations and assuming I can't restructure or re-model the DB.
So, for example, with FQL (which doesn't have associations), if I asked for the name and eid of all the events my current user has been invited to, I'd also like to know whether my current user is going, but that info is in the 'event_member' table.
In this instance I've an interest in another column (rsvp_status) in event_member, one that I'd like to be associated with the columns from event, i.e eid and name.
In this case the instinct may be to say that, since every event has a name, an eid, and an rsvp_status, we could sort by eid and then match each nth item (for n = 1 to whatever), because there's guaranteed to be the same number of rows; but there are many cases where we can't do that.
And I know I could do separate queries and then iterate through and match them by eid, but basically I'm looking for a generic, simple, efficient solution for the associations idea, if one exists. Preferably in JavaScript.
What you are looking for here is a simple JOIN of two or more tables. http://www.w3schools.com/sql/sql_join.asp
You do not have to have any relations between tables in order to perform JOINs. The relations are just a constraint to ensure that bad/invalid data can't propagate into the tables, for example an event_member with the eid of a nonexistent user. Anyway, you are free to JOIN tables as you like :)
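If you do end up with separate queries matched client-side (as mentioned in the question), the generic pattern is a hash join; a sketch in JavaScript:

// events: [{ eid, name }, ...]; members: [{ eid, rsvp_status }, ...]
function joinByEid(events, members) {
  // Index one side by the join key...
  const rsvpByEid = new Map(members.map((m) => [m.eid, m.rsvp_status]));
  // ...then probe it once per row on the other side: O(n + m) overall.
  return events.map((e) => ({
    eid: e.eid,
    name: e.name,
    rsvp_status: rsvpByEid.get(e.eid),
  }));
}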
Here is a way to connect to SQL Server using JavaScript: How to connect to SQL Server database from JavaScript in the browser?
