Designing SQLite table for Elements that have custom numbers of custom fields - javascript

I have an interesting situation where I'm working with posts. I don't know how the user will want to structure a post. It could either be one block of text, or structured as a -> b -> c, where a, b, and c are all text blocks; represented as a table, there would be an unknown number of columns and an unknown number of rows.
Outside of the post data, there is the possibility of adding custom attributes to the post. Most of these would be shorter text strings, but there could be an unknown number of them.
Understanding that a JSON object would probably be the simplest solution, I have to fit this into a self-hosted db. SQLite seems to be the currently accepted solution for RedwoodJS, the framework I'm building with. How would I go about storing this kind of data within RedwoodJS using the Prisma client it comes with?
Edit: The text blocks need to be separate when displaying the post, and each must be referenceable on its own: another part of the project will link to each text block specifically. The user would choose how many columns there are before entering any posts (configured in settings), but the rows would have to grow dynamically. The closest example I can think of is test management software, where the columns are precondition, execution steps, and expected results, and each additional step is a row.

Well, there are two routes you could take. If possible, use a NoSQL database such as MongoDB, which Prisma supports. There you would be able to create a JSON-like structure with as many or as few paragraphs as you like.
If that is not possible, there is a workaround: since Prisma's SQLite connector does not support the Json type, you can store the stringified JSON in a text field and parse it when you read it back. This is not the optimal solution, so use the first one if possible.
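A minimal sketch of that workaround in a RedwoodJS service, assuming a hypothetical Post model whose body column is a plain String holding the stringified block structure:

```javascript
// schema.prisma (hypothetical model):
//
// model Post {
//   id   Int    @id @default(autoincrement())
//   body String // stringified JSON: an array of rows, each an array of text blocks
// }

// api/src/services/posts/posts.js
import { db } from 'src/lib/db'

export const createPost = ({ blocks }) => {
  // blocks, e.g. [['precondition', 'steps', 'expected result'], ...]
  return db.post.create({ data: { body: JSON.stringify(blocks) } })
}

export const post = async ({ id }) => {
  const record = await db.post.findUnique({ where: { id } })
  // Parse the text column back into the nested structure for the client
  return { ...record, blocks: JSON.parse(record.body) }
}
```

Note that with this approach an individual text block can only be addressed by its row/column position after parsing; SQL itself cannot see inside the text field, which is part of why it is not optimal.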

Related

How to compare every element in a row in Pentaho

I have an Excel file with people's names (Nombre) and addresses (Dirección).
I am using Pentaho, with the purpose of creating a new field (related_to) in which I will show whether a person has a relation with another one. I will consider that two people are related if they have the same Dirección (address). For instance, María Isabel Hevilla Castro and Miguel Manceras Fernández live in the same place, so the related_to of María Isabel Hevilla Castro will be Miguel Manceras Fernández, and conversely, that of Miguel Manceras Fernández will be María Isabel Hevilla Castro.
I have tried to solve this using a Modified Java Script Value step, but I'm just beginning to learn JavaScript and I don't know how to solve this problem.
Could somebody help me, or give me a clue?
If your addresses are clean you can do this with a self-join on Dirección.
The idea is that you sort by Dirección, then duplicate the stream, rename the name field to something else (Nombre2 or Related_to) and inner join them by Dirección. This will result in records for every combination that has the same Dirección, including the person themselves. That is fixed by filtering the rows, keeping only the ones where Nombre is not equal to Nombre2.
The basic flow can be extended with cleanup of address fields (Calculator step can do similarity scores) beforehand or extra processing afterwards for the related_to field.
This is likely better accomplished using a loop in something like Python, R, or Javascript as you already mentioned.
Pentaho is fundamentally designed to process data on a row-by-row basis. There aren't that many functions in Pentaho that allow you to do analysis across a column of data.
If you have to use Pentaho for this rather than something like Python or JavaScript, then I'd suggest sorting on the Dirección column and then using the Analytic Query step to analyze across rows. This will probably only work if you have a maximum of two people per address, but it might get you where you need to go.
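If you do go the scripting route instead, the self-join idea reduces to a group-by in plain JavaScript; a rough sketch (field names from the question, sample rows made up):

```javascript
// Rows as they come out of the spreadsheet
const rows = [
  { nombre: 'María Isabel Hevilla Castro', direccion: 'Calle A 1' },
  { nombre: 'Miguel Manceras Fernández', direccion: 'Calle A 1' },
  { nombre: 'Someone Else', direccion: 'Calle B 2' },
]

// Group names by address (the in-memory equivalent of sort + self-join)
const byAddress = new Map()
for (const { nombre, direccion } of rows) {
  if (!byAddress.has(direccion)) byAddress.set(direccion, [])
  byAddress.get(direccion).push(nombre)
}

// related_to = everyone at the same address except the person themselves
const result = rows.map((row) => ({
  ...row,
  related_to: byAddress.get(row.direccion).filter((n) => n !== row.nombre),
}))

console.log(result)
```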

MySQL suggestions on DB design: N values in one column, or one column per value

I need to move my local project to a webserver, and it is time to start saving things (user progress and history).
The main idea is that the webapp will calculate, every 50 ms or so, 8 values related to the user currently using it.
My questions are:
Should I use MySQL to store the data? At the moment I'm using a plain text file with a predefined format like:
Option1,Option2,Option3
Iteration 1
value1,value2,value3,value4,value5
Iteration 2
value1,value2,value3,value4,value5
Iteration 3
value1,value2,value3,value4,value5
...
If so, should I use 5 (or more in the future) columns (one for each value), with their ID as the iteration? Keep in mind I will have 5000+ iterations per session (roughly 4 minutes).
Each user can have 10-20 sessions a day.
Will the DB become too big to be efficient?
Given the sampling rate, a DB call every 50 ms seems like a problem to me (especially since I have to animate the webpage heavily). I was wondering if it would be better to implement a Save button which populates the DB with all 5000+ values in one go. If so, what would be the best way?
Would it be better to save the *.txt directly to a folder on the webserver? Something like DB/usernameXYZ/dateZXY/filename_XZA.txt. To me yes, way less effort. If so, which function allows me to do that (possibly JS/HTML)?
The rules are simple, and are discussed in many Q&A here.
With rare exceptions...
Do not have multiple tables with the same schema. (Eg, one table per User)
Do not splay an array across columns. Use another table.
Do not put an array into a single column as a comma-separated list. Exception: if you never use SQL to look at the individual items in the list, then it is OK for it to be an opaque text field.
Be wary of EAV schema.
Do batch INSERTs or use LOAD DATA; that is a 10x speedup over one row per INSERT. (See the sketch at the end of this answer.)
Properly indexed, a billion-row table performs just fine. (Problem: It may not be possible to provide an adequate index.)
Files (à la your .txt files) could be stored in the filesystem or in a TEXT column in the database; there is no universal answer on which to do. (That is, I need more details to answer that part of your question.)
"calculate 8 values that are related to the user" -- too vague. Some possibilities:
Dynamically computing a 'rank' is costly and time-consuming.
Summary data is best pre-computed
Huge numbers (eg, search hits) are best approximated
Calculating age from birth date - trivial
Data sitting in the table for that user is, of course, trivial to get
Counting number of 'friends' - it depends
etc.
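To illustrate the batch-INSERT point: buffer the session's samples in memory (or behind a Save button) and write them in one multi-row statement. A sketch with the Node mysql2 package; the table and column names are hypothetical:

```javascript
const mysql = require('mysql2/promise')

// samples: an array of [v1, v2, v3, v4, v5] rows, e.g. ~5000 entries
// collected every 50 ms over a 4-minute session
async function saveSession(sessionId, samples) {
  const conn = await mysql.createConnection({
    host: 'localhost',
    user: 'app',
    database: 'app',
  })
  const rows = samples.map((values, i) => [sessionId, i, ...values])
  // Bulk expansion of `VALUES ?` works with query(), not execute()
  await conn.query(
    'INSERT INTO samples (session_id, iteration, v1, v2, v3, v4, v5) VALUES ?',
    [rows]
  )
  await conn.end()
}
```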

Importance of 'Type Tables' (like a BOOK_TYPE table) when defining models using ORM (like ActiveRecord) for web APIs

By Type Table, I'm talking about things like type in a books table which can map out to a book_types table that lists possible types.
Most implementations I see just define a string column and use validation to compare it against a set of types: ['hardcover', 'softcover', 'ebook'].
Is it worth it to create another model, BookType, and establish a one-to-many relationship with the Book table?
The reason tables like this are used in relational models is to avoid data duplication in the database. Each type string will exist on a single row in the type table; this means if you ever need to change that string, you don't have to update every row that uses it. It might not seem likely right now, or likely in every case, but not setting it up in a normalised manner could prove painful at some later point.
There are potentially other benefits:
You might save space by having an integer foreign key ID on each row instead of a large string.
If it would be useful to have a short code alongside a longer descriptive field for each type, a separate type table will save space and make it easier to maintain the description field in the same way as described above.
Such a descriptive field might well be helpful if someone needs to use the data for reporting or business intelligence purposes later on down the line, in cases where a single-word type isn't already clear-cut.
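A sketch of the normalized version in Prisma terms (model names are hypothetical; the string-validation alternative would just be a single String column plus an app-level check):

```javascript
// schema.prisma (hypothetical):
//
// model BookType {
//   id    Int    @id @default(autoincrement())
//   name  String @unique // e.g. 'hardcover'
//   books Book[]
// }
//
// model Book {
//   id         Int      @id @default(autoincrement())
//   title      String
//   bookType   BookType @relation(fields: [bookTypeId], references: [id])
//   bookTypeId Int
// }

// Renaming a type now touches exactly one row instead of every book:
const renameType = (db, id, name) =>
  db.bookType.update({ where: { id }, data: { name } })
```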

SQL joining a large amount of tables and comparing queries to return a specific result

I am working with a database that was handed down to me. It has approximately 25 tables and a very buggy query system that hasn't worked correctly for a while. I figured that instead of trying to bug-test the existing code, I'd just start over from scratch. I want to say before I get into it: I'm not asking anyone to build the code for me. I'm not that lazy; all I want to know is what would be the best way to lay out the code. The existing query uses JOIN to combine the results of all the tables in one variable and spits it into the query. In other questions where I showed this code, I was told it's just too much, with far too many bugs to single out what is causing the break.
What would be the most efficient way to query these tables that reference each other?
Example: a person chooses car year, make, and model. PHP then gathers that information and queries the SQL database to find which parts have a matching year and vehicle IDs and are compatible. It then uses those results to pull parts that have matching car model IDs OR vehicle IDs (because the database was built very sloppily), and compares all the different tables to produce: parts, descriptions, prices, part numbers, SKU numbers, any retailer notes, wheelbase, drive-train compatibility, etc.
I've been working on this for two weeks, and I'm approaching my deadline with little to no progress. I'm about to scrap their database, and just do data entry for a week, and rebuild their mess if it would be easier, but if I can use the existing pile of crap they've given me, and save some time, I would prefer it.
Would it be easier to do a couple queries and compare the results, then use those results to query for more results, and do it step by step like that, or is one huge query comparing everything at once more efficient?
Should I use JOIN and pull all the tables at once and compare, or pass the input into individual variables and hand the work from PHP to JavaScript on the client side to save server load? Would it be simpler to break the code up so I can identify the breaking points, or would one long query decrease query time and server load? This is a very complex question, but I just want to make sure there aren't too many responses asking for clarification on trivial areas. I'm mainly seeking the best advice possible on how to handle this complicated situation.
Rebuild the database, then write a PHP import to bring over the data.

What is the best way to store a field that supports markdown in my database when I need to render both HTML and "simple text" views?

I have a database and a website front end. There is a field in my front end that is plain text now, but I want it to support markdown. I am trying to figure out the right way to store it in my database, because I have various views that need to be supported (PDF reports, web pages, Excel files, etc.).
My concern is that since some of those views don't support HTML, I don't want to keep only an HTML version of this field.
Should I store two copies (one text-only and one HTML), or should I store HTML and try to strip the HTML tags on the fly when rendering to Excel, for example?
I need to figure out correct format (or formats) to store in the database to be able to render both:
HTML, and
Regular text (with no markdown or HTML syntax)
Any suggestions would be appreciated as I don't want to go down the wrong path. My point is that I don't want to show any HTML tags or markdown syntax in my Excel output.
Decide like this:
Store the original data (text with markdown).
Generate the derived data (HTML and plaintext) on the fly.
Measure the performance:
If it's acceptable, you're done, woohoo!
If not, cache the derived data.
Caching can be done in many ways... you can generate the derived data immediately, and store it in the database, or you can initially store NULLs and do the generation lazily (when and if it's needed). You can even cache it outside the database.
But whatever you do, make sure the cache is never "stale" - i.e. when the original data changes, the derived data in the cache must be re-generated or at least marked as "dirty" somehow. One way to do that is via triggers.
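A sketch of steps 1-2 in JavaScript: the marked package is one real option for the HTML side, and the plain-text conversion below is a deliberately naive regex pass that only handles simple markdown:

```javascript
import { marked } from 'marked'

// Canonical format: markdown. Derive the other views from it.
export const toHtml = (markdown) => marked.parse(markdown)

// Naive strip: fine for short, simple fields; not a full parser
export const toPlainText = (markdown) =>
  markdown
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1') // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1') // links -> link text
    .replace(/^#{1,6}\s+/gm, '') // headings -> bare text
    .replace(/[*_`~]+/g, '') // emphasis and code markers
    .trim()
```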
You need to store your data in a canonical format. That is, in one true format within your database. It sounds like this format should be a text column that contains markdown. That answers the database-design part of your question.
Then, depending on what format you need to export, you should take the canonical format and convert it to the required output format. This might be just outputting the markdown text, or running it through some sort of parser to remove the markdown or convert it to HTML.
Most everyone seems to be saying to just store the data as HTML in the database and then process it to turn it into plain text. In my opinion there are some downsides to that:
You will likely need application code to strip the HTML and extract the plain text. Imagine doing this in SQL Server: what if you want to write a stored procedure or query that needs the plain-text version? How do you extract plain text in SQL? It's possible with a function, but it's a lot of work.
Processing the HTML blob can be slow. I would imagine for small HTML blobs it will be very fast, but there is certainly more overhead than just reading a plain text field.
HTML parsers don't always work well/they can be complex. The idea is that your users can be very creative and insert blobs that won't work well with your parser. I know from experience that it's not always trivial to extract plain text from HTML well.
I would propose what most email providers do:
Store a rich text/HTML version and a plain text version. Two fields in the database.
As with email providers, users might want those two fields to have different content.
You can write a UI function that lets the user enter HTML and then transforms it in the application into a plain-text version. This gives the user a nice starting point, and they can massage/edit the plain-text version before saving to the database.
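A browser-side sketch of that prefill step (the element IDs are made up); the native DOMParser does the HTML-to-text conversion:

```javascript
// Derive an editable plain-text draft from the user-composed HTML
const htmlToPlainText = (html) =>
  new DOMParser().parseFromString(html, 'text/html').body.textContent.trim()

// Prefill the plain-text field; the user can massage it before
// both versions are submitted to their own database columns
const htmlField = document.querySelector('#body-html')
const textField = document.querySelector('#body-text')
textField.value = htmlToPlainText(htmlField.value)
```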
Always store the source; in your case that is markdown.
Also store the formats that are frequently used.
Use on-demand conversion/rendering for less frequently used formats.
Explanation:
Always have the source. You may need it for various purposes: the same input can be re-edited, audit trail, debugging, etc.
No processor/RAM overhead when the same format is frequently requested; you are trading it for disk storage, which is cheap compared to the former.
Occasional overhead; see #2.
I would suggest storing it in HTML format, since it is the richest one in this case, and removing the tags when producing the data for other formats (such as PDF, LaTeX, or whatever). In the following question you'll find a way to remove tags easily:
Regular expression to remove HTML tags
From my point of view, storing the data (original and downgraded) in two separate fields is not only a waste of space but also an integrity problem, since one of the fields could, in theory, be modified without changing the other.
Good luck!
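For reference, the linked approach boils down to something like this (a rough fallback; it will misbehave on edge cases such as angle brackets inside attributes or script contents):

```javascript
const stripTags = (html) =>
  html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim()

console.log(stripTags('<p>Hello <strong>world</strong></p>')) // "Hello world"
```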
I think that what I'd do - if storage is not an issue - would be to store the canonical version, but automatically generate from it, in persisted, computed fields, whatever other versions you might need. You want the fields persisted because it's pointless doing the conversion every time you need the data, and you want them computed because you don't want them to get out of sync with the canonical version.
In essence this is using the database as a cache for the other versions, but a cache that guarantees you data integrity.
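If the database engine doesn't offer persisted computed columns, the same guarantee can be approximated in the application layer. A sketch with a hypothetical document model holding markdown (canonical) and html (cache) columns, reusing a toHtml converter like the one sketched earlier:

```javascript
export const getHtml = async (db, id) => {
  const doc = await db.document.findUnique({ where: { id } })
  if (doc.html !== null) return doc.html // cache hit
  const html = toHtml(doc.markdown) // cache miss: derive once...
  await db.document.update({ where: { id }, data: { html } }) // ...and persist
  return html
}

// Every write to the canonical column clears the cache,
// so the derived copy can never go stale
export const updateMarkdown = (db, id, markdown) =>
  db.document.update({ where: { id }, data: { markdown, html: null } })
```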
