What is the best way to store a field that supports markdown in my database when I need to render both HTML and "simple text" views? - javascript

I have a database and I have a website front end. I have a field in my front end that is plain text now, but I want it to support markdown. I am trying to figure out the right way to store it in my database, because I have various views that need to be supported (PDF reports, web pages, Excel files, etc.).
My concern is that since some of those views don't support HTML, I don't just want to have an HTML version of this field.
Should I store 2 copies (one text-only and one HTML), or should I store HTML and try to remove the HTML tags on the fly when I am rendering out to Excel, for example?
I need to figure out correct format (or formats) to store in the database to be able to render both:
HTML, and
Regular text (with no markdown or HTML syntax)
Any suggestions would be appreciated as I don't want to go down the wrong path. My point is that I don't want to show any HTML tags or markdown syntax in my Excel output.

Decide like this:
Store the original data (text with markdown).
Generate the derived data (HTML and plaintext) on the fly.
Measure the performance:
If it's acceptable, you're done, woohoo!
If not, cache the derived data.
Caching can be done in many ways... you can generate the derived data immediately, and store it in the database, or you can initially store NULLs and do the generation lazily (when and if it's needed). You can even cache it outside the database.
But whatever you do, make sure the cache is never "stale" - i.e. when the original data changes, the derived data in the cache must be re-generated or at least marked as "dirty" somehow. One way to do that is via triggers.
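A minimal sketch of the lazy variant, assuming an in-memory cache and caller-supplied converters. The DerivedViewCache class and the stand-in renderers are hypothetical names used for illustration only; real code would plug in an actual markdown library and could keep the cache in the database instead.

```javascript
// Hypothetical in-memory cache for the derived views of one markdown field.
// `renderers` maps a view name to a function (markdown) => string; the real
// converters are supplied by the caller.
class DerivedViewCache {
  constructor(renderers) {
    this.renderers = renderers;
    this.source = '';
    this.cache = new Map(); // view name -> derived string
  }

  // Changing the original data drops every derived copy, so nothing goes stale.
  setSource(markdown) {
    this.source = markdown;
    this.cache.clear();
  }

  // Derived data is generated lazily, on first request, and reused afterwards.
  get(view) {
    if (!this.cache.has(view)) {
      const render = this.renderers[view];
      if (!render) throw new Error(`Unknown view: ${view}`);
      this.cache.set(view, render(this.source));
    }
    return this.cache.get(view);
  }
}

// Usage with stand-in renderers (placeholders, not real converters).
let htmlCalls = 0;
const field = new DerivedViewCache({
  html: (md) => { htmlCalls += 1; return `<p>${md}</p>`; },
  text: (md) => md.replace(/\*/g, ''),
});
field.setSource('Hello *world*');
field.get('html'); // renders once
field.get('html'); // served from the cache
field.setSource('Changed'); // invalidates the cached copies
```

Clearing the whole cache on every write is the simplest way to keep it from going stale; a database trigger, as mentioned above, would play the same role for cached columns.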

You need to store your data in a canonical format. That is, in one true format within your database. It sounds like this format should be a text column that contains markdown. That answers the database-design part of your question.
Then, depending on what format you need to export, you should take the canonical format and convert it to the required output format. This might be just outputting the markdown text, or running it through some sort of parser to remove the markdown or convert it to HTML.
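As a rough illustration of the "remove the markdown" step, here is a toy converter. It is only a sketch: markdownToPlainText is a hypothetical helper that handles a handful of common constructs with regular expressions, and a real markdown parser would be more robust.

```javascript
// Toy converter: strips a few common markdown constructs and nothing more.
function markdownToPlainText(markdown) {
  return markdown
    .replace(/^#{1,6}[ \t]+/gm, '')            // heading markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, '')       // bullet markers
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')  // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')   // links -> link text
    .replace(/\*\*(.+?)\*\*/g, '$1')           // bold
    .replace(/\*(.+?)\*/g, '$1')               // italic
    .replace(/`([^`]*)`/g, '$1');              // inline code
}

const sample = '# Title\n\n* **Bold** and *italic* with [a link](http://example.com) and `code`';
const plain = markdownToPlainText(sample);
```

The canonical markdown stays untouched in the database; only the export path (Excel, for example) runs it through a converter like this.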

Most everyone seems to be saying to just store the data as HTML in the database and then process it to turn it into plain text. In my opinion there are some downsides to that:
You will likely need application code to strip the HTML and extract the plain text. Imagine if you did this in SQL Server. What if you want to write a stored procedure/query that has the plain text version? How do you extract plain text in SQL? It's possible with a function, but it's a lot of work.
Processing the HTML blob can be slow. I would imagine for small HTML blobs it will be very fast, but there is certainly more overhead than just reading a plain text field.
HTML parsers don't always work well/they can be complex. The idea is that your users can be very creative and insert blobs that won't work well with your parser. I know from experience that it's not always trivial to extract plain text from HTML well.
I would propose what most email providers do:
Store a rich text/HTML version and a plain text version. Two fields in the database.
As is the use case with email providers, the users might want those two fields to have different content.
You can write a UI function that lets the user enter in HTML and then transforms it via the application into a plain text version. This gives the user a nice starting point and they can massage/edit the plain text version before saving to the database.

1. Always store the source; in your case it is markdown.
2. Also store the formats that are frequently used.
3. Use on-demand conversion/rendering for less frequently used formats.
Explanation:
1. Always have the source. You may need it for various purposes, e.g. editing the same input again, audit trail, debugging, etc.
2. No processor/RAM overhead when the same format is requested frequently; you are trading it for disk storage, which is cheap compared to the former.
3. Occasional overhead; see #2.

I would suggest storing it in HTML format, since it is the richest one in this case, and removing the tags when obtaining the data for other formats (such as PDF, LaTeX or whatever). In the following question you'll find a way to remove tags easily.
Regular expression to remove HTML tags
From my point of view, storing data (original and downgraded) in two separate fields is a waste of space, but also an integrity problem, since one of the fields could be -in theory- modified without changing the second one.
Good luck!
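For reference, the regex approach could look roughly like the sketch below. stripTags is a hypothetical helper, and it is deliberately naive: it assumes well-formed markup, decodes only five common entities, and will misbehave on things like script/style content or a ">" inside an attribute value.

```javascript
// Naive tag stripper: removes anything that looks like a tag, then decodes
// a few common entities. Not a substitute for a real HTML parser.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, '')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&'); // decoded last so "&amp;lt;" becomes "&lt;", not "<"
}

const stripped = stripTags('<p>Hello <b>world</b> &amp; more</p>');
```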

I think that what I'd do - if storage is not an issue - would be store the canonical version, but automatically generate from it, in persisted, computed fields, whatever other versions one might need. You want the fields to be persisted because it's pointless doing the conversion every time you need the data. And you want them to be computed because you don't want them to get out of synch with the canonical version.
In essence this is using the database as a cache for the other versions, but a cache that guarantees you data integrity.

Related

CKEditor 5 - allow span elements and attributes

I am trying to make a custom plugin with CKEditor 5 Framework. However I am not able to insert (via editor.setData()) any attributes for paragraphs and other elements like span. Is there any way to achieve that?
Thanks!
CKEditor 5 implements a custom data model about which you can read more in the Architecture introduction guide.
The existence of a custom data model means that the editor needs to know how to convert that model to a view structure (the DOM) for editing. Also, since the editor typically outputs HTML (or a structurally "compatible" format such as Markdown, BBCode, etc.), a similar conversion needs to be done to get the data from the editor. Finally, the editor needs to be able to convert the view to the model so you are able to load data into the editor.
Side note: You might also want to save the model directly into your database which would save you from converting the view to the model (on setData()), but while possible it still means that the editor needs to know how to convert the model to the view for editing and the view to the model for pasting.
What does all this mean? It means that unless a particular piece of content can be picked by an existing editor feature, it will be dropped. It simply won't be converted from the view to the model on data load and hence will be forgotten.
Therefore, it's all about converters. You need to teach your editor how to understand HTML and how to render HTML. Actually, you also need to teach it how these particular pieces of (at this point) the model can be edited (by configuring the schema and implementing a proper UI).
So, how to write converters and configure the schema?
Well, this is a problem at the moment because right now (as of Dec 2017) we're in the middle of a CKEditor 5 engine refactoring. The architecture we have is great but the APIs proved to be too hard to use, so we're now improving them, which means that whatever I'd write here would be invalid next month. So, instead, I recommend going through the source of the CKEditor 5 packages (e.g. see the plugins in the basic styles package).

Strategies for changing saved app data format without breaking existing saves (JavaScript+IndexedDB)

I have an app that indexes thousands of files and stores information about the files and how they relate in JSON format on the user's computer. I'm using JavaScript and IndexedDB. The important points are the data isn't stored in a central database I control, it must be in JSON format and there's lots of data.
As I add more features in the future, it's likely I'll want to change the JSON format e.g. adding new fields, renaming fields, normalising data that wasn't normalised before.
I haven't released the app yet and I'm nervous about doing so because 1) if I change the data format, I have to be careful I don't break loading of data in the previous format 2) having to account for old data formats will slow down how aggressively I can change the app.
Are there any strategies I can use to lessen the impact file format changes have on my development speed and risk of bugs?
That's why you have to specify a version when you open the database. Then if your schema changes, increment the version and write code in your onupgradeneeded handler to deal with altering the stored data from old versions.
What dumbmatter said, but another thing to consider is to store a version field in the object itself. Read this in first, then dynamically determine how to interpret the object's other fields.
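A sketch of the per-object version field idea, with made-up field names: CURRENT_VERSION, the migrations table and upgradeRecord are all hypothetical, and the two example format changes (adding tags, renaming name to fileName) are only illustrations. The same chain of steps could also be run from an onupgradeneeded handler.

```javascript
const CURRENT_VERSION = 3;

// Hypothetical migrations: entry N upgrades a record from version N to N + 1.
const migrations = {
  1: (rec) => ({ ...rec, version: 2, tags: [] }),                      // v2 added `tags`
  2: ({ name, ...rest }) => ({ ...rest, version: 3, fileName: name }), // v3 renamed `name`
};

// Reads the version field first, then applies each step in order until the
// record matches the current format.
function upgradeRecord(record) {
  // Records saved before versioning existed are treated as version 1.
  let current = { ...record, version: record.version ?? 1 };
  while (current.version < CURRENT_VERSION) {
    const migrate = migrations[current.version];
    if (!migrate) throw new Error(`No migration from version ${current.version}`);
    current = migrate(current);
  }
  return current;
}

const upgraded = upgradeRecord({ name: 'a.txt' });
```

Because each step only knows about one format change, adding version 4 later means writing one more function rather than touching the old ones.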

Best Way To Show Formatted Text From SQL on Client?

What is the best way to handle formatted text that is saved in SQL but needs to be shown on the client? The original, that is going to be saved in SQL, is in google docs and PDF right now.
I have had two suggestions so far:
Just copy and paste the text as is, and then put it in <pre></pre> tags. This seems like useless advice, since it's not like I can save tables in a string.
Convert to html. Save html in the SQL string, take it to the client, and just show the result.
The second option seems straight-forward enough, but I have no idea what the accepted approach is when doing things like these, so wanted to ask. Also, please let me know the common things to watch out for. For example, it seems some people do something with JSON parse instead of just html.
edit: The document in question is a huge legal document with tables, bullet points, different fonts, etc, etc.
I would recommend converting the text to HTML before saving it to the SQL server. Here is a link to the Google Docs API that covers opening and converting a document to HTML. This might be a good place to start.
It depends on what your use of the text/field is.
Document Storage
You can save your PDF documents as blobs in a SQL table; however, this does not make it easy to modify those documents.
Modifiable text
Copying and pasting text is tedious and very likely to lose formatting, unless you use a WYSIWYG front end (http://ckeditor.com/ or another editor) to re-parse the formatting into HTML.
Converting the files directly into HTML would be 'faster' but gives you less control over the stored data.

Using client-side tools to store all the words in a dictionary

Right now I'm planning out how to make a page that uses my own implementation of a spellcheck algorithm. This of course requires reading from a dictionary. I downloaded a text file of all the words in the English dictionary and my plan was to store them in a JavaScript array on page load. I understand that the better way to do this would be to use server-side tools -- write a script that places all the words from the text file into rows of a table in my MySQL database, implement the part of my algorithm that checks for the word by using PHP connected to that table, etc. -- but right now I don't have a good enough knowledge of server-side tools and I'm just making this page for fun anyways. I was wondering:
(1) Is what I'm trying to do completely idiotic (even though I've checked and verified that a JavaScript array will be able to hold all the words)?
(2) Is there any type of fast-lookup structure I can use in JavaScript, or does the interpreter a.k.a. browser implement large arrays as a hash table or binary tree anyways?
(3) When I put <body onload="get_words()">, I want get_words() to return a variable that is the array of my words or I somehow want the function to create a global variable. How do I do this? Do I have to put something like <body onload="dict=get_words()"> ?????
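On point (2), one built-in candidate is Set, whose has() lookup is required by the language specification to be sublinear on average. The sketch below assumes the word list arrives as one string with one word per line; buildDictionary and isKnownWord are hypothetical helper names.

```javascript
// Hypothetical loader: turns raw word-list text (one word per line) into a Set.
function buildDictionary(wordListText) {
  return new Set(
    wordListText
      .split(/\r?\n/)
      .map((word) => word.trim().toLowerCase())
      .filter((word) => word.length > 0)
  );
}

// Case-insensitive membership test against the dictionary.
function isKnownWord(dictionary, word) {
  return dictionary.has(word.toLowerCase());
}

const dictionary = buildDictionary('apple\nBanana\n\ncherry\n');
const found = isKnownWord(dictionary, 'banana');
const missing = isKnownWord(dictionary, 'durian');
```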

create html elements on the serverside VS get data as JSON and create tags with javascript

I want to create an AJAX search to find and list topics in a forum (just topic link and subject).
The question is: Which one of the methods is better and faster?
GET threads list as a JSON string and convert it to an object, then loop over items and create a <li/> or <tr>, write data (link, subject) and append it to threads list. (jQuery Powered)
GET threads list which is wrapped in HTML tags and print it (or use innerHTML and $(e).html())
Thanks...
I prefer the second method.
I figure server-side you have to either convert your data to JSON or html format so why not go directly to the one the browser understands and avoid having to reprocess it client-side. Also you can easily adapt the second method to degrade gracefully for users who have disabled JavaScript (such that they still see the results via standard non-JS links.)
I'm not sure which way is better (I assume the second method is better as it would seem to touch the data less) but a definitive way to find out is to try both ways and measure which one does better.
'Faster' is probably the second method.
'Better' is probably subjective.
For example, I've been in situations (as a front end dev) where I couldn't alter the HTML the server was returning and I wished they would have just delivered a JSON object so I could design the page how I wanted.
Also, (perhaps not specific to your use case), serving up all the html on initial page load could increase the page size and load time.
Server-generated HTML is certainly faster if the JavaScript takes a long time to process the JSON and populate the HTML.
However, for maintainability, JS is better. You can change HTML generation just by changing JS, not having to update server side code, making a delta release etc etc.
Best is to measure how slow it really is. Sometimes we think it is slow, but then you try it out in the real world and you don't really see a big difference. You might have the major delay in transmitting the JSON object. That delay will still be there, and in fact increase, if you send an HTML representation from the server.
So, if your bottleneck really is parsing JSON and generating HTML, not the transmission from the server, then sending HTML from the server makes sense.
However, you can do a lot of optimization in producing the html and parsing JSON. There are so many tricks to make that faster. Best if you show me the code and I can help you make a fast JS based implementation or can tell you to do it on the server.
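For comparison, the first method from the question (JSON in, list items out) might be sketched as below. The link and subject field names follow the question; escapeHtml and renderThreadList are hypothetical helpers, and the result is built as one string so it can be assigned with a single innerHTML or $(e).html() call.

```javascript
// Escapes characters that are special in HTML so subjects cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Builds the whole list as one string instead of appending nodes one by one.
function renderThreadList(threads) {
  return threads
    .map((t) => `<li><a href="${escapeHtml(t.link)}">${escapeHtml(t.subject)}</a></li>`)
    .join('');
}

const json = '[{"link":"/topic/1","subject":"Tips & <tricks>"}]';
const listHtml = renderThreadList(JSON.parse(json));
```

Timing this against the server-rendered variant on real data is the measurement the answers above recommend.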
