Fillable PDF to HTML - javascript

Is there any way to create simple fillable embedded PDFs that allow me to extract the text via JS or ASP?
Now I know there are some libraries like iTextSharp, pdf2html etc., but I have found that these are either overly complex or insufficient for my needs.
The scenario is this: I am trying to embed a tax document which the client may fill out. Upon saving the document, the fields are then extracted into an object. As of now I have converted the PDF to SVG with Inkscape, but this still feels a bit bloated.
I just want to iterate through each field and store it accordingly.
Here's an example of one of the documents:
http://www.cra-arc.gc.ca/E/pbg/tf/t4/t4flat-fill-13b.pdf

One way is to use FDF or XFDF submits.
Basically, the browser displays the PDF, the user fills it in and clicks a submit button. The PDF viewer then sends the filled-in field data to a specified URL.
You can choose the format of the submit when creating the PDF.
The following is from the XML Forms Data Format Specification:
FDF is a simplified version of PDF. PDF and FDF represent information
with a key/value pair, also referred to as an entry. This example
shows the T and V keys with values enclosed in parentheses:
/T(Street)/V(345 Park Ave.)
XFDF, on the other hand, represents an entry with an XML
element/content or attribute/value pair, as shown in the corresponding
XFDF:
<field name="Street">
<value>345 Park Ave.</value>
</field>
Please note that not all PDF viewers are able to submit form data.
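Once the viewer posts XFDF to your URL, pulling the fields into an object could look roughly like the sketch below. It is plain JavaScript and assumes a flat form (no nested field elements, one value per field). The names parseFlatXfdf and decodeXmlEntities and the City field are made up for illustration, and a real XML parser is the safer choice for anything more complex.

```javascript
// Minimal sketch: extract name/value pairs from a flat XFDF payload.
// Assumption: fields are not nested and each has exactly one <value>.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;"
}

function parseFlatXfdf(xfdf) {
  const fields = {};
  const pattern =
    /<field\s+name="([^"]*)"\s*>\s*<value>([\s\S]*?)<\/value>\s*<\/field>/g;
  let match;
  while ((match = pattern.exec(xfdf)) !== null) {
    fields[decodeXmlEntities(match[1])] = decodeXmlEntities(match[2]);
  }
  return fields;
}

// Sample payload: the Street field is from the specification excerpt,
// the City field is invented for this example.
const sampleXfdf = `<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/">
  <fields>
    <field name="Street">
      <value>345 Park Ave.</value>
    </field>
    <field name="City">
      <value>San Jose</value>
    </field>
  </fields>
</xfdf>`;

const formData = parseFlatXfdf(sampleXfdf);
console.log(formData.Street); // 345 Park Ave.
```

From there you can iterate over Object.entries(formData) and store each field as needed.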

Related

How to automatically generate pre-filled links of google forms

I hope you're doing well.
I want to share a Google Form with some friends so they can fill in their personal preferences for a trip.
I have some people's information in Excel, but not all of them.
So I found that it is possible to send personalized pre-filled form URLs to your recipients, but it is very manual: https://support.yet-another-mail-merge.com/hc/en-us/articles/115004266085-Send-personalized-pre-filled-form-URLs-to-your-recipients
Do you know an easy way to automate the generation of pre-filled forms for my friends?
Generate the manual URL once with obvious placeholder data.
Then create an Excel file with all the data you want to pre-fill.
Use an Excel formula to create the URLs. Copy the column with the formulas and paste as values. The links are now unique to the data in their row.
Use ampersands (&) to concatenate the data, put quotes around literal text, and use the SUBSTITUTE function to percent-encode special characters for the URL. In the formula below, the first entry is in cell A2.
You can nest substitutions like so: SUBSTITUTE(SUBSTITUTE(A2," ","%20"),"'","%27") to handle data with spaces and, in this case, apostrophes.
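If you would rather script it than use spreadsheet formulas, the same idea can be sketched in JavaScript. This is only an illustration: FORM_ID and the entry.1111 / entry.2222 IDs are placeholders, and the base URL should be copied from the pre-filled link you generated manually for your own form.

```javascript
// Sketch: build one pre-filled link per row of data.
// Assumption: the manually generated link has the shape
// <form URL>?usp=pp_url&entry.<id>=<value>&...

// encodeURIComponent leaves apostrophes alone, so encode them explicitly
// to mirror the nested SUBSTITUTE approach.
function encodeValue(value) {
  return encodeURIComponent(value).replace(/'/g, "%27");
}

function buildPrefilledUrl(baseUrl, entries) {
  const query = Object.entries(entries)
    .map(([entryId, value]) => `${entryId}=${encodeValue(String(value))}`)
    .join("&");
  const separator = baseUrl.includes("?") ? "&" : "?";
  return `${baseUrl}${separator}${query}`;
}

// Placeholder form URL and entry IDs (hypothetical).
const baseUrl = "https://docs.google.com/forms/d/e/FORM_ID/viewform?usp=pp_url";
const rows = [
  { "entry.1111": "Ann Lee", "entry.2222": "Window seat" },
  { "entry.1111": "Pat O'Brien", "entry.2222": "Aisle seat" },
];

const links = rows.map((row) => buildPrefilledUrl(baseUrl, row));
links.forEach((link) => console.log(link));
```

Each generated link can then be sent to the matching friend, for example via a mail merge.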

Fill pdf form created in scribus from client side javascript

I have a PDF form into which I want to fill a password generated in JavaScript, so that the user can print it. The password is sensitive and must not be sent to the server, so this has to happen in client-side JavaScript. According to this post, it is possible using Adobe Acrobat.
The idea is that one creates a pre-filled form with a unique value, and then replaces that value using simple search and replace in JavaScript when generating the final PDF to display to the user.
Since I do not own Acrobat, I thought I would try it with Scribus.
I generated a test form in Scribus and gave it the pre-filled value %HELLO%. But looking at the resulting PDF, I do not see how I can replace the %HELLO% value with the password using simple text replacement.
It turns out that while this post already gives the answer in its code, it does not explain it.
The value of the TextField has to be converted to a sequence of hex-encoded Unicode characters (four hex digits each), and it has to start with "feff", the UTF-16BE byte order mark that PDF uses for Unicode text strings. Using this string, one can do the search and replace in the PDF document.
The code also updates the "xref" table in the PDF, which one has to do when the length of the PDF changes (or when some elements end up at different positions in the file). Since I did not change the length of the value of the TextField, I did not have to do that.
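A rough sketch of that encoding and replacement in JavaScript follows. It assumes the PDF has already been read into a string, that the field value is stored as a hex string with the UTF-16BE byte order mark feff defined by the PDF specification, and that the new value is padded with spaces to the placeholder's length so the file size (and therefore the xref table) stays unchanged. The function names are made up for this example.

```javascript
// Encode text as the hex form of a UTF-16BE PDF text string:
// "feff" followed by four hex digits per UTF-16 code unit.
function toPdfHexString(text) {
  let hex = "feff";
  for (let i = 0; i < text.length; i++) {
    hex += text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return hex;
}

// Replace the hex-encoded placeholder with a hex-encoded value of the
// same length, so no byte offsets in the file change.
function replaceFieldValue(pdfSource, placeholder, value) {
  if (value.length > placeholder.length) {
    throw new Error("value is longer than the placeholder");
  }
  // Pad with trailing spaces (an assumption: they do not matter in print).
  const padded = value.padEnd(placeholder.length, " ");
  // Hex digits may be written in upper or lower case, so match both.
  const needle = new RegExp(toPdfHexString(placeholder), "gi");
  if (!needle.test(pdfSource)) {
    throw new Error("placeholder not found in PDF source");
  }
  needle.lastIndex = 0; // reset after test() on a global regex
  return pdfSource.replace(needle, toPdfHexString(padded));
}

// Tiny stand-in for the relevant part of a PDF file.
const pdfFragment = "/V <" + toPdfHexString("%HELLO%") + ">";
const filled = replaceFieldValue(pdfFragment, "%HELLO%", "s3cr3t");
console.log(filled.length === pdfFragment.length); // true
```

If the password can be longer than the placeholder, make the placeholder as long as the longest possible password, otherwise the xref offsets have to be recomputed.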

Transfer Text notes taken on a webpage into a word doc

I'm fairly new to HTML/CSS/Javascript etc so forgive me if I totally noob all over the place here. I'll provide as much context as I can to help you understand what I'm trying to accomplish.
I work in IT as an infrastructure incident manager for a large company. While working incidents, we typically need to contact several people using 2 different web apps, take notes in Word of what transpires and who did what to fix A, B, C, or look up instructions regarding certain situations. I'm trying to build an application that consolidates all of these under a single pane of glass.
For the purposes of this thread, I'll solely be discussing the note-taking aspect, which covers several areas. At my company, for reasons outside the scope of this discussion, it was decided that notes would be entered into Word and not the ticketing system. Why not something better? It's above my pay grade, don't really care...
We use a Word doc template that has predefined areas that you fill in as the incident progresses. Example:
Incident Ticket(s): INC12345
Problem Ticket(s): PMI12345
Change Ticket(s): etc
Vendor Ticket(s):
Date: 9/18/15
Incident Manager that started incident: Billy Jo
Incident Manager that ended incident:
Summaries:
Subject/Title (from Alert):
So I have a text area floated to the right of the page, with which I would like to do at least 2 things to start with:
Have it prompt you for each line in the template that you fill in values for, e.g. Incident ticket: inc11111. When you fill in the answer and hit submit, it populates that field in Word.
There's a notes section at the bottom of the Word template. I'm thinking of adding a notes-only textarea that strictly populates that part of the template ALONG with timestamps from your computer clock.
I've searched the internet for several hours trying to find something highlighting how you might do this, but I only see Office docs telling you how to copy and paste text into a Word doc. Please let me know if you need any specific info.
You want to have your users input data via a web-based form, and you want this to result in the creation of a Word document. The missing link is the server application in between.
You would create an HTML form as normal, with whatever fields / inputs / validations you need. On submitting this form, the data would be sent to your server. There you would need to implement a server application that accepts form-based inputs and generates a file from them (in this case, a Word document).
The process of actually generating the Word documents is most likely not trivial, but also not impossible. A quick search for generating Word docs on the server throws up a few resources:
Building Server-Side Document Generation
Create and Manipulate Word Documents Programmatically Using DocX
Generating a Downloadable Word Document in the Browser
Once the document has been created, it could either be offered for download or automatically emailed somewhere.
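On the browser side, collecting the template fields and timestamped notes before sending them to such a server could be sketched as below. The object shape and function names are assumptions for illustration, not part of any particular library, and the server endpoint that turns this into a Word document is left out entirely.

```javascript
// Sketch: gather template answers and timestamped notes into one record
// that can be serialized and POSTed to the server.
function createIncidentRecord(templateFields) {
  return { fields: { ...templateFields }, notes: [] };
}

// "when" defaults to the local computer clock, as asked in the question.
function addNote(record, text, when = new Date()) {
  record.notes.push({ timestamp: when.toISOString(), text });
  return record;
}

function toRequestBody(record) {
  return JSON.stringify(record);
}

const record = createIncidentRecord({
  "Incident Ticket(s)": "INC12345",
  "Problem Ticket(s)": "PMI12345",
  "Incident Manager that started incident": "Billy Jo",
});

// A fixed time is passed here only to keep the example reproducible.
addNote(record, "Paged the on-call DBA.", new Date(Date.UTC(2015, 8, 18, 14, 30)));
console.log(toRequestBody(record));
```

The resulting JSON string is what the form's submit handler would send, for example with fetch and a POST request, to whatever server-side generator you choose.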

PHP Equivalent of Coldfusion's cfpdfform

I have a legal application using government forms. These forms are fill-in PDFs (FDF data).
We have sets of data in JSON format stored in a database. I want to be able to take that data and insert it into the fill-in PDF. ColdFusion's cfpdfform seems to do that quite well. However, ColdFusion appears to me to have some offbeat JSON formatting.
So my request is simply: what is the best way to populate a PDF fill-in form with data in PHP or JavaScript?
Alternatively, our JSON data contains a number of objects and arrays within it. Is there any non-tedious way of getting ColdFusion to understand its formatting without numerous cfloops within the data file to get it into a struct?
Thanks so much.
I had a similar project spec last year. We had a 50-page legacy fillable pdf form that we wanted to bring up-to-date and integrate into a panel review workflow. I hit countless roadblocks, mostly due to end-user environments.
My ultimate solution was slightly out-of-the-box, but you may consider something similar:
I built the actual interactive form as a traditional HTML5/jQuery/CSS3 view that contained the form and methods for loading and saving form data to SQL. The business logic employed TCPDF (I think that's the lib I used, maybe FPDF or something like that) and an alternate stylesheet that re-renders the form data to a classic, printable PDF.
I can't promise this is the best solution for your situation, but it nailed it for us.
You may take a look at our SetaPDF-FormFiller component (not free!). It allows you to fill in PDF forms in pure PHP.
You only need some kind of mapping logic from your JSON objects to the PDF form field names. The filling process is that simple:
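// Writer that sends the filled PDF to the browser under this filename
$writer = new SetaPDF_Core_Writer_Http('pdf-form-filled.pdf');
// Load the empty form and attach the writer to it
$document = SetaPDF_Core_Document::loadByFilename('pdf-form.pdf', $writer);
// Get access to the form fields by name
$formFiller = new SetaPDF_FormFiller($document);
$fields = $formFiller->getFields();
// Copy values from the decoded JSON (assumed here to be an object) into the fields
$fields['name']->setValue($jsonData->name);
$fields['gender']->setValue($jsonData->gender);
...
// Write the result and finish the document
$document->save()->finish();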
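The mapping logic itself is independent of the PDF library. One possible sketch, in JavaScript, flattens nested JSON into dotted keys that can be matched against field names. The naming scheme (dots for objects, [index] for arrays) and the sample data are assumptions; your form's actual field names decide what the mapping has to look like.

```javascript
// Sketch: flatten nested objects and arrays into "path" -> string value
// pairs, e.g. { address: { city: "X" } } becomes { "address.city": "X" }.
function flattenToFieldNames(value, prefix = "", out = {}) {
  if (Array.isArray(value)) {
    value.forEach((item, index) =>
      flattenToFieldNames(item, `${prefix}[${index}]`, out)
    );
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flattenToFieldNames(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    // Form fields hold text, so store null as an empty string.
    out[prefix] = value === null ? "" : String(value);
  }
  return out;
}

// Invented sample data for illustration.
const caseData = {
  name: "Ann Lee",
  gender: "female",
  address: { city: "Springfield", zip: 12345 },
  parties: [{ role: "plaintiff" }, { role: "defendant" }],
};

const fieldValues = flattenToFieldNames(caseData);
console.log(fieldValues);
```

The flat result can then be looped over once, setting each PDF field whose name matches a key, instead of writing one setValue line per field.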

What is the best way to store a field that supports markdown in my database when I need to render both HTML and "simple text" views?

I have a database and I have a website front end. I have a field in my front end that is plain text now, but I want it to support markdown. I am trying to figure out the right way to store it in my database, because I have various views that need to be supported (PDF reports, web pages, Excel files, etc.).
My concern is that since some of those views don't support HTML, I don't just want to have an HTML version of this field.
Should I store 2 copies (one text-only and one HTML), or should I store HTML and try to remove the HTML tags on the fly when I am rendering out to Excel, for example?
I need to figure out correct format (or formats) to store in the database to be able to render both:
HTML, and
Regular text (with no markdown or HTML syntax)
Any suggestions would be appreciated as I don't want to go down the wrong path. My point is that I don't want to show any HTML tags or markdown syntax in my Excel output.
Decide like this:
Store the original data (text with markdown).
Generate the derived data (HTML and plaintext) on the fly.
Measure the performance:
If it's acceptable, you're done, woohoo!
If not, cache the derived data.
Caching can be done in many ways... you can generate the derived data immediately, and store it in the database, or you can initially store NULLs and do the generation lazily (when and if it's needed). You can even cache it outside the database.
But whatever you do, make sure the cache is never "stale" - i.e. when the original data changes, the derived data in the cache must be re-generated or at least marked as "dirty" somehow. One way to do that is via triggers.
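The lazy variant with a "dirty" marker could be sketched like this in JavaScript. It is an in-memory illustration only: DerivedTextCache is a made-up name, and the upper-casing renderer is a stand-in for a real markdown-to-HTML or markdown-to-plaintext conversion.

```javascript
// Sketch: derived data is generated on first use and thrown away
// whenever the original (markdown) text changes.
class DerivedTextCache {
  constructor(render) {
    this.render = render; // function: markdown -> derived format
    this.source = "";
    this.cached = null; // null means "dirty / not generated yet"
  }

  setSource(markdown) {
    this.source = markdown;
    this.cached = null; // invalidate so the cache can never be stale
  }

  get() {
    if (this.cached === null) {
      this.cached = this.render(this.source);
    }
    return this.cached;
  }
}

// Stand-in renderer that counts how often it runs.
let renderCount = 0;
const cache = new DerivedTextCache((markdown) => {
  renderCount += 1;
  return markdown.toUpperCase();
});

cache.setSource("hello *world*");
cache.get();
cache.get(); // served from the cache, no second render
console.log(renderCount); // 1
```

In a database the same pattern maps to a nullable derived column that is reset whenever the markdown column is updated.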
You need to store your data in a canonical format. That is, in one true format within your database. It sounds like this format should be a text column that contains markdown. That answers the database-design part of your question.
Then, depending on what format you need to export, you should take the canonical format and convert it to the required output format. This might be just outputting the markdown text, or running it through some sort of parser to remove the markdown or convert it to HTML.
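To make the "remove the markdown" step concrete, here is a deliberately small JavaScript sketch. It only handles a handful of constructs (headings, images, links, bold, italic, inline code) with regular expressions, and it is an assumption-laden illustration: for real data a proper markdown parser is the more robust route, since for instance the italic rule below would also mangle words like snake_case_names.

```javascript
// Sketch: strip a small subset of markdown syntax to get plain text.
function markdownToPlainText(markdown) {
  return markdown
    .replace(/^#{1,6}[ \t]+/gm, "") // heading markers at line start
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1") // links -> link text
    .replace(/(\*\*|__)(.+?)\1/g, "$2") // bold
    .replace(/(\*|_)(.+?)\1/g, "$2") // italic
    .replace(/`([^`]*)`/g, "$1"); // inline code
}

const canonical =
  "# Trip report\nSome **bold** and *italic* text with a [link](http://example.com) and `code`.";

const plain = markdownToPlainText(canonical);
console.log(plain);
```

The stored value stays the markdown in `canonical`; the plain text is derived only at export time, for example when writing an Excel cell.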
Most everyone seems to be saying to just store the data as HTML in the database and then process it to turn it into plain text. In my opinion there are some downsides to that:
You will likely need application code to strip the HTML and extract the plain text. Imagine if you did this in SQL Server. What if you want to write a stored procedure/query that has the plain text version? How do you extract plain text in SQL? It's possible with a function, but it's a lot of work.
Processing the HTML blob can be slow. I would imagine for small HTML blobs it will be very fast, but there is certainly more overhead than just reading a plain text field.
HTML parsers don't always work well/they can be complex. The idea is that your users can be very creative and insert blobs that won't work well with your parser. I know from experience that it's not always trivial to extract plain text from HTML well.
I would propose what most email providers do:
Store a rich text/HTML version and a plain text version. Two fields in the database.
As is the use case with email providers, the users might want those two fields to have different content.
You can write a UI function that lets the user enter in HTML and then transforms it via the application into a plain text version. This gives the user a nice starting point and they can massage/edit the plain text version before saving to the database.
1. Always store the source; in your case it is markdown.
2. Also store the formats that are frequently used.
3. Use on-demand conversion/rendering for less frequently used formats.
Explanation:
1. Always have the source. You may need it for various purposes, e.g. so the same input can be edited, for an audit trail, for debugging, etc.
2. There is no processor/RAM overhead if the same format is frequently requested; you are trading it for disk storage, which is cheap compared to the former.
3. Occasional overhead, see #2.
I would suggest storing it in HTML format, since it is the richest one in this case, and removing the tags when obtaining the data for other formats (such as PDF, LaTeX or whatever). In the following question you'll find a way to remove tags easily.
Regular expression to remove HTML tags
From my point of view, storing data (original and downgraded) in two separate fields is a waste of space, but also an integrity problem, since one of the fields could (in theory) be modified without changing the second one.
Good luck!
I think that what I'd do, if storage is not an issue, would be to store the canonical version, but automatically generate from it, in persisted, computed fields, whatever other versions one might need. You want the fields to be persisted because it's pointless doing the conversion every time you need the data. And you want them to be computed because you don't want them to get out of sync with the canonical version.
In essence this is using the database as a cache for the other versions, but a cache that guarantees data integrity.
