Find and replace pdf text with nodeJS - javascript

I'm using Node.js to build an app that finds and replaces text in a PDF. I have found two approaches so far:
Use an npm package, such as pdfReader, that converts the PDF to JSON, so I can get the text and replace it with what I want. The problem is converting the output back to PDF.
A possible workaround for the first approach is to convert the PDF to HTML, edit the HTML, and convert it back to PDF. But most Node.js tutorials cover converting HTML to PDF, not PDF to HTML.
Are there any solutions to this problem?
Update
I ended up using PDFKit to create the PDF files I need. That works in my case, but it doesn't cover every scenario. If you have to find a word and replace it in an arbitrary PDF file, this problem may have no solution in Node.js. The PDFKit library has an open issue for this feature.

Look at the approach in "how to export json data to pdf file with specify format with Nodejs?". It basically uses your idea: convert the PDF to JSON, render the JSON as HTML, then convert the HTML to PDF.
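The middle step of that pipeline (replace the text, then render it as HTML) could look roughly like the sketch below. It assumes the extractor yields items shaped like { x, y, text }; that shape and the function names are hypothetical, and real packages use their own field names and units. Extracting the items and converting the resulting HTML back to PDF are left to whichever tools you pick.

```javascript
// Hypothetical item shape: { x: number, y: number, text: string }.
// Real PDF-to-JSON extractors use their own field names and units.

// Escape characters that are special in HTML text content.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Return a copy of the items with every literal occurrence of `find` replaced.
// The input array and its objects are left untouched.
function replaceInItems(items, find, replacement) {
  if (find === '') return items.map((item) => ({ ...item }));
  return items.map((item) => ({
    ...item,
    text: item.text.split(find).join(replacement),
  }));
}

// Render the items as absolutely positioned spans so the layout roughly
// follows the coordinates reported by the extractor.
function itemsToHtml(items) {
  const spans = items.map(
    (it) =>
      `<span style="position:absolute;left:${it.x}px;top:${it.y}px">` +
      `${escapeHtml(it.text)}</span>`
  );
  return `<!DOCTYPE html><html><body>${spans.join('')}</body></html>`;
}
```

Note that a phrase split across two extracted items will not be matched by this per-item replacement, which is one reason the approach breaks down on unpredictable PDFs.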

Related

convert word to pdf in React-Native

I want to develop an app like CamScanner, in which the user can scan any type of file (image, PDF, etc.) and then convert that file from Word to PDF. Is there any way to convert a Word document to PDF in React Native?
You can use libreoffice-convert to achieve this.
Link >> https://www.npmjs.com/package/libreoffice-convert
There is also another library, awesome-unoconv, which does the same thing and will convert Word to PDF.
Link >> https://www.npmjs.com/package/awesome-unoconv

Parsing PDF files using Python

(1) Is there a way to search for text in a PDF file and go to that location in the file using Python?
(2) Is there a way to highlight text in a PDF file and have that text extracted, using Python?
I tried JavaScript's pdf.js, which actually worked, but I want to try Python. Any help would be appreciated. Thanks!
To search for text within a PDF file you can use PyMuPDF or pdfminer. PyMuPDF would also let you create a PDF viewer and highlight the text, if that's what you have in mind.

Trying to convert an html file to pdf using wkhtmltopdf

I want to use wkhtmltopdf to convert an HTML file to PDF. I have tried various wkhtmltopdf options, but none gives the output I want. I want the PDF to have the same format as the one I get by saving the page with Ctrl+P.
The URL is http://raindrops.in/subhashini/view/524e5aa14251df44518b4567
Please help me work out how to use it.
If the problem is that you aren't getting the layout you're looking for, it is probably because wkhtmltopdf isn't being run with the right settings for screen size, margins, and so on.
Check the available options here: http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html
Check your own screen settings (width/height), and use them in wkhtmltopdf.
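As a rough illustration of passing such settings, the sketch below builds an argument list for the wkhtmltopdf command line from Node.js. The helper name and its defaults are assumptions made for this example; --page-size, --margin-top (and the other sides), and --print-media-type are wkhtmltopdf options, but the values that reproduce your browser's print output need to be found by experiment.

```javascript
// Build the argument list for the wkhtmltopdf CLI.
// buildWkhtmltopdfArgs and its defaults are illustrative, not part of wkhtmltopdf.
function buildWkhtmltopdfArgs(url, outFile, opts = {}) {
  const { pageSize = 'A4', margin = '10mm', printMedia = true } = opts;
  const args = ['--page-size', pageSize];
  // Apply the same margin to all four sides.
  for (const side of ['top', 'right', 'bottom', 'left']) {
    args.push(`--margin-${side}`, margin);
  }
  // Use the page's print stylesheet, as the browser's print dialog does.
  if (printMedia) args.push('--print-media-type');
  // Input and output come last.
  args.push(url, outFile);
  return args;
}

// Usage (requires wkhtmltopdf to be installed and on PATH):
// const { execFile } = require('child_process');
// execFile('wkhtmltopdf', buildWkhtmltopdfArgs('http://example.com/', 'out.pdf'),
//   (err) => { if (err) console.error(err); });
```

Keeping the argument building separate from the process call makes it easy to log the exact command while tuning the options.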

Generate a Word document in JavaScript with Docx.js?

I am trying to use docx.js to generate a Word document but I can't seem to get it to work.
I copied the raw code into the Google Chrome console after amending line 247 to fix a "'textAlign' undefined error"
if (inNode.style && inNode.style.textAlign){..}
Which makes the function convertContent available. The result of which is an Object e.g.
JSON.stringify( convertContent($('<p>Word!</p>')[0]) )
Results in -
"{"string":
"<w:body>
<w:p>
<w:r>
<w:t xml:space=\"preserve\">Word!</w:t>
</w:r>
</w:p>
</w:body>"
,"charSpaceCount":5
,"charCount":5,
"pCount":1}"
I copied
<w:body>
<w:p>
<w:r>
<w:t xml:space="preserve">Word!</w:t>
</w:r>
</w:p>
</w:body>
into Notepad++ and saved it as a file with the extension 'docx', but when I open it in MS Word it says 'cannot be opened because there is a problem with the contents'.
Am I missing some attribute or XML tags or something?
You can generate a Docx Document from a template using docxtemplater (library I have created).
It can replace tags by their values (like a template engine), and also replace images in a paid version.
Here is a demo of the templating engine: https://docxtemplater.com/demo/
This code can't work on JSFiddle because of the Ajax calls to local files (everything that is in the blankfolder); otherwise you would have to enter all the files in ByteArray format and use the jsFiddle echo API: http://doc.jsfiddle.net/use/echo.html
I know this is an older question and you already have an answer, but I struggled getting this to work for a day, so I thought I'd share my results.
Like you, I had to fix the textAlign bug by changing the line to this:
if (inNode.style && inNode.style.textAlign)
Also, it didn't handle HTML comments. So, I had to add the following line above the check for a "#text" node in the for loop:
if (inNodeChild.nodeName === '#comment') continue;
Creating the docx was tricky, since there is absolutely no documentation for this library yet. Looking through the code, I saw that it expects the HTML to be in a File object. For my purposes, I wanted to use the HTML I had rendered, not an HTML file that the user has to select and upload. So I had to trick it by making my own object with the same property it was looking for and passing that in. To save the result on the client, I use FileSaver.js, which requires a blob, so I included a function that converts base64 into a blob. My code to implement it is this:
var result = docx({ DOM: $('#myDiv')[0] });
var blob = b64toBlob(result.base64, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
saveAs(blob, "test.docx");
In the end, this would work for simple Word documents, but it isn't nearly sophisticated enough for anything more. I couldn't get any of my styles to render and I didn't even attempt to get images working. I've since abandoned this approach and am now researching DocxgenJS or some server-side solution.
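The b64toBlob helper used in that snippet is not shown in the answer. A minimal sketch of what such a function might look like follows; it is an assumption, not the author's code, and it relies on the atob and Blob globals that browsers (and recent Node.js versions) provide.

```javascript
// Decode a base64 string into a Blob with the given MIME type.
// Relies on the atob and Blob globals (browsers, or a recent Node.js).
function b64toBlob(base64, contentType = '') {
  // atob returns a "binary string": one character per decoded byte.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: contentType });
}
```

For very large documents, decoding in chunks would reduce peak memory use, but for a typical generated docx the single pass above should be adequate.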
You may find this link useful,
http://evidenceprime.github.io/html-docx-js/
An online demo here:
http://evidenceprime.github.io/html-docx-js/test/sample.html
You are doing the correct thing codewise, but your file is not a valid docx file. If you look through the docx() function in docx.js, you will see that a docx file is actually a zip containing several xml files.
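To make that concrete, the sketch below builds the minimal set of XML parts that a docx package normally contains, keyed by their paths inside the zip. The function name is hypothetical, and the zipping step itself is left out: it would need a zip library (JSZip is one option), and the resulting file has not been verified against every Word version.

```javascript
// WordprocessingML main namespace.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Build the minimal parts of a .docx package, keyed by path inside the zip.
// bodyXml is a <w:body>...</w:body> fragment such as the one in the question.
function buildDocxParts(bodyXml) {
  const header = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
  return {
    // Declares the content type of each part in the package.
    '[Content_Types].xml': header +
      '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">' +
      '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>' +
      '<Default Extension="xml" ContentType="application/xml"/>' +
      '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>' +
      '</Types>',
    // Points the package at its main document part.
    '_rels/.rels': header +
      '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
      '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>' +
      '</Relationships>',
    // The body fragment must be wrapped in w:document with the w namespace declared.
    'word/document.xml': header +
      `<w:document xmlns:w="${W_NS}">${bodyXml}</w:document>`,
  };
}
```

Saving only the w:body fragment as a .docx fails because none of this packaging is present: the file is not a zip, and the w: prefix is never bound to a namespace.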
I am using the Open XML SDK for JavaScript.
http://ericwhite.com/blog/open-xml-sdk-for-javascript/
Basically, on the web server, I have an empty docx file that serves as the template for new documents.
When the user clicks "new docx file" in the browser, I retrieve the empty docx file as a template, convert it to Base64, and return it as the Ajax response.
In the client script, you convert the Base64 string to a byte array and use openxmlsdk.js to load the byte array as a JavaScript OpenXmlPackage object.
Once the package is loaded, you can use the regular OpenXmlPart to build a real document (inserting images, creating tables and rows).
The last step is to stream it out to the end user as a document. This part is security related. In my code I send it back to the web server, where it is saved temporarily, and then prepare an HTTP response that prompts the end user to download it.
Check the URL above; it has useful samples of doing this in JavaScript.

Creating MS Word Documents in iPhone using objective C

I have created a rich text editor in a UIWebView. My requirement is to save this text in a .doc Word file. How can I achieve this? I am getting the HTML content by using
NSString *strWebText = [webView stringByEvaluatingJavaScriptFromString:@"document.body.innerHTML"];
Now how can I proceed to convert it to .doc format? Or is there a JavaScript function to convert text or save text to a .doc file?
Microsoft Word can open a .doc file that is really HTML and will treat it as such. There isn't any way to easily convert your HTML to a binary .doc file without significant code or the help of a server.
If you save a Word document as HTML, you can see the HTML that Word produces. You will find certain headers at the top; copy these into your own HTML and Word should open it correctly.
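Since the editor already runs in a web view, the wrapping could be done in JavaScript before the string is handed back to native code. The sketch below is an assumption about what those headers typically look like: the function name is hypothetical, and the namespace declarations are the ones commonly seen in HTML exported by Word, so compare them with a file saved from your own copy of Word.

```javascript
// Wrap an HTML fragment in a document that carries Word-style namespace
// declarations on the root element. wrapHtmlForWord is a hypothetical name.
// The title is inserted as-is, so it should not contain markup.
function wrapHtmlForWord(bodyHtml, title = 'Document') {
  return (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office" ' +
    'xmlns:w="urn:schemas-microsoft-com:office:word" ' +
    'xmlns="http://www.w3.org/TR/REC-html40">' +
    `<head><meta charset="utf-8"><title>${title}</title></head>` +
    `<body>${bodyHtml}</body></html>`
  );
}

// In the web view this could be called as:
// wrapHtmlForWord(document.body.innerHTML)
```

The returned string can then be written to a file with a .doc extension.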
iOS does not have any built-in support for editing rich text or converting between formats. You'll have to find a library or write it yourself.
You can use:
[strWebText writeToFile:@"Data.doc" atomically:YES encoding:NSUnicodeStringEncoding error:&error];
