Parsing PDF files using Python - javascript

(1) Is there a way to search for text in a PDF file and jump to that location in the file using Python?
(2) Is there a way to highlight text in a PDF file and have that text extracted, using Python?
I tried the JavaScript library pdf.js, which worked, but I want to try Python. Any help would be appreciated. Thanks!

To search for text within a PDF file, you can use PyMuPDF or pdfminer. PyMuPDF also lets you build a PDF viewer and highlight the text, if that's what you have in mind.

Related

convert word to pdf in React-Native

I want to develop an app like CamScanner, in which the user can scan any type of file (image, PDF, etc.) and then convert it from Word to PDF. Is there any way to convert a Word document to PDF in React Native?
You can use libreoffice-convert to achieve this:
https://www.npmjs.com/package/libreoffice-convert
There is also awesome-unoconv, which does the same thing and converts Word to PDF:
https://www.npmjs.com/package/awesome-unoconv
Both are Node.js packages that drive a locally installed LibreOffice, so the conversion would run on a server rather than inside the React Native app itself.

Find and replace pdf text with nodeJS

I'm using Node.js to build an app that finds and replaces text in a PDF. I have found some approaches:
Use an npm package such as pdfReader that converts the PDF to JSON, then take the text and replace it with what I want. The problem is converting the output back to PDF.
A possible fix for the first approach is to convert the PDF to HTML, edit the HTML, and convert it back to PDF. But most Node.js tutorials cover converting HTML to PDF, not PDF to HTML.
Any solutions for this problem?
Update
I ended up using PDFKit to create the PDF files I need. In my case this solution doesn't need to cover every possibility. But if you have to find and replace a word in an arbitrary PDF file, there may be no solution for this in Node.js. The PDFKit library has an open issue for this feature.
Look at the approach in the question "how to export json data to pdf file with specify format with Nodejs?". It basically uses your idea: convert the PDF to JSON, render the JSON as HTML, then convert the HTML to PDF.

Text replacement in PDF document using javascript

I have some PDF templates that contain placeholders for things like a name, company, etc. They are in the format
<<'NAME'>> or <<'COMPANY'>>
Currently the process at my company is to replace all of these placeholders by hand when we get the information. I am trying to automate the process by reading the information from a CSV file and doing a find and replace on the placeholders. However, the only template files I have are InDesign files and PDFs. I looked at the InDesign files, and as far as I can tell they are executables and impossible to read in.
I was hoping someone knew of a way to read in a PDF file and run a regex on it to replace the placeholder text.

Creating MS Word Documents in iPhone using objective C

I have created a rich text editor in a UIWebView. I need to save its text as a Word .doc file. How can I achieve this? I am getting the HTML content using:
NSString *strWebText = [webView stringByEvaluatingJavaScriptFromString:@"document.body.innerHTML"];
How can I proceed to convert it to .doc format? Or is there a JavaScript function to convert text to, or save text as, a .doc file?
Microsoft Word can open a .doc file that actually contains HTML and will display it as such. There isn't an easy way to convert your HTML to a binary .doc file without significant code or a server doing the conversion.
If you save a Word document as HTML from within Word, you will see the HTML it produces. Copy the headers you find at the top into your own HTML, and Word should open it correctly.
iOS does not have any built-in support for editing rich text or converting between formats. You'll have to find a library or write it yourself.
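You can write the string to a file in the app's Documents directory (a bare relative path such as @"Data.doc" does not point into the app's sandbox):
NSError *error = nil;
NSString *docsDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject];
[strWebText writeToFile:[docsDir stringByAppendingPathComponent:@"Data.doc"] atomically:YES encoding:NSUnicodeStringEncoding error:&error];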

html to Generate or update xml from file list?

So I'm making a JavaScript image gallery for a friend, and he wants to be able to add new images later. Right now JavaScript generates the gallery from an XML file (created by a custom Picasa export). Is it possible to have the HTML file update the XML file with new images? It needs to add a couple of tags for the format and know the file name. Is it possible without using PHP?
Thanks
If you're using JavaScript to parse the XML file, the gallery will update with new content when he adds it to Picasa. Don't download a copy of the Picasa XML file; you should be able to link directly to it.
