Convert PDF to a tree - javascript

I am writing an application in which I need to inspect the tree structure of PDF documents, modify this tree structure, and write the result back to another PDF.
The inspection and modification cannot happen in a dedicated library (e.g., PDFBox), since that logic is already written in a format-independent way for JSON-structured trees.
Ideally, what I need is a lossless conversion from PDF to any tree format (XML, JSON, ...) and back, in JavaScript or any other programming language, or as a command-line tool.
What I considered so far:
Using pdf2json. This converts a PDF to a JSON file. Unfortunately, the other direction (JSON->PDF) is not supported.
One can create a JSON with the Base64-encoded binary content of the PDF. This is lossless and works in both directions, but I am losing the tree structure that I want to inspect. Therefore, this is not an option.
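For reference, a minimal sketch of the Base64 approach, assuming Node.js and its built-in Buffer. The envelope shape ({ encoding, data }) and the function names are made up for illustration. It shows why the round trip is lossless while the JSON still exposes nothing of the PDF's internal structure:

```javascript
// Wrap raw PDF bytes in a JSON envelope and recover them again.
// The envelope shape ({ encoding, data }) is an assumption for illustration.
function pdfBytesToJson(bytes) {
  return JSON.stringify({
    encoding: 'base64',
    data: Buffer.from(bytes).toString('base64'),
  });
}

function jsonToPdfBytes(json) {
  const envelope = JSON.parse(json);
  return Buffer.from(envelope.data, 'base64');
}

// Stand-in bytes; a real caller would read them with fs.readFileSync.
const sample = Buffer.from('%PDF-1.4\n%%EOF\n', 'latin1');
const roundTripped = jsonToPdfBytes(pdfBytesToJson(sample));
console.log(roundTripped.equals(sample)); // true
```

The only "tree" here is a single string node, which is exactly the limitation described above.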
Can anyone recommend a library or program to achieve this?

Related

How to create an interactive pdf file with javascript?

I want to create dynamic PDF files and embed initials and signatures in them inside the browser. I found pdfkit.org, but I need a more comprehensive solution that allows creating functionality similar to what you see in DocuSign. Are there solutions out there that you can point me to?
From further research, I learned that PDF and JavaScript are good friends. The easiest way to manipulate content within a PDF is to convert it to HTML, make the changes there, and then convert it back to PDF. There are a few different services available that help with this kind of conversion. Here are a few of them:
https://cloudconvert.com
http://www.pdfonline.com/
https://market.mashape.com/netservice/convert-pdf-to-html

Java object to javascript

I have a core Java project in which I have to create a data visualization in the form of a graph.
This data visualization (a graph) is dumped into an HTML file (in PlantUML format), which renders it as a visual graph.
Now I am looking for a way to also dump this graph data structure (which is actually a set of interlinked Java objects) in some format that I can read in JavaScript/jQuery, so that I can reconstruct the whole graph when the HTML file loads and update it dynamically based on user input.
I need this because PlantUML doesn't support dynamic events.
And since the HTML file is generated dynamically, creating a JSP and loading it dynamically on the server is not feasible.
I have seen some answers suggesting the use of JAXB or JSON, but those questions weren't exactly what I needed.
I am thinking of dumping it as XML and then reading that XML in JavaScript, but I am not sure how good this idea is.
Is there a better way?
JavaScript is not good at reading XML. The ECMA-357 standard was designed for this, but is all but forgotten. You can still find it in Rhino, but that doesn't suit your purposes.
JSON is really the language to use these days.
GSON or Jackson are common libraries for turning Java objects into JSON.
ECMA-357 - E4X
GSON
Jackson
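As a rough sketch of the JavaScript side of the JSON route: the shape below (a flat list of nodes plus edges that refer to nodes by id) is a hypothetical format, not something GSON or Jackson produce by default. The Java side is assumed to flatten object references into ids so that cyclic links survive serialization.

```javascript
// Hypothetical JSON shape: nodes carry ids, edges refer to those ids.
const graphJson = `{
  "nodes": [{"id": "a", "label": "A"}, {"id": "b", "label": "B"}],
  "edges": [{"from": "a", "to": "b"}, {"from": "b", "to": "a"}]
}`;

// Rebuild interlinked objects: each node gets a `neighbors` array that
// holds direct references to other node objects.
function buildGraph(json) {
  const data = JSON.parse(json);
  const byId = new Map();
  for (const node of data.nodes) {
    byId.set(node.id, { ...node, neighbors: [] });
  }
  for (const edge of data.edges) {
    const from = byId.get(edge.from);
    const to = byId.get(edge.to);
    if (!from || !to) {
      throw new Error(`Edge references unknown node: ${edge.from} -> ${edge.to}`);
    }
    from.neighbors.push(to);
  }
  return byId;
}

const graph = buildGraph(graphJson);
console.log(graph.get('a').neighbors[0].label); // B
```

Once the objects are linked again, the page can redraw or update the graph in response to user input without going back to the server.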

Abap : convert svg to png

I have an SVG file and I want to convert it to a PNG image. I have searched for a class that does this in ABAP, but I could not find anything.
I tried to do this in JavaScript and then execute it from ABAP, but to run from ABAP my JS code has to work without a DOM implementation or browser functionality.
SVG is - as its name implies - a vector graphics format while PNG is a raster graphics format. Converting vector graphics to raster graphics requires all kinds of "interesting" capabilities that ABAP isn't really suitable for, for instance rendering text in (almost) any font with various attributes and modifiers into a bitmap. I would be surprised if a pure ABAP solution existed at all. It should be possible from a technical point of view, but as you might imagine, it'd be an enormous task.
That being said, you might want to try the IMGCONV part of the Internet Graphics Service. I'm not sure whether it supports SVG, but you might want to check out the classes CL_IGS_*.
You could try doing this with an attached GUI running Windows, if that's an option. The back-end server-side Java interpreter does lack a DOM, yes. But perhaps you can find a library that can do this in Java without a DOM? That should be easier than doing the required bit manipulations in ABAP.
You cannot convert an SVG into a PNG or JPEG in ABAP.
First of all you have to download the SVG file and open it in Paint, if you use Windows. Then save it as a 256-color bitmap. After that you can convert it into a PNG file. (Why? If you load the file without converting it to a 256-color bitmap, SAP may not interpret the colors well: white can become gray and gray can become blue.)
Advice: if you want to use the image in Smart Forms, Adobe Forms, or SAPscript, the best way is to upload it as a bitmap in transaction SE78 and call it in your printouts or ALV header.

Reading XLS/XLSX Data With JavaScript in NetSuite

I'm looking into potentially building code for NetSuite to read the contents of an Excel file (XLS or XLSX) within JavaScript in order to process the data. I can do this just fine with a CSV file, but I'd like to expand capabilities to read Excel worksheets.
I've seen a variety of scripts to read in Excel files, but they all seem to depend on Internet Explorer, and none of them seem to offer a way to get the used columns and rows; they assume you already know this information ahead of time. NetSuite being what it is, these solutions don't really work, and you have to grab the Base64-encoded contents of the file object stored in the system. This isn't an issue with CSV files, since they are still just plain text.
I've done some testing and found that I get different results when trying to decode the string (I get something from XLS, but nothing from XLSX). I was wondering if anyone has tried and succeeded in reading data from these file formats in a NetSuite JavaScript implementation. If there's no good way, then I'll just have to force use of CSV, but I'd like to have some flexibility.
Essentially, you are asking for a JavaScript implementation of XLS and XLSX parsers. It is incredibly difficult, mostly due to the nature of the data formats and the sheer amount of parsing required to get basic data.
I have built a basic version:
http://oss.sheetjs.com/js-xls/ (xls)
http://oss.sheetjs.com/js-xlsx/ (xlsx)
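Before handing the data to a parser, it can help to check which container format the Base64 string actually holds. Legacy XLS files are OLE2 compound files and XLSX files are ZIP archives, so both are binary and neither decodes to readable text the way CSV does. The sketch below assumes Node.js and its Buffer class, and the function names are made up for illustration; in NetSuite a different Base64 decoder would be needed.

```javascript
// Classify a Base64-encoded spreadsheet by its leading "magic" bytes.
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // legacy XLS container
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // XLSX is a ZIP archive

// True when `bytes` begins with every byte listed in `magic`.
function startsWith(bytes, magic) {
  return magic.every((value, index) => bytes[index] === value);
}

function detectSpreadsheetFormat(base64) {
  const bytes = Buffer.from(base64, 'base64');
  if (startsWith(bytes, OLE2_MAGIC)) return 'xls';
  if (startsWith(bytes, ZIP_MAGIC)) return 'xlsx';
  return 'unknown';
}
```

The result can then decide whether the decoded bytes go to the XLS parser or the XLSX parser.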

In-browser conversion of MS Word document to PDF

I would like to implement an in-browser Microsoft Word document merge feature that will convert the merged document into PDF and offer it to the user for download. I would like this process to be supported in Google Chrome and Firefox. Here is how I would like it to work:
1. Client-side JavaScript obtains the Word template document in docx format, either from a server, or by asking the user for a file upload (which it can then read using the FileReader API).
2. The JavaScript uses its local data structures (e.g., data lists it has obtained via Ajax) to expand the template into a document. It can do this either directly, by unzipping the docx file and processing its contents, or using DOCx.js. The template expansion is just a matter of substituting template variables with values obtained from the local data structures.
3. The JavaScript then converts the expanded template into PDF.
4. The JavaScript offers the PDF file to the user for download, e.g., using Downloadify.
The difficulty I am having is in step 3. My understanding (based on all the Googling I have done so far) is that I have the following options:
Require that the local machine is a Windows machine, and invoke Word on it, to convert to PDF. This can be done using a little bit of scripting using WScript.shell, and it looks doable with Internet Explorer. But based on what I have read, it doesn't look like I can call WScript.shell from within either Chrome or Firefox, because of their security constraints.
I am open to trying Silverlight to do the conversion, but I have not found enough documentation on how to do this. Ideally, if I used Silverlight, I would like to write the Silverlight code in JavaScript, because (a) I don't know much CSharp, and (b) I think it would be much easier in JavaScript.
Create a web service that will convert a given docx file to a pdf file, and invoke that service via Ajax. I would rather not do this, if possible, for a few reasons: (a) I tried using docx4java (I am a reasonably skilled Java programmer) but the conversion process is far too slow, and it does not preserve document content very well; and (b) I would like to avoid a call out to the network, to avoid security issues. It does seem possible to write a little service on a Windows server for doing the conversion, and if there is no other good option, I might go that route.
If I have been unclear about anything, please let me know. I would appreciate your ideas and feedback.
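To make the template expansion step concrete, here is a minimal sketch of the substitution itself. The {{name}} placeholder syntax and the function name are assumptions for illustration. It also assumes each placeholder sits in one contiguous piece of text, which is not guaranteed inside a docx file because Word can split text across several runs.

```javascript
// Replace {{name}} placeholders with values from a lookup object.
// Unknown placeholders are left untouched so they are easy to spot.
function expandTemplate(text, values) {
  return text.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

console.log(expandTemplate('Dear {{ name }}, total: {{total}}', { name: 'Ada', total: 42 }));
// Dear Ada, total: 42
```

When the text comes from the XML parts of an unzipped docx file, the substituted values would also need XML escaping.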
I love command line tools.
Load the doc to your server and use LibreOffice to convert it to PDF via the command line
soffice.exe --headless --convert-to pdf --outdir E:\Docs\Out E:\Docs\In\a.doc
You can display a progress bar to the user and when complete give them the option to download the doc.
More info on LibreOffice's command-line parameters can be found in the LibreOffice documentation.
Done.
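If the server happens to run Node.js (an assumption; any language that can start a process would do), the command above could be wrapped roughly like this. The function names and the default `soffice` executable name are made up for illustration, and the flags are the same ones used in the command line above.

```javascript
const { execFile } = require('child_process');

// Build the argument list for the LibreOffice call shown above.
function buildConvertArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// Run the conversion. `sofficePath` is an assumption and depends on the install.
function convertToPdf(inputPath, outDir, sofficePath = 'soffice') {
  return new Promise((resolve, reject) => {
    execFile(sofficePath, buildConvertArgs(inputPath, outDir), (error, stdout, stderr) => {
      if (error) {
        reject(new Error(`Conversion failed: ${stderr || error.message}`));
      } else {
        resolve(stdout);
      }
    });
  });
}
```

A call such as convertToPdf('E:\\Docs\\In\\a.doc', 'E:\\Docs\\Out') returns a promise, so the server can report progress to the user and offer the download once it resolves. Passing the arguments as an array avoids shell quoting problems with user-supplied file names.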
Old old question now, but for anyone who stumbles across this, WebAssembly (wasm) now makes this sort of approach possible.
We've just released https://www.npmjs.com/package/@nativedocuments/docx-wasm, which can perform the conversion locally.
