I have found a couple of libraries that can generate, edit, and read PDFs in JavaScript, such as jsPDF and Mozilla's pdf.js.
These existed before the advent of ES6 or HTML5, so they could not have relied on modern technologies.
I want to understand, from a JS perspective, how these libraries achieve this. As far as I understand, the PDF file format is proprietary, with open SDKs for different languages that other software then uses, like the one MS Word may use for converting DOC to PDF.
An SDK for JS seems unlikely, since the whole code runs on the client side and cannot interface with binaries written in other languages. So how would one actually create a PDF file in JS running in the browser's JavaScript engine? Looking at the libraries, that seems to be what they have done.
Julian Viereck has a great YouTube video explaining how pdf.js works internally.
https://www.youtube.com/watch?v=Iv15UY-4Fg8
You can also browse the PDF.js codebase to learn more (https://github.com/mozilla/pdf.js/).
There's nothing inherently special about a PDF file that prevents it from being created by anything that can write text and binary to a file. The object definitions can become quite complex but it's still just a matter of conforming to the specification.
It's all laid out in the PDF Reference.
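To make the "just text and binary" point concrete, here is a minimal sketch that assembles a one-page PDF as a plain string. It is not how jsPDF or pdf.js are implemented; the function name buildMinimalPdf, the page size, and the font choice are assumptions for illustration, and it assumes ASCII-only text so that string length equals byte length.

```javascript
// Minimal sketch: build a one-page PDF by hand as a string.
// Assumes ASCII-only content, so String.length equals the byte count.
function buildMinimalPdf(text) {
  // Escape the characters that are special inside a PDF literal string.
  var escaped = text.replace(/([\\()])/g, "\\$1");
  var stream = "BT /F1 24 Tf 72 720 Td (" + escaped + ") Tj ET";

  // Object bodies, numbered 1..n in this order.
  var bodies = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " +
      "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
    "<< /Length " + stream.length + " >>\nstream\n" + stream + "\nendstream",
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
  ];

  var pdf = "%PDF-1.4\n";
  var offsets = [];
  for (var i = 0; i < bodies.length; i++) {
    offsets.push(pdf.length); // byte offset where "N 0 obj" starts
    pdf += (i + 1) + " 0 obj\n" + bodies[i] + "\nendobj\n";
  }

  // Cross-reference table: a free entry for object 0, then one
  // 20-byte entry per object giving its zero-padded byte offset.
  var xrefStart = pdf.length;
  pdf += "xref\n0 " + (bodies.length + 1) + "\n";
  pdf += "0000000000 65535 f \n";
  for (var j = 0; j < offsets.length; j++) {
    pdf += ("0000000000" + offsets[j]).slice(-10) + " 00000 n \n";
  }

  pdf += "trailer\n<< /Size " + (bodies.length + 1) + " /Root 1 0 R >>\n";
  pdf += "startxref\n" + xrefStart + "\n%%EOF\n";
  return pdf;
}
```

In Node, the result could be saved with something like fs.writeFileSync("hello.pdf", buildMinimalPdf("Hello"), "latin1"); in a browser it could be wrapped in a Blob. Real libraries add compression, font embedding, and much more, but the file structure is the same idea.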
I am trying to complete an impossible mission.
I need to generate docx documents on ServiceNow (server side), which implements the JavaScript Rhino engine. Doing so on the client side is super easy; I usually use docxtemplater or similar great libraries. The problem here is that we need to build it on the server, using ServiceNow technologies (script includes, etc.).
That said, I am trying to port the client-side version of docxtemplater, but I am struggling because there is no concept of a DOM on the server.
At the same time, using the server-side version is difficult because ServiceNow uses Rhino rather than Node.js, and all the libraries out there are based on Node.
The best I have managed with vanilla JS is to generate a data URI that, when downloaded from the browser, returns a docx document, but I was wondering if anyone has any suggestions.
Thanks a lot.
There are at least two ways to accomplish this. One is to embrace the nightmare and either transpile the OpenXml JS libs to ES5 or rewrite them. The other is to create an MS Word template, encode it as Base64 text (since a .docx is zipped XML) and save it in ServiceNow, then unzip it and traverse the XML using the ServiceNow XMLDocument2 library to update the text. Finally, you re-zip and save the file to create the updated OpenXml document.
The second solution requires you to get JSZip in ES5.
The source code to my solution is currently proprietary and I am not free to share it, but it can be done. Just make sure you've got a big enough budget, as it's not trivial and takes a fair amount of time to implement.
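As a rough illustration of the "update the text" step only, here is an ES5-only sketch that fills placeholders in the document XML once it has been unzipped. It uses plain string replacement rather than the XMLDocument2 traversal described above, and the {{name}} placeholder syntax and the helper names escapeXml and fillTemplate are assumptions, not part of any library.

```javascript
// ES5-only sketch (no let/const/arrow functions), so it should also run on Rhino.
// Hypothetical helpers: the {{key}} placeholder syntax is an assumption.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function fillTemplate(documentXml, values) {
  return documentXml.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    // Leave unknown placeholders untouched so they are easy to spot.
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      return match;
    }
    // Escape the value so it cannot break the surrounding XML.
    return escapeXml(values[key]);
  });
}

var xml = '<w:p><w:r><w:t>Dear {{name}}, your ticket is {{ticket}}.</w:t></w:r></w:p>';
var filled = fillTemplate(xml, { name: "R&D", ticket: "INC001" });
```

One caveat: Word can split a placeholder across several runs in the XML, so a real template may need its runs merged before a simple replacement like this finds every placeholder.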
I have a Node web app that needs to convert a docx file into PDF (using client-side resources only and no plugins). I've found a possible solution: converting my docx into HTML using docxjs, and then HTML to PDF using jsPDF (docx->HTML->PDF).
This solution could work, but I encountered several issues, especially with rendering. I know that docxjs doesn't reproduce the docx file's rendering in HTML, so that is a problem...
So my question is: do you know of any free module or solution that could do the job directly, without going through HTML (I'm open to odt as a source as well)? If not, what would you advise me to do?
Thanks
As you already know, there are no ready-to-use open libraries for this. You just can't get good results with the available options. My suggestions are:
Use a third-party API, like https://market.mashape.com/convertapi/word2pdf-1#!documentation
Create your own service for this purpose. If you are able to, I suggest creating a small server on Node.js (I bet you know how to do this). You can use LibreOffice as a converter with good rendering quality, like this:
libreoffice --headless --invisible --convert-to pdf --outdir /www-disk/ {$file_name}
Don't forget that this usually takes a lot of time, so do not block the request-response flow: use a separate process for each conversion.
One last thing: LibreOffice is not very lightweight, but it gives good quality. The unoconv tool is also worth a look.
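A minimal sketch of the "separate process" advice in Node might look like the following. It assumes a libreoffice binary is on the PATH and uses the double-dash option spelling documented for current LibreOffice releases; the function names and the 60-second timeout are arbitrary choices for illustration.

```javascript
var execFile = require("child_process").execFile;

// Build the argument list for a headless LibreOffice conversion.
// The input file goes last, after all options.
function buildConvertArgs(inputPath, outDir) {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, inputPath];
}

// Run the conversion in a child process so the Node event loop stays free
// to serve other requests. Assumes "libreoffice" is on the PATH.
function convertToPdf(inputPath, outDir, callback) {
  execFile(
    "libreoffice",
    buildConvertArgs(inputPath, outDir),
    { timeout: 60000 }, // arbitrary safety limit, in milliseconds
    function (error, stdout) {
      callback(error, stdout);
    }
  );
}
```

Passing the arguments as an array to execFile, rather than building a shell command string, avoids quoting problems with user-supplied file names.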
As of January 2019, there is docx-wasm, which works in node and performs the conversion locally where node is installed. Proprietary but freemium.
It appears that even after three years ncohen had not found an answer. It was also unclear if it had to be a free (as in dollars) solution.
The original requirements were:
using client side resources only and no plugins
Do you mean you don't want server side conversion? Right, I would like my app to be totally autonomous.
Since all the other answers/comments only offered server side component solutions, which the author clearly stated was not what they wanted, here is a proposed answer.
The company I work for has had this solution for a few years now, that can convert DOCX (not odt yet) files to PDF completely in the browser, with no server side component required. This currently uses either asm.js/PNaCl/WASM depending on the exact browser being used.
https://www.pdftron.com/samples/web/samples/viewing/viewing/
Open an office file using the demo above, and you will see no server communication. Everything is done client side. This demo works on mobile browsers also.
I would like to implement an in-browser Microsoft Word document merge feature that converts the merged document into PDF and offers it to the user for download. I would like this process to be supported in Google Chrome and Firefox. Here is how I would like it to work:
Client-side JavaScript obtains the Word template document in docx format, either from a server, or by asking the user for a file upload (which it can then read using the FileReader API)
The JavaScript uses its local data structures (e.g., data lists it has obtained via Ajax) to expand the template into a document. It can do this either directly, by unzipping the docx file and processing its contents, or using DOCx.js. The template expansion is just a matter of substituting template variables with values obtained from the local data structures.
The JavaScript then converts the expanded template into PDF.
The JavaScript offers the PDF file to the user for download, e.g., using Downloadify.
The difficulty I am having is in step 3. My understanding (based on all the Googling I have done so far) is that I have the following options:
Require that the local machine is a Windows machine, and invoke Word on it, to convert to PDF. This can be done using a little bit of scripting using WScript.shell, and it looks doable with Internet Explorer. But based on what I have read, it doesn't look like I can call WScript.shell from within either Chrome or Firefox, because of their security constraints.
I am open to trying Silverlight to do the conversion, but I have not found enough documentation on how to do this. Ideally, if I used Silverlight, I would like to write the Silverlight code in JavaScript, because (a) I don't know much CSharp, and (b) I think it would be much easier in JavaScript.
Create a web service that will convert a given docx file to a pdf file, and invoke that service via Ajax. I would rather not do this, if possible, for a few reasons: (a) I tried using docx4java (I am a reasonably skilled Java programmer) but the conversion process is far too slow, and it does not preserve document content very well; and (b) I would like to avoid a call out to the network, to avoid security issues. It does seem possible to write a little service on a Windows server for doing the conversion, and if there is no other good option, I might go that route.
If I have been unclear about anything, please let me know. I would appreciate your ideas and feedback.
I love command line tools.
Load the doc to your server and use LibreOffice to convert it to PDF via the command line
soffice.exe --headless --convert-to pdf --outdir E:\Docs\Out E:\Docs\In\a.doc
You can display a progress bar to the user and when complete give them the option to download the doc.
More information on LibreOffice's command-line parameters can be found in its documentation.
Done.
This is an old question now, but for anyone who stumbles across it: WebAssembly (wasm) now makes this sort of approach possible.
We've just released https://www.npmjs.com/package/@nativedocuments/docx-wasm which can perform the conversion locally.
Is there a way to do reporting with JavaScript that generates a PDF file?
You can go for jsPDF.
jsPDF is an open-source library for generating PDF documents using nothing but Javascript. You can use it in a Firefox extension, in Server Side Javascript and with Data URIs in some browsers.
Assuming you mean "JavaScript running in a browser" then, "No, not in any practical way (since, while you can use data: URIs, they see limited browser support and limited file size)".
If you are talking server side JS, then you can generate whatever data you like, including PDF. There is at least one library that helps you do that.
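For reference, the data: URI route mentioned above amounts to Base64-encoding the file bytes into a URL. A minimal sketch, assuming the content is held as a binary string (one character per byte) and that a global btoa is available, as it is in browsers and recent Node versions; the helper name toDataUri is made up for illustration:

```javascript
// Turn a binary string (one character per byte) into a data: URI.
function toDataUri(binaryString, mimeType) {
  return "data:" + mimeType + ";base64," + btoa(binaryString);
}

// "%PDF" is just the first four bytes of a PDF header, used as sample input.
var uri = toDataUri("%PDF", "application/pdf");
```

Because the whole file ends up inside the URL, the size limits mentioned above apply directly to this approach.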
Take a look at Stimulsoft Reports.JS that is a pure JavaScript tool. It supports export to PDF, Excel etc.
I want to create torrent files with a Firefox extension written in JavaScript.
Torrent file creators currently exist as desktop applications written in everything but JavaScript.
It may also be possible to find a decent torrent file spec in Java, as Azureus, an open-source P2P client, is written in Java.
Can somebody please give me hints, or maybe some specs, on how to achieve this using JavaScript?
Javascript is normally run within a browser in a "sandboxed" environment, where it can't for example create files. If you want to use Javascript in a standalone environment, such as jslibs, that's a very different proposition, and creating files becomes possible. So is your issue with Javascript per se, as your question and tagging indicate, or with the sandboxing browsers typically perform on it?