Converting docx/odt to PDF using JavaScript - javascript

I have a node web app that needs to convert a docx file into pdf (using client side resources only and no plugins). I've found a possible solution by converting my docx into HTML using docxjs and then HTML to PDF using jspdf (docx->HTML->PDF).
This solution could make it but I encountered several issues especially with rendering. I know that docxjs doesn't keep the same rendering in HTML as the docx file so it is a problem...
So my question is do you know any free module/solution that could directly do the job without going through HTML (I'm open to odt as a source as well)? If not, what would you advise me to do?
Thanks

As you already know, there are no ready-to-use open libraries for this. You just can't get good results with the available options. My suggestions are:
Use a third-party API, such as https://market.mashape.com/convertapi/word2pdf-1#!documentation
Create your own service for this purpose. If you are able to, I suggest creating a small server in Node.js (I bet you know how to do this). You can use LibreOffice as a converter with good rendering quality, like this:
libreoffice --headless --invisible --convert-to pdf {$file_name} --outdir /www-disk/
Don't forget that this usually takes a lot of time, so do not block the request-response flow: use a separate process for each conversion.
And one last thing: LibreOffice is not very lightweight, but its output quality is good. The unoconv tool is also worth a look.
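A minimal sketch of the "separate process per conversion" idea in Node.js might look like the following. It assumes a `libreoffice` binary is available on the PATH, and the function names (`buildConvertArgs`, `expectedPdfPath`, `convertToPdf`) are hypothetical helpers made up for this illustration. It uses the double-dash spelling of the flags.

```javascript
var execFile = require('child_process').execFile;
var path = require('path');

// Build the argument list for a headless LibreOffice conversion.
function buildConvertArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// LibreOffice keeps the input's base name and swaps the extension for .pdf.
function expectedPdfPath(inputPath, outDir) {
  var base = path.basename(inputPath, path.extname(inputPath));
  return path.join(outDir, base + '.pdf');
}

// Run the conversion in a child process so the Node event loop is not blocked.
// Assumption: the executable is called "libreoffice" and is on the PATH.
function convertToPdf(inputPath, outDir, callback) {
  execFile('libreoffice', buildConvertArgs(inputPath, outDir), function (err) {
    if (err) {
      return callback(err);
    }
    callback(null, expectedPdfPath(inputPath, outDir));
  });
}
```

A request handler would call `convertToPdf(uploadedPath, outDir, cb)` and respond from the callback, so other requests keep being served while LibreOffice runs.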
As of January 2019, there is docx-wasm, which works in node and performs the conversion locally where node is installed. Proprietary but freemium.

It appears that even after three years ncohen had not found an answer. It was also unclear if it had to be a free (as in dollars) solution.
The original requirements were:
using client side resources only and no plugins
Do you mean you don't want server side conversion? Right, I would like my app to be totally autonomous.
Since all the other answers/comments only offered server side component solutions, which the author clearly stated was not what they wanted, here is a proposed answer.
The company I work for has had this solution for a few years now, that can convert DOCX (not odt yet) files to PDF completely in the browser, with no server side component required. This currently uses either asm.js/PNaCl/WASM depending on the exact browser being used.
https://www.pdftron.com/samples/web/samples/viewing/viewing/
Open an office file using the demo above, and you will see no server communication. Everything is done client side. This demo works on mobile browsers also.

Related

Generate docx on Javascript Rhino

I am trying to complete an impossible mission.
I need to generate docx documents on ServiceNow (server side) which implements the Javascript Rhino engine. Doing so on the client side is super easy, I usually use docxtemplater or similar great libraries. The problem here is that we need to build it on the server and using ServiceNow technologies (script includes, etc).
That said, I am trying to port the client docxtemplater version but I am struggling because on the server there is no concept of DOM.
At the same time, using the server side version is difficult because ServiceNow does not use Node js but Rhino, and all libraries out there are based on Node.
The best thing I was able to do using vanilla js is to generate a data uri that, when downloaded from the browser, returns a docx document, but I was wondering if anyone has any suggestions.
Thanks a lot.
There are at least two ways to accomplish this. One is to embrace the nightmare, and either transpile the OpenXml JS libs to ES5 compatibility or rewrite them. The other is to create a MS Word template, encode as Base64 text (as it's zipped XML) and save in ServiceNow, then unzip and traverse the XML using the ServiceNow XMLDocument2 library to update the text. Finally, you re-zip and save the file to create the updated OpenXml document.
The second solution requires you to get JSZip in ES5.
The source code to my solution is currently proprietary and I am not free to share it, but it can be done. Just make sure you've got a big enough budget, as it's not trivial and takes a fair amount of time to implement.
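To give a rough idea of the text-update step in the second approach, here is a small ES5-compatible sketch that works on the unzipped document XML as a plain string. The `{{name}}` placeholder syntax and the function names are assumptions made for this example, not part of docxtemplater or ServiceNow. Unzipping and re-zipping are left out.

```javascript
// Escape a value so it can be placed safely inside an XML text node.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Replace {{key}} placeholders (hypothetical syntax) with escaped values.
// Placeholders without a matching key are left untouched.
function fillPlaceholders(xml, values) {
  return xml.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    if (Object.prototype.hasOwnProperty.call(values, key)) {
      return escapeXml(values[key]);
    }
    return match;
  });
}
```

One caveat: Word often splits what looks like a single placeholder across several runs in the XML, so a plain string replace like this only works when the placeholder sits intact in one text node.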

Are there any reporting library in Javascript/Angular.js?

I want to generate some reports in a MEAN.js application. I manage the data in Angular, and what I want to know is whether there's a library to generate a PDF report; for example, when using PHP there's dompdf, fpdf, etc.
Basically what I need to generate is something like this from Angular:
Is there any tool to generate the reports from Angular, or should I generate them from Node.js? If so, what tools are available for Node.js?
I only know about jsreport for node.js
Server-side rendering with Node is definitely the way to go, the client side libraries never really worked well (I last checked about a year ago). I'd suggest using PhantomJS as it provides PDF rendering capabilities out of the box.
PhantomJS will use Webkit engine to generate the PDF for you. The actual rendering process is dead simple:
page.render('/tmp/file.pdf', function() {
// file is now written to disk
});
Of course you have to insert something on the page you're generating first. Check out the following post which describes one guy's implementation, the code quoted above comes from there: http://www.feedhenry.com/server-side-pdf-generation-node-js/
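As an illustration of the "insert something on the page first" step, the sketch below builds a simple HTML report from data as a string. The function names and the report layout are made up for this example. The resulting string is what you would load into the page before calling the render step.

```javascript
// Escape a value for use inside HTML text content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a one-table HTML report. "columns" is an array of property names,
// "rows" is an array of plain objects; missing values become empty cells.
function buildReportHtml(title, columns, rows) {
  var head = columns.map(function (col) {
    return '<th>' + escapeHtml(col) + '</th>';
  }).join('');
  var body = rows.map(function (row) {
    var cells = columns.map(function (col) {
      var value = row[col] == null ? '' : row[col];
      return '<td>' + escapeHtml(value) + '</td>';
    }).join('');
    return '<tr>' + cells + '</tr>';
  }).join('');
  return '<!DOCTYPE html><html><head><meta charset="utf-8"><title>' +
    escapeHtml(title) + '</title></head><body><h1>' + escapeHtml(title) +
    '</h1><table><thead><tr>' + head + '</tr></thead><tbody>' + body +
    '</tbody></table></body></html>';
}
```

Keeping the HTML generation separate from the rendering makes it easy to preview the report in a normal browser before turning it into a PDF.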

How to Upload Large files in Javascript in a Single Connection?

I am trying to write some HTML/JS code which will facilitate uploading large files (multi-GB) to a remote server. Previously we had been using a flash uploader which uploaded a given file in a single network request. The flash would open a network connection, read a chunk of a file into memory then write that chunk to the network connection then grab the next chunk then write to the network etc. etc. until the entire file is uploaded. It was done this way because most web browsers will attempt to read an entire file into memory before attempting to upload. When dealing with multi-GB files, this essentially crashes the client system because it uses all of the client memory. Now we are having issues with using flash, so it needs to go, we want to replace it without needing to modify the existing server-side code.
A few google searches for jquery uploaders reveals that there are plenty of libraries which support "chunking" but they "chunk" over multiple requests. We do not want to chunk a file over multiple network requests, we merely want the JS to read the file in chunks as it writes the file to a single network connection.
Anybody know a library which can do this out of the box?
We are not opposed to modifying an existing library if need be. Anyone have a snippet that resembles the below pseudo-code that I may be able to retrofit into a library?
connection = fopen(...);
fputs("123", connection);
... some unrelated code ...
fputs("456", connection);
fclose(connection);
(excuse my use of C functions in pseudo JS code ... I know that is not how you do it in JS, I am merely demonstrating at a low-level the flow for how I want to write to the network connection before closing it)
NOTE: We are not trying to "modernize" or improve this project extensively -- we are not trying to re-do this project. We have some old code that has sat here for years and we want to make as few changes to the server-side code as possible. I have more important projects to modernize and make more efficient -- this one we just need to work. Please don't advise me to implement "proper" file chunking on the server side -- that was my suggestion, and if my suggestion were taken then that task would have been assigned to a different developer. Out of my control now, this is a client-side-only fix please!
Thanks, sorry for any headache!
You could try binaryjs. I haven't looked into the internals but I know it supports manually setting the chunk size. Maybe you can even set it to Infinity.
Specifically you could try:
var client = new BinaryClient('example.com', { chunkSize: Number.POSITIVE_INFINITY });
client.send('data...');
Note: binaryjs is a NodeJS server library, and a browser-compatible client library.
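Independently of binaryjs, the reading half of the question (pulling the file into memory one piece at a time) can be sketched with byte ranges that are fed to the standard `file.slice(start, end)` call. The generator name `sliceRanges` is made up for this example, and how each slice is then written to a single open connection is deliberately left out, since that depends on what the target browsers support.

```javascript
// Yield [start, end) byte ranges that cover a file of totalSize bytes.
// Each pair can be passed to file.slice(start, end) so that only one
// chunk needs to be held in memory at a time.
function* sliceRanges(totalSize, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError('chunkSize must be a positive number');
  }
  for (var start = 0; start < totalSize; start += chunkSize) {
    yield [start, Math.min(start + chunkSize, totalSize)];
  }
}
```

In the browser, usage would look roughly like `for (const [start, end] of sliceRanges(file.size, 8 * 1024 * 1024)) { const part = file.slice(start, end); /* write part */ }`.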

In-browser conversion of MS Word document to PDF

I would like to implement an in-browser Microsoft Word document merge feature that will convert the merged document into PDF and offer it to the user for download. I would like this process to be supported in Google Chrome and Firefox. Here is how I would like it to work:
1. Client-side JavaScript obtains the Word template document in docx format, either from a server, or by asking the user for a file upload (which it can then read using the FileReader API)
2. The JavaScript uses its local data structures (e.g., data lists it has obtained via Ajax) to expand the template into a document. It can do this either directly, by unzipping the docx file and processing its contents, or using DOCx.js. The template expansion is just a matter of substituting template variables with values obtained from the local data structures.
3. The JavaScript then converts the expanded template into PDF.
4. The JavaScript offers the PDF file to the user for download, e.g., using Downloadify.
The difficulty I am having is in step 3. My understanding (based on all the Googling I have done so far) is that I have the following options:
Require that the local machine is a Windows machine, and invoke Word on it, to convert to PDF. This can be done using a little bit of scripting using WScript.shell, and it looks doable with Internet Explorer. But based on what I have read, it doesn't look like I can call WScript.shell from within either Chrome or Firefox, because of their security constraints.
I am open to trying Silverlight to do the conversion, but I have not found enough documentation on how to do this. Ideally, if I used Silverlight, I would like to write the Silverlight code in JavaScript, because (a) I don't know much CSharp, and (b) I think it would be much easier in JavaScript.
Create a web service that will convert a given docx file to a pdf file, and invoke that service via Ajax. I would rather not do this, if possible, for a few reasons: (a) I tried using docx4java (I am a reasonably skilled Java programmer) but the conversion process is far too slow, and it does not preserve document content very well; and (b) I would like to avoid a call out to the network, to avoid security issues. It does seem possible to write a little service on a Windows server for doing the conversion, and if there is no other good option, I might go that route.
If I have been unclear about anything, please let me know. I would appreciate your ideas and feedback.
I love command line tools.
Load the doc to your server and use LibreOffice to convert it to PDF via the command line
soffice.exe --headless --convert-to pdf --outdir E:\Docs\Out E:\Docs\In\a.doc
You can display a progress bar to the user and when complete give them the option to download the doc.
More info on LibreOffice's command line parameters go here
Done.
Old old question now, but for anyone who stumbles across this, web assembly (wasm) now makes this sort of approach possible.
We've just released https://www.npmjs.com/package/@nativedocuments/docx-wasm which can perform the conversion locally.

How to parse an excel file in JavaScript?

I am trying to write a small web tool which takes an Excel file, parses the contents and then compares the data with another dataset. Can this be easily done in JavaScript? Is there a JavaScript library which does this?
How would you load a file into JavaScript in the first place?
In addition, Excel is a proprietary format and complex enough that server side libraries with years in development (such as Apache POI) haven't yet managed to correctly 100% reverse engineer these Microsoft formats.
So I think that the answer is that you can't.
Update: That is in pure JavaScript.
Update 2: It is now possible to load files in JavaScript: https://developer.mozilla.org/en-US/docs/DOM/FileReader
In the past four years, there have been many advancements. HTML5 File API has been embraced by the major browser vendors and performance enhancements actually make it somewhat possible to parse excel files (both xls and xlsx) in the browser.
My entries in this space:
http://oss.sheetjs.com/js-xls/ (xls)
http://oss.sheetjs.com/js-xlsx/ (xlsx)
Both are pure-JS parsers
To do everything in js, you'll have to use ActiveX and probably the office web components as well. Just a suggestion, but you probably don't want to go this route; it'll be inefficient and IE/Win only. You'll be better off with a server based solution.
You will need to use ActiveX (see W3Schools on the use of AJAX) and register the file in the hosting computer's data connectors (only the computer hosting the file). Unlike what was mentioned before, this method is not Microsoft platform dependent (for the client anyway) and you do not need to have Office components installed.
This can be done for most datafiles registered in Windows, including MDB's, and allows you as much control as you want, as you can assign different Windows Accounts for different purposes.
Like I said before, this all is serverside and has no impact on the client, apart from maybe retrieving credentials, actions and all that.
This method uses JavaScript, SQL (no, not even MSSQL, just SQL standard) and requires only that the hosting computer is running ANY Microsoft NT platform.
What Windows dataconnectors do is provide a generalised interface for various data components much like DirectX does for videocards and other peripherals. You can also use it to link an MDB (Microsoft Access) to a MySQL server and feed data live that way, which I believe is even simpler than using XLS spreadsheets...especially since you can import XLS into MDB.
Do you really need an Excel file? Why not use Excel to export the data in CSV or XML and load that?
The Excel file format is very specific to Excel's implementation. If you just need the data, use a file format that just contains the data.
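If the data is exported as CSV, reading it in JavaScript takes very little code. The following is a minimal sketch of a parser that handles quoted fields, doubled quotes inside quoted fields, and both LF and CRLF line endings. The function name `parseCsv` is made up for this example, and it assumes comma-separated input that has already been read into a string.

```javascript
// Parse CSV text into an array of rows, each row an array of strings.
function parseCsv(text) {
  var rows = [];
  var row = [];
  var field = '';
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // a doubled quote is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') {
        i++; // treat CRLF as a single line break
      }
      row.push(field);
      field = '';
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

All values come back as strings, so numeric columns still need converting before they are compared with another dataset.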
