Parse Microsoft Office files in Node.JS

I'm working on a web application where users can upload Microsoft Office Document files. Right now, our server is running Node.JS with Express.js and we're hosted on Heroku. Because of this, I don't think that I can install programs such as abiword or catdoc. I can handle the file uploads, but can't parse the contents of the document.
How can I read the contents of the doc file? The information will then be put into a database. It'd be nice to preserve basic formatting (bold, italic, underline), but not essential.

While there doesn't seem to be anything on npm that handles Word files directly, you might be able to use a REST API from another cloud service. For example, Saaspose (from the makers of the Aspose tools) has public APIs for Word, Excel, PDF, and others. They list Node.js, JavaScript, and Heroku support on their page.
EDIT:
I see that Saaspose is now called Aspose for Cloud.
Another API that claims something similar is Doxument.

The office package (npm install office) seems to provide at least part of the answer. I use it to read Excel files, but so far I have not tried it on any Word docs.

There doesn't seem to be any yet. See this related question for something that might help:
Can I read PDF or Word Docs with Node.js?

You can use mammoth to parse .docx files https://www.npmjs.com/package/mammoth
and xlsx to parse .xlsx files https://github.com/SheetJS/js-xlsx
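
As a rough sketch of the two packages named above, reading a .docx into HTML and an .xlsx into row objects might look like this. It assumes `npm install mammoth xlsx` has been run; the function names docxToHtml and xlsxToRows are made up for illustration. Converting to HTML is one way to keep basic formatting such as bold and italic before storing the content in a database.

```javascript
// Sketch only. Assumes `npm install mammoth xlsx`.
// The packages are loaded inside each function, so this file can still be
// required on a machine where they are not installed yet.

// Convert a .docx file on disk to HTML.
// For an in-memory upload, mammoth also accepts { buffer: someBuffer }.
async function docxToHtml(docxPath) {
  const mammoth = require('mammoth');
  const result = await mammoth.convertToHtml({ path: docxPath });
  // result.value is the HTML string; result.messages lists conversion warnings.
  return { html: result.value, warnings: result.messages };
}

// Read the first worksheet of an .xlsx file as an array of row objects,
// keyed by the values in the header row.
function xlsxToRows(xlsxPath) {
  const XLSX = require('xlsx');
  const workbook = XLSX.readFile(xlsxPath);
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_json(firstSheet);
}

module.exports = { docxToHtml, xlsxToRows };
```

Note that mammoth targets .docx only; the older binary .doc format would need a different tool.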

Related

How to detect the file type if it's an excel file in JavaScript on the web?

I am using the FileReader API to read files in the browser, and I want to render something on the screen conditionally, based on the type of file the user uploads:
JSON.
CSV.
Excel.
How can I detect whether the uploaded file is an Excel file? Do you use libraries for that, or is there a clever way to detect it without trying every possibility?

Running out of space in comments, so I'll add an answer. Files will usually have an extension, on both Windows and Linux, but if you don't want to rely on that you can just try parsing the file in each of the formats to find the one that succeeds.
JSON: use JSON.parse().
Excel: see this question for .xlsx or this one for .xls, or use one of the many npm packages.
CSV: the format is pretty simple, but there are gotchas for commas in the values. Some of the libraries are pretty efficient at streaming large files, so it is probably worth using an npm package again.
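
A minimal sketch of the "try each format" idea follows. It adds one shortcut that is not part of the answer above: before attempting to parse, it checks the first bytes of the file, because .xlsx files are ZIP archives (they start with "PK") and legacy .xls files are OLE compound files with a fixed 8-byte signature. The function name guessFileKind and the returned labels are made up for illustration, and a signature match only narrows things down, since other ZIP-based or OLE-based formats (such as .docx or .doc) share the same first bytes.

```javascript
// First bytes of a ZIP archive (.xlsx, but also .docx, .zip, ...).
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];
// First bytes of an OLE compound file (.xls, but also .doc, .ppt, ...).
const OLE_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True if `bytes` begins with every byte in `magic`.
function startsWith(bytes, magic) {
  return magic.every((value, index) => bytes[index] === value);
}

// `bytes` is a Uint8Array holding the file contents.
function guessFileKind(bytes) {
  if (startsWith(bytes, ZIP_MAGIC)) return 'xlsx-or-other-zip';
  if (startsWith(bytes, OLE_MAGIC)) return 'xls-or-other-ole';

  const text = new TextDecoder('utf-8').decode(bytes);
  try {
    const parsed = JSON.parse(text);
    // A bare number or string is valid JSON too, so require an object or array.
    if (parsed !== null && typeof parsed === 'object') return 'json';
  } catch (err) {
    // Not JSON; fall through to the CSV/plain-text guess.
  }
  return 'csv-or-text';
}
```

In the browser, the bytes could come from `new Uint8Array(await file.arrayBuffer())`, where `file` is the File object taken from the upload input.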

Import data from CSV to a table in Oracle with NodeJs

I have a question.
I would like to know if it is possible to insert data (100K records) that I have in a CSV file directly into a table in Oracle, using NodeJs.
I have looked into several approaches but have not found a solution other than external tables, and the problem there is that I must save the CSV file in a specific directory.

I don't know NodeJs, sorry.
But, from Oracle's side of the story, you could use an external table (as you've already mentioned). If that "specific directory" is located on the database server, great! You'd have to create a directory (an Oracle object) that points to that filesystem directory, and grant read (and possibly write) privileges on it to the user who will be using it.
If it isn't located on the server, you'd still be able to do it by using a UNC (Universal Naming Convention) path.
Another option is SQL*Loader, a command-line tool that can be used on your local PC (you don't need access to any "directory" related to the database server). You do have to have SQL*Loader installed, of course. It comes with every Oracle database; if you don't have it on that PC, you can install Oracle Client.

There are various Node.js modules to read CSV files. Use one, and then when you have the data in Node.js, you can use node-oracledb's executeMany() to insert multiple rows at once into the DB; see "Batch Statement Execution and Bulk Loading" in the node-oracledb documentation.
However, I would probably go with the earlier solution and use SQL*Loader; see https://blogs.oracle.com/opal/oracle-instant-client-122-now-has-sqlloader-and-data-pump
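
For the executeMany() route, a hedged sketch might look like the following. The table my_table, its columns id and name, and the helper names are hypothetical. The CSV parsing here is deliberately naive (it does not handle quoted fields or commas inside values), so a real CSV module should replace it. The function takes an already-open node-oracledb connection as a parameter.

```javascript
// Naive CSV parsing: one record per line, fields split on commas.
// Does NOT handle quoted fields; use a CSV package for real data.
function parseSimpleCsv(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split(','));
}

// Split rows into batches so 100K rows are not sent in a single call.
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// `connection` is a node-oracledb connection, e.g. obtained with
//   const oracledb = require('oracledb');
//   const connection = await oracledb.getConnection({ user, password, connectString });
// my_table(id, name) is a hypothetical target table.
async function insertCsv(connection, csvText, batchSize = 10000) {
  const sql = 'INSERT INTO my_table (id, name) VALUES (:1, :2)';
  let inserted = 0;
  for (const batch of chunk(parseSimpleCsv(csvText), batchSize)) {
    // Each element of `batch` is one row of positional bind values.
    const result = await connection.executeMany(sql, batch, { autoCommit: true });
    inserted += result.rowsAffected;
  }
  return inserted;
}
```

The values are bound as strings here; depending on the column types, converting them first (for example with Number()) may be preferable.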

Can NODE.JS library be accessed from the root directory of a website?

So, I'm making a quiz, and I've been wanting to save my answers to a text file. I want to use "Node.js", and I'm worried about this: They only offer an installer to install Node.js on your computer. Since I'm not working with servers or anything like that, and I'm just a hobbyist, the people I might first give this to may not have Node.js installed on their computer. Please do note that this is for a website, not a program.
Is something like this possible without a hosting service or a server?
const lib = require('./libraries/NODE')
If it is, how would I do it?
Thanks for any help!

There are two options for you:
I would suggest using something like Electron, which wraps the Node runtime for you and which you can distribute to people: https://www.electronjs.org/docs/tutorial/first-app. This will open up all the Node.js functionality and more for you.
Another answer on SO, though old, suggests using window.name instead of writing out text files: Javascript/HTML Storage Options Under File Protocol (file://)
You cannot import the Node.js runtime into a browser running on the file protocol.
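
Where a Node runtime is available (for example inside an Electron app, as suggested above), saving quiz answers to a text file only needs the built-in fs module. The following is a minimal sketch; the function names, the file path, and the "question: answer" line format are assumptions made for illustration.

```javascript
const fs = require('fs');

// Append one answer as a line of text, creating the file if needed.
function saveAnswer(filePath, question, answer) {
  fs.appendFileSync(filePath, `${question}: ${answer}\n`, 'utf8');
}

// Read all saved answers back as an array of lines.
function loadAnswers(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line !== '');
}
```

This code will not run in a plain web page opened from file://, which is the limitation described above.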

Is there an npm module to modify a pdf file in node.js?

I'm building a Node.js app on Bluemix that should accept a PDF file in a request and then grey out (blank) part of it. The PDF file is the same for everyone, and the area we need to blank out is fixed. Can anybody suggest an npm module that can do this?

Yes, I'd guess the most commonly used library is pdf-lib. Take a look at its official page.
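
A minimal sketch of covering a fixed area with pdf-lib, assuming `npm install pdf-lib`. The function name greyOutRegion and the coordinates in FIXED_REGION are made up for illustration. Note that drawing a rectangle only hides the area visually; the underlying text stays in the file, so this is not a secure redaction.

```javascript
// Sketch only. Assumes `npm install pdf-lib`.
// The package is loaded inside the function, so this file can still be
// required on a machine where it is not installed yet.
async function greyOutRegion(inputPath, outputPath, region) {
  const fs = require('fs');
  const { PDFDocument, rgb } = require('pdf-lib');

  const pdfDoc = await PDFDocument.load(fs.readFileSync(inputPath));
  const page = pdfDoc.getPages()[region.pageIndex];

  // pdf-lib measures from the bottom-left corner of the page, in points.
  page.drawRectangle({
    x: region.x,
    y: region.y,
    width: region.width,
    height: region.height,
    color: rgb(0.5, 0.5, 0.5), // mid grey fill
  });

  // save() resolves to a Uint8Array holding the modified PDF.
  fs.writeFileSync(outputPath, await pdfDoc.save());
}

// Hypothetical fixed area: first page, near the top-left of a Letter page.
const FIXED_REGION = { pageIndex: 0, x: 50, y: 700, width: 200, height: 40 };
```

Since the area is the same for every file, the region can live in a constant like FIXED_REGION and be reused for each request.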

I suggest you try HummusPDF. Specifically, take a look at the Hummus - Modification page, which explains how to edit existing PDF documents. In your case you could try the feature that lets you draw shapes.

Please try the Aspose.PDF Cloud SDK for Node.js, available on GitHub and npm. It provides API methods for a wide range of document processing operations, including creation, manipulation, conversion, and rendering of PDF documents in the cloud. You can use a Redaction Annotation to grey out the required PDF area.
P.S. I work with Aspose as a Support Developer.

how to embed a github git README file on a website

I am making a website for someone and they want to be able to have that site fetch the github readme markdown file at a certain URL and display it on the website, so that instead of having to write the readme in two places, it just pulls from github. Is that possible? How do I do it? I saw this:
https://github.com/coreyti/showdown
which turns markdown into html, but I'm still not sure how I would fetch the readme URL and convert it into an object that showdown could parse.
Any ideas would be greatly appreciated.

You can use StackEdit. It allows you to publish your Markdown document on GitHub and in other locations at the same time, in Markdown or HTML format. For instance, you could publish the HTML to a public Google Drive or Dropbox location.
NOTE: I'm the developer of StackEdit

GitHub has an option to show a file's source: the Raw button in the top right corner. The raw link for your example is: https://raw.github.com/coreyti/showdown/master/README.md
Assuming the README file is already formatted in Markdown, you could just fetch the source and format it on your side; libraries for that most probably already exist for your language.
UPDATE
I wouldn't actually download the file from GitHub every time a page on your site is requested. GitHub may be down or the connection may be slow, and this will affect visitors to your site. Instead, you may want to have a cron job running on the server that downloads the file from GitHub, say, every five minutes, and caches it locally. Then, every time you need to display the file, you read the local copy and don't depend on the GitHub server being accessible. As a drawback, you will have a certain synchronization delay (5 minutes in my example).
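
The caching idea can be sketched in Node.js as follows. This is an illustration under assumptions: it needs Node 18 or later for the built-in fetch, the function names and the cache path are made up, and it uses a timer inside the Node process in place of a system cron job. Rendering the Markdown to HTML (for example with showdown) is left out.

```javascript
const fs = require('fs');

// Build the raw-content URL for a repository's README.md.
function rawReadmeUrl(owner, repo, branch) {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/README.md`;
}

// Download the README and overwrite the local cached copy.
// If the download fails, the previous copy is left untouched.
async function refreshReadme(owner, repo, branch, cachePath) {
  const response = await fetch(rawReadmeUrl(owner, repo, branch));
  if (!response.ok) {
    throw new Error(`GitHub responded with status ${response.status}`);
  }
  fs.writeFileSync(cachePath, await response.text(), 'utf8');
}

// Page requests only ever read the local copy, never GitHub itself.
function readCachedReadme(cachePath) {
  return fs.readFileSync(cachePath, 'utf8');
}

// Example wiring (not executed here): refresh every five minutes.
// setInterval(() => {
//   refreshReadme('coreyti', 'showdown', 'master', './README.cache.md')
//     .catch((err) => console.error('README refresh failed:', err.message));
// }, 5 * 60 * 1000);
```

Because a failed refresh only logs an error, visitors keep seeing the last good copy while GitHub is unreachable.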

Heroku seems to do it by copying the rendered HTML: https://elements.heroku.com/buildpacks/stouffi/heroku-i18n-js-buildpack-ruby#buildpack-instructions

I’ve written riss.awk to insert a README.md on your website, optionally performing some transformations in the process.
Regarding "be able to have that site fetch the github readme": you can automatically fetch the original README.md file like this:
curl https://raw.githubusercontent.com/cljoly/readme-in-static-site/main/README.md | awk -f riss.awk >readme-in-static-site.md
Regarding "convert it into an object that showdown could parse": I couldn’t find showdown’s documentation, but there might be some kind of API you could use to upload the file generated above (readme-in-static-site.md). A cronjob-like task could then run this process every few hours to keep things up to date.
