Get file(word, excel, ppt) metadata information in nodejs - javascript

I would like to get file information, at least the number of pages, from Node.js on the client side (React). I was able to do this for PDF files using PDF.js. Could someone point out how it can be done for other file types like Word, XLS and PPT? If there are external APIs that provide this service, pointing those out would be helpful too.

For getting the page count of .docx and .pdf files you can use https://www.npmjs.com/package/docx-pdf-pagecount:
const getPageCount = require('docx-pdf-pagecount');

getPageCount('E:/sample/document/aa/test.docx')
  .then(pages => {
    console.log(pages);
  })
  .catch((err) => {
    console.log(err);
  });

getPageCount('E:/sample/document/vb.pdf')
  .then(pages => {
    console.log(pages);
  })
  .catch((err) => {
    console.log(err);
  });

You can use XLSX to parse spreadsheet-like files. XLSX can parse the files and return all of their info (see the sketch below).
But you can only retrieve the meta info after you have parsed those files with XLSX. That means, no matter what, you have to parse them. If your files are big, this can be a performance issue for client browsers when you do it on the client side.
Update:
A hint: you can find tools to detect the file type of the files and hand them to the corresponding parser to get the meta info.
For now, there is no such library implemented natively in JavaScript. If you are fine with non-pure Node modules, take a look at textract and see how it works.
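As a rough sketch of the XLSX route mentioned above, assuming the SheetJS xlsx package and a file chosen through an <input type="file"> in the React app, parsing in the browser could look like this:

import * as XLSX from 'xlsx';

// Handler for the file input's onChange event: parses the workbook in the
// browser and logs basic metadata. Note that the entire file is parsed.
async function handleSpreadsheet(event) {
  const file = event.target.files[0];
  const buffer = await file.arrayBuffer();
  const workbook = XLSX.read(buffer, { type: 'array' });

  console.log('Sheets:', workbook.SheetNames.length); // number of sheets
  console.log('Properties:', workbook.Props);         // title, author, dates, ...
}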

Related

Create and download (html/css/js) files client-side

I've built a code playground (similar to Liveweave) using React. Users can save their "code playgrounds" and access them later (using the Firebase DB). These "code playgrounds" are just made of HTML, CSS, and JS. I'm trying to add a feature that allows the user to download a playground; the idea is that it will generate 3 separate files (one for each language in the playground).
1) Is there a way to generate (HTML, CSS and JS) files and populate them with content client-side?
2) If so, would there be any chance to group those files inside a .rar, also client-side?
3) If generating these files client-side is not the optimal solution/not possible, how would you approach this problem?
I was thinking maybe of an Express server that queries the data from the DB and then responds with those files, but I would like to try a client-side solution.
I finally decided to use the FileSaver.js package:
import { saveAs } from "file-saver";

const saveFiles = () => {
  var blob = new Blob([makeHtml(getDocumentCode())], {
    type: "text/plain;charset=utf-8",
  });
  saveAs(blob, `${getFileName()}.html`);
};
First, you need to make a blob out of the content of the file; in this case it was HTML + CSS + JS code, and the makeHtml function handles how the HTML document is constructed. Then just pass that blob to the saveAs function along with the name and extension for the file. You can then call saveFiles in response to any event, like onClick:
<button onClick={saveFiles}>Download</button>
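Regarding question 2) above, producing a .rar in the browser is not really practical, but a .zip is. A minimal sketch, assuming the JSZip package and hypothetical makeCss/makeJs helpers alongside makeHtml:

import JSZip from "jszip";
import { saveAs } from "file-saver";

const saveFilesAsZip = async () => {
  const zip = new JSZip();
  // makeHtml/makeCss/makeJs are assumed helpers returning the code strings
  zip.file("index.html", makeHtml(getDocumentCode()));
  zip.file("styles.css", makeCss(getDocumentCode()));
  zip.file("script.js", makeJs(getDocumentCode()));

  // Generate a single zip blob and trigger one download for all three files
  const blob = await zip.generateAsync({ type: "blob" });
  saveAs(blob, `${getFileName()}.zip`);
};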

How do I let user save a file and keep editing that file in browser Javascript only?

I believe it would not be possible due to security reasons, as stated in many other articles on Stack Overflow. However, when I use the diagrams app at https://app.diagrams.net/ I realized it could ask me to save a file and somehow keep that file reference, and whenever I click Save in the app, my local file on the hard drive changes (no new download).
I know it's only possible to upload/download a file and believe you cannot edit it in place (using Blob with FileReader etc.). How do they achieve that? The app is open source, but unfortunately, plowing through the source code of their file handler I still cannot find out what API they are using. I don't remember installing any plugin or app in my browser.
I also notice there is this permission in my browser, so I guess it's some standard API, but even using that as a keyword, everything leads back to Stack Overflow articles saying it's not possible.
Is it a new API I am not aware of? What am I looking for?
You can use localStorage to achieve this without needing any other permission from the user.
localStorage.setItem("data", JSON.stringify(data));
If your data is just JSON then this would work, however if you have custom data types, you can take a look here.
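For completeness, reading the data back is the mirror image (a minimal sketch):

// Returns null if nothing has been stored under "data" yet
const restored = JSON.parse(localStorage.getItem("data"));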
Edit:
Since you wanted to save the file directly to the device and edit it, you can take a look at File System Access API. This article here explains it.
You can load the file first by using,
let fileHandle;

butOpenFile.addEventListener('click', async () => {
  [fileHandle] = await window.showOpenFilePicker();
  const file = await fileHandle.getFile();
  const contents = await file.text();
  textArea.value = contents;
});
Once you have the file handle, you should be able to write to the file without a new download being requested every time there is a change.
async function writeFile(fileHandle, contents) {
  // Create a FileSystemWritableFileStream to write to.
  const writable = await fileHandle.createWritable();
  // Write the contents of the file to the stream.
  await writable.write(contents);
  // Close the file and write the contents to disk.
  await writable.close();
}
The code is from the article I linked above, and the article explains everything much more clearly. It's worth reading.
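To complete the picture from the question (saving back to the same file with no new download), here is a minimal sketch that reuses the handle obtained above, assuming a hypothetical butSaveFile button next to the article's textArea:

butSaveFile.addEventListener('click', async () => {
  if (!fileHandle) return; // nothing has been opened yet
  // Overwrites the originally opened file in place, no new download
  await writeFile(fileHandle, textArea.value);
});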

Moving Google STT from Cloud Functions to dedicated GAE

I'm using Cloud Functions to convert audio/mp4 recorded with getUserMedia() and placed in a Storage bucket to audio/x-flac using ffmpeg, so that it can be transcribed with Google STT:
bucket
  .file(file.name)
  .download({ destination })
  .then(() =>
    ffmpeg(destination)
      .setFfmpegPath(ffmpeg_static.path)
      .audioChannels(1)
      .audioFrequency(16000)
      .format('flac')
      .on('error', console.log)
      .on('end', () =>
        bucket
          .upload(targetTempFilePath, { destination: targetStorageFilePath })
          .then(() => {
            fs.unlinkSync(destination);
            fs.unlinkSync(targetTempFilePath);
          })
      )
      .save(targetTempFilePath)
  );
Workflow: client-side MP4 => Storage bucket trigger => STT => Firestore
It works great and I get clean FLAC files and STT works flawlessly in this combination!
But only if the input files are not larger than 1-2 MB each (usually I have a series of 5-10 files coming in at once).
I'm aware of the 10 MB limit, and now I want to let Cloud Functions handle image processing only and move all the audio work to a dedicated GAE or GCE instance.
What's better to use in this case: GAE or GCE, dockerized or native, Python or Node, etc.?
How exactly could the workflow be triggered on that instance after placing files in Storage?
Any thoughts or ideas would be greatly welcomed!
I would recommend using a Cloud Function with a Cloud Storage trigger.
In this way, you will be able to get the name of the file uploaded to your specific bucket.
You can check this documentation about Google Cloud Storage Triggers to see some examples.
If you use Python, you can see the file name by using:
print('File: {}'.format(data['name']))
Once you have the name of the file, you can make a request to GAE to convert the audio.
I also found this post that explains how to call a URL hosted in Google App Engine, and I think it might be useful for you.
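If you stay on Node instead of Python, a rough sketch of such a trigger that forwards the file reference to an App Engine endpoint might look like the following; the URL, route and payload shape are assumptions, and the global fetch assumes a Node 18+ runtime:

// 1st-gen background function fired by google.storage.object.finalize
exports.onAudioUploaded = async (data, context) => {
  const fileName = data.name;     // same field as data['name'] in Python
  const bucketName = data.bucket; // bucket that fired the trigger

  // Hand the heavy ffmpeg/STT work off to the App Engine service
  await fetch('https://YOUR_PROJECT.appspot.com/transcode', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ bucket: bucketName, file: fileName }),
  });
};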
Hope this helps!

problem in accessing contents of the file using javascript with tomcat

The following code is inside a <script> tag of my application's index.html, and it works fine (the code is inside a function that gets called when a user clicks a download button in the UI; the files are already present on the server at the location mentioned below) when I access my app via the web browser in the following manner:
file:///C:/jack/testing/ui/static/index.html
By "works", I mean I can see the 3 files getting downloaded in CSV format, with a size of 3 KB.
const requests = [
  'file1_1555077233.csv',
  'file2_1555077233.csv',
  'file3_1555077233.csv'
].map(file => {
  return fetch('file:///C:/jack/file/JACK/' + file)
    .then(response => response.text())
    .catch(console.error)
})

Promise.all(requests)
  .then(contents => {
    const zip = new JSZip()
    contents.forEach((content, index) => {
      zip.file(`file-${index}.csv`, content)
    })
    return zip.generateAsync({ type: 'blob' })
  })
  .then(blob => {
    saveAs(blob, 'files.zip')
  })
Scenario 1:
However, when I have my application deployed as a WAR file in Tomcat, I access it from the web browser in the following manner:
http://localhost:8080/testing/index.html
In this scenario, the above JavaScript doesn't completely work. It downloads the 3 files, but they are empty. This makes sense, since I am accessing the website over HTTP and my files should be served by the server.
On Windows, how should I modify this line of code, return fetch('file:///C:/jack/file/JACK/' + file), so that it starts working in this scenario?
Scenario 2:
Similarly, when I deployed the WAR to Tomcat running on an RHEL server, I accessed my application in the following manner:
https://myserver.com/testing/index/html
In this scenario also, I am noticing that empty files get downloaded when I use the above code with this change:
return fetch('/srv/users/JACK' + file)
Scenario #1 is just used for my local testing, and it's okay if it doesn't work, because I understand that I may not be able to access the files from my local directory. However, in Scenario 2, on the RHEL server, I am trying to access the files from the /srv/users/JACK directory, which isn't local or specific to any user. Why am I still seeing 3 empty CSV files getting downloaded in this scenario? Please advise on the best course of action to overcome this.
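For reference, the direction the question itself points at ("my files should be served by the server") would look roughly like this; the /testing/files/ path is purely an assumption and would have to be exposed by Tomcat first, for example via a servlet or a static resource mapping:

// Hypothetical: Tomcat serves the CSV directory under the web app's
// /testing/files/ path, so the browser fetches an HTTP URL instead of a
// filesystem path.
return fetch('/testing/files/' + file)
  .then(response => {
    if (!response.ok) throw new Error('HTTP ' + response.status);
    return response.text();
  })
  .catch(console.error)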

Meteor reading csv file Papa Parse

Guys, I am new to Meteor. For my present application I am using OpenLayers, so I call the Template.map.onRendered event, which loads a map with an overlay that shows a marker on the map; when we click on this marker an event is generated and a popup is shown. The data shown in the popup is hardcoded at present, but I want to read it from a .csv file stored on the server.
I checked online, and people suggested using Papa Parse with this code:
Papa.parse("http://example.com/file.csv", {
download: true,
complete: function(results) {
console.log(results);
}
});
My problems are:
1) I don't understand the code and how to use it to solve my problem.
2) Is doing it like this safe in terms of browser compatibility?
3) In which folder should I save this .csv file? On the internet it says the private folder.
Sorry, I can't share the code using JSFiddle as it is private code and I am not allowed to share it.
PapaParse is mainly for client-side usage. It also has a Meteor wrapper, harrison:papa-parse, so you can try installing that as well:
meteor add harrison:papa-parse
Parse file via URL:
To parse a CSV file that is hosted at a URL, you can try using the Parse Remote File option:
Papa.parse(url, {
  download: true,
  // rest of config ...
})
Parse file via CSV String:
Alternatively, you can store the file in the /private folder, which is a good option in Meteor to keep the file secure.
You can then read it with the Assets.getText() method, which returns a UTF-8 encoded string; note that the path you pass is relative to the private directory.
You can feed that string to the PapaParse string function inside a Meteor method, which you can call from the client using Meteor.call():
Meteor.methods({
  'parseFile'() {
    // Read the private asset as text; the path is relative to /private
    const csvString = Assets.getText('file.csv');
    // 'csvString' is a UTF-8 string; parse it with the PapaParse
    // string function and return the result to the caller
    return Papa.parse(csvString);
  }
});
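On the client, calling the method could then look like this (a minimal sketch):

// Ask the server to read and parse the CSV, then log the parsed rows
Meteor.call('parseFile', (error, result) => {
  if (error) {
    return console.log(error);
  }
  console.log(result.data); // Papa.parse returns { data, errors, meta }
});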
See if that works, read the documentation for more details.
If it doesn't work, check whether you can read the source file at all on the server side; often it turns out to be a relative path error.
Alternative: BabyParse
You can also try the PapaParse Node.js fork called BabyParse, available on npm. However, it cannot read files, only strings, so you will first have to read the CSV file via Assets.getText() to get a CSV string, which you can then feed into BabyParse to get your result.
