Issues with synchronously downloading a file in Node - javascript

I have tried several ways to download a file and write it synchronously in my local application (I am currently using a module called download-file-sync), but the file written with writeFileSync comes out wrong.
Here is my code:
var fs = require('fs');
var downloadFileSync = require('download-file-sync');
fs.writeFileSync("twc.mp4", downloadFileSync(sourceURLEncoded));
Now this technically writes something, and opening the file in Notepad++ shows that at least the start of the file is identical to the same file downloaded via Chrome, with the same number of lines. However, the file size is roughly doubled.
The Node download will not play, while the Chrome download does.
How can I achieve a successful synchronous file download in Node?

The reason is that download-file-sync calls curl and decodes the result as a UTF-8 string, whereas you want the raw bytes.
Wherever the data isn't valid UTF-8, the offending bytes are replaced during decoding, and the replacement characters take up more bytes when the string is written back out, so the file ends up with a different size and content than the original binary.
To fix this, replace the module with the code it uses, in a new function that leaves the encoding at its default so that a Buffer is returned:
function downloadFileSync(url) {
  return require('child_process')
    .execFileSync('curl', ['--silent', '-L', url]); // remove options {encoding: 'utf8'}
}
And try with that instead.
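To see why the size changes, here is a minimal sketch of the round trip through a UTF-8 string. The four sample bytes are made up for illustration and simply stand in for the real video data:

```javascript
// Made-up sample bytes standing in for binary file content.
// 0xff and 0xfe can never occur in valid UTF-8.
const original = Buffer.from([0x00, 0xff, 0xfe, 0x41]);

// Decoding as UTF-8 turns each invalid byte into U+FFFD (the replacement character).
const asString = original.toString('utf8');

// Writing the string back out encodes each U+FFFD as three bytes (ef bf bd).
const roundTripped = Buffer.from(asString, 'utf8');

console.log(original.length, roundTripped.length);
console.log(original.equals(roundTripped));
```

Keeping the data in a Buffer from start to finish avoids this round trip. For large downloads, the maxBuffer option of execFileSync may also need to be raised.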

Related

Creating an Image File using node.js buffer from input text

I have a requirement in which I need to convert an input text to a png/jpeg file and then convert to base64 string and send as an input to an API.
I cannot use node.js fs module since I cannot physically create files.
So I was trying to use the Node.js Buffer module to achieve the same.
But the issue I'm facing is that I cannot add an extension to it (I don't know if there is any such option).
Is there any other way of doing so?
Below is the code I tried...
function textToFileBase64(str) {
  var buf = Buffer.from(str, 'utf-8');
  return buf.toString('base64');
}
The only problem with the above code is that it creates a file without extension and even if I need the file as abc.png, it says the file is damaged when I open it.

Using Javascript, how can I convert a Google doc to a PDF file then "slice" it into multiple PDFs?

I would like to replicate VBA code that I wrote for Microsoft Word docs using Google docs and Javascript instead. It would be used in an Angular app that I'm writing.
The VBA code searches for "slice tags" in the document and generates PDF files when it finds them. For example, if a document consists of 15 pages and there are slice tags on pages 4, 7, and 9, the code generates four files: Page1-4.pdf, Page5-7.pdf, Page8-9.pdf, and Page10-15.pdf.
In the Javascript implementation, the steps would be:
1. Convert the Google doc to a PDF file.
2. "Slice" the PDF file into multiple PDFs by searching the complete PDF for slice tags.
3. Place the sliced PDFs in a specified Google Drive folder.
Note that I asked a similar question here.
However, that's a Google Script implementation. What I realized is that in Google Scripts there is no way to address a document on page boundaries. It would be much simpler and cleaner if it did. Instead, I would like to take another approach.
Below is Javascript code that partially addresses step 1. It uses this call:
gapi.client.drive.files.export
However, this does NOT create a PDF file. Rather, it creates a blob of PDF content. Apparently, the expectation is that you transform the blob into an actual file but the Google APIs do not appear to provide a way to do this.
async function exportDocument(id, mimeType) {
  let rtrn;
  try {
    await gapi.client.drive.files.export({
      'fileId': id,
      'mimeType': mimeType,
      'fields': 'webViewLink',
    })
      .then(function(resp) {
        // This returns a content blob NOT an actual file
        rtrn = resp;
        // TODO: How to transform the blob into an actual file (e.g., PDF)
      }).catch((err) => {
        throw new Error(err.message);
      });
  } catch (e) {
    throw new Error('exportDocument: ' + e.message);
  }
  return rtrn;
} // end exportDocument
A Blob is a binary representation of the raw data.
Typically, when working with PDFs, the Blob is turned into a File (File is a Blob),
and saved to the local filesystem. For an example, see How to convert Blob to File in JavaScript
There is also a 'file-saver' npm library and, I believe, an Angular/TypeScript version as well, if you don't want to do it yourself.
I haven't done it, but I imagine that if the library that you want to use to manipulate the PDF expects a PDF file, you could probably convert the Blob to a PDF without writing it to the filesystem, and then work from there.
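As a rough sketch of that in-memory route: the helpers below are hypothetical, and they assume the export response carries the PDF bytes as a binary string (one character per byte) in something like resp.body. That is an assumption about the response shape and should be checked against what your call actually returns.

```javascript
// Hypothetical helper: turn a binary string (one char per byte) into a PDF Blob.
// Assumes `body` is the raw PDF content, e.g. taken from the export response.
function binaryStringToPdfBlob(body) {
  const bytes = new Uint8Array(body.length);
  for (let i = 0; i < body.length; i++) {
    bytes[i] = body.charCodeAt(i) & 0xff; // keep only the low byte of each char
  }
  return new Blob([bytes], { type: 'application/pdf' });
}

// Hypothetical helper: wrap a Blob in a named File without touching the filesystem.
function blobToFile(blob, fileName) {
  return new File([blob], fileName, { type: blob.type });
}
```

The resulting File (or the Blob itself) can then be handed to a PDF library or to a saver such as file-saver, depending on what that library accepts.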
BTW, your question has one or more 'close' votes, because you asked a very broad question (i.e. "how do I write my entire application") instead of asking the targeted question that you really needed answered.

Can dropbox-js readFile only get contents of a file of type "text/plain"?

I am using the dropbox-js API as a back-end to an application I am creating.
I need to get the contents of a file and I understand that the method "readFile" that is used to get the contents only really supports text files.
I can get the contents of a text file of type "text/plain" i.e. .txt files, using the following:
client.readFile(d2.path, {arrayBuffer: true}, function(error, contents){
  var decoded = decodeUtf8(contents);
  console.log(decoded);
});
The API reference for this method is here: http://coffeedoc.info/github/dropbox/dropbox-js/master/classes/Dropbox/Client.html#readFile-instance
The decode function was found here: https://gist.github.com/boushley/5471599
This does not seem to work for any other document type file. If I try and read a .docx / .doc file the result consists of what looks like scrambled characters. Should it be able to work with other document type files? How would I read it differently?
I really need it to support more than .txt files.
Edit:
This is a test document (.docx) that I tried to read:
This is how it is decoded (Contents shows that it is indeed an arrayBuffer, while Decoded is the actual string that is returned after decoding):
readFile should work for any content type. Presumably the "scrambled characters" you see are exactly the content of the .docx or .doc file you're reading. (If you looked at the file via type on Windows or cat on Mac/Linux, you would see the same thing.)
So I think the issue you're having is that you want to somehow extract the text from a variety of file formats. Dropbox (and dropbox.js) won't help you with that particular problem... you'll need to find software that understands all those file formats and can convert them to the form you need. For example, textract is a Python library that can do this.
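One way to convince yourself that the "scrambled characters" are real file content rather than a decoding bug is to look at the first bytes of the ArrayBuffer. A .docx file is a ZIP archive, so it should begin with the ZIP signature; an old-style .doc file uses a different container and will not. The helper name below is hypothetical, a minimal sketch only:

```javascript
// Hypothetical helper: returns true if the buffer starts with the
// ZIP local-file signature "PK\x03\x04", as a .docx file should.
function looksLikeZip(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer);
  return bytes.length >= 4 &&
    bytes[0] === 0x50 && bytes[1] === 0x4b &&
    bytes[2] === 0x03 && bytes[3] === 0x04;
}
```

Inside the readFile callback this could be called as looksLikeZip(contents) before deciding whether to decode the data as text at all.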

Generate a Word document in JavaScript with Docx.js?

I am trying to use docx.js to generate a Word document but I can't seem to get it to work.
I copied the raw code into the Google Chrome console after amending line 247 to fix a "'textAlign' undefined" error:
if (inNode.style && inNode.style.textAlign){..}
This makes the function convertContent available, the result of which is an Object, e.g.
JSON.stringify( convertContent($('<p>Word!</p>')[0]) )
Results in -
"{"string":
"<w:body>
<w:p>
<w:r>
<w:t xml:space=\"preserve\">Word!</w:t>
</w:r>
</w:p>
</w:body>"
,"charSpaceCount":5
,"charCount":5,
"pCount":1}"
I copied
<w:body>
<w:p>
<w:r>
<w:t xml:space="preserve">Word!</w:t>
</w:r>
</w:p>
</w:body>
into Notepad++ and saved it as a file with an extension of 'docx', but when I open it in MS Word it says 'cannot be opened because there is a problem with the contents'.
Am I missing some attribute or XML tags or something?
You can generate a Docx Document from a template using docxtemplater (library I have created).
It can replace tags by their values (like a template engine), and also replace images in a paid version.
Here is a demo of the templating engine: https://docxtemplater.com/demo/
This code can't work on a JSFiddle because of the ajaxCalls to local files (everything that is in the blankfolder), or you should enter all files in ByteArray format and use the jsFiddle echo API: http://doc.jsfiddle.net/use/echo.html
I know this is an older question and you already have an answer, but I struggled getting this to work for a day, so I thought I'd share my results.
Like you, I had to fix the textAlign bug by changing the line to this:
if (inNode.style && inNode.style.textAlign)
Also, it didn't handle HTML comments. So, I had to add the following line above the check for a "#text" node in the for loop:
if (inNodeChild.nodeName === '#comment') continue;
To create the docx was tricky since there is absolutely no documentation on this thing as of yet. But looking through the code, I see that it is expecting the HTML to be in a File object. For my purposes, I wanted to use the HTML I rendered, not some HTML file the user has to select to upload. So I had to trick it by making my own object with the same property that it was looking for and pass it in. To save it to the client, I use FileSaver.js, which requires a blob. I included this function that converts base64 into a blob. So my code to implement it is this:
var result = docx({ DOM: $('#myDiv')[0] });
var blob = b64toBlob(result.base64, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
saveAs(blob, "test.docx");
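The b64toBlob helper is not shown above. A minimal sketch of what such a function could look like follows; it is a hypothetical stand-in, not the exact function used in this answer:

```javascript
// Hypothetical stand-in for the b64toBlob helper mentioned above.
function b64toBlob(b64Data, contentType) {
  const binary = atob(b64Data); // decode base64 to one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: contentType });
}
```

Copying the decoded characters into a Uint8Array before building the Blob keeps the bytes intact, instead of letting them be re-encoded as text.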
In the end, this would work for simple Word documents, but it isn't nearly sophisticated enough for anything more. I couldn't get any of my styles to render and I didn't even attempt to get images working. I've since abandoned this approach and am now researching DocxgenJS or some server-side solution.
You may find this link useful,
http://evidenceprime.github.io/html-docx-js/
An online demo here:
http://evidenceprime.github.io/html-docx-js/test/sample.html
You are doing the correct thing codewise, but your file is not a valid docx file. If you look through the docx() function in docx.js, you will see that a docx file is actually a zip containing several xml files.
I am using the Open XML SDK for JavaScript:
http://ericwhite.com/blog/open-xml-sdk-for-javascript/
Basically, on the web server, I keep an empty docx file as a template for new documents.
When the user clicks "new docx file" in the browser, I retrieve the empty docx file, convert it to Base64 and return it as the Ajax response.
In the client script, you convert the Base64 string to a byte array and use openxmlsdk.js to load the byte array as a JavaScript OpenXmlPackage object.
Once the package is loaded, you can use the regular OpenXmlPart to create a real document (inserting images, creating tables and rows).
The last step is to stream it out to the end user as a document. This part is security related: in my code I send it back to the web server, where it is saved temporarily, and prepare an HTTP response that notifies the end user to download it.
Check the URL above; there are useful samples of doing this in JavaScript.

Batch file conversion and usage

I'm having a problem with a small batch script that I'm writing. The point of the batch script is to run a Javascript file that converts an XML file to a csv file and then run a Python script which will analyze the just created csv file and create another csv file.
The batch script is below.
start XML-CSV_Converter.js
python CSV_ANALYZER.py
exit
I didn't write the XML-CSV converter; it can be found here: http://gotochriswest.com/blog/2011/05/05/excel-batch-convert-xls-to-csv/
The only thing I changed was that I removed all of the alerts and input prompts so it doesn't wait on any user input. Simply put, all it does is look at every XML file in the current directory and produce a csv file in the same directory.
Whenever I run the batch script, I keep getting an IO error in my Python script because even though it can see the created file, it isn't able to open the file.
The exact error is:
"IOError: [Errno 2] No such file or directory: 'NAME_OF_FILE.csv'"
The part of the Python script which is causing errors is listed below.
dirList = os.listdir("C:\FOLDER")
for fname in dirList:
    if fname.find(".csv") != -1:
        inputFile = open(fname,'r')  # <---- Script halts here
Anybody know what could be causing the file not being opened in the Python script?
If I run the JavaScript file manually, and then the Python script manually, it works perfectly. But when I try to chain them together in a batch file it breaks. I would appreciate any and all ideas!
Thanks in advance!
The start command launches a program (or, in your case, a file) and immediately returns, without waiting for the program to exit. This means that when Python tries to open the file, the JS script is probably still running and has the file open exclusively for writing.
You should run the JS file by directly calling the Windows Scripting Host (cscript.exe). Just replace the start command with:
cscript //nologo XML-CSV_Converter.js
This will ensure that when the Python command is run, the JS script will have completed its job and safely terminated.
You forgot to add the folder name to the file name in the open() call. listdir() returns only the file names, so unless its path argument matches the current directory, you'll have to add it to the returned file names to properly identify them.
As a side note, you should also avoid using backslashes in normal strings. In this case "C:\FOLDER" happens to work, but "C:\folder" wouldn't, because \f would be converted to \x0c (the form feed character). You should use either a raw string (r'C:\FOLDER', which doesn't convert escape sequences), an escaped backslash ('C:\\FOLDER'), or the more portable 'C:/FOLDER' (on Windows '/' is the alternative path separator).
Also, a simpler way to check for the extension is to test whether the name ends with it (the s.find(t) != -1 check is not very Pythonic and also matches ".csv" in the middle of a name). All in all, the modified code could look something like this:
import os

folder = "C:/FOLDER"
for fname in os.listdir(folder):
    if fname.endswith(".csv"):
        # listdir() returns bare file names, so join them with the folder
        inputFile = open(os.path.join(folder, fname), 'r')
The root cause is that "python CSV_ANALYZER.py" gets executed while XML-CSV_Converter.js has not yet finished.
When you type "start XML-CSV_Converter.js", the batch file doesn't wait for this command to finish before executing the next command.
If you have a JS interpreter installed, you can remove the 'start' in the first line, and then it should work.
