invalid local file header signature error in yauzl - javascript

I am working on an unzip feature that needs to read a zip file from a remote source, unzip its contents, and then create those files over the network at the destination. I am using yauzl with the RandomAccessReader method to read the incoming stream and publish events.
const zipQueue = [];
const yauzl = new Yauzl(reader);
const zipfile = yauzl.fromRandomAccessReader(this, this.contentLength, { lazyEntries: true, autoclose: false });

zipfile.on('entry', function (entry) {
  zipQueue.push(entry);
  zipfile.readEntry();
}).on('end', function () {
  for (const entry of zipQueue) {
    if (/\/$/.test(entry.fileName)) {
      this.emit('directory');
    } else {
      const readStream = zipfile.openReadStream(entry);
      this.emit('file');
    }
  }
});
I am getting the zipfile and the entry objects in the proper format, but when I try to open the read stream with zipfile.openReadStream() it fails with invalid local file header signature: 0x8074b50 after about 30 entries. I suspect it has something to do with a race condition. Is there another npm package that offers a similar solution?

In my case, I was getting that error because I was piping from another stream that wasn't done writing, so yauzl was reading an incomplete file.
Listening for that stream's 'end' event and only unzipping afterwards made it work.
stream.on('end', () => {
  // Unzip here
});
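As a rough sketch of that idea (assuming the remote zip arrives on a readable stream I'll call downloadStream, and using yauzl's fromBuffer entry point), you can buffer the whole download first and only then let yauzl parse it, so it never sees a half-written archive:

// Sketch only: downloadStream is a placeholder for whatever readable
// stream delivers the remote zip.
const yauzl = require('yauzl');

const chunks = [];
downloadStream.on('data', chunk => chunks.push(chunk));
downloadStream.on('end', () => {
  // The archive is complete at this point, so the local file headers are valid.
  yauzl.fromBuffer(Buffer.concat(chunks), { lazyEntries: true }, (err, zipfile) => {
    if (err) throw err;
    zipfile.readEntry();
    zipfile.on('entry', entry => {
      if (/\/$/.test(entry.fileName)) {
        // directory entry, nothing to stream
        zipfile.readEntry();
      } else {
        zipfile.openReadStream(entry, (err, readStream) => {
          if (err) throw err;
          readStream.on('end', () => zipfile.readEntry());
          readStream.resume(); // or pipe it wherever the file should go
        });
      }
    });
  });
});

Buffering the whole archive costs memory, but it removes the race between downloading and reading the local file headers.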

Related

How to download a big file directly to the disk, without storing it in RAM of a server and browser?

I want to implement downloading of big files (approx. 10–1024 MB) from the same server (without external cloud file storage, i.e. on-premises) where my app runs, using Node.js and Express.js.
I figured out how to do that by converting the entire file into a Blob, transferring it over the network, and then generating a download link with window.URL.createObjectURL(…) for the Blob. Such an approach works perfectly as long as the files are small; otherwise it is impossible to keep the entire Blob in the RAM of either the server or the client.
I've tried to implement several other approaches with the File API and AJAX, but it looks like Chrome loads the entire file into RAM and only then dumps it to disk. Again, that might be OK for small files, but for big ones it's not an option.
My last attempt was to send a basic GET request:
const aTag = document.createElement("a");
aTag.href = `/downloadDocument?fileUUID=${fileName}`;
aTag.download = fileName;
aTag.click();
On the server side:
app.mjs
app.get("/downloadDocument", async (req, res) => {
req.headers.range = "bytes=0";
const [urlPrefix, fileUUID] = req.url.split("/downloadDocument?fileUUID=");
const downloadResult = await StorageDriver.fileDownload(fileUUID, req, res);
});
StorageDriver.mjs
export const fileDownload = async function fileDownload(fileUUID, req, res) {
  // e.g. C:\Users\User\Projects\POC\assets\wanted_file.pdf
  const assetsPath = _resolveAbsoluteAssetsPath(fileUUID);
  const options = {
    dotfiles: "deny",
    headers: {
      "Content-Disposition": "form-data; name=\"files\"",
      "Content-Type": "application/pdf",
      "x-sent": true,
      "x-timestamp": Date.now()
    }
  };
  res.sendFile(assetsPath, options, (err) => {
    if (err) {
      console.log(err);
    } else {
      console.log("Sent");
    }
  });
};
When I click on the link, Chrome shows the file in Downloads but with a status Failed - No file. No file appears in the download destination.
My questions:
Why do I get Failed - No file when sending a GET request?
As far as I understand, res.sendFile can be the right choice for small files, but for big ones it's better to use res.write, which can be split into chunks. Is it possible to use res.write with a GET request?
P.S. I've reworked this question to make it narrower and clearer. Previously this question was focused on downloading a big file from Dropbox without storing it in RAM; that answer can be found here:
How to download a big file from Dropbox with Node.js?
Chrome can't show nice download progress because the file is downloaded in the background. After the download finishes, a link to the file is created and "clicked" to force Chrome to show the save dialog for the already-downloaded file.
It can be done more simply: issue a plain GET request and let the browser download the file, without AJAX.
app.get("/download", async (req, res, next) => {
const { fileName } = req.query;
const downloadResult = await StorageDriver.fileDownload(fileName);
res.set('Content-Type', 'application/pdf');
res.send(downloadResult.fileBinary);
});
function fileDownload(fileName) {
const a = document.createElement("a");
a.href = `/download?fileName=${fileName}`;
a.download = fileName;
a.click();
}
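Regarding the res.write part of the question: rather than chunking manually, a common pattern is to pipe a read stream into the response, so only a small buffer sits in RAM at any time. The sketch below is my own suggestion, not part of the answer above; it reuses the question's _resolveAbsoluteAssetsPath helper for illustration.

import fs from "fs";

app.get("/download", (req, res) => {
  const { fileName } = req.query;
  const assetsPath = _resolveAbsoluteAssetsPath(fileName);

  res.set({
    "Content-Type": "application/pdf",
    // "attachment" tells the browser to save the response instead of rendering it
    "Content-Disposition": `attachment; filename="${fileName}"`
  });

  const stream = fs.createReadStream(assetsPath);
  stream.on("error", err => {
    console.error(err);
    if (!res.headersSent) res.sendStatus(404);
  });
  // pipe() sends the file in chunks, so the whole file never lives in server RAM
  stream.pipe(res);
});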

What is the best way to keep a file open to read/write?

I have a local JSON file which I intend to read and write from a Node.js Electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid multiple open and close operations.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle for an existing file without truncating it and clearing it to 0 bytes.
const { resolve } = require('path');
const fsPromises = require('fs').promises;

function init() {
  // Save table name
  this.path = resolve(__dirname, '..', 'data', `test.json`);
  // Create/Open the json file
  fsPromises
    .open(this.path, 'wx+')
    .then(fileHandle => {
      // Grab the file handle if the file doesn't exist,
      // because of the flag 'wx+'
      this.fh = fileHandle;
    })
    .catch(err => {
      if (err.code === 'EEXIST') {
        // File exists
      }
    });
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So, your simplest option is to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read or write it, and close it. Opening and closing a file in a modern OS takes advantage of disk caching, so if you're reopening a file you opened not long ago, it's not a slow operation. Further, since Node.js performs these operations on libuv's worker threads, they don't block the main thread either, so this is generally not a performance issue for your server.
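A minimal sketch of that simpler route for a JSON file (path is whatever this.path resolves to in the question's code):

const fsPromises = require('fs').promises;

// Read, parse, and rewrite the whole JSON file in one shot each time.
async function load(path) {
  const text = await fsPromises.readFile(path, 'utf8');
  return JSON.parse(text);
}

async function save(path, data) {
  // writeFile truncates and replaces the previous contents
  await fsPromises.writeFile(path, JSON.stringify(data, null, 2), 'utf8');
}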
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({ encoding: 'utf8' });
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk
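Putting those three steps together, a small helper for rewriting a JSON object through the long-lived handle might look like this (a sketch, with error handling omitted):

async function saveJson(fileHandle, data) {
  const buf = Buffer.from(JSON.stringify(data), 'utf8');
  await fileHandle.truncate(0);                  // drop the old contents
  await fileHandle.write(buf, 0, buf.length, 0); // write the new data from position 0
  await fileHandle.sync();                       // flush buffering to disk
}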

Node read stream hangs on specific file, base64 encoding

I have the following code working for every file except one, which keeps hanging without emitting 'end' or 'error' events (I tried other stream events too).
const fs = require('fs');

const rs = fs.createReadStream(filePath, {
  encoding: 'base64',
});

rs.on('data', () => {
  console.log('data');
});
rs.on('end', () => {
  console.log('end');
});
rs.on('error', e => {
  console.log('error', e);
});
If I move the read start point with the start option to 1 instead of 0, it works properly. The same happens if highWaterMark is set to a value other than the default. That doesn't really help, though, as it seems it could fail with other "corrupted" files.
It looks like a Node bug, but maybe there's something I'm missing here.
I'll post the file here too, but first I need to strip it down to only the corrupting part, as it's somewhat private.
Update
Here's the file to recreate the issue:
http://s3.eu-west-1.amazonaws.com/jjapitest/file
Update
Here's an interactive demo of the issue:
https://repl.it/repls/AnimatedDisguisedNumerator
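One possible workaround (my own sketch, not a confirmed fix for the underlying Node issue): read the file without an encoding so the stream deals only with raw Buffers, and do the base64 conversion once at the end:

const fs = require('fs');

const chunks = [];
fs.createReadStream(filePath)            // no encoding option: chunks are raw Buffers
  .on('data', chunk => chunks.push(chunk))
  .on('end', () => {
    // convert the whole file to base64 in one step
    const base64 = Buffer.concat(chunks).toString('base64');
    console.log('end', base64.length);
  })
  .on('error', e => console.log('error', e));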

NodeJs Microsoft Azure Storage SDK Download File to Stream

I just started working with the Microsoft Azure Storage SDK for Node.js (https://github.com/Azure/azure-storage-node) and have already successfully uploaded my first PDF files to the cloud storage.
However, now I've started looking at the documentation in order to download my files into a Node buffer (so I don't have to use fs.createWriteStream), but the documentation doesn't give any example of how this works. The only thing it says is "There are also several ways to download files. For example, getFileToStream downloads the file to a stream:", and then it shows a single example, which uses fs.createWriteStream, which I don't want to use.
I was also not able to find anything on Google that really helped me, so I was wondering if anybody has experience with this and could share a code sample with me.
The getFileToStream function needs a writable stream as a parameter. If you want all the data written to a Buffer instead of a file, you just need to create a custom writable stream.
const { Writable } = require('stream');

let bufferArray = [];
const myWriteStream = new Writable({
  write(chunk, encoding, callback) {
    bufferArray.push(...chunk);
    callback();
  }
});

myWriteStream.on('finish', function () {
  // all the data is stored inside this dataBuffer
  let dataBuffer = Buffer.from(bufferArray);
});
Then pass myWriteStream to the getFileToStream function:
fileService.getFileToStream('taskshare', 'taskdirectory', 'taskfile', myWriteStream, function (error, result, response) {
  if (!error) {
    // file retrieved
  }
});
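A slightly more memory-friendly variant of the same idea (my own suggestion, not from the SDK docs) is to keep each Buffer chunk intact and join them once at the end with Buffer.concat, instead of spreading every byte into a plain array:

const { Writable } = require('stream');

const chunks = [];
const bufferWriteStream = new Writable({
  write(chunk, encoding, callback) {
    chunks.push(chunk);   // keep each Buffer chunk as-is
    callback();
  }
});

bufferWriteStream.on('finish', () => {
  const dataBuffer = Buffer.concat(chunks);  // the whole file as one Buffer
  // use dataBuffer here
});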

Download large data stream (> 1Gb) using javascript

I was wondering if it is possible to stream data from JavaScript to the browser's download manager.
Using WebRTC, I stream data (from files > 1 GB) from one browser to another. On the receiver side, I store all this data in memory (as ArrayBuffers, so the data is essentially still in chunks), and I would like the user to be able to download it.
Problem: Blob objects have a maximum size of about 600 MB (depending on the browser), so I can't re-create the file from the chunks. Is there a way to stream these chunks so that the browser downloads them directly?
If you want to fetch a large file blob from an API or URL, you can use StreamSaver:
npm install streamsaver
Then you can do something like this:
import { createWriteStream } from 'streamsaver';

export const downloadFile = (url, fileName) => {
  return fetch(url).then(res => {
    const fileStream = createWriteStream(fileName);
    const writer = fileStream.getWriter();

    if (res.body.pipeTo) {
      writer.releaseLock();
      return res.body.pipeTo(fileStream);
    }

    const reader = res.body.getReader();
    const pump = () =>
      reader
        .read()
        .then(({ value, done }) => (done ? writer.close() : writer.write(value).then(pump)));

    return pump();
  });
};
and you can use it like this:
const url = "http://urltobigfile";
const fileName = "bigfile.zip";
downloadFile(url, fileName).then(() => { alert('done'); });
Following @guest271314's advice, I added StreamSaver.js to my project and successfully received files bigger than 1 GB on Chrome. According to the documentation, it should work for files up to 15 GB, but my browser crashed before that (the maximum file size was about 4 GB for me).
Note I: to avoid the Blob max size limitation, I also tried to manually append data to the href field of an <a></a> tag, but it failed with files of about 600 MB...
Note II: as amazing as it might seem, the basic technique using createObjectURL works perfectly fine on Firefox for files up to 4 GB!
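For the original WebRTC scenario, where the chunks already sit in memory as ArrayBuffers, a sketch using the same StreamSaver API could look like this (receivedChunks is a placeholder for the array filled by the data channel):

import { createWriteStream } from 'streamsaver';

async function saveChunks(receivedChunks, fileName) {
  const fileStream = createWriteStream(fileName);
  const writer = fileStream.getWriter();
  for (const chunk of receivedChunks) {
    // write each ArrayBuffer as a Uint8Array; awaiting applies backpressure
    await writer.write(new Uint8Array(chunk));
  }
  await writer.close();
}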
