Downloading N number of remote files using Node.js synchronously

I'm working on a simple app using Node.js which needs to do the following when given a valid URL:
1. Retrieve the HTML of the remote page and save it locally.
2. Spider the HTML (using cheerio) and record all JS and CSS file references.
3. Make an HTTP request for each JS/CSS file and save it to the server by file name.
4. Zip up the HTML, CSS, and JS files and stream the resulting file to the browser.
I've got 1 and 2 working, and the first half of #3, but I'm running into issues with the asynchronous nature of the downloads. My code runs too fast: it creates the files with the right names for the CSS and JS files, but none of the content. I'm guessing this is because my code isn't synchronous. The problem is that I don't know in advance how many files there might be, and all of them have to be there before the ZIP file can be generated.
Here's the flow of my app as it currently exists. I've left out the helper methods as they don't affect synchronicity. Can any of you provide input as to what I should do?
http.get(fullurl, function(res) {
  res.on('data', function (chunk) {
    var $source = $('' + chunk),
        js = getJS($source, domain),
        css = getCSS($source, domain),
        uniqueName = pw(),
        dir = [baseDir, 'jsd-', uniqueName, '/'].join(''),
        jsdir = dir + 'js/',
        cssdir = dir + 'css/',
        html = rewritePaths($source);
    // create tmp directory
    fs.mkdirSync(dir);
    console.log('creating index.html');
    // save index file
    fs.writeFileSync(dir + 'index.html', html);
    // create js directory
    fs.mkdirSync(jsdir);
    // Save JS files
    js.forEach(function(jsfile){
      var filename = jsfile.split('/').reverse()[0];
      request(jsfile).pipe(fs.createWriteStream(jsdir + filename));
      console.log('creating ' + filename);
    });
    // create css directory
    fs.mkdirSync(cssdir);
    // Save CSS files
    css.forEach(function(cssfile){
      var filename = cssfile.split('/').reverse()[0];
      request(cssfile).pipe(fs.createWriteStream(cssdir + filename));
      console.log('creating ' + filename);
    });
    // write zip file to /tmp
    writeZip(dir, uniqueName);
    // https://npmjs.org/package/node-zip
    // http://stuk.github.com/jszip/
  });
}).on('error', function(e) {
  console.log("Got error: " + e.message);
});

The way you are downloading files through the request module is asynchronous:
request(cssfile).pipe(fs.createWriteStream(cssdir + filename));
Instead of downloading like that, create a separate function that only calls back once the file has been completely written:
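var fs = require('fs');
var request = require('request');
var async = require('async'); // used below to wait for all downloads

// Downloads remotePath into localFile and calls callback(err, localFile)
// only after the whole file has been written to disk.
function download(localFile, remotePath, callback) {
  var out = request({ uri: remotePath });
  out.on('error', function (err) {
    callback(err, null);
  });
  out.on('response', function (resp) {
    if (resp.statusCode === 200) {
      var localStream = fs.createWriteStream(localFile);
      localStream.on('error', function (err) {
        callback(err, null);
      });
      // 'finish' fires once all data has been flushed to the file
      localStream.on('finish', function () {
        callback(null, localFile);
      });
      out.pipe(localStream);
    } else {
      callback(new Error("No file found at given url."), null);
    }
  });
}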
Then use the async module by caolan (https://github.com/caolan/async) so that the next step only runs after every download has finished:
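// Save JS files, waiting for every download to finish
async.each(js, function (jsfile, cb) {
  var filename = jsfile.split('/').reverse()[0];
  download(jsdir + filename, jsfile, function (err) {
    // handle error here (this version just logs it and moves on)
    if (err) console.log(err.message);
    else console.log('creating ' + filename);
    cb();
  });
}, function () {
  // create css directory
  fs.mkdirSync(cssdir);
  // Save CSS files the same way, so the ZIP also waits for them
  async.each(css, function (cssfile, cb) {
    var filename = cssfile.split('/').reverse()[0];
    download(cssdir + filename, cssfile, function (err) {
      if (err) console.log(err.message);
      else console.log('creating ' + filename);
      cb();
    });
  }, function () {
    // all JS and CSS downloads have completed: write zip file to /tmp
    writeZip(dir, uniqueName);
  });
});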
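For comparison, here is a minimal sketch of the same "wait for all downloads" idea using Promises instead of the async library. It assumes Node.js 18 or later (global fetch, Readable.fromWeb, stream/promises); downloadFile and downloadAll are hypothetical helper names, and, as in the code above, each file name is simply the last segment of its URL.

```javascript
const fs = require('fs');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: stream one URL into localFile.
// The promise resolves only after the file has been completely written.
async function downloadFile(url, localFile) {
  const res = await fetch(url); // global fetch, Node.js 18+
  if (!res.ok) {
    throw new Error('No file found at ' + url + ' (HTTP ' + res.status + ')');
  }
  // Convert the web ReadableStream body to a Node.js stream and pipe it to disk
  await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(localFile));
  return localFile;
}

// Hypothetical helper: start every download at once and resolve when all are done.
// The file name is the last URL segment (query strings are not stripped).
function downloadAll(urls, dir, download = downloadFile) {
  return Promise.all(urls.map(function (url) {
    const filename = url.split('/').pop();
    return download(url, path.join(dir, filename));
  }));
}
```

Inside an async function you could then write `await downloadAll(js, jsdir); await downloadAll(css, cssdir); writeZip(dir, uniqueName);`. Promise.all starts all downloads in parallel and rejects on the first failure; for very long file lists you may want to limit concurrency.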

Related

How do I pass a file/blob from JavaScript to emscripten/WebAssembly (C++)?

I'm writing a WebExtension that uses C++ code compiled with emscripten. The WebExtension downloads files which I want to process inside the C++ code. I'm aware of the File System API and I think I've read most of it, but I can't get it to work: I want to make a downloaded file accessible in emscripten.
This is the relevant JavaScript part of my WebExtension:
// Download a file
let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
if (!response.ok) {
return null;
}
return response.blob();
});
// Setup FS API
FS.mkdir('/working');
FS.mount(IDBFS, {}, '/working');
FS.syncfs(true, function(err) {
if (err) {
console.error('Error: ' + err);
}
});
// Store the file "somehow"
let filename = 'favicon.ico';
// ???
// Call WebAssembly/C++ to process the file
Module.processFile(filename);
The directory is created, as I can see when inspecting the browser's Web Storage. If I understand the File System API correctly, I have to "somehow" write my data to a file inside /working. Then I should be able to call a function of my C++ code (from JavaScript) and open that file as if there were a directory called 'working' at the root containing the file. The call to the C++ function works (I can print the provided filename).
But how do I add the file (currently a blob) to that directory?
C++ code:
#include "emscripten/bind.h"
using namespace emscripten;
std::string processFile(std::string filename)
{
// open and process the file
}
EMSCRIPTEN_BINDINGS(my_module)
{
function("processFile", &processFile);
}

It turned out that I was mixing some things up while trying different methods, and I was also misinterpreting my debugging tools. The easiest way to accomplish this task (without using IDBFS) is:
JS:
// Download a file
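let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
  if (!response.ok) {
    // fail loudly; returning null would only crash later at blob.arrayBuffer()
    throw new Error('Download failed with HTTP ' + response.status);
  }
  return response.blob();
});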
// Convert blob to Uint8Array (more abstract: ArrayBufferView)
let data = new Uint8Array(await blob.arrayBuffer());
// Store the file
let filename = 'favicon.ico';
let stream = FS.open(filename, 'w+');
FS.write(stream, data, 0, data.length, 0);
FS.close(stream);
// Call WebAssembly/C++ to process the file
console.log(Module.processFile(filename));
C++:
#include "emscripten/bind.h"
#include <fstream>
#include <string>
using namespace emscripten;
std::string processFile(std::string filename)
{
    std::fstream fs;
    fs.open(filename, std::fstream::in | std::fstream::binary);
    if (fs) {
        fs.close();
        return "File '" + filename + "' exists!";
    } else {
        return "File '" + filename + "' does NOT exist!";
    }
}
EMSCRIPTEN_BINDINGS(my_module)
{
    function("processFile", &processFile);
}
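As a shorter variant of the open/write/close sequence, emscripten's FS.writeFile(path, data) creates (or truncates) a file and writes an ArrayBufferView to it in one call. A minimal sketch, assuming an environment with fetch; fetchIntoFS is a hypothetical helper name, and the FS object is passed in as a parameter rather than assumed to be global:

```javascript
// Hypothetical helper: download url and store its bytes as `filename` in the
// emscripten virtual file system. FS is passed in as a parameter.
async function fetchIntoFS(FS, url, filename) {
  const response = await fetch(url);
  if (!response.ok) {
    // Fail loudly instead of returning null and crashing later
    throw new Error('Download failed with HTTP ' + response.status);
  }
  const data = new Uint8Array(await response.arrayBuffer());
  // Creates (or truncates) the file and writes all bytes in one call
  FS.writeFile(filename, data);
  return data.length;
}
```

Usage would then be `await fetchIntoFS(FS, 'https://stackoverflow.com/favicon.ico', 'favicon.ico');` followed by `Module.processFile('favicon.ico')`.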
If you want to do it with IDBFS, you can do it like this:
// Download a file
let blob = await fetch('https://stackoverflow.com/favicon.ico').then(response => {
  if (!response.ok) {
    throw new Error('Download failed with HTTP ' + response.status);
  }
  return response.blob();
});
// Convert blob to Uint8Array (more abstract: ArrayBufferView)
let data = new Uint8Array(await blob.arrayBuffer());
// Setup FS API
FS.mkdir('/persist');
FS.mount(IDBFS, {}, '/persist');
let filename = 'favicon.ico';
// Load persistent files (sync from IDBFS to MEMFS), will do nothing on first run.
// syncfs is asynchronous, so every step that depends on it goes into its callback.
FS.syncfs(true, function(err) {
  if (err) {
    console.error('Error: ' + err);
    return;
  }
  FS.chdir('/persist');
  // Store the file
  let stream = FS.open(filename, 'w+');
  FS.write(stream, data, 0, data.length, 0);
  FS.close(stream);
  // Persist the changes (sync from MEMFS to IDBFS)
  FS.syncfs(false, function(err) {
    if (err) {
      console.error('Error: ' + err);
      return;
    }
    // NOW you will be able to see the file in your browser's IndexedDB section of the web storage inspector!
    // Call WebAssembly/C++ to process the file
    console.log(Module.processFile(filename));
  });
});
Notes:
When you use FS.chdir() in the JS world to change the directory, this also changes the working directory in the C++ world, so keep that in mind when working with relative paths.
When working with IDBFS instead of MEMFS, you are actually still working with MEMFS; you just have the option to sync data from or to IDBFS on demand, but all your work is still done in MEMFS. I would consider IDBFS an add-on to MEMFS (this is my own reading; I didn't find it stated directly in the docs).

How do I get access to the variables / methods in static files in Node.js + Expressjs at the server-side without using POST request?

More specifically, I have a blob that is initialized and processed in a .js file in the 'public' (static) folder. Since it is processed on the client side, I want to know whether I can somehow access the blob on the server side without using a POST request. The file in question has been processed and is stored in a variable in a static file (script.js). Now I need to upload that variable/blob to the database, but in a static file I don't have access to the database, and I can't export the variable to the server either. How do I get access to that variable inside the static file? If anyone understands my requirement, please feel free to edit the question.
My program records audio through the client's microphone, and that audio file has to be uploaded to the database. I could add a 'Download' feature, let the client download the file, and then have the client use an <input> tag to send a POST request to the server. But then the client could upload any audio file through that input tag. This is a web app for live monitoring of students writing an exam: to make sure they don't cheat, I capture their audio and save it to the DB. Please refer to my folder structure for more details and then read the question again.
My Folder Structure:
--DBmodels
---AudioModels.js
---ImageModel.js
--node_modules
--public
---script.js (This contains the code for audio processing)
--views
---test.ejs (Main HTML page)
--index.js (server file)
--package.json
One way of doing this is to download the file on the client side and then ask the client to upload it, but that doesn't work for me for the reason described above.
Here is my script.js, but I don't have access to variables such as chunks_audio on the server.
const chunks_audio = [];
mediaRecorder.onstop = async function (e) {
  console.log("DONE WITH AUDIO");
  const blob = new Blob(chunks_audio, {
    'type': "audio/ogg; codecs=opus"
  });
  const audioURL = window.URL.createObjectURL(blob);
  console.log(audioURL);
  var link = document.getElementById("downloadaudio");
  link.href = audioURL;
  var audioMIMEtypes = ["audio/aac", "audio/mpeg", "audio/ogg", "audio/opus", "audio/wav"];
  const audio = blob;
  const audiodb = new AudioSchema({
    name: "Audio" + Date.now().toString()[5]
  });
  saveAudio(audiodb, audio);
  try {
    const new_audio = await audiodb.save();
    console.log("AUDIO UPLOADED" + new_audio);
  } catch (err) {
    console.log(err);
  }
  function saveAudio(audiodb, audioEncoded) {
    if (audioEncoded == null) return;
    console.log("before parse: " + audioEncoded);
    const audio = JSON.parse(audioEncoded);
    console.log("JSON parse: " + audio);
    if (audio != null && audioMIMEtypes.includes(audio.type)) {
      audiodb.audio = Buffer.from(audio.data, "base64");
      audiodb.audioType = audio.type;
    }
  }
};
// module.exports = chunks_audio; (This doesn't work for obvious reasons)
Here is my server file (index.js). I tried using a POST request where the user posts the audio file after downloading it, but the user could post any other file through the <input> tag, so that doesn't meet my requirement:
var audioMIMEtypes = ["audio/aac", "audio/mpeg", "audio/ogg", "audio/opus", "audio/wav"];
app.post('/', async (req, res, next) => {
  const audio = blob; // 'blob' is a variable in script.js, hence not accessible here
  const audiodb = new AudioSchema({
    name: "Audio" + Date.now().toString()[5]
  });
  saveAudio(audiodb, audio);
  try {
    const new_audio = await audiodb.save();
    console.log("AUDIO UPLOADED" + new_audio);
  } catch (err) {
    console.log(err);
  }
  function saveAudio(audiodb, audioEncoded) {
    if (audioEncoded == null) return;
    console.log("before parse: " + audioEncoded);
    const audio = JSON.parse(audioEncoded);
    console.log("JSON parse: " + audio);
    if (audio != null && audioMIMEtypes.includes(audio.type)) {
      audiodb.audio = Buffer.from(audio.data, "base64");
      audiodb.audioType = audio.type;
    }
  }
});
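This question has no answer in the thread. One common pattern, offered here only as an assumption-based sketch, is to let script.js itself POST the recorded Blob with fetch(), so the student never chooses a file through an <input> tag. The helper name buildAudioUpload and the '/audio' route mentioned below are hypothetical:

```javascript
// Hypothetical helper for script.js: package the recorded chunks as fetch()
// options that POST the raw audio bytes, with no <input> element involved.
function buildAudioUpload(chunks) {
  const blob = new Blob(chunks, { type: 'audio/ogg; codecs=opus' });
  return {
    method: 'POST',
    headers: { 'Content-Type': blob.type },
    body: blob,
  };
}
```

In mediaRecorder.onstop you would then call `await fetch('/audio', buildAudioUpload(chunks_audio));`. On the server, a route such as `app.post('/audio', express.raw({ type: 'audio/*' }), ...)` receives the bytes as a Buffer in req.body, which could be stored in the AudioSchema document directly instead of going through JSON.parse; you will likely need to raise express.raw's limit option for recordings larger than its default.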

how to prevent Async in NodeJS - wait for task a to complete before starting task b

I need to wait for task A to complete before executing task B's code.
Task A converts an audio file, and
task B uses the converted audio for further processing.
Because task A stores the new audio file in a particular directory, and task B tries to access the file before it exists, my code breaks.
How do I make sure task B's code executes only once the new audio file has been saved to the directory?
Code:
var track = fileURL; // your path to source file
ffmpeg(track)
  .toFormat('flac')
  .on('error', function (err) {
    console.log('An error occurred: ' + err.message);
  })
  .on('progress', function (progress) {
    // console.log(JSON.stringify(progress));
    console.log('Processing: ' + progress.targetSize + ' KB converted');
  })
  .on('end', function () {
    console.log('Processing finished !');
  })
  .save(path.join(__dirname, '/public/downloads/Test.flac')); // path where you want to save your file
The above part of the code takes the file from the uploads folder, converts it to a new file format, and saves it to the downloads directory.
Below, you can see that I am trying to access the file (Test.flac) in the downloads folder. There is a lot more code, but I need to execute this block only after the above task has completed.
const Speech = require('@google-cloud/speech');
const projectId = 'uliq-68823';
// Instantiates a client
const speechClient = Speech({
projectId: projectId
});
// The name of the audio file to transcribe
const fileName2 = path.join(__dirname, '/public/downloads/' + 'Test.flac');
// Reads a local audio file and converts it to base64
const file2 = fs.readFileSync(fileName2);
const audioBytes = file2.toString('base64');

The fluent-ffmpeg library uses streams to process your files. Therefore, if you want to execute code after the stream is done, call it from the callback registered for the stream's 'end' event.
Example:
var track = fileURL; // your path to source file
ffmpeg(track)
  .toFormat('flac')
  .on('error', function (err) {
    console.log('An error occurred: ' + err.message);
  })
  .on('progress', function (progress) {
    // console.log(JSON.stringify(progress));
    console.log('Processing: ' + progress.targetSize + ' KB converted');
  })
  .on('end', function () {
    console.log('Processing finished !');
    // USE THE FILE HERE
    // <----------------
  })
  .save(path.join(__dirname, '/public/downloads/Test.flac'));
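If task B lives in a separate function, the same 'end' event can be wrapped in a Promise so that task B simply awaits it. A minimal sketch; convertToFlac is a hypothetical name, and the ffmpeg factory (normally `require('fluent-ffmpeg')`) is passed in as a parameter:

```javascript
// Hypothetical helper: run a fluent-ffmpeg conversion and settle the promise
// on 'end' (output file complete) or 'error'.
function convertToFlac(ffmpeg, input, output) {
  return new Promise(function (resolve, reject) {
    ffmpeg(input)
      .toFormat('flac')
      .on('error', reject)
      .on('end', function () {
        // The output file is complete only at this point
        resolve(output);
      })
      .save(output);
  });
}
```

Task B could then start with `const flacPath = await convertToFlac(ffmpeg, track, path.join(__dirname, '/public/downloads/Test.flac'));` followed by `fs.readFileSync(flacPath).toString('base64')`.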

Use the async package's waterfall function, which runs functions in series so that the second function runs only after the first one has finished.
Here is the package link.

How to download a PDF and convert it into a txt file in node.js? (JavaScript)

I am trying to download a PDF file from a download link and then convert the text in it into a txt file. However, I am getting this error:
"(while reading XRef): Error: Invalid XRef stream
XRefParseException"
when loading the PDF into the parser. This sets off the error handler, which just prints the error message. Here is my code right now:
import fs from 'fs';
import request from 'superagent';
import PDFparser from 'pdf2json';

// second argument 1 enables raw text extraction for getRawTextContent()
const pdfParser = new PDFparser(null, 1);
// a download link (indicated by the dl=1) for some dropbox example.pdf
const link = 'https://www.dropbox.com/s/22nvxasry8zpwbg/example%20(3).pdf?dl=1';
// sending a request to this download link
request.get(link).end((err, res) => {
  if (res.headers['content-type'] === 'application/pdf') {
    // creates a new file and pipes the response into the stream
    let pdfId = 'search-' + Date.now();
    let file = fs.createWriteStream('./tmp/pdf/' + pdfId + '.pdf');
    res.pipe(file);
    // api for pdfParser setting handlers
    pdfParser.on("pdfParser_dataError", errData => {
      console.error(errData.parserError);
    });
    pdfParser.on("pdfParser_dataReady", pdfData => {
      console.log('got data, writing to txt file');
      console.log("./tmp/txt/" + pdfId + ".txt");
      fs.writeFileSync("./tmp/txt/" + pdfId + ".txt", pdfParser.getRawTextContent());
    });
    // load the pdf file into the pdfParser
    // I think the error happens here
    pdfParser.loadPDF('./tmp/pdf/' + pdfId + '.pdf');
  }
});
I think the error happens when I'm trying to load the pdf into the parser, but I'm not 100% sure. And I don't know what to do about this error. Any help is appreciated. Thanks!
Here is the api guide for superagent:
https://visionmedia.github.io/superagent/
and the api guide for pdf2json: https://github.com/modesty/pdf2json
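There is no answer to this question in the thread. One plausible cause, stated here only as an assumption, is that pdfParser.loadPDF() runs right after res.pipe(file), before the write stream has finished flushing the PDF to disk, so the parser reads a truncated file. A minimal sketch that waits until the file is completely written before loading it; saveThenLoad is a hypothetical helper, and source stands for any Node.js Readable stream carrying the PDF bytes:

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Hypothetical helper: write a readable stream to filePath and call
// load(filePath) only after the file has been completely flushed to disk.
async function saveThenLoad(source, filePath, load) {
  // pipeline resolves once the write stream has finished (or rejects on error)
  await pipeline(source, fs.createWriteStream(filePath));
  return load(filePath);
}
```

For example, `await saveThenLoad(pdfStream, './tmp/pdf/' + pdfId + '.pdf', (p) => pdfParser.loadPDF(p));`, where pdfStream is the readable stream of the downloaded PDF (obtaining it from the HTTP library is outside this sketch).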

Unable to createWriteStream to save downloaded file

I have a function in my NW.js app that downloads a bunch of files from the server and saves them in the folder chosen by the user with the names sent from the server. I do not know the names of the files in advance - the urls I am using are randomly-generated strings that I have gotten from another server, and this server is looking up each hash to see which file it corresponds to.
// no 'g' flag: a global regexp keeps lastIndex between exec() calls,
// which can make later lookups on other headers return null
var regexp = /filename=\"(.*)\"/i;
media_urls.forEach(function(url) {
  var req = client.request(options, function(res) {
    var file_size = parseInt(res.headers['content-length'], 10);
    var content_disposition = res.headers['content-disposition'];
    var name = regexp.exec(content_disposition)[1];
    var path = Path.join(save_dir, name);
    var file = fs.createWriteStream(path);
    file.on('error', function(e) {
      console.log(e);
      req.abort();
    });
    res.on('data', function(chunk) {
      file.write(chunk);
    });
    res.on('end', function() {
      file.end();
    });
  });
  req.on('error', function(e) {
    console.log(e);
  });
  req.end();
});
I keep getting ENOENT errors when this code runs. This doesn't make any sense because the file is supposed to be created now, so of course it doesn't exist!
Why am I getting this error instead of having the file downloaded?

The file names coming from the server contained colons (:), which are valid filename characters on Linux ext4 but not on Windows NTFS.
Changing
var name = regexp.exec(content_disposition)[1];
to
var name = regexp.exec(content_disposition)[1].replace(/:/g, '-');
solved this particular problem.
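The colon is not the only character Windows rejects: file names there also cannot contain <, >, ", /, \, |, ?, * or ASCII control characters. A minimal sketch that replaces all of them; toSafeFilename is a hypothetical name, and it does not handle reserved device names such as CON or trailing dots and spaces:

```javascript
// Replace every character that Windows does not allow in file names
// (< > : " / \ | ? * and ASCII control characters) with '-'.
function toSafeFilename(name) {
  return name.replace(/[<>:"\/\\|?*\u0000-\u001F]/g, '-');
}
```

In the download code this would become `var name = toSafeFilename(regexp.exec(content_disposition)[1]);`.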

We Keep Coding
