Node.js - Browserify: Error on parsing tar file - javascript

I'm trying to download a tar file (non-compressed) over HTTP and pipe its response to the tar-stream parser for further processing. This works perfectly when executed in the terminal, without any errors. To use the same thing in the browser, a bundle.js file is generated using browserify and included in the HTML.
The tar stream contains 3 files. When executed in the browser, this browserified code parses 2 entries successfully but raises the following error for the third one:
Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
With the same HTTP download and parsing code, the tar file is downloaded and parsed completely without errors in the terminal. Why is this happening?
Code snippet is along these lines:
. . . .
var req = http.request(url, function(res) {
  res.pipe(tar.extract())
    .on('entry', function(header, stream, callback) {
      console.log("File found " + header.name);
      stream.on('end', function() {
        console.log("<<EOF>>");
        callback();
      });
      stream.resume();
    })
    .on('finish', function() {
      console.log("All files parsed");
    })
    .on('error', function(error) {
      console.log(error); // Raises the above mentioned error here
    });
});
. . . .
Any suggestions? Headers, perhaps?

The problem here (and its solution) is tucked away in the http-browserify documentation. First, you need to understand a few things about browserify:
The browser environment is not the same as the node.js environment
Browserify does its best to provide node.js APIs that don't exist in the browser when the code you are browserifying needs them
The replacements don't behave exactly the same as in node.js, and are subject to caveats in the browser
With that in mind, you're using at least three node-specific APIs that have browserify reimplementations/shims: network connections, buffers, and streams. Network connections are by necessity replaced in the browser by XHR calls, which have their own semantics surrounding binary data that don't exist in Node (Node has Buffers instead). If you look at the http-browserify documentation, you'll notice an option called responseType; this sets the response type of the XHR call, which must be set to ensure you get binary data back instead of string data. Substack suggested using ArrayBuffer; since this must be set on the options object of http.request, you need to use the long-form request format instead of the string-url format:
http.request({
  method: 'GET',
  hostname: 'www.site.com',
  path: '/path/to/request',
  responseType: 'arraybuffer' // note: lowercase
}, function (res) {
  // ...
});
See the xhr spec for valid values for responseType. http-browserify passes it along as-is. In Node, this key will simply be ignored.
When you set the response type to 'arraybuffer', http-browserify will emit chunks as Uint8Array. Once you're getting a Uint8Array back from http.request, another problem presents itself: the Stream API only accepts string and Buffer for input, so when you pipe the response to the tar extractor stream, you'll receive TypeError: Invalid non-string/buffer chunk. This seems to me to be an oversight in stream-browserify, which should accept Uint8Array values to go along nicely with the other parts of the browserified Node API. You can fairly simply work around it yourself, though. The Buffer shim in the browser accepts a typed array in the constructor, so you can pipe the data yourself, converting each chunk to a Buffer manually:
http.request(opts, function (res) {
  var tarExtractor = tar.extract();

  res.on('data', function (chunk) {
    tarExtractor.write(new Buffer(chunk));
  });

  res.on('end', function () {
    tarExtractor.end();
  });

  res.on('error', function (err) {
    // do something with your error
    // and clean up the tarExtractor instance if necessary
  });
});
Your code, then, should look something like this:
var req = http.request({
  method: 'GET',
  // Add your request hostname, path, etc. here
  responseType: 'arraybuffer'
}, function (res) {
  var tarExtractor = tar.extract();

  res.on('data', function (chunk) {
    tarExtractor.write(new Buffer(chunk));
  });

  res.on('end', tarExtractor.end.bind(tarExtractor));

  res.on('error', function (error) {
    console.log(error);
  });

  tarExtractor.on('entry', function(header, stream, callback) {
    console.log("File found " + header.name);
    stream.on('end', function() {
      console.log("<<EOF>>");
      callback();
    });
    stream.resume(); // This won't be necessary once you do something with the data
  })
  .on('finish', function() {
    console.log("All files parsed");
  });
});

req.end(); // don't forget to actually send the request

Related

How to read a file from @google-cloud/storage?

I am retrieving a file from my bucket.
I get the file and want to read its contents, but I do not want to download it to my local project.
I just want to read the contents, take the data, and do other operations with it.
My code:
export const fileManager = async () => {
  try {
    const source = 'upload/';
    const options = { prefix: source, delimiter: '/' };
    const remoteFile = st.bucket(bName).file('myData.csv');
    let readFileData;
    remoteFile
      .createReadStream()
      .on('error', err => {
        console.log('error');
      })
      .on('response', response => {
        readFileData = response;
        console.log('success');
        // Server connected and responded with the specified status and headers.
      })
      .on('end', () => {
        console.log('end');
        // The file is fully downloaded.
      });
    console.log("data", readFileData);
  } catch (error) {
    console.log('Error Is', error);
  }
};
readFileData is undefined.
Is this possible? Every example I find involves downloading the file.
createReadStream is asynchronous and returns immediately. You have to use the callbacks to find out when the download to memory is complete. Right now, your code is always going to print "data undefined" because it's trying to print the response before it's available.
createReadStream is definitely the right way to go, but you will have to understand how node streams work in order to process the results correctly. There is a whole section in the linked documentation for reading streams, which is what you want to do here. The way you deal with the stream is not specific to Cloud Storage - it's the same for all node streams.
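For example, here is a minimal sketch of collecting the stream into memory and only using the data once the download is complete (it reuses the st and bName names from the question and assumes the file is UTF-8 text):
const readRemoteFile = () =>
  new Promise((resolve, reject) => {
    const chunks = [];
    st.bucket(bName)
      .file('myData.csv')
      .createReadStream()
      .on('data', chunk => chunks.push(chunk)) // collect each chunk as it arrives
      .on('error', reject)
      .on('end', () => resolve(Buffer.concat(chunks).toString('utf8'))); // the data is only complete here
  });

readRemoteFile().then(contents => console.log('data', contents));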
You might be helped by the answers to these questions that deal with reading node streams:
Node.js: How to read a stream into a buffer?
Convert stream into buffer?
How do I read the contents of a Node.js stream into a string variable?

Send a pdf file using telegram bot api

I am trying to send a PDF file using the URL of the file and the "sendDocument" method. The problem is that I can't access the file directly because of the server where it's stored. I tried to use the answer provided in this post:
readFileSync from an URL for Twitter media - node.js
It works, but the file is sent as "file.doc". If I change the extension to pdf, it is the correct file. Is there any extra step I need to do to send the file with the correct name and extension, or is there another way I can achieve what I need?
EDIT: The code I am using to get the PDF looks exactly like the code in the answer of the post I provided:
const https = require('https');

function getImage(url, callback) {
  https.get(url, res => {
    // Initialise an array
    const bufs = [];
    // Add the data to the buffer collection
    res.on('data', function (chunk) {
      bufs.push(chunk);
    });
    // This signifies the end of a request
    res.on('end', function () {
      // We can join all of the 'chunks' of the image together
      const data = Buffer.concat(bufs);
      // Then we can call our callback.
      callback(null, data);
    });
  })
  // Inform the callback of the error.
  .on('error', callback);
}
To send the file I use something like this:
getImage(url, function(err, data) {
  if (err) {
    throw new Error(err);
  }
  bot.sendDocument(
    msg.chat.id,
    data,
  );
});
Found the solution. I am using the telebot API (sorry for not mentioning that detail, but I did not know it; I did not make the project).
I used the following line to send the file:
bot.sendDocument(chat_id, data, {fileName: 'file.pdf'});
You can specify the file name and file type by using this code:
const fileOptions = {
  // Explicitly specify the file name.
  filename: 'mypdf.pdf',
  // Explicitly specify the MIME type.
  contentType: 'application/pdf',
};
Full function:
getImage("https://your.url/yourfile.pdf", function(err, data){
if(err){
throw new Error(err);
}
const fileOptions = {
// Explicitly specify the file name.
filename: 'mypdf.pdf',
// Explicitly specify the MIME type.
contentType: 'application/pdf',
};
bot.sendDocument(msg.chat.id, data, {}, fileOptions);
});
NOTE: You MUST provide an empty object ({}) in place of the additional Telegram query options if you have no query options to specify. For example:
// WRONG!
// 'fileOptions' will be taken as additional Telegram query options!!!
bot.sendAudio(chatId, data, fileOptions);
// RIGHT!
bot.sendAudio(chatId, data, {}, fileOptions);
More information here:
https://github.com/yagop/node-telegram-bot-api/blob/master/doc/usage.md#sending-files

Mocking soap services with nock

I'm working on a node app that communicates with soap services, using the foam module to parse json into a valid soap request and back again when the response is received. This all works fine when communicating with the soap services.
The issue I'm having is writing unit tests for this (integration tests work fine). I'm using nock to mock the http service and send a reply. This reply does get parsed by foam and then I can make assertions against the response.
So I cannot pass a json object as a reply because foam expects a soap response. If I try to do this I get the error:
Error: Start tag expected, '<' not found
Storing XML in JavaScript variables is painful and doesn't work (wrapping it in quotes and escaping the inner quotes isn't valid), so I wanted to put the mocked XML response into a file and pass that as a reply.
I've tried reading the file in as a stream
return fs.createReadStream('response.xml')
...and replying with a file
.replyWithFile(201, __dirname + 'response.xml');
Both fail with an error of
TypeError: Cannot read property 'ObjectReference' of undefined
Here is the XML in the file
<env:Envelope xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'>
<env:Header></env:Header>
<env:Body>
<FLNewIndividualID xmlns='http://www.lagan.com/wsdl/FLTypes'>
<ObjectType>1</ObjectType>
<ObjectReference>12345678</ObjectReference>
<ObjectReference xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:nil='true'/>
<ObjectReference xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:nil='true'/>
</FLNewIndividualID>
</env:Body>
</env:Envelope>
The module being tested is
var foam = require('./foam-promise.js');

module.exports = {
  createUserRequest: function(url, operation, action, message, namespace) {
    var actionOp = action + '/actionRequestOp',
        uri = url + '/actionRequest';
    return new Promise(function(resolve, reject) {
      foam.soapRequest(uri, operation, actionOp, message, namespace)
        .then(function(response) {
          resolve(response.FLNewIndividualID.ObjectReference[0]);
        })
        .catch(function(err) {
          reject(err);
        });
    });
  }
};
The assertion is using should-promised
return myRequest(url, operation, action, data, namespace)
.should.finally.be.exactly('12345678');
So it looks like the xml parser won't just accept a file (which makes sense). Does the stream not complete before it is tested?
Can an XML reply be mocked successfully with nock?
I also raised this on Github
Following pgte's advice in https://github.com/pgte/nock/issues/326 I was able to get this working by setting the correct headers and replying with an XML string (with escaped quotes).
From pgte:
It can. I don't know foam well, but I guess you have to set the response content type header (see https://github.com/pgte/nock#specifying-reply-headers) so that the client can parse the XML correctly.
Here's how the working test looks:
it('should return a user ID', function() {
  var response = '<env:Envelope xmlns:env=\'http://schemas.xmlsoap.org/soap/envelope/\'><env:Header></env:Header><env:Body><UserReference>12345678</UserReference></env:Body></env:Envelope>';

  nock(url)
    .post('/createUserRequest')
    .reply(201, response, {
      'Content-Type': 'application/xml'
    });

  return createUserRequest(url, operation, action, message, options)
    .should.be.fulfilledWith('12345678');
});
it('should be rejected if the request fails', function() {
  nock(url)
    .post('/createCaseRequest')
    .replyWithError('The request failed');

  return createUserRequest(url, operation, action, message, options)
    .should.be.rejected;
});

Node.js weird encoding on response?

I am using a third-party API to get some images, and the response gives me this. I don't think this is base64?
"����\u0000\u0010JFIF\u0000\u0001\u0001\u0000\u0000\u0001\u0000\u0001\u0000\u0000��\u0000C\u0000\b\u0006\u0006\u0007\u0006\u0005\b\u0007\u0007\u0007\t\t\b\n\f\u0014\r\f\u000b\u000b\f\u0019\u0012\u0013\u000f\u0014\u001d\u001a\u001f\u001e\u001d\u001a\u001c\u001c $.' \",#\u001c\u001c(7),01444\u001f'9=82<.342��\u0000C\u0001\t\t\t\f\u000b\f\u0018\r\r\u00182!\u001c!22222222222222222222222222222222222222222222222222��\u0000\u0011\b\u0002!\u0002&\u0003\u0001\"\u0000\u0002\u0011\u0001\u0003\u0011\u0001��\u0000\u001f\u0000\u0000\u0001\u0005\u0001\u0001\u0001\u0001\u0001\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b��\u0000"
The code that makes the request.
unirest.get("MYAPIROUTE")
.header("X-Mashape-Key", "MYKEY")
.end(function (result) {
console.log(result.status, result.headers, result.body);
res.send(result.body);
});
My question is, with node.js how do I decode this so I can send the client a proper image?
Resolution:
unirest.get("MYAPIROUTE")
.header("X-Mashape-Key", "MYKEY")
.end(function (result) {
console.log(result.status, result.headers, result.body);
if(result.status==200) {
var buffer = (new Buffer(result.body.toString()));
res.end(buffer.toString("base64")); // output content as response body
require('fs').writeFileSync('/some/public/folder/md5HashOfRequestedUrl.jpg', buffer); // also write it to file
delete buffer;
return;
}
res.writeHead(result.status, result.headers);
res.write(result.body);
res.end();
});
reference: http://nodejs.org/api/buffer.html
Let's say it's an image, so why not try to set
<img src="http://your-site.com/some/public/folder/md5HashOfRequestedUrl.jpg">
You can also write the response to a file in a temporary public folder to avoid repeating the same request.
The third-party API should document the data type of the expected response, so that we can figure out what format needs to be decoded.
My guess is that it's returning a UTF-8 string; try this module's decode function to decode it: https://www.npmjs.com/package/utf8
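A minimal sketch of that suggestion, assuming the utf8 package is installed (whether this recovers the original bytes depends on how the API mangled them):
var utf8 = require('utf8');

// result.body is the string shown in the question
var decoded = utf8.decode(result.body);

// decoded is still a JS string; turn it into raw bytes before sending it on
res.end(new Buffer(decoded, 'binary'));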

Node.js posting a large request using request module hangs

I am using the node.js request module to make a large POST request (~150MB) to a REST service. For any request bigger than about 30MB, it seems to hang. My guess is that it is doing some naive JSON.stringify()ing of the data instead of streaming it, and once it gets large enough to hit swap, it just becomes very slow.
This is what my request looks like:
request({
  uri: url,
  method: 'post',
  json: args, // args is a 150MB json object
}, function(err, resp, body) {
  // do something
});
The same request made using angularjs's $http from within a browser works in less than a minute, so I know it's the client and not the server.
Is there an easy way to fix this?
Streaming is not going to happen automatically. The API to stream something and the API to send a complete buffer are almost always different, and this is no exception. Most libraries, given a complete JavaScript object in memory, will just JSON.stringify it: you've already paid the memory and I/O price to load it into RAM, so why bother streaming?
You could try the oboe streaming JSON library, which specializes in this type of thing. Here's a working example:
var oboe = require("oboe");
var args = {foo: "bar"};
var url = "http://localhost:2998";
oboe({url: url, body: args, method: "post"}).on("done", function (response) {
console.log('response:', response);
});
Aside: instead of guessing, you could verify in the source exactly what is happening. It's open source. It's javascript. Go ahead and dig in!
Updating answer:
I'd suggest you try two things:
See how much time superagent is taking to post the same data. Superagent is as simple as request.
var request = require('superagent');

request
  .post(url)
  .send(args)
  .set('Accept', 'application/json')
  .end(function(error, res) {
  });
Compress the data to be posted using zlib: you will be compressing the data into a gzipped buffer and writing that as your output.
var zlib = require('zlib');
var http = require('http');

var options = {
  hostname: 'www.yourwebsite.com',
  port: 80,
  path: '/your-post-url',
  method: 'POST',
  headers: {'Content-Encoding': 'gzip'} // tell the server that the data is compressed
};

// args is your JSON stringified data
zlib.gzip(args, function (err, buffer) {
  var req = http.request(options, function(res) {
    res.setEncoding('utf8');
    res.on('data', function (chunk) {
      // ... do stuff with returned data
    });
  });
  req.on('error', function(e) {
    console.log('problem with request: ' + e.message);
  });
  req.write(buffer); // send compressed data
  req.end();
});
