Node.js encoding error when fetching an external site's content

I use the get method of the request module to fetch the content of an external site. If the site is encoded in UTF-8, everything works, but pages in other encodings such as Shift-JIS are displayed incorrectly:
function getExternalUrl(request, response, url) {
  mod_request.get(url, function (err, res, body) {
  //mod_request.get({uri: url, encoding: 'binary'}, function (err, res, body) {
    if (err) {
      console.log("\terr=" + err);
    } else {
      var result = res.body;
      // Process res.body
      response.write(result);
    }
    response.end();
  });
}
How can I get content of external site with correct encoding?

I found a way to do it:
Request the page with binary encoding:
var mod_request = require('request');
mod_request.get({ uri: url, encoding: 'binary', headers: headers }, function(err, res, body) {});
Create a Buffer from the binary string (inside the callback):
// Buffer.from replaces the deprecated new Buffer() constructor
var contentBuffer = Buffer.from(res.body, 'binary');
Detect the page's actual encoding with the detect-character-encoding npm package:
var mod_detect_character_encoding = require('detect-character-encoding');
var charsetMatch = mod_detect_character_encoding(contentBuffer);
Convert the page to UTF-8 with the iconv npm package:
var mod_iconv = require('iconv').Iconv;
var iconv = new mod_iconv(charsetMatch.encoding, 'utf-8');
var result = iconv.convert(contentBuffer).toString();
P.S. This approach only applies to text files (HTML, CSS, JS). Do not apply it to images or other non-text files.
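If the server declares its charset in the Content-Type header, detection can sometimes be skipped. The following is a minimal sketch of that alternative, not the method from the answer above. It assumes the body was fetched as a raw Buffer (for example with encoding: null in request) and that the Node.js build includes full ICU, so that TextDecoder knows legacy encodings such as Shift_JIS. The helper name decodeBody is hypothetical.

```javascript
// Hypothetical helper: decode a raw response body using the charset
// declared in the Content-Type header, falling back to UTF-8.
function decodeBody(buffer, contentType) {
  var match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  var label = match ? match[1] : 'utf-8';
  var decoder;
  try {
    decoder = new TextDecoder(label);
  } catch (e) {
    // Unknown or unsupported charset label: fall back to UTF-8.
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(buffer);
}

// The bytes 93 FA 96 7B are the Shift_JIS encoding of U+65E5 U+672C.
var sjisBytes = Buffer.from([0x93, 0xfa, 0x96, 0x7b]);
var text = decodeBody(sjisBytes, 'text/html; charset=Shift_JIS');
console.log(text);
```

Pages that declare the wrong charset, or none at all, still need detection as described in the answer, so this is best treated as a first attempt rather than a replacement.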

Related

Download image from express route

I have an Express server running with the following route:
exports.getUserFile = function (req, resp) {
  let filePath = path.join(__dirname, 'storage', req.params.fileName);
  resp.download(filePath);
}
In my web app I'm calling this route and trying to save the file locally using file-saver:
let req = request.get('/users/' + userId + '/files/' + file.name);
req.set('Authorization', 'Bearer ' + this.state.jsonWebToken);
req.end((err, resp) => {
let f = new File([resp.text], file.name, {type: resp.type});
fileSaver.saveAs(f);
});
If the file is plain text, this works fine, but for other file types such as images I'm not able to open the saved file (it's reported as corrupt).
Do I need to decode the data in some way first? What is the correct way to save the content of the file?

If you're using superagent to perform the requests, you can explicitly set the response type to "blob", which would prevent any attempts to decode the response data. The binary data will end up in resp.body:
req.responseType('blob').end((err, resp) => {
saveAs(resp.body, file.name);
});

I haven't used Express in a long time and I'm typing from a mobile device, but this looks like an encoding issue. It seems you are sending the raw image, so you will need to encode it in base64. Try something like:
// Here your saved file needs to be encoded as base64.
var img = Buffer.from(data, 'base64');
res.writeHead(200, {
  'Content-Type': 'image/png',
  'Content-Length': img.length
});
res.end(img);
Here data is your saved image. If the image renders, you can then add the headers for download or simply chain the download method.

If you want to download the image as an attachment from the page, you can use the response object:
exports.getUserFile = function (req, resp) {
  let filePath = path.join(__dirname, 'storage', req.params.fileName);
  var check = fs.readFileSync(filePath); // Image buffer read from the path.
  resp.attachment(req.params.fileName); // The name of the file to be saved as, e.g. Picture.jpg
  resp.end(check);
}
Reference:
http://expressjs.com/en/api.html#res.attachment
http://expressjs.com/en/api.html#res.end
Hope this helps.

How to decode a binary buffer to an image in node.js?

I'm receiving an image in a binary stream as shown below, however when I try to create a buffer with the following data the buffer appears to be empty. Is the problem that buffer doesn't understand this format?
V�q)�EB\u001599!F":"����\u000b��3��5%�L�\u0018��pO^::�~��m�<\u001e��L��k�%G�$b\u0003\u0011���=q�V=��A\u0018��O��U���m�B���\u00038�����0a�_��#\u001b����\f��(�3�\u0003���nGjr���Mt\�\u0014g����~�#�Q��
g�K��s��#C��\u001cS�`\u000bps�Gnzq�Rg�\fu���C\u0015�\u001d3�E.BI\u0007���
var buffer = new Buffer(req.body, 'binary')
console.log("BUFFER:" + buffer)
fs.writeFile('test.jpg', buffer, function(err,written){
if(err) console.log(err);
else {
console.log("Successfully written");
}
});

I think you should set the encoding when you call fs.writeFile, like this:
fs.writeFile('test.jpg', buffer, 'binary', function(err) { if (err) console.log(err); });

The problem was that body-parser doesn't parse content-type: octet-stream, and I was overriding the header so that the body was parsed as a URL-encoded form, which Buffer could not interpret even though I was able to log req.body. The middleware below handles content-type: application/octet-stream alongside body-parser.
app.use(function(req, res, next) {
var contentType = req.headers['content-type'] || ''
var mime = contentType.split(';')[0];
// Only use this middleware for content-type: application/octet-stream
if(mime != 'application/octet-stream') {
return next();
}
var data = '';
req.setEncoding('binary');
req.on('data', function(chunk) {
data += chunk;
});
req.on('end', function() {
req.rawBody = data;
next();
});
});
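Accumulating chunks in a 'binary' string works, but the body can also be kept as Buffers throughout, which avoids any string conversion. The following is a minimal sketch of that idea. The helper name collectRawBody is hypothetical, and the fake request built from EventEmitter is only there so the example runs without a server.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: gather every chunk of a request as a Buffer and
// hand the concatenated result to the callback once the stream ends.
function collectRawBody(req, done) {
  var chunks = [];
  req.on('data', function (chunk) {
    chunks.push(chunk);
  });
  req.on('end', function () {
    done(null, Buffer.concat(chunks));
  });
  req.on('error', function (err) {
    done(err);
  });
}

// Simulated request that delivers a JPEG header in two chunks.
var fakeReq = new EventEmitter();
var received = null;
collectRawBody(fakeReq, function (err, body) {
  received = body;
});
fakeReq.emit('data', Buffer.from([0xff, 0xd8]));
fakeReq.emit('data', Buffer.from([0xff, 0xe0]));
fakeReq.emit('end');
console.log(received);
```

Inside the middleware, the callback would assign the Buffer to req.rawBody and call next(). The result can then be passed to fs.writeFile directly.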

HTTP POST with a file upload

I am new to JavaScript/Protractor and am trying to write code that posts (uploads) a text file to a REST endpoint, roughly along the following lines. I cannot get it to work and do not know why it is failing. Can anyone please validate this or suggest a better solution, preferably with sample code? (I have searched online and found a lot of information, but was not able to apply any of it directly.)
var request = require('request');
var fs = require('fs');
var path = require('path');
var form = new FormData();
form.append('agency', 'California');
form.append('siteType', 'EF');
fileName = "test.txt";
var filePath = path.resolve(__dirname, "../resources/upload/" + fileName);
fs.writeFileSync(filePath,
"This is a test txt file");
form.append('file', fs.createReadStream(absolutePath));
request.post({url: restServiceUrl, formData: form},
function optionalCallback(err, httpResponse, body) {
if (err) {
console.error('upload failed:', err);
}
console.log('Upload successful! Server responded with:', body);
});

Node.js Server: Image Upload / Corruption Issues

So I'm trying to write a basic file server in Node.js, and all the images I've tried uploading and storing on it are coming back as corrupted. The problem seems to have something to do with the way that Node Buffers handle being converted to UTF-8 and back again (which I have to do in order to get the POST body headers out and away from the binary data).
Here's a simple Node server that shows my current approach and the problems I've been having:
var http = require('http');
var server = http.createServer(function(request, response) {
if (request.method === "GET") {
// on GET request, output a simple web page with a file upload form
var mypage = '<!doctype html><html><head><meta charset="utf-8">' +
'<title>Submit POST Form</title></head>\r\n<body>' +
'<form action="http://127.0.0.1:8008" method="POST" ' +
'enctype="multipart/form-data"> <input name="upload" ' +
'type="file"><p><button type="submit">Submit</button>' +
'</p></form></body></html>\r\n';
response.writeHead(200, {
"Content-Type": "text/html",
"Content-Length": mypage.length
});
response.end(mypage);
} else if (request.method === "POST") {
// if we have a return post request, let's capture it
var upload = new Buffer([]);
// get the data
request.on('data', function(chunk) {
// copy post data
upload = Buffer.concat([upload, chunk]);
});
// when we have all the data
request.on('end', function() {
// convert to UTF8 so we can pull out the post headers
var str = upload.toString('utf8');
// get post headers with a regular expression
var re = /(\S+)\r\nContent-Disposition:\s*form-data;\s*name="\w+";\s*filename="[^"]*"\r\nContent-Type: (\S+)\r\n\r\n/i,
reMatch = str.match(re);
var lengthOfHeaders = reMatch[0].length,
boundary = reMatch[1],
mimeType = reMatch[2];
// slice headers off top of post body
str = str.slice(lengthOfHeaders);
// remove the end boundary
str = str.replace("\r\n" + boundary + "--\r\n", '');
// convert back to buffer
var rawdata = new Buffer(str, 'utf8');
// echo back to client
response.writeHead(200, {
"Content-Type": mimeType
});
response.end(rawdata);
});
}
});
server.listen(8008);
console.log("server running on port 8008");
To test it, run the script in node and go to 127.0.0.1:8008 in your browser. Try uploading an image and submitting the form. The image comes back as corrupt every time -- even though the script should just be directly echoing the image data back to the browser.
So does anyone know what I'm doing wrong here? Is there a better way to handle POST body headers in Node that I haven't figured out yet? (And before anyone says anything, no, I don't want to use Express. I want to figure out and understand this problem.)

"The problem seems to have something to do with the way that Node Buffers handle being converted to UTF-8 and back again"
I think you are right about that. Converting the data to UTF-8 is a bad idea, but you can do it just to locate the positions of the headers and boundaries while keeping the original buffer untouched. Once you have those positions, copy the payload from the original buffer into a new buffer like this:
originalBuffer.copy(newBuffer,0, positionHeader, positionEndBoundary)
A complete example:
var http = require('http');
var fs = require('fs');
var connections = 0;
var server = http.createServer(function (req, res) {
connections++;
console.log(req.url,"connections: "+connections);
if(req.url == '/'){
res.writeHead(200, { 'content-type': 'text/html' });
res.end(
'<form action="/upload" enctype="multipart/form-data" method="post">' +
'<input type="file" name="upload" multiple="multiple"><br>' +
'<input type="submit" value="Upload">' +
'</form>'
);
}
var body = new Buffer([]);
if (req.url == '/upload') {
req.on('data', function (foo) {
//f.write(foo);
body = Buffer.concat([body,foo]);
if(isImage(body.toString())){
console.log("é imagem do tipo "+isImage(body.toString()));
}
else{
console.log("Não é imagem");
res.end("Não é imagem");
}
console.log(body.length, body.toString().length);
});
req.on('end', function () {
// console.log(req.headers);
//I converted the buffer to "utf 8" but i kept the original buffer
var str = body.toString();
console.log(str.length);
imageType = isImage(body.toString());
//get the index of the last header character
//I'm just using the string to find the postions to cut the headers and boundaries
var index = str.indexOf(imageType)+(imageType+"\r\n\r\n").length;
// var headers= str.slice(0,index).split(';');
// console.log(headers);
//Here comes the trick
/*
*I have to cut the last boundaries, so i use the lastIndexOf to cut the second boundary
* And maybe that is the corruption issues, because, I'm not sure, but I guess
* the UTF-8 format only use 7bits to represent all characters, and the buffer can use 8bits, or two hex,
*So, i need to take the difference here (body.length-str.length)
*/
var indexBoundayToBuffer = str.lastIndexOf('------WebKitFormBoundary')+(body.length-str.length);
console.log(index, indexBoundayToBuffer);
//maybe you can change this to use less memory, whatever
var newBuffer = Buffer.alloc(body.length);
/*
*And now use the index, and the indexBoudayToBuffer and you will have only the binary
*/
body.copy(newBuffer,0,index,indexBoundayToBuffer);
// f.end();
//file type
var type = imageType.substr("image/".length);
console.log("END");
fs.writeFile("nameFile."+type,newBuffer,function(err,ok){
if(err){
console.log(err);
return false;
}
res.end();
});
});
}
});
// Returns the first image MIME type found in str, or false if none is present.
function isImage(str){
  if(str.indexOf('image/png')!=-1) return 'image/png';
  else if(str.indexOf('image/jpeg')!=-1) return 'image/jpeg';
  else if(str.indexOf('image/bmp')!=-1) return 'image/bmp';
  else if(str.indexOf('image/gif')!=-1) return 'image/gif';
  else return false;
}
var port = process.env.PORT || 8080;
server.listen(port, function () {
console.log('Recording connections on port %s', port);
});
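The offset arithmetic between string positions and byte positions can be avoided by searching the Buffer itself, since Buffer.indexOf works on bytes. The following is a minimal sketch under simplifying assumptions: the body contains a single file part, and the boundary is already known (normally it comes from the request's Content-Type header). The helper name extractFirstPart is hypothetical.

```javascript
// Hypothetical helper: return the payload bytes of the first part of a
// multipart body without ever converting the payload to a string.
function extractFirstPart(body, boundary) {
  // The part headers end at the first blank line.
  var headerEnd = body.indexOf('\r\n\r\n');
  if (headerEnd === -1) return null;
  var start = headerEnd + 4;
  // The payload ends where the next boundary line begins.
  var end = body.indexOf('\r\n--' + boundary, start);
  if (end === -1) return null;
  return body.subarray(start, end);
}

// Build a small multipart body around bytes that are not valid UTF-8.
var payload = Buffer.from([0xff, 0x00, 0x80, 0x0d, 0x0a]);
var body = Buffer.concat([
  Buffer.from('--XYZ\r\n' +
    'Content-Disposition: form-data; name="upload"; filename="a.png"\r\n' +
    'Content-Type: image/png\r\n\r\n'),
  payload,
  Buffer.from('\r\n--XYZ--\r\n')
]);

var part = extractFirstPart(body, 'XYZ');
console.log(part.equals(payload)); // true
```

This only illustrates the byte-level slicing. It does not handle multiple parts or chunked parsing, which is what the dedicated multipart modules are for.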

You really shouldn't use regular expressions like that to parse multipart payloads as it can easily make trying to parse your image data very unreliable. There are modules on npm that parse forms for you such as busboy, multiparty, or formidable. None of them use regular expressions and they don't require Express.

Node.js base64 encode a downloaded image for use in data URI

Using Node v0.2.0 I am trying to fetch an image from a server, convert it into a base64 string and then embed it on the page in an image tag. I have the following code:
var express = require('express'),
request = require('request'),
sys = require('sys');
var app = express.createServer(
express.logger(),
express.bodyDecoder()
);
app.get('/', function(req, res){
if(req.param("url")) {
var url = unescape(req.param("url"));
request({uri:url}, function (error, response, body) {
if (!error && response.statusCode == 200) {
var data_uri_prefix = "data:" + response.headers["content-type"] + ";base64,";
var buf = new Buffer(body);
var image = buf.toString('base64');
image = data_uri_prefix + image;
res.send('<img src="'+image+'"/>');
}
});
}
});
app.listen(3000);
Note: This code requires "express" and "request". And of course, node. If you have npm installed, it should be as simple as "npm install express" or "npm install request".
Unfortunately, this doesn't work as expected. If I do the conversion with the Google logo, then I get the following at the beginning of the string:
77+9UE5HDQoaCgAAAA1JSERSAAABEwAAAF8IAwAAAO+/ve+/ve+/vSkAAAMAUExURQBzCw5xGiNmK0t+U++/vQUf77+9BiHvv70WKO+/vQkk77+9D
However if I use an online Base64 encoder with the same image, then it works perfectly. The string starts like this:
iVBORw0KGgoAAAANSUhEUgAAARMAAABfCAMAAAD8mtMpAAADAFBMVEUAcwsOcRojZitLflOWBR+aBiGQFiipCSS8DCm1Cya1FiyNKzexKTjDDSrLDS
Where am I going wrong that this isn't working correctly? I have tried so many different js base64 implementations and they all don't work in the same way. The only thing I can think of is that I am trying to convert the wrong thing into base64, but what should I convert if that is the case?

The problem is encoding and storing binary data in javascript strings. There's a pretty good section on this under Buffers at http://nodejs.org/api.html.
Unfortunately, the easiest way to fix this involved changing the request npm package. I had to add response.setEncoding('binary'); on line 66 just below var buffer; in /path/to/lib/node/.npm/request/active/package/lib/main.js. This will work fine for this request but not for others. You might want to hack it so that this is only set based on some other passed option.
I then changed var buf = new Buffer(body) to var buf = new Buffer(body, 'binary');. After this, everything worked fine.
Another way to do this, if you really don't want to touch the request package, would be to pass an object that implements Writable Stream as the responseBodyStream argument to request. This object would then store the streamed data from the response in its own buffer. Maybe there is a library that does this already; I'm not sure.
I'm going to leave it here for now, but feel free to comment if you want me to clarify anything.
EDIT
Check out comments. New solution at http://gist.github.com/583836
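The difference between the two base64 strings in the question can be reproduced with the eight-byte PNG signature alone. This is a small illustrative sketch, not code from the answer: decoding the bytes as UTF-8 replaces the invalid byte 0x89 with U+FFFD, while the 'binary' (latin1) encoding maps each byte to one character and back without loss.

```javascript
// The first eight bytes of every PNG file.
var pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Correct: encode the raw bytes directly.
var direct = pngSignature.toString('base64');

// Broken: decode as UTF-8 first. 0x89 is not valid UTF-8 on its own, so it
// becomes U+FFFD, which re-encodes as the three bytes EF BF BD.
var viaUtf8 = Buffer.from(pngSignature.toString('utf8'), 'utf8').toString('base64');

// Lossless: 'binary' maps each byte to a single character and back.
var viaBinary = Buffer.from(pngSignature.toString('binary'), 'binary').toString('base64');

console.log(direct);    // iVBORw0KGgo=
console.log(viaUtf8);   // 77+9UE5HDQoaCg==
console.log(viaBinary); // iVBORw0KGgo=
```

The corrupted output starts with 77+9UE5H, the same prefix as the broken string in the question, and the correct output starts with iVBORw0KGgo, matching the online encoder.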

The following code (available at https://gist.github.com/804225)
var URL = require('url'),
sURL = 'http://nodejs.org/logo.png',
oURL = URL.parse(sURL),
http = require('http'),
client = http.createClient(80, oURL.hostname),
request = client.request('GET', oURL.pathname, {'host': oURL.hostname})
;
request.end();
request.on('response', function (response)
{
var type = response.headers["content-type"],
prefix = "data:" + type + ";base64,",
body = "";
response.setEncoding('binary');
response.on('end', function () {
var base64 = new Buffer(body, 'binary').toString('base64'),
data = prefix + base64;
console.log(data);
});
response.on('data', function (chunk) {
if (response.statusCode == 200) body += chunk;
});
});
should also produce a data URI without requiring any external modules.

This works for me using request:
const url = 'http://host/image.png';
request.get({url : url, encoding: null}, (err, res, body) => {
if (!err) {
const type = res.headers["content-type"];
const prefix = "data:" + type + ";base64,";
const base64 = body.toString('base64');
const dataUri = prefix + base64;
}
});
No need for any intermediate buffers. The key is to set encoding to null.
