I am having serious problems decoding the message body of the emails I get using the Gmail API. I want to grab the message content and put it in a div. I am using a base64 decoder, which I know won't decode emails encoded differently, but I am not sure how to check an email to decide which decoder to use -- emails that say they are utf-8 encoded are successfully decoded by the base64 decoder, but not by a utf-8 decoder.
I've been researching email decoding for several days now, and I've learned that I am a little out of my league here. I haven't done much work with coding around email before. Here is the code I am using to get the emails:
gapi.client.load('gmail', 'v1', function() {
    var request = gapi.client.gmail.users.messages.list({
        labelIds: ['INBOX']
    });
    request.execute(function(resp) {
        document.getElementById('email-announcement').innerHTML =
            '<i>Hello! I am reading your <b>inbox</b> emails.</i><br><br>------<br>';
        var content = document.getElementById("message-list");
        if (resp.messages == null) {
            content.innerHTML = "<b>Your inbox is empty.</b>";
        } else {
            var encodings = 0;
            content.innerHTML = "";
            angular.forEach(resp.messages, function(message) {
                var email = gapi.client.gmail.users.messages.get({
                    'id': message.id
                });
                email.execute(function(stuff) {
                    if (stuff.payload == null) {
                        console.log("Payload null: " + message.id);
                        return; // skip this message instead of crashing below
                    }
                    var header = "";
                    var sender = "";
                    angular.forEach(stuff.payload.headers, function(item) {
                        if (item.name == "Subject") {
                            header = item.value;
                        }
                        if (item.name == "From") {
                            sender = item.value;
                        }
                    });
                    try {
                        var contents = "";
                        if (stuff.payload.parts == null) {
                            contents = base64.decode(stuff.payload.body.data);
                        } else {
                            contents = base64.decode(stuff.payload.parts[0].body.data);
                        }
                        content.innerHTML += '<b>Subject: ' + header + '</b><br>';
                        content.innerHTML += '<b>From: ' + sender + '</b><br>';
                        content.innerHTML += contents + "<br><br>";
                    } catch (err) {
                        console.log("Encoding error: " + encodings++);
                    }
                });
            });
        }
    });
});
I was performing some checks and debugging, so there are leftover console.logs and some other things that are only there for testing. Still, you can see here what I am trying to do.
What is the best way to decode the emails I pull from the Gmail API? Should I try to put the emails into <script> tags with charset and type attributes matching the email's encoding? I believe I remember that charset only works with a src attribute, which I wouldn't have here. Any suggestions?
For a prototype app I'm writing, the following code is working for me:
var base64 = require('js-base64').Base64;
// js-base64 is working fine for me.
var bodyData = message.payload.body.data;
// Simplified code: you'd need to check for multipart.
base64.decode(bodyData.replace(/-/g, '+').replace(/_/g, '/'));
// If you're going to use a different library other than js-base64,
// you may need to replace some characters before passing it to the decoder.
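Regarding the "check for multipart" comment above, here is a hedged sketch of how that check might look. It assumes a single-level parts array with a text/plain part present; real messages can nest parts, which would need a recursive walk:

var bodyData;
if (message.payload.parts) {
    // Multipart message: pick the text/plain part (assumed to exist
    // at the top level for this sketch).
    bodyData = message.payload.parts.filter(function(part) {
        return part.mimeType === 'text/plain';
    })[0].body.data;
} else {
    // Single-part message: the body sits directly on the payload.
    bodyData = message.payload.body.data;
}
var text = base64.decode(bodyData.replace(/-/g, '+').replace(/_/g, '/'));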
Caution: these points are not explicitly documented and could be wrong:
The users.messages: get API returns "parsed body content" by default. This data seems to always be encoded in UTF-8 and Base64, regardless of the Content-Type and Content-Transfer-Encoding headers.
For example, my code had no problem parsing an email with these headers: Content-Type: text/plain; charset=ISO-2022-JP, Content-Transfer-Encoding: 7bit.
The mapping table of the Base64 encoding varies among implementations. The Gmail API uses - and _ as the last two characters of the table, as defined by RFC 4648's "URL and Filename safe Alphabet"1.
Check if your Base64 library is using a different mapping table. If so, replace those characters with the ones your library accepts before passing the body to the decoder.
1 There is one supportive line in the documentation: the "raw" format returns "body content as a base64url encoded string". (Thanks Eric!)
Use atob to decode the messages in JavaScript (see ref). For accessing your message payload, you can write a function:
var extractField = function(json, fieldName) {
return json.payload.headers.filter(function(header) {
return header.name === fieldName;
})[0].value;
};
var date = extractField(response, "Date");
var subject = extractField(response, "Subject");
referenced from my previous SO question, and for the message body:
var part = message.parts.filter(function(part) {
    return part.mimeType == 'text/html';
})[0];
var html = atob(part.body.data);
If the above does not decode 100% properly, the comments by @cgenco below this answer may apply to you. In that case, do
var html = atob(part.body.data.replace(/-/g, '+').replace(/_/g, '/'));
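One caveat on the snippets above: extractField and the parts.filter(...)[0] lookup both assume the header or part actually exists. If a message may lack them, guard against an undefined result before reading .value or .body.data.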
Here is the solution:
The Gmail API "Users.messages: get" method returns message.payload.body.data as base64url data, so it contains "-" (and "_") characters that a standard Base64 decoder rejects. It is not plain base64 encoded text; you have to replace those characters with "+" and "/" (or use a base64url decoder) before you can decode it to human-readable text.
You can manually check every part here https://www.base64decode.org
I was also annoyed by this. I discovered a solution by looking at a VSCode extension. The solution is really simple:
const body = response.data.payload.body; // the base64url encoded body of a message
const decoded = Buffer.from(body.data, "base64").toString(); // the decoded message
It worked for me with the gmail.users.messages.get() call of the Gmail API.
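(One note on the "base64" option: Node's Buffer decoder accepts both the standard and the URL-safe Base64 alphabets, which is why no character replacement is needed here, unlike with atob.)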
Please use a websafe decoder for decoding Gmail emails and attachments. I got blank pages when I used just a base64 decoder; I had to use this: https://www.npmjs.com/package/urlsafe-base64
I can easily decode using another tool at https://simplycalc.com/base64-decode.php
In JS: https://www.npmjs.com/package/base64url
In Python 3:
import base64
base64.urlsafe_b64decode(coded_string)
Thanks to @ento's answer. Let me explain more about why you need to replace the '-' and '_' characters with '+' and '/' before decoding.
The Wikipedia Base64 variants summary table shows:
RFC 4648 section 4: base64 (standard): uses '+' and '/'
RFC 4648 section 5: base64url (URL-safe and filename-safe standard): uses '-' and '_'
In short, the Gmail API uses the base64url (urlsafe) format ('-' and '_'), but the JavaScript atob function and other JavaScript libraries use the base64 (standard) format ('+' and '/').
For the Gmail API, the documentation says the body uses the base64url format; see the links below:
string/bytes type
MessagePartBody
RAW
For the Web atob/btoa standards, see the links below:
The algorithm used by atob() and btoa() is specified in RFC 4648, section 4
8.3 Base64 utility methods
Forgiving base64
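Putting the two points together, here is a minimal sketch of decoding a Gmail API body in the browser. decodeBody is a hypothetical helper name, not part of the Gmail API; it converts base64url to standard base64 for atob, then decodes the resulting bytes as UTF-8 instead of trusting atob's Latin-1 string directly:

function decodeBody(data) {
    // Convert base64url (RFC 4648 section 5) to standard base64 (section 4).
    var base64 = data.replace(/-/g, '+').replace(/_/g, '/');
    // atob yields a binary string; map it to bytes, then decode as UTF-8.
    var bytes = Uint8Array.from(atob(base64), function(c) {
        return c.charCodeAt(0);
    });
    return new TextDecoder('utf-8').decode(bytes);
}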
Related
I am encoding the view state in the hash using rison.
Here is an example URL:
http://example.com/board/projects#(date:'2019-01-24',projects:!(5441))
Here is how Gmail recognizes it:
http://example.com/board/projects#(date:'2019-01-24',projects:!(5441))
By the way, SE parser fails to recognize it properly as well:
http://example.com/board/projects#(date:'2019-01-24',projects:!(5441))
Even though all of the characters are valid URL characters, I am getting complaints from users that they can't send the link in Gmail (it is actually possible, it just doesn't get auto-linked).
Is there any other encoding library or method that would encode the JSON object in the hash in a way that is safe for parsers such as Gmail's?
Standard URI encoding should do the job.
const base = "http://example.com/board/projects"
const data = "(date:'2019-01-24',projects:!(5441))"
const encoded_data = encodeURIComponent(data);
const final = base + '#' + encoded_data;
console.log(final);
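On the reading side, a sketch of recovering the rison string, assuming the entire fragment after '#' is the encoded payload:

// Recover the rison string from the hash on page load.
const encoded = window.location.hash.slice(1); // drop the leading '#'
const rison = decodeURIComponent(encoded);
console.log(rison); // (date:'2019-01-24',projects:!(5441))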
I want to read a user's file and give them back a modified version of it. I use an input with type file to get the text file, but how can I get the charset of the loaded file? In different cases it can vary... The uploaded file has a .txt format or something similar and isn't .html :)
var handler = document.getElementById('handler');
var firstSub = document.getElementById('first-sub'); // assumed lookup; the original snippet references firstSub without defining it
var reader = new FileReader();

handler.addEventListener('click', function() {
    reader.readAsText(firstSub.files[0] /* , the correct charset goes here */);
});

reader.addEventListener("loadend", function() {
    console.dir(reader.result.split('\n'));
});
In my case (I made a small web app that accepts subtitle .srt files and removes time codes and line breaks, producing a printable text), it was enough to anticipate two encodings: UTF-8 and CP1251 (in all the cases I tried, with both Latin and Cyrillic letters, these two were enough). First I try decoding with UTF-8; if that is not successful, some characters come out replaced by '�' signs. So I check the result for the presence of these signs and, if any are found, repeat the procedure with the CP1251 encoding. Here is my code:
function onFileInputChange(inputDomElement, utf8 = true) {
    const file = inputDomElement.files[0];
    const reader = new FileReader();
    reader.readAsText(file, utf8 ? 'UTF-8' : 'CP1251');
    reader.onload = () => {
        const result = reader.result;
        if (utf8 && result.includes('�')) {
            onFileInputChange(inputDomElement, false);
            console.log('The file encoding is not utf-8! Trying CP1251...');
        } else {
            document.querySelector('#textarea1').value =
                file.name.replace(/\.(srt|txt)$/, '').replace(/_+/g, ' ').toUpperCase() +
                '\n' + result;
        }
    };
}
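This heuristic works because decoding as UTF-8 maps invalid byte sequences to U+FFFD ('�'), the replacement character. A file that legitimately contains '�' would trigger a false positive, but for subtitle files that is unlikely.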
You should check out this library: encoding.js.
They also have a working demo. I would suggest you first try it out with the files that you'll typically work with to see if it detects the encoding correctly and then use the library in your project.
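For reference, here is a rough sketch of what the detection could look like with encoding.js (function names taken from its README; treat the exact API as an assumption and verify against the demo):

// Read the file as raw bytes, detect the encoding, then convert to a string.
var reader = new FileReader();
reader.onload = function() {
    var bytes = new Uint8Array(reader.result);
    var detected = Encoding.detect(bytes); // e.g. 'UTF8', 'SJIS'
    var unicode = Encoding.convert(bytes, { to: 'UNICODE', from: detected });
    console.log(Encoding.codeToString(unicode)); // the decoded text
};
reader.readAsArrayBuffer(file); // 'file' is the File from your input element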
The other solutions didn't work for what I was trying to do, so I decided to create my own module that can detect the charset and language of any file loaded via input[type='file'] / FileReader API.
You load it via the <script> tag and then use the languageEncoding function to retrieve the charset/encoding:
// index.html
<script src="https://unpkg.com/detect-file-encoding-and-language/umd/language-encoding.min.js"></script>
// app.js
languageEncoding(file).then(fileInfo => console.log(fileInfo));
// Possible result: { language: english, encoding: UTF-8, confidence: { language: 0.96, encoding: 1 } }
For a more complete example/instructions check out this part of the documentation!
I'm trying to send a simple image file to a lambda function. Once it gets to the function I need to turn it into a buffer and then manipulate it. Right now when data is received there are a bunch of characters prepended to the data:
"body": "--X-INSOMNIA-BOUNDARY\r\nContent-Disposition: form-data; name=\"image\"; filename=\"americanflag.png\"\r\nContent-Type: image/png\r\n\r\n�PNG\r\n\n\rIHDR0�\b�;BIDATx��]u|����{g��.H\b^h�F)PJ�������WwwP���\"E��$!nk3���3�l���{��=�L�����=��=�|)����ٿ)��\"�$��q�����\r���'s��4����֦M��\"C�y��*U�YbUEc����|�ƼJ���#�=�/ �6���OD�p�����[�Q�D��\b�<hheB��&2���}�F�*�1M�u������BR�%\b�1RD�Q�������Q��}��R )%ĉ�Idv�髝�S��_W�Z�xSaZ��p�5k�{�|�\\�?
I have no idea how to handle that. My plan has just been to create a buffer as you normally would in Node: Buffer.from(data, 'utf8'). But it's throwing an error.
Things I've tried:
I've been testing the function with Insomnia and Postman, both with the same result.
I've gone with both a multipart/form and a binary file for the body of the request.
I've tried multiple image files.
I've set the header of content-type to image/png and other file types.
I've removed the headers.
I know that I could upload the files to S3, and that would be much easier, but it negates the point of what I'm writing. I don't want to store the images; I just want to manipulate them and then discard them.
This is what the response looks like when I send it back to myself.
Edit: The full code is uploaded. Again, I'm not sending via Node at this very moment; it's simply through Postman/Insomnia. If the answer is simply "write your own encoder" then please put that as an answer.
Because you did not upload the full code, I am posting an answer based on my best prediction. One of the following solutions may help you.
Encoding Base64 Strings:
'use strict';
let data = '`stackoverflow.com`';
let buff = Buffer.from(data);
let base64data = buff.toString('base64');
console.log('"' + data + '" converted to Base64 is "' + base64data + '"');
Decoding Base64 Strings:
'use strict';
let data = 'YHN0YWNrb3ZlcmZsb3cuY29tYA==';
let buff = Buffer.from(data, 'base64');
let text = buff.toString('ascii');
console.log('"' + data + '" converted from Base64 to ASCII is "' + text + '"');
Encoding binary data to base64 string:
'use strict';
const fs = require('fs');
let buff = fs.readFileSync('image-log.png');
let base64data = buff.toString('base64');
console.log('Image converted to base 64 is:\n\n' + base64data);
Decoding Base64 Strings to Binary Data:
'use strict';
const fs = require('fs');
let data = 'encoded binary string';
let buff = Buffer.from(data, 'base64');
fs.writeFileSync('image-log.png', buff);
console.log('Base64 image data converted to file: image-log.png');
Base64 encoding is a way of converting binary data into plain ASCII text. It is a very useful format for communication between systems that cannot easily handle binary data, such as images in HTML markup or web requests.
In Node.js the Buffer object can be used to encode and decode Base64 strings to and from many other formats, allowing you to easily convert data back and forth as needed.
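Relating this back to the original question: if the Lambda sits behind API Gateway, binary bodies only arrive intact when the API is configured for binary media types; the event then comes in base64 encoded and the raw bytes can be recovered with Buffer.from. A hedged sketch, assuming that configuration:

// Assumes API Gateway has a binary media type configured (e.g. image/png),
// so event.isBase64Encoded is true for image uploads.
exports.handler = async (event) => {
    const bytes = event.isBase64Encoded
        ? Buffer.from(event.body, 'base64') // raw request body bytes
        : Buffer.from(event.body, 'utf8');
    // ... manipulate the image here, then discard it ...
    return { statusCode: 200, body: 'ok' };
};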
I am submitting an emoji (from my Mac) in my HTTP POST request as:
😀
It is saved in my derby database as:
😀
And it is sent back in response as:
😀
I am a bit confused as to 1) whether my emoji and the character output are the same, just using different encodings, and 2) since my HTML uses the utf-8 tag, how can I have the browser display the emoji image?
More info:
I wrote a JUnit test as below:
System.out.println(Charset.defaultCharset()); <-- prints UTF-8
String str = "ðâ½ï¸";
System.out.println("testConvertToUtf8:" + new String(str.getBytes(UTF_8))); <-- prints the same garbled chars.
System.out.println("testConvertToUtf8:" + new String(str.getBytes(ISO_8859_1))); <-- displays emojis!!!
Why are emojis showing up when I encode the chars using ISO_8859_1? I am running on OS X El Capitan 10.11.6.
Please set your database's encoding to utf8mb4.
Basically my request encoding was getting changed from utf-8 to iso-8859-1 by Apache ServletFileUpload while reading multipart/form-data. I modified fileItem.getString() to fileItem.getString(charset) as below and that fixed my issue:
ServletFileUpload upload = new ServletFileUpload(factory);
items = upload.parseRequest(httpServletRequestHelper);
for (FileItem item : items) {
    if (item.isFormField()) {
        String charset = item.getContentType() == null ? "UTF-8" : item.getContentType();
        this.setStringAttribute(item.getFieldName(), new String[] { item.getString(charset) }, parseHash);
    }
}
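(For question 1: the garbled string is what the emoji's UTF-8 bytes look like when they are mis-decoded as ISO-8859-1. That is also why the JUnit test shows emojis after getBytes(ISO_8859_1): it recovers the original UTF-8 bytes, which the console then renders as UTF-8.)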
I'm coding a webhook for GitHub and implemented signature verification in Koa.js as:
function sign(tok, blob) {
var hmac;
hmac = crypto
.createHmac('sha1', tok)
.update(blob)
.digest('hex');
return 'sha1=' + hmac;
}
...
key = this.request.headers['x-hub-signature'];
blob = JSON.stringify(this.request.body);
if (!key || !blob) {
this.status = 400;
this.body = 'Bad Request';
}
lock = sign(settings.api_secret, blob);
if (lock !== key) {
console.log(symbols.warning, 'Unauthorized');
this.status = 403;
this.body = 'Unauthorized';
return;
}
...
For pull_request and create events this works OK (even pushing new branches works), but for push commit events the x-hub-signature and the hash computed from the payload don't match, so it always gets 403 Unauthorized.
Update
I've noticed that for this kind of push payloads the commits and head_commit are added to the payload. I've tried removing the commits and the head_commit from the body but it didn't work.
Update
For more information please review these example payloads. I've also included url for the test repo and token info: https://gist.github.com/marcoslhc/ec581f1a5ccdd80f8b33
The default encoding of crypto's hash.update() is binary, as detailed in the answer to "Node JS crypto, cannot create hmac on chars with accents". This causes a problem in your push-event payload, which contains the character U+00E1 LATIN SMALL LETTER A WITH ACUTE in "Hernández" four times, while GitHub is hashing the payload as UTF-8 encoded. Note that your Gist shows these incorrectly encoded in ISO-8859-1, so also make sure that you are handling the incoming request's character encoding properly (but this should happen by default).
To fix this you need to either use a Buffer:
hmac = crypto.createHmac('sha1', tok).update(Buffer.from(blob, 'utf-8')).digest('hex');
... or pass the encoding directly to update:
hmac = crypto.createHmac('sha1', tok).update(blob, 'utf-8').digest('hex');
The correct hash of 7f9e6014b7bddf5533494eff6a2c71c4ec7c042d will then be calculated.
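As a side note on the comparison itself, a constant-time check is a small extra hardening over !==. This uses Node's crypto.timingSafeEqual, which requires equal-length inputs:

// Optional: constant-time comparison of the two 'sha1=...' strings.
var valid = key.length === lock.length &&
    crypto.timingSafeEqual(Buffer.from(key), Buffer.from(lock));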