Decode Base64 issue in JavaScript [duplicate] - javascript

I'm using the Javascript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'm getting ASCII-encoded characters back (like ⢠instead of ™). How can I properly handle the incoming base64-encoded stream so that it's decoded as utf-8?

The Unicode Problem
Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).
Consider the following example:
const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61: occupies < 1 byte
const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte
console.log(btoa(ok)); // YQ==
console.log(btoa(notOK)); // error
Why do we encounter this?
Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.
Source: MDN (2021)
The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:
The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).
Solution with binary interoperability
(Keep scrolling for the ASCII base64 solution)
Source: MDN (2021)
The solution recommended by MDN is to actually encode to and from a binary string representation:
Encoding UTF8 ⇢ binary
// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
const codeUnits = new Uint16Array(string.length);
for (let i = 0; i < codeUnits.length; i++) {
codeUnits[i] = string.charCodeAt(i);
}
return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="
Decoding binary ⇢ UTF-8
function fromBinary(encoded) {
const binary = atob(encoded);
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < bytes.length; i++) {
bytes[i] = binary.charCodeAt(i);
}
return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"
Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.
Solution with ASCII base64 interoperability
The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.
There are two possible methods to solve this problem:
the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.
A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.
If you're trying to save yourself some time, you could also consider using a library:
js-base64 (NPM, great for Node.js)
base64-js
Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).
TypeScript support
Here's same solution with some additional TypeScript compatibility (via #MA-Maddin):
// Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode(parseInt(p1, 16))
}))
}
// Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
}).join(''))
}
The first solution (deprecated)
This used escape and unescape (which are now deprecated, though this still works in all modern browsers):
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}

Things change. The escape/unescape methods have been deprecated.
You can URI encode the string before you Base64-encode it. Note that this does't produce Base64-encoded UTF8, but rather Base64-encoded URL-encoded data. Both sides must agree on the same encoding.
See working example here: http://codepen.io/anon/pen/PZgbPW
// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'
For OP's problem a third party library such as js-base64 should solve the problem.

The complete article that works for me: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding
The part where we encode from Unicode/UTF-8 is
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
This is one of the most used methods nowadays.

If treating strings as bytes is more your thing, you can use the following functions
function u_atob(ascii) {
return Uint8Array.from(atob(ascii), c => c.charCodeAt(0));
}
function u_btoa(buffer) {
var binary = [];
var bytes = new Uint8Array(buffer);
for (var i = 0, il = bytes.byteLength; i < il; i++) {
binary.push(String.fromCharCode(bytes[i]));
}
return btoa(binary.join(''));
}
// example, it works also with astral plane characters such as '𝒞'
var encodedString = new TextEncoder().encode('✓');
var base64String = u_btoa(encodedString);
console.log('✓' === new TextDecoder().decode(u_atob(base64String)))

Decoding base64 to UTF8 String
Below is current most voted answer by #brandonscript
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
Above code can work, but it's very slow. If your input is a very large base64 string, for example 30,000 chars for a base64 html document. It will need lots of computation.
Here is my answer, use built-in TextDecoder, nearly 10x faster than above code for large input.
function decodeBase64(base64) {
const text = atob(base64);
const length = text.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = text.charCodeAt(i);
}
const decoder = new TextDecoder(); // default is utf-8
return decoder.decode(bytes);
}

Here is 2018 updated solution as described in the Mozilla Development Resources
TO ENCODE FROM UNICODE TO B64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
TO DECODE FROM B64 TO UNICODE
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"

I would assume that one might want a solution that produces a widely useable base64 URI. Please visit data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ to see a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). Despite the fact that this URI is base64-encoded, the browser is still able to recognize the high code points and decode them properly. The minified encoder+decoder is 1058 bytes (+Gzip→589 bytes)
!function(e){"use strict";function h(b){var a=b.charCodeAt(0);if(55296<=a&&56319>=a)if(b=b.charCodeAt(1),b===b&&56320<=b&&57343>=b){if(a=1024*(a-55296)+b-56320+65536,65535<a)return d(240|a>>>18,128|a>>>12&63,128|a>>>6&63,128|a&63)}else return d(239,191,189);return 127>=a?inputString:2047>=a?d(192|a>>>6,128|a&63):d(224|a>>>12,128|a>>>6&63,128|a&63)}function k(b){var a=b.charCodeAt(0)<<24,f=l(~a),c=0,e=b.length,g="";if(5>f&&e>=f){a=a<<f>>>24+f;for(c=1;c<f;++c)a=a<<6|b.charCodeAt(c)&63;65535>=a?g+=d(a):1114111>=a?(a-=65536,g+=d((a>>10)+55296,(a&1023)+56320)):c=0}for(;c<e;++c)g+="\ufffd";return g}var m=Math.log,n=Math.LN2,l=Math.clz32||function(b){return 31-m(b>>>0)/n|0},d=String.fromCharCode,p=atob,q=btoa;e.btoaUTF8=function(b,a){return q((a?"\u00ef\u00bb\u00bf":"")+b.replace(/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g,h))};e.atobUTF8=function(b,a){a||"\u00ef\u00bb\u00bf"!==b.substring(0,3)||(b=b.substring(3));return p(b).replace(/[\xc0-\xff][\x80-\xbf]*/g,k)}}(""+void 0==typeof global?""+void 0==typeof self?this:self:global)
Below is the source code used to generate it.
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){"use strict";
return function(inputString, BOMit){
return btoa((BOMit ? "\xEF\xBB\xBF" : "") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) // NaN because string is 1 code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff)
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
} else return fromCharCode(0xef, 0xbf, 0xbd);
}
if (point <= 0x007f) return nonAsciiChars;
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f));
} else return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
});
Then, to decode the base64 data, either HTTP get the data as a data URI or use the function below.
var clz32 = Math.clz32 || (function(log, LN2){"use strict";
return function(x) {return 31 - log(x >>> 0) / LN2 | 0};
})(Math.log, Math.LN2);
var fromCharCode = String.fromCharCode;
var atobUTF8 = (function(atob, replacer){"use strict";
return function(inputString, keepBOM){
inputString = atob(inputString);
if (!keepBOM && inputString.substring(0,3) === "\xEF\xBB\xBF")
inputString = inputString.substring(3); // eradicate UTF-8 BOM
// 0xc0 => 0b11000000; 0xff => 0b11111111; 0xc0-0xff => 0b11xxxxxx
// 0x80 => 0b10000000; 0xbf => 0b10111111; 0x80-0xbf => 0b10xxxxxx
return inputString.replace(/[\xc0-\xff][\x80-\xbf]*/g, replacer);
}
})(atob, function(encoded){"use strict";
var codePoint = encoded.charCodeAt(0) << 24;
var leadingOnes = clz32(~codePoint);
var endPos = 0, stringLen = encoded.length;
var result = "";
if (leadingOnes < 5 && stringLen >= leadingOnes) {
codePoint = (codePoint<<leadingOnes)>>>(24+leadingOnes);
for (endPos = 1; endPos < leadingOnes; ++endPos)
codePoint = (codePoint<<6) | (encoded.charCodeAt(endPos)&0x3f/*0b00111111*/);
if (codePoint <= 0xFFFF) { // BMP code point
result += fromCharCode(codePoint);
} else if (codePoint <= 0x10FFFF) {
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
codePoint -= 0x10000;
result += fromCharCode(
(codePoint >> 10) + 0xD800, // highSurrogate
(codePoint & 0x3ff) + 0xDC00 // lowSurrogate
);
} else endPos = 0; // to fill it in with INVALIDs
}
for (; endPos < stringLen; ++endPos) result += "\ufffd"; // replacement character
return result;
});
The advantage of being more standard is that this encoder and this decoder are more widely applicable because they can be used as a valid URL that displays correctly. Observe.
(function(window){
"use strict";
var sourceEle = document.getElementById("source");
var urlBarEle = document.getElementById("urlBar");
var mainFrameEle = document.getElementById("mainframe");
var gotoButton = document.getElementById("gotoButton");
var parseInt = window.parseInt;
var fromCodePoint = String.fromCodePoint;
var parse = JSON.parse;
function unescape(str){
return str.replace(/\\u[\da-f]{0,4}|\\x[\da-f]{0,2}|\\u{[^}]*}|\\[bfnrtv"'\\]|\\0[0-7]{1,3}|\\\d{1,3}/g, function(match){
try{
if (match.startsWith("\\u{"))
return fromCodePoint(parseInt(match.slice(2,-1),16));
if (match.startsWith("\\u") || match.startsWith("\\x"))
return fromCodePoint(parseInt(match.substring(2),16));
if (match.startsWith("\\0") && match.length > 2)
return fromCodePoint(parseInt(match.substring(2),8));
if (/^\\\d/.test(match)) return fromCodePoint(+match.slice(1));
}catch(e){return "\ufffd".repeat(match.length)}
return parse('"' + match + '"');
});
}
function whenChange(){
try{ urlBarEle.value = "data:text/plain;charset=UTF-8;base64," + btoaUTF8(unescape(sourceEle.value), true);
} finally{ gotoURL(); }
}
sourceEle.addEventListener("change",whenChange,{passive:1});
sourceEle.addEventListener("input",whenChange,{passive:1});
// IFrame Setup:
function gotoURL(){mainFrameEle.src = urlBarEle.value}
gotoButton.addEventListener("click", gotoURL, {passive: 1});
function urlChanged(){urlBarEle.value = mainFrameEle.src}
mainFrameEle.addEventListener("load", urlChanged, {passive: 1});
urlBarEle.addEventListener("keypress", function(evt){
if (evt.key === "enter") evt.preventDefault(), urlChanged();
}, {passive: 1});
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){
"use strict";
return function(inputString, BOMit){
return btoa((BOMit?"\xEF\xBB\xBF":"") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){
"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) { // NaN because string is 1code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
}
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff) {
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
} else {
return fromCharCode(0xef, 0xbf, 0xbd);
}
}
if (point <= 0x007f) { return inputString; }
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f/*00111111*/));
} else {
return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
});
setTimeout(whenChange, 0);
})(window);
img:active{opacity:0.8}
<center>
<textarea id="source" style="width:66.7vw">Hello \u1234 W\186\0256ld!
Enter text into the top box. Then the URL will update automatically.
</textarea><br />
<div style="width:66.7vw;display:inline-block;height:calc(25vw + 1em + 6px);border:2px solid;text-align:left;line-height:1em">
<input id="urlBar" style="width:calc(100% - 1em - 13px)" /><img id="gotoButton" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABsAAAAeCAMAAADqx5XUAAAAclBMVEX///9NczZ8e32ko6fDxsU/fBoSQgdFtwA5pAHVxt+7vLzq5ex23y4SXABLiiTm0+/c2N6DhoQ6WSxSyweVlZVvdG/Uz9aF5kYlbwElkwAggACxs7Jl3hX07/cQbQCar5SU9lRntEWGum+C9zIDHwCGnH5IvZAOAAABmUlEQVQoz7WS25acIBBFkRLkIgKKtOCttbv//xdDmTGZzHv2S63ltuBQQP4rdRiRUP8UK4wh6nVddQwj/NtDQTvac8577zTQb72zj65/876qqt7wykU6/1U6vFEgjE1mt/5LRqrpu7oVsn0sjZejMfxR3W/yLikqAFcUx93YxLmZGOtElmEu6Ufd9xV3ZDTGcEvGLbMk0mHHlUSvS5svCwS+hVL8loQQyfpI1Ay8RF/xlNxcsTchGjGDIuBG3Ik7TMyNxn8m0TSnBAK6Z8UZfp3IbAonmJvmsEACum6aNv7B0CnvpezDcNhw9XWsuAr7qnRg6dABmeM4dTgn/DZdXWs3LMspZ1KDMt1kcPJ6S1icWNp2qaEmjq6myx7jbQK3VKItLJaW5FR+cuYlRhYNKzGa9vF4vM5roLW3OSVjkmiGJrPhUq301/16pVKZRGFYWjTP50spTxBN5Z4EKnSonruk+n4tUokv1aJSEl/MLZU90S3L6/U6o0J142iQVp3HcZxKSo8LfkNRCtJaKYFSRX7iaoAAUDty8wvWYR6HJEepdwAAAABJRU5ErkJggg==" style="width:calc(1em + 4px);line-height:1em;vertical-align:-40%;cursor:pointer" />
<iframe id="mainframe" style="width:66.7vw;height:25vw" frameBorder="0"></iframe>
</div>
</center>
In addition to being very standardized, the above code snippets are also very fast. Instead of an indirect chain of succession where the data has to be converted several times between various forms (such as in Riccardo Galli's response), the above code snippet is as direct as performantly possible. It uses only one simple fast String.prototype.replace call to process the data when encoding, and only one to decode the data when decoding. Another plus is that (especially for big strings), String.prototype.replace allows the browser to automatically handle the underlying memory management of resizing the string, leading a significant performance boost especially in evergreen browsers like Chrome and Firefox that heavily optimize String.prototype.replace. Finally, the icing on the cake is that for you latin script exclūsīvō users, strings which don't contain any code points above 0x7f are extra fast to process because the string remains unmodified by the replacement algorithm.
I have created a github repository for this solution at https://github.com/anonyco/BestBase64EncoderDecoder/

If trying to decode a Base64 representation of utf8 encoded data in node, you can use the native Buffer helper
Buffer.from("4pyTIMOgIGxhIG1vZGU=", "base64").toString(); // '✓ à la mode'
The toString method of Buffer defaults to utf8, but you can specify any desired encoding. For example, the reverse operation would look like this
Buffer.from('✓ à la mode', "utf8").toString("base64"); // "4pyTIMOgIGxhIG1vZGU="

This is my one-liner solution combining Jackie Hans answer and some code from another question:
const utf8_encoded_text = new TextDecoder().decode(Uint8Array.from(window.atob(base_64_decoded_text).split("").map(x => x.charCodeAt(0))));

Here's some future-proof code for browsers that may lack escape/unescape(). Note that IE 9 and older don't support atob/btoa(), so you'd need to use custom base64 functions for them.
// Polyfill for escape/unescape
if( !window.unescape ){
window.unescape = function( s ){
return s.replace( /%([0-9A-F]{2})/g, function( m, p ) {
return String.fromCharCode( '0x' + p );
} );
};
}
if( !window.escape ){
window.escape = function( s ){
var chr, hex, i = 0, l = s.length, out = '';
for( ; i < l; i ++ ){
chr = s.charAt( i );
if( chr.search( /[A-Za-z0-9\#\*\_\+\-\.\/]/ ) > -1 ){
out += chr; continue; }
hex = s.charCodeAt( i ).toString( 16 );
out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
}
return out;
};
}
// Base64 encoding of UTF-8 strings
var utf8ToB64 = function( s ){
return btoa( unescape( encodeURIComponent( s ) ) );
};
var b64ToUtf8 = function( s ){
return decodeURIComponent( escape( atob( s ) ) );
};
A more comprehensive example for UTF-8 encoding and decoding can be found here: http://jsfiddle.net/47zwb41o/

Small correction, unescape and escape are deprecated, so:
function utf8_to_b64( str ) {
return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}
function b64_to_utf8( str ) {
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}

including above solution if still facing issue try as below, Considerign the case where escape is not supported for TS.
blob = new Blob(["\ufeff", csv_content]); // this will make symbols to appears in excel
for csv_content you can try like below.
function b64DecodeUnicode(str: any) {
return decodeURIComponent(atob(str).split('').map((c: any) => {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}

Related

How to use javascript (in Angular) to get bytes encoded by java.util.Base64? [duplicate]

I need to convert a base64 encode string into an ArrayBuffer.
The base64 strings are user input, they will be copy and pasted from an email, so they're not there when the page is loaded.
I would like to do this in javascript without making an ajax call to the server if possible.
I found those links interesting, but they didt'n help me:
ArrayBuffer to base64 encoded string
this is about the opposite conversion, from ArrayBuffer to base64, not the other way round
http://jsperf.com/json-vs-base64/2
this looks good but i can't figure out how to use the code.
Is there an easy (maybe native) way to do the conversion? thanks
Try this:
function _base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
Using TypedArray.from:
Uint8Array.from(atob(base64_string), c => c.charCodeAt(0))
Performance to be compared with the for loop version of Goran.it answer.
For Node.js users:
const myBuffer = Buffer.from(someBase64String, 'base64');
myBuffer will be of type Buffer which is a subclass of Uint8Array. Unfortunately, Uint8Array is NOT an ArrayBuffer as the OP was asking for. But when manipulating an ArrayBuffer I almost always wrap it with Uint8Array or something similar, so it should be close to what's being asked for.
Goran.it's answer does not work because of unicode problem in javascript - https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding.
I ended up using the function given on Daniel Guerrero's blog: http://blog.danguer.com/2011/10/24/base64-binary-decoding-in-javascript/
Function is listed on github link: https://github.com/danguer/blog-examples/blob/master/js/base64-binary.js
Use these lines
var uintArray = Base64Binary.decode(base64_string);
var byteArray = Base64Binary.decodeArrayBuffer(base64_string);
Async solution, it's better when the data is big:
// base64 to buffer
function base64ToBufferAsync(base64) {
var dataUrl = "data:application/octet-binary;base64," + base64;
fetch(dataUrl)
.then(res => res.arrayBuffer())
.then(buffer => {
console.log("base64 to buffer: " + new Uint8Array(buffer));
})
}
// buffer to base64
function bufferToBase64Async( buffer ) {
var blob = new Blob([buffer], {type:'application/octet-binary'});
console.log("buffer to blob:" + blob)
var fileReader = new FileReader();
fileReader.onload = function() {
var dataUrl = fileReader.result;
console.log("blob to dataUrl: " + dataUrl);
var base64 = dataUrl.substr(dataUrl.indexOf(',')+1)
console.log("dataUrl to base64: " + base64);
};
fileReader.readAsDataURL(blob);
}
Javascript is a fine development environment so it seems odd than it doesn't provide a solution to this small problem. The solutions offered elsewhere on this page are potentially slow. Here is my solution. It employs the inbuilt functionality that decodes base64 image and sound data urls.
var req = new XMLHttpRequest;
req.open('GET', "data:application/octet;base64," + base64Data);
req.responseType = 'arraybuffer';
req.onload = function fileLoaded(e)
{
var byteArray = new Uint8Array(e.target.response);
// var shortArray = new Int16Array(e.target.response);
// var unsignedShortArray = new Int16Array(e.target.response);
// etc.
}
req.send();
The send request fails if the base 64 string is badly formed.
The mime type (application/octet) is probably unnecessary.
Tested in chrome. Should work in other browsers.
Pure JS - no string middlestep (no atob)
I write following function which convert base64 in direct way (without conversion to string at the middlestep). IDEA
get 4 base64 characters chunk
find index of each character in base64 alphabet
convert index to 6-bit number (binary string)
join four 6 bit numbers which gives 24-bit numer (stored as binary string)
split 24-bit string to three 8-bit and covert each to number and store them in output array
corner case: if input base64 string ends with one/two = char, remove one/two numbers from output array
Below solution allows to process large input base64 strings. Similar function for convert bytes to base64 without btoa is HERE
function base64ToBytesArr(str) {
const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
let result = [];
for(let i=0; i<str.length/4; i++) {
let chunk = [...str.slice(4*i,4*i+4)]
let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join('');
let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
}
return result;
}
// --------
// TEST
// --------
let test = "Alice's Adventure in Wonderland.";
console.log('test string:', test.length, test);
let b64_btoa = btoa(test);
console.log('encoded string:', b64_btoa);
let decodedBytes = base64ToBytesArr(b64_btoa); // decode base64 to array of bytes
console.log('decoded bytes:', JSON.stringify(decodedBytes));
let decodedTest = decodedBytes.map(b => String.fromCharCode(b) ).join``;
console.log('Uint8Array', JSON.stringify(new Uint8Array(decodedBytes)));
console.log('decoded string:', decodedTest.length, decodedTest);
Caution!
If you want to decode base64 to STRING (not bytes array) and you know that result contains utf8 characters then atob will fail in general e.g. for character 💩 the atob("8J+SqQ==") will give wrong result . In this case you can use above solution and convert result bytes array to string in proper way e.g. :
function base64ToBytesArr(str) {
const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
let result = [];
for(let i=0; i<str.length/4; i++) {
let chunk = [...str.slice(4*i,4*i+4)]
let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join('');
let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
}
return result;
}
// --------
// TEST
// --------
let testB64 = "8J+SqQ=="; // for string: "💩";
console.log('input base64 :', testB64);
let decodedBytes = base64ToBytesArr(testB64); // decode base64 to array of bytes
console.log('decoded bytes :', JSON.stringify(decodedBytes));
let result = new TextDecoder("utf-8").decode(new Uint8Array(decodedBytes));
console.log('properly decoded string :', result);
let result_atob = atob(testB64);
console.log('decoded by atob :', result_atob);
Snippets tested 2022-08-04 on: chrome 103.0.5060.134 (arm64), safari 15.2, firefox 103.0.1 (64 bit), edge 103.0.1264.77 (arm64), and node-js v12.16.1
I would strongly suggest using an npm package implementing correctly the base64 specification.
The best one I know is rfc4648
The problem is that btoa and atob use binary strings instead of Uint8Array and trying to convert to and from it is cumbersome. Also there is a lot of bad packages in npm for that. I lose a lot of time before finding that one.
The creators of that specific package did a simple thing: they took the specification of Base64 (which is here by the way) and implemented it correctly from the beginning to the end. (Including other formats in the specification that are also useful like Base64-url, Base32, etc ...) That doesn't seem a lot but apparently that was too much to ask to the bunch of other libraries.
So yeah, I know I'm doing a bit of proselytism but if you want to avoid losing your time too just use rfc4648.
I used the accepted answer to this question to create base64Url string <-> arrayBuffer conversions in the realm of base64Url data transmitted via ASCII-cookie [atob, btoa are base64[with +/]<->js binary string], so I decided to post the code.
Many of us may want both conversions and client-server communication may use the base64Url version (though a cookie may contain +/ as well as -_ characters if I understand well, only ",;\ characters and some wicked characters from the 128 ASCII are disallowed). But a url cannot contain / character, hence the wider use of b64 url version which of course not what atob-btoa supports...
Seeing other comments, I would like to stress that my use case here is base64Url data transmission via url/cookie and trying to use this crypto data with the js crypto api (2017) hence the need for ArrayBuffer representation and b64u <-> arrBuff conversions... if array buffers represent other than base64 (part of ascii) this conversion wont work since atob, btoa is limited to ascii(128). Check out an appropriate converter like below:
The buff -> b64u version is from a tweet from Mathias Bynens, thanks for that one (too)! He also wrote a base64 encoder/decoder:
https://github.com/mathiasbynens/base64
Coming from java, it may help when trying to understand the code that java byte[] is practically js Int8Array (signed int) but we use here the unsigned version Uint8Array since js conversions work with them. They are both 256bit, so we call it byte[] in js now...
The code is from a module class, that is why static.
//utility
/**
* Array buffer to base64Url string
* - arrBuff->byte[]->biStr->b64->b64u
* #param arrayBuffer
* #returns {string}
* #private
*/
static _arrayBufferToBase64Url(arrayBuffer) {
console.log('base64Url from array buffer:', arrayBuffer);
let base64Url = window.btoa(String.fromCodePoint(...new Uint8Array(arrayBuffer)));
base64Url = base64Url.replaceAll('+', '-');
base64Url = base64Url.replaceAll('/', '_');
console.log('base64Url:', base64Url);
return base64Url;
}
/**
* Base64Url string to array buffer
* - b64u->b64->biStr->byte[]->arrBuff
* #param base64Url
* #returns {ArrayBufferLike}
* #private
*/
static _base64UrlToArrayBuffer(base64Url) {
console.log('array buffer from base64Url:', base64Url);
let base64 = base64Url.replaceAll('-', '+');
base64 = base64.replaceAll('_', '/');
const binaryString = window.atob(base64);
const length = binaryString.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
console.log('array buffer:', bytes.buffer);
return bytes.buffer;
}
made a ArrayBuffer from a base64:
function base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
I was trying to use above code and It's working fine.
The result of atob is a string that is separated with some comma
,
A simpler way is to convert this string to a json array string and after that parse it to a byteArray
below code can simply be used to convert base64 to an array of number
let byteArray = JSON.parse('['+atob(base64)+']');
let buffer = new Uint8Array(byteArray);
Solution without atob
I've seen many people complaining about using atob and btoa in the replies. There are some issues to take into account when using them.
There's a solution without using them in the MDN page about Base64. Below you can find the code to convert a base64 string into a Uint8Array copied from the docs.
Note that the function below returns a Uint8Array. To get the ArrayBuffer version you just need to do uintArray.buffer.
function b64ToUint6(nChr) {
return nChr > 64 && nChr < 91
? nChr - 65
: nChr > 96 && nChr < 123
? nChr - 71
: nChr > 47 && nChr < 58
? nChr + 4
: nChr === 43
? 62
: nChr === 47
? 63
: 0;
}
function base64DecToArr(sBase64, nBlocksSize) {
const sB64Enc = sBase64.replace(/[^A-Za-z0-9+/]/g, "");
const nInLen = sB64Enc.length;
const nOutLen = nBlocksSize
? Math.ceil(((nInLen * 3 + 1) >> 2) / nBlocksSize) * nBlocksSize
: (nInLen * 3 + 1) >> 2;
const taBytes = new Uint8Array(nOutLen);
let nMod3;
let nMod4;
let nUint24 = 0;
let nOutIdx = 0;
for (let nInIdx = 0; nInIdx < nInLen; nInIdx++) {
nMod4 = nInIdx & 3;
nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << (6 * (3 - nMod4));
if (nMod4 === 3 || nInLen - nInIdx === 1) {
nMod3 = 0;
while (nMod3 < 3 && nOutIdx < nOutLen) {
taBytes[nOutIdx] = (nUint24 >>> ((16 >>> nMod3) & 24)) & 255;
nMod3++;
nOutIdx++;
}
nUint24 = 0;
}
}
return taBytes;
}
If you're interested in the reverse operation, ArrayBuffer to base64, you can find how to do it in the same link.

Invalid base64 decoding [duplicate]

I'm using the Javascript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'm getting ASCII-encoded characters back (like ⢠instead of ™). How can I properly handle the incoming base64-encoded stream so that it's decoded as utf-8?
The Unicode Problem
Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).
Consider the following example:
const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61: occupies < 1 byte
const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte
console.log(btoa(ok)); // YQ==
console.log(btoa(notOK)); // error
Why do we encounter this?
Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.
Source: MDN (2021)
The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:
The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).
Solution with binary interoperability
(Keep scrolling for the ASCII base64 solution)
Source: MDN (2021)
The solution recommended by MDN is to actually encode to and from a binary string representation:
Encoding UTF8 ⇢ binary
// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
const codeUnits = new Uint16Array(string.length);
for (let i = 0; i < codeUnits.length; i++) {
codeUnits[i] = string.charCodeAt(i);
}
return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="
Decoding binary ⇢ UTF-8
function fromBinary(encoded) {
const binary = atob(encoded);
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < bytes.length; i++) {
bytes[i] = binary.charCodeAt(i);
}
return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"
Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.
Solution with ASCII base64 interoperability
The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.
There are two possible methods to solve this problem:
the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.
A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.
If you're trying to save yourself some time, you could also consider using a library:
js-base64 (NPM, great for Node.js)
base64-js
Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).
TypeScript support
Here's same solution with some additional TypeScript compatibility (via #MA-Maddin):
// Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode(parseInt(p1, 16))
}))
}
// Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
}).join(''))
}
The first solution (deprecated)
This used escape and unescape (which are now deprecated, though this still works in all modern browsers):
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}
Things change. The escape/unescape methods have been deprecated.
You can URI encode the string before you Base64-encode it. Note that this does't produce Base64-encoded UTF8, but rather Base64-encoded URL-encoded data. Both sides must agree on the same encoding.
See working example here: http://codepen.io/anon/pen/PZgbPW
// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'
For OP's problem a third party library such as js-base64 should solve the problem.
The complete article that works for me: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding
The part where we encode from Unicode/UTF-8 is
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
This is one of the most used methods nowadays.
If treating strings as bytes is more your thing, you can use the following functions
function u_atob(ascii) {
return Uint8Array.from(atob(ascii), c => c.charCodeAt(0));
}
function u_btoa(buffer) {
var binary = [];
var bytes = new Uint8Array(buffer);
for (var i = 0, il = bytes.byteLength; i < il; i++) {
binary.push(String.fromCharCode(bytes[i]));
}
return btoa(binary.join(''));
}
// example, it works also with astral plane characters such as '𝒞'
var encodedString = new TextEncoder().encode('✓');
var base64String = u_btoa(encodedString);
console.log('✓' === new TextDecoder().decode(u_atob(base64String)))
Decoding base64 to UTF8 String
Below is current most voted answer by #brandonscript
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
Above code can work, but it's very slow. If your input is a very large base64 string, for example 30,000 chars for a base64 html document. It will need lots of computation.
Here is my answer, use built-in TextDecoder, nearly 10x faster than above code for large input.
function decodeBase64(base64) {
const text = atob(base64);
const length = text.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = text.charCodeAt(i);
}
const decoder = new TextDecoder(); // default is utf-8
return decoder.decode(bytes);
}
Here is 2018 updated solution as described in the Mozilla Development Resources
TO ENCODE FROM UNICODE TO B64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
TO DECODE FROM B64 TO UNICODE
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
I would assume that one might want a solution that produces a widely useable base64 URI. Please visit data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ to see a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). Despite the fact that this URI is base64-encoded, the browser is still able to recognize the high code points and decode them properly. The minified encoder+decoder is 1058 bytes (+Gzip→589 bytes)
!function(e){"use strict";function h(b){var a=b.charCodeAt(0);if(55296<=a&&56319>=a)if(b=b.charCodeAt(1),b===b&&56320<=b&&57343>=b){if(a=1024*(a-55296)+b-56320+65536,65535<a)return d(240|a>>>18,128|a>>>12&63,128|a>>>6&63,128|a&63)}else return d(239,191,189);return 127>=a?inputString:2047>=a?d(192|a>>>6,128|a&63):d(224|a>>>12,128|a>>>6&63,128|a&63)}function k(b){var a=b.charCodeAt(0)<<24,f=l(~a),c=0,e=b.length,g="";if(5>f&&e>=f){a=a<<f>>>24+f;for(c=1;c<f;++c)a=a<<6|b.charCodeAt(c)&63;65535>=a?g+=d(a):1114111>=a?(a-=65536,g+=d((a>>10)+55296,(a&1023)+56320)):c=0}for(;c<e;++c)g+="\ufffd";return g}var m=Math.log,n=Math.LN2,l=Math.clz32||function(b){return 31-m(b>>>0)/n|0},d=String.fromCharCode,p=atob,q=btoa;e.btoaUTF8=function(b,a){return q((a?"\u00ef\u00bb\u00bf":"")+b.replace(/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g,h))};e.atobUTF8=function(b,a){a||"\u00ef\u00bb\u00bf"!==b.substring(0,3)||(b=b.substring(3));return p(b).replace(/[\xc0-\xff][\x80-\xbf]*/g,k)}}(""+void 0==typeof global?""+void 0==typeof self?this:self:global)
Below is the source code used to generate it.
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){"use strict";
return function(inputString, BOMit){
return btoa((BOMit ? "\xEF\xBB\xBF" : "") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) // NaN because string is 1 code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff)
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
} else return fromCharCode(0xef, 0xbf, 0xbd);
}
if (point <= 0x007f) return nonAsciiChars;
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f));
} else return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
});
Then, to decode the base64 data, either HTTP get the data as a data URI or use the function below.
var clz32 = Math.clz32 || (function(log, LN2){"use strict";
return function(x) {return 31 - log(x >>> 0) / LN2 | 0};
})(Math.log, Math.LN2);
var fromCharCode = String.fromCharCode;
var atobUTF8 = (function(atob, replacer){"use strict";
return function(inputString, keepBOM){
inputString = atob(inputString);
if (!keepBOM && inputString.substring(0,3) === "\xEF\xBB\xBF")
inputString = inputString.substring(3); // eradicate UTF-8 BOM
// 0xc0 => 0b11000000; 0xff => 0b11111111; 0xc0-0xff => 0b11xxxxxx
// 0x80 => 0b10000000; 0xbf => 0b10111111; 0x80-0xbf => 0b10xxxxxx
return inputString.replace(/[\xc0-\xff][\x80-\xbf]*/g, replacer);
}
})(atob, function(encoded){"use strict";
var codePoint = encoded.charCodeAt(0) << 24;
var leadingOnes = clz32(~codePoint);
var endPos = 0, stringLen = encoded.length;
var result = "";
if (leadingOnes < 5 && stringLen >= leadingOnes) {
codePoint = (codePoint<<leadingOnes)>>>(24+leadingOnes);
for (endPos = 1; endPos < leadingOnes; ++endPos)
codePoint = (codePoint<<6) | (encoded.charCodeAt(endPos)&0x3f/*0b00111111*/);
if (codePoint <= 0xFFFF) { // BMP code point
result += fromCharCode(codePoint);
} else if (codePoint <= 0x10FFFF) {
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
codePoint -= 0x10000;
result += fromCharCode(
(codePoint >> 10) + 0xD800, // highSurrogate
(codePoint & 0x3ff) + 0xDC00 // lowSurrogate
);
} else endPos = 0; // to fill it in with INVALIDs
}
for (; endPos < stringLen; ++endPos) result += "\ufffd"; // replacement character
return result;
});
The advantage of being more standard is that this encoder and this decoder are more widely applicable because they can be used as a valid URL that displays correctly. Observe.
(function(window){
"use strict";
var sourceEle = document.getElementById("source");
var urlBarEle = document.getElementById("urlBar");
var mainFrameEle = document.getElementById("mainframe");
var gotoButton = document.getElementById("gotoButton");
var parseInt = window.parseInt;
var fromCodePoint = String.fromCodePoint;
var parse = JSON.parse;
function unescape(str){
return str.replace(/\\u[\da-f]{0,4}|\\x[\da-f]{0,2}|\\u{[^}]*}|\\[bfnrtv"'\\]|\\0[0-7]{1,3}|\\\d{1,3}/g, function(match){
try{
if (match.startsWith("\\u{"))
return fromCodePoint(parseInt(match.slice(2,-1),16));
if (match.startsWith("\\u") || match.startsWith("\\x"))
return fromCodePoint(parseInt(match.substring(2),16));
if (match.startsWith("\\0") && match.length > 2)
return fromCodePoint(parseInt(match.substring(2),8));
if (/^\\\d/.test(match)) return fromCodePoint(+match.slice(1));
}catch(e){return "\ufffd".repeat(match.length)}
return parse('"' + match + '"');
});
}
function whenChange(){
try{ urlBarEle.value = "data:text/plain;charset=UTF-8;base64," + btoaUTF8(unescape(sourceEle.value), true);
} finally{ gotoURL(); }
}
sourceEle.addEventListener("change",whenChange,{passive:1});
sourceEle.addEventListener("input",whenChange,{passive:1});
// IFrame Setup:
function gotoURL(){mainFrameEle.src = urlBarEle.value}
gotoButton.addEventListener("click", gotoURL, {passive: 1});
function urlChanged(){urlBarEle.value = mainFrameEle.src}
mainFrameEle.addEventListener("load", urlChanged, {passive: 1});
urlBarEle.addEventListener("keypress", function(evt){
if (evt.key === "enter") evt.preventDefault(), urlChanged();
}, {passive: 1});
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){
"use strict";
return function(inputString, BOMit){
return btoa((BOMit?"\xEF\xBB\xBF":"") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){
"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) { // NaN because string is 1code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
}
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff) {
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
} else {
return fromCharCode(0xef, 0xbf, 0xbd);
}
}
if (point <= 0x007f) { return inputString; }
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f/*00111111*/));
} else {
return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
});
setTimeout(whenChange, 0);
})(window);
img:active{opacity:0.8}
<center>
<textarea id="source" style="width:66.7vw">Hello \u1234 W\186\0256ld!
Enter text into the top box. Then the URL will update automatically.
</textarea><br />
<div style="width:66.7vw;display:inline-block;height:calc(25vw + 1em + 6px);border:2px solid;text-align:left;line-height:1em">
<input id="urlBar" style="width:calc(100% - 1em - 13px)" /><img id="gotoButton" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABsAAAAeCAMAAADqx5XUAAAAclBMVEX///9NczZ8e32ko6fDxsU/fBoSQgdFtwA5pAHVxt+7vLzq5ex23y4SXABLiiTm0+/c2N6DhoQ6WSxSyweVlZVvdG/Uz9aF5kYlbwElkwAggACxs7Jl3hX07/cQbQCar5SU9lRntEWGum+C9zIDHwCGnH5IvZAOAAABmUlEQVQoz7WS25acIBBFkRLkIgKKtOCttbv//xdDmTGZzHv2S63ltuBQQP4rdRiRUP8UK4wh6nVddQwj/NtDQTvac8577zTQb72zj65/876qqt7wykU6/1U6vFEgjE1mt/5LRqrpu7oVsn0sjZejMfxR3W/yLikqAFcUx93YxLmZGOtElmEu6Ufd9xV3ZDTGcEvGLbMk0mHHlUSvS5svCwS+hVL8loQQyfpI1Ay8RF/xlNxcsTchGjGDIuBG3Ik7TMyNxn8m0TSnBAK6Z8UZfp3IbAonmJvmsEACum6aNv7B0CnvpezDcNhw9XWsuAr7qnRg6dABmeM4dTgn/DZdXWs3LMspZ1KDMt1kcPJ6S1icWNp2qaEmjq6myx7jbQK3VKItLJaW5FR+cuYlRhYNKzGa9vF4vM5roLW3OSVjkmiGJrPhUq301/16pVKZRGFYWjTP50spTxBN5Z4EKnSonruk+n4tUokv1aJSEl/MLZU90S3L6/U6o0J142iQVp3HcZxKSo8LfkNRCtJaKYFSRX7iaoAAUDty8wvWYR6HJEepdwAAAABJRU5ErkJggg==" style="width:calc(1em + 4px);line-height:1em;vertical-align:-40%;cursor:pointer" />
<iframe id="mainframe" style="width:66.7vw;height:25vw" frameBorder="0"></iframe>
</div>
</center>
In addition to being very standardized, the above code snippets are also very fast. Instead of an indirect chain of succession where the data has to be converted several times between various forms (such as in Riccardo Galli's response), the above code snippet is as direct as performantly possible. It uses only one simple fast String.prototype.replace call to process the data when encoding, and only one to decode the data when decoding. Another plus is that (especially for big strings), String.prototype.replace allows the browser to automatically handle the underlying memory management of resizing the string, leading a significant performance boost especially in evergreen browsers like Chrome and Firefox that heavily optimize String.prototype.replace. Finally, the icing on the cake is that for you latin script exclūsīvō users, strings which don't contain any code points above 0x7f are extra fast to process because the string remains unmodified by the replacement algorithm.
I have created a github repository for this solution at https://github.com/anonyco/BestBase64EncoderDecoder/
If trying to decode a Base64 representation of utf8 encoded data in node, you can use the native Buffer helper
Buffer.from("4pyTIMOgIGxhIG1vZGU=", "base64").toString(); // '✓ à la mode'
The toString method of Buffer defaults to utf8, but you can specify any desired encoding. For example, the reverse operation would look like this
Buffer.from('✓ à la mode', "utf8").toString("base64"); // "4pyTIMOgIGxhIG1vZGU="
This is my one-liner solution combining Jackie Hans answer and some code from another question:
const utf8_encoded_text = new TextDecoder().decode(Uint8Array.from(window.atob(base_64_decoded_text).split("").map(x => x.charCodeAt(0))));
Here's some future-proof code for browsers that may lack escape/unescape(). Note that IE 9 and older don't support atob/btoa(), so you'd need to use custom base64 functions for them.
// Polyfill for escape/unescape
if( !window.unescape ){
window.unescape = function( s ){
return s.replace( /%([0-9A-F]{2})/g, function( m, p ) {
return String.fromCharCode( '0x' + p );
} );
};
}
if( !window.escape ){
window.escape = function( s ){
var chr, hex, i = 0, l = s.length, out = '';
for( ; i < l; i ++ ){
chr = s.charAt( i );
if( chr.search( /[A-Za-z0-9\#\*\_\+\-\.\/]/ ) > -1 ){
out += chr; continue; }
hex = s.charCodeAt( i ).toString( 16 );
out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
}
return out;
};
}
// Base64 encoding of UTF-8 strings
var utf8ToB64 = function( s ){
return btoa( unescape( encodeURIComponent( s ) ) );
};
var b64ToUtf8 = function( s ){
return decodeURIComponent( escape( atob( s ) ) );
};
A more comprehensive example for UTF-8 encoding and decoding can be found here: http://jsfiddle.net/47zwb41o/
Small correction, unescape and escape are deprecated, so:
function utf8_to_b64( str ) {
return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}
function b64_to_utf8( str ) {
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
including above solution if still facing issue try as below, Considerign the case where escape is not supported for TS.
blob = new Blob(["\ufeff", csv_content]); // this will make symbols to appears in excel
for csv_content you can try like below.
function b64DecodeUnicode(str: any) {
return decodeURIComponent(atob(str).split('').map((c: any) => {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}

Size of json object? (in KBs/MBs)

How can I know that how much data is transferred over wire in terms of size in kilobytes, megabytes?
Take for example
{
'a': 1,
'b': 2
}
How do I know what is the size of this payload is and not the length or items in the object
UPDATE
content-encoding:gzip
content-type:application/json
Transfer-Encoding:chunked
vary:Accept-Encoding
An answer to the actual question should include the bytes spent on the headers and should include taking gzip compression into account, but I will ignore those things.
You have a few options. They all output the same answer when run:
If Using a Browser or Node (Not IE)
const size = new TextEncoder().encode(JSON.stringify(obj)).length
const kiloBytes = size / 1024;
const megaBytes = kiloBytes / 1024;
If you need it to work on IE, you can use a pollyfill
If Using Node
const size = Buffer.byteLength(JSON.stringify(obj))
(which is the same as Buffer.byteLength(JSON.stringify(obj), "utf8")).
Shortcut That Works in IE, Modern Browsers, and Node
const size = encodeURI(JSON.stringify(obj)).split(/%..|./).length - 1;
That last solution will work in almost every case, but that last solution will throw a URIError: URI malformed exception if you feed it input containing a string that should not exist, like let obj = { partOfAnEmoji: "👍🏽"[1] }. The other two solutions I provided will not have that weakness.
(Credits: Credit for the first solution goes here.
Credit for the second solution goes to the utf8-byte-length package (which is good, you could use that instead).
Most of the credit for that last solution goes to here, but I simplified it a bit.
I found the test suite of the utf8-byte-length package super helpful, when researching this.)
For ascii, you can count the characters if you do...
JSON.stringify({
'a': 1,
'b': 2
}).length
If you have special characters too, you can pass through a function for calculating length of UTF-8 characters...
function lengthInUtf8Bytes(str) {
// Matches only the 10.. bytes that are non-initial characters in a multi-byte sequence.
var m = encodeURIComponent(str).match(/%[89ABab]/g);
return str.length + (m ? m.length : 0);
}
Should be accurate...
var myJson = JSON.stringify({
'a': 1,
'b': 2,
'c': 'Máybë itß nºt that sîmple, though.'
})
// simply measuring character length of string is not enough...
console.log("Inaccurate for non ascii chars: "+myJson.length)
// pass it through UTF-8 length function...
console.log("Accurate for non ascii chars: "+ lengthInUtf8Bytes(myJson))
/* Should echo...
Inaccurate for non ascii chars: 54
Accurate for non ascii chars: 59
*/
Working demo
Here is a function which does the job.
function memorySizeOf(obj) {
var bytes = 0;
function sizeOf(obj) {
if(obj !== null && obj !== undefined) {
switch(typeof obj) {
case 'number':
bytes += 8;
break;
case 'string':
bytes += obj.length * 2;
break;
case 'boolean':
bytes += 4;
break;
case 'object':
var objClass = Object.prototype.toString.call(obj).slice(8, -1);
if(objClass === 'Object' || objClass === 'Array') {
for(var key in obj) {
if(!obj.hasOwnProperty(key)) continue;
sizeOf(obj[key]);
}
} else bytes += obj.toString().length * 2;
break;
}
}
return bytes;
};
function formatByteSize(bytes) {
if(bytes < 1024) return bytes + " bytes";
else if(bytes < 1048576) return(bytes / 1024).toFixed(3) + " KiB";
else if(bytes < 1073741824) return(bytes / 1048576).toFixed(3) + " MiB";
else return(bytes / 1073741824).toFixed(3) + " GiB";
};
return formatByteSize(sizeOf(obj));
};
console.log(memorySizeOf({"name": "john"}));
I got the snippet from following URL
https://gist.github.com/zensh/4975495
Buffer.from(JSON.stringify(obj)).length will give you the number of bytes.
JSON-objects are Javascript-objects, this SO question already shows a way to calculate the rough size. <- See Comments/Edits
EDIT: Your actual question is quite difficult as even if you could access the headers, content-size doesn't show the gzip'ed value as far as I know.
If you're lucky enough to have a good html5 browser, this contains a nice piece of code:
var xhr = new window.XMLHttpRequest();
//Upload progress
xhr.upload.addEventListener("progress", function(evt){
if (evt.lengthComputable) {
var percentComplete = evt.loaded / evt.total;
//Do something with upload progress
console.log(percentComplete);
}
}
Therefore the following should work (can't test right now):
var xhr = new window.XMLHttpRequest();
if (xhr.lengthComputable) {
alert(xhr.total);
}
Note that XMLHttpRequest is native and doesn't require jQuery.
Next EDIT:
I found a kind-of duplicate (slightly diffrent question, pretty much same answer). The important part for this question is
content_length = jqXHR.getResponseHeader("X-Content-Length");
Where jqXHR is your XmlHttpRequest.

Converting between strings and ArrayBuffers

Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I'd like to be able to write the contents of an ArrayBuffer to localStorage and then read it back.
Update 2016 - five years on there are now new methods in the specs (see support below) to convert between strings and typed arrays using proper encoding.
TextEncoder
The TextEncoder represents:
The TextEncoder interface represents an encoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, ... An encoder takes a stream of code points as input and
emits a stream of bytes.
Change note since the above was written: (ibid.)
Note: Firefox, Chrome and Opera used to have support for encoding
types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and
gbk). As of Firefox 48 [...], Chrome 54 [...] and Opera 41, no
other encoding types are available other than utf-8, in order to match
the spec.*
*) Updated specs (W3) and here (whatwg).
After creating an instance of the TextEncoder it will take a string and encode it using a given encoding parameter:
if (!("TextEncoder" in window))
alert("Sorry, this browser does not support TextEncoder...");
var enc = new TextEncoder(); // always utf-8
console.log(enc.encode("This is a string converted to a Uint8Array"));
You then of course use the .buffer parameter on the resulting Uint8Array to convert the underlaying ArrayBuffer to a different view if needed.
Just make sure that the characters in the string adhere to the encoding schema, for example, if you use characters outside the UTF-8 range in the example they will be encoded to two bytes instead of one.
For general use you would use UTF-16 encoding for things like localStorage.
TextDecoder
Likewise, the opposite process uses the TextDecoder:
The TextDecoder interface represents a decoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, ... A decoder takes a stream of bytes as input and emits
a stream of code points.
All available decoding types can be found here.
if (!("TextDecoder" in window))
alert("Sorry, this browser does not support TextDecoder...");
var enc = new TextDecoder("utf-8");
var arr = new Uint8Array([84,104,105,115,32,105,115,32,97,32,85,105,110,116,
56,65,114,114,97,121,32,99,111,110,118,101,114,116,
101,100,32,116,111,32,97,32,115,116,114,105,110,103]);
console.log(enc.decode(arr));
The MDN StringView library
An alternative to these is to use the StringView library (licensed as lgpl-3.0) which goal is:
to create a C-like interface for strings (i.e., an array of character codes — an ArrayBufferView in JavaScript) based upon the
JavaScript ArrayBuffer interface
to create a highly extensible library that anyone can extend by adding methods to the object StringView.prototype
to create a collection of methods for such string-like objects (since now: stringViews) which work strictly on arrays of numbers
rather than on creating new immutable JavaScript strings
to work with Unicode encodings other than JavaScript's default UTF-16 DOMStrings
giving much more flexibility. However, it would require us to link to or embed this library while TextEncoder/TextDecoder is being built-in in modern browsers.
Support
As of July/2018:
TextEncoder (Experimental, On Standard Track)
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | ? | - | 38
°) 18: Firefox 18 implemented an earlier and slightly different version
of the specification.
WEB WORKER SUPPORT:
Experimental, On Standard Track
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | ? | - | 38
Data from MDN - `npm i -g mdncomp` by epistemex
Although Dennis and gengkev solutions of using Blob/FileReader work, I wouldn't suggest taking that approach. It is an async approach to a simple problem, and it is much slower than a direct solution. I've made a post in html5rocks with a simpler and (much faster) solution:
http://updates.html5rocks.com/2012/06/How-to-convert-ArrayBuffer-to-and-from-String
And the solution is:
function ab2str(buf) {
return String.fromCharCode.apply(null, new Uint16Array(buf));
}
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i<strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
EDIT:
The Encoding API helps solving the string conversion problem. Check out the response from Jeff Posnik on Html5Rocks.com to the above original article.
Excerpt:
The Encoding API makes it simple to translate between raw bytes and native JavaScript strings, regardless of which of the many standard encodings you need to work with.
<pre id="results"></pre>
<script>
if ('TextDecoder' in window) {
// The local files to be fetched, mapped to the encoding that they're using.
var filesToEncoding = {
'utf8.bin': 'utf-8',
'utf16le.bin': 'utf-16le',
'macintosh.bin': 'macintosh'
};
Object.keys(filesToEncoding).forEach(function(file) {
fetchAndDecode(file, filesToEncoding[file]);
});
} else {
document.querySelector('#results').textContent = 'Your browser does not support the Encoding API.'
}
// Use XHR to fetch `file` and interpret its contents as being encoded with `encoding`.
function fetchAndDecode(file, encoding) {
var xhr = new XMLHttpRequest();
xhr.open('GET', file);
// Using 'arraybuffer' as the responseType ensures that the raw data is returned,
// rather than letting XMLHttpRequest decode the data first.
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
if (this.status == 200) {
// The decode() method takes a DataView as a parameter, which is a wrapper on top of the ArrayBuffer.
var dataView = new DataView(this.response);
// The TextDecoder interface is documented at http://encoding.spec.whatwg.org/#interface-textdecoder
var decoder = new TextDecoder(encoding);
var decodedString = decoder.decode(dataView);
// Add the decoded file's text to the <pre> element on the page.
document.querySelector('#results').textContent += decodedString + '\n';
} else {
console.error('Error while requesting', file, this);
}
};
xhr.send();
}
</script>
You can use TextEncoder and TextDecoder from the Encoding standard, which is polyfilled by the stringencoding library, to convert string to and from ArrayBuffers:
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
Blob is much slower than String.fromCharCode(null,array);
but that fails if the array buffer gets too big. The best solution I have found is to use String.fromCharCode(null,array); and split it up into operations that won't blow the stack, but are faster than a single char at a time.
The best solution for large array buffer is:
function arrayBufferToString(buffer){
var bufView = new Uint16Array(buffer);
var length = bufView.length;
var result = '';
var addition = Math.pow(2,16)-1;
for(var i = 0;i<length;i+=addition){
if(i + addition > length){
addition = length - i;
}
result += String.fromCharCode.apply(null, bufView.subarray(i,i+addition));
}
return result;
}
I found this to be about 20 times faster than using blob. It also works for large strings of over 100mb.
In case you have binary data in a string (obtained from nodejs + readFile(..., 'binary'), or cypress + cy.fixture(..., 'binary'), etc), you can't use TextEncoder. It supports only utf8. Bytes with values >= 128 are each turned into 2 bytes.
ES2015:
a = Uint8Array.from(s, x => x.charCodeAt(0))
Uint8Array(33) [2, 134, 140, 186, 82, 70, 108, 182, 233, 40, 143, 247, 29, 76, 245, 206, 29, 87, 48, 160, 78, 225, 242, 56, 236, 201, 80, 80, 152, 118, 92, 144, 48
s = String.fromCharCode.apply(null, a)
"ºRFl¶é(÷LõÎW0 Náò8ìÉPPv\0"
Based on the answer of gengkev, I created functions for both ways, because BlobBuilder can handle String and ArrayBuffer:
function string2ArrayBuffer(string, callback) {
var bb = new BlobBuilder();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result);
}
f.readAsArrayBuffer(bb.getBlob());
}
and
function arrayBuffer2String(buf, callback) {
var bb = new BlobBuilder();
bb.append(buf);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result)
}
f.readAsText(bb.getBlob());
}
A simple test:
string2ArrayBuffer("abc",
function (buf) {
var uInt8 = new Uint8Array(buf);
console.log(uInt8); // Returns `Uint8Array { 0=97, 1=98, 2=99}`
arrayBuffer2String(buf,
function (string) {
console.log(string); // returns "abc"
}
)
}
)
All the following is about getting binary strings from array buffers
I'd recommend not to use
var binaryString = String.fromCharCode.apply(null, new Uint8Array(arrayBuffer));
because it
crashes on big buffers (somebody wrote about "magic" size of 246300 but I got Maximum call stack size exceeded error on 120000 bytes buffer (Chrome 29))
it has really poor performance (see below)
If you exactly need synchronous solution use something like
var
binaryString = '',
bytes = new Uint8Array(arrayBuffer),
length = bytes.length;
for (var i = 0; i < length; i++) {
binaryString += String.fromCharCode(bytes[i]);
}
it is as slow as the previous one but works correctly. It seems that at the moment of writing this there is no quite fast synchronous solution for that problem (all libraries mentioned in this topic uses the same approach for their synchronous features).
But what I really recommend is using Blob + FileReader approach
function readBinaryStringFromArrayBuffer (arrayBuffer, onSuccess, onFail) {
var reader = new FileReader();
reader.onload = function (event) {
onSuccess(event.target.result);
};
reader.onerror = function (event) {
onFail(event.target.error);
};
reader.readAsBinaryString(new Blob([ arrayBuffer ],
{ type: 'application/octet-stream' }));
}
the only disadvantage (not for all) is that it is asynchronous. And it is about 8-10 times faster then previous solutions! (Some details: synchronous solution on my environment took 950-1050 ms for 2.4Mb buffer but solution with FileReader had times about 100-120 ms for the same amount of data. And I have tested both synchronous solutions on 100Kb buffer and they have taken almost the same time, so loop is not much slower the using 'apply'.)
BTW here: How to convert ArrayBuffer to and from String author compares two approaches like me and get completely opposite results (his test code is here) Why so different results? Probably because of his test string that is 1Kb long (he called it "veryLongStr"). My buffer was a really big JPEG image of size 2.4Mb.
Unlike the solutions here, I needed to convert to/from UTF-8 data. For this purpose, I coded the following two functions, using the (un)escape/(en)decodeURIComponent trick. They're pretty wasteful of memory, allocating 9 times the length of the encoded utf8-string, though those should be recovered by gc. Just don't use them for 100mb text.
function utf8AbFromStr(str) {
var strUtf8 = unescape(encodeURIComponent(str));
var ab = new Uint8Array(strUtf8.length);
for (var i = 0; i < strUtf8.length; i++) {
ab[i] = strUtf8.charCodeAt(i);
}
return ab;
}
function strFromUtf8Ab(ab) {
return decodeURIComponent(escape(String.fromCharCode.apply(null, ab)));
}
Checking that it works:
strFromUtf8Ab(utf8AbFromStr('latinкирилицаαβγδεζηあいうえお'))
-> "latinкирилицаαβγδεζηあいうえお"
(Update Please see the 2nd half of this answer, where I have (hopefully) provided a more complete solution.)
I also ran into this issue, the following works for me in FF 6 (for one direction):
var buf = new ArrayBuffer( 10 );
var view = new Uint8Array( buf );
view[ 3 ] = 4;
alert(Array.prototype.slice.call(view).join(""));
Unfortunately, of course, you end up with ASCII text representations of the values in the array, rather than characters. It still (should be) much more efficient than a loop, though.
eg. For the example above, the result is 0004000000, rather than several null chars & a chr(4).
Edit:
After looking on MDC here, you may create an ArrayBuffer from an Array as follows:
var arr = new Array(23);
// New Uint8Array() converts the Array elements
// to Uint8s & creates a new ArrayBuffer
// to store them in & a corresponding view.
// To get at the generated ArrayBuffer,
// you can then access it as below, with the .buffer property
var buf = new Uint8Array( arr ).buffer;
To answer your original question, this allows you to convert ArrayBuffer <-> String as follows:
var buf, view, str;
buf = new ArrayBuffer( 256 );
view = new Uint8Array( buf );
view[ 0 ] = 7; // Some dummy values
view[ 2 ] = 4;
// ...
// 1. Buffer -> String (as byte array "list")
str = bufferToString(buf);
alert(str); // Alerts "7,0,4,..."
// 1. String (as byte array) -> Buffer
buf = stringToBuffer(str);
alert(new Uint8Array( buf )[ 2 ]); // Alerts "4"
// Converts any ArrayBuffer to a string
// (a comma-separated list of ASCII ordinals,
// NOT a string of characters from the ordinals
// in the buffer elements)
function bufferToString( buf ) {
var view = new Uint8Array( buf );
return Array.prototype.join.call(view, ",");
}
// Converts a comma-separated ASCII ordinal string list
// back to an ArrayBuffer (see note for bufferToString())
function stringToBuffer( str ) {
var arr = str.split(",")
, view = new Uint8Array( arr );
return view.buffer;
}
For convenience, here is a function for converting a raw Unicode String to an ArrayBuffer (will only work with ASCII/one-byte characters)
function rawStringToBuffer( str ) {
var idx, len = str.length, arr = new Array( len );
for ( idx = 0 ; idx < len ; ++idx ) {
arr[ idx ] = str.charCodeAt(idx) & 0xFF;
}
// You may create an ArrayBuffer from a standard array (of values) as follows:
return new Uint8Array( arr ).buffer;
}
// Alerts "97"
alert(new Uint8Array( rawStringToBuffer("abc") )[ 0 ]);
The above allow you to go from ArrayBuffer -> String & back to ArrayBuffer again, where the string may be stored in eg. .localStorage :)
Hope this helps,
Dan
Just
const buffer = thisReturnsBuffers();
const blob = new Blob([buffer], {type: 'text/plain; charset=utf-8'});
blob.text().then(text => console.log(text));
Or
const stringVal = "string here";
const blob = new Blob([stringVal], {type: 'text/plain; charset=utf-8'});
blob.arrayBuffer().then(buffer => console.log(buffer));
Why are you all making this so complicated?
For node.js and also for browsers using https://github.com/feross/buffer
function ab2str(buf: Uint8Array) {
return Buffer.from(buf).toString('base64');
}
function str2ab(str: string) {
return new Uint8Array(Buffer.from(str, 'base64'))
}
Note: Solutions here didn't work for me. I need to support node.js and browsers and just serialize UInt8Array to a string. I could serialize it as a number[] but that occupies unnecessary space. With that solution I don't need to worry about encodings since it's base64. Just in case other people struggle with the same problem... My two cents
I found I had problems with this approach, basically because I was trying to write the output to a file and it was non encoded properly. Since JS seems to use UCS-2 encoding (source, source), we need to stretch this solution a step further, here's my enhanced solution that works to me.
I had no difficulties with generic text, but when it was down to Arab or Korean, the output file didn't have all the chars but instead was showing error characters
File output:
","10k unit":"",Follow:"Õ©íüY‹","Follow %{screen_name}":"%{screen_name}U“’Õ©íü",Tweet:"ĤüÈ","Tweet %{hashtag}":"%{hashtag} ’ĤüÈY‹","Tweet to %{name}":"%{name}U“xĤüÈY‹"},ko:{"%{followers_count} followers":"%{followers_count}…X \Ì","100K+":"100Ì tÁ","10k unit":"Ì è",Follow:"\°","Follow %{screen_name}":"%{screen_name} Ø \°X0",K:"œ",M:"1Ì",Tweet:"¸","Tweet %{hashtag}":"%{hashtag}
Original:
","10k unit":"万",Follow:"フォローする","Follow %{screen_name}":"%{screen_name}さんをフォロー",Tweet:"ツイート","Tweet %{hashtag}":"%{hashtag} をツイートする","Tweet to %{name}":"%{name}さんへツイートする"},ko:{"%{followers_count} followers":"%{followers_count}명의 팔로워","100K+":"100만 이상","10k unit":"만 단위",Follow:"팔로우","Follow %{screen_name}":"%{screen_name} 님 팔로우하기",K:"천",M:"백만",Tweet:"트윗","Tweet %{hashtag}":"%{hashtag}
I took the information from dennis' solution and this post I found.
Here's my code:
function encode_utf8(s) {
return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
return decodeURIComponent(escape(s));
}
function ab2str(buf) {
var s = String.fromCharCode.apply(null, new Uint8Array(buf));
return decode_utf8(decode_utf8(s))
}
function str2ab(str) {
var s = encode_utf8(str)
var buf = new ArrayBuffer(s.length);
var bufView = new Uint8Array(buf);
for (var i=0, strLen=s.length; i<strLen; i++) {
bufView[i] = s.charCodeAt(i);
}
return bufView;
}
This allows me to save the content to a file without encoding problems.
How it works:
It basically takes the single 8-byte chunks composing a UTF-8 character and saves them as single characters (therefore an UTF-8 character built in this way, could be composed by 1-4 of these characters).
UTF-8 encodes characters in a format that variates from 1 to 4 bytes in length. What we do here is encoding the sting in an URI component and then take this component and translate it in the corresponding 8 byte character. In this way we don't lose the information given by UTF8 characters that are more than 1 byte long.
if you used huge array example arr.length=1000000
you can this code to avoid stack callback problems
function ab2str(buf) {
var bufView = new Uint16Array(buf);
var unis =""
for (var i = 0; i < bufView.length; i++) {
unis=unis+String.fromCharCode(bufView[i]);
}
return unis
}
reverse function
mangini answer from top
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i<strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
Well, here's a somewhat convoluted way of doing the same thing:
var string = "Blah blah blah", output;
var bb = new (window.BlobBuilder||window.WebKitBlobBuilder||window.MozBlobBuilder)();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
}
f.readAsArrayBuffer(bb.getBlob());
Edit: BlobBuilder has long been deprecated in favor of the Blob constructor, which did not exist when I first wrote this post. Here's an updated version. (And yes, this has always been a very silly way to do the conversion, but it was just for fun!)
var string = "Blah blah blah", output;
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
};
f.readAsArrayBuffer(new Blob([string]));
The following is a working Typescript implementation:
bufferToString(buffer: ArrayBuffer): string {
return String.fromCharCode.apply(null, Array.from(new Uint16Array(buffer)));
}
stringToBuffer(value: string): ArrayBuffer {
let buffer = new ArrayBuffer(value.length * 2); // 2 bytes per char
let view = new Uint16Array(buffer);
for (let i = 0, length = value.length; i < length; i++) {
view[i] = value.charCodeAt(i);
}
return buffer;
}
I've used this for numerous operations while working with crypto.subtle.
Recently I also need to do this for one of my project so did a well research and got a result from Google's Developer community which states this in a simple manner:
For ArrayBuffer to String
function ab2str(buf) {
return String.fromCharCode.apply(null, new Uint16Array(buf));
}
// Here Uint16 can be different like Uinit8/Uint32 depending upon your buffer value type.
For String to ArrayBuffer
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i < strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
//Same here also for the Uint16Array.
For more in detail reference you can refer this blog by Google.
Let's say you have an arrayBuffer binaryStr:
let text = String.fromCharCode.apply(null, new Uint8Array(binaryStr));
and then you assign the text to the state.
I used this and works for me.
function arrayBufferToBase64( buffer ) {
var binary = '';
var bytes = new Uint8Array( buffer );
var len = bytes.byteLength;
for (var i = 0; i < len; i++) {
binary += String.fromCharCode( bytes[ i ] );
}
return window.btoa( binary );
}
function base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array( len );
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
stringToArrayBuffer(byteString) {
var byteArray = new Uint8Array(byteString.length);
for (var i = 0; i < byteString.length; i++) {
byteArray[i] = byteString.codePointAt(i);
}
return byteArray;
}
arrayBufferToString(buffer) {
var byteArray = new Uint8Array(buffer);
var byteString = '';
for (var i = 0; i < byteArray.byteLength; i++) {
byteString += String.fromCodePoint(byteArray[i]);
}
return byteString;
}
After playing with mangini's solution for converting from ArrayBuffer to String - ab2str (which is the most elegant and useful one I have found - thanks!), I had some issues when handling large arrays. More specefivally, calling String.fromCharCode.apply(null, new Uint16Array(buf)); throws an error:
arguments array passed to Function.prototype.apply is too large.
In order to solve it (bypass) I have decided to handle the input ArrayBuffer in chunks. So the modified solution is:
function ab2str(buf) {
var str = "";
var ab = new Uint16Array(buf);
var abLen = ab.length;
var CHUNK_SIZE = Math.pow(2, 16);
var offset, len, subab;
for (offset = 0; offset < abLen; offset += CHUNK_SIZE) {
len = Math.min(CHUNK_SIZE, abLen-offset);
subab = ab.subarray(offset, offset+len);
str += String.fromCharCode.apply(null, subab);
}
return str;
}
The chunk size is set to 2^16 because this was the size I have found to work in my development landscape. Setting a higher value caused the same error to reoccur. It can be altered by setting the CHUNK_SIZE variable to a different value. It is important to have an even number.
Note on performance - I did not make any performance tests for this solution. However, since it is based on the previous solution, and can handle large arrays, I see no reason why not to use it.
ArrayBuffer -> Buffer -> String(Base64)
Change ArrayBuffer to Buffer and then to String.
Buffer.from(arrBuffer).toString("base64");
See here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView
(a C-like interface for strings based upon the JavaScript ArrayBuffer interface)
The "native" binary string that atob() returns is a 1-byte-per-character Array.
So we shouldn't store 2 byte into a character.
var arrayBufferToString = function(buffer) {
return String.fromCharCode.apply(null, new Uint8Array(buffer));
}
var stringToArrayBuffer = function(str) {
return (new Uint8Array([].map.call(str,function(x){return x.charCodeAt(0)}))).buffer;
}
Yes:
const encstr = (`TextEncoder` in window) ? new TextEncoder().encode(str) : Uint8Array.from(str, c => c.codePointAt(0));
I'd recommend NOT using deprecated APIs like BlobBuilder
BlobBuilder has long been deprecated by the Blob object. Compare the code in Dennis' answer — where BlobBuilder is used — with the code below:
function arrayBufferGen(str, cb) {
var b = new Blob([str]);
var f = new FileReader();
f.onload = function(e) {
cb(e.target.result);
}
f.readAsArrayBuffer(b);
}
Note how much cleaner and less bloated this is compared to the deprecated method... Yeah, this is definitely something to consider here.
Use splat unpacking instead of loops:
arrbuf = new Uint8Array([104, 101, 108, 108, 111])
text = String.fromCharCode(...arrbuf)
console.log(text)
For substrings arrbuf.slice() can be employed.
For me this worked well.
static async hash(message) {
const data = new TextEncoder().encode(message);
const hashBuffer = await crypto.subtle.digest('SHA-256', data)
const hashArray = Array.from(new Uint8Array(hashBuffer))
const hashHex = hashArray.map((b) => b.toString(16).padStart(2, '0')).join('')
return hashHex
}
var decoder = new TextDecoder ();
var string = decoder.decode (arrayBuffer);
See https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode
From emscripten:
function stringToUTF8Array(str, outU8Array, outIdx, maxBytesToWrite) {
if (!(maxBytesToWrite > 0)) return 0;
var startIdx = outIdx;
var endIdx = outIdx + maxBytesToWrite - 1;
for (var i = 0; i < str.length; ++i) {
var u = str.charCodeAt(i);
if (u >= 55296 && u <= 57343) {
var u1 = str.charCodeAt(++i);
u = 65536 + ((u & 1023) << 10) | u1 & 1023
}
if (u <= 127) {
if (outIdx >= endIdx) break;
outU8Array[outIdx++] = u
} else if (u <= 2047) {
if (outIdx + 1 >= endIdx) break;
outU8Array[outIdx++] = 192 | u >> 6;
outU8Array[outIdx++] = 128 | u & 63
} else if (u <= 65535) {
if (outIdx + 2 >= endIdx) break;
outU8Array[outIdx++] = 224 | u >> 12;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
} else {
if (outIdx + 3 >= endIdx) break;
outU8Array[outIdx++] = 240 | u >> 18;
outU8Array[outIdx++] = 128 | u >> 12 & 63;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
}
}
outU8Array[outIdx] = 0;
return outIdx - startIdx
}
Use like:
stringToUTF8Array('abs', new Uint8Array(3), 0, 4);

Base64 encoding and decoding in client-side Javascript [duplicate]

This question already has answers here:
How can you encode a string to Base64 in JavaScript?
(33 answers)
Closed 1 year ago.
Are there any methods in JavaScript that could be used to encode and decode a string using base64 encoding?
Some browsers such as Firefox, Chrome, Safari, Opera and IE10+ can handle Base64 natively. Take a look at this Stackoverflow question. It's using btoa() and atob() functions.
For server-side JavaScript (Node), you can use Buffers to decode.
If you are going for a cross-browser solution, there are existing libraries like CryptoJS or code like:
http://ntt.cc/2008/01/19/base64-encoder-decoder-with-javascript.html (Archive)
With the latter, you need to thoroughly test the function for cross browser compatibility. And error has already been reported.
Internet Explorer 10+
// Define the string
var string = 'Hello World!';
// Encode the String
var encodedString = btoa(string);
console.log(encodedString); // Outputs: "SGVsbG8gV29ybGQh"
// Decode the String
var decodedString = atob(encodedString);
console.log(decodedString); // Outputs: "Hello World!"
Cross-Browser
Re-written and modularized UTF-8 and Base64 Javascript Encoding and Decoding Libraries / Modules for AMD, CommonJS, Nodejs and Browsers. Cross-browser compatible.
with Node.js
Here is how you encode normal text to base64 in Node.js:
//Buffer() requires a number, array or string as the first parameter, and an optional encoding type as the second parameter.
// Default is utf8, possible encoding types are ascii, utf8, ucs2, base64, binary, and hex
var b = Buffer.from('JavaScript');
// If we don't use toString(), JavaScript assumes we want to convert the object to utf8.
// We can make it convert to other formats by passing the encoding type to toString().
var s = b.toString('base64');
And here is how you decode base64 encoded strings:
var b = Buffer.from('SmF2YVNjcmlwdA==', 'base64')
var s = b.toString();
with Dojo.js
To encode an array of bytes using dojox.encoding.base64:
var str = dojox.encoding.base64.encode(myByteArray);
To decode a base64-encoded string:
var bytes = dojox.encoding.base64.decode(str)
bower install angular-base64
<script src="bower_components/angular-base64/angular-base64.js"></script>
angular
.module('myApp', ['base64'])
.controller('myController', [
'$base64', '$scope',
function($base64, $scope) {
$scope.encoded = $base64.encode('a string');
$scope.decoded = $base64.decode('YSBzdHJpbmc=');
}]);
But How?
If you would like to learn more about how base64 is encoded in general, and in JavaScript in-particular, I would recommend this article: Computer science in JavaScript: Base64 encoding
In Gecko/WebKit-based browsers (Firefox, Chrome and Safari) and Opera, you can use btoa() and atob().
Original answer: How can you encode a string to Base64 in JavaScript?
Here is a tightened up version of Sniper's post. It presumes well formed base64 string with no carriage returns. This version eliminates a couple of loops, adds the &0xff fix from Yaroslav, eliminates trailing nulls, plus a bit of code golf.
decodeBase64 = function(s) {
var e={},i,b=0,c,x,l=0,a,r='',w=String.fromCharCode,L=s.length;
var A="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
for(i=0;i<64;i++){e[A.charAt(i)]=i;}
for(x=0;x<L;x++){
c=e[s.charAt(x)];b=(b<<6)+c;l+=6;
while(l>=8){((a=(b>>>(l-=8))&0xff)||(x<(L-2)))&&(r+=w(a));}
}
return r;
};
Short and fast Base64 JavaScript Decode Function without Failsafe:
function decode_base64 (s)
{
var e = {}, i, k, v = [], r = '', w = String.fromCharCode;
var n = [[65, 91], [97, 123], [48, 58], [43, 44], [47, 48]];
for (z in n)
{
for (i = n[z][0]; i < n[z][1]; i++)
{
v.push(w(i));
}
}
for (i = 0; i < 64; i++)
{
e[v[i]] = i;
}
for (i = 0; i < s.length; i+=72)
{
var b = 0, c, x, l = 0, o = s.substring(i, i+72);
for (x = 0; x < o.length; x++)
{
c = e[o.charAt(x)];
b = (b << 6) + c;
l += 6;
while (l >= 8)
{
r += w((b >>> (l -= 8)) % 256);
}
}
}
return r;
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding#The_.22Unicode_Problem.22
Modern browsers have built-in javascript functions for Base64 encoding btoa() and decoding atob(). More info about support in older browser versions: https://caniuse.com/?search=atob
However, be aware that atob and btoa functions work only for ASCII charset.
If you need Base64 functions for UTF-8 charset, you can do it with:
function base64_encode(s) {
return btoa(unescape(encodeURIComponent(s)));
}
function base64_decode(s) {
return decodeURIComponent(escape(atob(s)));
}
Did someone say code golf? =)
The following is my attempt at improving my handicap while catching up with the times. Supplied for your convenience.
function decode_base64(s) {
var b=l=0, r='',
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
s.split('').forEach(function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
});
return r;
}
What I was actually after was an asynchronous implementation and to my surprise it turns out forEach as opposed to JQuery's $([]).each method implementation is very much synchronous.
If you also had such crazy notions in mind a 0 delay window.setTimeout will run the base64 decode asynchronously and execute the callback function with the result when done.
function decode_base64_async(s, cb) {
setTimeout(function () { cb(decode_base64(s)); }, 0);
}
#Toothbrush suggested "index a string like an array", and get rid of the split. This routine seems really odd and not sure how compatible it will be, but it does hit another birdie so lets have it.
function decode_base64(s) {
var b=l=0, r='',
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
[].forEach.call(s, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
});
return r;
}
While trying to find more information on JavaScript string as array I stumbled on this pro tip using a /./g regex to step through a string. This reduces the code size even more by replacing the string in place and eliminating the need of keeping a return variable.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':String.fromCharCode((b>>>(l-=8))&0xff);
});
}
If however you were looking for something a little more traditional perhaps the following is more to your taste.
function decode_base64(s) {
var b=l=0, r='', s=s.split(''), i,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
for (i in s) {
b=(b<<6)+m.indexOf(s[i]); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
}
return r;
}
I didn't have the trailing null issue so this was removed to remain under par but it should easily be resolved with a trim() or a trimRight() if you'd prefer, should this pose a problem for you.
ie.
return r.trimRight();
Note:
The result is an ascii byte string, if you need unicode the easiest is to escape the byte string which can then be decoded with decodeURIComponent to produce the unicode string.
function decode_base64_usc(s) {
return decodeURIComponent(escape(decode_base64(s)));
}
Since escape is being deprecated we could change our function to support unicode directly without the need for escape or String.fromCharCode we can produce a % escaped string ready for URI decoding.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
Edit for #Charles Byrne:
Can't remember why we didn't ignore the '=' padding characters, might've worked with a specification that didn't require them at the time. If we were to modify the decodeURIComponent routine to ignore these, as we should since they do not represent any data, the result decodes the example correctly.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/=*$/,'').replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
Now calling decode_base64('4pyTIMOgIGxhIG1vZGU=') will return the encoded string '✓ à la mode', without any errors.
Since '=' is reserved as padding character I can reduce my code golf handicap, if I may:
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8||'='==v?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
nJoy!
The php.js project has JavaScript implementations of many of PHP's functions. base64_encode and base64_decode are included.
For what it's worth, I got inspired by the other answers and wrote a small utility which calls the platform specific APIs to be used universally from either Node.js or a browser:
/**
* Encode a string of text as base64
*
* #param data The string of text.
* #returns The base64 encoded string.
*/
function encodeBase64(data: string) {
if (typeof btoa === "function") {
return btoa(data);
} else if (typeof Buffer === "function") {
return Buffer.from(data, "utf-8").toString("base64");
} else {
throw new Error("Failed to determine the platform specific encoder");
}
}
/**
* Decode a string of base64 as text
*
* #param data The string of base64 encoded text
* #returns The decoded text.
*/
function decodeBase64(data: string) {
if (typeof atob === "function") {
return atob(data);
} else if (typeof Buffer === "function") {
return Buffer.from(data, "base64").toString("utf-8");
} else {
throw new Error("Failed to determine the platform specific decoder");
}
}
I have tried the Javascript routines at phpjs.org and they have worked well.
I first tried the routines suggested in the chosen answer by Ranhiru Cooray - http://ntt.cc/2008/01/19/base64-encoder-decoder-with-javascript.html
I found that they did not work in all circumstances. I wrote up a test case where these routines fail and posted them to GitHub at:
https://github.com/scottcarter/base64_javascript_test_data.git
I also posted a comment to the blog post at ntt.cc to alert the author (awaiting moderation - the article is old so not sure if comment will get posted).
Frontend: Good solutions above, but quickly for the backend...
NodeJS - no deprecation
Use Buffer.from.
> inBase64 = Buffer.from('plain').toString('base64')
'cGxhaW4='
> // DEPRECATED //
> new Buffer(inBase64, 'base64').toString()
'plain'
> (node:1188987) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
// Works //
> Buffer.from(inBase64, 'base64').toString()
'plain'
In Node.js we can do it in simple way
var base64 = 'SGVsbG8gV29ybGQ='
var base64_decode = new Buffer(base64, 'base64').toString('ascii');
console.log(base64_decode); // "Hello World"
I'd rather use the bas64 encode/decode methods from CryptoJS, the most popular library for standard and secure cryptographic algorithms implemented in JavaScript using best practices and patterns.
For JavaScripts frameworks where there is no atob method and in case you do not want to import external libraries, this is short function that does it.
It would get a string that contains Base64 encoded value and will return a decoded array of bytes (where the array of bytes is represented as array of numbers where each number is an integer between 0 and 255 inclusive).
function fromBase64String(str) {
var alpha =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var value = [];
var index = 0;
var destIndex = 0;
var padding = false;
while (true) {
var first = getNextChr(str, index, padding, alpha);
var second = getNextChr(str, first .nextIndex, first .padding, alpha);
var third = getNextChr(str, second.nextIndex, second.padding, alpha);
var fourth = getNextChr(str, third .nextIndex, third .padding, alpha);
index = fourth.nextIndex;
padding = fourth.padding;
// ffffffss sssstttt ttffffff
var base64_first = first.code == null ? 0 : first.code;
var base64_second = second.code == null ? 0 : second.code;
var base64_third = third.code == null ? 0 : third.code;
var base64_fourth = fourth.code == null ? 0 : fourth.code;
var a = (( base64_first << 2) & 0xFC ) | ((base64_second>>4) & 0x03);
var b = (( base64_second<< 4) & 0xF0 ) | ((base64_third >>2) & 0x0F);
var c = (( base64_third << 6) & 0xC0 ) | ((base64_fourth>>0) & 0x3F);
value [destIndex++] = a;
if (!third.padding) {
value [destIndex++] = b;
} else {
break;
}
if (!fourth.padding) {
value [destIndex++] = c;
} else {
break;
}
if (index >= str.length) {
break;
}
}
return value;
}
function getNextChr(str, index, equalSignReceived, alpha) {
var chr = null;
var code = 0;
var padding = equalSignReceived;
while (index < str.length) {
chr = str.charAt(index);
if (chr == " " || chr == "\r" || chr == "\n" || chr == "\t") {
index++;
continue;
}
if (chr == "=") {
padding = true;
} else {
if (equalSignReceived) {
throw new Error("Invalid Base64 Endcoding character \""
+ chr + "\" with code " + str.charCodeAt(index)
+ " on position " + index
+ " received afer an equal sign (=) padding "
+ "character has already been received. "
+ "The equal sign padding character is the only "
+ "possible padding character at the end.");
}
code = alpha.indexOf(chr);
if (code == -1) {
throw new Error("Invalid Base64 Encoding character \""
+ chr + "\" with code " + str.charCodeAt(index)
+ " on position " + index + ".");
}
}
break;
}
return { character: chr, code: code, padding: padding, nextIndex: ++index};
}
Resources used: RFC-4648 Section 4
Base64 Win-1251 decoding for encodings other than acsi or iso-8859-1.
As it turned out, all the scripts I saw here convert Cyrillic Base64 to iso-8859-1 encoding. It is strange that no one noticed this.
Thus, to restore the Cyrillic alphabet, it is enough to do an additional transcoding of the text from iso-8859-1 to windows-1251.
I think that with other languages, it will be the same. Just change Cyrilic windows-1251 to yours.
... and Thanks to Der Hochstapler for his code i'm take from his comment ... of over comment, which is somewhat unusual.
code for JScript (for Windows desktop only) (ActiveXObject) - 1251 file encoding
decode_base64=function(f){var g={},b=65,d=0,a,c=0,h,e="",k=String.fromCharCode,l=f.length;for(a="";91>b;)a+=k(b++);a+=a.toLowerCase()+"0123456789+/";for(b=0;64>b;b++)g[a.charAt(b)]=b;for(a=0;a<l;a++)for(b=g[f.charAt(a)],d=(d<<6)+b,c+=6;8<=c;)((h=d>>>(c-=8)&255)||a<l-2)&&(e+=k(h));return e};
sDOS2Win = function(sText, bInsideOut) {
var aCharsets = ["iso-8859-1", "windows-1251"];
sText += "";
bInsideOut = bInsideOut ? 1 : 0;
with (new ActiveXObject("ADODB.Stream")) { //http://www.w3schools.com/ado/ado_ref_stream.asp
type = 2; //Binary 1, Text 2 (default)
mode = 3; //Permissions have not been set 0, Read-only 1, Write-only 2, Read-write 3,
//Prevent other read 4, Prevent other write 8, Prevent other open 12, Allow others all 16
charset = aCharsets[bInsideOut];
open();
writeText(sText);
position = 0;
charset = aCharsets[1 - bInsideOut];
return readText();
}
}
var base64='0PPx8ero5SDh8+ru4uroIQ=='
text = sDOS2Win(decode_base64(base64), false );
WScript.Echo(text)
var x=WScript.StdIn.ReadLine();

Categories