Related
I'm using the Javascript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'm getting ASCII-encoded characters back (like ⢠instead of ™). How can I properly handle the incoming base64-encoded stream so that it's decoded as utf-8?
The Unicode Problem
Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).
Consider the following example:
const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61: occupies < 1 byte
const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte
console.log(btoa(ok)); // YQ==
console.log(btoa(notOK)); // error
Why do we encounter this?
Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.
Source: MDN (2021)
The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:
The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).
Solution with binary interoperability
(Keep scrolling for the ASCII base64 solution)
Source: MDN (2021)
The solution recommended by MDN is to actually encode to and from a binary string representation:
Encoding UTF8 ⇢ binary
// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
const codeUnits = new Uint16Array(string.length);
for (let i = 0; i < codeUnits.length; i++) {
codeUnits[i] = string.charCodeAt(i);
}
return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="
Decoding binary ⇢ UTF-8
function fromBinary(encoded) {
const binary = atob(encoded);
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < bytes.length; i++) {
bytes[i] = binary.charCodeAt(i);
}
return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"
Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.
Solution with ASCII base64 interoperability
The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.
There are two possible methods to solve this problem:
the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.
A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.
If you're trying to save yourself some time, you could also consider using a library:
js-base64 (NPM, great for Node.js)
base64-js
Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).
TypeScript support
Here's same solution with some additional TypeScript compatibility (via #MA-Maddin):
// Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode(parseInt(p1, 16))
}))
}
// Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
}).join(''))
}
The first solution (deprecated)
This used escape and unescape (which are now deprecated, though this still works in all modern browsers):
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}
Things change. The escape/unescape methods have been deprecated.
You can URI encode the string before you Base64-encode it. Note that this does't produce Base64-encoded UTF8, but rather Base64-encoded URL-encoded data. Both sides must agree on the same encoding.
See working example here: http://codepen.io/anon/pen/PZgbPW
// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'
For OP's problem a third party library such as js-base64 should solve the problem.
The complete article that works for me: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding
The part where we encode from Unicode/UTF-8 is
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
This is one of the most used methods nowadays.
If treating strings as bytes is more your thing, you can use the following functions
function u_atob(ascii) {
return Uint8Array.from(atob(ascii), c => c.charCodeAt(0));
}
function u_btoa(buffer) {
var binary = [];
var bytes = new Uint8Array(buffer);
for (var i = 0, il = bytes.byteLength; i < il; i++) {
binary.push(String.fromCharCode(bytes[i]));
}
return btoa(binary.join(''));
}
// example, it works also with astral plane characters such as '𝒞'
var encodedString = new TextEncoder().encode('✓');
var base64String = u_btoa(encodedString);
console.log('✓' === new TextDecoder().decode(u_atob(base64String)))
Decoding base64 to UTF8 String
Below is current most voted answer by #brandonscript
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
Above code can work, but it's very slow. If your input is a very large base64 string, for example 30,000 chars for a base64 html document. It will need lots of computation.
Here is my answer, use built-in TextDecoder, nearly 10x faster than above code for large input.
function decodeBase64(base64) {
const text = atob(base64);
const length = text.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = text.charCodeAt(i);
}
const decoder = new TextDecoder(); // default is utf-8
return decoder.decode(bytes);
}
Here is 2018 updated solution as described in the Mozilla Development Resources
TO ENCODE FROM UNICODE TO B64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
TO DECODE FROM B64 TO UNICODE
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
I would assume that one might want a solution that produces a widely useable base64 URI. Please visit data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ to see a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). Despite the fact that this URI is base64-encoded, the browser is still able to recognize the high code points and decode them properly. The minified encoder+decoder is 1058 bytes (+Gzip→589 bytes)
!function(e){"use strict";function h(b){var a=b.charCodeAt(0);if(55296<=a&&56319>=a)if(b=b.charCodeAt(1),b===b&&56320<=b&&57343>=b){if(a=1024*(a-55296)+b-56320+65536,65535<a)return d(240|a>>>18,128|a>>>12&63,128|a>>>6&63,128|a&63)}else return d(239,191,189);return 127>=a?inputString:2047>=a?d(192|a>>>6,128|a&63):d(224|a>>>12,128|a>>>6&63,128|a&63)}function k(b){var a=b.charCodeAt(0)<<24,f=l(~a),c=0,e=b.length,g="";if(5>f&&e>=f){a=a<<f>>>24+f;for(c=1;c<f;++c)a=a<<6|b.charCodeAt(c)&63;65535>=a?g+=d(a):1114111>=a?(a-=65536,g+=d((a>>10)+55296,(a&1023)+56320)):c=0}for(;c<e;++c)g+="\ufffd";return g}var m=Math.log,n=Math.LN2,l=Math.clz32||function(b){return 31-m(b>>>0)/n|0},d=String.fromCharCode,p=atob,q=btoa;e.btoaUTF8=function(b,a){return q((a?"\u00ef\u00bb\u00bf":"")+b.replace(/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g,h))};e.atobUTF8=function(b,a){a||"\u00ef\u00bb\u00bf"!==b.substring(0,3)||(b=b.substring(3));return p(b).replace(/[\xc0-\xff][\x80-\xbf]*/g,k)}}(""+void 0==typeof global?""+void 0==typeof self?this:self:global)
Below is the source code used to generate it.
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){"use strict";
return function(inputString, BOMit){
return btoa((BOMit ? "\xEF\xBB\xBF" : "") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) // NaN because string is 1 code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff)
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
} else return fromCharCode(0xef, 0xbf, 0xbd);
}
if (point <= 0x007f) return nonAsciiChars;
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f));
} else return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
});
Then, to decode the base64 data, either HTTP get the data as a data URI or use the function below.
var clz32 = Math.clz32 || (function(log, LN2){"use strict";
return function(x) {return 31 - log(x >>> 0) / LN2 | 0};
})(Math.log, Math.LN2);
var fromCharCode = String.fromCharCode;
var atobUTF8 = (function(atob, replacer){"use strict";
return function(inputString, keepBOM){
inputString = atob(inputString);
if (!keepBOM && inputString.substring(0,3) === "\xEF\xBB\xBF")
inputString = inputString.substring(3); // eradicate UTF-8 BOM
// 0xc0 => 0b11000000; 0xff => 0b11111111; 0xc0-0xff => 0b11xxxxxx
// 0x80 => 0b10000000; 0xbf => 0b10111111; 0x80-0xbf => 0b10xxxxxx
return inputString.replace(/[\xc0-\xff][\x80-\xbf]*/g, replacer);
}
})(atob, function(encoded){"use strict";
var codePoint = encoded.charCodeAt(0) << 24;
var leadingOnes = clz32(~codePoint);
var endPos = 0, stringLen = encoded.length;
var result = "";
if (leadingOnes < 5 && stringLen >= leadingOnes) {
codePoint = (codePoint<<leadingOnes)>>>(24+leadingOnes);
for (endPos = 1; endPos < leadingOnes; ++endPos)
codePoint = (codePoint<<6) | (encoded.charCodeAt(endPos)&0x3f/*0b00111111*/);
if (codePoint <= 0xFFFF) { // BMP code point
result += fromCharCode(codePoint);
} else if (codePoint <= 0x10FFFF) {
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
codePoint -= 0x10000;
result += fromCharCode(
(codePoint >> 10) + 0xD800, // highSurrogate
(codePoint & 0x3ff) + 0xDC00 // lowSurrogate
);
} else endPos = 0; // to fill it in with INVALIDs
}
for (; endPos < stringLen; ++endPos) result += "\ufffd"; // replacement character
return result;
});
The advantage of being more standard is that this encoder and this decoder are more widely applicable because they can be used as a valid URL that displays correctly. Observe.
(function(window){
"use strict";
var sourceEle = document.getElementById("source");
var urlBarEle = document.getElementById("urlBar");
var mainFrameEle = document.getElementById("mainframe");
var gotoButton = document.getElementById("gotoButton");
var parseInt = window.parseInt;
var fromCodePoint = String.fromCodePoint;
var parse = JSON.parse;
function unescape(str){
return str.replace(/\\u[\da-f]{0,4}|\\x[\da-f]{0,2}|\\u{[^}]*}|\\[bfnrtv"'\\]|\\0[0-7]{1,3}|\\\d{1,3}/g, function(match){
try{
if (match.startsWith("\\u{"))
return fromCodePoint(parseInt(match.slice(2,-1),16));
if (match.startsWith("\\u") || match.startsWith("\\x"))
return fromCodePoint(parseInt(match.substring(2),16));
if (match.startsWith("\\0") && match.length > 2)
return fromCodePoint(parseInt(match.substring(2),8));
if (/^\\\d/.test(match)) return fromCodePoint(+match.slice(1));
}catch(e){return "\ufffd".repeat(match.length)}
return parse('"' + match + '"');
});
}
function whenChange(){
try{ urlBarEle.value = "data:text/plain;charset=UTF-8;base64," + btoaUTF8(unescape(sourceEle.value), true);
} finally{ gotoURL(); }
}
sourceEle.addEventListener("change",whenChange,{passive:1});
sourceEle.addEventListener("input",whenChange,{passive:1});
// IFrame Setup:
function gotoURL(){mainFrameEle.src = urlBarEle.value}
gotoButton.addEventListener("click", gotoURL, {passive: 1});
function urlChanged(){urlBarEle.value = mainFrameEle.src}
mainFrameEle.addEventListener("load", urlChanged, {passive: 1});
urlBarEle.addEventListener("keypress", function(evt){
if (evt.key === "enter") evt.preventDefault(), urlChanged();
}, {passive: 1});
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){
"use strict";
return function(inputString, BOMit){
return btoa((BOMit?"\xEF\xBB\xBF":"") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){
"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) { // NaN because string is 1code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
}
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff) {
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
} else {
return fromCharCode(0xef, 0xbf, 0xbd);
}
}
if (point <= 0x007f) { return inputString; }
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f/*00111111*/));
} else {
return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
});
setTimeout(whenChange, 0);
})(window);
img:active{opacity:0.8}
<center>
<textarea id="source" style="width:66.7vw">Hello \u1234 W\186\0256ld!
Enter text into the top box. Then the URL will update automatically.
</textarea><br />
<div style="width:66.7vw;display:inline-block;height:calc(25vw + 1em + 6px);border:2px solid;text-align:left;line-height:1em">
<input id="urlBar" style="width:calc(100% - 1em - 13px)" /><img id="gotoButton" src="" style="width:calc(1em + 4px);line-height:1em;vertical-align:-40%;cursor:pointer" />
<iframe id="mainframe" style="width:66.7vw;height:25vw" frameBorder="0"></iframe>
</div>
</center>
In addition to being very standardized, the above code snippets are also very fast. Instead of an indirect chain of succession where the data has to be converted several times between various forms (such as in Riccardo Galli's response), the above code snippet is as direct as performantly possible. It uses only one simple fast String.prototype.replace call to process the data when encoding, and only one to decode the data when decoding. Another plus is that (especially for big strings), String.prototype.replace allows the browser to automatically handle the underlying memory management of resizing the string, leading a significant performance boost especially in evergreen browsers like Chrome and Firefox that heavily optimize String.prototype.replace. Finally, the icing on the cake is that for you latin script exclūsīvō users, strings which don't contain any code points above 0x7f are extra fast to process because the string remains unmodified by the replacement algorithm.
I have created a github repository for this solution at https://github.com/anonyco/BestBase64EncoderDecoder/
If trying to decode a Base64 representation of utf8 encoded data in node, you can use the native Buffer helper
Buffer.from("4pyTIMOgIGxhIG1vZGU=", "base64").toString(); // '✓ à la mode'
The toString method of Buffer defaults to utf8, but you can specify any desired encoding. For example, the reverse operation would look like this
Buffer.from('✓ à la mode', "utf8").toString("base64"); // "4pyTIMOgIGxhIG1vZGU="
This is my one-liner solution combining Jackie Hans answer and some code from another question:
const utf8_encoded_text = new TextDecoder().decode(Uint8Array.from(window.atob(base_64_decoded_text).split("").map(x => x.charCodeAt(0))));
Here's some future-proof code for browsers that may lack escape/unescape(). Note that IE 9 and older don't support atob/btoa(), so you'd need to use custom base64 functions for them.
// Polyfill for escape/unescape
if( !window.unescape ){
window.unescape = function( s ){
return s.replace( /%([0-9A-F]{2})/g, function( m, p ) {
return String.fromCharCode( '0x' + p );
} );
};
}
if( !window.escape ){
window.escape = function( s ){
var chr, hex, i = 0, l = s.length, out = '';
for( ; i < l; i ++ ){
chr = s.charAt( i );
if( chr.search( /[A-Za-z0-9\#\*\_\+\-\.\/]/ ) > -1 ){
out += chr; continue; }
hex = s.charCodeAt( i ).toString( 16 );
out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
}
return out;
};
}
// Base64 encoding of UTF-8 strings
var utf8ToB64 = function( s ){
return btoa( unescape( encodeURIComponent( s ) ) );
};
var b64ToUtf8 = function( s ){
return decodeURIComponent( escape( atob( s ) ) );
};
A more comprehensive example for UTF-8 encoding and decoding can be found here: http://jsfiddle.net/47zwb41o/
Small correction, unescape and escape are deprecated, so:
function utf8_to_b64( str ) {
return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}
function b64_to_utf8( str ) {
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
including above solution if still facing issue try as below, Considerign the case where escape is not supported for TS.
blob = new Blob(["\ufeff", csv_content]); // this will make symbols to appears in excel
for csv_content you can try like below.
function b64DecodeUnicode(str: any) {
return decodeURIComponent(atob(str).split('').map((c: any) => {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
I'm using the Javascript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I'm getting ASCII-encoded characters back (like ⢠instead of ™). How can I properly handle the incoming base64-encoded stream so that it's decoded as utf-8?
The Unicode Problem
Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question's history).
Consider the following example:
const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61: occupies < 1 byte
const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte
console.log(btoa(ok)); // YQ==
console.log(btoa(notOK)); // error
Why do we encounter this?
Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.
Source: MDN (2021)
The original MDN article also covered the broken nature of window.btoa and .atob, which have since been mended in modern ECMAScript. The original, now-dead MDN article explained:
The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).
Solution with binary interoperability
(Keep scrolling for the ASCII base64 solution)
Source: MDN (2021)
The solution recommended by MDN is to actually encode to and from a binary string representation:
Encoding UTF8 ⇢ binary
// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
const codeUnits = new Uint16Array(string.length);
for (let i = 0; i < codeUnits.length; i++) {
codeUnits[i] = string.charCodeAt(i);
}
return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="
Decoding binary ⇢ UTF-8
function fromBinary(encoded) {
const binary = atob(encoded);
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < bytes.length; i++) {
bytes[i] = binary.charCodeAt(i);
}
return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"
Where this fails a little, is that you'll notice the encoded string EycgAOAAIABsAGEAIABtAG8AZABlAA== no longer matches the previous solution's string 4pyTIMOgIGxhIG1vZGU=. This is because it is a binary encoded string, not a UTF-8 encoded string. If this doesn't matter to you (i.e., you aren't converting strings represented in UTF-8 from another system), then you're good to go. If, however, you want to preserve the UTF-8 functionality, you're better off using the solution described below.
Solution with ASCII base64 interoperability
The entire history of this question shows just how many different ways we've had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.
There are two possible methods to solve this problem:
the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.
A note on previous solutions: the MDN article originally suggested using unescape and escape to solve the Character Out Of Range exception problem, but they have since been deprecated. Some other answers here have suggested working around this with decodeURIComponent and encodeURIComponent, this has proven to be unreliable and unpredictable. The most recent update to this answer uses modern JavaScript functions to improve speed and modernize code.
If you're trying to save yourself some time, you could also consider using a library:
js-base64 (NPM, great for Node.js)
base64-js
Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
(Why do we need to do this? ('00' + c.charCodeAt(0).toString(16)).slice(-2) prepends a 0 to single character strings, for example when c == \n, the c.charCodeAt(0).toString(16) returns a, forcing a to be represented as 0a).
TypeScript support
Here's same solution with some additional TypeScript compatibility (via #MA-Maddin):
// Encoding UTF8 ⇢ base64
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode(parseInt(p1, 16))
}))
}
// Decoding base64 ⇢ UTF8
function b64DecodeUnicode(str) {
return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
}).join(''))
}
The first solution (deprecated)
This used escape and unescape (which are now deprecated, though this still works in all modern browsers):
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don't know:
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(escape(window.atob( str )));
}
Things change. The escape/unescape methods have been deprecated.
You can URI encode the string before you Base64-encode it. Note that this does't produce Base64-encoded UTF8, but rather Base64-encoded URL-encoded data. Both sides must agree on the same encoding.
See working example here: http://codepen.io/anon/pen/PZgbPW
// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'
For OP's problem a third party library such as js-base64 should solve the problem.
The complete article that works for me: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding
The part where we encode from Unicode/UTF-8 is
function utf8_to_b64( str ) {
return window.btoa(unescape(encodeURIComponent( str )));
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
This is one of the most used methods nowadays.
If treating strings as bytes is more your thing, you can use the following functions
function u_atob(ascii) {
return Uint8Array.from(atob(ascii), c => c.charCodeAt(0));
}
function u_btoa(buffer) {
var binary = [];
var bytes = new Uint8Array(buffer);
for (var i = 0, il = bytes.byteLength; i < il; i++) {
binary.push(String.fromCharCode(bytes[i]));
}
return btoa(binary.join(''));
}
// example, it works also with astral plane characters such as '𝒞'
var encodedString = new TextEncoder().encode('✓');
var base64String = u_btoa(encodedString);
console.log('✓' === new TextDecoder().decode(u_atob(base64String)))
Decoding base64 to UTF8 String
Below is current most voted answer by #brandonscript
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
Above code can work, but it's very slow. If your input is a very large base64 string, for example 30,000 chars for a base64 html document. It will need lots of computation.
Here is my answer, use built-in TextDecoder, nearly 10x faster than above code for large input.
function decodeBase64(base64) {
const text = atob(base64);
const length = text.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = text.charCodeAt(i);
}
const decoder = new TextDecoder(); // default is utf-8
return decoder.decode(bytes);
}
Here is 2018 updated solution as described in the Mozilla Development Resources
TO ENCODE FROM UNICODE TO B64
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
TO DECODE FROM B64 TO UNICODE
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
I would assume that one might want a solution that produces a widely useable base64 URI. Please visit data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ to see a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). Despite the fact that this URI is base64-encoded, the browser is still able to recognize the high code points and decode them properly. The minified encoder+decoder is 1058 bytes (+Gzip→589 bytes)
!function(e){"use strict";function h(b){var a=b.charCodeAt(0);if(55296<=a&&56319>=a)if(b=b.charCodeAt(1),b===b&&56320<=b&&57343>=b){if(a=1024*(a-55296)+b-56320+65536,65535<a)return d(240|a>>>18,128|a>>>12&63,128|a>>>6&63,128|a&63)}else return d(239,191,189);return 127>=a?inputString:2047>=a?d(192|a>>>6,128|a&63):d(224|a>>>12,128|a>>>6&63,128|a&63)}function k(b){var a=b.charCodeAt(0)<<24,f=l(~a),c=0,e=b.length,g="";if(5>f&&e>=f){a=a<<f>>>24+f;for(c=1;c<f;++c)a=a<<6|b.charCodeAt(c)&63;65535>=a?g+=d(a):1114111>=a?(a-=65536,g+=d((a>>10)+55296,(a&1023)+56320)):c=0}for(;c<e;++c)g+="\ufffd";return g}var m=Math.log,n=Math.LN2,l=Math.clz32||function(b){return 31-m(b>>>0)/n|0},d=String.fromCharCode,p=atob,q=btoa;e.btoaUTF8=function(b,a){return q((a?"\u00ef\u00bb\u00bf":"")+b.replace(/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g,h))};e.atobUTF8=function(b,a){a||"\u00ef\u00bb\u00bf"!==b.substring(0,3)||(b=b.substring(3));return p(b).replace(/[\xc0-\xff][\x80-\xbf]*/g,k)}}(""+void 0==typeof global?""+void 0==typeof self?this:self:global)
Below is the source code used to generate it.
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){"use strict";
return function(inputString, BOMit){
return btoa((BOMit ? "\xEF\xBB\xBF" : "") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) // NaN because string is 1 code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff)
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
} else return fromCharCode(0xef, 0xbf, 0xbd);
}
if (point <= 0x007f) return nonAsciiChars;
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f));
} else return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
});
Then, to decode the base64 data, either HTTP get the data as a data URI or use the function below.
var clz32 = Math.clz32 || (function(log, LN2){"use strict";
return function(x) {return 31 - log(x >>> 0) / LN2 | 0};
})(Math.log, Math.LN2);
var fromCharCode = String.fromCharCode;
var atobUTF8 = (function(atob, replacer){"use strict";
return function(inputString, keepBOM){
inputString = atob(inputString);
if (!keepBOM && inputString.substring(0,3) === "\xEF\xBB\xBF")
inputString = inputString.substring(3); // eradicate UTF-8 BOM
// 0xc0 => 0b11000000; 0xff => 0b11111111; 0xc0-0xff => 0b11xxxxxx
// 0x80 => 0b10000000; 0xbf => 0b10111111; 0x80-0xbf => 0b10xxxxxx
return inputString.replace(/[\xc0-\xff][\x80-\xbf]*/g, replacer);
}
})(atob, function(encoded){"use strict";
var codePoint = encoded.charCodeAt(0) << 24;
var leadingOnes = clz32(~codePoint);
var endPos = 0, stringLen = encoded.length;
var result = "";
if (leadingOnes < 5 && stringLen >= leadingOnes) {
codePoint = (codePoint<<leadingOnes)>>>(24+leadingOnes);
for (endPos = 1; endPos < leadingOnes; ++endPos)
codePoint = (codePoint<<6) | (encoded.charCodeAt(endPos)&0x3f/*0b00111111*/);
if (codePoint <= 0xFFFF) { // BMP code point
result += fromCharCode(codePoint);
} else if (codePoint <= 0x10FFFF) {
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
codePoint -= 0x10000;
result += fromCharCode(
(codePoint >> 10) + 0xD800, // highSurrogate
(codePoint & 0x3ff) + 0xDC00 // lowSurrogate
);
} else endPos = 0; // to fill it in with INVALIDs
}
for (; endPos < stringLen; ++endPos) result += "\ufffd"; // replacement character
return result;
});
The advantage of being more standard is that this encoder and this decoder are more widely applicable because they can be used as a valid URL that displays correctly. Observe.
(function(window){
"use strict";
var sourceEle = document.getElementById("source");
var urlBarEle = document.getElementById("urlBar");
var mainFrameEle = document.getElementById("mainframe");
var gotoButton = document.getElementById("gotoButton");
var parseInt = window.parseInt;
var fromCodePoint = String.fromCodePoint;
var parse = JSON.parse;
function unescape(str){
return str.replace(/\\u[\da-f]{0,4}|\\x[\da-f]{0,2}|\\u{[^}]*}|\\[bfnrtv"'\\]|\\0[0-7]{1,3}|\\\d{1,3}/g, function(match){
try{
if (match.startsWith("\\u{"))
return fromCodePoint(parseInt(match.slice(2,-1),16));
if (match.startsWith("\\u") || match.startsWith("\\x"))
return fromCodePoint(parseInt(match.substring(2),16));
if (match.startsWith("\\0") && match.length > 2)
return fromCodePoint(parseInt(match.substring(2),8));
if (/^\\\d/.test(match)) return fromCodePoint(+match.slice(1));
}catch(e){return "\ufffd".repeat(match.length)}
return parse('"' + match + '"');
});
}
function whenChange(){
try{ urlBarEle.value = "data:text/plain;charset=UTF-8;base64," + btoaUTF8(unescape(sourceEle.value), true);
} finally{ gotoURL(); }
}
sourceEle.addEventListener("change",whenChange,{passive:1});
sourceEle.addEventListener("input",whenChange,{passive:1});
// IFrame Setup:
function gotoURL(){mainFrameEle.src = urlBarEle.value}
gotoButton.addEventListener("click", gotoURL, {passive: 1});
function urlChanged(){urlBarEle.value = mainFrameEle.src}
mainFrameEle.addEventListener("load", urlChanged, {passive: 1});
urlBarEle.addEventListener("keypress", function(evt){
if (evt.key === "enter") evt.preventDefault(), urlChanged();
}, {passive: 1});
var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){
"use strict";
return function(inputString, BOMit){
return btoa((BOMit?"\xEF\xBB\xBF":"") + inputString.replace(
/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
));
}
})(btoa, function(nonAsciiChars){
"use strict";
// make the UTF string into a binary UTF-8 encoded string
var point = nonAsciiChars.charCodeAt(0);
if (point >= 0xD800 && point <= 0xDBFF) {
var nextcode = nonAsciiChars.charCodeAt(1);
if (nextcode !== nextcode) { // NaN because string is 1code point long
return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
}
// https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
if (point > 0xffff) {
return fromCharCode(
(0x1e/*0b11110*/<<3) | (point>>>18),
(0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
} else {
return fromCharCode(0xef, 0xbf, 0xbd);
}
}
if (point <= 0x007f) { return inputString; }
else if (point <= 0x07ff) {
return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f/*00111111*/));
} else {
return fromCharCode(
(0xe/*0b1110*/<<4) | (point>>>12),
(0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
(0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
);
}
});
setTimeout(whenChange, 0);
})(window);
img:active{opacity:0.8}
<center>
<textarea id="source" style="width:66.7vw">Hello \u1234 W\186\0256ld!
Enter text into the top box. Then the URL will update automatically.
</textarea><br />
<div style="width:66.7vw;display:inline-block;height:calc(25vw + 1em + 6px);border:2px solid;text-align:left;line-height:1em">
<input id="urlBar" style="width:calc(100% - 1em - 13px)" /><img id="gotoButton" src="" style="width:calc(1em + 4px);line-height:1em;vertical-align:-40%;cursor:pointer" />
<iframe id="mainframe" style="width:66.7vw;height:25vw" frameBorder="0"></iframe>
</div>
</center>
In addition to being very standardized, the above code snippets are also very fast. Instead of an indirect chain of succession where the data has to be converted several times between various forms (such as in Riccardo Galli's response), the above code snippet is as direct as performantly possible. It uses only one simple fast String.prototype.replace call to process the data when encoding, and only one to decode the data when decoding. Another plus is that (especially for big strings), String.prototype.replace allows the browser to automatically handle the underlying memory management of resizing the string, leading a significant performance boost especially in evergreen browsers like Chrome and Firefox that heavily optimize String.prototype.replace. Finally, the icing on the cake is that for you latin script exclūsīvō users, strings which don't contain any code points above 0x7f are extra fast to process because the string remains unmodified by the replacement algorithm.
I have created a github repository for this solution at https://github.com/anonyco/BestBase64EncoderDecoder/
If trying to decode a Base64 representation of utf8 encoded data in node, you can use the native Buffer helper
Buffer.from("4pyTIMOgIGxhIG1vZGU=", "base64").toString(); // '✓ à la mode'
The toString method of Buffer defaults to utf8, but you can specify any desired encoding. For example, the reverse operation would look like this
Buffer.from('✓ à la mode', "utf8").toString("base64"); // "4pyTIMOgIGxhIG1vZGU="
This is my one-liner solution combining Jackie Hans answer and some code from another question:
const utf8_encoded_text = new TextDecoder().decode(Uint8Array.from(window.atob(base_64_decoded_text).split("").map(x => x.charCodeAt(0))));
Here's some future-proof code for browsers that may lack escape/unescape(). Note that IE 9 and older don't support atob/btoa(), so you'd need to use custom base64 functions for them.
// Polyfill for escape/unescape
if( !window.unescape ){
window.unescape = function( s ){
return s.replace( /%([0-9A-F]{2})/g, function( m, p ) {
return String.fromCharCode( '0x' + p );
} );
};
}
if( !window.escape ){
window.escape = function( s ){
var chr, hex, i = 0, l = s.length, out = '';
for( ; i < l; i ++ ){
chr = s.charAt( i );
if( chr.search( /[A-Za-z0-9\#\*\_\+\-\.\/]/ ) > -1 ){
out += chr; continue; }
hex = s.charCodeAt( i ).toString( 16 );
out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
}
return out;
};
}
// Base64 encoding of UTF-8 strings
var utf8ToB64 = function( s ){
return btoa( unescape( encodeURIComponent( s ) ) );
};
var b64ToUtf8 = function( s ){
return decodeURIComponent( escape( atob( s ) ) );
};
A more comprehensive example for UTF-8 encoding and decoding can be found here: http://jsfiddle.net/47zwb41o/
Small correction, unescape and escape are deprecated, so:
function utf8_to_b64( str ) {
return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}
function b64_to_utf8( str ) {
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
function b64_to_utf8( str ) {
str = str.replace(/\s/g, '');
return decodeURIComponent(encodeURIComponent(window.atob(str)));
}
including above solution if still facing issue try as below, Considerign the case where escape is not supported for TS.
blob = new Blob(["\ufeff", csv_content]); // this will make symbols to appears in excel
for csv_content you can try like below.
function b64DecodeUnicode(str: any) {
return decodeURIComponent(atob(str).split('').map((c: any) => {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
How do I convert a 64 bit steam ID to a 32 bit account ID? Steam says to take the first 32 bits of the number, but how do you do this in Node?
Do I need to use BigNumber to store the 64 bit int?
To convert a 64 bit Steam ID to a 32 bit Account ID, you can just subtract 76561197960265728 from the 64 bit id.
This requires bigNumber in node:
bignumber = require("bignumber.js");
console.log(bignumber('76561197991791363').minus('76561197960265728'))
I had the same issue but didn't want to use any library like bignumber.js as my project was quite small and will be used in a web browser. In the end I came up with this elegant solution:
function steamID64toSteamID32 (steamID64) {
return Number(steamID64.substr(-16,16)) - 6561197960265728
}
How it works:
To get the lower 32 bits we need to convert the SteamID64 string to a number, but because JavaScript has a precision limit of 57 bits the SteamID64 will be erroneously rounded. The workaround is to truncate the leftmost digits to get a 16 digit number, which uses at most 54 bits and will therefore retain its precision in Javascript. This is acceptable because the leftmost digits comes from the higher 32 bits which will be zeroed anyway, so nothing of value will be lost.
To zero the remaining higher bits we subtract the decimal number they're representing. If we assume that every SteamID64 we're converting to be in the public universe this decimal number will be constant and can be computed like this:
1. 0b00000001000100000000000000000001 0b00000000000000000000000000000000 = 76561197960265728
2. Number('76561197960265728'.substr(-16,16)) = 6561197960265728
Here's what I came up with. I started learning JavaScript yesterday (coming from a C++ background, not very accustomed to working without types), so correct me if I did something derpy with the language. I tested it with my own steam ID and it seems to work.
// NOTE: Functions can take in a steamID in its packed 64-bit form
// (community ID starting with 765), its modern form with or without
// either or both brackets, and its legacy form. SteamID's that
// contain letters (e.g. STEAM_0... or [U:1...) are not case-sensitive.
// Dependencies: BigInteger library, available from http://silentmatt.com/biginteger/
// Global variable used by all conversion functions
var STEAM_BASELINE = '76561197960265728';
// IN: String containing a steamID in any of the 3 formats
// OUT: String containing the steamID as a community ID (64-bit packed ID)
function ConvertToPacked(inputID)
{
var output = "unknown";
// From packed
if(inputID.match(/^765/) && inputID.length > 15)
{
output = inputID;
}
// From modern
else if(inputID.match(/^\[U:1:/i) || inputID.match(/^U:1:/i))
{
var numericPortion = inputID.replace(/^\[U:1:|^U:1:/i,'').replace(/\]/,'');
output = BigInteger.add(numericPortion, STEAM_BASELINE).toString();
}
// From legacy
else if(inputID.match(/^STEAM_0:[0-1]:/i))
{
var splitID = inputID.split(":");
var product = BigInteger.multiply(splitID[2],2);
var sum = BigInteger.add(product, STEAM_BASELINE);
output = BigInteger.add(sum, splitID[1]).toString();
}
return output;
}
// IN: String containing a steamID in any of the 3 formats
// OUT: String containing the steamID in the modern format (e.g. [U:1:123456])
function ConvertToModern(inputID)
{
var output = "unknown";
// From packed
if(inputID.match(/^765/) && inputID.length > 15)
{
output = "[U:1:" + BigInteger.subtract(inputID, STEAM_BASELINE).toString() + "]";
}
// From modern
else if(inputID.match(/^\[U:1:/i) || inputID.match(/^U:1:/i))
{
var numericPortion = inputID.replace(/^\[U:1:|^U:1:/i,'').replace(/\]/,'');
output = "[U:1:" + numericPortion + "]";
}
// From legacy
else if(inputID.match(/^STEAM_0:[0-1]:/i))
{
var splitID = inputID.split(":");
var numeric = BigInteger.add(BigInteger.multiply(splitID[2],2), splitID[1]);
output = "[U:1:" + numeric.toString() + "]";
}
return output;
}
// IN: String containing a steamID in any of the 3 formats
// OUT: String containing the steamID in the legacy format (e.g. STEAM_0:0:123456)
function ConvertToLegacy(inputID)
{
var output = "unknown"
// From packed
if(inputID.match(/^765/) && inputID.length > 15)
{
var z = BigInteger.divide(BigInteger.subtract(inputID, STEAM_BASELINE), 2);
var y = BigInteger.remainder(inputID, 2);
output = 'STEAM_0:' + y.toString() + ':' + z.toString();
}
// From modern
else if(inputID.match(/^\[U:1:/i) || inputID.match(/^U:1:/i))
{
var numericPortion = inputID.replace(/^\[U:1:|^U:1:/i,'').replace(/\]/,'');
var z = BigInteger.divide(numericPortion, 2);
var y = BigInteger.remainder(numericPortion, 2);
output = 'STEAM_0:' + y.toString() + ':' + z.toString();
}
// From legacy
else if(inputID.match(/^STEAM_0:[0-1]:/i))
{
output = inputID.toUpperCase();
}
return output;
}
Since localStorage (currently) only supports strings as values, and in order to do that the objects need to be stringified (stored as JSON-string) before they can be stored, is there a defined limitation regarding the length of the values.
Does anyone know if there is a definition which applies to all browsers?
Quoting from the Wikipedia article on Web Storage:
Web storage can be viewed simplistically as an improvement on cookies, providing much greater storage capacity (10 MB per origin in Google Chrome(https://plus.google.com/u/0/+FrancoisBeaufort/posts/S5Q9HqDB8bh), Mozilla Firefox, and Opera; 10 MB per storage area in Internet Explorer) and better programmatic interfaces.
And also quoting from a John Resig article [posted January 2007]:
Storage Space
It is implied that, with DOM Storage,
you have considerably more storage
space than the typical user agent
limitations imposed upon Cookies.
However, the amount that is provided
is not defined in the specification,
nor is it meaningfully broadcast by
the user agent.
If you look at the Mozilla source code
we can see that 5120KB is the default
storage size for an entire domain.
This gives you considerably more space
to work with than a typical 2KB
cookie.
However, the size of this storage area
can be customized by the user (so a
5MB storage area is not guaranteed,
nor is it implied) and the user agent
(Opera, for example, may only provide
3MB - but only time will tell.)
Actually Opera doesn't have 5MB limit. It offers to increase limit as applications requires more. User can even choose "Unlimited storage" for a domain.
You can easily test localStorage limits/quota yourself.
Here's a straightforward script for finding out the limit:
if (localStorage && !localStorage.getItem('size')) {
var i = 0;
try {
// Test up to 10 MB
for (i = 250; i <= 10000; i += 250) {
localStorage.setItem('test', new Array((i * 1024) + 1).join('a'));
}
} catch (e) {
localStorage.removeItem('test');
localStorage.setItem('size', i - 250);
}
}
Here's the gist, JSFiddle and blog post.
The script will test setting increasingly larger strings of text until the browser throws and exception. At that point it’ll clear out the test data and set a size key in localStorage storing the size in kilobytes.
Find the maximum length of a single string that can be stored in localStorage
This snippet will find the maximum length of a String that can be stored in localStorage per domain.
//Clear localStorage
for (var item in localStorage) delete localStorage[item];
window.result = window.result || document.getElementById('result');
result.textContent = 'Test running…';
//Start test
//Defer running so DOM can be updated with "test running" message
setTimeout(function () {
//Variables
var low = 0,
high = 2e9,
half;
//Two billion may be a little low as a starting point, so increase if necessary
while (canStore(high)) high *= 2;
//Keep refining until low and high are equal
while (low !== high) {
half = Math.floor((high - low) / 2 + low);
//Check if we can't scale down any further
if (low === half || high === half) {
console.info(low, high, half);
//Set low to the maximum possible amount that can be stored
low = canStore(high) ? high : low;
high = low;
break;
}
//Check if the maximum storage is no higher than half
if (storageMaxBetween(low, half)) {
high = half;
//The only other possibility is that it's higher than half but not higher than "high"
} else {
low = half + 1;
}
}
//Show the result we found!
result.innerHTML = 'The maximum length of a string that can be stored in localStorage is <strong>' + low + '</strong> characters.';
//Functions
function canStore(strLen) {
try {
delete localStorage.foo;
localStorage.foo = Array(strLen + 1).join('A');
return true;
} catch (ex) {
return false;
}
}
function storageMaxBetween(low, high) {
return canStore(low) && !canStore(high);
}
}, 0);
<h1>LocalStorage single value max length test</h1>
<div id='result'>Please enable JavaScript</div>
Note that the length of a string is limited in JavaScript; if you want to view the maximum amount of data that can be stored in localStorage when not limited to a single string, you can use the code in this answer.
Edit: Stack Snippets don't support localStorage, so here is a link to JSFiddle.
Results
Chrome (45.0.2454.101): 5242878 characters
Firefox (40.0.1): 5242883 characters
Internet Explorer (11.0.9600.18036): 16386 122066 122070 characters
I get different results on each run in Internet Explorer.
Don't assume 5MB is available - localStorage capacity varies by browser, with 2.5MB, 5MB and unlimited being the most common values.
Source: http://dev-test.nemikor.com/web-storage/support-test/
I wrote this simple code that is testing localStorage size in bytes.
https://github.com/gkucmierz/Test-of-localStorage-limits-quota
const check = bytes => {
try {
localStorage.clear();
localStorage.setItem('a', '0'.repeat(bytes));
localStorage.clear();
return true;
} catch(e) {
localStorage.clear();
return false;
}
};
Github pages:
https://gkucmierz.github.io/Test-of-localStorage-limits-quota/
I have the same results on desktop Google chrome, opera, firefox, brave and mobile chrome which is ~10Mbytes
And half smaller result in safari ~4Mbytes
You don't want to stringify large objects into a single localStorage entry. That would be very inefficient - the whole thing would have to be parsed and re-encoded every time some slight detail changes. Also, JSON can't handle multiple cross references within an object structure and wipes out a lot of details, e.g. the constructor, non-numerical properties of arrays, what's in a sparse entry, etc.
Instead, you can use Rhaboo. It stores large objects using lots of localStorage entries so you can make small changes quickly. The restored objects are much more accurate copies of the saved ones and the API is incredibly simple. E.g.:
var store = Rhaboo.persistent('Some name');
store.write('count', store.count ? store.count+1 : 1);
store.write('somethingfancy', {
one: ['man', 'went'],
2: 'mow',
went: [ 2, { mow: ['a', 'meadow' ] }, {} ]
});
store.somethingfancy.went[1].mow.write(1, 'lawn');
BTW, I wrote it.
I've condensed a binary test into this function that I use:
function getStorageTotalSize(upperLimit/*in bytes*/) {
var store = localStorage, testkey = "$_test"; // (NOTE: Test key is part of the storage!!! It should also be an even number of characters)
var test = function (_size) { try { store.removeItem(testkey); store.setItem(testkey, new Array(_size + 1).join('0')); } catch (_ex) { return false; } return true; }
var backup = {};
for (var i = 0, n = store.length; i < n; ++i) backup[store.key(i)] = store.getItem(store.key(i));
store.clear(); // (you could iterate over the items and backup first then restore later)
var low = 0, high = 1, _upperLimit = (upperLimit || 1024 * 1024 * 1024) / 2, upperTest = true;
while ((upperTest = test(high)) && high < _upperLimit) { low = high; high *= 2; }
if (!upperTest) {
var half = ~~((high - low + 1) / 2); // (~~ is a faster Math.floor())
high -= half;
while (half > 0) high += (half = ~~(half / 2)) * (test(high) ? 1 : -1);
high = testkey.length + high;
}
if (high > _upperLimit) high = _upperLimit;
store.removeItem(testkey);
for (var p in backup) store.setItem(p, backup[p]);
return high * 2; // (*2 because of Unicode storage)
}
It also backs up the contents before testing, then restores them.
How it works: It doubles the size until the limit is reached or the test fails. It then stores half the distance between low and high and subtracts/adds a half of the half each time (subtract on failure and add on success); honing into the proper value.
upperLimit is 1GB by default, and just limits how far upwards to scan exponentially before starting the binary search. I doubt this will even need to be changed, but I'm always thinking ahead. ;)
On Chrome:
> getStorageTotalSize();
> 10485762
> 10485762/2
> 5242881
> localStorage.setItem("a", new Array(5242880).join("0")) // works
> localStorage.setItem("a", new Array(5242881).join("0")) // fails ('a' takes one spot [2 bytes])
IE11, Edge, and FireFox also report the same max size (10485762 bytes).
You can use the following code in modern browsers to efficiently check the storage quota (total & used) in real-time:
if ('storage' in navigator && 'estimate' in navigator.storage) {
navigator.storage.estimate()
.then(estimate => {
console.log("Usage (in Bytes): ", estimate.usage,
", Total Quota (in Bytes): ", estimate.quota);
});
}
I'm doing the following:
getLocalStorageSizeLimit = function () {
var maxLength = Math.pow(2,24);
var preLength = 0;
var hugeString = "0";
var testString;
var keyName = "testingLengthKey";
//2^24 = 16777216 should be enough to all browsers
testString = (new Array(Math.pow(2, 24))).join("X");
while (maxLength !== preLength) {
try {
localStorage.setItem(keyName, testString);
preLength = testString.length;
maxLength = Math.ceil(preLength + ((hugeString.length - preLength) / 2));
testString = hugeString.substr(0, maxLength);
} catch (e) {
hugeString = testString;
maxLength = Math.floor(testString.length - (testString.length - preLength) / 2);
testString = hugeString.substr(0, maxLength);
}
}
localStorage.removeItem(keyName);
// Original used this.storageObject in place of localStorage. I can only guess the goal is to check the size of the localStorage with everything but the testString given that maxLength is then added.
maxLength = JSON.stringify(localStorage).length + maxLength + keyName.length - 2;
return maxLength;
};
I really like cdmckay's answer, but it does not really look good to check the size in a real time: it is just too slow (2 seconds for me). This is the improved version, which is way faster and more exact, also with an option to choose how big the error can be (default 250,000, the smaller error is - the longer the calculation is):
function getLocalStorageMaxSize(error) {
if (localStorage) {
var max = 10 * 1024 * 1024,
i = 64,
string1024 = '',
string = '',
// generate a random key
testKey = 'size-test-' + Math.random().toString(),
minimalFound = 0,
error = error || 25e4;
// fill a string with 1024 symbols / bytes
while (i--) string1024 += 1e16;
i = max / 1024;
// fill a string with 'max' amount of symbols / bytes
while (i--) string += string1024;
i = max;
// binary search implementation
while (i > 1) {
try {
localStorage.setItem(testKey, string.substr(0, i));
localStorage.removeItem(testKey);
if (minimalFound < i - error) {
minimalFound = i;
i = i * 1.5;
}
else break;
} catch (e) {
localStorage.removeItem(testKey);
i = minimalFound + (i - minimalFound) / 2;
}
}
return minimalFound;
}
}
To test:
console.log(getLocalStorageMaxSize()); // takes .3s
console.log(getLocalStorageMaxSize(.1)); // takes 2s, but way more exact
This works dramatically faster for the standard error; also it can be much more exact when necessary.
Once I developed Chrome (desktop browser) extension and tested Local Storage real max size for this reason.
My results:
Ubuntu 18.04.1 LTS (64-bit)
Chrome 71.0.3578.98 (Official Build) (64-bit)
Local Storage content size 10240 KB (10 MB)
More than 10240 KB usage returned me the error:
Uncaught DOMException: Failed to execute 'setItem' on 'Storage': Setting the value of 'notes' exceeded the quota.
Edit on Oct 23, 2020
For a Chrome extensions available chrome.storage API. If you declare the "storage" permission in manifest.js:
{
"name": "My extension",
...
"permissions": ["storage"],
...
}
You can access it like this:
chrome.storage.local.QUOTA_BYTES // 5242880 (in bytes)
This question already has answers here:
How can you encode a string to Base64 in JavaScript?
(33 answers)
Closed 1 year ago.
Are there any methods in JavaScript that could be used to encode and decode a string using base64 encoding?
Some browsers such as Firefox, Chrome, Safari, Opera and IE10+ can handle Base64 natively. Take a look at this Stackoverflow question. It's using btoa() and atob() functions.
For server-side JavaScript (Node), you can use Buffers to decode.
If you are going for a cross-browser solution, there are existing libraries like CryptoJS or code like:
http://ntt.cc/2008/01/19/base64-encoder-decoder-with-javascript.html (Archive)
With the latter, you need to thoroughly test the function for cross browser compatibility. And error has already been reported.
Internet Explorer 10+
// Define the string
var string = 'Hello World!';
// Encode the String
var encodedString = btoa(string);
console.log(encodedString); // Outputs: "SGVsbG8gV29ybGQh"
// Decode the String
var decodedString = atob(encodedString);
console.log(decodedString); // Outputs: "Hello World!"
Cross-Browser
Re-written and modularized UTF-8 and Base64 Javascript Encoding and Decoding Libraries / Modules for AMD, CommonJS, Nodejs and Browsers. Cross-browser compatible.
with Node.js
Here is how you encode normal text to base64 in Node.js:
//Buffer() requires a number, array or string as the first parameter, and an optional encoding type as the second parameter.
// Default is utf8, possible encoding types are ascii, utf8, ucs2, base64, binary, and hex
var b = Buffer.from('JavaScript');
// If we don't use toString(), JavaScript assumes we want to convert the object to utf8.
// We can make it convert to other formats by passing the encoding type to toString().
var s = b.toString('base64');
And here is how you decode base64 encoded strings:
var b = Buffer.from('SmF2YVNjcmlwdA==', 'base64')
var s = b.toString();
with Dojo.js
To encode an array of bytes using dojox.encoding.base64:
var str = dojox.encoding.base64.encode(myByteArray);
To decode a base64-encoded string:
var bytes = dojox.encoding.base64.decode(str)
bower install angular-base64
<script src="bower_components/angular-base64/angular-base64.js"></script>
angular
.module('myApp', ['base64'])
.controller('myController', [
'$base64', '$scope',
function($base64, $scope) {
$scope.encoded = $base64.encode('a string');
$scope.decoded = $base64.decode('YSBzdHJpbmc=');
}]);
But How?
If you would like to learn more about how base64 is encoded in general, and in JavaScript in-particular, I would recommend this article: Computer science in JavaScript: Base64 encoding
In Gecko/WebKit-based browsers (Firefox, Chrome and Safari) and Opera, you can use btoa() and atob().
Original answer: How can you encode a string to Base64 in JavaScript?
Here is a tightened up version of Sniper's post. It presumes well formed base64 string with no carriage returns. This version eliminates a couple of loops, adds the &0xff fix from Yaroslav, eliminates trailing nulls, plus a bit of code golf.
decodeBase64 = function(s) {
var e={},i,b=0,c,x,l=0,a,r='',w=String.fromCharCode,L=s.length;
var A="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
for(i=0;i<64;i++){e[A.charAt(i)]=i;}
for(x=0;x<L;x++){
c=e[s.charAt(x)];b=(b<<6)+c;l+=6;
while(l>=8){((a=(b>>>(l-=8))&0xff)||(x<(L-2)))&&(r+=w(a));}
}
return r;
};
Short and fast Base64 JavaScript Decode Function without Failsafe:
function decode_base64 (s)
{
var e = {}, i, k, v = [], r = '', w = String.fromCharCode;
var n = [[65, 91], [97, 123], [48, 58], [43, 44], [47, 48]];
for (z in n)
{
for (i = n[z][0]; i < n[z][1]; i++)
{
v.push(w(i));
}
}
for (i = 0; i < 64; i++)
{
e[v[i]] = i;
}
for (i = 0; i < s.length; i+=72)
{
var b = 0, c, x, l = 0, o = s.substring(i, i+72);
for (x = 0; x < o.length; x++)
{
c = e[o.charAt(x)];
b = (b << 6) + c;
l += 6;
while (l >= 8)
{
r += w((b >>> (l -= 8)) % 256);
}
}
}
return r;
}
function b64_to_utf8( str ) {
return decodeURIComponent(escape(window.atob( str )));
}
https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding#The_.22Unicode_Problem.22
Modern browsers have built-in javascript functions for Base64 encoding btoa() and decoding atob(). More info about support in older browser versions: https://caniuse.com/?search=atob
However, be aware that atob and btoa functions work only for ASCII charset.
If you need Base64 functions for UTF-8 charset, you can do it with:
function base64_encode(s) {
return btoa(unescape(encodeURIComponent(s)));
}
function base64_decode(s) {
return decodeURIComponent(escape(atob(s)));
}
Did someone say code golf? =)
The following is my attempt at improving my handicap while catching up with the times. Supplied for your convenience.
function decode_base64(s) {
var b=l=0, r='',
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
s.split('').forEach(function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
});
return r;
}
What I was actually after was an asynchronous implementation and to my surprise it turns out forEach as opposed to JQuery's $([]).each method implementation is very much synchronous.
If you also had such crazy notions in mind a 0 delay window.setTimeout will run the base64 decode asynchronously and execute the callback function with the result when done.
function decode_base64_async(s, cb) {
setTimeout(function () { cb(decode_base64(s)); }, 0);
}
#Toothbrush suggested "index a string like an array", and get rid of the split. This routine seems really odd and not sure how compatible it will be, but it does hit another birdie so lets have it.
function decode_base64(s) {
var b=l=0, r='',
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
[].forEach.call(s, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
});
return r;
}
While trying to find more information on JavaScript string as array I stumbled on this pro tip using a /./g regex to step through a string. This reduces the code size even more by replacing the string in place and eliminating the need of keeping a return variable.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':String.fromCharCode((b>>>(l-=8))&0xff);
});
}
If however you were looking for something a little more traditional perhaps the following is more to your taste.
function decode_base64(s) {
var b=l=0, r='', s=s.split(''), i,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
for (i in s) {
b=(b<<6)+m.indexOf(s[i]); l+=6;
if (l>=8) r+=String.fromCharCode((b>>>(l-=8))&0xff);
}
return r;
}
I didn't have the trailing null issue so this was removed to remain under par but it should easily be resolved with a trim() or a trimRight() if you'd prefer, should this pose a problem for you.
ie.
return r.trimRight();
Note:
The result is an ascii byte string, if you need unicode the easiest is to escape the byte string which can then be decoded with decodeURIComponent to produce the unicode string.
function decode_base64_usc(s) {
return decodeURIComponent(escape(decode_base64(s)));
}
Since escape is being deprecated we could change our function to support unicode directly without the need for escape or String.fromCharCode we can produce a % escaped string ready for URI decoding.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
Edit for #Charles Byrne:
Can't remember why we didn't ignore the '=' padding characters, might've worked with a specification that didn't require them at the time. If we were to modify the decodeURIComponent routine to ignore these, as we should since they do not represent any data, the result decodes the example correctly.
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/=*$/,'').replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
Now calling decode_base64('4pyTIMOgIGxhIG1vZGU=') will return the encoded string '✓ à la mode', without any errors.
Since '=' is reserved as padding character I can reduce my code golf handicap, if I may:
function decode_base64(s) {
var b=l=0,
m='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
return decodeURIComponent(s.replace(/./g, function (v) {
b=(b<<6)+m.indexOf(v); l+=6;
return l<8||'='==v?'':'%'+(0x100+((b>>>(l-=8))&0xff)).toString(16).slice(-2);
}));
}
nJoy!
The php.js project has JavaScript implementations of many of PHP's functions. base64_encode and base64_decode are included.
For what it's worth, I got inspired by the other answers and wrote a small utility which calls the platform specific APIs to be used universally from either Node.js or a browser:
/**
* Encode a string of text as base64
*
* #param data The string of text.
* #returns The base64 encoded string.
*/
function encodeBase64(data: string) {
if (typeof btoa === "function") {
return btoa(data);
} else if (typeof Buffer === "function") {
return Buffer.from(data, "utf-8").toString("base64");
} else {
throw new Error("Failed to determine the platform specific encoder");
}
}
/**
* Decode a string of base64 as text
*
* #param data The string of base64 encoded text
* #returns The decoded text.
*/
function decodeBase64(data: string) {
if (typeof atob === "function") {
return atob(data);
} else if (typeof Buffer === "function") {
return Buffer.from(data, "base64").toString("utf-8");
} else {
throw new Error("Failed to determine the platform specific decoder");
}
}
I have tried the Javascript routines at phpjs.org and they have worked well.
I first tried the routines suggested in the chosen answer by Ranhiru Cooray - http://ntt.cc/2008/01/19/base64-encoder-decoder-with-javascript.html
I found that they did not work in all circumstances. I wrote up a test case where these routines fail and posted them to GitHub at:
https://github.com/scottcarter/base64_javascript_test_data.git
I also posted a comment to the blog post at ntt.cc to alert the author (awaiting moderation - the article is old so not sure if comment will get posted).
Frontend: Good solutions above, but quickly for the backend...
NodeJS - no deprecation
Use Buffer.from.
> inBase64 = Buffer.from('plain').toString('base64')
'cGxhaW4='
> // DEPRECATED //
> new Buffer(inBase64, 'base64').toString()
'plain'
> (node:1188987) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
// Works //
> Buffer.from(inBase64, 'base64').toString()
'plain'
In Node.js we can do it in simple way
var base64 = 'SGVsbG8gV29ybGQ='
var base64_decode = new Buffer(base64, 'base64').toString('ascii');
console.log(base64_decode); // "Hello World"
I'd rather use the bas64 encode/decode methods from CryptoJS, the most popular library for standard and secure cryptographic algorithms implemented in JavaScript using best practices and patterns.
For JavaScripts frameworks where there is no atob method and in case you do not want to import external libraries, this is short function that does it.
It would get a string that contains Base64 encoded value and will return a decoded array of bytes (where the array of bytes is represented as array of numbers where each number is an integer between 0 and 255 inclusive).
function fromBase64String(str) {
var alpha =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var value = [];
var index = 0;
var destIndex = 0;
var padding = false;
while (true) {
var first = getNextChr(str, index, padding, alpha);
var second = getNextChr(str, first .nextIndex, first .padding, alpha);
var third = getNextChr(str, second.nextIndex, second.padding, alpha);
var fourth = getNextChr(str, third .nextIndex, third .padding, alpha);
index = fourth.nextIndex;
padding = fourth.padding;
// ffffffss sssstttt ttffffff
var base64_first = first.code == null ? 0 : first.code;
var base64_second = second.code == null ? 0 : second.code;
var base64_third = third.code == null ? 0 : third.code;
var base64_fourth = fourth.code == null ? 0 : fourth.code;
var a = (( base64_first << 2) & 0xFC ) | ((base64_second>>4) & 0x03);
var b = (( base64_second<< 4) & 0xF0 ) | ((base64_third >>2) & 0x0F);
var c = (( base64_third << 6) & 0xC0 ) | ((base64_fourth>>0) & 0x3F);
value [destIndex++] = a;
if (!third.padding) {
value [destIndex++] = b;
} else {
break;
}
if (!fourth.padding) {
value [destIndex++] = c;
} else {
break;
}
if (index >= str.length) {
break;
}
}
return value;
}
function getNextChr(str, index, equalSignReceived, alpha) {
var chr = null;
var code = 0;
var padding = equalSignReceived;
while (index < str.length) {
chr = str.charAt(index);
if (chr == " " || chr == "\r" || chr == "\n" || chr == "\t") {
index++;
continue;
}
if (chr == "=") {
padding = true;
} else {
if (equalSignReceived) {
throw new Error("Invalid Base64 Endcoding character \""
+ chr + "\" with code " + str.charCodeAt(index)
+ " on position " + index
+ " received afer an equal sign (=) padding "
+ "character has already been received. "
+ "The equal sign padding character is the only "
+ "possible padding character at the end.");
}
code = alpha.indexOf(chr);
if (code == -1) {
throw new Error("Invalid Base64 Encoding character \""
+ chr + "\" with code " + str.charCodeAt(index)
+ " on position " + index + ".");
}
}
break;
}
return { character: chr, code: code, padding: padding, nextIndex: ++index};
}
Resources used: RFC-4648 Section 4
Base64 Win-1251 decoding for encodings other than acsi or iso-8859-1.
As it turned out, all the scripts I saw here convert Cyrillic Base64 to iso-8859-1 encoding. It is strange that no one noticed this.
Thus, to restore the Cyrillic alphabet, it is enough to do an additional transcoding of the text from iso-8859-1 to windows-1251.
I think that with other languages, it will be the same. Just change Cyrilic windows-1251 to yours.
... and Thanks to Der Hochstapler for his code i'm take from his comment ... of over comment, which is somewhat unusual.
code for JScript (for Windows desktop only) (ActiveXObject) - 1251 file encoding
decode_base64=function(f){var g={},b=65,d=0,a,c=0,h,e="",k=String.fromCharCode,l=f.length;for(a="";91>b;)a+=k(b++);a+=a.toLowerCase()+"0123456789+/";for(b=0;64>b;b++)g[a.charAt(b)]=b;for(a=0;a<l;a++)for(b=g[f.charAt(a)],d=(d<<6)+b,c+=6;8<=c;)((h=d>>>(c-=8)&255)||a<l-2)&&(e+=k(h));return e};
sDOS2Win = function(sText, bInsideOut) {
var aCharsets = ["iso-8859-1", "windows-1251"];
sText += "";
bInsideOut = bInsideOut ? 1 : 0;
with (new ActiveXObject("ADODB.Stream")) { //http://www.w3schools.com/ado/ado_ref_stream.asp
type = 2; //Binary 1, Text 2 (default)
mode = 3; //Permissions have not been set 0, Read-only 1, Write-only 2, Read-write 3,
//Prevent other read 4, Prevent other write 8, Prevent other open 12, Allow others all 16
charset = aCharsets[bInsideOut];
open();
writeText(sText);
position = 0;
charset = aCharsets[1 - bInsideOut];
return readText();
}
}
var base64='0PPx8ero5SDh8+ru4uroIQ=='
text = sDOS2Win(decode_base64(base64), false );
WScript.Echo(text)
var x=WScript.StdIn.ReadLine();