Converting between strings and ArrayBuffers - javascript

Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I'd like to be able to write the contents of an ArrayBuffer to localStorage and then read it back.

Update 2016 - five years on there are now new methods in the specs (see support below) to convert between strings and typed arrays using proper encoding.
TextEncoder
The TextEncoder represents:
The TextEncoder interface represents an encoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, ... An encoder takes a stream of code points as input and
emits a stream of bytes.
Change note since the above was written: (ibid.)
Note: Firefox, Chrome and Opera used to have support for encoding
types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and
gbk). As of Firefox 48 [...], Chrome 54 [...] and Opera 41, no
other encoding types are available other than utf-8, in order to match
the spec.*
*) Updated specs (W3) and here (whatwg).
After creating an instance of the TextEncoder it will take a string and encode it using a given encoding parameter:
if (!("TextEncoder" in window))
alert("Sorry, this browser does not support TextEncoder...");
var enc = new TextEncoder(); // always utf-8
console.log(enc.encode("This is a string converted to a Uint8Array"));
You then of course use the .buffer parameter on the resulting Uint8Array to convert the underlaying ArrayBuffer to a different view if needed.
Just make sure that the characters in the string adhere to the encoding schema, for example, if you use characters outside the UTF-8 range in the example they will be encoded to two bytes instead of one.
For general use you would use UTF-16 encoding for things like localStorage.
TextDecoder
Likewise, the opposite process uses the TextDecoder:
The TextDecoder interface represents a decoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, ... A decoder takes a stream of bytes as input and emits
a stream of code points.
All available decoding types can be found here.
if (!("TextDecoder" in window))
alert("Sorry, this browser does not support TextDecoder...");
var enc = new TextDecoder("utf-8");
var arr = new Uint8Array([84,104,105,115,32,105,115,32,97,32,85,105,110,116,
56,65,114,114,97,121,32,99,111,110,118,101,114,116,
101,100,32,116,111,32,97,32,115,116,114,105,110,103]);
console.log(enc.decode(arr));
The MDN StringView library
An alternative to these is to use the StringView library (licensed as lgpl-3.0) which goal is:
to create a C-like interface for strings (i.e., an array of character codes — an ArrayBufferView in JavaScript) based upon the
JavaScript ArrayBuffer interface
to create a highly extensible library that anyone can extend by adding methods to the object StringView.prototype
to create a collection of methods for such string-like objects (since now: stringViews) which work strictly on arrays of numbers
rather than on creating new immutable JavaScript strings
to work with Unicode encodings other than JavaScript's default UTF-16 DOMStrings
giving much more flexibility. However, it would require us to link to or embed this library while TextEncoder/TextDecoder is being built-in in modern browsers.
Support
As of July/2018:
TextEncoder (Experimental, On Standard Track)
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 19° | ? | - | 38
°) 18: Firefox 18 implemented an earlier and slightly different version
of the specification.
WEB WORKER SUPPORT:
Experimental, On Standard Track
Chrome | Edge | Firefox | IE | Opera | Safari
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | - | 25 | -
Chrome/A | Edge/mob | Firefox/A | Opera/A |Safari/iOS | Webview/A
----------|-----------|-----------|-----------|-----------|-----------
38 | ? | 20 | ? | - | 38
Data from MDN - `npm i -g mdncomp` by epistemex

Although Dennis and gengkev solutions of using Blob/FileReader work, I wouldn't suggest taking that approach. It is an async approach to a simple problem, and it is much slower than a direct solution. I've made a post in html5rocks with a simpler and (much faster) solution:
http://updates.html5rocks.com/2012/06/How-to-convert-ArrayBuffer-to-and-from-String
And the solution is:
function ab2str(buf) {
return String.fromCharCode.apply(null, new Uint16Array(buf));
}
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i<strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
EDIT:
The Encoding API helps solving the string conversion problem. Check out the response from Jeff Posnik on Html5Rocks.com to the above original article.
Excerpt:
The Encoding API makes it simple to translate between raw bytes and native JavaScript strings, regardless of which of the many standard encodings you need to work with.
<pre id="results"></pre>
<script>
if ('TextDecoder' in window) {
// The local files to be fetched, mapped to the encoding that they're using.
var filesToEncoding = {
'utf8.bin': 'utf-8',
'utf16le.bin': 'utf-16le',
'macintosh.bin': 'macintosh'
};
Object.keys(filesToEncoding).forEach(function(file) {
fetchAndDecode(file, filesToEncoding[file]);
});
} else {
document.querySelector('#results').textContent = 'Your browser does not support the Encoding API.'
}
// Use XHR to fetch `file` and interpret its contents as being encoded with `encoding`.
function fetchAndDecode(file, encoding) {
var xhr = new XMLHttpRequest();
xhr.open('GET', file);
// Using 'arraybuffer' as the responseType ensures that the raw data is returned,
// rather than letting XMLHttpRequest decode the data first.
xhr.responseType = 'arraybuffer';
xhr.onload = function() {
if (this.status == 200) {
// The decode() method takes a DataView as a parameter, which is a wrapper on top of the ArrayBuffer.
var dataView = new DataView(this.response);
// The TextDecoder interface is documented at http://encoding.spec.whatwg.org/#interface-textdecoder
var decoder = new TextDecoder(encoding);
var decodedString = decoder.decode(dataView);
// Add the decoded file's text to the <pre> element on the page.
document.querySelector('#results').textContent += decodedString + '\n';
} else {
console.error('Error while requesting', file, this);
}
};
xhr.send();
}
</script>

You can use TextEncoder and TextDecoder from the Encoding standard, which is polyfilled by the stringencoding library, to convert string to and from ArrayBuffers:
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);

Blob is much slower than String.fromCharCode(null,array);
but that fails if the array buffer gets too big. The best solution I have found is to use String.fromCharCode(null,array); and split it up into operations that won't blow the stack, but are faster than a single char at a time.
The best solution for large array buffer is:
function arrayBufferToString(buffer){
var bufView = new Uint16Array(buffer);
var length = bufView.length;
var result = '';
var addition = Math.pow(2,16)-1;
for(var i = 0;i<length;i+=addition){
if(i + addition > length){
addition = length - i;
}
result += String.fromCharCode.apply(null, bufView.subarray(i,i+addition));
}
return result;
}
I found this to be about 20 times faster than using blob. It also works for large strings of over 100mb.

In case you have binary data in a string (obtained from nodejs + readFile(..., 'binary'), or cypress + cy.fixture(..., 'binary'), etc), you can't use TextEncoder. It supports only utf8. Bytes with values >= 128 are each turned into 2 bytes.
ES2015:
a = Uint8Array.from(s, x => x.charCodeAt(0))
Uint8Array(33) [2, 134, 140, 186, 82, 70, 108, 182, 233, 40, 143, 247, 29, 76, 245, 206, 29, 87, 48, 160, 78, 225, 242, 56, 236, 201, 80, 80, 152, 118, 92, 144, 48
s = String.fromCharCode.apply(null, a)
"ºRFl¶é(÷LõÎW0 Náò8ìÉPPv\0"

Based on the answer of gengkev, I created functions for both ways, because BlobBuilder can handle String and ArrayBuffer:
function string2ArrayBuffer(string, callback) {
var bb = new BlobBuilder();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result);
}
f.readAsArrayBuffer(bb.getBlob());
}
and
function arrayBuffer2String(buf, callback) {
var bb = new BlobBuilder();
bb.append(buf);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result)
}
f.readAsText(bb.getBlob());
}
A simple test:
string2ArrayBuffer("abc",
function (buf) {
var uInt8 = new Uint8Array(buf);
console.log(uInt8); // Returns `Uint8Array { 0=97, 1=98, 2=99}`
arrayBuffer2String(buf,
function (string) {
console.log(string); // returns "abc"
}
)
}
)

All the following is about getting binary strings from array buffers
I'd recommend not to use
var binaryString = String.fromCharCode.apply(null, new Uint8Array(arrayBuffer));
because it
crashes on big buffers (somebody wrote about "magic" size of 246300 but I got Maximum call stack size exceeded error on 120000 bytes buffer (Chrome 29))
it has really poor performance (see below)
If you exactly need synchronous solution use something like
var
binaryString = '',
bytes = new Uint8Array(arrayBuffer),
length = bytes.length;
for (var i = 0; i < length; i++) {
binaryString += String.fromCharCode(bytes[i]);
}
it is as slow as the previous one but works correctly. It seems that at the moment of writing this there is no quite fast synchronous solution for that problem (all libraries mentioned in this topic uses the same approach for their synchronous features).
But what I really recommend is using Blob + FileReader approach
function readBinaryStringFromArrayBuffer (arrayBuffer, onSuccess, onFail) {
var reader = new FileReader();
reader.onload = function (event) {
onSuccess(event.target.result);
};
reader.onerror = function (event) {
onFail(event.target.error);
};
reader.readAsBinaryString(new Blob([ arrayBuffer ],
{ type: 'application/octet-stream' }));
}
the only disadvantage (not for all) is that it is asynchronous. And it is about 8-10 times faster then previous solutions! (Some details: synchronous solution on my environment took 950-1050 ms for 2.4Mb buffer but solution with FileReader had times about 100-120 ms for the same amount of data. And I have tested both synchronous solutions on 100Kb buffer and they have taken almost the same time, so loop is not much slower the using 'apply'.)
BTW here: How to convert ArrayBuffer to and from String author compares two approaches like me and get completely opposite results (his test code is here) Why so different results? Probably because of his test string that is 1Kb long (he called it "veryLongStr"). My buffer was a really big JPEG image of size 2.4Mb.

Unlike the solutions here, I needed to convert to/from UTF-8 data. For this purpose, I coded the following two functions, using the (un)escape/(en)decodeURIComponent trick. They're pretty wasteful of memory, allocating 9 times the length of the encoded utf8-string, though those should be recovered by gc. Just don't use them for 100mb text.
function utf8AbFromStr(str) {
var strUtf8 = unescape(encodeURIComponent(str));
var ab = new Uint8Array(strUtf8.length);
for (var i = 0; i < strUtf8.length; i++) {
ab[i] = strUtf8.charCodeAt(i);
}
return ab;
}
function strFromUtf8Ab(ab) {
return decodeURIComponent(escape(String.fromCharCode.apply(null, ab)));
}
Checking that it works:
strFromUtf8Ab(utf8AbFromStr('latinкирилицаαβγδεζηあいうえお'))
-> "latinкирилицаαβγδεζηあいうえお"

(Update Please see the 2nd half of this answer, where I have (hopefully) provided a more complete solution.)
I also ran into this issue, the following works for me in FF 6 (for one direction):
var buf = new ArrayBuffer( 10 );
var view = new Uint8Array( buf );
view[ 3 ] = 4;
alert(Array.prototype.slice.call(view).join(""));
Unfortunately, of course, you end up with ASCII text representations of the values in the array, rather than characters. It still (should be) much more efficient than a loop, though.
eg. For the example above, the result is 0004000000, rather than several null chars & a chr(4).
Edit:
After looking on MDC here, you may create an ArrayBuffer from an Array as follows:
var arr = new Array(23);
// New Uint8Array() converts the Array elements
// to Uint8s & creates a new ArrayBuffer
// to store them in & a corresponding view.
// To get at the generated ArrayBuffer,
// you can then access it as below, with the .buffer property
var buf = new Uint8Array( arr ).buffer;
To answer your original question, this allows you to convert ArrayBuffer <-> String as follows:
var buf, view, str;
buf = new ArrayBuffer( 256 );
view = new Uint8Array( buf );
view[ 0 ] = 7; // Some dummy values
view[ 2 ] = 4;
// ...
// 1. Buffer -> String (as byte array "list")
str = bufferToString(buf);
alert(str); // Alerts "7,0,4,..."
// 1. String (as byte array) -> Buffer
buf = stringToBuffer(str);
alert(new Uint8Array( buf )[ 2 ]); // Alerts "4"
// Converts any ArrayBuffer to a string
// (a comma-separated list of ASCII ordinals,
// NOT a string of characters from the ordinals
// in the buffer elements)
function bufferToString( buf ) {
var view = new Uint8Array( buf );
return Array.prototype.join.call(view, ",");
}
// Converts a comma-separated ASCII ordinal string list
// back to an ArrayBuffer (see note for bufferToString())
function stringToBuffer( str ) {
var arr = str.split(",")
, view = new Uint8Array( arr );
return view.buffer;
}
For convenience, here is a function for converting a raw Unicode String to an ArrayBuffer (will only work with ASCII/one-byte characters)
function rawStringToBuffer( str ) {
var idx, len = str.length, arr = new Array( len );
for ( idx = 0 ; idx < len ; ++idx ) {
arr[ idx ] = str.charCodeAt(idx) & 0xFF;
}
// You may create an ArrayBuffer from a standard array (of values) as follows:
return new Uint8Array( arr ).buffer;
}
// Alerts "97"
alert(new Uint8Array( rawStringToBuffer("abc") )[ 0 ]);
The above allow you to go from ArrayBuffer -> String & back to ArrayBuffer again, where the string may be stored in eg. .localStorage :)
Hope this helps,
Dan

Just
const buffer = thisReturnsBuffers();
const blob = new Blob([buffer], {type: 'text/plain; charset=utf-8'});
blob.text().then(text => console.log(text));
Or
const stringVal = "string here";
const blob = new Blob([stringVal], {type: 'text/plain; charset=utf-8'});
blob.arrayBuffer().then(buffer => console.log(buffer));
Why are you all making this so complicated?

For node.js and also for browsers using https://github.com/feross/buffer
function ab2str(buf: Uint8Array) {
return Buffer.from(buf).toString('base64');
}
function str2ab(str: string) {
return new Uint8Array(Buffer.from(str, 'base64'))
}
Note: Solutions here didn't work for me. I need to support node.js and browsers and just serialize UInt8Array to a string. I could serialize it as a number[] but that occupies unnecessary space. With that solution I don't need to worry about encodings since it's base64. Just in case other people struggle with the same problem... My two cents

I found I had problems with this approach, basically because I was trying to write the output to a file and it was non encoded properly. Since JS seems to use UCS-2 encoding (source, source), we need to stretch this solution a step further, here's my enhanced solution that works to me.
I had no difficulties with generic text, but when it was down to Arab or Korean, the output file didn't have all the chars but instead was showing error characters
File output:
","10k unit":"",Follow:"Õ©íüY‹","Follow %{screen_name}":"%{screen_name}U“’Õ©íü",Tweet:"ĤüÈ","Tweet %{hashtag}":"%{hashtag} ’ĤüÈY‹","Tweet to %{name}":"%{name}U“xĤüÈY‹"},ko:{"%{followers_count} followers":"%{followers_count}…X \Ì","100K+":"100Ì tÁ","10k unit":"Ì è",Follow:"\°","Follow %{screen_name}":"%{screen_name} Ø \°X0",K:"œ",M:"1Ì",Tweet:"¸","Tweet %{hashtag}":"%{hashtag}
Original:
","10k unit":"万",Follow:"フォローする","Follow %{screen_name}":"%{screen_name}さんをフォロー",Tweet:"ツイート","Tweet %{hashtag}":"%{hashtag} をツイートする","Tweet to %{name}":"%{name}さんへツイートする"},ko:{"%{followers_count} followers":"%{followers_count}명의 팔로워","100K+":"100만 이상","10k unit":"만 단위",Follow:"팔로우","Follow %{screen_name}":"%{screen_name} 님 팔로우하기",K:"천",M:"백만",Tweet:"트윗","Tweet %{hashtag}":"%{hashtag}
I took the information from dennis' solution and this post I found.
Here's my code:
function encode_utf8(s) {
return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
return decodeURIComponent(escape(s));
}
function ab2str(buf) {
var s = String.fromCharCode.apply(null, new Uint8Array(buf));
return decode_utf8(decode_utf8(s))
}
function str2ab(str) {
var s = encode_utf8(str)
var buf = new ArrayBuffer(s.length);
var bufView = new Uint8Array(buf);
for (var i=0, strLen=s.length; i<strLen; i++) {
bufView[i] = s.charCodeAt(i);
}
return bufView;
}
This allows me to save the content to a file without encoding problems.
How it works:
It basically takes the single 8-byte chunks composing a UTF-8 character and saves them as single characters (therefore an UTF-8 character built in this way, could be composed by 1-4 of these characters).
UTF-8 encodes characters in a format that variates from 1 to 4 bytes in length. What we do here is encoding the sting in an URI component and then take this component and translate it in the corresponding 8 byte character. In this way we don't lose the information given by UTF8 characters that are more than 1 byte long.

if you used huge array example arr.length=1000000
you can this code to avoid stack callback problems
function ab2str(buf) {
var bufView = new Uint16Array(buf);
var unis =""
for (var i = 0; i < bufView.length; i++) {
unis=unis+String.fromCharCode(bufView[i]);
}
return unis
}
reverse function
mangini answer from top
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i<strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}

Well, here's a somewhat convoluted way of doing the same thing:
var string = "Blah blah blah", output;
var bb = new (window.BlobBuilder||window.WebKitBlobBuilder||window.MozBlobBuilder)();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
}
f.readAsArrayBuffer(bb.getBlob());
Edit: BlobBuilder has long been deprecated in favor of the Blob constructor, which did not exist when I first wrote this post. Here's an updated version. (And yes, this has always been a very silly way to do the conversion, but it was just for fun!)
var string = "Blah blah blah", output;
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
};
f.readAsArrayBuffer(new Blob([string]));

The following is a working Typescript implementation:
bufferToString(buffer: ArrayBuffer): string {
return String.fromCharCode.apply(null, Array.from(new Uint16Array(buffer)));
}
stringToBuffer(value: string): ArrayBuffer {
let buffer = new ArrayBuffer(value.length * 2); // 2 bytes per char
let view = new Uint16Array(buffer);
for (let i = 0, length = value.length; i < length; i++) {
view[i] = value.charCodeAt(i);
}
return buffer;
}
I've used this for numerous operations while working with crypto.subtle.

Recently I also need to do this for one of my project so did a well research and got a result from Google's Developer community which states this in a simple manner:
For ArrayBuffer to String
function ab2str(buf) {
return String.fromCharCode.apply(null, new Uint16Array(buf));
}
// Here Uint16 can be different like Uinit8/Uint32 depending upon your buffer value type.
For String to ArrayBuffer
function str2ab(str) {
var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
var bufView = new Uint16Array(buf);
for (var i=0, strLen=str.length; i < strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
//Same here also for the Uint16Array.
For more in detail reference you can refer this blog by Google.

Let's say you have an arrayBuffer binaryStr:
let text = String.fromCharCode.apply(null, new Uint8Array(binaryStr));
and then you assign the text to the state.

I used this and works for me.
function arrayBufferToBase64( buffer ) {
var binary = '';
var bytes = new Uint8Array( buffer );
var len = bytes.byteLength;
for (var i = 0; i < len; i++) {
binary += String.fromCharCode( bytes[ i ] );
}
return window.btoa( binary );
}
function base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array( len );
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}

stringToArrayBuffer(byteString) {
var byteArray = new Uint8Array(byteString.length);
for (var i = 0; i < byteString.length; i++) {
byteArray[i] = byteString.codePointAt(i);
}
return byteArray;
}
arrayBufferToString(buffer) {
var byteArray = new Uint8Array(buffer);
var byteString = '';
for (var i = 0; i < byteArray.byteLength; i++) {
byteString += String.fromCodePoint(byteArray[i]);
}
return byteString;
}

After playing with mangini's solution for converting from ArrayBuffer to String - ab2str (which is the most elegant and useful one I have found - thanks!), I had some issues when handling large arrays. More specefivally, calling String.fromCharCode.apply(null, new Uint16Array(buf)); throws an error:
arguments array passed to Function.prototype.apply is too large.
In order to solve it (bypass) I have decided to handle the input ArrayBuffer in chunks. So the modified solution is:
function ab2str(buf) {
var str = "";
var ab = new Uint16Array(buf);
var abLen = ab.length;
var CHUNK_SIZE = Math.pow(2, 16);
var offset, len, subab;
for (offset = 0; offset < abLen; offset += CHUNK_SIZE) {
len = Math.min(CHUNK_SIZE, abLen-offset);
subab = ab.subarray(offset, offset+len);
str += String.fromCharCode.apply(null, subab);
}
return str;
}
The chunk size is set to 2^16 because this was the size I have found to work in my development landscape. Setting a higher value caused the same error to reoccur. It can be altered by setting the CHUNK_SIZE variable to a different value. It is important to have an even number.
Note on performance - I did not make any performance tests for this solution. However, since it is based on the previous solution, and can handle large arrays, I see no reason why not to use it.

ArrayBuffer -> Buffer -> String(Base64)
Change ArrayBuffer to Buffer and then to String.
Buffer.from(arrBuffer).toString("base64");

See here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays/StringView
(a C-like interface for strings based upon the JavaScript ArrayBuffer interface)

The "native" binary string that atob() returns is a 1-byte-per-character Array.
So we shouldn't store 2 byte into a character.
var arrayBufferToString = function(buffer) {
return String.fromCharCode.apply(null, new Uint8Array(buffer));
}
var stringToArrayBuffer = function(str) {
return (new Uint8Array([].map.call(str,function(x){return x.charCodeAt(0)}))).buffer;
}

Yes:
const encstr = (`TextEncoder` in window) ? new TextEncoder().encode(str) : Uint8Array.from(str, c => c.codePointAt(0));

I'd recommend NOT using deprecated APIs like BlobBuilder
BlobBuilder has long been deprecated by the Blob object. Compare the code in Dennis' answer — where BlobBuilder is used — with the code below:
function arrayBufferGen(str, cb) {
var b = new Blob([str]);
var f = new FileReader();
f.onload = function(e) {
cb(e.target.result);
}
f.readAsArrayBuffer(b);
}
Note how much cleaner and less bloated this is compared to the deprecated method... Yeah, this is definitely something to consider here.

Use splat unpacking instead of loops:
arrbuf = new Uint8Array([104, 101, 108, 108, 111])
text = String.fromCharCode(...arrbuf)
console.log(text)
For substrings arrbuf.slice() can be employed.

For me this worked well.
static async hash(message) {
const data = new TextEncoder().encode(message);
const hashBuffer = await crypto.subtle.digest('SHA-256', data)
const hashArray = Array.from(new Uint8Array(hashBuffer))
const hashHex = hashArray.map((b) => b.toString(16).padStart(2, '0')).join('')
return hashHex
}

var decoder = new TextDecoder ();
var string = decoder.decode (arrayBuffer);
See https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode

From emscripten:
function stringToUTF8Array(str, outU8Array, outIdx, maxBytesToWrite) {
if (!(maxBytesToWrite > 0)) return 0;
var startIdx = outIdx;
var endIdx = outIdx + maxBytesToWrite - 1;
for (var i = 0; i < str.length; ++i) {
var u = str.charCodeAt(i);
if (u >= 55296 && u <= 57343) {
var u1 = str.charCodeAt(++i);
u = 65536 + ((u & 1023) << 10) | u1 & 1023
}
if (u <= 127) {
if (outIdx >= endIdx) break;
outU8Array[outIdx++] = u
} else if (u <= 2047) {
if (outIdx + 1 >= endIdx) break;
outU8Array[outIdx++] = 192 | u >> 6;
outU8Array[outIdx++] = 128 | u & 63
} else if (u <= 65535) {
if (outIdx + 2 >= endIdx) break;
outU8Array[outIdx++] = 224 | u >> 12;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
} else {
if (outIdx + 3 >= endIdx) break;
outU8Array[outIdx++] = 240 | u >> 18;
outU8Array[outIdx++] = 128 | u >> 12 & 63;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
}
}
outU8Array[outIdx] = 0;
return outIdx - startIdx
}
Use like:
stringToUTF8Array('abs', new Uint8Array(3), 0, 4);

Related

How to use javascript (in Angular) to get bytes encoded by java.util.Base64? [duplicate]

I need to convert a base64 encode string into an ArrayBuffer.
The base64 strings are user input, they will be copy and pasted from an email, so they're not there when the page is loaded.
I would like to do this in javascript without making an ajax call to the server if possible.
I found those links interesting, but they didt'n help me:
ArrayBuffer to base64 encoded string
this is about the opposite conversion, from ArrayBuffer to base64, not the other way round
http://jsperf.com/json-vs-base64/2
this looks good but i can't figure out how to use the code.
Is there an easy (maybe native) way to do the conversion? thanks
Try this:
function _base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
Using TypedArray.from:
Uint8Array.from(atob(base64_string), c => c.charCodeAt(0))
Performance to be compared with the for loop version of Goran.it answer.
For Node.js users:
const myBuffer = Buffer.from(someBase64String, 'base64');
myBuffer will be of type Buffer which is a subclass of Uint8Array. Unfortunately, Uint8Array is NOT an ArrayBuffer as the OP was asking for. But when manipulating an ArrayBuffer I almost always wrap it with Uint8Array or something similar, so it should be close to what's being asked for.
Goran.it's answer does not work because of unicode problem in javascript - https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding.
I ended up using the function given on Daniel Guerrero's blog: http://blog.danguer.com/2011/10/24/base64-binary-decoding-in-javascript/
Function is listed on github link: https://github.com/danguer/blog-examples/blob/master/js/base64-binary.js
Use these lines
var uintArray = Base64Binary.decode(base64_string);
var byteArray = Base64Binary.decodeArrayBuffer(base64_string);
Async solution, it's better when the data is big:
// base64 to buffer
function base64ToBufferAsync(base64) {
var dataUrl = "data:application/octet-binary;base64," + base64;
fetch(dataUrl)
.then(res => res.arrayBuffer())
.then(buffer => {
console.log("base64 to buffer: " + new Uint8Array(buffer));
})
}
// buffer to base64
function bufferToBase64Async( buffer ) {
var blob = new Blob([buffer], {type:'application/octet-binary'});
console.log("buffer to blob:" + blob)
var fileReader = new FileReader();
fileReader.onload = function() {
var dataUrl = fileReader.result;
console.log("blob to dataUrl: " + dataUrl);
var base64 = dataUrl.substr(dataUrl.indexOf(',')+1)
console.log("dataUrl to base64: " + base64);
};
fileReader.readAsDataURL(blob);
}
Javascript is a fine development environment so it seems odd than it doesn't provide a solution to this small problem. The solutions offered elsewhere on this page are potentially slow. Here is my solution. It employs the inbuilt functionality that decodes base64 image and sound data urls.
var req = new XMLHttpRequest;
req.open('GET', "data:application/octet;base64," + base64Data);
req.responseType = 'arraybuffer';
req.onload = function fileLoaded(e)
{
var byteArray = new Uint8Array(e.target.response);
// var shortArray = new Int16Array(e.target.response);
// var unsignedShortArray = new Int16Array(e.target.response);
// etc.
}
req.send();
The send request fails if the base 64 string is badly formed.
The mime type (application/octet) is probably unnecessary.
Tested in chrome. Should work in other browsers.
Pure JS - no string middlestep (no atob)
I write following function which convert base64 in direct way (without conversion to string at the middlestep). IDEA
get 4 base64 characters chunk
find index of each character in base64 alphabet
convert index to 6-bit number (binary string)
join four 6 bit numbers which gives 24-bit numer (stored as binary string)
split 24-bit string to three 8-bit and covert each to number and store them in output array
corner case: if input base64 string ends with one/two = char, remove one/two numbers from output array
Below solution allows to process large input base64 strings. Similar function for convert bytes to base64 without btoa is HERE
function base64ToBytesArr(str) {
const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
let result = [];
for(let i=0; i<str.length/4; i++) {
let chunk = [...str.slice(4*i,4*i+4)]
let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join('');
let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
}
return result;
}
// --------
// TEST
// --------
let test = "Alice's Adventure in Wonderland.";
console.log('test string:', test.length, test);
let b64_btoa = btoa(test);
console.log('encoded string:', b64_btoa);
let decodedBytes = base64ToBytesArr(b64_btoa); // decode base64 to array of bytes
console.log('decoded bytes:', JSON.stringify(decodedBytes));
let decodedTest = decodedBytes.map(b => String.fromCharCode(b) ).join``;
console.log('Uint8Array', JSON.stringify(new Uint8Array(decodedBytes)));
console.log('decoded string:', decodedTest.length, decodedTest);
Caution!
If you want to decode base64 to STRING (not bytes array) and you know that result contains utf8 characters then atob will fail in general e.g. for character 💩 the atob("8J+SqQ==") will give wrong result . In this case you can use above solution and convert result bytes array to string in proper way e.g. :
function base64ToBytesArr(str) {
const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
let result = [];
for(let i=0; i<str.length/4; i++) {
let chunk = [...str.slice(4*i,4*i+4)]
let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join('');
let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
}
return result;
}
// --------
// TEST
// --------
let testB64 = "8J+SqQ=="; // for string: "💩";
console.log('input base64 :', testB64);
let decodedBytes = base64ToBytesArr(testB64); // decode base64 to array of bytes
console.log('decoded bytes :', JSON.stringify(decodedBytes));
let result = new TextDecoder("utf-8").decode(new Uint8Array(decodedBytes));
console.log('properly decoded string :', result);
let result_atob = atob(testB64);
console.log('decoded by atob :', result_atob);
Snippets tested 2022-08-04 on: chrome 103.0.5060.134 (arm64), safari 15.2, firefox 103.0.1 (64 bit), edge 103.0.1264.77 (arm64), and node-js v12.16.1
I would strongly suggest using an npm package implementing correctly the base64 specification.
The best one I know is rfc4648
The problem is that btoa and atob use binary strings instead of Uint8Array and trying to convert to and from it is cumbersome. Also there is a lot of bad packages in npm for that. I lose a lot of time before finding that one.
The creators of that specific package did a simple thing: they took the specification of Base64 (which is here by the way) and implemented it correctly from the beginning to the end. (Including other formats in the specification that are also useful like Base64-url, Base32, etc ...) That doesn't seem a lot but apparently that was too much to ask to the bunch of other libraries.
So yeah, I know I'm doing a bit of proselytism but if you want to avoid losing your time too just use rfc4648.
I used the accepted answer to this question to create base64Url string <-> arrayBuffer conversions in the realm of base64Url data transmitted via ASCII-cookie [atob, btoa are base64[with +/]<->js binary string], so I decided to post the code.
Many of us may want both conversions and client-server communication may use the base64Url version (though a cookie may contain +/ as well as -_ characters if I understand well, only ",;\ characters and some wicked characters from the 128 ASCII are disallowed). But a url cannot contain / character, hence the wider use of b64 url version which of course not what atob-btoa supports...
Seeing other comments, I would like to stress that my use case here is base64Url data transmission via url/cookie and trying to use this crypto data with the js crypto api (2017) hence the need for ArrayBuffer representation and b64u <-> arrBuff conversions... if array buffers represent other than base64 (part of ascii) this conversion wont work since atob, btoa is limited to ascii(128). Check out an appropriate converter like below:
The buff -> b64u version is from a tweet from Mathias Bynens, thanks for that one (too)! He also wrote a base64 encoder/decoder:
https://github.com/mathiasbynens/base64
Coming from java, it may help when trying to understand the code that java byte[] is practically js Int8Array (signed int) but we use here the unsigned version Uint8Array since js conversions work with them. They are both 256bit, so we call it byte[] in js now...
The code is from a module class, that is why static.
//utility
/**
* Array buffer to base64Url string
* - arrBuff->byte[]->biStr->b64->b64u
* #param arrayBuffer
* #returns {string}
* #private
*/
static _arrayBufferToBase64Url(arrayBuffer) {
console.log('base64Url from array buffer:', arrayBuffer);
let base64Url = window.btoa(String.fromCodePoint(...new Uint8Array(arrayBuffer)));
base64Url = base64Url.replaceAll('+', '-');
base64Url = base64Url.replaceAll('/', '_');
console.log('base64Url:', base64Url);
return base64Url;
}
/**
* Base64Url string to array buffer
* - b64u->b64->biStr->byte[]->arrBuff
* #param base64Url
* #returns {ArrayBufferLike}
* #private
*/
static _base64UrlToArrayBuffer(base64Url) {
console.log('array buffer from base64Url:', base64Url);
let base64 = base64Url.replaceAll('-', '+');
base64 = base64.replaceAll('_', '/');
const binaryString = window.atob(base64);
const length = binaryString.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
console.log('array buffer:', bytes.buffer);
return bytes.buffer;
}
made a ArrayBuffer from a base64:
function base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
I was trying to use above code and It's working fine.
The result of atob is a string that is separated with some comma
,
A simpler way is to convert this string to a json array string and after that parse it to a byteArray
below code can simply be used to convert base64 to an array of number
let byteArray = JSON.parse('['+atob(base64)+']');
let buffer = new Uint8Array(byteArray);
Solution without atob
I've seen many people complaining about using atob and btoa in the replies. There are some issues to take into account when using them.
There's a solution without using them in the MDN page about Base64. Below you can find the code to convert a base64 string into a Uint8Array copied from the docs.
Note that the function below returns a Uint8Array. To get the ArrayBuffer version you just need to do uintArray.buffer.
function b64ToUint6(nChr) {
return nChr > 64 && nChr < 91
? nChr - 65
: nChr > 96 && nChr < 123
? nChr - 71
: nChr > 47 && nChr < 58
? nChr + 4
: nChr === 43
? 62
: nChr === 47
? 63
: 0;
}
function base64DecToArr(sBase64, nBlocksSize) {
const sB64Enc = sBase64.replace(/[^A-Za-z0-9+/]/g, "");
const nInLen = sB64Enc.length;
const nOutLen = nBlocksSize
? Math.ceil(((nInLen * 3 + 1) >> 2) / nBlocksSize) * nBlocksSize
: (nInLen * 3 + 1) >> 2;
const taBytes = new Uint8Array(nOutLen);
let nMod3;
let nMod4;
let nUint24 = 0;
let nOutIdx = 0;
for (let nInIdx = 0; nInIdx < nInLen; nInIdx++) {
nMod4 = nInIdx & 3;
nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << (6 * (3 - nMod4));
if (nMod4 === 3 || nInLen - nInIdx === 1) {
nMod3 = 0;
while (nMod3 < 3 && nOutIdx < nOutLen) {
taBytes[nOutIdx] = (nUint24 >>> ((16 >>> nMod3) & 24)) & 255;
nMod3++;
nOutIdx++;
}
nUint24 = 0;
}
}
return taBytes;
}
If you're interested in the reverse operation, ArrayBuffer to base64, you can find how to do it in the same link.

How to create a hash using Web Crypto API?

I'm trying to create SHA-1 hash on the client-side. I'm trying to do this with Web Crypto API but when I'm comparing the output to what various online tools give me, the result is completely different. I think the problem is in ArrayBuffer to Hex conversion. Here is my code:
function generateHash() {
var value = "mypassword";
var crypto = window.crypto;
var buffer = new ArrayBuffer(value);
var hash_bytes = crypto.subtle.digest("SHA-1", buffer);
hash_bytes.then(value => document.write([...new Uint8Array(value)].map(x => x.toString(16).padStart(2, '0')).join('')));
}
Output of document.write should be:
91dfd9ddb4198affc5c194cd8ce6d338fde470e2
But it's not, I get a completely different hash of different length (should be 40). Could I have some advise on the problem? Thanks.
The problem seems to be more the input conversion from a string to an ArrayBuffer. E.g. with str2ab() the code works:
generateHash();
function generateHash() {
var value = "mypassword";
var crypto = window.crypto;
var buffer = str2ab(value); // Fix
var hash_bytes = crypto.subtle.digest("SHA-1", buffer);
hash_bytes.then(value => document.write([...new Uint8Array(value)].map(x => x.toString(16).padStart(2, '0')).join('')));
}
// https://stackoverflow.com/a/11058858
function str2ab(str) {
const buf = new ArrayBuffer(str.length);
const bufView = new Uint8Array(buf);
for (let i = 0, strLen = str.length; i < strLen; i++) {
bufView[i] = str.charCodeAt(i);
}
return buf;
}
with the expected output:
91dfd9ddb4198affc5c194cd8ce6d338fde470e2
Using the debugger, it looks like var buffer = new ArrayBuffer(value); results in buffer being an empty ArrayBuffer. The text string stored in value must be utf-8 encoded in order to be correctly converted to bytes, which can then by passed as an input to the crypto.subtle.digest() function.
Try changing:
var buffer = new ArrayBuffer(value);
to:
var buffer = new TextEncoder("utf-8").encode(value);
This creates a Uint8Array (as expected by crypto.subtle.digest()) consisting of the bytes resulting from utf-8 encoding the text string in 'value'. This should solve the problem and produce the result that you are expecting.

Working with memory to fetch string yields incorrect result

I am following the solutions from here:
How can I return a JavaScript string from a WebAssembly function
and here:
How to return a string (or similar) from Rust in WebAssembly?
However, when reading from memory I am not getting the desired results.
AssemblyScript file, helloWorldModule.ts:
export function getMessageLocation(): string {
return "Hello World";
}
index.html:
<script>
fetch("helloWorldModule.wasm").then(response =>
response.arrayBuffer()
).then(bytes =>
WebAssembly.instantiate(bytes, {imports: {}})
).then(results => {
var linearMemory = results.instance.exports.memory;
var offset = results.instance.exports.getMessageLocation();
var stringBuffer = new Uint8Array(linearMemory.buffer, offset, 11);
let str = '';
for (let i=0; i<stringBuffer.length; i++) {
str += String.fromCharCode(stringBuffer[i]);
}
debugger;
});
</script>
This returns an offset of 32. And finally yields a string that starts too early and has spaces between each letter of "Hello World":
However, if I change the array to an Int16Array, and add 8 to the offset (which was 32), to make an offset of 40. Like so:
<script>
fetch("helloWorldModule.wasm").then(response =>
response.arrayBuffer()
).then(bytes =>
WebAssembly.instantiate(bytes, {imports: {}})
).then(results => {
var linearMemory = results.instance.exports.memory;
var offset = results.instance.exports.getMessageLocation();
var stringBuffer = new Int16Array(linearMemory.buffer, offset+8, 11);
let str = '';
for (let i=0; i<stringBuffer.length; i++) {
str += String.fromCharCode(stringBuffer[i]);
}
debugger;
});
</script>
Then we get the correct result:
Why does the first set of code not work like its supposed to in the links I provided? Why do I need to change it to work with Int16Array to get rid of the space between "H" and "e" for example? Why do I need to add 8 bytes to the offset?
In summary, what on earth is going on here?
Edit: Another clue, is if I use a TextDecoder on the UInt8 array, decoding as UTF-16 looks more correct than decoding as UTF-8:
AssemblyScript uses utf-16: https://github.com/AssemblyScript/assemblyscript/issues/43
Additionally AssemblyScript stores the length of the string in the first 32 or 64 bits.
That's why my code behaves differently. The examples in the links at the top of this post were for C++ and Rust, which do string encoding differently

Recover ArrayBuffer from xhr.responseText

I need to get an array buffer from an http request sending me a base64 answer.
For this request, I can't use XMLHttpRequest.responseType="arraybuffer".
The response I get from this request is read through xhr.responseText. Hence it's encoded as a DOMString. I'm trying to get it back as an array buffer.
I've tried to go back to the base64 from the DOMString using btoa(mysString) or window.btoa(unescape(encodeURIComponent(str))) but the first option just fails, whereas the second option doesn't give the same base64. Example of the first few characters from each base64:
Incoming : UEsDBBQACAgIACp750oAAAAAAAAAAAAAAAALAAAAX3JlbHMvLnJlbH
After the second processing: UEsDBBQACAgIAO+/ve+/ve+/vUoAAAAAAAAAAAAAAAALAAAAX3JlbHMvLnJlbH
As you can see a part of it is similar, but some parts are way off.
What am I missing to get it right?
I have got same issue too.
The solution (I ran at Chrome(68.0.3440.84))
let url = ''
let iso_8859_15_table = { 338: 188, 339: 189, 352: 166, 353: 168, 376: 190, 381: 180, 382: 184, 8364: 164 }
function iso_8859_15_to_uint8array(iso_8859_15_str) {
let buf = new ArrayBuffer(iso_8859_15_str.length);
let bufView = new Uint8Array(buf);
for (let i = 0, strLen = iso_8859_15_str.length; i < strLen; i++) {
let octet = iso_8859_15_str.charCodeAt(i);
if (iso_8859_15_table.hasOwnProperty(octet))
octet = iso_8859_15_table[octet]
bufView[i] = octet;
if(octet < 0 || 255 < octet)
console.error(`invalid data error`)
}
return bufView
}
req = new XMLHttpRequest();
req.overrideMimeType('text/plain; charset=ISO-8859-15');
req.onload = () => {
console.log(`Uint8Array : `)
var uint8array = iso_8859_15_to_uint8array(req.responseText)
console.log(uint8array)
}
req.open("get", url);
req.send();
Below is explanation what I learned to solve it.
Explanation
Why some parts are way off?
because TextDecoder cause data loss (Your case is utf-8).
For example, let's talk about UTF-8
variable width character encoding for Unicode.
It has rules(This will become problem.) for reasons such as variable length characteristics and ASCII compatibility, etc.
so, decoder may replace a non-conforming characters to replacement character such as U+003F(?, Question mark) or U+FFFD(�, Unicode replacement character).
in utf-8 case, 0~127 of values are stable, 128~255 of values are unstable. 128~255 will converted to U+FFFD
Are other Text Decoders safe except UTF-8?
No. In most cases, not safe from rules.
UTF-8 is also unrecoverable. (128~255 are set to U+FFFD)
If the binary data and the decoded result can be corresponded to one-to-one, they can be recovered.
How to solve it?
Finds recoverable Text Decoders.
Force MIME type to recoverable charset of the incoming data.
xhr_object.overrideMimeType('text/plain; charset=ISO-8859-15')
Recover binary data from string with recover table when received.
Finds recoverable Text Decoders.
To recover, avoid the situation when decoded results' are duplicated.
The following code is a simple example, so there may be missing recoverable text decoders because it only consider Uint8Array.
let bufferView = new Uint8Array(256);
for (let i = 0; i < 256; i++)
bufferView[i] = i;
let recoverable = []
let decoding = ['utf-8', 'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5', 'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-8i', 'iso-8859-10', 'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r', 'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251', 'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic', 'gbk', 'gb18030', 'hz-gb-2312', 'big5', 'euc-jp', 'iso-2022-jp', 'shift-jis', 'euc-kr', 'iso-2022-kr', 'utf-16be', 'utf-16le', 'x-user-defined', 'ISO-2022-CN', 'ISO-2022-CN-ext']
for (let dec of decoding) {
try {
let decodedText = new TextDecoder(dec).decode(bufferView);
let loss = 0
let recoverTable = {}
let unrecoverable = 0
for (let i = 0; i < decodedText.length; i++) {
let charCode = decodedText.charCodeAt(i)
if (charCode != i)
loss++
if (!recoverTable[charCode])
recoverTable[charCode] = i
else
unrecoverable++
}
let tableCnt = 0
for (let props in recoverTable) {
tableCnt++
}
if (tableCnt == 256 && unrecoverable == 0){
recoverable.push(dec)
setTimeout(()=>{
console.log(`[${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
},10)
}
else {
console.log(`!! [${dec}] : err(${loss}/${decodedText.length}, ${Math.round(loss / decodedText.length * 100)}%) alive(${tableCnt}) unrecoverable(${unrecoverable})`)
}
} catch (e) {
console.log(`!! [${dec}] : not supported.`)
}
}
setTimeout(()=>{
console.log(`recoverable Charset : ${recoverable}`)
}, 10)
In my console, this return
recoverable Charset : ibm866,iso-8859-2,iso-8859-4,iso-8859-5,iso-8859-10,iso-8859-13,iso-8859-14,iso-8859-15,iso-8859-16,koi8-r,koi8-u,macintosh,windows-1250,windows-1251,windows-1252,windows-1254,windows-1256,windows-1258,x-mac-cyrillic,x-user-defined
And I used iso-8859-15 at beginning of this answer. (It has Smallest table size.)
Additional test) Comparison between UTF-8's and ISO-8859-15's result
Check U+FFFD is really disappeared when using ISO-8859-15.
function requestAjax(url, charset) {
let req = new XMLHttpRequest();
if (charset)
req.overrideMimeType(`text/plain; charset=${charset}`);
else
charset = 'utf-8';
req.open('get', url);
req.onload = () => {
console.log(`==========\n${charset}`)
console.log(`${req.responseText.split('', 50)}\n==========`);
console.log('\n')
}
req.send();
}
var url = '';
requestAjax(url, 'ISO-8859-15');
requestAjax(url);
Bottom line
Recover binary data to, from string needs some additional job.
Find recoverable text encoder/decoder.
Make a recover table
Recover with the table.
(You can refer to the very top of code.)
For use this trick, force MIME type of incoming data to desired charset.

Change JavaScript string encoding

At the moment I have a large JavaScript string I'm attempting to write to a file, but in a different encoding (ISO-8859-1). I was hoping to use something like downloadify. Downloadify only accepts normal JavaScript strings or base64 encoded strings.
Because of this, I've decided to compress my string using JSZip which generates a nicely base64 encoded string that can be passed to downloadify, and downloaded to my desktop. Huzzah! The issue is that the string I compressed, of course, is still the wrong encoding.
Luckily JSZip can take a Uint8Array as data, instead of a string. So is there any way to convert a JavaScript string into a ISO-8859-1 encoded string and store it in a Uint8Array?
Alternatively, if I'm approaching this all wrong, is there a better solution all together? Is there a fancy JavaScript string class that can use different internal encodings?
Edit: To clarify, I'm not pushing this string to a webpage so it won't automatically convert it for me. I'm doing something like this:
var zip = new JSZip();
zip.file("genSave.txt", result);
return zip.generate({compression:"DEFLATE"});
And for this to make sense, I would need result to be in the proper encoding (and JSZip only takes strings, arraybuffers, or uint8arrays).
Final Edit (This was -not- a duplicate question because the result wasn't being displayed in the browser or transmitted to a server where the encoding could be changed):
This turned out to be a little more obscure than I had thought, so I ended up rolling my own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:
var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array
You can then either use it in the array like I did:
//Make this into a zip
var zip = new JSZip();
zip.file("genSave.txt", tenc);
return zip.generate({compression:"DEFLATE"});
Or convert it into a windows-1252 encoded string using this string encoding library:
var string = TextDecoder("windows-1252").decode(tenc);
To use this function, either use:
<script src="//www.eu4editor.com/string_transcoder.js"></script>
Or include this:
function string_transcoder (target) {
this.encodeList = encodings[target];
if (this.encodeList === undefined) {
return undefined;
}
//Initialize the easy encodings
if (target === "windows-1252") {
var i;
for (i = 0x0; i <= 0x7F; i++) {
this.encodeList[i] = i;
}
for (i = 0xA0; i <= 0xFF; i++) {
this.encodeList[i] = i;
}
}
}
string_transcoder.prototype.transcode = function (inString) {
var res = new Uint8Array(inString.length), i;
for (i = 0; i < inString.length; i++) {
var temp = inString.charCodeAt(i);
var tempEncode = (this.encodeList)[temp];
if (tempEncode === undefined) {
return undefined; //This encoding is messed up
} else {
res[i] = tempEncode;
}
}
return res;
};
encodings = {
"windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}
};
This turned out to be a little more obscure than [the author] had thought, so [the author] ended up rolling [his] own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:
var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array
You can then either use it in the array like [the author] did:
//Make this into a zip
var zip = new JSZip();
zip.file("genSave.txt", tenc);
return zip.generate({compression:"DEFLATE"});
Or convert it into a windows-1252 encoded string using this string encoding library:
var string = TextDecoder("windows-1252").decode(tenc);
To use this function, either use:
<script src="//www.eu4editor.com/string_transcoder.js"></script>
Or include this:
function string_transcoder (target) {
this.encodeList = encodings[target];
if (this.encodeList === undefined) {
return undefined;
}
//Initialize the easy encodings
if (target === "windows-1252") {
var i;
for (i = 0x0; i <= 0x7F; i++) {
this.encodeList[i] = i;
}
for (i = 0xA0; i <= 0xFF; i++) {
this.encodeList[i] = i;
}
}
}
string_transcoder.prototype.transcode = function (inString) {
var res = new Uint8Array(inString.length), i;
for (i = 0; i < inString.length; i++) {
var temp = inString.charCodeAt(i);
var tempEncode = (this.encodeList)[temp];
if (tempEncode === undefined) {
return undefined; //This encoding is messed up
} else {
res[i] = tempEncode;
}
}
return res;
};
encodings = {
"windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}
};
Test the following script:
<script type="text/javascript" charset="utf-8">
The best solution for me was posted here and this is my one-liner:
<!-- Required for non-UTF encodings (quite big) -->
<script src="encoding-indexes.js"></script>
<script src="encoding.js"></script>
...
// windows-1252 is just one typical example encoding/transcoding
let transcodedString = new TextDecoder( 'windows-1252' ).decode(
new TextEncoder().encode( someUtf8String ))
or this if the transcoding has to be applied on multiple inputs reusing the encoder and decoder:
let srcArr = [ ... ] // some UTF-8 string array
let encoder = new TextEncoder()
let decoder = new TextDecoder( 'windows-1252' )
let transcodedArr = srcArr.forEach( (s,i) => {
srcArr[i] = decoder.decode( encoder.encode( s )) })
(The slightly modified other answer from related question:)
This is what I found after a more specific Google search than just
UTF-8 encode/decode. so for those who are looking for a converting
library to convert between encodings, here you go.
github.com/inexorabletash/text-encoding
var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);
Paste from repo readme
All encodings from the Encoding specification are supported:
utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250
windows-1251 windows-1252 windows-1253 windows-1254 windows-1255
windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312
big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le
x-user-defined
(Some encodings may be supported under other names, e.g. ascii,
iso-8859-1, etc. See Encoding for additional labels for each
encoding.)

Categories