Javascript: create UTF-16 text file?

I have a string that needs to be saved as a UTF-16 text file. For example:
var s = "aosjdfkzlzkdoaslckjznx";
var file = "data:text/plain;base64," + btoa(s);
This results in a UTF-8 encoded text file. How can I get a UTF-16 text file from string s?

Related: Javascript to csv export encoding issue
This should do it:
document.getElementById('download').addEventListener('click', function(){
downloadUtf16('Hello, World', 'myFile.csv')
});
function downloadUtf16(str, filename) {
// ref: https://stackoverflow.com/q/6226189
var charCode, byteArray = [];
// BE BOM
byteArray.push(254, 255);
// LE BOM
// byteArray.push(255, 254);
for (var i = 0; i < str.length; ++i) {
charCode = str.charCodeAt(i);
// BE Bytes
byteArray.push((charCode & 0xFF00) >>> 8);
byteArray.push(charCode & 0xFF);
// LE Bytes
// byteArray.push(charCode & 0xff);
// byteArray.push(charCode / 256 >>> 0);
}
var blob = new Blob([new Uint8Array(byteArray)], {type:'text/plain;charset=UTF-16BE;'});
var blobUrl = URL.createObjectURL(blob);
// ref: https://stackoverflow.com/a/18197511
var link = document.createElement('a');
link.href = blobUrl;
link.download = filename;
if (document.createEvent) {
var event = document.createEvent('MouseEvents');
event.initEvent('click', true, true);
link.dispatchEvent(event);
} else {
link.click();
}
}
<button id="download">Download</button>
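The byte layout used above (BOM first, then two bytes per code unit) can be checked outside the browser. Below is a minimal sketch, not the answer's own code: `encodeUtf16` is a hypothetical helper name, and it uses a DataView instead of manual shifts. Because charCodeAt returns UTF-16 code units, characters outside the BMP (such as emoji) are written as surrogate pairs, which is what UTF-16 requires.

```javascript
// Encode a JavaScript string as UTF-16 bytes with a byte order mark (BOM).
// littleEndian = false gives UTF-16BE (BOM FE FF), true gives UTF-16LE (BOM FF FE).
function encodeUtf16(str, littleEndian = false) {
  // 2 bytes for the BOM plus 2 bytes per UTF-16 code unit.
  const bytes = new Uint8Array(2 + str.length * 2);
  const view = new DataView(bytes.buffer);
  view.setUint16(0, 0xfeff, littleEndian); // BOM in the requested byte order
  for (let i = 0; i < str.length; i++) {
    // charCodeAt yields code units, so astral characters become surrogate pairs.
    view.setUint16(2 + i * 2, str.charCodeAt(i), littleEndian);
  }
  return bytes;
}

// In a browser, the bytes can be wrapped in a Blob for download, e.g.:
// const blob = new Blob([encodeUtf16(s, true)], { type: "text/plain;charset=utf-16le" });
console.log(Array.from(encodeUtf16("Hi")));       // [254, 255, 0, 72, 0, 105]
console.log(Array.from(encodeUtf16("Hi", true))); // [255, 254, 72, 0, 105, 0]
```

Passing the Uint8Array (rather than a string) to Blob matters: string parts are always re-encoded as UTF-8.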

You can use a legacy polyfill of the native TextEncoder API to transform a JavaScript string into a Uint8Array. As you'll see in that documentation, UTF-16 with either endianness was once supported, but the current native TextEncoder only produces UTF-8. Libraries that provide UTF-16 support in a TextEncoder-compatible way will probably appear soon, if they haven't already. Let's assume that one such library exposes a constructor called ExtendedTextEncoder.
Then you can easily create a Blob URL to let users download the file, without the inefficient base64 conversion.
Something like this:
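var s = "aosjdfkzlzkdoaslckjznx";
var encoder = new ExtendedTextEncoder("utf-16be"); // hypothetical library constructor, as assumed above
// Blob takes an array of parts and an options object
var blob = new Blob([encoder.encode(s)], {type: "text/plain;charset=utf-16be"});
var url = URL.createObjectURL(blob);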
Now you can use url instead of your data: URL.


How to use javascript (in Angular) to get bytes encoded by java.util.Base64? [duplicate]

I need to convert a base64-encoded string into an ArrayBuffer.
The base64 strings are user input, they will be copy and pasted from an email, so they're not there when the page is loaded.
I would like to do this in javascript without making an ajax call to the server if possible.
I found these links interesting, but they didn't help me:
ArrayBuffer to base64 encoded string
this is about the opposite conversion, from ArrayBuffer to base64, not the other way round
http://jsperf.com/json-vs-base64/2
this looks good but I can't figure out how to use the code.
Is there an easy (maybe native) way to do the conversion? Thanks
Try this:
function _base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
Using TypedArray.from:
Uint8Array.from(atob(base64_string), c => c.charCodeAt(0))
Performance should be compared with the for-loop version in Goran.it's answer.
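The one-liner yields a Uint8Array; if an actual ArrayBuffer is needed, as the question asks, its `.buffer` can be returned. A minimal sketch, assuming a global `atob` (browsers, or Node 16+); the function name is illustrative:

```javascript
// Decode base64 into an ArrayBuffer using atob and Uint8Array.from.
// atob returns a "binary string" with one character (code 0-255) per byte.
function base64ToArrayBuffer(base64) {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  // A freshly created Uint8Array owns its buffer, so .buffer has exactly the right length.
  return bytes.buffer;
}

const buf = base64ToArrayBuffer("SGVsbG8=");
console.log(buf.byteLength);                  // 5
console.log(Array.from(new Uint8Array(buf))); // [72, 101, 108, 108, 111]
```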
For Node.js users:
const myBuffer = Buffer.from(someBase64String, 'base64');
myBuffer will be of type Buffer which is a subclass of Uint8Array. Unfortunately, Uint8Array is NOT an ArrayBuffer as the OP was asking for. But when manipulating an ArrayBuffer I almost always wrap it with Uint8Array or something similar, so it should be close to what's being asked for.
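One caveat if a real ArrayBuffer is required in Node.js: small Buffers are often views into a shared internal pool, so `myBuffer.buffer` can be larger than the decoded data. A minimal sketch (the helper name is hypothetical) that copies out exactly the bytes belonging to the Buffer:

```javascript
// Node.js: decode base64 into a standalone ArrayBuffer of exactly the right size.
function base64ToArrayBufferNode(base64) {
  const buf = Buffer.from(base64, "base64");
  // buf may be a view into a larger pooled ArrayBuffer, so slice out only its own bytes.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

const ab = base64ToArrayBufferNode("SGVsbG8=");
console.log(ab instanceof ArrayBuffer, ab.byteLength); // true 5
```

ArrayBuffer.prototype.slice returns a copy, so the result is independent of the Buffer pool.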
Goran.it's answer does not work because of the Unicode problem in JavaScript: https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding.
I ended up using the function given on Daniel Guerrero's blog: http://blog.danguer.com/2011/10/24/base64-binary-decoding-in-javascript/
Function is listed on github link: https://github.com/danguer/blog-examples/blob/master/js/base64-binary.js
Use these lines
var uintArray = Base64Binary.decode(base64_string);
var byteArray = Base64Binary.decodeArrayBuffer(base64_string);
An async solution, which is better when the data is big:
// base64 to buffer: returns a Promise that resolves with an ArrayBuffer
function base64ToBufferAsync(base64) {
var dataUrl = "data:application/octet-stream;base64," + base64;
return fetch(dataUrl)
.then(res => res.arrayBuffer())
.then(buffer => {
console.log("base64 to buffer: " + new Uint8Array(buffer));
return buffer;
});
}
// buffer to base64: returns a Promise that resolves with a base64 string
function bufferToBase64Async(buffer) {
return new Promise(function(resolve, reject) {
var blob = new Blob([buffer], {type:'application/octet-stream'});
var fileReader = new FileReader();
fileReader.onload = function() {
var dataUrl = fileReader.result;
// drop the "data:application/octet-stream;base64," prefix
var base64 = dataUrl.substring(dataUrl.indexOf(',') + 1);
console.log("dataUrl to base64: " + base64);
resolve(base64);
};
fileReader.onerror = function() {
reject(fileReader.error);
};
fileReader.readAsDataURL(blob);
});
}
JavaScript is a fine development environment, so it seems odd that it doesn't provide a solution to this small problem. The solutions offered elsewhere on this page are potentially slow. Here is my solution. It uses the built-in functionality that decodes base64 image and sound data URLs.
var req = new XMLHttpRequest;
req.open('GET', "data:application/octet;base64," + base64Data);
req.responseType = 'arraybuffer';
req.onload = function fileLoaded(e)
{
var byteArray = new Uint8Array(e.target.response);
// var shortArray = new Int16Array(e.target.response);
// var unsignedShortArray = new Uint16Array(e.target.response);
// etc.
}
req.send();
The request fails if the base64 string is badly formed.
The MIME type (application/octet) is probably unnecessary.
Tested in Chrome. Should work in other browsers.
Pure JS - no intermediate string (no atob)
I wrote the following function, which converts base64 directly (without converting to a string as an intermediate step). IDEA:
get a chunk of 4 base64 characters
find the index of each character in the base64 alphabet
convert each index to a 6-bit number (binary string)
join the four 6-bit numbers, which gives a 24-bit number (stored as a binary string)
split the 24-bit string into three 8-bit strings, convert each to a number and store them in the output array
corner case: if the input base64 string ends with one or two = chars, remove one or two numbers from the output array
The solution below can process large base64 input strings. A similar function for converting bytes to base64 without btoa is HERE
function base64ToBytesArr(str) {
const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
let result = [];
for(let i=0; i<str.length/4; i++) {
let chunk = [...str.slice(4*i,4*i+4)]
let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join('');
let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
}
return result;
}
// --------
// TEST
// --------
let test = "Alice's Adventure in Wonderland.";
console.log('test string:', test.length, test);
let b64_btoa = btoa(test);
console.log('encoded string:', b64_btoa);
let decodedBytes = base64ToBytesArr(b64_btoa); // decode base64 to array of bytes
console.log('decoded bytes:', JSON.stringify(decodedBytes));
let decodedTest = decodedBytes.map(b => String.fromCharCode(b)).join('');
console.log('Uint8Array', JSON.stringify(new Uint8Array(decodedBytes)));
console.log('decoded string:', decodedTest.length, decodedTest);
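The same four-characters-to-three-bytes idea can also be written with integer bit shifts instead of binary strings, which avoids building intermediate strings for every chunk. This is a minimal sketch of that alternative technique, not the answer's own code; like the original it assumes standard, padded base64 (the + and / alphabet) and does no validation. The names `base64ToBytes`, `ALPHABET` and `LOOKUP` are illustrative.

```javascript
// Decode standard base64 (A-Z a-z 0-9 + /, "=" padding) into a Uint8Array
// by packing each 4-character chunk into a 24-bit integer.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const LOOKUP = new Map([...ALPHABET].map((ch, i) => [ch, i]));

function base64ToBytes(str) {
  const padding = str.endsWith("==") ? 2 : str.endsWith("=") ? 1 : 0;
  const out = new Uint8Array((str.length / 4) * 3 - padding);
  let o = 0;
  for (let i = 0; i < str.length; i += 4) {
    // Shift in four 6-bit values; "=" padding counts as 0.
    let n = 0;
    for (let j = 0; j < 4; j++) {
      n = (n << 6) | (LOOKUP.get(str[i + j]) ?? 0);
    }
    // Emit up to three bytes, stopping at the end of the output (drops padding bytes).
    for (let shift = 16; shift >= 0 && o < out.length; shift -= 8) {
      out[o++] = (n >> shift) & 0xff;
    }
  }
  return out;
}

console.log(Array.from(base64ToBytes("QWxpY2U="))); // [65, 108, 105, 99, 101]
```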
Caution!
If you want to decode base64 to a STRING (not a byte array) and you know that the result contains UTF-8 characters, atob alone will generally give the wrong result. For example, for the character 💩, atob("8J+SqQ==") returns four separate characters (one per byte) instead of the emoji. In this case you can use the solution above and convert the resulting byte array to a string properly, e.g.:
// --------
// TEST
// --------
let testB64 = "8J+SqQ=="; // for string: "💩";
console.log('input base64 :', testB64);
let decodedBytes = base64ToBytesArr(testB64); // decode base64 to array of bytes
console.log('decoded bytes :', JSON.stringify(decodedBytes));
let result = new TextDecoder("utf-8").decode(new Uint8Array(decodedBytes));
console.log('properly decoded string :', result);
let result_atob = atob(testB64);
console.log('decoded by atob :', result_atob);
Snippets tested 2022-08-04 on: chrome 103.0.5060.134 (arm64), safari 15.2, firefox 103.0.1 (64 bit), edge 103.0.1264.77 (arm64), and node-js v12.16.1
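The byte-array step does not strictly need a hand-written decoder: atob already returns one character per byte, and TextDecoder then interprets those bytes as UTF-8. A minimal sketch, assuming global `atob` and `TextDecoder` (modern browsers, Node 16+); the function name is illustrative:

```javascript
// Decode base64 that contains UTF-8 text into a JavaScript string.
function base64ToUtf8(base64) {
  // atob gives a binary string: each character holds one byte (0-255).
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  // Interpret those bytes as UTF-8 instead of as Latin-1 characters.
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(base64ToUtf8("8J+SqQ==")); // 💩
console.log(atob("8J+SqQ==").length);  // 4 (one character per byte, not the emoji)
```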
I would strongly suggest using an npm package that correctly implements the base64 specification.
The best one I know is rfc4648.
The problem is that btoa and atob use binary strings instead of Uint8Array, and converting to and from them is cumbersome. Also, there are a lot of bad packages on npm for this. I lost a lot of time before finding that one.
The creators of that specific package did a simple thing: they took the specification of Base64 (which is here, by the way) and implemented it correctly from beginning to end (including other formats in the specification that are also useful, like Base64url, Base32, etc.). That doesn't seem like a lot, but apparently it was too much to ask of a bunch of other libraries.
So yes, I know I'm proselytizing a bit, but if you want to avoid losing your time too, just use rfc4648.
I used the accepted answer to this question to create base64Url string <-> ArrayBuffer conversions for base64Url data transmitted via an ASCII cookie (atob and btoa convert between base64, which uses + and /, and JS binary strings), so I decided to post the code.
Many of us may want both conversions, and client-server communication may use the base64Url version (though, if I understand correctly, a cookie may contain + and / as well as - and _ characters; only the " , ; \ characters and some control characters among the 128 ASCII characters are disallowed). But in a URL, / and + have special meanings (a path separator and, in form-encoded query strings, a space), hence the wider use of the base64url variant, which of course is not what atob and btoa support.
Seeing other comments, I would like to stress that my use case here is base64Url data transmitted via URL/cookie and used with the JS Crypto API (2017), hence the need for an ArrayBuffer representation and b64u <-> ArrayBuffer conversions. Note that atob and btoa work on binary strings whose characters are limited to code points 0-255, so they handle bytes, not arbitrary Unicode text; for text you need an appropriate converter, like the one below:
The buff -> b64u version is from a tweet by Mathias Bynens; thanks for that one (too)! He also wrote a base64 encoder/decoder:
https://github.com/mathiasbynens/base64
Coming from Java, it may help to know that a Java byte[] is practically a JS Int8Array (signed), but here we use the unsigned version, Uint8Array, since the JS conversions work with it. Both hold 8-bit values (256 possible values), so we call it byte[] in JS here.
The code is from a module class, which is why the methods are static.
//utility
/**
* Array buffer to base64Url string
* - arrBuff->byte[]->biStr->b64->b64u
* @param arrayBuffer
* @returns {string}
* @private
*/
static _arrayBufferToBase64Url(arrayBuffer) {
console.log('base64Url from array buffer:', arrayBuffer);
let base64Url = window.btoa(String.fromCodePoint(...new Uint8Array(arrayBuffer)));
base64Url = base64Url.replaceAll('+', '-');
base64Url = base64Url.replaceAll('/', '_');
console.log('base64Url:', base64Url);
return base64Url;
}
/**
* Base64Url string to array buffer
* - b64u->b64->biStr->byte[]->arrBuff
* @param base64Url
* @returns {ArrayBufferLike}
* @private
*/
static _base64UrlToArrayBuffer(base64Url) {
console.log('array buffer from base64Url:', base64Url);
let base64 = base64Url.replaceAll('-', '+');
base64 = base64.replaceAll('_', '/');
const binaryString = window.atob(base64);
const length = binaryString.length;
const bytes = new Uint8Array(length);
for (let i = 0; i < length; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
console.log('array buffer:', bytes.buffer);
return bytes.buffer;
}
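The methods above keep btoa's `=` padding, while many base64url consumers (JWTs, for example) omit it. A minimal sketch of standalone functions (hypothetical names, not the author's code) that strip the padding when encoding and restore it before calling atob; it also builds the binary string in a loop, since spreading a very large array into a single call can exceed the engine's argument limit:

```javascript
// ArrayBuffer -> base64url without "=" padding.
function arrayBufferToBase64Url(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer);
  let binary = "";
  // One character per byte; a loop avoids argument-count limits on huge inputs.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// base64url (padded or not) -> ArrayBuffer.
function base64UrlToArrayBuffer(base64Url) {
  let base64 = base64Url.replace(/-/g, "+").replace(/_/g, "/");
  // Restore padding so the length is a multiple of 4.
  base64 += "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

const encoded = arrayBufferToBase64Url(new Uint8Array([0xfb, 0xff]).buffer);
console.log(encoded); // -_8
console.log(Array.from(new Uint8Array(base64UrlToArrayBuffer(encoded)))); // [251, 255]
```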
Making an ArrayBuffer from base64:
function base64ToArrayBuffer(base64) {
var binary_string = window.atob(base64);
var len = binary_string.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
return bytes.buffer;
}
I tried the code above and it works fine.
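This applies when the base64 data encodes text made of comma-separated numbers (e.g. btoa("72,101,108")): the result of atob is then a comma-separated string.
A simpler way in that case is to wrap the string in brackets to make a JSON array string and then parse it into a byte array. It only works when the decoded text consists of comma-separated decimal numbers; for arbitrary binary data, use one of the byte-by-byte conversions above.
The code below converts such a base64 string to an array of numbers: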
let byteArray = JSON.parse('['+atob(base64)+']');
let buffer = new Uint8Array(byteArray);
Solution without atob
I've seen many people complaining about using atob and btoa in the replies. There are some issues to take into account when using them.
The MDN page about Base64 has a solution that uses neither. Below is the code, copied from the docs, for converting a base64 string into a Uint8Array.
Note that the function below returns a Uint8Array. To get the ArrayBuffer version you just need to do uintArray.buffer.
function b64ToUint6(nChr) {
return nChr > 64 && nChr < 91
? nChr - 65
: nChr > 96 && nChr < 123
? nChr - 71
: nChr > 47 && nChr < 58
? nChr + 4
: nChr === 43
? 62
: nChr === 47
? 63
: 0;
}
function base64DecToArr(sBase64, nBlocksSize) {
const sB64Enc = sBase64.replace(/[^A-Za-z0-9+/]/g, "");
const nInLen = sB64Enc.length;
const nOutLen = nBlocksSize
? Math.ceil(((nInLen * 3 + 1) >> 2) / nBlocksSize) * nBlocksSize
: (nInLen * 3 + 1) >> 2;
const taBytes = new Uint8Array(nOutLen);
let nMod3;
let nMod4;
let nUint24 = 0;
let nOutIdx = 0;
for (let nInIdx = 0; nInIdx < nInLen; nInIdx++) {
nMod4 = nInIdx & 3;
nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << (6 * (3 - nMod4));
if (nMod4 === 3 || nInLen - nInIdx === 1) {
nMod3 = 0;
while (nMod3 < 3 && nOutIdx < nOutLen) {
taBytes[nOutIdx] = (nUint24 >>> ((16 >>> nMod3) & 24)) & 255;
nMod3++;
nOutIdx++;
}
nUint24 = 0;
}
}
return taBytes;
}
If you're interested in the reverse operation, ArrayBuffer to base64, you can find how to do it in the same link.

createObjectUrl for binary data fails

In my JavaScript I have a base64-encoded pkcs12 object, which I want to provide as a download link. The pkcs12 (pfx) file to be downloaded is binary data.
So I decoded the object and tried to create an objectUrl from it:
var bin = atob(pkcs12);
var blob = new Blob([bin],
{ type : 'application/x-pkcs12' });
$scope.pkcs12Blob = (window.URL || window.webkitURL).createObjectURL( blob );
The problem is that the downloaded file is bigger than the original binary data and is not recognized as pkcs12. It looks as if some UTF-8/Unicode conversion was introduced into the file.
If I provide the original base64-encoded data to createObjectURL and download the base64-encoded file, I can decode the downloaded file and get a valid p12 file.
So I am wondering: How does createObjectURL work for binary data?
The Blob constructor encodes string parts as UTF-8, so a binary string does not end up in the file as raw bytes; the data has to be passed as a byte array instead. This code worked like a charm:
var bytechars = atob($scope.enrolledToken.pkcs12);
var byteNumbers = new Array(bytechars.length);
for (var i = 0; i < bytechars.length; i++) {
byteNumbers[i] = bytechars.charCodeAt(i);
}
var byteArray = new Uint8Array(byteNumbers);
var blob = new Blob([byteArray], {type: 'application/x-pkcs12'});
$scope.pkcs12Blob = (window.URL || window.webkitURL).createObjectURL( blob );
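The fix above generalizes into a small helper. A minimal sketch, assuming a browser (or Node 18+, where Blob is global); the helper name `base64ToBlob` is hypothetical, and "MIIB" is just three illustrative bytes, not a real pkcs12 file. The key point is that Blob receives a Uint8Array, so no UTF-8 re-encoding happens:

```javascript
// Turn base64 data into a Blob containing the raw decoded bytes.
function base64ToBlob(base64, type) {
  const binary = atob(base64); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Passing bytes (not the binary string) keeps values >= 0x80 as single bytes.
  return new Blob([bytes], { type });
}

const blob = base64ToBlob("MIIB", "application/x-pkcs12"); // bytes 0x30 0x82 0x01
console.log(blob.size, blob.type); // 3 application/x-pkcs12
```

Passing `atob("MIIB")` straight to Blob instead would give a 4-byte file, because 0x82 is UTF-8 encoded as two bytes; that is the size growth described in the question.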

Convert audio data uri string to file

The server saves the audio data as a base64 data string. The mobile web client fetches the data and plays the audio.
But I found an issue in mobile Chrome on iOS and Android: audio with a data URI can't play (issue).
To make it work, I was wondering if there is a way on the client side to convert the data string to an audio file (like .m4a) and link the audio src to the file?
I figured out that using the Web Audio API directly has the best compatibility across mobile browsers on iOS and Android.
function base64ToArrayBuffer(base64) {
var binaryString = window.atob(base64);
var len = binaryString.length;
var bytes = new Uint8Array( len );
for (var i = 0; i < len; i++) {
bytes[i] = binaryString.charCodeAt(i);
}
return bytes.buffer;
}
var base64 = '<data string retrieved from server>';
var audioContext = new (window.AudioContext || window.webkitAudioContext)();
var source = audioContext.createBufferSource();
audioContext.decodeAudioData(base64ToArrayBuffer(base64), function(buffer) {
source.buffer = buffer;
source.connect(audioContext.destination);
source.start(0);
});
It works in iOS Safari and Chrome, and in the Android default browser and Chrome.
There is a way to do roughly what you want. It works on desktop, but I cannot guarantee it works on mobile. The idea is to convert the data URI to an ArrayBuffer, construct a Blob from it, and then create an object URL from it to pass to the audio element. Here is the code (I tested it in Chrome/Firefox under Linux and it works):
<script>
var base64audio = "data:audio/ogg;base64,gibberish";
function dataURItoBlob(dataURI)
{
// Split the input to get the mime-type and the data itself
dataURI = dataURI.split( ',' );
// First part contains data:audio/ogg;base64 from which we only need audio/ogg
var type = dataURI[ 0 ].split( ':' )[ 1 ].split( ';' )[ 0 ];
// Second part is the data itself and we decode it
var byteString = atob( dataURI[ 1 ] );
var byteStringLen = byteString.length;
// Create ArrayBuffer with the byte string and set the length to it
var ab = new ArrayBuffer( byteStringLen );
// Create a typed array over the ArrayBuffer, storing each character of the byte string as an 8-bit unsigned integer
var intArray = new Uint8Array( ab );
for ( var i = 0; i < byteStringLen; i++ )
{
intArray[ i ] = byteString.charCodeAt( i );
}
return new Blob( [ intArray ], {type: type} );
}
document.addEventListener( 'DOMContentLoaded', function()
{
// Construct a URL from the Blob. This URL remains valid until the user closes the tab or you revoke it
// Make sure at some point (when you don't need the audio anymore) to do URL.revokeObjectURL() with the constructed URL
var objectURL = URL.createObjectURL(dataURItoBlob(base64audio));
// Pass the URL to the audio element and load it
var audio = document.getElementById( 'test' );
audio.src = objectURL;
audio.load();
} );
</script>
...
<audio id="test" controls></audio>
I hope that helps ;)

Saving binary data in a browser without it getting UTF8 encoded on download

My web app receives data in the form of a base64-encoded string, which it decodes using atob and stores via URL.createObjectURL(). This data is then downloaded via the right-click save-as dialog. The downloaded file always matches the source file when the source file is ASCII encoded. However, this isn't the case when the source file is plain binary data. A diff of a non-ASCII downloaded file against its source file appears to show that the downloaded file is UTF-8 encoded. How can this problem be fixed? Please note, I'm locked into using Firefox 10.
Convert the string to an ArrayBuffer and it should work. If there is any way to get the data into an ArrayBuffer directly, without going through a string, that would be the best solution.
The following code was tested in FF10 and uses the now-obsolete MozBlobBuilder; in current browsers, new Blob([buf], {type: 'application/octet-stream'}) replaces the builder calls.
var str="",
idx, len,
buf, view, blobbuild, blob, url,
elem;
// create a test string
for (var idx = 0; idx < 256; ++idx) {
str += String.fromCharCode(idx);
}
// create a buffer
buf = new ArrayBuffer(str.length);
view = new Uint8Array(buf);
// convert string to buffer
for (idx = 0, len = str.length; idx < len; ++idx) {
view[idx] = str.charCodeAt(idx);
}
blobbuild = new MozBlobBuilder();
blobbuild.append(buf);
blob = blobbuild.getBlob('application/octet-stream');
url = URL.createObjectURL(blob);
elem = document.createElement('a');
elem.href = url;
elem.textContent = 'Test';
document.body.appendChild(elem);

Downloading generated binary content contains utf-8 encoded chars in disk-file

I am trying to save a generated zip file to disk from within a Chrome extension with the following code:
function sendFile (nm, file) {
var a = document.createElement('a');
a.href = window.URL.createObjectURL(file);
a.download = nm; // file name
a.style.display = 'none';
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
}
function downloadZip (nm) {
window.URL = window.webkitURL || window.URL;
var content;
content = zip.generate();
var file = new Blob ([content], {type:'application/base64'});
sendFile ("x.b64", file);
content = zip.generate({base64:false});
var file = new Blob ([content], {type:'application/binary'});
sendFile ("x.zip", file);
}
Currently this saves the contents of my zip in two versions, the first one is base64 encoded, and when I decode it with base64 -d the resulting zip is ok.
The second version should just save the raw data (the zip file), but this raw data arrives UTF-8 encoded on my disk (each value >= 0x80 is prepended with 0xc2). So how do I get rid of this UTF-8 encoding? I tried various type strings like application/zip, or omitting the type info completely; it always arrives UTF-8 encoded. I am also curious how to make the browser store/convert base64 data (the first case) by itself, so that it arrives as decoded binary data on my disk... I'm using Chrome Version 23.0.1271.95 m
PS: I analysed the second content with a hexdump utility inside the browser: it does not contain UTF-8 encodings (or my hexdump calls something which does an implicit conversion). For completeness (sorry, it's just transposed from C, so it might not be that cool JS code), I append it here:
function hex (bytes, val) {
var ret="";
var tmp="";
for (var i=0;i<bytes;i++) {
tmp=val.toString (16);
if (tmp.length<2)
tmp="0"+tmp;
ret=tmp+ret;
val>>=8;
}
return ret;
}
function hexdump (buf, len) {
var p=0;
while (p<len) {
var line = hex(2, p);
var i;
for (i=0;i<16;i++) {
if (i==8)
line +=" ";
if (p+i<len)
line+=" "+hex(1,buf.charCodeAt(p+i));
else
line+=" ";
}
line+=" |";
for (i=0;i<16;i++) {
if (p+i<len) {
var cc=buf.charCodeAt (p+i);
line+= ((cc>=32)&&(cc<127)&&(cc!==124)?String.fromCharCode(cc):'.'); // 124 is '|'
}
}
p+=16;
console.log (line);
}
}
From the File API working draft:
If element is a DOMString, run the following substeps:
Let s be the result of converting element to a sequence of Unicode characters [Unicode] using the algorithm for doing so in WebIDL [WebIDL].
Encode s as UTF-8 and append the resulting bytes to bytes.
So strings are always converted to UTF-8, and there is no parameter to affect this. This doesn't affect base64 strings because they contain only ASCII characters, each of which UTF-8 encodes as a single byte with the same value as its code point. Luckily, Blob also accepts lower-level input (raw bytes, such as typed arrays), so that limitation doesn't really matter.
You could do this:
var binaryString = zip.generate({base64: false}), //By glancing over the source I trust the string is in "binary" form
len = binaryString.length, //I.E. having only code points 0 - 255 that represent bytes
bytes = new Uint8Array(len);
for( var i = 0; i < len; ++i ) {
bytes[i] = binaryString.charCodeAt(i);
}
var file = new Blob([bytes], {type:'application/zip'});
sendFile( "myzip.zip", file );
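The UTF-8 step quoted from the spec is easy to observe. This sketch, assuming Blob and its promise-based arrayBuffer() method (modern browsers, Node 18+), builds the same binary string twice: once passed directly to Blob, once converted to a Uint8Array first.

```javascript
// A binary string holding the byte values 0x41, 0x80 and 0xFF.
const binaryString = "\x41\x80\xff";

// String parts are encoded as UTF-8: 0x80 and 0xFF become two bytes each.
const fromString = new Blob([binaryString]);

// Typed-array parts are copied byte for byte.
const bytes = Uint8Array.from(binaryString, (c) => c.charCodeAt(0));
const fromBytes = new Blob([bytes]);

console.log(fromString.size, fromBytes.size); // 5 3

fromString.arrayBuffer().then((buf) => {
  // 0x80 -> C2 80 and 0xFF -> C3 BF: the extra 0xC2/0xC3 bytes seen in the downloaded file
  console.log(Array.from(new Uint8Array(buf))); // [65, 194, 128, 195, 191]
});
```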
