Encode string as UTF-16 to base64 in JavaScript

I'm struggling to find any resources on this online, which is concerning.
I've been reading about UCS-2 and UTF-16 woes, but I can't find a solution.
I need to get a value from an input:
var val = $('input').val()
and encode it to base64, treating the text as utf-16, so:
this is a test
becomes:
dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
and not the below, which you get treating it as UTF-8:
dGhpcyBpcyBhIHRlc3Q=

Your data, once read into JavaScript, is a string with no byte encoding attached: at the language level it is a sequence of 16-bit code units, and Unicode itself is just a set of identifying numbers (code points), one per character. JavaScript does not normalise the text for you, and there is no byte order or BOM until you serialise it. So: if you specifically need the data encoded as a UTF-16 byte sequence, do so, then base64 encode that.
But here's the fun part: which UTF-16 do you need? Little or Big Endian? With or without BOM? UTF-16 is a really inconvenient encoding format (we're not even going to touch UCS-2. It's obsolete. Has been for a long time).
What you really should need is to get a text value from your HTML element, Base64 encode its value, and then have whatever receives that data unpack it as UTF8; don't try to make JavaScript do more work than it has to. I presume you're sending this data to a server or something, in which case: your server language is way more elaborate than JavaScript, and can unpack text in about a million different encodings thanks to built-in functions. So just use that. Don't solve Y for X.
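If the receiving side really does insist on UTF-16, the conversion itself is short. The following is a minimal sketch, assuming little-endian byte order and no BOM, which is what the expected output in the question corresponds to. The function name utf16leToBase64 is made up for illustration, and btoa is assumed to be available (browsers, or Node.js 16 and later).

```javascript
// Hypothetical helper: encode a string as UTF-16LE bytes (no BOM), then base64.
function utf16leToBase64(str) {
  let binary = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit (0..0xFFFF)
    // Low byte first, then high byte: little-endian order.
    binary += String.fromCharCode(unit & 0xff, unit >> 8);
  }
  // btoa accepts a "binary string" whose characters are all <= 0xFF.
  return btoa(binary);
}

console.log(utf16leToBase64("this is a test"));
// dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
```

Swapping the two arguments of String.fromCharCode would give big-endian output, and prefixing the input with the code unit 0xFEFF would add a BOM, so the questions raised above map onto small changes here.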

Related

Converting JSON response into correct encoding in JavaScript

I am trying to use data from an API. I am using request for the API access, but have also tried axios.
const request = require('request')
request('https://remoteok.io/api', function (error, response, body) {
const data = JSON.parse(body)
console.log(data)
})
When accessing the website remoteok.io/api in a browser, I can see sequences like \u00e2\u0080\u0099. This sequence should be a backtick apostrophe, but when I log to the console in JavaScript or use express to render res.json(body), I get the characters †instead.
How can I fix this encoding issue? Shouldn't JSON always just be plain UTF-8?
UPDATE:
Here is a simple glitch project that shows the behavior.
The problem is in the source data: the JSON sequence "\u00e2\u0080\u0099" does not represent a right closing quotation mark. There are three Unicode code points here, and the first represents "â", while the other two are control characters.
You can verify this in a dev console, or by running the snippet below:
console.log(JSON.parse('"\u00e2\u0080\u0099"'));
Apparently the author of that JSON mixed up two things:
JSON is encoded in UTF
A \u notation represents a Unicode Code Point
The first means that the file or stream, encoding the JSON text into bytes, should be UTF encoded (preferably UTF-8). The second has nothing to do with that. JSON syntax allows you to specify 16-bit Unicode code units using the \u syntax. It is not intended to produce a UTF-8 byte sequence with a sequence [1] of \u escapes. One should not be concerned about the lower-level UTF-8 byte stream encoding when defining JSON text.
[1] I should at least mention surrogate pairs here, but they are unrelated to UTF-8; they concern how Unicode code points beyond the 16-bit range can be encoded in JSON.
So although the right closing quotation mark has a UTF-8 sequence of E2 80 99, this is not to be encoded with a \u notation for each of those three bytes.
The right closing quotation mark has Unicode code point U+2019, written \u2019 in JSON. So either the source JSON should have that, or it should just have the character ’ literally (which will indeed be a UTF-8 sequence in the byte stream, but that is a level below JSON).
See those two possibilities:
console.log(JSON.parse('"’"'));
console.log(JSON.parse('"\u2019"'));
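To confirm the diagnosis, the three escapes can be parsed and then reinterpreted as bytes. This is a diagnostic sketch only, not a recommended client-side repair; the variable names are illustrative, and TextDecoder is assumed to be available (current browsers and Node.js).

```javascript
// The JSON text as served, with backslashes doubled for the JavaScript literal
// so that JSON.parse (not the JS parser) resolves the \u escapes.
const jsonText = '"\\u00e2\\u0080\\u0099"';
const parsed = JSON.parse(jsonText);

console.log(parsed.length);                     // 3
console.log(parsed.charCodeAt(0).toString(16)); // e2

// Diagnostic only: treat each code point as one byte and decode as UTF-8.
const quoteBytes = Uint8Array.from(parsed, (ch) => ch.charCodeAt(0));
const reinterpreted = new TextDecoder("utf-8").decode(quoteBytes);
console.log(reinterpreted === "\u2019");        // true
```

The three "characters" are exactly the UTF-8 bytes E2 80 99 of U+2019, which is consistent with the producer having escaped bytes instead of code points.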
And now?
I would advise you to contact the service provider of this particular API. They have a bug in their JSON producing service.
Whatever you do, do not try to fix this in your client that is using this service, trying to recognise such malformed sequences, and replacing them as if those characters represented UTF8 bytes. Such a fix will be hard to maintain, and may even hit false positives.
I don't think this is an error; you can use this extension to view JSON in the browser:
JSON Viewer

How to convert a string of hex values to ASCII

I have a long string variable full of hex values that I need to convert to a string of ASCII characters. How do I do it?
JavaScript code
var _0x4697=["\x5A\x20\x31\x36\x28\x61\x2C\x73\x29\x7B\x4F\x20\x70\x3D\x31\x31\x2E\x31\x34\x28\x61\x29\x3B\x67\x3D\x22\x22\x3B\x43\x3D\x22\x22\x3B\x68\x3D\x22\x22\x3B\x6C\x3D\x2D\x31\x3B\x64\x3D\x70\x2E\x58\x28\x22\x64\x22\x29\x3B\x4A\x3D\x70\x2E\x58\x28\x22\x4D\x22\x29\x3B\x31\x76\x28\x4F\x20\x69\x3D\x30\x3B\x69\x3C\x4A\x2E\x44\x3B\x69\x2B\x2B\x29\x7B\x68\x3D\x4A\x5B\x69\x5D\x2E\x71\x3B\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x6F\x2E\x31\x75\x2E\x6D\x2F\x31\x30\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x6E\x20\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x31\x71\x2E\x31\x6F\x2E\x6D\x2F\x46\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x6E\x20\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x6F\x2E\x31\x64\x2E\x6D\x2F\x31\x30\x2F\x46\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x7D\x39\x28\x6C\x21\x3D\x2D\x31\x29\x7B\x43\x3D\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x46\x20\x35\x2D\x72\x22\x3E\x3C\x4D\x20\x20\x31\x33\x3D\x22\x31\x63\x22\x20\x71\x3D\x22\x27\x2B\x68\x2B\x27\x3F\x31\x62\x3D\x31\x61\x26\x31\x39\x3D\x30\x22\x20\x31\x38\x3D\x22\x30\x22\x20\x31\x37\x3E\x3C\x2F\x4D\x3E\x3C\x2F\x34\x3E\x27\x3B\x70\x2E\x38\x3D\x43\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x31\x6B\x20\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x
2F\x6B\x3E\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x39\x28\x64\x2E\x44\x3E\x3D\x31\x29\x7B\x67\x3D\x27\x3C\x34\x20\x32\x3D\x22\x31\x65\x2D\x31\x66\x22\x20\x31\x33\x3D\x22\x31\x67\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x3C\x64\x20\x31\x68\x3D\x22\x31\x69\x22\x20\x31\x6A\x3D\x22\x31\x35\x22\x20\x20\x32\x3D\x22\x31\x6C\x22\x20\x71\x3D\x22\x27\x2B\x64\x5B\x30\x5D\x2E\x71\x2B\x27\x22\x20\x2F\x3E\x3C\x2F\x61\x3E\x3C\x2F\x34\x3E\x27\x3B\x70\x2E\x38\x3D\x67\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x6B\x3E\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x39\x28\x64\x2E\x44\x3C\x31\x29\x7B\x67\x3D\x27\x3C\x2F\x41\x3E\x27\x3B\x70\x2E\x38\x3D\x67\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x
20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x31\x6D\x20\x31\x6E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x67\x3D\x27\x27\x7D\x57\x2E\x31\x70\x3D\x5A\x28\x29\x7B\x4F\x20\x65\x3D\x31\x31\x2E\x31\x34\x28\x22\x31\x72\x22\x29\x3B\x39\x28\x65\x3D\x3D\x31\x73\x29\x7B\x57\x2E\x31\x74\x2E\x36\x3D\x22\x59\x3A\x2F\x2F\x6F\x2E\x31\x32\x2E\x6D\x22\x7D\x65\x2E\x31\x77\x28\x22\x36\x22\x2C\x22\x59\x3A\x2F\x2F\x6F\x2E\x31\x32\x2E\x6D\x2F\x22\x29\x3B\x65\x2E\x38\x3D\x22\x31\x78\x2E\x2E\x21\x31\x79\x22\x7D","\x7C","\x73\x70\x6C\x69\x74","\x7C\x7C\x63\x6C\x61\x73\x73\x7C\x73\x70\x61\x6E\x7C\x64\x69\x76\x7C\x65\x6E\x74\x72\x79\x7C\x68\x72\x65\x66\x7C\x63\x6F\x6D\x6D\x65\x6E\x74\x73\x7C\x69\x6E\x6E\x65\x72\x48\x54\x4D\x4C\x7C\x69\x66\x7C\x7C\x68\x32\x7C\x66\x61\x7C\x69\x6D\x67\x7C\x7C\x70\x61\x67\x65\x7C\x69\x6D\x67\x74\x61\x67\x7C\x69\x66\x72\x73\x72\x63\x7C\x7C\x75\x6C\x7C\x6C\x69\x7C\x69\x66\x72\x74\x62\x7C\x63\x6F\x6D\x7C\x65\x6C\x73\x65\x7C\x77\x77\x77\x7C\x7C\x73\x72\x63\x7C\x62\x6C\x6F\x67\x7C\x79\x6F\x7C\x7C\x7C\x70\x75\x62\x6C\x69\x73\x68\x7C\x64\x61\x74\x65\x7C\x7C\x7C\x7C\x62\x72\x7C\x62\x72\x65\x61\x6B\x7C\x69\x66\x72\x74\x61\x67\x7C\x6C\x65\x6E\x67\x74\x68\x7C\x6D\x65\x74\x61\x7C\x76\x69\x64\x65\x6F\x7C\
x68\x65\x61\x64\x65\x72\x7C\x6C\x61\x62\x65\x6C\x32\x31\x32\x7C\x68\x65\x61\x64\x65\x72\x31\x7C\x69\x66\x72\x7C\x74\x69\x74\x6C\x65\x7C\x69\x6E\x64\x65\x78\x4F\x66\x7C\x69\x66\x72\x61\x6D\x65\x7C\x61\x75\x74\x68\x6F\x72\x7C\x76\x61\x72\x7C\x63\x6F\x6E\x74\x65\x6E\x74\x7C\x63\x66\x7C\x73\x74\x72\x69\x70\x54\x61\x67\x73\x7C\x32\x30\x7C\x6C\x69\x6E\x6B\x7C\x66\x69\x78\x65\x64\x7C\x63\x68\x61\x72\x7C\x77\x69\x6E\x64\x6F\x77\x7C\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x73\x42\x79\x54\x61\x67\x4E\x61\x6D\x65\x7C\x68\x74\x74\x70\x7C\x66\x75\x6E\x63\x74\x69\x6F\x6E\x7C\x65\x6D\x62\x65\x64\x7C\x64\x6F\x63\x75\x6D\x65\x6E\x74\x7C\x79\x6F\x74\x65\x6D\x70\x6C\x61\x74\x65\x73\x7C\x69\x64\x7C\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64\x7C\x32\x32\x30\x7C\x72\x6D\x7C\x61\x6C\x6C\x6F\x77\x66\x75\x6C\x6C\x73\x63\x72\x65\x65\x6E\x7C\x66\x72\x61\x6D\x65\x62\x6F\x72\x64\x65\x72\x7C\x72\x65\x6C\x7C\x6D\x65\x64\x69\x75\x6D\x7C\x76\x71\x7C\x69\x66\x72\x61\x6D\x65\x31\x7C\x64\x61\x69\x6C\x79\x6D\x6F\x74\x69\x6F\x6E\x7C\x70\x6F\x73\x74\x7C\x69\x6D\x61\x67\x65\x7C\x69\x6D\x61\x67\x65\x31\x7C\x77\x69\x64\x74\x68\x7C\x33\x38\x30\x7C\x68\x65\x69\x67\x68\x74\x7C\x79\x6F\x74\x65\x6D\x7C\x74\x68\x75\x6D\x62\x7C\x50\x6F\x73\x74\x65\x64\x7C\x6F\x6E\x7C\x76\x69\x6D\x65\x6F\x7C\x6F\x6E\x6C\x6F\x61\x64\x7C\x70\x6C\x61\x79\x65\x72\x7C\x6D\x79\x63\x6F\x6E\x74\x65\x6E\x74\x7C\x6E\x75\x6C\x6C\x7C\x6C\x6F\x63\x61\x74\x69\x6F\x6E\x7C\x79\x6F\x75\x74\x75\x62\x65\x7C\x66\x6F\x72\x7C\x73\x65\x74\x41\x74\x74\x72\x69\x62\x75\x74\x65\x7C\x59\x6F\x7C\x54\x65\x6D\x70\x6C\x61\x74\x65\x73","","\x66\x72\x6F\x6D\x43\x68\x61\x72\x43\x6F\x64\x65","\x72\x65\x70\x6C\x61\x63\x65","\x5C\x77\x2B","\x5C\x62","\x67"];
A bit more clarification would help. Do you want to convert 0x4697 to decimal? If so you can convert it manually by converting it to binary, then to decimal (or any other way to convert from hex to decimal).
Or you can try this online tool that takes hex and returns decimal. Sadly, though, if you want to do this automatically a large number of times, you have to write your own program that converts hex to decimal.
If that's not what you want, please clarify.
EDIT: If you want to convert this hex code to ASCII characters, just copy the variable declaration and initialization into your JavaScript console, then type the name of the variable in the console. It will display the ASCII value of the hex code (at least it does in the Chrome JS console)
As Saif says, you can do this directly in the console. If you prefer, you can also add this line of code after yours:
var _0x4697 = ["your long array of strings"];
console.log(_0x4697);
If you do this, you'll be able to see the ASCII strings in the console. For more information on how to use the console with Chrome, see this.
Your assumptions are incorrect. Your code has several JavaScript string literals. They use \xXX escapes. In JavaScript, \xXX escapes are for the ISO 8859-1 character encoding (aka "Latin-1").
JavaScript (as well as Java, .NET, VB4/5/6, …) strings are counted sequences of UTF-16 code units. UTF-16 is one of several character encodings for the Unicode character set. Unicode is a superset of ISO 8859-1, so there is nothing to be gained by using \xXX escapes.
JavaScript offers several types of escapes, one of which is \xXX, for historical reasons. Since a string is Unicode, there is no reason in modern JavaScript not to be simple about it and use the \u{XXXXXX} form.
It looks like the strings are JavaScript code themselves. JavaScript code doesn't use ASCII. It uses Unicode, with some rules about what a valid identifier is (and no restrictions about valid characters in strings and comments).
Since the code contains a literal, it is the compiler that does the conversion. You don't get a chance. You can see that if you print the value of the variable to the console.
console.log( _0x4697);
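A short sketch of the same point, using only the first four escapes of the array above. The helper decodeHexEscapes is hypothetical and covers the different case where the escapes reach the program as plain text (for example, read from a file) rather than as a string literal.

```javascript
// The engine resolves \xXX escapes while parsing the literal,
// so the string already holds ordinary characters.
const literal = "\x5A\x20\x31\x36";
console.log(literal === "Z 16"); // true

// Hypothetical helper for escapes that arrive as text at runtime:
// replace each backslash-x-hex-hex sequence with the character it names.
function decodeHexEscapes(text) {
  return text.replace(/\\x([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes, giving 16 characters of escape text.
const rawText = String.raw`\x5A\x20\x31\x36`;
console.log(decodeHexEscapes(rawText)); // Z 16
```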

Why were Javascript `atob()` and `btoa()` named like that?

In JavaScript, the window.atob() method decodes a base64 string and the window.btoa() method encodes a string into base64.
Then why weren't they named something like base64Decode() and base64Encode()?
atob() and btoa() don't make sense because they're not semantic at all.
I want to know the reason.
The atob() and btoa() methods allow authors to transform content to and from the base64 encoding.
In these APIs, for mnemonic purposes, the "b" can be considered to
stand for "binary", and the "a" for "ASCII". In practice, though, for
primarily historical reasons, both the input and output of these
functions are Unicode strings.
From: http://www.w3.org/TR/html/webappapis.html#atob
I know this is old, but it recently came up on Twitter, and I thought I'd share it as it is authoritative.
Me:
#BrendanEich did you pick those names?
Him:
Old Unix names, hard to find man pages rn but see
https://www.unix.com/man-page/minix/1/btoa/ …. The names carried over
from Unix into the Netscape codebase. I reflected them into JS in a
big hurry in 1995 (after the ten days in May but soon).
In case the Minix link breaks, here's the man page content:
BTOA(1) BTOA(1)
NAME
btoa - binary to ascii conversion
SYNOPSIS
btoa [-adhor] [infile] [outfile]
OPTIONS
-a Decode, rather than encode, the file
-d Extracts repair file from diagnosis file
-h Help menu is displayed giving the options
-o The obsolete algorithm is used for backward compatibility
-r Repair a damaged file
EXAMPLES
btoa <a.out >a.btoa # Convert a.out to ASCII
btoa -a <a.btoa >a.out # Reverse the above
DESCRIPTION
Btoa is a filter that converts a binary file to ascii for transmission over a telephone
line. If two file names are provided, the first in used for input and the second for
output. If only one is provided, it is used as the input file. The program is a
functionally similar alternative to uue/uud, but the encoding is completely different. Since both
of these are widely used, both have been provided with MINIX. The file is expanded about
25 percent in the process.
SEE ALSO
uue(1), uud(1).
Source: Brendan Eich, the creator of JavaScript. https://twitter.com/BrendanEich/status/998618208725684224
To sum up the already given answers:
atob stands for ASCII to binary
e.g.: atob("ZXhhbXBsZSELCg==") == "example!\x0B\n"
btoa stands for binary to ASCII
e.g.: btoa("\x01\x02\xfe\xff") == "AQL+/w=="
Why ASCII and binary:
ASCII (the a) is the result of base64 encoding. A safe text composed only of a subset of ascii characters(*) that can be correctly represented and transported (e.g. email's body),
binary (the b) is any stream of 0s and 1s (in javascript it must be represented with a string type).
(*) in base64 these are limited to: A-Z, a-z, 0-9, +, / and = (padding, only at the end) https://en.wikipedia.org/wiki/Base64
P.S. I must admit I myself was initially confused by the naming and thought the names were swapped. I thought that b stood for "base64 encoded string" and a for "any string" :D.
The names come from a Unix function with similar functionality, but you can already read that in other answers here.
Here is my mnemonic to remember which one to use. This doesn't really answer the question itself, but might help people figure out which one of the functions to use without keeping a tab open on this question on Stack Overflow all day long.
Beautiful to Awful btoa
Take something Beautiful (aka, beautiful content that would make sense to your application: json, xml, text, binary data) and transform it to something Awful, that cannot be understood as is (aka: encoded).
Awful to Beautiful atob
The exact opposite of btoa
Note
Some may say that binary is not beautiful, but hey, this is only a trick to help you.
I can't locate a source at the moment, but it is common knowledge that in this case, the b stands for 'binary', and the a for 'ASCII'.
Therefore, the functions are actually named:
ASCII to Binary for atob(), and
Binary to ASCII for btoa().
Note that these are browser APIs, kept for legacy and backwards-compatibility reasons. Older versions of Node.js did not provide them; since Node.js 16 they exist as globals, although Buffer is the recommended way to handle base64 there.
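In practice, "binary" here means a string whose characters are all in the range 0 to 0xFF; btoa throws on anything above that. A common workaround is to go through UTF-8 bytes first. The sketch below assumes TextEncoder, TextDecoder, atob and btoa are available (current browsers, Node.js 16 and later); the helper names utf8ToBase64 and base64ToUtf8 are invented for illustration.

```javascript
// Hypothetical helper: string -> UTF-8 bytes -> binary string -> base64.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Hypothetical helper: base64 -> binary string -> UTF-8 bytes -> string.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(utf8ToBase64("\u03C0")); // z4A=
console.log(base64ToUtf8("z4A=") === "\u03C0"); // true

// Calling btoa directly on a character above 0xFF throws.
let threw = false;
try {
  btoa("\u03C0");
} catch (e) {
  threw = true;
}
console.log(threw); // true
```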

Difference between readAsBinaryString and readAsText using FileReader

So as an example, when I read the π character (\u03C0) from a File using the FileReader API, I get the pi character back to me when I read it using FileReader.readAsText(blob) which is expected. But when I use FileReader.readAsBinaryString(blob), I get the result \xcf\x80 instead, which doesn't seem to have any visible correlation with the pi character. What's going on? (This probably has something to do with the way UTF-8/16 is encoded...)
FileReader.readAsText takes the encoding of the file into account. In particular, since you have the file encoded in UTF-8, there may be multiple bytes per character. Reading it as text decodes the UTF-8 bytes, and you get your string.
FileReader.readAsBinaryString, on the other hand, does exactly what it says. It reads the file byte by byte and returns one character per byte. It doesn't recognise multi-byte characters, which in particular is good news for binary files (basically anything except a text file). Since π takes two bytes in UTF-8, you get the two individual bytes that make it up in your string.
This difference can be seen in many places. In particular when encoding is lost and you see characters like é displayed as é.
Oh well, if that's all you needed... :)
CF80 is the UTF-8 encoding for π.
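The two views of the same bytes can be reproduced without a file. This is a sketch using TextEncoder and TextDecoder as stand-ins for what the two FileReader methods do with a UTF-8 file; it does not call FileReader itself, and the variable names are illustrative.

```javascript
// UTF-8 bytes of the pi character, as they would sit in the file.
const piBytes = new TextEncoder().encode("\u03C0");
console.log(Array.from(piBytes, (b) => b.toString(16)).join(" ")); // cf 80

// Roughly what readAsBinaryString yields: one character per byte.
const piBinaryString = String.fromCharCode(...piBytes);
console.log(piBinaryString === "\xCF\x80"); // true
console.log(piBinaryString.length);         // 2

// Roughly what readAsText yields: the bytes decoded as UTF-8.
const piText = new TextDecoder("utf-8").decode(piBytes);
console.log(piText === "\u03C0"); // true
console.log(piText.length);       // 1
```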

How do I ensure that the text encoded in a form is utf8

I have an html box with which users may enter text. I would like to ensure all text entered in the box is either encoded in UTF-8 or converted to UTF-8 when a user finishes typing. Furthermore, I don't quite understand how various UTF encoding are chosen when being entered into a text box.
Generally I'm curious about the following:
How does a browser determine which encodings to use when a user is typing into a text box?
How can javascript determine the encoding of a string value in an html text box?
Can I force the browser to only use UTF-8 encoding?
How can I encode arbitrary encodings to UTF-8? I assume there is a JavaScript library for this.
** Edit **
Removed some questions unnecessary to my goals.
This tutorial helped me understand JavaScript character codes better, but is buggy and does not actually translate character codes to utf-8 in all cases.
http://www.webtoolkit.info/javascript-base64.html
How does a browser determine which encodings to use when a user is typing into a text box?
It uses the encoding the page was decoded as by default. According to the spec, you should be able to override this with the accept-charset attribute of the <form> element, but IE is buggy, so you shouldn't rely on this (I've seen several different sources describe several different bugs, and I don't have all the relevant versions of IE in front of me to test, so I'll leave it at that).
How can javascript determine the encoding of a string value in an html text box?
All strings in JavaScript are encoded in UTF-16. The browser will map everything into UTF-16 for JavaScript, and from UTF-16 into whatever the page is encoded in.
UTF-16 is an encoding that grew out of UCS-2. Originally, it was thought that 65,536 code points would be enough for all of Unicode, and so a 16-bit character encoding would be sufficient. It turned out that this is not the case, and so the character set was expanded to 1,114,112 code points. In order to maintain backwards compatibility, a few unused ranges of the 16-bit character set were set aside for surrogate pairs, in which two 16-bit code units are used to encode a single character. Read up on UTF-16 and UCS-2 on Wikipedia for details.
The upshot is that when you have a string str in JavaScript, str.length does not give you the number of characters, it gives you the number of code units, where two code units may be used to encode a single character, if that character is not within the Basic Multilingual Plane. For instance, "abc".length gives you 3, but "𐤀𐤁𐤂".length gives you 6; and "𐤀𐤁𐤂".substring(0,1) gives what looks like an empty string, since a half of a surrogate pair cannot be displayed, but the string still contains that invalid character (I will not guarantee this works cross browser; I believe it is acceptable to drop broken characters). To get a valid character, you must use "𐤀𐤁𐤂".substring(0,2).
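The difference between code units and code points can be checked directly. A small sketch, writing the three Phoenician letters from the example as \u{...} escapes (U+10900 to U+10902) so the source stays ASCII; the variable names are illustrative.

```javascript
// Three characters outside the Basic Multilingual Plane.
const phoenician = "\u{10900}\u{10901}\u{10902}";

console.log(phoenician.length);      // 6 (UTF-16 code units)
console.log([...phoenician].length); // 3 (code points; string iteration is code-point aware)

// substring works on code units: two units make up the first character.
console.log(phoenician.substring(0, 2) === "\u{10900}"); // true

// One unit alone is a lone high surrogate, not a displayable character.
const half = phoenician.substring(0, 1);
console.log(half.charCodeAt(0).toString(16)); // d802

// Slicing by code point instead of by code unit.
const firstTwo = Array.from(phoenician).slice(0, 2).join("");
console.log(firstTwo === "\u{10900}\u{10901}"); // true
```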
Can I force the browser to only use UTF-8 encoding?
The best way to do this is to deliver your page in UTF-8. Ensure that your web server is sending the appropriate Content-Type: text/html; charset=UTF-8 header. You may also want to embed a <meta charset="UTF-8"> element in your <head> element, for cases in which the Content-Type does not get set properly (such as if your page is loaded off of the local disk).
How can I encode arbitrary encodings to UTF-8? I assume there is a JavaScript library for this.
There isn't much need in JavaScript to encode text in particular encodings. If you are simply writing to the DOM, or reading or filling in form controls, you should just use JavaScript strings, which are treated as sequences of UTF-16 code units. XMLHttpRequest, when used to send(data) via POST, will use UTF-8 (if you pass it a document with a different encoding declared in the <?xml ...> declaration, it may or may not convert that to UTF-8, so for compatibility you generally shouldn't use anything other than UTF-8).
I would like to ensure all text entered in the box is either encoded in UTF-8
Text in an HTML DOM including input fields has no intrinsic byte encoding; it is stored as Unicode characters (specifically, at a DOM and ECMAScript standard level, UTF-16 code units; on the rare case you use characters outside the Basic Multilingual Plane it is possible to see the difference, eg. '𝅘𝅥𝅯'.length is 2).
It is only when the form is sent that the text is serialised into bytes using a particular encoding, by default the same encoding as was used to parse the page. So you should serve your page containing the form as UTF-8 (via Content-Type header charset parameter and/or equivalent <meta> tag).
Whilst in principle there is an override for this in the accept-charset attribute of the <form> element, it doesn't work correctly (and is actively harmful in many cases) in IE. So avoid that one.
When this was written there were no explicit encoding-handling functions available in JavaScript itself. You can hack together a Unicode-to-UTF-8-bytes encoder by chaining unescape(encodeURIComponent(str)) (and similarly the other way round with decodeURIComponent(escape(bytes))), but that was about it. Current browsers and Node.js also provide TextEncoder and TextDecoder for this.
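The chaining trick can be sketched as follows, with a TextEncoder comparison alongside. Note that escape and unescape are deprecated, though still implemented; TextEncoder is assumed to be available (current browsers and Node.js), and the function names are made up for this example.

```javascript
// Hypothetical helpers wrapping the chaining idiom.
// Result: a "binary string" holding one character per UTF-8 byte.
function toUtf8BinaryString(str) {
  return unescape(encodeURIComponent(str));
}

// Inverse: a binary string of UTF-8 bytes back to a normal string.
function fromUtf8BinaryString(bin) {
  return decodeURIComponent(escape(bin));
}

const viaIdiom = toUtf8BinaryString("\u00e9"); // U+00E9, e with acute accent
console.log(viaIdiom === "\xC3\xA9");          // true

// The same bytes obtained through TextEncoder.
const viaEncoder = String.fromCharCode(...new TextEncoder().encode("\u00e9"));
console.log(viaIdiom === viaEncoder);          // true

console.log(fromUtf8BinaryString(viaIdiom) === "\u00e9"); // true
```

Displaying viaIdiom as if it were ordinary text shows two characters, Ã followed by ©, which is the familiar mojibake for a UTF-8 encoded é.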
The text in a text box is not encoded in any way; it is "text", an abstract series of characters. In almost every contemporary application, that text is expressed as a sequence of Unicode code points, which are integers mapped to particular abstract characters. Text doesn't get "encoded" until it is turned into a sequence of bytes, as when submitting the form. At that time, the encoding is determined by the encoding of the HTML page in which the form appears, or by the accept-charset attribute of the form element.
