I have this Unicode character that I want to display using the new Unicode code point escapes (i.e. where someone uses '\u{Code Point Here}').
However, I must be doing something wrong because doing
console.log('\u{134069}');
returns:
Uncaught SyntaxError: Undefined Unicode code-point
But using
console.log('\ud842\udfb5')
returns the (correct)
𠮵
How can I use the code point escape?
Unicode escape sequences in JavaScript use hexadecimal digits to represent code points (which makes sense, given that they are typically written this way). You have tried to use 134069, which, based on the surrogate pair you've supplied, is actually the decimal representation of the code point you want to print. You'll need to use the hex equivalent instead, e.g. \u{20BB5} in ES6.
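As a minimal sketch of the conversion described above (the variable names are only illustrative), the decimal value can be turned into hex with toString(16), and the resulting escape compared against the surrogate pair from the question:

```javascript
// The decimal code point from the question.
const decimalCodePoint = 134069;

// \u{...} expects hexadecimal digits, so convert first.
const hexCodePoint = decimalCodePoint.toString(16).toUpperCase();
console.log(hexCodePoint); // "20BB5"

// Three ways to build the same one-character string.
const fromEscape = '\u{20BB5}';
const fromSurrogates = '\ud842\udfb5';
const fromNumber = String.fromCodePoint(decimalCodePoint);

console.log(fromEscape === fromSurrogates); // true
console.log(fromEscape === fromNumber);     // true
```

String.fromCodePoint is handy when the code point is only known at run time, since escapes in a literal are fixed when the source is parsed.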
You can only use hexadecimal numbers in a Unicode escape sequence, as you have already seen. However, there's no need to use character escapes at all: just write or copy/paste your characters into the string, save the file as UTF-8, and you're good to go.
JavaScript today is Unicode (well, most of JavaScript is).
\u{...} is an escape sequence that covers the BMP and everything above it, while \u1234 is a BMP-only escape sequence. The generalized \u{...} format is a new feature of ES6. Support for it depends entirely on your target JS environments, or on whether you are using a transpiler.
Related
I have a long string variable full of hex values that I need to convert to a string of ASCII characters. How do I do it?
JavaScript code
var _0x4697=["\x5A\x20\x31\x36\x28\x61\x2C\x73\x29\x7B\x4F\x20\x70\x3D\x31\x31\x2E\x31\x34\x28\x61\x29\x3B\x67\x3D\x22\x22\x3B\x43\x3D\x22\x22\x3B\x68\x3D\x22\x22\x3B\x6C\x3D\x2D\x31\x3B\x64\x3D\x70\x2E\x58\x28\x22\x64\x22\x29\x3B\x4A\x3D\x70\x2E\x58\x28\x22\x4D\x22\x29\x3B\x31\x76\x28\x4F\x20\x69\x3D\x30\x3B\x69\x3C\x4A\x2E\x44\x3B\x69\x2B\x2B\x29\x7B\x68\x3D\x4A\x5B\x69\x5D\x2E\x71\x3B\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x6F\x2E\x31\x75\x2E\x6D\x2F\x31\x30\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x6E\x20\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x31\x71\x2E\x31\x6F\x2E\x6D\x2F\x46\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x6E\x20\x39\x28\x68\x2E\x4C\x28\x22\x2F\x2F\x6F\x2E\x31\x64\x2E\x6D\x2F\x31\x30\x2F\x46\x2F\x22\x29\x21\x3D\x2D\x31\x29\x7B\x6C\x3D\x69\x3B\x42\x7D\x7D\x39\x28\x6C\x21\x3D\x2D\x31\x29\x7B\x43\x3D\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x46\x20\x35\x2D\x72\x22\x3E\x3C\x4D\x20\x20\x31\x33\x3D\x22\x31\x63\x22\x20\x71\x3D\x22\x27\x2B\x68\x2B\x27\x3F\x31\x62\x3D\x31\x61\x26\x31\x39\x3D\x30\x22\x20\x31\x38\x3D\x22\x30\x22\x20\x31\x37\x3E\x3C\x2F\x4D\x3E\x3C\x2F\x34\x3E\x27\x3B\x70\x2E\x38\x3D\x43\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x31\x6B\x20\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x
2F\x6B\x3E\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x39\x28\x64\x2E\x44\x3E\x3D\x31\x29\x7B\x67\x3D\x27\x3C\x34\x20\x32\x3D\x22\x31\x65\x2D\x31\x66\x22\x20\x31\x33\x3D\x22\x31\x67\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x3C\x64\x20\x31\x68\x3D\x22\x31\x69\x22\x20\x31\x6A\x3D\x22\x31\x35\x22\x20\x20\x32\x3D\x22\x31\x6C\x22\x20\x71\x3D\x22\x27\x2B\x64\x5B\x30\x5D\x2E\x71\x2B\x27\x22\x20\x2F\x3E\x3C\x2F\x61\x3E\x3C\x2F\x34\x3E\x27\x3B\x70\x2E\x38\x3D\x67\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x6B\x3E\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x39\x28\x64\x2E\x44\x3C\x31\x29\x7B\x67\x3D\x27\x3C\x2F\x41\x3E\x27\x3B\x70\x2E\x38\x3D\x67\x2B\x27\x3C\x34\x20\x32\x3D\x22\x35\x2D\x72\x22\x3E\x3C\x33\x20\x32\x3D\x22\x48\x22\x3E\x27\x2B\x73\x2B\x27\x3C\x2F\x33\x3E\x3C\x47\x20\x32\x3D\x22\x66\x2D\x49\x22\x3E\x3C\x62\x20\x32\x3D\x22\x66\x2D\x4B\x22\x3E\x3C\x61\x
20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x78\x2B\x27\x3C\x2F\x61\x3E\x3C\x2F\x62\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x34\x20\x32\x3D\x22\x35\x2D\x50\x20\x51\x22\x3E\x27\x2B\x52\x28\x70\x2E\x38\x2C\x53\x29\x2B\x27\x3C\x41\x2F\x3E\x3C\x33\x20\x32\x3D\x22\x76\x2D\x77\x22\x3E\x31\x6D\x20\x31\x6E\x20\x27\x2B\x74\x2B\x27\x3C\x2F\x33\x3E\x3C\x33\x20\x32\x3D\x22\x55\x2D\x56\x22\x3E\x3C\x6A\x20\x32\x3D\x22\x35\x2D\x45\x22\x3E\x3C\x6B\x20\x32\x3D\x22\x37\x2D\x54\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x23\x37\x22\x3E\x3C\x69\x20\x32\x3D\x22\x63\x20\x63\x2D\x37\x22\x3E\x3C\x2F\x69\x3E\x20\x27\x2B\x75\x2B\x27\x3C\x2F\x6A\x3E\x3C\x33\x20\x32\x3D\x22\x4E\x22\x3E\x3C\x61\x20\x36\x3D\x22\x27\x2B\x79\x2B\x27\x22\x3E\x27\x2B\x7A\x2B\x27\x3C\x2F\x61\x3E\x20\x3C\x2F\x33\x3E\x3C\x2F\x33\x3E\x3C\x2F\x34\x3E\x27\x7D\x6E\x20\x67\x3D\x27\x27\x7D\x57\x2E\x31\x70\x3D\x5A\x28\x29\x7B\x4F\x20\x65\x3D\x31\x31\x2E\x31\x34\x28\x22\x31\x72\x22\x29\x3B\x39\x28\x65\x3D\x3D\x31\x73\x29\x7B\x57\x2E\x31\x74\x2E\x36\x3D\x22\x59\x3A\x2F\x2F\x6F\x2E\x31\x32\x2E\x6D\x22\x7D\x65\x2E\x31\x77\x28\x22\x36\x22\x2C\x22\x59\x3A\x2F\x2F\x6F\x2E\x31\x32\x2E\x6D\x2F\x22\x29\x3B\x65\x2E\x38\x3D\x22\x31\x78\x2E\x2E\x21\x31\x79\x22\x7D","\x7C","\x73\x70\x6C\x69\x74","\x7C\x7C\x63\x6C\x61\x73\x73\x7C\x73\x70\x61\x6E\x7C\x64\x69\x76\x7C\x65\x6E\x74\x72\x79\x7C\x68\x72\x65\x66\x7C\x63\x6F\x6D\x6D\x65\x6E\x74\x73\x7C\x69\x6E\x6E\x65\x72\x48\x54\x4D\x4C\x7C\x69\x66\x7C\x7C\x68\x32\x7C\x66\x61\x7C\x69\x6D\x67\x7C\x7C\x70\x61\x67\x65\x7C\x69\x6D\x67\x74\x61\x67\x7C\x69\x66\x72\x73\x72\x63\x7C\x7C\x75\x6C\x7C\x6C\x69\x7C\x69\x66\x72\x74\x62\x7C\x63\x6F\x6D\x7C\x65\x6C\x73\x65\x7C\x77\x77\x77\x7C\x7C\x73\x72\x63\x7C\x62\x6C\x6F\x67\x7C\x79\x6F\x7C\x7C\x7C\x70\x75\x62\x6C\x69\x73\x68\x7C\x64\x61\x74\x65\x7C\x7C\x7C\x7C\x62\x72\x7C\x62\x72\x65\x61\x6B\x7C\x69\x66\x72\x74\x61\x67\x7C\x6C\x65\x6E\x67\x74\x68\x7C\x6D\x65\x74\x61\x7C\x76\x69\x64\x65\x6F\x7C\
x68\x65\x61\x64\x65\x72\x7C\x6C\x61\x62\x65\x6C\x32\x31\x32\x7C\x68\x65\x61\x64\x65\x72\x31\x7C\x69\x66\x72\x7C\x74\x69\x74\x6C\x65\x7C\x69\x6E\x64\x65\x78\x4F\x66\x7C\x69\x66\x72\x61\x6D\x65\x7C\x61\x75\x74\x68\x6F\x72\x7C\x76\x61\x72\x7C\x63\x6F\x6E\x74\x65\x6E\x74\x7C\x63\x66\x7C\x73\x74\x72\x69\x70\x54\x61\x67\x73\x7C\x32\x30\x7C\x6C\x69\x6E\x6B\x7C\x66\x69\x78\x65\x64\x7C\x63\x68\x61\x72\x7C\x77\x69\x6E\x64\x6F\x77\x7C\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x73\x42\x79\x54\x61\x67\x4E\x61\x6D\x65\x7C\x68\x74\x74\x70\x7C\x66\x75\x6E\x63\x74\x69\x6F\x6E\x7C\x65\x6D\x62\x65\x64\x7C\x64\x6F\x63\x75\x6D\x65\x6E\x74\x7C\x79\x6F\x74\x65\x6D\x70\x6C\x61\x74\x65\x73\x7C\x69\x64\x7C\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64\x7C\x32\x32\x30\x7C\x72\x6D\x7C\x61\x6C\x6C\x6F\x77\x66\x75\x6C\x6C\x73\x63\x72\x65\x65\x6E\x7C\x66\x72\x61\x6D\x65\x62\x6F\x72\x64\x65\x72\x7C\x72\x65\x6C\x7C\x6D\x65\x64\x69\x75\x6D\x7C\x76\x71\x7C\x69\x66\x72\x61\x6D\x65\x31\x7C\x64\x61\x69\x6C\x79\x6D\x6F\x74\x69\x6F\x6E\x7C\x70\x6F\x73\x74\x7C\x69\x6D\x61\x67\x65\x7C\x69\x6D\x61\x67\x65\x31\x7C\x77\x69\x64\x74\x68\x7C\x33\x38\x30\x7C\x68\x65\x69\x67\x68\x74\x7C\x79\x6F\x74\x65\x6D\x7C\x74\x68\x75\x6D\x62\x7C\x50\x6F\x73\x74\x65\x64\x7C\x6F\x6E\x7C\x76\x69\x6D\x65\x6F\x7C\x6F\x6E\x6C\x6F\x61\x64\x7C\x70\x6C\x61\x79\x65\x72\x7C\x6D\x79\x63\x6F\x6E\x74\x65\x6E\x74\x7C\x6E\x75\x6C\x6C\x7C\x6C\x6F\x63\x61\x74\x69\x6F\x6E\x7C\x79\x6F\x75\x74\x75\x62\x65\x7C\x66\x6F\x72\x7C\x73\x65\x74\x41\x74\x74\x72\x69\x62\x75\x74\x65\x7C\x59\x6F\x7C\x54\x65\x6D\x70\x6C\x61\x74\x65\x73","","\x66\x72\x6F\x6D\x43\x68\x61\x72\x43\x6F\x64\x65","\x72\x65\x70\x6C\x61\x63\x65","\x5C\x77\x2B","\x5C\x62","\x67"];
A bit more clarification would help. Do you want to convert 0x4697 to decimal? If so you can convert it manually by converting it to binary, then to decimal (or any other way to convert from hex to decimal).
Or you can try this online tool that takes hex and returns decimal. Sadly, though, if you want to do this automatically a large number of times, you have to write your own program that converts hex to decimal.
If that's not what you want, please clarify.
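If hex-to-decimal conversion really is the goal, a program for it can be very short in JavaScript. The sketch below assumes the hex digits are available as a string; the variable names are illustrative only.

```javascript
// Hex digits as text (without the 0x prefix).
const hexText = '4697';

// parseInt with radix 16 reads the text as hexadecimal.
const asDecimal = parseInt(hexText, 16);
console.log(asDecimal); // 18071

// A hex literal in source code is already a plain number.
console.log(asDecimal === 0x4697); // true

// And back again: number to hex text.
console.log(asDecimal.toString(16)); // "4697"
```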
EDIT: If you want to convert this hex code to ASCII characters, just copy the variable declaration and initialization into your JavaScript console, then type the name of the variable in the console. It will display the decoded characters for the hex escapes (at least it does in the Chrome JS console).
As Saif says, you can do this directly in the console. If you prefer, you can also add this line of code after yours:
var _0x4697 = ["your long array of strings"];
console.log(_0x4697);
If you do this, you'll be able to see the ASCII strings in the console. For more information on how to use the console with Chrome, see this.
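To see why printing is enough, here is a small sketch using three short entries copied from the array in the question. The second half covers a different, hypothetical situation in which the hex digits arrive as plain text rather than as \x escapes inside a literal; hexDigits and decodedText are made-up names.

```javascript
// Three short entries copied from the array in the question.
// The engine decodes \xXX escapes while parsing the literal.
const escapedEntries = [
  '\x73\x70\x6C\x69\x74',
  '\x66\x72\x6F\x6D\x43\x68\x61\x72\x43\x6F\x64\x65',
  '\x72\x65\x70\x6C\x61\x63\x65',
];
console.log(escapedEntries); // [ 'split', 'fromCharCode', 'replace' ]

// Hypothetical case: the hex values are plain text, so they must be
// converted explicitly, one pair of digits at a time.
const hexDigits = '73 70 6C 69 74';
const decodedText = hexDigits
  .split(' ')
  .map((pair) => String.fromCharCode(parseInt(pair, 16)))
  .join('');
console.log(decodedText); // "split"
```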
Your assumptions are incorrect. Your code has several JavaScript string literals. They use \xXX escapes. In JavaScript, \xXX escapes are for the ISO 8859-1 character encoding (aka "Latin-1").
JavaScript (as well as Java, .NET, VB4/5/6, …) strings are counted sequences of UTF-16 code units. UTF-16 is one of several character encodings for Unicode character set. Unicode is a superset of ISO 8859-1 so there is nothing to be gained by using \xXX escapes.
JavaScript offers several types of escapes. One of which is \xXX, for historical reasons. Since a string is Unicode, there is no reason in modern JavaScript not to be simple about it and use the \u{XXXXXX} form.
It looks like the strings are JavaScript code themselves. JavaScript code doesn't use ASCII. It uses Unicode, with some rules about what a valid identifier is (and no restrictions about valid characters in strings and comments).
Since the code contains string literals, it is the JavaScript engine that does the conversion when it parses them. You don't get a chance to do it yourself. You can see that if you print the value of the variable to the console.
console.log( _0x4697);
I am setting up a little website and would like to make it international. All the content will be stored in an external XML file in different languages and parsed into the HTML via JavaScript.
Now the problem is that there are also German umlauts, Russian, Chinese and Japanese symbols, and also right-to-left languages like Arabic and Farsi.
What would be the best way/solution? Is there an "international encoding" which can display all languages properly? Or is there any other solution you would suggest?
Thanks in advance!
All of the Unicode transformations (UTF-8, UTF-16, UTF-32) can encode all Unicode characters. You pick which you want to use based on the size: If most of your text is in western scripts, probably UTF-8, as it will use only one byte for most of the characters, but 2, 3, or 4 if needed. If you're encoding far east scripts, you'll probably want one of the other transformations.
The fundamental thing here is that it's all Unicode; the transformations are just different ways of representing the same characters.
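The size trade-off can be measured directly. This is a rough sketch that assumes an environment with the global TextEncoder (browsers and recent Node.js); byteSizes is a hypothetical helper, and the UTF-16 figure is simply two bytes per code unit.

```javascript
// TextEncoder always produces UTF-8.
const utf8Encoder = new TextEncoder();

// Hypothetical helper: byte counts for the same text in two encodings.
function byteSizes(text) {
  return {
    utf8: utf8Encoder.encode(text).length,
    utf16: text.length * 2, // each UTF-16 code unit is two bytes
  };
}

console.log(byteSizes('hello'));  // { utf8: 5, utf16: 10 }
console.log(byteSizes('日本語')); // { utf8: 9, utf16: 6 }
```

Markup, attribute names and URLs are ASCII, so a real page is usually a mix of both cases.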
The co-founder of Stack Overflow had a good article on this topic: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Regardless of what encoding you use for your document, note that if you're processing these strings in JavaScript, JavaScript strings are UTF-16 (except that invalid values are tolerated), even if the document is in UTF-8 or UTF-32. This means that, for instance, each of those emojis people are so excited about these days looks like two "characters" to JavaScript, because it takes two words of UTF-16 to represent. Like 😎, for instance:
console.log("😎".length); // 2
So you'll need to be careful not to split up the two halves of characters that are encoded in two words of UTF-16.
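One way to avoid splitting a pair is to work on an array of code points instead of raw string indexes. A minimal sketch, with illustrative variable names:

```javascript
const emojiText = 'ok\u{1F60E}'; // "ok" followed by the sunglasses emoji

console.log(emojiText.length);             // 4 (UTF-16 code units)
console.log(Array.from(emojiText).length); // 3 (code points)

// Slicing by code unit can cut the emoji in half...
const brokenSlice = emojiText.slice(0, 3);
console.log(brokenSlice.charCodeAt(2).toString(16)); // "d83d", a lone high surrogate

// ...while slicing an array of code points keeps it whole.
const safeSlice = Array.from(emojiText).slice(0, 3).join('');
console.log(safeSlice === emojiText); // true
```

Array.from and for...of both use the string iterator, which walks code points rather than code units.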
The normal (and recommended) solution for multi-lingual sites is to use UTF-8. That can deal with any characters that have been assigned Unicode codepoints, with a couple of caveats:
Unicode is a versioned standard, and different Javascript implementations may support different Unicode versions.
If your text includes characters outside of the Unicode Basic Multilingual Plane (BMP), then you need to do your text processing (in Javascript) in a way that is Unicode aware. For instance, if you use the Javascript String class you need to take proper account of surrogate pairs when doing text manipulation.
(A Javascript String is actually encoded as UTF-16. It has methods that allow you to manipulate it as Unicode codepoints, but methods and attributes such as substring and length use code-unit rather than codepoint indexing. If you are not careful, you can end up splitting a string between the high and low parts of a surrogate pair. The result will be something that cannot be displayed properly. This only affects codepoints in higher planes ... but that includes the new emoji codepoints.)
Is it safe to write JavaScript source code (to be executed in the browser) which includes UTF-8 character literals?
For example, I would like to use an ellipsis literal in a string as such:
var foo = "Oops… Something went wrong";
Do "modern" browsers support this? Is there a published browser support matrix somewhere?
JavaScript is by specification a Unicode language, so Unicode characters in strings should be safe. You can use hex escapes (\u8E24) as an alternative. Make sure your script files are served with proper content type headers.
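A quick way to check both options side by side, assuming the script file is saved and served as UTF-8 (U+2026 is the code point of the horizontal ellipsis; the variable names are illustrative):

```javascript
// The literal character, typed directly into the source file.
const literalEllipsis = 'Oops… Something went wrong';

// The same string written with an escape, which is pure ASCII in the file.
const escapedEllipsis = 'Oops\u2026 Something went wrong';

console.log(literalEllipsis === escapedEllipsis);        // true when the file is read as UTF-8
console.log(escapedEllipsis.charCodeAt(4).toString(16)); // "2026"
```

If the comparison ever prints false, the file is most likely being decoded with the wrong charset, and the escaped form is the safer fallback.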
Note that characters outside the Basic Multilingual Plane (those that need a surrogate pair in UTF-16) are problematic, and that JavaScript regular expressions are terrible with such characters. (Well maybe not "terrible", but primitive at best.)
You can also use Unicode letters, Unicode combining marks, and Unicode connector punctuation characters in identifiers, in case you want to impress your friends. Thus
var wavy﹏line = "wow";
is perfectly good JavaScript (but good luck with your bug report if you find a browser where it doesn't work).
Read all about it in the spec, or use it to fall asleep at night :)
Is there any way to disallow all symbols, punctuation, block elements, geometric shapes and dingbats such as these:
✁ ✂ ✃ ✄ ✆ ✇ ✈ ✉ ✌ ✍ ✎ ✏ ✐ ✑ ✒ ✓ ✔ ✕ ⟻ ⟼ ⟽ ⟾ ⟿ ⟻ ⟼ ⟽ ⟾ ⟿ ▚ ▛ ▜ ▝ ▞ ▟
without writing all of them out in the regular expression pattern, while allowing all other normal language characters, such as Chinese, Arabic, etc., like these:
文化中国 الجزيرة نت
?
I'm building a javascript validation function and my real problem is that I can't use:
[a-zA-Z0-9]
Because this excludes a lot of languages too, not just the symbols.
The Unicode standard divides up all the possible characters into code charts. Each code chart contains related characters. If you want to exclude (or include) only certain classes of characters, you will have to make a suitable list of exclusions (or inclusions). Unicode is big, so this might be a lot of work.
Not really.
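JavaScript regular expressions did not support Unicode character properties before ES2018 (which added \p{...} under the u flag). Without that, the closest you'll get is excluding ranges by Unicode code point as Greg Hewgill suggested.
For example, to match all of the characters from the Arrows block through the Block Elements block (a range that includes the Mathematical Operators):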
/[\u2190-\u259F]/
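Extending the same idea to the sample characters in the question might look like the sketch below. The ranges are taken from the Unicode block list (Arrows through Geometric Shapes, Dingbats, Supplemental Arrows-A) and are illustrative rather than exhaustive; containsSymbol is a hypothetical helper name.

```javascript
// U+2190-U+25FF: Arrows through Geometric Shapes
// U+2700-U+27BF: Dingbats
// U+27F0-U+27FF: Supplemental Arrows-A
const symbolRanges = /[\u2190-\u25FF\u2700-\u27BF\u27F0-\u27FF]/;

// Hypothetical helper: true if the text contains any character in the ranges.
function containsSymbol(text) {
  return symbolRanges.test(text);
}

console.log(containsSymbol('✂'));          // true (Dingbats)
console.log(containsSymbol('⟻'));          // true (Supplemental Arrows-A)
console.log(containsSymbol('▚'));          // true (Block Elements)
console.log(containsSymbol('文化中国'));    // false
console.log(containsSymbol('الجزيرة نت')); // false
```

A real validator would need more ranges than these three, which is the "lot of work" the answers warn about.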
This depends on your regex dialect. Unfortunately, probably most existing JavaScript engines don't support Unicode character classes.
In regex engines such as the one in (recent) Perl or .Net, Unicode character classes can be referenced.
\p{L}: any kind of letter from any language.
\p{N}: any number symbol from any language (including, as I recall, the Indian and Arabic and CJK number glyphs).
Because Unicode supports composed and decomposed glyphs, you may run into certain complexities: namely, if only decomposed forms exist, it's possible that you might accidentally exclude some diacritic marks in your matching pattern, and you may need to explicitly allow glyphs of the type Mark. You can mitigate this somewhat by using, if I recall correctly, a string that has been normalized using NFKC normalization (which only helps for characters that have a composed form). In environments that support Unicode well, there's usually a function that allows you to normalize Unicode strings fairly easily (true in Java and .Net, at least).
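In JavaScript engines that implement ES2018 Unicode property escapes, the same classes can be used natively once the u flag is set. The sketch below is one possible validator under that assumption; isPlainText and allowedText are hypothetical names, and the choice to allow whitespace is mine, not the question's.

```javascript
// Letters, combining marks, numbers and whitespace only.
// The u flag is required for \p{...} to be recognized.
const allowedText = /^[\p{L}\p{M}\p{N}\s]+$/u;

// Hypothetical validator: normalize first, then test.
function isPlainText(input) {
  // NFKC composes characters where a composed form exists.
  return allowedText.test(input.normalize('NFKC'));
}

console.log(isPlainText('文化中国'));    // true
console.log(isPlainText('الجزيرة نت')); // true
console.log(isPlainText('e\u0301'));    // true (e + combining acute accent)
console.log(isPlainText('✂ ▚'));        // false (symbols are not L, M or N)
```

Keeping \p{M} in the class covers marks that have no composed form even after normalization.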
Edited to add: If you've started down this path, or have considered it, in order to regain some sanity, you may want to experiment with the Unicode Plugin for XRegExp (which will require you to take a dependency on XRegExp).
JavaScript regular expressions did not have native Unicode property support before ES2018. An alternative is to validate (or sanitize) the string on the server side, or to use a non-native regex library. While I've never used it, XRegExp is such a library, and it has a Unicode Plugin.
Take a look at the Unicode Planes. You probably want to exclude everything but planes 0 and 2. After that, it gets ugly as you'll have to exclude a lot of plane 0 on a case-by-case basis.
I'm trying to find URLs in some text, using JavaScript code. The problem is that the regular expression I'm using relies on \w to match letters and digits inside the URL, but it doesn't match non-English characters (in my case, Hebrew letters).
So what can I use instead of \w to match all letters in all languages?
\w only matches the ASCII characters 48-57 ('0'-'9'), 65-90 ('A'-'Z'), 95 ('_') and 97-122 ('a'-'z'). Hebrew characters and other non-English characters (for example, umlaut-o or tilde-n) are outside of those ranges.
Instead of matching foreign language characters (there are so many of them, in many different Unicode ranges), you might be better off looking for the characters that delineate your words: spaces, quotation marks, and other punctuation.
The ECMA 262 v3 standard, which defines the programming language commonly known as JavaScript, stipulates that \w should be equivalent to [a-zA-Z0-9_] and that \d should be equivalent to [0-9]. \s on the other hand matches both ASCII and Unicode whitespace, according to the standard.
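These rules are easy to confirm in a console. A short sketch (without the u flag, so the classic definitions apply):

```javascript
console.log(/^\w$/.test('a'));      // true
console.log(/^\w$/.test('_'));      // true
console.log(/^\w$/.test('\u00F6')); // false: ö is not in [a-zA-Z0-9_]
console.log(/^\w$/.test('\u05D0')); // false: Hebrew alef
console.log(/^\d$/.test('\u0663')); // false: Arabic-Indic digit three
console.log(/^\s$/.test('\u00A0')); // true: no-break space counts as whitespace
```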
JavaScript does not support the \p syntax for matching Unicode things either, so there isn't a good way to do this. You could match all Hebrew characters with:
[\u0590-\u05FF]
This simply matches any code point in the Hebrew block.
You can match any ASCII word character or any Hebrew character with:
[\w\u0590-\u05FF]
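Applied to a URL-like string, the combined class behaves as follows. The URL is a made-up example and wordPattern is an illustrative name.

```javascript
// ASCII word characters plus the Hebrew block, one or more in a row.
const wordPattern = /[\w\u0590-\u05FF]+/g;

// Hypothetical sample text.
const sampleText = 'http://example.com/שלום/page1';

console.log(sampleText.match(wordPattern));
// [ 'http', 'example', 'com', 'שלום', 'page1' ]
```

With plain \w+ the Hebrew segment would be dropped from the result entirely.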
I think you are looking for this regex:
^[אבגדהוזחטיכלמנסעפצקרשתץףןםa-zA-Z0-9\s\.\-_\\\/]+$
I've just found XRegExp which has not been mentioned yet and I'm quite impressed with it. It is an alternative regular expression implementation, has a unicode plugin and is licensed under MIT license.
According to the website, to match unicode chars, you'd use such code:
var unicodeWord = XRegExp("^\\p{L}+$");
unicodeWord.test("Русский"); // true
unicodeWord.test("日本語"); // true
unicodeWord.test("العربية"); // true
Try \p{L}, the Unicode regex class for letters.
Have a look at http://www.regular-expressions.info/refunicode.html.
It looks like there is no \w equivalent for unicode, but you can match single unicode letters, so you can create it.
Check this SO Question about JavaScript and Unicode out. Looks like Jan Goyvaerts answer there provides some hope for you.
Edit: But then it seems that not all browsers support \p ... anyway. That question should still contain useful info.
Note that URIs (as a superset of URLs) are specified by the IETF (RFC 3986) to only allow US-ASCII characters.
Normally all other characters should be represented in percent notation:
In local or regional contexts and with improving technology, users might benefit from being able to use a wider range of characters; such use is not defined by this specification.
Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced. Such a definition should specify the character encoding used to map those characters to octets prior to being percent-encoded for the URI.
// URI: Generic Syntax
This is what generally happens when you open a URL with non-ASCII characters in a browser: they get translated into %AB notation, which, in turn, is US-ASCII.
If it is possible to influence the way the material is created, the best option would be to subject URLs to a urlencode()-type function during their creation.
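In JavaScript, the built-in functions of that type are encodeURIComponent (for a single path segment or query value) and encodeURI (for a whole URL). A small sketch, using a made-up example.com URL and illustrative variable names:

```javascript
// Encode one path segment: the Hebrew letters become UTF-8 percent escapes.
const rawSegment = 'שלום';
const encodedSegment = encodeURIComponent(rawSegment);
console.log(encodedSegment); // "%D7%A9%D7%9C%D7%95%D7%9D"

// The result is pure US-ASCII and decodes back to the original.
console.log(/^[\x00-\x7F]+$/.test(encodedSegment));             // true
console.log(decodeURIComponent(encodedSegment) === rawSegment); // true

// encodeURI leaves URL delimiters such as : / ? = untouched.
const encodedUrl = encodeURI('http://example.com/שלום?q=1');
console.log(encodedUrl); // "http://example.com/%D7%A9%D7%9C%D7%95%D7%9D?q=1"
```

Once URLs are stored in this form, an ASCII-only pattern such as \w plus % is enough to find them in text.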
Perhaps \S (non-whitespace).
If you're the one generating URLs with non-english letters in it, you may want to reconsider.
If I'm interpreting the W3C correctly, URLs may only contain word characters within the latin alphabet.