I tried to get the Unicode value of the "🤔" emoji with JavaScript, but it does not appear when I try to add it as HTML; a lot of emojis don't appear.
var x = "A".charCodeAt(0);
document.write(x); // gives 65
&#65; gave A
var y = "🤔".charCodeAt(0);
document.write(y); // gives 55358
&#55358; didn't give 🤔
What is the reason?
Characters (or code points) in JavaScript are made up of code units. Sometimes a code point is made up of one code unit. For example, the code point for "a" is made up of one code unit (97 or 0x61). Other code points, however, are made up of two code units (known as a surrogate pair) that work together to form a single character. For example, the Thinking Face emoji (🤔) you shared in your question is made up of two code units which, when paired together, form the one code point for the thinking face emoji. You can see this by taking the .length of it:
console.log("🤔".length); // 2 (as it is made up of two code units)
console.log("🤔".charCodeAt(0)); // 55358 or 0xD83E
console.log("🤔".charCodeAt(1)); // 56596 or 0xDD14
When you use .charCodeAt() you're accessing the code units of your string elements. As seen above, "🤔" comprises two code units, 0xD83E (55358) and 0xDD14 (56596). The first code unit returned by .charCodeAt(0) only makes sense when used together with the other code unit, so it doesn't work standalone, which is why you're getting the replacement character (�) in your result:
console.log("🤔"[0], "🤔"[1]); // � �
Rather than using the individual code units obtained by .charCodeAt(), you can show the entire code point (i.e. character), which can be obtained using .codePointAt():
console.log("🤔".codePointAt(0)); // 129300 or 0x1F914
Then, you can use this number as an HTML entity:
<p>&#129300;</p>
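If you want to build that entity programmatically rather than hard-coding it, here's a small sketch (the element id "out" is just for illustration):

var codePoint = "🤔".codePointAt(0);   // 129300
var entity = "&#" + codePoint + ";";   // "&#129300;"
document.getElementById("out").innerHTML = entity; // renders 🤔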
This emoji: 🤔 has a decimal (dec) reference of 129300. If you want it to show in your HTML try this:
<p>&#129300;</p>
For a full list of dec references visit w3schools' website.
Related
In my code I'm trying to isolate the first character of a variable; it is the UTF-8 symbol: 🌈
The code and its output are as follows:
Code:
console.log(login_name);
console.log(login_name.charAt(0));
console.log(login_name.substring(0,1));
Output:
🌈 ✨✨✨UTF8MB4
�
�
Obviously, I want .charAt() to print 🌈 and not �. Any known oddities with utf8mb4 that I'm missing? My main problem is I don't know how to word this specific problem.
Also, if I swap the rainbow for (or target) the ✨, it functions as it should and prints properly.
JavaScript string methods don't handle Unicode characters outside the Basic Multilingual Plane well: charAt() operates on UTF-16 code units instead of code points, so for 🌈 it returns only the first half of the surrogate pair, which renders as �.
Luckily JavaScript has workarounds. To get the characters in a string instead of UTF-16/UCS-2 code units you need to call Array.from(yourstring), which will get you an array of characters. From there on you can get the first element in the usual way.
let characters = Array.from(login_name);
console.log(characters.shift());
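For what it's worth, the same code-point-aware iteration is available through the string iterator directly, e.g. via the spread operator; a small equivalent sketch:

let characters = [...login_name];
console.log(characters[0]); // 🌈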
I am using javascript in a Mirth transformer. I apologize for my ignorance but I have no javascript training and have a hard time successfully utilizing info from similar threads here or elsewhere. I am trying to trim a string from 'Room-Bed' to be just 'Bed'. The "Room" and "Bed" values will fluctuate. All associated data is coming in from an ADT interface where our client is sending both the room and bed values, separated by a hyphen, in the bed field creating unnecessary redundancy. Please help me with the code needed to produce the result of 'Bed' from the received 'Room-Bed'.
There are many ways to reduce the string you have to the string you want. Which you choose will depend on your inputs and the output you want. For your simple example, all of them will work, but if you have strings come in with multiple hyphens, they'll produce different results. They'll also have different performance characteristics. Balance performance against how often the code will be called, and pick whichever you find most readable.
// Turns the string into an array, then pops the last element out: 'Bed'!
'Room-Bed'.split('-').pop() === 'Bed'
// Looks for the given pattern,
// replacing the match with everything after the hyphen.
'Room-Bed'.replace(/.+-(.+)/, '$1') === 'Bed'
// Finds the first index of -,
// and creates a substring based on it.
'Room-Bed'.substr('Room-Bed'.indexOf('-') + 1) === 'Bed'
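For instance, with a hypothetical value containing more than one hyphen, the three approaches no longer agree:

'Room-Wing-Bed'.split('-').pop()                          // 'Bed' (last segment)
'Room-Wing-Bed'.replace(/.+-(.+)/, '$1')                  // 'Bed' (greedy match keeps only what follows the last hyphen)
'Room-Wing-Bed'.substr('Room-Wing-Bed'.indexOf('-') + 1)  // 'Wing-Bed' (everything after the first hyphen)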
I'm working on JavaScript code that displays a string with mixed Arabic, numbers and English words. The code replaces the substring "##" with a number, as follows:
" بون شاسع ##mi".replace("##",123)
I want the result to be as follows (the number "123" exactly before "mi" and right after the Arabic text):
بون شاسع
However, I always get a result in which the number "123" is unexpectedly placed before the Arabic text. So, can anyone show me how to get the desired result?
Since the text is Arabic, it uses the right-to-left direction. However, things can get tricky when you mix that with the number, which you want to appear left-to-right.
This can be fixed using \u200E, the Left-to-Right Mark (LRM). It is an invisible character that tells the text renderer that what follows should be treated as left-to-right, regardless of what the normal bidirectional rules would suggest.
You can add the marker in as part of the .replace call, like so:
"بون شاسع##mi".replace("##","\u200E"+123)
Here's a comparison of the console output, without and with the marker:
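Both calls are shown below; the exact visual ordering depends on your console's bidirectional text handling, but with the mark the "123mi" stays together after the Arabic text:

console.log(" بون شاسع ##mi".replace("##", 123));            // without the mark
console.log(" بون شاسع ##mi".replace("##", "\u200E" + 123));  // with the mark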
I'm trying to parse an incoming string to determine whether it contains any non-emojis.
I've gone through this great article by Mathias and am leveraging both native punycode for the encoding / decoding and regenerate for the regex generation. I'm also using EmojiData to get my dictionary of emojis.
With that all said, certain emojis continue to be pesky little buggers and refuse to match. For certain emoji, I continue to get a pair of code points.
// Example of a single code point:
console.log(punycode.ucs2.decode('💩'));
>> [ 128169 ]
// Example of a paired code point:
console.log(punycode.ucs2.decode('⌛️'));
>> [ 8987, 65039 ]
Mathias touches on this in his article (and gives an example of punycode working around this) but even using his example I get an incorrect response:
function countSymbols(string) {
    return punycode.ucs2.decode(string).length;
}
console.log(countSymbols('💩'));
>> 1
console.log(countSymbols('⌛️'));
>> 2
What is the best way to detect whether a string contains all emojis or not? This is for a proof of concept so the solution can be as brute force as need be.
--- UPDATE ---
A little more context on my pesky emoji above.
These are visually identical but in fact different unicode values (the second one is from the example above):
⌛ // \u231b
⌛️ // \u231b\ufe0f
The first one works great, the second does not. Unfortunately, the second version is what iOS seems to use (if you copy and paste from iMessage you get the second one, and when receiving a text from Twilio, same thing).
U+FE0F is not a combining mark; it's a variation selector, which together with the preceding character forms a variation sequence that controls how the glyph is rendered (see this answer). Removing or changing the selector may change the appearance of the character, for example U+231B U+FE0E (⌛︎) selects the text-style presentation.
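If you do want to treat ⌛️ and ⌛ as the same character for matching purposes (keeping in mind the caveat above about appearance), one simple sketch is to strip the variation selectors before comparing:

var stripped = '⌛️'.replace(/[\uFE0E\uFE0F]/g, '');
console.log(stripped === '\u231B'); // true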
Also, emoji sequences can be made from multiple code points. For example, U+0032 (2) is not an emoji by itself, but U+0032 U+20E3 (2⃣) or U+0032 U+20E3 U+FE0F (2⃣️) is, while U+0041 U+20E3 (A⃣) isn't. A complete list of emoji sequences is maintained in the emoji-data.txt file by the Unicode Consortium (the emoji-data-js library appears to have this information).
To check if a string contains emoji characters, you will need to test if any single character is in emoji-data.txt, or starts a substring for a sequence in it.
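As a rough sketch of that check, assuming you have already parsed emoji-data.txt into a Set of code points (the variable name emojiCodePoints and the parsing step are assumptions here, and this ignores multi-code-point sequences):

function containsEmoji(string, emojiCodePoints) {
    for (var i = 0; i < string.length; ) {
        var cp = string.codePointAt(i);
        if (emojiCodePoints.has(cp)) return true;
        i += cp > 0xFFFF ? 2 : 1; // skip both code units of a surrogate pair
    }
    return false;
}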
If, hypothetically, you know what non-emoji characters you expect to run into, you can use a little lodash magic via their toArray or split modules, which are emoji aware. For example, if you want to see if a string contains alphanumeric characters, you could write a function like so:
function containsAlphaNumeric(string) {
    return _(string).toArray().filter(function (char) {
        return char.match(/[a-zA-Z0-9]/);
    }).value().length > 0;
}
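Assuming a lodash build whose toArray is emoji-aware, that behaves roughly like this:

console.log(containsAlphaNumeric('💩😄'));   // false (only emoji)
console.log(containsAlphaNumeric('abc 💩')); // true (contains letters)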
Why does this work:
<p id="emoji">&#x1f604;</p>
And this doesn't:
document.getElementById("emoji").innerHTML = String.fromCharCode(parseInt('1f604', 16));
A 'char' in JS terms is actually a UTF-16 code unit, not a full Unicode character. (This sad state of affairs stems from ancient times when there wasn't a difference*.) To use a character outside of the Basic Multilingual Plane you have to write it in the UTF-16-encoded form of a surrogate pair of two 16-bit code units:
String.fromCharCode(0xD83D, 0xDE04)
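If you're wondering where 0xD83D and 0xDE04 come from, here's a minimal sketch of the UTF-16 encoding arithmetic (the helper name toSurrogatePair is just illustrative):

function toSurrogatePair(codePoint) {
    var offset = codePoint - 0x10000;    // 0x1F604 -> 0xF604
    var high = 0xD800 + (offset >> 10);  // top 10 bits    -> 0xD83D
    var low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> 0xDE04
    return String.fromCharCode(high, low);
}
document.getElementById("emoji").innerHTML = toSurrogatePair(0x1F604); // 😄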
In ECMAScript 6 we will get some interfaces that let us deal with strings as if they were full Unicode code points, though they are incomplete and are only a façade over the String type which is still stored as a code unit sequence. Then we'll be able to do:
String.fromCodePoint(0x1F604)
See this question for some polyfill code to let you use this feature in today's browsers.
(*: When I get access to a time machine I'm leaving Hitler alone and going back to invent UTF-8 earlier. UTF-16 must never have been!)
You can also use a hacky method if you don't want to include String.fromCodePoint() in your code. It consists of creating a temporary element ...
elem=document.createElement('p')
... filling it with the working HTML (the entity) ...
elem.innerHTML = "&#x1f604;"
... And finally getting its value
value = elem.innerHTML
In short, this works because, as soon as you assign an HTML entity to an element's innerHTML, it gets converted into the corresponding character.
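Putting those steps together, a minimal helper might look like this (emojiFromCodePoint is just an illustrative name):

function emojiFromCodePoint(hex) {
    var elem = document.createElement('p');
    elem.innerHTML = "&#x" + hex + ";"; // set the entity...
    return elem.innerHTML;              // ...and read back the converted character
}
console.log(emojiFromCodePoint('1f604')); // 😄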
Hope I could help.