Convert strange unicode characters into emoji code - javascript

I have a DLL that I suspect does not support UTF-8 for emojis (it's an add-on for mIRC).
This DLL turns mIRC (a text-based chat program) into a full HTML/JavaScript interface.
My problem is that when I receive a message containing emojis, they are output like this:
😀
Four "strange" chars, because they are not converted correctly, I suppose.
I thought about writing a JavaScript function that matches those and changes them back to the correct emoji code (maybe using a <span> or not, since the following code type is translated correctly into smileys: 😈).
So, is there any way in JavaScript to catch/convert the erroneous 😀 chars into 😈, for example? (Those are not the same emoji.)
For a correct example:
:grinning face: U+1F600
outputs this: 😀
Sending this 😀 finally outputs a square and not the correct smiley, so it does not even work for all of them.
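If the four characters are the emoji's UTF-8 bytes misread as one character per byte (an assumption; the question does not confirm what the DLL actually does), a repair could be sketched as below. `fixMojibake` is a hypothetical helper name, and the sketch only covers the case where every character code is at most 255 (ISO-8859-1 style misdecoding). If the bytes were read as Windows-1252 instead, some characters land above 255 and would need a reverse lookup table first.

```javascript
// Hypothetical helper: reinterprets a string whose characters each hold one
// raw byte (code 0-255) as UTF-8. Assumes Latin-1 style misdecoding.
function fixMojibake(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// U+1F600 (grinning face) is F0 9F 98 80 in UTF-8.
const garbled = String.fromCharCode(0xf0, 0x9f, 0x98, 0x80);
const fixed = fixMojibake(garbled);
console.log(fixed.codePointAt(0).toString(16)); // "1f600"
```

`TextDecoder` is available in current browsers and in Node.js; whether the embedded browser inside the DLL provides it is not known from the question.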

Related

Unicode characters cannot be decoded

I use browserless.js (headless Chrome) to fetch the html code of a website, and then use a regular expression to find certain image URLs.
One example is the following:
https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde
There are unicode characters such as \u003d, which should be decoded (in this case to =). The reason is that I want to include these images in a site, and without decoding some of them cannot be displayed (like that one above, just paste the URL; it gives broken-image.webp).
I have tried lots of things, but nothing works.
JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent
Curiously, the regular expression for "\u003d" (i.e. "\\u003d" in js) does not match that string above, but "u003d" does.
This is all very weird, and my current guess is that browserless is responsible for some weird formatting behind the scenes. Namely, when I console log the URL and copy paste it somewhere else, every method mentioned above works for decoding.
I hope that someone can help me on this.
Just to mark this one as answered. Thomas replied:
JSON.parse(`"${url}"`)
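A small sketch of why that reply works: the fetched text contains literal backslash-u sequences, and wrapping it in double quotes turns it into a JSON string literal whose escapes the parser resolves. The URL below is a shortened, made-up stand-in, and `decodeUnicodeEscapes` is a hypothetical alternative for input that might contain double quotes or other backslashes, where `JSON.parse` would throw.

```javascript
// Example input with literal backslash-u sequences (hypothetical host and path).
const raw =
  "https://example.com/a.jpg?cb\\u003d20141113231800\\u0026path-prefix\\u003dde";

// Approach from the reply: let the JSON parser resolve the escapes.
const viaJson = JSON.parse(`"${raw}"`);

// Alternative sketch: replace each \uXXXX sequence directly.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}
const viaRegex = decodeUnicodeEscapes(raw);

console.log(viaJson);
// "https://example.com/a.jpg?cb=20141113231800&path-prefix=de"
```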

Javascript String.fromCharCode() latin1 encoding issue

I don't know if I can write a fiddle for this so I'll just try to explain this as well as I possibly can.
We have an application where we've written an editor. We need to check some grammar rules on strings/tokens that are being entered into the editor.
However, when using String.fromCharCode(190), instead of getting a "." as in utf-8 we get a "¾" from latin1.
I've checked whether or not we set latin1 as the default encoding somewhere but I've been unable to find anything.
Can anyone point me in the right direction or possibly find a solution for this issue?
The HTML charset is UTF-8 as well as the javascript file (this only adds to my confusion haha).
As per the docs, String.fromCharCode() returns the character for the given UTF-16 code unit. It has nothing to do with the page or file encoding. "¾" is the Unicode character with code 190 (U+00BE), that's it. http://unicode-table.com/
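The point can be checked in a console. The snippet below only shows the mapping between numbers and characters; where the value 190 came from is not stated in the question. If it came from a keyboard event's `keyCode` (an assumption), note that key codes identify keys rather than characters, and 190 is the code commonly reported for the period key.

```javascript
// 190 is the code unit of "¾" (U+00BE); the full stop is 46 (U+002E).
const threeQuarters = String.fromCharCode(190);
const periodCode = ".".charCodeAt(0);
const period = String.fromCharCode(46);

console.log(threeQuarters); // "¾"
console.log(periodCode);    // 46
console.log(period);        // "."
```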

Javascript in SSRS not handling number sign correctly

I am using some JavaScript code in SSRS to open a link in a new window on a report. The report links point to file locations on a server. The code I am using within Reporting Services for the link is:
="javascript:void(window.open('"+ "file:" & Replace(Fields!FilePath.Value,"\","/") + "','_blank'))"
This code works just fine when the file name is something 'normal' such as:
\\myserver\images\Files\1969\1-000-002_SE 82ND AVE 1_1969.pdf
However, when there are special characters (at least # for sure), I get an error message. This is what happens. An example file name would be:
\\myserver\images\Files\1978\1-001-003_SE 82nd AVE #12 1_1978.pdf
In this case what gets returned as the URL is:
\\myserver\images\Files\1978\1-001-003_SE 82nd AVE
As can be seen, the URL is cut off at the first instance of the number sign. If I copy the shortcut for the offending link, this is what I get:
javascript:void(window.open('file://myserver/images/Files/1978/1-001-003_SE%2082nd%AVE%20#12%201_1978.pdf','_blank'))
It appears that the JavaScript is encoding the file path correctly but something is getting lost in translation between the JavaScript code and the URL.
I am unable to change the file names so I need to come up with a way to work with the special characters. I have tried using EncodeURI() but could not figure out how to format it correctly in SSRS to work with the existing JavaScript.
Any ideas would be welcomed.
URLs recognize percent-encoded characters. So, outside of your JavaScript, use an SSRS Replace function for each special character you expect to find, replacing each with its percent-encoded value. For instance, a pound sign is %23 and a space is %20.
Note, I have some pages that use pound signs to split out URL parameters, and this does NOT seem to work in those cases. However, it might work in your situation. To try this, change your function to the following:
="javascript:void(window.open('"+ "file:" & Replace(Replace(Fields!FilePath.Value,"\","/"),"#","%23") + "','_blank'))"
In case this does work for you, you can find more of these codes here.
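For comparison, this is how the same escaping looks in plain JavaScript, outside SSRS. `encodeURI` leaves `#` untouched because it is a legal URL delimiter (the start of the fragment), which matches the truncation described in the question, while `encodeURIComponent` turns it into `%23`. `toFileUrl` is a hypothetical helper, and whether it can be called from inside an SSRS expression is not something this sketch addresses.

```javascript
// Hypothetical helper: converts a UNC path to a file URL, escaping each
// path segment separately so that "#" and spaces inside names are encoded.
function toFileUrl(uncPath) {
  const segments = uncPath.replace(/\\/g, "/").split("/").filter(Boolean);
  return "file://" + segments.map((s) => encodeURIComponent(s)).join("/");
}

const fileName = "1-001-003_SE 82nd AVE #12 1_1978.pdf";
const withEncodeURI = encodeURI(fileName);
const withComponent = encodeURIComponent(fileName);
const url = toFileUrl("\\\\myserver\\images\\Files\\1978\\" + fileName);

console.log(withEncodeURI); // "1-001-003_SE%2082nd%20AVE%20#12%201_1978.pdf"
console.log(withComponent); // "1-001-003_SE%2082nd%20AVE%20%2312%201_1978.pdf"
console.log(url);
```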

Trying to find a few stray ISO-8859-1 characters using JavaScript but not sure what the character codes are

In some older data I have, I have been trying to fix some old forum post entries which have some punctuation marks which show up correctly when the page is viewed in ISO-8859-1 in a browser, but when viewed in UTF-8 they show up as those "black diamond with a question mark" invalid character symbols.
The first screenshot shows what one example looks like when viewed in UTF-8, and you can see the smart quotes when I force the browser to view in ISO-8859-1.
If I check a string using JavaScript indexOf() what should I look for to locate those characters? Is there some hex code I should use?
Similarly, I'm trying to find single curly quotes, middle dots and em dashes. I think if I can hunt those down I can fix everything.
This needs to be done in JavaScript (my server-side programming language).
I think this is probably simple, but I'm not sure what to hunt for.
Thanks for any suggestions.
doug
You can find the char codes in the console pretty easily.
"”".charCodeAt(0); // This is one of the "smart" quotes, 8221.
// This will help you find the codes.
for (var i = 8208; i < 8251; i++) {
  console.log(i, String.fromCharCode(i));
}
You can use something like this to replace the strings once you know what the char codes are if you're not able to copy/paste the characters into your script for some reason.
postText = postText.replace(new RegExp(String.fromCharCode(8221), 'g'), '"');
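Building on that, a scan for all the punctuation mentioned in the question might look like the sketch below. `findSuspects` and the character list are illustrative assumptions, not part of the answer above. U+FFFD is included because it is the "black diamond" replacement character itself: if the text was already decoded as UTF-8 before reaching the script, the original byte is gone and only U+FFFD remains to search for.

```javascript
// Curly single and double quotes, middle dot, em dash, and the
// replacement character that decoders emit for invalid bytes.
const SUSPECTS = /[\u2018\u2019\u201C\u201D\u00B7\u2014\uFFFD]/g;

// Hypothetical helper: lists the position and code of each suspect character.
function findSuspects(text) {
  return Array.from(text.matchAll(SUSPECTS), (m) => ({
    index: m.index,
    code: "U+" + m[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

const hits = findSuspects("He said \u201Chi\u201D \u2014 ok");
console.log(hits);
// [ { index: 8, code: "U+201C" }, { index: 11, code: "U+201D" }, { index: 13, code: "U+2014" } ]
```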

encodeURIComponent encodes differently, depending on environment

I am passing an object via the url using:
encodeURIComponent(JSON.stringify(myObject))
"ä" is encoded as "%C3%A4" on my local server.
Unfortunately it is encoded as "a%CC%88" on the webserver.
This breaks my app, because the string is part of the name of a database field, which isn't found when it is encoded incorrectly. And I can't guarantee that there are no ä's in field names, because the app allows users to upload their own data.
How can I make sure that "ä" is always encoded correctly?
SORRY. To make this clear: the encoding happens client-side in the browser both times. But when the web app is served from the webserver, the "ä" is encoded as "a%CC%88" instead of "%C3%A4" (I've tested both in the same Chrome browser).
Thanks for all your help. It got me to dig deeper:
I have code that runs on an event. It loops through checkboxes and creates an array of objects containing (also) the field names. The code gets the field names from an attribute named "feld" of the checkbox:
<div class="checkbox">
<label>
<input class="feld_waehlen" type="checkbox" dstyp="Taxonomie" datensammlung="SISF Index 2 (2005)" feld="Artname vollständig">Artname vollständig
</label>
</div>
running this code:
console.log("this.getAttribute('feld') = " + this.getAttribute('feld'));
gives, as expected: this.getAttribute('feld') = Artname vollständig
If while looping, I run:
console.log('encodeURIComponent("Artname vollständig") = ' + encodeURIComponent("Artname vollständig"));
the answer is correct: encodeURIComponent("Artname vollständig") = Artname%20vollst%C3%A4ndig
But if I run:
console.log("encodeURIComponent(this.getAttribute('feld')) = " + encodeURIComponent(this.getAttribute('feld')));
the answer is: encodeURIComponent(this.getAttribute('feld')) = Artname%20vollsta%CC%88ndig
This happens all in the browser. But the issue only appears, when the web-app is served from the webserver (a couchapp running on cloudant.com).
How can it be that the method "getAttribute" returns a different encoding?
The following code has been tested on Chrome 29 OS X, IE 8 Windows XP.
encodeURIComponent("ä") // "%C3%A4"
decodeURIComponent("%C3%A4") // "ä"
So basically "%C3%A4" should be the expected output.
I think the issue here might be that encodeURIComponent requires UTF-8 input while your server-side language returns something other than this.
encodeURIComponent - MDN
just a follow up in case somebody runs into this issue later.
It seems to be unique to cloudant.com where my couchapp was hosted.
This is the answer I got from their very helpful support:
OK - I think I've found the culprit. The issue is that, due to internal optimisations (which are not present in CouchDB), the form of unicode strings can get changed. In this case, ä is represented as:
U+0061 LATIN SMALL LETTER A character
U+0308 COMBINING DIAERESIS character (̈)
instead of
U+00E4 LATIN SMALL LETTER A WITH DIAERESIS character (ä)
Both are semantically equivalent, so the fix is to normalize your unicode strings before comparison. Unfortunately, JavaScript has no built-in unicode normalization, but you can use a library such as https://github.com/walling/unorm.
It's not an issue for me any more as I changed to a virtual server running on digitalocean.com with vanilla couchdb (and am very happy with it).
But I do think this could hit others developing couchapps in German or other languages needing utf8 and hosting them on cloudant.com
Thanks for your great help.
Alex
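For readers hitting this later: the support reply predates `String.prototype.normalize()`, which was added to the language in ES2015 and is available in current browsers and Node.js. A minimal sketch of normalizing before encoding follows; the variable names are illustrative only.

```javascript
// "a" followed by U+0308 COMBINING DIAERESIS (the decomposed form of "ä").
const decomposed = "a\u0308";

// NFC composes it into the single code point U+00E4.
const composed = decomposed.normalize("NFC");

const encodedDecomposed = encodeURIComponent(decomposed);
const encodedComposed = encodeURIComponent(composed);

console.log(encodedDecomposed); // "a%CC%88"
console.log(encodedComposed);   // "%C3%A4"
```

Normalizing both the stored field names and the incoming value to the same form (NFC here) before comparing them avoids the mismatch regardless of which form the host returns.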
