I don't know if I can write a fiddle for this so I'll just try to explain this as well as I possibly can.
We have an application where we've written an editor. We need to check some grammar rules on strings/tokens that are being entered into the editor.
However, when using String.fromCharCode(190), instead of getting a "." as in utf-8 we get a "¾" from latin1.
I've checked whether or not we set latin1 as the default encoding somewhere but I've been unable to find anything.
Can anyone point me in the right direction or possibly find a solution for this issue?
The HTML charset is UTF-8 as well as the javascript file (this only adds to my confusion haha).
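As per the doc, String.fromCharCode() returns a string built from the given Unicode (UTF-16) code units. It's got nothing to do with the encoding of the page or the script file. "¾" is the Unicode character with code 190 (U+00BE), that's it; "." is code 46 (U+002E). http://unicode-table.com/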
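To make the point above concrete, here is a minimal sketch. It assumes (the question does not say so) that the 190 came from the keyCode of a keydown event, where 190 commonly identifies the period key. The helper printableCharFrom is a hypothetical name introduced only for illustration; it reads the standard key property of a KeyboardEvent-like object instead of converting a key code.

```javascript
// String.fromCharCode maps a number to the UTF-16 code unit with that value.
const threeQuarters = String.fromCharCode(190); // U+00BE, the fraction sign
const period = String.fromCharCode(46);         // U+002E, the full stop

// Hypothetical helper (not from the original post): use the character the
// browser already resolved. `event` is assumed to be a KeyboardEvent-like
// object with a `key` property; non-printable keys such as "Enter" yield null.
function printableCharFrom(event) {
  return typeof event.key === "string" && event.key.length === 1
    ? event.key
    : null;
}

console.log(threeQuarters, period);
console.log(printableCharFrom({ key: ".", keyCode: 190 }));
```

If the number really is a key code, comparing event.key against "." avoids the conversion entirely.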
I use browserless.js (headless Chrome) to fetch the html code of a website, and then use a regular expression to find certain image URLs.
One example is the following:
https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde
It contains Unicode escape sequences such as \u003d, which should be decoded (in this case to =). I want to include these images in a site, and without decoding, some of them cannot be displayed (like the one above: just paste the URL and it gives broken-image.webp).
I have tried lots of things, but nothing works.
JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent
Curiously, the regular expression for "\u003d" (i.e. "\\u003d" in js) does not match that string above, but "u003d" does.
This is all very weird, and my current guess is that browserless is responsible for some weird formatting behind the scenes. Namely, when I console log the URL and copy paste it somewhere else, every method mentioned above works for decoding.
I hope that someone can help me on this.
Just to mark this one as answered. Thomas replied:
JSON.parse(`"${url}"`)
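A short sketch of why this works, and of an alternative that does not depend on the URL being a valid JSON string body. The host example.invalid and the function name decodeUnicodeEscapes are placeholders, not from the original post. Note that the regex source \u003d matches the character "=", not a backslash followed by "u003d", which would explain why the pattern in the question did not match; the backslash itself has to be escaped.

```javascript
// The fetched HTML contains a literal backslash followed by "u003d".
// String.raw keeps the backslashes so the sample mirrors that situation.
const rawUrl = String.raw`https://example.invalid/img.jpg?cb\u003d20141113231800\u0026path-prefix\u003dde`;

// Approach from the answer: let the JSON parser interpret the escapes.
// This throws if the text contains an unescaped double quote or a stray backslash.
const viaJson = JSON.parse(`"${rawUrl}"`);

// Alternative sketch: replace each \uXXXX sequence by hand.
// In the regex, "\\u" matches a literal backslash plus "u".
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const viaRegex = decodeUnicodeEscapes(rawUrl);
console.log(viaJson);
console.log(viaRegex === viaJson);
```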
I have a DLL that I suspect does not support UTF-8 for emojis (it's an add-on for mIRC).
This DLL turns mIRC (a text-based chat program) into a full HTML/JavaScript interface.
My problem is that when I receive a message containing emojis, they are output like this:
😀
Four "strange" chars, because they are not converted correctly, I suppose.
I thought about writing a JavaScript function that matches those and changes them back to the correct emoji code (maybe using a <span> or not, since the following code type is translated correctly into smileys 😈).
So, is there any way in JavaScript to catch/convert 😀 erroneous chars into 😈, for example? (Those are not the same emoji.)
for a correct example :
:grinning face: U+1F600
output this 😀
Sending this 😀 finally outputs a square and not the correct smiley, so it does not even work for all of them...
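One possible approach, sketched under a loud assumption: the DLL hands JavaScript a string in which every UTF-8 byte has been turned into one character with the same code (ISO-8859-1 style). If the DLL uses Windows-1252 instead, bytes 0x80 to 0x9F map to other characters and this sketch will leave the text unchanged. The function name repairLatin1Mojibake is hypothetical.

```javascript
// Re-interpret a string whose characters are really UTF-8 bytes.
// Assumption: each character code (0..255) equals the original byte value.
function repairLatin1Mojibake(garbled) {
  const codes = Array.from(garbled, (ch) => ch.codePointAt(0));
  // Anything above 0xFF cannot be a single byte, so leave the text alone.
  if (codes.some((code) => code > 0xff)) return garbled;
  try {
    // fatal: true makes invalid UTF-8 throw instead of producing U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(codes));
  } catch (err) {
    return garbled; // not valid UTF-8, so it was probably genuine Latin-1 text
  }
}

// U+1F600 is encoded in UTF-8 as the bytes F0 9F 98 80.
const garbledGrin = "\u00F0\u009F\u0098\u0080";
console.log(repairLatin1Mojibake(garbledGrin) === "\u{1F600}");
```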
I know my question is a bit odd, but I need to know how to convert application/json to text/html. It is a form submission that I want to learn to convert. I will place the JSON I want to convert below. If there is an online tool that does this automatically, please tell me, but a manual explanation would be best. Here is the code:
{"ValidateIdPwdRequest":{"onlineId": "sfsdfsdfds","pwd": "IBuGEeZDHahjcQRyN+LAUg==","source": "","channel": "","traceid": ""}}
Could anyone teach me how to convert this? I know that the equals sign corresponds to the colon and & to the comma, but I get confused when a colon or a comma is followed by a curly bracket, and then my whole conversion goes wrong. Please explain how I can convert it properly.
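Use textContent and innerHTML:
function toHTML(text) {
  // Create the helper element once and cache it on the function itself.
  toHTML.ele = toHTML.ele || document.createElement("span");
  // Assigning textContent stores the input as plain text, not markup...
  toHTML.ele.textContent = text;
  // ...and reading innerHTML returns it with &, < and > escaped.
  return toHTML.ele.innerHTML;
}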
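For environments without a document object (Node.js, for instance), a rough DOM-free sketch of the same idea follows. It only escapes &, < and >, which approximates what the function above returns for ordinary text; the name escapeHtmlText and the sample payload are made up for illustration.

```javascript
// Map each character that is special in HTML text to its entity.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };

// Hypothetical DOM-free counterpart of toHTML.
function escapeHtmlText(text) {
  return String(text).replace(/[&<>]/g, (ch) => HTML_ESCAPES[ch]);
}

// Sample JSON text (invented for this example) that contains markup.
const payload = JSON.stringify({ note: "<b>Tom & Jerry</b>" });
const escapedPayload = escapeHtmlText(payload);
console.log(escapedPayload);
```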
I have been trying to fix some old forum post entries in some older data I have. They contain punctuation marks that show up correctly when the page is viewed in ISO-8859-1 in a browser, but when viewed in UTF-8 they show up as those "black diamond with a question mark" invalid character symbols.
The first screenshot shows what one example looks like when viewed in UTF-8, and you can see the smart quotes when I force the browser to view in ISO-8859-1.
If I check a string using JavaScript indexOf() what should I look for to locate those characters? Is there some hex code I should use?
Similarly, I'm trying to find single curly quotes, middle dots and em dashes. I think if I can hunt those down I can fix everything.
This needs to be done in JavaScript (my server-side programming language).
I think this is probably simple, but I'm not sure what to hunt for.
Thanks for any suggestions.
doug
You can find the char codes in the console pretty easily.
"”".charCodeAt(0); // This is one of the "smart" quotes, 8221.
// This will help you find the codes.
for (var i = 8208; i < 8251; i++) {
  console.log(i, String.fromCharCode(i));
}
You can use something like this to replace the strings once you know what the char codes are if you're not able to copy/paste the characters into your script for some reason.
postText = postText.replace(new RegExp(String.fromCharCode(8221), 'g'), '"');
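The same idea can be extended to the other characters the question mentions. The sketch below uses \u escapes so the script file's own encoding does not matter; the code points are standard Unicode (U+2018/U+2019 single curly quotes, U+201C/U+201D double curly quotes, U+00B7 middle dot, U+2014 em dash), while the ASCII replacements and the name normalizePunctuation are arbitrary choices for illustration. It assumes the text has already been decoded correctly into a JavaScript string.

```javascript
// Typographic characters and the plain-ASCII stand-ins chosen for this sketch.
const PUNCTUATION_MAP = {
  "\u2018": "'",  // left single curly quote (8216)
  "\u2019": "'",  // right single curly quote (8217)
  "\u201C": '"',  // left double curly quote (8220)
  "\u201D": '"',  // right double curly quote (8221)
  "\u00B7": "*",  // middle dot (183)
  "\u2014": "--", // em dash (8212)
};

// Build one character class from the map keys and replace in a single pass.
const PUNCTUATION_PATTERN = new RegExp(
  "[" + Object.keys(PUNCTUATION_MAP).join("") + "]",
  "g"
);

function normalizePunctuation(text) {
  return text.replace(PUNCTUATION_PATTERN, (ch) => PUNCTUATION_MAP[ch]);
}

const samplePost = "\u201CHi\u201D \u2014 it\u2019s";
console.log(samplePost.indexOf("\u201D")); // position of the closing smart quote
console.log(normalizePunctuation(samplePost));
```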
I've been studying JS obfuscation heavily and am starting to know my way around the advanced concepts, but I recently found some obfuscated code that I believe is some form of "Native Javascript Code". I just can't find ANY documentation on this type of obfuscation:
Here is a small extract :
'\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40'
It is called this way :
eval(eval('\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40'))
Since the code is someone else's work and I encountered it in a JS challenge, I'm not posting the full code, so the example I gave won't work, but the full code does work.
So here is my question:
What type of code is this? And where can I learn more about it?
Any suggestions appreciated :)
It's just a string with the characters escaped. You can read it in the JavaScript console in any browser:
console.log('\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40')
will print:
"a=prompt('Entrez "
It's just escaped characters: the inner eval runs the decoded code, which prompts the user and returns what they typed, and the outer eval then runs that returned string. Try calling it in a console:
eval('\160\162\157\155\160\164\50\47\105\156\164\162\145\172\47\51')
Might help?
These numbers are the ASCII codes (http://www.asciitable.com/index/asciifull.gif) of the characters, written in octal.
You can convert them back to characters. This technique is used when somebody wants to make an XSS attack or wants to hide the JS code.
So the string you wrote represents:
a=prompt('Entrez
JS engines (and therefore browsers) translate the octal escapes into the 'real' string when they parse the literal. eval can then run it, provided the decoded code has no syntax errors.
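To inspect such a string without running it, the octal escapes can be decoded by hand. This is a minimal sketch; decodeOctalEscapes is a hypothetical name, and String.raw is used so the backslashes stay literal in the sample (octal escapes in ordinary string literals are rejected in strict mode).

```javascript
// Decode \NNN octal escapes (values 0 to 377 octal) without calling eval.
function decodeOctalEscapes(source) {
  // Three digits are only allowed when the first is 0-3, as in JS literals.
  return source.replace(/\\([0-3][0-7]{2}|[0-7]{1,2})/g, (match, octal) =>
    String.fromCharCode(parseInt(octal, 8))
  );
}

// The extract from the question, kept as raw text.
const obfuscated = String.raw`\141\75\160\162\157\155\160\164\50\47\105\156\164\162\145\172\40`;
const decoded = decodeOctalEscapes(obfuscated);
console.log(JSON.stringify(decoded));
```

Printing through JSON.stringify makes the trailing space in the decoded text visible.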