Proper Regex to find and replace escaped UTF-8 strings - javascript

(edited) I am reading a JSON file that includes some UTF-8 characters that are encoded like this: "\uf36b". I am trying to write a RegExp to convert this to an HTML entity that looks like "&#x1f36b;" (🍫). This displays the character correctly in my HTML page.
I haven't been able to display the character that should be associated with "\uf36b" correctly, especially when it appears in a longer sentence that also includes other text.
How can I write a regexp that replaces strings like "\uf4d6" and "\uf36b" but leaves other text alone?
Example:
var str = "I need \uf36b #chocolate";
This should be converted to:
I need &#x1f36b; #chocolate

The \uf36b here is a Unicode escape for a single character. If your page is served as UTF-8, it should be able to display characters like this directly, without escaping them.
That being said, the printable ASCII range is from \u0020 (space character) to \u007e (tilde), so you should be able to use something like the following to only escape the characters you need to:
var escaped = "I need \uf36b #chocolate".replace(/[^\x20-\x7e]+|%/g, escape);
This will call escape() only on the non-ASCII or non-printable ASCII characters in your string, as well as any % characters.
You can then use var str = escaped.replace(/%(..)/g,"&#x1f") + ";"; to do your conversion, although this looks pretty strange and I can't really see how it would do anything too useful. You probably actually want something like the following:
var str = escaped.replace(/%(?:u([0-9a-f]{4})|([0-9a-f]{2}))/gi, "&#x$1$2;");
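A sketch that skips the deprecated escape() and goes straight from each non-ASCII character to a hexadecimal HTML character reference, keeping the answer's assumption that printable ASCII (\x20 to \x7e) is left alone. toHtmlEntities is a hypothetical name. Note that "\uf36b" is the single code point U+F36B, in the Private Use Area; the chocolate bar emoji is U+1F36B, written "\u{1F36B}" or as the surrogate pair "\uD83C\uDF6B" in JavaScript. The u flag on the regex makes such a pair come out as one reference rather than two.

```javascript
// Hypothetical helper: replace every character outside printable ASCII
// (\x20-\x7e) with a hexadecimal HTML character reference such as &#xf36b;.
function toHtmlEntities(str) {
  // The u flag makes the class match whole code points, so a surrogate pair
  // (e.g. U+1F36B) becomes one reference instead of two halves.
  return str.replace(/[^\x20-\x7e]/gu, (ch) => "&#x" + ch.codePointAt(0).toString(16) + ";");
}

console.log(toHtmlEntities("I need \uf36b #chocolate"));    // I need &#xf36b; #chocolate
console.log(toHtmlEntities("I need \u{1F36B} #chocolate")); // I need &#x1f36b; #chocolate
```

Like the answer's version, this leaves &, < and > untouched; if the result goes into innerHTML you may want to escape those as well.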

Related

JavaScript textContent display special characters [duplicate]

My server returns the value support\testing. When I get this value on the client, \t is interpreted as an escape sequence and turned into a tab character, so the backslash and the t are lost.
How do I avoid escaping special characters in JavaScript?
Your server needs to output the string with proper escaping.
In this case, you want a backslash character in the output; backslash is a special character, so that should be escaped.
The escape sequence for a backslash is \\ (i.e. two backslashes), but you shouldn't need to think about specific escape codes: if you're outputting JS data, you should be outputting it using proper escaping for the whole string, which generally means you should be using JSON encoding.
Most server languages these days provide JSON encoding as a built-in feature. You haven't specified which language your server is using, but for example if it's written in PHP, you would output your string as json_encode($string) rather than just outputting $string directly. Other languages provide a similar feature. This will protect you not just from broken backslash characters, but also from other errors, such as quote marks or line feeds in your strings, which will also cause errors if you put them into JavaScript code as an unescaped string.
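The question doesn't say which language the server uses. If it happens to be JavaScript (for example Node.js), the built-in JSON.stringify plays the role of PHP's json_encode; this small sketch shows a backslash surviving the round trip.

```javascript
// The value as it exists in memory on the server: one real backslash.
const value = "support\\testing";

// JSON.stringify escapes the backslash as \\ in the JSON text,
// so the client can parse it back without \t turning into a tab.
const payload = JSON.stringify(value);
console.log(payload); // "support\\testing"

const received = JSON.parse(payload);
console.log(received === value);      // true
console.log(received.includes("\t")); // false: no tab character
```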
You can use tagged template literals
var str = (s => s.raw)`support\testing`[0]
The anonymous arrow function serves as the tag, and s.raw contains the raw, unprocessed text of the literal.
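JavaScript also has a built-in tag, String.raw, that does the same job as the arrow function. Keep in mind that either form applies only to template literals written in source code; a value that arrives at runtime (for example from a server response) is an ordinary string, so no tag can be applied to it.

```javascript
// String.raw returns the literal's raw text, so \t stays as backslash + "t".
const viaStringRaw = String.raw`support\testing`;

// The arrow-function tag from the answer: s.raw holds the raw strings.
const viaArrowTag = ((s) => s.raw)`support\testing`[0];

console.log(viaStringRaw);                 // support\testing
console.log(viaStringRaw === viaArrowTag); // true
console.log(viaStringRaw.length);          // 15
```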
If you are able to change the server-side code, you should add the escape character there: "support\\testing".
That will give the desired result.
You can do a simple replace:
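str = str.replace(/\t/g, "\\t");
Repeat this for any other characters that need replacing. (A string pattern such as "\t" would only replace the first occurrence; the regex with the g flag replaces all of them.)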
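Rather than one replace call per character, a single regex with a lookup table can restore the usual escape sequences in one pass. This is a sketch; reescapeControlChars is a hypothetical name, and like the answer it cannot tell a tab that came from a mangled \t apart from a genuine tab in the data.

```javascript
// Map each control character back to the two-character escape sequence
// that produces it in a JavaScript string literal.
const ESCAPES = {
  "\b": "\\b",
  "\t": "\\t",
  "\n": "\\n",
  "\v": "\\v",
  "\f": "\\f",
  "\r": "\\r",
};

// Hypothetical helper: the g flag replaces every occurrence, not just the first.
// Inside a character class, \b means backspace (not a word boundary).
function reescapeControlChars(str) {
  return str.replace(/[\b\t\n\v\f\r]/g, (ch) => ESCAPES[ch]);
}

console.log(reescapeControlChars("support\testing")); // support\testing
```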
If you would rather strip special characters out of an input field, this works:
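function valid(f) {
  // Characters that are not allowed in the field
  var s = "!#$%^&*()+=-[]\\';,./{}|\":<>?~";
  var str = f.value; // declared with var to avoid creating an implicit global
  for (var i = 0; i < str.length; i++) {
    if (s.indexOf(str.charAt(i)) != -1) {
      // Remove the first disallowed character found (replace with a string
      // pattern only replaces the first occurrence) and reject the value
      f.value = f.value.replace(str.charAt(i), '');
      return false;
    }
  }
  return true;
}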
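The same effect can be had with one regular expression that removes every disallowed character at once instead of one per call. A sketch, assuming the same character set as s in valid(); stripSpecialChars is a hypothetical name. Note that it removes backslashes too, so it only fits the case where they should simply be rejected.

```javascript
// Character class with the same characters as s in valid(). Inside [...],
// -, [, ], \ and / are escaped; the other symbols are literal there.
const SPECIAL_CHARS = /[!#$%^&*()+=\-\[\]\\';,.\/{}|":<>?~]/g;

// Hypothetical helper: return the value with every special character removed.
function stripSpecialChars(value) {
  return value.replace(SPECIAL_CHARS, "");
}

console.log(stripSpecialChars("support\\testing")); // supporttesting
```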

html special symbols is displayed as characters

I've been trying to set the content of a text input dynamically using JS. The problem I encountered is that I cannot get the browser to render the special symbols instead of the raw characters. For example:
document.getElementById("textField").value = "&nbsp;";
Instead of displaying a space, it displays &nbsp;. Does anybody have any idea?
Thanks a lot
It seems that you want to enter special characters like NO-BREAK SPACE in a JavaScript string literal. You can do that directly, provided that the character encoding of the file containing JavaScript code is properly declared, as it should be anyway:
document.getElementById("textField").value = ' ';
Here the character between apostrophes is the real NO-BREAK SPACE character. In rendering, it is usually indistinguishable from normal SPACE, but it has different effects. Similarly you can write e.g.
document.getElementById("textField").value = 'Ω';
using the Greek letter capital omega directly.
If you do not know how to enter such characters (e.g., via Windows CharMap program) or if you cannot control character encoding issues, you can use JavaScript Unicode escape notations for characters, e.g.
document.getElementById("textField").value = '\u00A0'; // no-break space
or
document.getElementById("textField").value = '\u03A9'; // capital omega
For the small set of characters with Unicode numbers less than 0x100, you can alternatively use \x escapes, e.g. '\xA0' instead of '\u00A0'. (But if you didn’t know this, it is better to learn to use the universal \u escape instead.)
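A quick check of the points above: both escapes produce the same one-character string, and that character is not an ordinary space.

```javascript
// '\xA0' and '\u00A0' are two spellings of U+00A0 NO-BREAK SPACE.
const viaX = "\xA0";
const viaU = "\u00A0";

console.log(viaX === viaU); // true
console.log(viaU.length);   // 1: a single character, not the six characters of "&nbsp;"
console.log(viaU === " ");  // false: not the ordinary SPACE (U+0020)
console.log("\u03A9".codePointAt(0).toString(16)); // 3a9 (capital omega)
```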
&nbsp; is an HTML entity, and you can't put an HTML entity in a text field like that.
Try using unicode, like this:
document.getElementById("textField").value = '\xA0';
What about using jQuery? Decode the entity in a detached element, then set the field's value:
$("#textField").val($("<div>").html("&nbsp;").text());
Or, more generally:
$("<div>").html(encodedString).text()
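Without jQuery or a DOM (for example in a standalone script), a minimal sketch can decode numeric character references plus a small table of named entities. decodeEntities and the NAMED table are hypothetical; the table only covers the names listed, and a full decoder would need the complete HTML entity list.

```javascript
// Hypothetical, deliberately small table of named entities.
const NAMED = { nbsp: "\u00A0", amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode &name;, &#decimal; and &#xhex; references; unknown names are kept as-is.
function decodeEntities(text) {
  return text.replace(/&(?:#x([0-9a-f]+)|#([0-9]+)|([a-z]+));/gi, (whole, hex, dec, name) => {
    if (hex || dec) {
      const cp = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // String.fromCodePoint throws RangeError above U+10FFFF, so leave such input alone
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : whole;
    }
    return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : whole;
  });
}

// In a browser: document.getElementById("textField").value = decodeEntities("&nbsp;");
console.log(decodeEntities("&nbsp;") === "\u00A0"); // true
```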
document.getElementById("textField").value = " ";
you should use the actual no-break space character between the quotes instead of "&nbsp;"

How to find out if a given string is HTML Escaped or not?

Is there any method to find out if the given string is HTML Escaped or not?
Consider the following javascript code:
<script>
var str="hello";
var str_esc=escape(str);
document.write(isHTMLEscaped(str)); // should print false
document.write(isHTMLEscaped(str_esc)); // should print true
</script>
Is there any method equivalent to isHTMLEscaped in the above case?
I found that using
escape(unescape(str))
will always provide an escaped string, and unescape will do nothing unless the string itself contains escape sequences.
Note: I should have used encodeURI(decodeURI(str)) instead, as escape is now deprecated.
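One caveat with encodeURI(decodeURI(str)): decodeURI leaves escapes for reserved characters such as %2F undecoded, and encodeURI then re-encodes their %, so "a%2Fb" becomes "a%252Fb". A sketch that swaps in the Component variants avoids that, and also handles a stray % (which makes the decode functions throw URIError). ensureEncoded is a hypothetical name.

```javascript
// Hypothetical helper: returns the percent-encoded form of s whether or not
// s was already encoded, using decodeURIComponent/encodeURIComponent.
function ensureEncoded(s) {
  let decoded;
  try {
    decoded = decodeURIComponent(s);
  } catch (e) {
    // A "%" that does not start a valid escape makes decodeURIComponent
    // throw URIError; treat the input as not yet encoded.
    if (!(e instanceof URIError)) throw e;
    decoded = s;
  }
  return encodeURIComponent(decoded);
}

console.log(ensureEncoded("hello world"));   // hello%20world
console.log(ensureEncoded("hello%20world")); // hello%20world
```

As the next answer points out, a string that merely looks encoded will still be decoded first; this cannot be resolved without knowing where the string came from.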
Since "hello" == escape("hello"), no, you can't reliably tell whether escaping was applied.
If you want to know if it's probable that the string has been escaped, then you might test
var wasProbablyEscaped = /%(?:[0-9A-F]{2}|u[0-9A-F]{4})/i.test(str);
var wasProbablyNotEscaped = !wasProbablyEscaped && /%(?:[0-9A-F]{2}|u[0-9A-F]{4})/i.test(escape(str));
as escaping adds % followed by two hexadecimal digits (or %u followed by four) when something has to be escaped. But you can't be totally sure, as some strings don't change when you escape them.
In your case, I'd probably advise you not to escape if wasProbablyEscaped is true.

Can't use javascript regex to get everything between html/xml tags

So I receive some XML as plain text (and no, I can't use DOM or JSON, because apparently I am not allowed to). I want to extract all elements enclosed in a certain element and put them into an array, where I can strip out the text in the individual segments.
Now I am used to using POSIX regex and I will never actually understand the point behind PCRE regex, nor do I get the syntax.
Now here is the code I am using:
var strResponse = objResponse.text;
var strRegex = new RegExp("<item>(.*?)<\/item>","i");
var arrMatches = "";
var match;
while (match = strRegex.exec(strResponse)) {
arrMatches[] = match[1];
}
I have no idea why it won't find any matches with this code, can someone please help me on this and perhaps elaborate on what exactly it is I am continuously doing wrong with the PCRE syntax?
If those tags are on different lines, the . will not match the newline characters and therefore your expression will not match. This is just a guess; I don't know your source.
You can try
var strRegex = new RegExp("<item>([\\s\\S]*?)<\\/item>","gi");
[\\s\\S] is a character class containing all whitespace and all non-whitespace characters; line breaks are covered by the whitespace characters.
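Apart from the newline issue, the loop in the question has three more problems: without the g flag, exec() searches from the start every time, so the while loop never ends once there is a match; arrMatches is initialised as a string instead of an array; and arrMatches[] = ... is PHP syntax, which is a syntax error in JavaScript. A corrected sketch, assuming <item> elements are not nested and carry no attributes; extractItems is a hypothetical name.

```javascript
// Collect the contents of every <item>...</item> element in a plain-text XML string.
function extractItems(xmlText) {
  // [\s\S] matches any character, including newlines; the g flag makes
  // exec() advance lastIndex so the loop terminates.
  const itemRegex = /<item>([\s\S]*?)<\/item>/gi;
  const matches = []; // an array, not a string
  let match;
  while ((match = itemRegex.exec(xmlText)) !== null) {
    matches.push(match[1]); // JavaScript uses push(), not PHP's $arr[] = ...
  }
  return matches;
}

console.log(extractItems("<list>\n<item>a</item>\n<item>b\nc</item>\n</list>"));
```

In engines that support ES2018, the s (dotAll) flag is an alternative to [\s\S]: /<item>(.*?)<\/item>/gis.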
The best way to complete this task is to parse it as proper HTML and navigate it with a DOM parser, as described in:
Javascript function to parse HTML string into DOM?
Regex is error-prone for this and in general is not well suited to parsing irregular text such as HTML structure.

JavaScript backslash (\) in variables is causing an error

In Javascript, when I put a backslash in some variables like:
var ttt = "aa ///\\\";
var ttt = "aa ///\";
Javascript shows an error.
If I try to restrict user in entering this character, I also get an error:
(("aaa ///\\\").indexOf('"') != -1)
Restricting backslashes from user input is not a good strategy, because you have to show an annoying message to the user.
Why am I getting an error with backslash?
The backslash (\) is an escape character in Javascript (along with a lot of other C-like languages). This means that when Javascript encounters a backslash, it tries to escape the following character. For instance, \n is a newline character (rather than a backslash followed by the letter n).
In order to output a literal backslash, you need to escape it. That means \\ will output a single backslash (and \\\\ will output two, and so on). The reason "aa ///\" doesn't work is because the backslash escapes the " (which will print a literal quote), and thus your string is not properly terminated. Similarly, "aa ///\\\" won't work, because the last backslash again escapes the quote.
Just remember, for each backslash you want to output, you need to give Javascript two.
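A quick way to see the rule: each \\ in the source becomes one backslash character in the resulting string.

```javascript
// Each \\ in the source code becomes ONE backslash character in the string.
const one = "\\";
const two = "\\\\";
const ttt = "aa ///\\"; // valid: ends with a single backslash

console.log(one.length);         // 1
console.log(two.length);         // 2
console.log(ttt);                // aa ///\
console.log(ttt.endsWith("\\")); // true
```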
You may want to try the following, which is more or less the standard way to escape user input:
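function stringEscape(s) {
  if (!s) return s; // leave null, undefined and "" unchanged
  // Backslashes first, so the escapes added afterwards are not doubled
  return s.replace(/\\/g,'\\\\').replace(/\n/g,'\\n').replace(/\t/g,'\\t').replace(/\v/g,'\\v')
    .replace(/'/g,"\\'").replace(/"/g,'\\"')
    .replace(/[\x00-\x1F\x80-\x9F]/g, hex); // remaining control characters become \xHH
  function hex(c) { var v = '0' + c.charCodeAt(0).toString(16); return '\\x' + v.slice(-2); }
}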
This replaces all backslashes with an escaped backslash, and then proceeds to escape other non-printable characters to their escaped form. It also escapes single and double quotes, so you can use the output as a string literal, even in eval (which is a bad idea by itself, considering that you are using user input). In any case, it should do the job you want.
You have to escape each \ to be \\:
var ttt = "aa ///\\\\\\";
Updated: I think this question is not about the escape character in strings at all. The asker doesn't seem to explain the problem correctly.
The asker's stated reason is that the user must be shown a message saying a name may not contain the (\) character.
I think the scenario is like:
var user_input_name = document.getElementById('the_name').value;
Then the asker wants to check if user_input_name contains any [\]. If so, then alert the user.
If the user enters [aa ///\] in an HTML input box and you then alert(user_input_name), you will see [aa ///\]. You don't need to escape it, i.e. replace [\] with [\\], in JavaScript code. Escaping is only needed when you write a string containing special characters as a literal in JavaScript source code; without it, the source won't be parsed correctly. Since you already have a string, you don't need to pass it through an escaping function. Doing so would only make sense if you were generating JavaScript code from JavaScript code, which is not the case here.
I am guessing the asker wanted to simulate the input so we could understand the problem. Unfortunately, the asker doesn't understand JavaScript well, so the code supplied to us contains a syntax error:
var ttt = "aa ///\";
Hence, we assumed the asker was having a problem with escaping.
If you want to simulate the input, your code must be valid in the first place.
var ttt = "aa ///\\"; // <- This is correct
// var ttt = "aa ///\"; // <- This is not.
alert(ttt); // You will see [aa ///\] in dialog, which is what you expect, right?
Now, all you need to do is:
var user_input_name = document.getElementById('the_name').value;
if (user_input_name.indexOf("\\") >= 0) { // There is a [\] in the string
alert("\\ is not allowed to be used!"); // User reads [\ is not allowed to be used]
do_something_else();
}
Edit: I used [] to quote text to be shown, as it is less confusing than using "".
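The check can be wrapped in a small function that works on the value exactly as the user typed it, with no escaping step. A sketch; validateName and its message are hypothetical, and the DOM lookup from the answer is left to the caller.

```javascript
// Hypothetical validator: returns an error message, or null if the name is OK.
// `name` is a runtime value (e.g. an input's .value), so it is NOT escaped first.
function validateName(name) {
  // "\\" in source code is the single backslash character being searched for.
  if (name.includes("\\")) {
    return "\\ is not allowed to be used!";
  }
  return null;
}

// Simulating user input requires a valid literal: "aa ///\\" holds one backslash.
console.log(validateName("aa ///\\")); // \ is not allowed to be used!
console.log(validateName("aa ///"));   // null
```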
The backslash \ is reserved for use as an escape character in Javascript.
To use a backslash literally you need to use two backslashes
\\
If you want to use a special character in a JavaScript string value, the escape character (\) is required.
The backslash in your example is a special character, too.
So you should do something like this:
var ttt = "aa ///\\\\\\"; // --> ///\\\
or
var ttt = "aa ///\\"; // --> ///\
But the escape character is not required for user input.
When you type \ in a prompt box or input field and then submit, the value contains a single \.
