Prevent Smart Quotes on HTML5 Input

I recently ran into an annoying issue with my web application. Using HTML5, a user can create an account with a login ID. The ID can contain pretty much any character. A user will enter an account ID, for example:
Bob'sAccount
Their device (without their knowing) then turns the ' into a "smartquote" with the ` style apostrophe.
So now their account is created with a smartquote in the user ID. However, if they later log in from a device that doesn't automatically create smart quotes, they type the standard apostrophe, and since it's a different character, their account is not found.
I'm sure I could limit the characters a user can enter for an account ID, but I would rather just prevent the smartquotes from happening in the first place.
Is there a way to disable "smartquotes" in an HTML5 input field?

If you know what the character is turning into, just replace it with a regular quote before you submit the form.
str
.replace(/[\u2014]/g, "--") // emdash
.replace(/[\u2022]/g, "*") // bullet
.replace(/[\u2018\u2019]/g, "'") // smart single quotes
.replace(/[\u201C\u201D]/g, '"'); // smart double quotes
This is just an example. I know it's tedious, but that's one way to do it. Check out https://asna.com/us/tech/kb/doc/remove-smart-quote for a script that replaces a lot of other special characters in the same way.
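A minimal sketch of how the replacement above could be applied while the user types, so that the ID never reaches the server with a smart quote in it. The function name normalizeQuotes and the element id 'login-id' are assumptions made for illustration; they are not part of the original answer.

```javascript
// Replace typographic quotes with their plain ASCII equivalents.
function normalizeQuotes(str) {
  return str
    .replace(/[\u2018\u2019]/g, "'")  // smart single quotes
    .replace(/[\u201C\u201D]/g, '"'); // smart double quotes
}

// Hypothetical wiring: 'login-id' is an assumed element id.
// The guard lets the snippet also run outside a browser.
if (typeof document !== 'undefined') {
  var loginInput = document.getElementById('login-id');
  if (loginInput) {
    loginInput.addEventListener('input', function () {
      loginInput.value = normalizeQuotes(loginInput.value);
    });
  }
}
```

Reassigning value on every keystroke can move the caret in some browsers, so running the same function on blur or just before submit may be less intrusive. Applying the same normalization on the server at both sign-up and login would also cover IDs that were already stored with a smart quote.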

Related

Can I use a user-defined regex in Meteor's find function without escaping the special characters?

I have a simple filter for searching images saved in a database, and for that I use a regex: Images.find({"name":{$regex:".*"+query+".*"}});
Of course I check the value with the check(query, String); function. Could it be a big security issue if I don't escape the special characters in the regex (the query variable, whose content is specified by the user)? It is an advantage for me that users can define something like (nameOfImage1|nameOfImage2).

According to @Michel Floyd's comment above, this is not a security problem, so I use the regex in find(). I also removed some selected characters with query.replace(/[\/\\^$*[\]{}]/g, "");
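If the query should instead be matched as literal text, a common alternative to deleting selected characters is to escape every regex metacharacter. The sketch below assumes a hypothetical helper named escapeRegExp. It gives up the (nameOfImage1|nameOfImage2) alternation the question wanted to keep, so it only fits the case where that feature is not needed.

```javascript
// Escape every character that has a special meaning in a regex,
// so the user's text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the same ".*query.*" pattern as in the question, but escaped.
var query = 'cat (1).png';
var pattern = '.*' + escapeRegExp(query) + '.*';

// The pattern string could then be passed as the $regex value.
var matcher = new RegExp(pattern);
console.log(matcher.test('my cat (1).png')); // literal match
console.log(matcher.test('cat 1xpng'));      // no match: "(", ")" and "." are literal
```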

How to combat the backslash return character breaking my javascript?

I have just realised that if a user enters a '\' into the input field, my javascript breaks when I try to access that value, due to the escape character.
How would I get around this? I want my users to still be able to input a '\'. I have tried using a regex replace and other similar methods, but my code still breaks regardless (with no error, mind you).
JavaScript:
var r = document.getElementById('r-box').value;
If the above contains a \, the code will just freeze.

Rewrite regex to accept conditional terms

^([a-z0-9_\.-])+@[yahoo]{5}\.([com]{3}\.)?[com]{3}$
This currently matches xxxx@yahoo.com. How can I rewrite it to match some additional domains, for example gmail.com and deadforce.com? I tried the following, but it did not work. What am I doing wrong?
^([a-z0-9_\.-])+@[yahoo|gmail|deadforce]{5,9}\.([com]{3}\.)?[com]{3}$
Thank you in advance!

Your regex doesn't say what you think it says.
^([a-z0-9_\.-])+@[yahoo]{5}\.([com]{3}\.)?[com]{3}$
The first part says: any of the characters a-z, 0-9, _, ., or -, one or more times.
The later part, where you are trying to match yahoo.com, is incorrect. [yahoo]{5} is a character class: it says any of the characters y, a, h, or o, five times. The same goes for [com]{3}, so aaaaa.ooo would be valid here. I'm not sure what the ([com]{3}\.)?[com]{3} was trying to say, but I presume you wanted to check for .com.
See character classes documentation here, http://www.regular-expressions.info/charclass.html.
What you want is
^([a-z0-9_.\-])+@yahoo\.com$
or for more domains use grouping,
^([a-z0-9_.\-])+@(yahoo|gmail|deadforce)\.com$
You haven't stated what language you are using so a real demo can't be given.
Functional demo, https://jsfiddle.net/qa9x9hua/1/
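A small JavaScript sketch of the difference between the character-class version and the grouped version, using @ as the separator as in a real email address. The variable names and sample addresses are made up for illustration.

```javascript
// The original pattern: [yahoo]{5} is a character class repeated five times.
var original = /^([a-z0-9_\.-])+@[yahoo]{5}\.([com]{3}\.)?[com]{3}$/;

// The grouped pattern: (yahoo|gmail|deadforce) is an alternation of whole words.
var fixed = /^[a-z0-9_.-]+@(yahoo|gmail|deadforce)\.com$/;

console.log(original.test('bob@aaaaa.ooo')); // true, which is not what was intended
console.log(fixed.test('bob@aaaaa.ooo'));    // false
console.log(fixed.test('bob@gmail.com'));    // true
console.log(fixed.test('bob@hotmail.com'));  // false
```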

Email validation is a notoriously difficult problem, and many people have failed quite horribly at trying to validate them themselves.
PHP's filter_var has a filter just for emails. Use that to check for email address validity. See http://php.net/manual/en/function.filter-var.php
if (filter_var('bob@example.com', FILTER_VALIDATE_EMAIL)) {
// Email is valid
}
There's probably no downside to doing the domain check the easy way. Just check for the domain strings in the email address. e.g.
if (
filter_var($email, FILTER_VALIDATE_EMAIL) &&
preg_match('/@(yahoo|gmail|deadforce)\.com$/', $email) // anchored so the address must end with the domain
) {
// Email is valid
}
In terms of your original regular expression, quite a lot of it was incorrect, which is why you were having trouble changing it.
regexper shows what you've created.
([a-z0-9_\.-])+ should be [a-z0-9_\.-]+ or ([a-z0-9_\.-]+)
The () are only capturing results in this section. If you want results move the brackets, if not remove them.
[yahoo]{5} should be yahoo
That's matching 5 characters that are one of y,a,h,o so it would match hayoo etc.
\.([com]{3}\.)?[com]{3} should be \.com
Dunno what this was trying to accomplish but you only wanted .com
Take a look at http://www.regular-expressions.info/tutorial.html for a guide to regular expressions.

How to handle possibly HTML encoded values in javascript

I have a situation where I'm not sure if the input I get is HTML encoded or not. How do I handle this? I also have jQuery available.
function someFunction(userInput){
$someJqueryElement.text(userInput);
}
// userInput "<script>" returns "&lt;script&gt;", which is fine
// userInput "&lt;script&gt;" returns "&amp;lt;script&amp;gt;", which is bad
I could avoid escaping ampersands (&), but what are the risks in that? Any help is very much appreciated!
Important note: This user input is not in my control. It returns from a external service, and it is possible for someone to tamper with it and avoid the html escaping provided by that service itself.

You really need to avoid these situations, because they introduce conditions that are very difficult to predict.
Try adding an additional variable input to the function.
function someFunction(userInput, isEncoded){
//Add some conditional logic based on isEncoded
$someJqueryElement.text(userInput);
}
If you look at products like fckEditor, you can choose to edit source or use the rich text editor. This prevents the need for automatic encoding detection.
If you still insist on automatically detecting HTML-encoded characters, I would recommend using indexOf to verify that certain key sequences exist.
str.indexOf('&lt;') !== -1
The example above will detect the encoded < character (&lt;).
~~~New text added after edit below this line.~~~
Finally, I would suggest looking at this answer. They suggest using the decode function and detecting lengths.
var string = "Your encoded & decoded string here";
function decode(str){
// Note: decodeURIComponent throws a URIError on a malformed % sequence
return decodeURIComponent(str).replace(/&lt;/g,'<').replace(/&gt;/g,'>');
}
if(string.length === decode(string).length){
// The string does not contain any encoded html.
}else{
// The string contains encoded html.
}
Again, this still has the problem of a user faking out the process by entering those specially encoded characters, but that is what html encoding is. So it would be proper to assume html encoding as soon as one of these character sequences comes up.

You must always correctly encode untrusted input before concatenating it into a structured language like HTML.
Otherwise, you'll enable injection attacks like XSS.
If the input is supposed to contain HTML formatting, you should use a sanitizer library to strip all potentially unsafe tags & attributes.
You can also use the regex /<|>|&(?![a-z]+;)/ to check whether a string has any non-encoded characters; however, you cannot distinguish a string that has been encoded from an unencoded string that talks about encoding.
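A minimal sketch of that check, wrapped in a hypothetical function named hasUnencodedChars. As written, the pattern only recognises lowercase named entities, so a numeric reference such as &#39; is reported as unencoded. That is a limitation of this sketch, not a recommendation.

```javascript
// Matches a raw "<", a raw ">", or an "&" that does not start a named entity.
var UNENCODED = /<|>|&(?![a-z]+;)/;

function hasUnencodedChars(str) {
  return UNENCODED.test(str);
}

console.log(hasUnencodedChars('<script>'));        // true
console.log(hasUnencodedChars('&lt;script&gt;'));  // false
console.log(hasUnencodedChars('Tom & Jerry'));     // true
console.log(hasUnencodedChars('Tom &amp; Jerry')); // false
```

A false result does not prove the string was encoded by the upstream service. It only means the string can be inserted as HTML without introducing new tags.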

XSS and other ways to terminate JavaScript strings

Is there a different way to terminate strings in JavaScript?
I'm testing a server for XSS vulnerabilities, and I'm seeing the following code in the HTTP response:
<script>
var myVar = "USER CONTROLLED STRING";
</script>
The user-controlled string comes from the URL and all double quotes are removed before the response is generated. Besides that, all other characters are allowed.
Is XSS possible?
(And yes, I know the proper thing to do here would be to hex encode (\xHH) all non-alphanumeric characters, as recommended by the OWASP XSS Prevention Cheat Sheet, but from a tester's perspective I want to know if I could exploit this.)

Yes, you can perform an attack: http://jsfiddle.net/vTmq6/1/
<script>
var myVar = "</script><script>alert('hacked');</script>";
</script>

A </script> would denote the script element’s end tag regardless of whether the resulting JavaScript code is valid or not. This is due to the restrictions on the contents of raw text elements, which the script element belongs to:
The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F).
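On the defensive side, one way to keep "</" out of an inline script is to serialise the value as a JavaScript string literal and replace every "<" with a Unicode escape. This is a sketch only; the helper name toInlineScriptLiteral is hypothetical, and it assumes the output is placed inside a script element exactly as in the question.

```javascript
// Serialise a value as a JS string literal that is safe to embed in <script>.
// JSON.stringify handles quotes, backslashes and control characters;
// the replacements hide "<" and ">" from the HTML parser and escape
// the two line separators that older JS engines treat as newlines.
function toInlineScriptLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

var payload = "</script><script>alert('hacked');</script>";
var literal = toInlineScriptLiteral(payload);

// The literal contains no "<", so the parser never sees "</script".
console.log(literal);
// It would be emitted as: var myVar = <literal>;
```

At run time the escapes evaluate back to the original characters, so myVar still holds the exact user string; it just can no longer end the script element.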
