I'm working on a multi-language site based on PHP.
To support multiple languages, I'm using localization files like the one below.
[localize.en-US.php]
$lang_code = "en-US";
$is_rtl = false;
.
.
.
define("WORD_EMAIL", "e-mail");
define("WORD_NAME", "name");
.
.
.
The defined words are used in two ways, as shown below.
[HTML]
<?=WORD_EMAIL?> : <input type="text" name="email"/>
<?=WORD_NAME?> : <input type="text" name="name"/>
[Javascript]
if(frm.email.value==="") {
alert("<?=WORD_EMAIL?> required.");
return false;
}
The problem occurred when I was working on Hebrew.
The Hebrew translation of the word "e-mail" has a double quote in it.
I tried escaping the double quote.
To escape a double quote, PHP needs one backslash, and JavaScript needs one more, plus another one to escape that backslash.
So I added 3 backslashes before the double quote.
It shows properly in the JavaScript alert, but in the HTML the backslash (meant for JavaScript) appears.
Yes, I know using single quotes could solve this simply.
But that causes an exception in other localization files (some French words use single quotes).
Can anyone help with this? Any clues are welcome.
You always need to encode or escape values for the context you're embedding them in. When putting anything into HTML, you need to HTML-encode it unless you intend the values to be interpreted as HTML. When putting anything into JavaScript source code, you need to escape it properly for that context, for which JSON-encoding happens to be the right technique:
<?= htmlspecialchars(WORD_EMAIL, ENT_COMPAT, 'UTF-8'); ?> : <input type="text" name="email"/>
alert(<?= json_encode(WORD_EMAIL); ?> + " required.");
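For readers generating pages with JavaScript on the server (for example Node.js), the same two-context rule can be sketched as below. This is an analogue, not the PHP code itself: `escapeHtml` and `jsStringLiteral` are hypothetical helper names standing in for `htmlspecialchars(..., ENT_COMPAT, 'UTF-8')` and `json_encode()`, and the Hebrew value `דוא"ל` is used only because, like the word in the question, it contains a double quote.

```javascript
// Sketch only: hypothetical helpers mirroring the two PHP calls above.

// Like htmlspecialchars(..., ENT_COMPAT): encodes &, <, > and " (not ').
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Like json_encode() for a string: returns a quoted, escaped JS string literal.
// JSON.stringify leaves "<" alone, so a value containing "</script>" could end
// an inline <script> block; \u003c is an equivalent escape that avoids that.
function jsStringLiteral(text) {
  return JSON.stringify(text).replace(/</g, "\\u003c");
}

const WORD_EMAIL = 'דוא"ל'; // contains a double quote

const htmlPart = `${escapeHtml(WORD_EMAIL)} : <input type="text" name="email"/>`;
const jsPart = `alert(${jsStringLiteral(WORD_EMAIL)} + " required.");`;

console.log(htmlPart); // דוא&quot;ל : <input type="text" name="email"/>
console.log(jsPart);   // alert("דוא\"ל" + " required.");
```

The point is that each output context gets its own encoder, so the same localized string can contain either kind of quote without hand-placed backslashes.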
Also see The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I would argue that your i18n approach is pretty flawed, though; "אִימֵיְיל required"* seems like a very insufficient localisation. You will want to look into proper tools like gettext and its implementations/analogues for JavaScript.
* Google translation, I don't speak Hebrew…
Related
To use a TTS method in my JavaScript code, I wrote this:
i(class="btn-left-margin fa fa-volume-down fa-2x speak-task", onclick=`responsiveVoice.speak(${word.rq}, "French Female");return false;`)
The French sentence is saved in MongoDB as "...jusqu'à...", i.e. it contains a single quote.
This obviously breaks the code, and I get:
SyntaxError: missing ) after argument list
How do I escape it properly to prevent this and similar cases?
I remember "validate input, escape output", but I don't know how!
Edit
If I remove the single quote and write it as "...jusqu à...", it works fine. I just realized that I don't know how to properly escape all JavaScript-related characters before saving to MongoDB.
After reading the forum here, I solved my case with this change:
onclick=`let x="${word.ea}".replace(/['"]+/g, ' ');speakEn(x)`
But I think there must be a best practice for escaping text in JS that I don't know about, one that escapes all special characters when publishing text from MongoDB (like "<" and ">", etc.). I first thought it could be the escape() method, but it turns out not to be the right one, and it is deprecated anyway.
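Since the sentence ends up as a JavaScript string inside an HTML attribute, one common approach is to escape it twice, innermost language first: `JSON.stringify` for the JavaScript string literal, then HTML-escaping for the attribute. The Node.js sketch below builds the markup by hand; `escapeHtmlAttr` and `buildSpeakIcon` are hypothetical names, and `responsiveVoice.speak` is only copied from the question. Pug's documentation states that attribute values are escaped by default, so inside a Pug template the `JSON.stringify` step may be all you need; verify that against your Pug version.

```javascript
// Sketch only: building an inline onclick handler on the server.
// escapeHtmlAttr and buildSpeakIcon are hypothetical helper names.

// HTML-escape a value destined for a double-quoted attribute.
function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, "&amp;") // must run first
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildSpeakIcon(sentence) {
  // Layer 1 (JavaScript): JSON.stringify yields a valid string literal, so
  // quotes, backslashes and newlines inside the sentence are escaped.
  const handler =
    `responsiveVoice.speak(${JSON.stringify(sentence)}, "French Female"); return false;`;
  // Layer 2 (HTML): the whole handler source is escaped for the attribute.
  // The browser undoes layer 2 before running the code, leaving layer 1 intact.
  return `<i class="speak-task" onclick="${escapeHtmlAttr(handler)}"></i>`;
}

console.log(buildSpeakIcon("jusqu'à demain"));
```

Stripping quotes with `.replace(/['"]+/g, ' ')` also avoids the error, but it changes the text being spoken. An option that avoids generating JavaScript source at all is to put the sentence in a `data-` attribute and read it from a listener registered with `addEventListener`.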
Do you have to convert anything besides the double quote (") to (&quot;) inside of:
<input type="text" value="$var">
I personally do not see how you can possibly break out of that without using " on*=....
Is this correct?
Edit: Apparently some people think my question is too vague:
<input type="text" value="<script>alert(0)</script>"> does not execute. So it seems impossible to break out of it without using ".
Is this correct?
There are really two questions you're asking (or at least two ways your question can be interpreted):
Can the quoted value attribute of input[type="text"] be injected if quotes are disallowed?
Can an arbitrary quoted attribute of an element be injected if quotes are disallowed?
The second is trivially demonstrated by the following:
Foo
Or
<div onmousemove="alert(123);">...
The first is a bit more complicated.
HTML5
According to the HTML5 spec:
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Which is further refined in quoted attributes to:
The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by a single """ (U+0022) character, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single """ (U+0022) character.
So in short, any character except an "ambiguous ampersand" (&[a-zA-Z0-9]+; when the result is not a valid character reference) and a quote character is valid inside of an attribute.
HTML 4.01
HTML 4.01 is less descriptive than HTML5 about the syntax (one of the reasons HTML5 was created in the first place). However, it does say this:
When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.
Note, this is saying what an author should do, not what a parser should do. So a parser could technically accept or reject invalid input (or mangle it to be valid).
XML 1.0
The XML 1.0 Spec defines an attribute as:
Attribute ::= Name Eq AttValue
where AttValue is defined as:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
The & is similar to the concept of an "ambiguous ampersand" from HTML5; however, it's basically saying "any unencoded ampersand".
Note, though, that it explicitly forbids < in attribute values.
So while HTML5 allows it, XML 1.0 explicitly forbids it.
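To make the XML 1.0 rule concrete, the AttValue production above can be transcribed into a JavaScript regular expression. This is a sketch under one stated simplification: the Name production is narrowed to ASCII letters, digits and `._:-`, whereas real XML names allow many more characters. `isXmlAttValue` is a hypothetical helper name.

```javascript
// Sketch only: a regex version of the XML 1.0 AttValue production.
// Simplification (assumption): Name is restricted to ASCII here; real XML
// names also allow many non-ASCII characters.
const NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";
const REFERENCE = `&(?:${NAME}|#[0-9]+|#x[0-9a-fA-F]+);`; // EntityRef | CharRef
const ATT_VALUE = new RegExp(
  `^(?:"(?:[^<&"]|${REFERENCE})*"|'(?:[^<&']|${REFERENCE})*')$`
);

function isXmlAttValue(quoted) {
  return ATT_VALUE.test(quoted);
}

console.log(isXmlAttValue('"<script>alert(0)</script>"'));       // false: raw < is not allowed
console.log(isXmlAttValue('"&lt;script>alert(0)&lt;/script>"')); // true
console.log(isXmlAttValue('"fish & chips"'));                     // false: bare &
```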
What Does It Mean
It means that for a compliant and bug-free parser, HTML5 will treat < characters in an attribute value as ordinary text, and XML will error.
It also means that for a compliant and bug-free parser, HTML 4.01 will behave in unspecified and potentially odd ways (since the specification doesn't detail the behavior).
And this gets down to the crux of the issue. In the past, HTML was such a loose spec that every browser had slightly different rules for how it would deal with malformed HTML. Each would try to "fix" it, or "interpret" what you meant. So while an HTML5-compliant browser wouldn't execute the JS in <input type="text" value="<script>alert(0)</script>">, there's nothing to say that an HTML 4.01-compliant browser wouldn't. And there's nothing to say that a bug may not exist in the XML or HTML5 parser that causes it to be executed (though that would be a pretty significant problem).
THAT is why OWASP (and most security experts) recommend you encode either all non-alphanumeric characters or &<" inside an attribute value. There's no cost in doing so, only the added security of knowing how the browser's parser will interpret the value.
Do you have to? No. But defense in depth suggests that, since there's no cost to doing so, the potential benefit is worth it.
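A minimal JavaScript sketch of the stricter option follows. It assumes the rule as OWASP's XSS Prevention Cheat Sheet has phrased it (except for alphanumeric characters, encode characters below 256 in the `&#xHH;` format); `encodeForHtmlAttribute` is a hypothetical name, and a maintained encoding library is the safer choice in production.

```javascript
// Sketch only: OWASP-style encoding for an HTML attribute value.
// Every character below 256 that is not an ASCII letter or digit becomes &#xHH;.
// Characters at or above 256 (e.g. Hebrew or CJK) are passed through unchanged.
function encodeForHtmlAttribute(value) {
  let out = "";
  for (const ch of value) { // iterates by code point, not UTF-16 unit
    const code = ch.codePointAt(0);
    if (code < 256 && !/[A-Za-z0-9]/.test(ch)) {
      out += "&#x" + code.toString(16).toUpperCase().padStart(2, "0") + ";";
    } else {
      out += ch;
    }
  }
  return out;
}

const payload = '"><script>alert(0)</script>';
console.log(`<input type="text" value="${encodeForHtmlAttribute(payload)}">`);
// <input type="text" value="&#x22;&#x3E;&#x3C;script&#x3E;alert&#x28;0&#x29;&#x3C;&#x2F;script&#x3E;">
```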
If your question is "what types of XSS attacks are possible", then you'd better google it. I'll just leave some examples of why you should sanitize your inputs:
If the input is generated by echo '<input type="text" value="$var">', then a simple ' breaks it.
If the input is plain HTML in a PHP page, then value=<?php deadly_php_script ?> breaks it.
If this is a plain HTML input in an HTML file, then converting double quotes should be enough.
That said, converting other special characters (like <, > and so on) is good practice. Inputs exist to collect information that will be stored on the server or passed to another page/script, so you need to check what could break those files. Let's say we have this setup:
index.html:
<form method=post action=getinput.php>
<input type="text" name="xss">
<input type="submit"></form>
getinput.php:
echo $_POST['xss'];
Input value ;your_deadly_php_script breaks it totally (you can also sanitize server-side in that case)
If that's not enough - provide more info on your question, add more examples of your code.
I believe the person is referring to cross-site scripting (XSS) attacks. They tagged this as php, security, and xss.
Take, for example:
<input type="text" value=""><script>alert(0)</script><"">
The code above will execute the alert box code.
<?php $var= "\"><script>alert(0)</script><\""; ?>
<input type="text" value="<?php echo $var ?>">
This will also execute the alert box.
To solve this you need to escape ", <, >, and a few more to be safe. PHP has a couple of functions worth looking into, and each has its ups and downs:
htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string
What you have to be careful of is that you are passing in a variable, and there are ways to create errors and such that cause it to break out. Your best bet is to make sure the data is not formatted in an executable manner in case of errors. But you are right: if there are no quotes, you can't break out, though there may be ways, which you or I don't understand at this point, that allow that to happen.
$var = "><script>alert(0);</script> would work... If you can close the quotes you can then close the tag and open another one... But I think you are right, without closing the quotes no injection is possible...
I am aware of how to escape special characters in HTML.
However, I am still asking because I have run into a particular situation.
I have a JSP in which I am not allowed to put validation on input. Users are entering special characters to test it.
Input string:
'##$%
When displaying it from the database, I am using
<%= StringEscapeUtils.escapeHtml(map[i].get("text").toString())%>
where "map" is an array of HashMaps. This works fine.
The problem comes when I need to pass this string to JavaScript using
<input type="Button"
onclick="onEdit('<%= StringEscapeUtils.escapeHtml(map[i].get("text").toString())%>',
'<%= strShortCut%>','<%= map[i].get("uid")%>')" value="Edit">
The string becomes ''##$%'.
How do I escape a single quote?
If you are using Java, you can do the following:
import org.apache.commons.lang.StringEscapeUtils;
...
String result = StringEscapeUtils.escapeJavaScript(jsString);
Just prepend every single quote with a backslash, like the following:
StringEscapeUtils.escapeHtml(map[i].get("text").toString()).replace("\'","\\'")
But your problem is not only the single quote. There are also the double quote (") and the backslash itself (\).
Use the same technique as shown before. You can also use regular expressions, but I showed you the simplest way.
For the list of escape sequences, see http://docs.oracle.com/javase/tutorial/java/data/characters.html.
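The order of escaping matters here: the value inside `onclick="onEdit('...')"` is decoded by the HTML parser first and by the JavaScript parser second, so it should be escaped for JavaScript first and for HTML second. The sketch below shows that order in JavaScript rather than JSP; `escapeJsSingleQuoted` and `escapeHtmlAttr` are hypothetical helpers, not ports of the Commons Lang methods.

```javascript
// Sketch only: the value for onEdit('...') passes through two parsers,
// the HTML attribute parser and then the JavaScript parser, so it is
// escaped for JavaScript first and for HTML second.
// escapeJsSingleQuoted and escapeHtmlAttr are hypothetical helpers.

function escapeJsSingleQuoted(text) {
  return text
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/\u2028/g, "\\u2028") // line terminators inside string literals
    .replace(/\u2029/g, "\\u2029"); // before ES2019
}

function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, "&amp;") // must run first
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const userText = "'##$%"; // the test input from the question
const handlerSource = `onEdit('${escapeHtmlAttr(escapeJsSingleQuoted(userText))}')`;
console.log(`<input type="button" onclick="${handlerSource}" value="Edit">`);
// <input type="button" onclick="onEdit('\&#39;##$%')" value="Edit">
```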
I'm trying to write a JavaScript HTML/PHP parser that extracts all opening tags from HTML/PHP source and returns the type of tag and the attributes with their values, while at the same time tracking whether the values/attributes should be evaluated from static text or PHP variables. The problem is composing the JavaScript RegExp pattern, more specifically for certain rare cases. The RegExps I was able to come up with either involve negative lookbehind (to cope with the closing PHP tag, that is, to match a closing bracket that is not preceded by a question mark) or fail in certain cases. The lookbehind version looks like:
<[a-zA-Z]+.*?(?<!\?)>
...and works perfectly, except that my case must avoid lookbehind. A more JavaScript-friendly version would be:
<[a-zA-Z]+((.(?!</)(?!<[a-zA-Z]+))*)?>
...which works except in this case:
<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>
Is my approach to the problem completely wrong, or is lookbehind really necessary in my case? Any help is greatly appreciated.
Just make sure the last character before the '>' is not a ?, using [^?]. No lookaheads or lookbehinds needed.
<[a-zA-Z](.*?[^?])?>
The parentheses and the final ? are there to also match tags like <b>.
EDIT: The solution didn't work for single-character tags without attributes, so here is one that does:
<[a-zA-Z]+(>|.*?[^?]>)
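A quick way to check the EDIT's pattern against the problem line from the question is to run it with `matchAll` in Node.js. This sketch only exercises the regex; it is not a full HTML/PHP parser. (As an aside, lookbehind such as `(?<!\?)` is part of ECMAScript 2018, so current engines accept it; the pattern below avoids it, as the question requires.)

```javascript
// Sketch only: exercising the lookbehind-free pattern from the EDIT above.
const openingTag = /<[a-zA-Z]+(>|.*?[^?]>)/g;

// The problem line from the question (a template literal, so no escaping is
// needed; "$img" is not interpolated because it is not written as "${...}").
const source = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

for (const match of source.matchAll(openingTag)) {
  console.log(match[0]); // logs the whole opening <option ...> tag, ending at the final "?>>"
}

// For comparison, <[^/^>]+> stops at the first ">", i.e. inside "?>".
console.log(source.match(/<[^/^>]+>/)[0]); // <option value="<?php echo $img; ?>
```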
A much simpler answer would be <[^/^>]+>
I have a form where users can enter any HTML.
var title = "Cool Check This"
As you can see, the value contains ", but it could also contain '. It causes an error if there is a ". What is the best way to fix this? Should I store an escaped string in the database, like below?
$title = str_replace('"', "'", $_REQUEST['title']); // Replace double quote with single quote as js variable above is wrapped with double quotes.
Or escape it before showing it on the page? Is there anything in jQuery, like escape, that can help here?
var title="Cool Check This"
Well, you cannot escape it using JavaScript, because JavaScript would first need to parse the very string you want to escape. If you use PHP, you can use addslashes() before inserting the value into JavaScript.
Anyway, you should be careful about allowing users to insert any HTML. Wrongly escaped HTML (like allowing <script> to be inserted) makes various dangerous things possible, like stealing all cookies.
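One limitation of `addslashes()` is worth knowing: per the PHP manual, it escapes only single quotes, double quotes, backslashes and NUL bytes, so a title containing a newline (or `</script>`) can still break the inline script. The JavaScript sketch below imitates that behaviour with a hypothetical `addslashesLike` helper and compares it with JSON encoding (`JSON.stringify`, the JavaScript counterpart of PHP's `json_encode()`) by checking whether the generated `var title = ...;` line still parses.

```javascript
// Sketch only: addslashesLike imitates PHP's addslashes() (escapes ' " \ and NUL).
function addslashesLike(text) {
  return text.replace(/[\\'"\0]/g, (ch) => (ch === "\0" ? "\\0" : "\\" + ch));
}

// Returns true if the generated JavaScript source is syntactically valid.
function parses(source) {
  try {
    new Function(source); // compiles the source without running it
    return true;
  } catch (err) {
    return false;
  }
}

const title = 'Cool "Check" it\'s\nsecond line'; // quotes plus a newline

const viaAddslashes = `var title = "${addslashesLike(title)}";`;
const viaJson = `var title = ${JSON.stringify(title)};`;

console.log(parses(viaAddslashes)); // false: the raw newline ends the string literal
console.log(parses(viaJson));       // true
```

Both approaches handle the quotes from the question; the difference only shows up with characters like newlines, which is why JSON encoding is the more complete choice.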