Do browsers widely support numeric quotes in attributes? - javascript

There exist other ways of linking to JS, apart from this (the usual)..
<script src="myscript.js" type="text/javascript"></script>
...that utilize other quote types:
<script src=&#34;myscript.js&#34; type=&#34;text/javascript&#34;></script>
Are these widely supported in modern browsers, and older browsers in use, such as IE6? Basically is it safe to use this method, just as you would use the regular double-quote method?
Edit: The HTML4 spec seems to allow it, but is it well supported in practical reality?
3.2.2 Attributes
Authors may also use numeric character references to represent
double quotes (") and single quotes (').
For double quotes authors can also use the
character entity reference &quot;.

Using &#34; instead of " is simply wrong; it doesn't have the same meaning within the SGML and XML specifications. Attribute values of elements should be delimited with either single (') or double quotes ("). In the old SGML specification this element
<foo bar=&#34;quux&#34; />
could be read as an element with the name foo, and attribute named bar with the value "quux". However, the standard defines that unquoted attribute values should not include escaped characters. And this element
<foo bar="quux" />
should be read as an element with the name foo, and attribute named bar with the value quux without the quotes. This is because in SGML the quotes are optional, and everything up to the next space will be used as the value for the attribute.
XML requires quotes.

There is a difference between an attribute value delimiter and a quote or double quote character.
You have to use a literal " or ' to delimit attribute values (except where delimiters are optional). In this case, the sequence of bytes means "attribute value delimiter", not "(double) quote mark".
Character references can be used to represent a (double) quote mark, but they are more complicated and less efficient than a literal, so they should only be used when the literal is not available (i.e. when it would be read as an attribute value delimiter because you are inside an attribute value whose opening delimiter was that character).
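To illustrate the distinction, here is a minimal JavaScript sketch. The helper name renderAttribute is hypothetical and not taken from any library. The delimiters are always written as literal " characters, and a character reference is used only for a quote mark that is part of the value itself.

```javascript
// Hypothetical helper: serialize one attribute with literal " delimiters.
function renderAttribute(name, value) {
  const encoded = String(value)
    .replace(/&/g, "&amp;")   // encode & first so the references added below stay intact
    .replace(/"/g, "&#34;");  // a quote mark inside the value becomes a character reference
  return name + '="' + encoded + '"';
}

console.log(renderAttribute("src", "myscript.js"));  // src="myscript.js"
console.log(renderAttribute("title", 'say "hi"'));   // title="say &#34;hi&#34;"
```

The reference only ever appears between the delimiters, never in place of them.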

Just out of curiosity. Why would you want to use the encoded variants? In most of the text editors it will break the formatting. For me that would be very annoying.

You should stick with the double quotes; otherwise the attribute might not be read correctly.

<script src=myscript.js></script>
is valid in HTML5 and supported by every significant browser.

Related

Escaping several quote levels in Vue templates

Going through the Vue online guide, I ran into something that looks like a quote escaping problem. More specifically, I am toying around with the example provided in chapter components->events.
The template in my component looks like
"<div class=\"blog-post\">\
<h3>{{ post.title }}</h3>\
<button @click=\"$emit(\\\"enlarge-text\\\")\" >Enlarge text</button>\
<div v-text=\"post.content\"></div>\
</div>"
And instead of the expected button, I get the string
")" >Enlarge text
I managed to circumvent my issue by replacing the two occurrences of the double escape \\\" by single quotes, but I have the feeling there is something I am missing here. Can you help me to understand what is happening here or provide me pointers towards the relevant doc?
Any explanation is welcome.
As I'm sure you're aware, escaping is used to include characters literally within text that would otherwise be interpreted as having a special meaning. Establishing which characters have special meaning requires us to look at the 'channels' that will be interpreting that text and then select a suitable escaping mechanism for those channels.
In this case the text will be interpreted by 3 channels...
As a JavaScript string literal.
By the Vue template compiler, which has a format very similar to HTML.
Expressions within the template, such as binding expressions, will be treated as JavaScript, potentially including yet more JavaScript string literals.
JavaScript string literals use the \ character to introduce escape sequences. However, the HTML-like syntax used for Vue templates does not. As in HTML, it uses entities prefixed with &.
So, working backwards, we first need to consider how we escape the expressions within the template. In this case that is $emit("enlarge-text"). As the string "enlarge-text" doesn't contain any special characters we don't need to apply any escaping. Easy so far.
Then we need to escape the template 'HTML'. Now we do run into problems because the @click attribute is delimited with double-quotes and its value contains double-quotes. Obviously we could dodge the issue by using different types of quotes, but if we instead hit the problem head-on we'd need to use & entities to escape those quotes, i.e. &quot; for ". That gives us:
<button @click="$emit(&quot;enlarge-text&quot;)">Enlarge text</button>
I believe this is where the escaping in the question goes wrong as it attempts to use \ escaping to escape the attribute value.
If we were using SFCs then that would be sufficient. But for a template written as a string literal we've still got one more level of escaping to apply, using \. The original quotes around enlarge-text are no longer present so they don't require any further escaping but we still have the quotes around the attribute. That gives us:
"<button @click=\"$emit(&quot;enlarge-text&quot;)\">Enlarge text</button>"
However, all that said, the usual conventions when specifying string templates are:
Use backticks for the template string itself, giving better multi-line support.
Use double-quotes around attributes.
Use single-quotes for strings within expressions.
Obviously there are cases where that isn't possible, such as if you want to use backticks within an expression, but if you stick to those conventions as much as possible you usually won't need to escape anything. When you do it'll also be a lot simpler to perform the escaping as you aren't using the same delimiters at all three levels.
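The layering can be checked without Vue at all. The following is a rough sketch in plain JavaScript, written with Vue's @click shorthand. The helper firstQuotedValue is hypothetical and only a crude stand-in for an HTML parser: it returns the text between the first pair of double quotes. It shows what the attribute value looks like once the JavaScript string literal has been evaluated.

```javascript
// The question's literal: \\\" in the source yields the two characters \" in the string.
const fromQuestion = "<button @click=\"$emit(\\\"enlarge-text\\\")\" >Enlarge text</button>";

// HTML-level escaping with &quot;, then JavaScript-level escaping of the attribute delimiters.
const entityEscaped = "<button @click=\"$emit(&quot;enlarge-text&quot;)\">Enlarge text</button>";

// Conventional form: backticks outside, double quotes for attributes, single quotes in expressions.
const conventional = `<button @click="$emit('enlarge-text')">Enlarge text</button>`;

// Crude stand-in for an HTML parser: the text between the first pair of double quotes.
function firstQuotedValue(markup) {
  const start = markup.indexOf('"') + 1;
  return markup.slice(start, markup.indexOf('"', start));
}

console.log(firstQuotedValue(fromQuestion));                           // $emit(\
console.log(firstQuotedValue(entityEscaped).replace(/&quot;/g, '"'));  // $emit("enlarge-text")
console.log(firstQuotedValue(conventional));                           // $emit('enlarge-text')
```

In the first case the backslash survives into the markup but means nothing to an HTML-style parser, so the attribute ends at the very next double quote.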
You could use template literal / template string for this:
let tpl = `<div class="blog-post">
<h3>{{ post.title }}</h3>
<button @click="$emit('enlarge-text')">Enlarge text</button>
<div v-text="post.content"></div>
</div>`;
Not only does it read better, it is also far more maintainable than multiple escaped quotes.
You can wrap enlarge-text with single quotes. Like this:
<button @click=\"$emit('enlarge-text')\">Enlarge text</button>

Preserving attributes without value when manipulating with JQuery

The crux of my problem comes down to this issue:
$('<video allowfullscreen></video>').prop('outerHTML') === '<video allowfullscreen></video>' //Is False
$('<video allowfullscreen></video>').prop('outerHTML') === '<video allowfullscreen=""></video>' //Is True
The input I'm giving to jQuery gets partially mangled and transformed in an unwanted way.
My goal is that I have (trusted) html coming in that I want to modify by adding some attributes and wrapping it in other elements before converting it back to a String and passing it to the user as text they can copy.
So an expected output might be something like:
<div><video class="myClass" allowfullscreen></video></div>
Since the input html is coming from elsewhere I'd like to make as few assumptions about it as possible. So ideally I don't want to take the string and parse over it to fix specific attributes or remove instances of ="" (in case there's a reason at some point to specifically set a property to "").
Even if I don't care about having a value set on these properties the correct value would be allowfullscreen="allowfullscreen" anyways. I don't have control over the html coming in so I need to take it as-is. So I can't simply 'fix' the html to pass along something like allowfullscreen="allowfullscreen".
Are there any options or ways to preserve valueless properties when I go from string->jQuery->string?
I'm even open to other technology suggestions that would be better suited to this sort of DOM manipulation, but jQuery would otherwise be ideal because of how concise its syntax is. Vanilla Javascript can do it properly, but the syntax makes the code more brittle which I would like to avoid.
See HTML5 - 8.1.2.3 Attributes
8.1.2.3 Attributes
Attributes for an element are expressed inside the element's start
tag.
Attributes have a name and a value. Attribute names must consist of
one or more characters other than the space characters, U+0000 NULL,
U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), ">" (U+003E), "/"
(U+002F), and "=" (U+003D) characters, the control characters, and any
characters that are not defined by Unicode. In the HTML syntax,
attribute names, even those for foreign elements, may be written with
any mix of lower- and uppercase letters that are an ASCII
case-insensitive match for the attribute's name.
Attribute values are a mixture of text and character references,
except with the additional restriction that the text cannot contain an
ambiguous ampersand.
Attributes can be specified in four different ways:
Empty attribute syntax
Just the attribute name. The value is implicitly the empty string.

Sanitizing HTML input value

Do you have to convert anything besides the quotes (") to (&quot;) inside of:
<input type="text" value="$var">
I personally do not see how you can possibly break out of that without using " on*=....
Is this correct?
Edit: Apparently some people think my question is too vague;
<input type="text" value="<script>alert(0)</script>"> does not execute. Thus it is impossible to break out of the attribute without using ".
Is this correct?
There really are two questions that you're asking (or at least can be interpreted):
Can the quoted value attribute of input[type="text"] be injected if quotes are disallowed?
Can an arbitrary quoted attribute of an element be injected if quotes are disallowed.
The second is trivially demonstrated by the following:
Foo
Or
<div onmousemove="alert(123);">...
The first is a bit more complicated.
HTML5
According to the HTML5 spec:
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Which is further refined in quoted attributes to:
The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by a single """ (U+0022) character, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single """ (U+0022) character.
So in short, any character except an "ambiguous ampersand" (&[a-zA-Z0-9]+; when the result is not a valid character reference) and a quote character is valid inside of an attribute.
HTML 4.01
HTML 4.01 is less descriptive than HTML5 about the syntax (one of the reasons HTML5 was created in the first place). However, it does say this:
When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.
Note, this is saying what an author should do, not what a parser should do. So a parser could technically accept or reject invalid input (or mangle it to be valid).
XML 1.0
The XML 1.0 Spec defines an attribute as:
Attribute ::= Name Eq AttValue
where AttValue is defined as:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
The & is similar to the concept of an "ambiguous ampersand" from HTML5, however it's basically saying "any unencoded ampersand".
Note though that it explicitly denies < from attribute values.
So while HTML5 allows it, XML1.0 explicitly denies it.
What Does It Mean
It means that for a compliant and bug free parser, HTML5 will ignore < characters in an attribute, and XML will error.
It also means that for a compliant and bug free parser, HTML 4.01 will behave in unspecified and potentially odd ways (since the specification doesn't detail the behavior).
And this gets down to the crux of the issue. In the past, HTML was such a loose spec, that every browser had slightly different rules for how it would deal with malformed html. Each would try to "fix" it, or "interpret" what you meant. So that means that while a HTML5 compliant browser wouldn't execute the JS in <input type="text" value="<script>alert(0)</script>">, there's nothing to say that a HTML 4.01 compliant browser wouldn't. And there's nothing to say that a bug may not exist in the XML or HTML5 parser that causes it to be executed (though that would be a pretty significant problem).
THAT is why OWASP (and most security experts) recommend you encode either all non-alpha-numeric characters or &<" inside of an attribute value. There's no cost in doing so, only the added security of knowing how the browser's parser will interpret the value.
Do you have to? no. But defense in depth suggests that, since there's no cost to doing so, the potential benefit is worth it.
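As a rough illustration of the stricter of the two options (encode every non-alphanumeric character), here is a minimal JavaScript sketch. The helper name encodeForAttribute is hypothetical, and the choice to leave characters at or above U+0100 untouched is an assumption of this sketch. In real code, prefer a maintained encoder over a hand-rolled one.

```javascript
// Hypothetical helper: encode every non-alphanumeric character below U+0100
// as a hexadecimal character reference.
function encodeForAttribute(input) {
  return Array.from(String(input), (ch) => {
    const code = ch.codePointAt(0);
    const isAlphanumeric = /^[A-Za-z0-9]$/.test(ch);
    // Alphanumerics and characters at or above U+0100 pass through unchanged (sketch assumption).
    return isAlphanumeric || code >= 256
      ? ch
      : "&#x" + code.toString(16).toUpperCase() + ";";
  }).join("");
}

const payload = '"><script>alert(0)</script>';
console.log('<input type="text" value="' + encodeForAttribute(payload) + '">');
```

After encoding, the value contains no literal ", < or > characters, so it cannot close the attribute or the tag no matter how lenient the parser is.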
If your question is "what types of XSS attacks are possible", then you had better search for it. I'll just leave some examples of why you should sanitize your inputs:
If input is generated by echo '<input type="text" value="$var">', then simple ' breaks it.
If input is plain HTML in PHP page then value=<?php deadly_php_script ?> breaks it
If this is plain HTML input in HTML file - then converting doublequotes should be enough.
Although, converting other special symbols (like <, > and so on) is a good practice. Inputs are made to input info that would be stored on server\transferred into another page\script, so you need to check what could break those files. Let's say we have this setup:
index.html:
<form method=post action=getinput.php>
<input type="text" name="xss">
<input type="submit"></form>
getinput.php:
echo $_POST['xss'];
Input value ;your_deadly_php_script breaks it totally (you can also sanitize server-side in that case)
If that's not enough - provide more info on your question, add more examples of your code.
I believe the person is referring to cross site scripting attacks. They tagged this as php, security, and xss
take for example
<input type="text" value=""><script>alert(0)</script><"">
The above code will execute the alert box code;
<?php $var= "\"><script>alert(0)</script><\""; ?>
<input type="text" value="<?php echo $var ?>">
This will also execute the alert box.
To solve this you need to escape ", < >, and a few more to be safe. PHP has a couple of functions worth looking into and each have their ups and downs!
htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string
What you have to be careful of is that you are passing in a variable, and there are ways to create errors and such to cause it to break out. Your best bet is to make sure that data is not formatted in an executable manner in case of errors. You are right that if there are no quotes you can't break out, but there may be ways that you or I don't understand at this point that will allow that to happen.
$var = "><script>alert(0);</script> would work... If you can close the quotes you can then close the tag and open another one... But I think you are right, without closing the quotes no injection is possible...

Getting dynamically created element by ID (Jquery)

I am using JQuery, and I am creating elements dynamically via before() but I am unable to access those elements via ID. It's a bunch of code and I'd rather not post it if possible, but I am checking the text being used as the selector and then checking the length; the length says 0, but I am cross-referencing with the "inspect element" tool in chrome and the ID is most certainly what it should be. Is this a known problem with newly created elements? When I was accessing the element by accessing siblings of a given class it worked just fine, but now there are multiple siblings with the same class and the most efficient way to do things is with an ID.
Here's what I mean when I say I'm checking the text:
alert("id=\"impactPlusMinus~"+questionAnswerNameId+"\"\n"+
$("#impactPlusMinus~"+questionAnswerNameId).length);
The tilde ~ is not a valid character for the id attribute in HTML4. Try using a different character.
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
Source: HTML4 specification. http://www.w3.org/TR/html4/types.html#type-id
The HTML5 specification is not so strict on this, but you may run into problems using the tilde on many browsers or frameworks like jQuery. For instance, in jQuery, the tilde in a selector means something else entirely.
You'll need to escape the tilde:
$("#impactPlusMinus\\~"+questionAnswerNameId)
Since you are using a reserved character in the ID, you can either match it as an attribute value or escape the special character with \\ when selecting by ID directly.
$('[id="impactPlusMinus~' + questionAnswerNameId + '"]')
or
$("#impactPlusMinus\\~" + questionAnswerNameId);
from docs
To use any of the meta-characters ( such as !"#$%&'()*+,./:;<=>?@[\]^`{|}~ ) as a literal part of a name, it must be escaped with two backslashes: \\.
To avoid clashes with how jQuery determines what an identifier looks like, especially in the case of HTML5 which is less strict about naming, you could create the element reference like this:
$(document.getElementById('impactPlusMinus~' + questionAnswerNameId));
This makes sure only the browser is used to find the element, after which the jQuery constructor is applied.
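If the selector string has to be built by hand, the escaping can be automated. Below is a minimal sketch with a hypothetical helper, escapeForSelector, that backslash-escapes the meta-characters from the jQuery documentation. It does not cover every CSS identifier rule (a leading digit, for example). jQuery 3.0 and later ship jQuery.escapeSelector(), and browsers provide CSS.escape(), both of which are preferable when available.

```javascript
// Hypothetical helper: put a backslash in front of each selector meta-character.
function escapeForSelector(id) {
  return String(id).replace(/[!"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g, "\\$&");
}

const questionAnswerNameId = 42; // placeholder value, for illustration only
const selector = "#" + escapeForSelector("impactPlusMinus~" + questionAnswerNameId);

console.log(selector); // #impactPlusMinus\~42
```

The resulting string can be passed to $() as-is. The doubled backslash is only needed when the escape is typed inside a JavaScript string literal.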

Whether it will alert?

is this correct
cricket
onclick="add('alert("Google !")');" is being parsed as:
onclick # attribute name
=
"add('alert(" # string
Google ! # random garbage
")');" # another string
You'll have to escape the inner quotes, otherwise they terminate the string:
onclick="add('alert(&quot;Google !&quot;)');"
Beyond that, it depends on what add() does.
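The difference between the two forms can be shown with a simplified model of how a double-quoted attribute value is read. This is a sketch only: the helpers readDoubleQuotedValue and decodeQuot are hypothetical, and a real HTML tokenizer handles far more cases.

```javascript
// Simplified model: a double-quoted value ends at the next literal " character.
function readDoubleQuotedValue(source) {
  const start = source.indexOf('="') + 2;
  const end = source.indexOf('"', start);
  return source.slice(start, end);
}

// Decode only &quot;, which is enough for this illustration.
function decodeQuot(value) {
  return value.replace(/&quot;/g, '"');
}

const broken = `onclick="add('alert("Google !")');"`;
const fixed = `onclick="add('alert(&quot;Google !&quot;)');"`;

console.log(readDoubleQuotedValue(broken));            // add('alert(
console.log(decodeQuot(readDoubleQuotedValue(fixed))); // add('alert("Google !")');
```

With the entity form, the JavaScript engine receives the complete statement, because the character references are decoded only after the attribute boundaries have been found.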
No.
onclick="add('alert("
You don't have a complete JavaScript statement inside your attribute value.
Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.
— http://www.w3.org/TR/html4/charset.html#h-5.3
(And as an aside:
Don't use href="#", build on stuff that works
Don't use the style attribute, separate presentation and content
Don't forget to put spaces between your attributes
Don't use intrinsic event attributes (such as onclick), use unobtrusive JS (which would also solve the problem of the nested quotes)
Where possible, avoid tabindex in favour of a sensible natural tab order
)
