encodeURIComponent() vs JSON.stringify() in data-* attribute - javascript

I'd like to use an array as a data-* attribute, and a lot of StackOverflow answers suggest that I should use JSON.stringify():
How to pass an array into jQuery .data() attribute
Store and use an array using the HTML data tag and jQuery
https://gist.github.com/charliepark/4266921
etc.
So, if I have this array: ['something', 'some\'thing', 'some"thing'] it will be serialized to "["something","some'thing","some\"thing"]" and therefore it fits neither data-*='' nor data-*="", because either ' or " will break the HTML tag.
Am I missing something, or is encodeURIComponent() the real solution for encoding arrays like that? Why has nobody noticed this in the other StackOverflow answers?

The reasoning that JSON.stringify is not guaranteed to be safe in HTML attributes when the text is part of the HTML markup itself is valid. However, there is no escaping issue if using one of the access methods (eg. .data or .attr) to assign the value as these do not directly manipulate raw HTML text.
While encodeURIComponent would "work" as it escapes all the problematic characters, it both results in overly ugly values/markup and requires a manual decodeURIComponent step when consuming the values - yuck!
Instead, if inserting the data directly into the HTML, simply "html encode" the value and use the result as the attribute value. Such a function comes with most server-side languages, although an equivalent is not supplied natively with JavaScript.
Assuming the attribute values are quoted, the problematic characters that need to be replaced with the appropriate HTML entities are:
& - &amp;, escape-the-escape, applied first
" - &quot;, for double-quoted attribute
' - &#39;, for single-quoted attribute
Optional (required for XML): < and >
Using the above approach relies on the parsing of the HTML markup, and the automatic decoding of HTML entities therein, such that the actual (non-encoded) result is stored as the data-attribute value in the DOM.
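As a rough illustration of the "html encode" step described above, here is a minimal sketch in JavaScript. The helper name escapeAttr and the attribute name data-items are made up for this example; they are not part of jQuery or any standard API.

```javascript
// Hypothetical helper: HTML-encode a string for use inside a quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first so later entities are not double-escaped
    .replace(/"/g, '&quot;')  // needed for double-quoted attributes
    .replace(/'/g, '&#39;')   // needed for single-quoted attributes
    .replace(/</g, '&lt;')    // optional in HTML, required for XML
    .replace(/>/g, '&gt;');
}

const items = ['something', "some'thing", 'some"thing'];
const json = JSON.stringify(items);

// Build the markup as text; the browser's HTML parser decodes the entities again.
const markup = '<div data-items="' + escapeAttr(json) + '"></div>';
console.log(markup);
```

Once a browser has parsed this markup, the attribute value in the DOM is the original JSON text again, so JSON.parse on the attribute value (or jQuery's .data(), which parses JSON-looking values) should give back the array with no extra decoding step.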

Related

javascript generating invalid HTML5 attributes in Firefox

I am noticing some very strange behavior in firefox and I'm wondering if anyone has a strategy for how to normalize or work around this behavior.
Specifically if you provide firefox a basic anchor containing html entities it will unescape those entities, fail to re-escape them and hand you back invalid html.
For example firefox mishandles the following url:
My Original Link
If this url is parsed by firefox it will unescape the ><" and start handling a url like:
My Original Link
This same operation appears to work fine elsewhere, even safari and edge.
I tried quite a few different ways of handing the html to firefox to avoid this problem. Tried manually invoking the parser, tried setting innerHTML, tried jQuery html(), tried giving jQuery constructor a giant string, etc. All methods produced the same broken result.
See a fiddle here:
https://jsfiddle.net/kamelkev/hfd2b6sn/
I am a little mystified by how broken this handling seems to be. There must be a way to work around this issue, but I can't seem to find a way.
My application is an html manipulation tool, so I typically normalize around issues like this by dropping down to XML and handling the problems there before persisting to a dumb key-value store, but in this particular case the <> characters are preventing me from processing this document as XML.
Ideas?
A < or a > is valid inside of an attribute value, unescaped. It's not best practice, but it is valid.
What's happening is that Firefox is parsing the original HTML and making elements out of it. At that point, the original HTML no longer exists. When you call .outerHTML, the HTML is reconstructed from the element.
Firefox then generates it using a different set of rules than Chrome does.
It isn't clear what exactly you need to do this for... really you should edit the DOM and export the HTML for the whole DOM when done. Constantly re-interpreting HTML isn't necessary.
The > and < are unescaped when the parser parses the source to construct the DOM. When you serialize an element back to a string, you are not guaranteed to obtain the same text as the source.
In this case, innerHTML and outerHTML use the HTML fragment serialization algorithm, which escapes attribute values using attribute mode:
Escaping a string (for the purposes of the algorithm above) consists
of running the following steps:
Replace any occurrence of the "&" character by the string "&amp;".
Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;".
If the algorithm was invoked in the attribute mode, replace any occurrences of the """ character by the string "&quot;".
If the algorithm was not invoked in the attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any
occurrences of the ">" character by the string "&gt;".
That's why " is escaped to &quot;, but < and > remain.
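The escaping steps quoted above can be sketched as a small function. This only mirrors the steps as quoted in this answer (other revisions of the spec may differ), and the function name and signature are invented for illustration.

```javascript
// Sketch of the quoted escaping steps; name and signature are illustrative only.
function escapeForSerialization(text, attributeMode) {
  let out = text
    .replace(/&/g, '&amp;')         // step 1: ampersands
    .replace(/\u00A0/g, '&nbsp;');  // step 2: no-break spaces
  if (attributeMode) {
    out = out.replace(/"/g, '&quot;');  // attribute mode: only double quotes
  } else {
    out = out.replace(/</g, '&lt;').replace(/>/g, '&gt;');  // text mode: angle brackets
  }
  return out;
}

console.log(escapeForSerialization('a<b>"c"&d', true));  // a<b>&quot;c&quot;&amp;d
console.log(escapeForSerialization('a<b>"c"&d', false)); // a&lt;b&gt;"c"&amp;d
```

In attribute mode the angle brackets pass through untouched, which matches the output the question observed.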
This is OK, because < and > are allowed in HTML double-quoted attribute values:
U+0022 QUOTATION MARK ("): Switch to the after attribute value (quoted) state.
U+0026 AMPERSAND (&): Switch to the character reference in attribute value state [...]
U+0000 NULL: Parse error [...]
EOF: Parse error [...]
Anything else: Append the current input character to the current attribute's value.
However, XML does not allow < and > in attribute values. If you want to get valid XHTML, use a XML serializer:
var s = new XMLSerializer();
var str = s.serializeToString(document.querySelector('a'));
console.log(str);
My Original Link

Preserving attributes without value when manipulating with JQuery

The crux of my problem comes down to this issue:
$('<video allowfullscreen></video>').prop('outerHTML') === '<video allowfullscreen></video>' //Is False
$('<video allowfullscreen></video>').prop('outerHTML') === '<video allowfullscreen=""></video>' //Is True
The input I'm giving to jQuery gets partially mangled and transformed in an unwanted way.
My goal is that I have (trusted) html coming in that I want to modify by adding some attributes and wrapping it in other elements before converting it back to a String and passing it to the user as text they can copy.
So an expected output might be something like:
<div><video class="myClass" allowfullscreen></video></div>
Since the input html is coming from elsewhere I'd like to make as little assumptions about it as possible. So ideally I don't want to take the string and parse over it to fix specific attributes or remove instances of ="" (in case there's a reason at some point to specifically set a property to "").
Even if I don't care about having a value set on these properties the correct value would be allowfullscreen="allowfullscreen" anyways. I don't have control over the html coming in so I need to take it as-is. So I can't simply 'fix' the html to pass along something like allowfullscreen="allowfullscreen".
Are there any options or ways to preserve valueless properties when I go from string->jQuery->string?
I'm even open to other technology suggestions that would be better suited to this sort of DOM manipulation, but jQuery would otherwise be ideal because of how concise its syntax is. Vanilla Javascript can do it properly, but the syntax makes the code more brittle which I would like to avoid.
See HTML5 - 8.1.2.3 Attributes
8.1.2.3 Attributes
Attributes for an element are expressed inside the element's start
tag.
Attributes have a name and a value. Attribute names must consist of
one or more characters other than the space characters, U+0000 NULL,
U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), ">" (U+003E), "/"
(U+002F), and "=" (U+003D) characters, the control characters, and any
characters that are not defined by Unicode. In the HTML syntax,
attribute names, even those for foreign elements, may be written with
any mix of lower- and uppercase letters that are an ASCII
case-insensitive match for the attribute's name.
Attribute values are a mixture of text and character references,
except with the additional restriction that the text cannot contain an
ambiguous ampersand.
Attributes can be specified in four different ways:
Empty attribute syntax
Just the attribute name. The value is implicitly the empty string.
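Since the empty attribute syntax and an explicit empty string are the same value once parsed, one possible workaround is to serialize the start tag yourself and emit a bare name whenever the value is empty. The following is only a sketch under that assumption; serializeStartTag is a hypothetical helper, not something jQuery or the DOM provides, and it cannot tell which of the two spellings the input originally used.

```javascript
// Hypothetical helper, not part of jQuery or the DOM API.
// attrs is an array of [name, value] pairs.
function serializeStartTag(tagName, attrs) {
  const parts = attrs.map(([name, value]) => {
    if (value === '') return name; // empty attribute syntax: just the name
    const escaped = value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
    return name + '="' + escaped + '"';
  });
  return '<' + [tagName, ...parts].join(' ') + '>';
}

console.log(serializeStartTag('video', [['class', 'myClass'], ['allowfullscreen', '']]));
// <video class="myClass" allowfullscreen>
```

In a browser, the name/value pairs could be read from an element's attributes collection before being passed to a function like this.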

Must I escape strings before I set them as the value of a textarea?

In the following scenario:
var evil_string = "...";
$('#mytextarea').val(evil_string);
Do I have to escape an untrusted string before using it as the value of a textarea element?
I understand that I will have to handle the string with care if I want to do anything with it later on, but is the act of putting the string in a textarea without escaping inherently dangerous?
I have done some basic testing and the usual special characters &'"< seem to be successfully added to the textarea without interpretation.
No, you don't need to do that. When you assign directly to a property of a DOM element (which jQuery's .val does under the hood), the data is interpreted verbatim. You only need to escape text when using methods that explicitly treat their input as HTML, i.e. innerHTML/outerHTML and the like.
Putting unescaped strings as values of textboxes or textareas is fine. You only need to worry about it when you are putting strings in your HTML that could potentially be interpreted as other HTML. Generally speaking, this means you should escape the strings when the text could be a child of some HTML DOM Element. This could be done on the server (as lolka_bolka suggested), or on the client before adding the potentially dangerous string to the DOM.

unescape in javascript not working when %26 ( & sign) is in value

I have the below code in my JSP. UI displays every character correctly other than "&".
<c:out value="<script>var escapedData=unescape('${column}');
$('div').html(escapedData);</script>" escapeXml="false" /> </div>
E.g. 1) working case
input = ni!er#
Value in my escapedData variable is ni%21er%40. Now when I put it in my div using
$('div').html(escapedData); then o/p on html is as expected
E.g. 2) Issue case
input = nice&
Value in my escapedData variable is nice%26. Now when I put it in my div using
$('div').html(escapedData); then also it displays below
$('#test20').html('nice%26');
However, when output is displayed in JSP, it just prints "nice". It truncates everything after &.
Any suggestions?
It looks like you have some misunderstandings about what unescape(val)/escape(val) do and where you need them, and about what you need to pay attention to when you use .html().
HTML and URIs have certain characters that have special meanings. The most important ones are:
HTML: <, >, &
URI: /,?,%,&
If you want to use one of those characters in HTML or URI you need to escape them.
The escaping for URI and for HTML are different.
The functions unescape/escape (deprecated) and decodeURI/encodeURI are for URIs. But what you want is to escape your data into the HTML format.
There is no built-in function in JS that does this, but you could e.g. use the code from the answer to this question: Can I escape html special chars in javascript?.
But as it seems that you use jQuery, you could just use .text instead of .html, as this will do the escaping for you.
An additional note:
I'm pretty sure that the var escapedData=unescape('${column}'); does not do anything. I assume that ${column} already is ni!er#/nice&.
So please check your source code. If var escapedData=unescape('${column}'); will look like var escapedData=unescape('ni!er#'); then you should remove the unescape otherwise you would not get the expected result if the ${column} contains something like e.g. %23.
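To make the difference between URI escaping and HTML escaping concrete, here is a small sketch. The escapeHtml helper is a hypothetical stand-in for the kind of function the linked question describes; the sample values are taken from the question.

```javascript
// Hypothetical helper: escape text so it can be inserted as HTML content.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// URI decoding turns percent-escapes back into characters...
const decoded = decodeURIComponent('nice%26');
console.log(decoded); // nice&

// ...while HTML escaping is a separate step, needed before passing text to .html().
const safeForHtml = escapeHtml(decoded);
console.log(safeForHtml); // nice&amp;
```

With jQuery, passing the decoded string to .text() instead of .html() avoids the need for the second step.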

jQuery URI encode (char &) .html() attribute value

I've read a lot of the HTML encoding posts over the last day to solve this. I just managed to locate the problem.
Basically, I have set an attribute on an embed tag with jQuery. It all works fine in the browser.
Now I want to read the HTML itself and add the result as the value of an input field, so the user can copy and paste it.
The PROBLEM is that the .html() function (and also plain JS .innerHTML) converts the '&' char into '& amp;' (without the space). Using different HTML encoder functions doesn't make a difference. I need the '&' char in the embed code.
Here is the code:
HTML:
<div id="preview_small">
<object><embed src="main.swf?XY=xyz&YXX=xyzz"></embed>
</object></div>
jQuery:
$("#preview_small object").clone().html();
returns
... src=main.swf?XY=xyz&amp;YXX=xyzz ...
When I use:
$("#preview_small object").clone().children("embed").attr("src");
returns
main.swf?XY=xyz&YXX=xyzz
Any ideas how I can get the '&' char direct, without using regex after I got the string with .html()
I need the & char in the embed code.
No you don't. This:
<embed src="xyz&YXX=xyz"></embed>
is invalid HTML. It'll work in browsers since they try to fix up mistakes like this, but only as long as the string YXX doesn't happen to match an HTML entity name. You don't want to rely on that.
This:
<embed src="xyz&amp;YXX=xyz"></embed>
is correct, works everywhere, and is the version you should be telling your users to copy and paste.
attr("src") returns xyz&YXX=xyz
Yes, that's the underlying value of that attribute. Attribute values and text content can contain almost any character directly. It's only the HTML serialisation of them where they have to be encoded:
<div title="a&lt;b&quot;&amp;c&gt;d">
$('div').attr('title') -> a<b"&c>d
I want to read the HTML itself to add the result as a value for an input field
<textarea id="foo"></textarea>
$('#foo').val($('#preview_small object').html());
However note that the serialised output of innerHTML/html() is not in any particular fixed dialect of HTML, and in particular IE may give you code that, though generally understandable by browsers, is also not technically valid:
$('#somediv').html('<div title="a/b"></div>');
$('#somediv').html() -> '<DIV title=a/b></DIV>' - missing quotes
So if you know the particular format of HTML you want to present to the user, you may be better off generating it yourself:
function encodeHTML(s) {
return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}
var src= 'XY=xyz&YXX=xyzz';
$('#foo').val('<embed src="'+encodeHTML(src)+'"><\/embed>');
(The \/ in the close tag is just so that doesn't get mistaken as the end of a <script> block, in case you're in one.)
