changing string delimiters to backticks : possible impact? - javascript

ES6 introduced template strings delimited by backticks `.
In which cases would replacing single ' or double " quotes around a string by backticks yield a different result, or otherwise be unsafe ?
Escaping of existing backticks inside the code is performed as part of the operation.
// before
var message = "Display backtick ` on screen";
// after
var message = `Display backtick \` on screen`;
I understand that any string containing ${...} would fail as it would be (mis)interpreted as a placeholder. Are there any other relevant patterns ?
Context : this is for the development of a JS compression tool that automatically processes input code. The latter, and the strings it contains is user-provided, hence I have no control over its contents. The only assumption one can make is that it is valid Javascript.
Execution environment can by any recent browser or Node.js.

You can check the grammar of string literals and no-substitution-templates.
Apart from the ` vs. ' vs. " and the special meaning of ${ that you already mentioned, only line breaks differ between the two forms. We can ignore line continuations ("escaped linebreaks") as your minifier would strip them away anyway, but plain line breaks are valid inside templates while not inside string literals, so if you convert back and forth you'll have to care about those as well. You can even use them to save another byte if your string literal contains \n.

Related

Escape string for concatentaion with string and template literals

I need to concatenate untrusted* data into a javascript string, but I need it to work for all types of strings (single quoted, double quoted, or backtick quoted)
And ideally, I need it to work for multiple string types at once
I could use string replace, but this is usually a bad idea.
I was using JSON.stringify, but this only escapes double quotes, not single or backtick.
Other answers that I've found deal with escaping only a single type of quote at a time (and never backticks).
An example of what I need:
untrustedData = 'a String with \'single quotes\', \"double quotes\" and \`backticks\` in it';
const someJS = `console.log(\`the thing is "${escapingFunctionHere(untrustedString)}"\`)`
someJS will be passed to new Function
* N.B. In my context "untrusted" here doesn't mean potentially malicous, but it does need to cope with quotes, escapes and the like.
I am building javascript code dynamically, the constructed code will not be in any way web-facing. In fact its likely that I am the only one who will use this tool directly or indirectly.
I am happy to accept the minimal associated risks
NOTE TO OTHERS: Be sure you understand the risks before doing this kind of thing.
For those interested, I am writing a parser creator. Given an input ebnf grammar file, it will output a JS class that can be used to parse things.
I really do need to output code here.
If all you need to do is escape single quotes ', double quotes " and backticks `, then using replace to prepend a backslash \ should be enough:
untrustedData.replace(/['"`]/g, '\\$&')
const untrustedData = 'I \'am\' "a `string`"';
console.log(untrustedData);
const escapedData = untrustedData.replace(/['"`]/g, '\\$&');
console.log(escapedData);

Regex works in Browser but not in NodeJs

On a browser this
"C:\\backup\\".replace(/\\\\/g, '\\')
returns:
"C:\backup\"
BUT in NodeJs v12.13
'C:\\backup\\'
Is it intended to not work the similar way?
If so, how to replace all double backslash to a single one in Node ?
In JavaScript string a single backslash is represented using "\\". That's because \" represents a single " character and doesn't terminate the string.
The regex you wrote actually has no effect because the string doesn't contain two backslashes next to one another.
You can (and should) use \\ in any place/function that expects a backslash.
alert("Backslash: \\");
// alert("Backslash: \"); // SyntaxError
However, the JavaScript console sometimes likes to display the sting in JavaScript notation, so it replaces \ with \\ in order to print valid code. This also sometimes happens when typing just a string in the console REPL.
console.log("%o", "\\");
console.log({ backslash: "\\"});
Notice the "sometimes". Per spec, the JavaScript console printer is implementation-defined. Although most consoles, including Node.js's, in these circumstances display "\\" (a valid JavaScript string), Chrome and other Chromium-based browsers display "\" (the actual string value, wrapped in ").
P.S.: There's also a way to make the parser parse a single backslash as a single backslash, using String.raw(), however, it has some limitations:
alert(String.raw `C:\Windows\System32\calc.exe`);
// alert(String.raw `C:\Windows\System32\`); // SyntaxError because string ends with \

Escape dotnet resources in javascript

I need to read dot ner reesources string in java script as mention below.
var resources = #Html.ResourceStrings("Home_General_", Resources.ResourceManager);
The above line will render all the resources (from Dot net resource file) which start with resource key as "Home_General_"
Some of the values from the resources are like "Hi "XYZ" are you there" i.e The string contains quotes character.
If the string has quotes the above call fails.
The one way to avoid this problem is escape the special character as "Hi \"XYZ\" are you there"
Any other way where we can avoid this, As I don't want to pollute my resource string with lot of escape (\) characters.
You need to Javascript-escape any string when you render it as a Javascript string literal.
You must also remove the outer quotes from the string resource; that should be text, not a half-valid Javascript expression.
Use code like this to retrieve a single resource string:
var resourceXYZ = '#Html.Raw(HttpUtility.JavaScriptStringEncode(Resources.ResourceManager.GetString("Home_General_XYZ")))';
We do the following:
We get the resource string via Resources.ResourceManager.GetString().
We pass the result to HttpUtility.JavaScriptStringEncode to escape any special characters in JavaScript.
We pass the result to Html.Raw() to prevent Razor from applying HTML encoding on this string.
We then output the text enclosed in single quote quaracters into the page.
The function Html.ResourceStrings is not a standard function that is part of MVC. Someone at your place must have written it. If you show us this code, we could tell you how to rewrite it to return valid JavaScript literals.
You could wrap your #Html.ResourcesString(...) with HttpUtility.JavaScriptStringEncode which will handle all escape issues.
var resources = #HttpUtility.JavaScriptStringEncode(Html.ResourceStrings("Home_General_", Resources.ResourceManager));

Javascript eval fails to evaluate json when escape characters are involved

I am using rhino javascript engine to evaluate the json. The Json structure is as following :
{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}
Since I am using Rhino js engine I can't use JSON.parse and JSON.stringify.
As you can see the json has embedded json, this json I am getting from a .net web api which is putting the escape character '\'. I am trying to replace that escape character in javascript but no help.
Is there any way in javascript where we can replace that escape character and use 'eval()' to evaluate the json.
Here's the code that I am trying
var json = '{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}';
var find = '\"';
var regex = new RegExp(find,'g');
var inj = json.replace(regex,'"');
var pl = eval('(' + inj +')');
confusing backslashes
The problem you are getting is due to the fact of not fully understanding escape characters, when you are more than one level of "string" deep. Whilst a single slash is fine for one level i.e:
"It is no coincidence that in no known language does the " +
"phrase \"As pretty as an Airport\" appear.";
If you take that and then wrap it in outer quotes:
'"It is no coincidence that in no known language does the "' +
'"phrase \"As pretty as an Airport\" appear."';
The backslashes (if supported as escape characters by the system parsing the string) work for the outer-most wrapping quote, not any of the inner quotes/strings as they were before. This means once the js engine has parsed the string, internally the string will be.
'"It is no coincidence that in no known language does the phrase "As pretty as an Airport" appear."';
Which makes it impossible to tell the difference between the " and the \" from the original string. In order to get around this, you need to escape the backslashes in the original string, before you wrap it. This has the result of one level of escaping being used by the JavaScript engine, but still leaving another level remaining within the string. e.g.
'"It is no coincidence that in no known language does the "' +
'"phrase \\"As pretty as an Airport\\" appear."';
Now when the string is parsed, internally it will be:
'"It is no coincidence that in no known language does the phrase \"As pretty as an Airport\" appear."';
ignore the my random Douglas Adams quotes being separated onto more than one line (using +), I've only done that for ease of reading within a fix width area. I've kept it parsable by JavaScript, just in case people copy and paste and expect things to work.
So in order to fix your issue, your JSON source (before placing in the JavaScript code) will have to look like this:
var json = '{"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\\"Id\\":\\"1\\",\\"Name\\":\\"Daugherty\\",\\"ContactName\\":\\"May C\\",\\"ContactEmail\\":\\"may.c#gamil.com\\",\\"Total\\":25,\\"Phone\\":\\"111-111-1111\\",\\"Type\\":\\"Daily\\",\\"Notes\\":[{\\"Comments\\":\\"One\\",\\"Date\\":\\"2014-11-27T00:00:00.000\\"},{\\"Comments\\":\\"Two\\",\\"Date\\":\\"2014-11-28T00:00:00.000\\"}],\\"ImportComplete\\":true,\\"RunComplete\\":true,\\"CompleteDate\\":\\"2014-07-31T00:00:00.000\\",\\"Amount\\":2400.00,\\"ProcessingComplete\\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"}';
You should find the above will eval directly, without any replacements.
In order to achieve the above programatically, you will have to see what the .NET system you are using offers in the way of escaping backslashes. I mainly work with PHP or Python on the server side. Using those languages you could use:
the $s and s strings below have been cropped for brevity.
<?php
$s = '{"DataName":"111","Id":"222"...';
$s = str_replace("\\", "\\\\", $s);
echo "var json = '$s';";
or ...
#!/usr/bin/env python
s = r'{"DataName":"111","Id":"222"...'
s = s.replace("\\", "\\\\")
print "var json = '" + s + "';"
another solution
It all depends on how you are requesting the content you are wrapping in the string in JavaScript. If you have the ability to write out your js from the server side (most likely with .NET). Like I have shown above with PHP or Python, you don't need to wrap the content in a string at all. You can instead just output the content without being wrapped in single quotes. JavaScript will then just parse and treat it as a literal object structure:
var jso = {"DataName":"111","Id":"222","Date":"2015-12-31T00:00:00","TextValue":"{\"Id\":\"1\",\"Name\":\"Daugherty\",\"ContactName\":\"May C\",\"ContactEmail\":\"may.c#gamil.com\",\"Total\":25,\"Phone\":\"111-111-1111\",\"Type\":\"Daily\",\"Notes\":[{\"Comments\":\"One\",\"Date\":\"2014-11-27T00:00:00.000\"},{\"Comments\":\"Two\",\"Date\":\"2014-11-28T00:00:00.000\"}],\"ImportComplete\":true,\"RunComplete\":true,\"CompleteDate\":\"2014-07-31T00:00:00.000\",\"Amount\":2400.00,\"ProcessingComplete\":true}","NumberValue":4444.5555,"DateValue":"2014-12-01T00:00:00"};
This works because JSON is just a more strict version of a JavaScript Object, and the quote/escape level you already have will work fine.
The only downside to the above solution is that you have to implicitly trust the source of where you are getting this data from, and it will always have to be well formed. If not, you could introduce parse errors or unwanted js into your code; which could be avoided with an eval/JSON.parse system.

How do I escape quotes in HTML attribute values?

I'm building up a row to insert in a table using jQuery by creating a html string, i.e.
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='"+data.name+"'/></td>";
row += "</tr>";
data.name is a string returned from an ajax call which could contain any characters. If it contains a single quote, ', it will break the HTML by defining the end of the attribute value.
How can I ensure that the string is rendered correctly in the browser?
Actually you may need one of these two functions (this depends on the context of use). These functions handle all kind of string quotes, and also protect from the HTML/XML syntax.
(1) The quoteattr() function for embeding text into HTML/XML:
The quoteattr() function is used in a context, where the result will not be evaluated by javascript but must be interpreted by an XML or HTML parser, and it must absolutely avoid breaking the syntax of an element attribute.
Newlines are natively preserved if generating the content of a text elements. However, if you're generating the value of an attribute this assigned value will be normalized by the DOM as soon as it will be set, so all whitespaces (SPACE, TAB, CR, LF) will be compressed, stripping leading and trailing whitespaces and reducing all middle sequences of whitespaces into a single SPACE.
But there's an exception: the CR character will be preserved and not treated as whitespace, only if it is represented with a numeric character reference! The result will be valid for all element attributes, with the exception of attributes of type NMTOKEN or ID, or NMTOKENS: the presence of the referenced CR will make the assigned value invalid for those attributes (for example the id="..." attribute of HTML elements): this value being invalid, will be ignored by the DOM. But in other attributes (of type CDATA), all CR characters represented by a numeric character reference will be preserved and not normalized. Note that this trick will not work to preserve other whitespaces (SPACE, TAB, LF), even if they are represented by NCR, because the normalization of all whitespaces (with the exception of the NCR to CR) is mandatory in all attributes.
Note that this function itself does not perform any HTML/XML normalization of whitespaces, so it remains safe when generating the content of a text element (don't pass the second preserveCR parameter for such case).
So if you pass an optional second parameter (whose default will be treated as if it was false) and if that parameter evaluates as true, newlines will be preserved using this NCR, when you want to generate a literal attribute value, and this attribute is of type CDATA (for example a title="..." attribute) and not of type ID, IDLIST, NMTOKEN or NMTOKENS (for example an id="..." attribute).
function quoteattr(s, preserveCR) {
preserveCR = preserveCR ? '
' : '\n';
return ('' + s) /* Forces the conversion to string. */
.replace(/&/g, '&') /* This MUST be the 1st replacement. */
.replace(/'/g, '&apos;') /* The 4 other predefined entities, required. */
.replace(/"/g, '"')
.replace(/</g, '<')
.replace(/>/g, '>')
/*
You may add other replacements here for HTML only
(but it's not necessary).
Or for XML, only if the named entities are defined in its DTD.
*/
.replace(/\r\n/g, preserveCR) /* Must be before the next replacement. */
.replace(/[\r\n]/g, preserveCR);
;
}
Warning! This function still does not check the source string (which is just, in Javascript, an unrestricted stream of 16-bit code units) for its validity in a file that must be a valid plain text source and also as valid source for an HTML/XML document.
It should be updated to detect and reject (by an exception):
any code units representing code points assigned to non-characters (like \uFFFE and \uFFFF): this is an Unicode requirement only for valid plain-texts;
any surrogate code units which are incorrectly paired to form a valid pair for an UTF-16-encoded code point: this is an Unicode requirement for valid plain-texts;
any valid pair of surrogate code units representing a valid Unicode code point in supplementary planes, but which is assigned to non-characters (like U+10FFFE or U+10FFFF): this is an Unicode requirement only for valid plain-texts;
most C0 and C1 controls (in the ranges \u0000..\u1F and \u007F..\u009F with the exception of TAB and newline controls): this is not an Unicode requirement but an additional requirement for valid HTML/XML.
Despite of this limitation, the code above is almost what you'll want to do. Normally. Modern javascript engine should provide this function natively in the default system object, but in most cases, it does not completely ensure the strict plain-text validity, not the HTML/XML validity. But the HTML/XML document object from which your Javascript code will be called, should redefine this native function.
This limitation is usually not a problem in most cases, because the source string are the result of computing from sources strings coming from the HTML/XML DOM.
But this may fail if the javascript extract substrings and break pairs of surrogates, or if it generates text from computed numeric sources (converting any 16-bit code value into a string containing that one-code unit, and appending those short strings, or inserting these short strings via replacement operations): if you try to insert the encoded string into a HTML/XML DOM text element or in an HTML/XML attribute value or element name, the DOM will itself reject this insertion and will throw an exception; if your javascript inserts the resulting string in a local binary file or sends it via a binary network socket, there will be no exception thrown for this emission. Such non-plain text strings would also be the result of reading from a binary file (such as an PNG, GIF or JPEG image file) or from your javascript reading from a binary-safe network socket (such that the IO stream passes 16-bit code units rather than just 8-bit units: most binary I/O streams are byte-based anyway, and text I/O streams need that you specify a charset to decode files into plain-text, so that invalid encodings found in the text stream will throw an I/O exception in your script).
Note that this function, the way it is implemented (if it is augmented to correct the limitations noted in the warning above), can be safely used as well to quote also the content of a literal text element in HTML/XML (to avoid leaving some interpretable HTML/XML elements from the source string value), not just the content of a literal attribute value ! So it should be better named quoteml(); the name quoteattr() is kept only by tradition.
This is the case in your example:
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = '';
row += '<tr>';
row += '<td>Name</td>';
row += '<td><input value="' + quoteattr(data.value) + '" /></td>';
row += '</tr>';
Alternative to quoteattr(), using only the DOM API:
The alternative, if the HTML code you generate will be part of the current HTML document, is to create each HTML element individually, using the DOM methods of the document, such that you can set its attribute values directly through the DOM API, instead of inserting the full HTML content using the innerHTML property of a single element :
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = document.createElement('tr');
var cell = document.createElement('td');
cell.innerText = 'Name';
row.appendChild(cell);
cell = document.createElement('td');
var input = document.createElement('input');
input.setAttribute('value', data.value);
cell.appendChild(input);
tr.appendChild(cell);
/*
The HTML code is generated automatically and is now accessible in the
row.innerHTML property, which you are not required to insert in the
current document.
But you can continue by appending tr into a 'tbody' element object, and then
insert this into a new 'table' element object, which ou can append or insert
as a child of a DOM object of your document.
*/
Note that this alternative does not attempt to preserve newlines present in the data.value, because you're generating the content of a text element, not an attribute value here. If you really want to generate an attribute value preserving newlines using 
, see the start of section 1, and the code within quoteattr() above.
(2) The escape() function for embedding into a javascript/JSON literal string:
In other cases, you'll use the escape() function below when the intent is to quote a string that will be part of a generated javascript code fragment, that you also want to be preserved (that may optionally also be first parsed by an HTML/XML parser in which a larger javascript code could be inserted):
function escape(s) {
return ('' + s) /* Forces the conversion to string. */
.replace(/\\/g, '\\\\') /* This MUST be the 1st replacement. */
.replace(/\t/g, '\\t') /* These 2 replacements protect whitespaces. */
.replace(/\n/g, '\\n')
.replace(/\u00A0/g, '\\u00A0') /* Useful but not absolutely necessary. */
.replace(/&/g, '\\x26') /* These 5 replacements protect from HTML/XML. */
.replace(/'/g, '\\x27')
.replace(/"/g, '\\x22')
.replace(/</g, '\\x3C')
.replace(/>/g, '\\x3E')
;
}
Warning! This source code does not check for the validity of the encoded document as a valid plain-text document. However it should never raise an exception (except for out of memory condition): Javascript/JSON source strings are just unrestricted streams of 16-bit code units and do not need to be valid plain-text or are not restricted by HTML/XML document syntax. This means that the code is incomplete, and should also replace:
all other code units representing C0 and C1 controls (with the exception of TAB and LF, handled above, but that may be left intact without substituting them) using the \xNN notation;
all code units that are assigned to non-characters in Unicode, which should be replaced using the \uNNNN notation (for example \uFFFE or \uFFFF);
all code units usable as Unicode surrogates in the range \uD800..\DFFF, like this:
if they are not correctly paired into a valid UTF-16 pair representing a valid Unicode code point in the full range U+0000..U+10FFFF, these surrogate code units should be individually replaced using the notation \uDNNN;
else if if the code point that the code unit pair represents is not valid in Unicode plain-text, because the code point is assigned to a non-character, the two code points should be replaced using the notation \U00NNNNNN;
finally, if the code point represented by the code unit (or the pair of code units representing a code point in a supplementary plane), independently of if that code point is assigned or reserved/unassigned, is also invalid in HTML/XML source documents (see their specification), the code point should be replaced using the \uNNNN notation (if the code point is in the BMP) or the \u00NNNNNN (if the code point is in a supplementary plane) ;
Note also that the 5 last replacements are not really necessary. But it you don't include them, you'll sometimes need to use the <![CDATA[ ... ]]> compatibility "hack" in some cases, such as further including the generated javascript in HTML or XML (see the example below where this "hack" is used in a <script>...</script> HTML element).
The escape() function has the advantage that it does not insert any HTML/XML character reference, the result will be first interpreted by Javascript and it will keep later at runtime the exact string length when the resulting string will be evaluated by the javascript engine. It saves you from having to manage mixed context throughout your application code (see the final section about them and about the related security considerations). Notably because if you use quoteattr() in this context, the javascript evaluated and executed later would have to explicitly handle character references to re-decode them, something that would not be appropriate. Usage cases include:
when the replaced string will be inserted in a generated javascript event handler surrounded by some other HTML code where the javascript fragment will contain attributes surrounded by literal quotes).
when the replaced string will be part of a settimeout() parameter which will be later eval()ed by the Javascript engine.
Example 1 (generating only JavaScript, no HTML content generated):
var title = "It's a \"title\"!";
var msg = "Both strings contain \"quotes\" & 'apostrophes'...";
setTimeout(
'__forceCloseDialog("myDialog", "' +
escape(title) + '", "' +
escape(msg) + '")',
2000);
Exemple 2 (generating valid HTML):
var msg =
"It's just a \"sample\" <test>.\n\tTry & see yourself!";
/* This is similar to the above, but this JavaScript code will be reinserted below: */
var scriptCode =
'alert("' +
escape(msg) + /* important here!, because part of a JS string literal */
'");';
/* First case (simple when inserting in a text element): */
document.write(
'<script type="text/javascript">' +
'\n//<![CDATA[\n' + /* (not really necessary but improves compatibility) */
scriptCode +
'\n//]]>\n' + /* (not really necessary but improves compatibility) */
'</script>');
/* Second case (more complex when inserting in an HTML attribute value): */
document.write(
'<span onclick="' +
quoteattr(scriptCode) + /* important here, because part of an HTML attribute */
'">Click here !</span>');
In this second example, you see that both encoding functions are simultaneously used on the part of the generated text that is embedded in JavaScript literals (using escape()), with the the generated JavaScript code (containing the generated string literal) being itself embedded again and re-encoded using quoteattr(), because that JavaScript code is inserted in an HTML attribute (in the second case).
(3) General considerations for safely encoding texts to embed in syntactic contexts:
So in summary,
the quotattr() function must be used when generating the content of an HTML/XML attribute literal, where the surrounding quotes are added externally within a concatenation to produce a complete HTML/XML code.
the escape() function must be used when generating the content of a JavaScript string constant literal, where the surrounding quotes are added externally within a concatenation to produce a complete HTML/XML code.
If used carefully, and everywhere you will find variable contents to safely insert into another context, and under only these rules (with the functions implemented exactly like above which takes care of "special characters" used in both contexts), you may mix both via multiple escaping, and the transform will still be safe, and will not require additional code to decode them in the application using those literals. Do not use these functions.
Those functions are only safe in those strict contexts (i.e. only HTML/XML attribute values for quoteattr(), and only Javascript string literals for escape()).
There are other contexts using different quoting and escaping mechanisms (e.g. SQL string literals, or Visual Basic string literals, or regular expression literals, or text fields of CSV data files, or MIME header values), which will each require their own distinct escaping function used only in these contexts:
Never assume that quoteattr() or escape() will be safe or will not alter the semantic of the escaped string, before checking first, that the syntax of (respectively) HTML/XML attribute values or JavaScript string literals will be natively understood and supported in those contexts.
For example the syntax of Javascript string literals generated by escape() is also appropriate and natively supported in the two other contexts of string literals used in Java programming source code, or text values in JSON data.
But the reverse is not always true. For example:
Interpreting the encoded escaped literals initially generated for other contexts than Javascript string literals (including for example string literals in PHP source code), is not always safe for direct use as Javascript literals. through the javascript eval() system function to decode those generated string literals that were not escaped using escape(), because those other string literals may contain other special characters generated specifically to those other initial contexts, which will be incorrectly interpreted by Javascript, this could include additional escapes such as "\Uxxxxxxxx", or "\e", or "${var}" and "$$", or the inclusion of additional concatenation operators such as ' + " which changes the quoting style, or of "transparent" delimiters, such as "<!--" and "-->" or "<[DATA[" and "]]>" (that may be found and safe within a different only complex context supporting multiple escaping syntaxes: see below the last paragraph of this section about mixed contexts).
The same will apply to the interpretation/decoding of encoded escaped literals that were initially generated for other contexts that HTML/XML attributes values in documents created using their standard textual representation (for example, trying to interpret the string literals that were generated for embedding in a non standard binary format representation of HTML/XML documents!)
This will also apply to the interpretation/decoding with the javascript function eval() of string literals that were only safely generated for inclusion in HTML/XML attribute literals using quotteattr(), which will not be safe, because the contexts have been incorrectly mixed.
This will also apply to the interpretation/decoding with an HTML/XML text document parser of attribute value literals that were only safely generated for inclusion in a Javascript string literal using escape(), which will not be safe, because the contexts have also been incorrectly mixed.
(4) Safely decoding the value of embedded syntactic literals:
If you want to decode or interpret string literals in contexts were the decoded resulting string values will be used interchangeably and indistinctly without change in another context, so called mixed contexts (including, for example: naming some identifiers in HTML/XML with string literals initially safely encoded with quotteattr(); naming some programming variables for Javascript from strings initially safely encoded with escape(); and so on...), you'll need to prepare and use a new escaping function (which will also check the validity of the string value before encoding it, or reject it, or truncate/simplify/filter it), as well as a new decoding function (which will also carefully avoid interpreting valid but unsafe sequences, only accepted internally but not acceptable for unsafe external sources, which also means that decoding function such as eval() in javascript must be absolutely avoided for decoding JSON data sources, for which you'll need to use a safer native JSON decoder; a native JSON decoder will not be interpreting valid Javascript sequences, such as the inclusion of quoting delimiters in the literal expression, operators, or sequences like "{$var}"), to enforce the safety of such mapping!
These last considerations about the decoding of literals in mixed contexts, that were only safely encoded with any syntax for the transport of data to be safe only a a more restrictive single context, is absolutely critical for the security of your application or web service. Never mix those contexts between the encoding place and the decoding place, if those places do not belong to the same security realm (but even in that case, using mixed contexts is always very dangerous, it is very difficult to track precisely in your code.
For this reason I recommend you never use or assume mixed contexts anywhere in your application: instead write a safe encoding and decoding function for a single precide context that has precise length and validity rules on the decoded string values, and precise length and validity rules on the encoded string string literals. Ban those mixed contexts: for each change of context, use another matching pair of encoding/decoding functions (which function is used in this pair depends on which context is embedded in the other context; and the pair of matching functions is also specific to each pair of contexts).
This means that:
To safely decode an HTML/XML attribute value literal that has been initially encoded with quoteattr(), you must '''not''' assume that it has been encoded using other named entities whose value will depend on a specific DTD defining it. You must instead initialize the HTML/XML parser to support only the few default named character entities generated by quoteattr() and optionally the numeric character entities (which are also safe is such context: the quoteattr() function only generates a few of them but could generate more of these numeric character references, but must not generate other named character entities which are not predefined in the default DTD). All other named entities must be rejected by your parser, as being invalid in the source string literal to decode. Alternatively you'll get better performance by defining an unquoteattr function (which will reject any presence of literal quotes within the source string, as well as unsupported named entities).
To safely decode a Javascript string literal (or JSON string literal) that has been initially encoded with escape(), you must use the safe JavaScript unescape() function, but not the unsafe Javascript eval() function!
Examples for these two associated safe decoding functions follow.
(5) The unquoteattr() function to parse text embedded in HTML/XML text elements or attribute values literals:
function unquoteattr(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
ampersands, from start to end of ssource string, and parsing the
character(s) found immediately after after the ampersand.
*/
s = ('' + s); /* Forces the conversion to string type. */
/*
You may optionally start by detecting CDATA sections (like
`<![CDATA[` ... `]]>`), whose contents must not be reparsed by the
following replacements, but separated, filtered out of the CDATA
delimiters, and then concatenated into an output buffer.
The following replacements are only for sections of source text
found *outside* such CDATA sections, that will be concatenated
in the output buffer only after all the following replacements and
security checkings.
This will require a loop starting here.
The following code is only for the alternate sections that are
not within the detected CDATA sections.
*/
/* Decode by reversing the initial order of replacements. */
s = s
.replace(/\r\n/g, '\n') /* To do before the next replacement. */
.replace(/[\r\n]/, '\n')
.replace(/
/g, '\n') /* These 3 replacements keep whitespaces. */
.replace(/&#1[03];/g, '\n')
.replace(/ /g, '\t')
.replace(/>/g, '>') /* The 4 other predefined entities required. */
.replace(/</g, '<')
.replace(/"/g, '"')
.replace(/&apos;/g, "'")
;
/*
You may add other replacements here for predefined HTML entities only
(but it's not necessary). Or for XML, only if the named entities are
defined in *your* assumed DTD.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* ampersand character.
Do not decode the '&' sequence here !
If you choose to support more numeric character entities, their
decoded numeric value *must* be assigned characters or unassigned
Unicode code points, but *not* surrogates or assigned non-characters,
and *not* most C0 and C1 controls (except a few ones that are valid
in HTML/XML text elements and attribute values: TAB, LF, CR, and
NL='\x85').
If you find valid Unicode code points that are invalid characters
for XML/HTML, this function *must* reject the source string as
invalid and throw an exception.
In addition, the four possible representations of newlines (CR, LF,
CR+LF, or NL) *must* be decoded only as if they were '\n' (U+000A).
See the XML/HTML reference specifications !
*/
/* Required check for security! */
var found = /&[^;]*;?/.match(s);
if (found.length >0 && found[0] != '&')
throw 'unsafe entity found in the attribute literal content';
/* This MUST be the last replacement. */
s = s.replace(/&/g, '&');
/*
The loop needed to support CDATA sections will end here.
This is where you'll concatenate the replaced sections (CDATA or
not), if you have splitted the source string to detect and support
these CDATA sections.
Note that all backslashes found in CDATA sections do NOT have the
semantic of escapes, and are *safe*.
On the opposite, CDATA sections not properly terminated by a
matching `]]>` section terminator are *unsafe*, and must be rejected
before reaching this final point.
*/
return s;
}
Note that this function does not parse the surrounding quote delimiters which are used
to surround HTML attribute values. This function can in fact decode any HTML/XML text element content as well, possibly containing literal quotes, which are safe. It's your reponsability of parsing the HTML code to extract quoted strings used in HTML/XML attributes, and to strip those matching quote delimiters before calling the unquoteattr() function.
(6) The unescape() function to parse text contents embedded in Javascript/JSON literals:
function unescape(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
backslashes, from start to end of source string, and parsing and
dispatching the character found immediately after the backslash, if it
must be followed by additional characters such as an octal or
hexadecimal 7-bit ASCII-only encoded character, or an hexadecimal Unicode
encoded valid code point, or a valid pair of hexadecimal UTF-16-encoded
code units representing a single Unicode code point.
8-bit encoded code units for non-ASCII characters should not be used, but
if they are, they should be decoded into a 16-bit code units keeping their
numeric value, i.e. like the numeric value of an equivalent Unicode
code point (which means ISO 8859-1, not Windows 1252, including C1 controls).
Note that Javascript or JSON does NOT require code units to be paired when
they encode surrogates; and Javascript/JSON will also accept any Unicode
code point in the valid range representable as UTF-16 pairs, including
NULL, all controls, and code units assigned to non-characters.
This means that all code points in \U00000000..\U0010FFFF are valid,
as well as all 16-bit code units in \u0000..\uFFFF, in any order.
It's up to your application to restrict these valid ranges if needed.
*/
s = ('' + s) /* Forces the conversion to string. */
/* Decode by reversing the initial order of replacements */
.replace(/\\x3E/g, '>')
.replace(/\\x3C/g, '<')
.replace(/\\x22/g, '"')
.replace(/\\x27/g, "'")
.replace(/\\x26/g, '&') /* These 5 replacements protect from HTML/XML. */
.replace(/\\u00A0/g, '\u00A0') /* Useful but not absolutely necessary. */
.replace(/\\n/g, '\n')
.replace(/\\t/g, '\t') /* These 2 replacements protect whitespaces. */
;
/*
You may optionally add here support for other numerical or symbolic
character escapes.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* backslash character.
Do not decode to any doubled backslashes here !
*/
/* Required check for security! */
var found = /\\[^\\]?/.match(s);
if (found.length > 0 && found[0] != '\\\\')
throw 'Unsafe or unsupported escape found in the literal string content';
/* This MUST be the last replacement. */
return s.replace(/\\\\/g, '\\');
}
Note that this function does not parse the surrounding quote delimiters which are used
to surround Javascript or JSON string literals. It's your responsibility of parsing the Javascript or JSON source code to extract quoted strings literals, and to strip those matching quote delimiters before calling the unescape() function.
You just need to swap any ' characters with the equivalent HTML entity character code:
data.name.replace(/'/g, "'");
Alternatively, you could create the whole thing using jQuery's DOM manipulation methods:
var row = $("<tr>").append("<td>Name</td><td></td>");
$("<input>", { value: data.name }).appendTo(row.children("td:eq(1)"));
" = " or "
' = '
Examples:
<div attr="Tim "The Toolman" Taylor"
<div attr='Tim "The Toolman" Taylor'
<div attr="Tim 'The Toolman' Taylor"
<div attr='Tim 'The Toolman' Taylor'
In JavaScript strings, you use \ to escape the quote character:
var s = "Tim \"The Toolman\" Taylor";
var s = 'Tim \'The Toolman\' Taylor';
So, quote your attribute values with " and use a function like this:
function escapeAttrNodeValue(value) {
return value.replace(/(&)|(")|(\u00A0)/g, function(match, amp, quote) {
if (amp) return "&";
if (quote) return """;
return " ";
});
}
My answer is partially based on Andy E and I still recommend reading what verdy_p wrote, but here it is
$("<a>", { href: 'very<script>\'b"ad' }).text('click me')[0].outerHTML
Disclaimer: this is answer not to exact question, but just "how to escape attribute"
Using Lodash:
const serialised = _.escape("Here's a string that could break HTML");
// Add it into data-attr in HTML
<a data-value-serialised=" + serialised + " onclick="callback()">link</a>
// and then at JS where this value will be read:
function callback(e) {
$(e.currentTarget).data('valueSerialised'); // with a bit of help from jQuery
const originalString = _.unescape(serialised); // can be used as part of a payload or whatever.
}
The given answers seem rather complicated, so for my use case I have tried the built in encodeURIComponent and decodeURIComponent and have found they worked well, as per comments this does not escape ' but for that you can use escape() and unescape() methods instead.
I think you could do:
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value=\""+data.name+"\"/></td>";
row += "</tr>";
If you are worried about in data.name which is existing single quote.
In best case, you could create an INPUT element then setValue(data.name) for it.

Categories