Making a URL W3C valid AND work in Ajax Request - javascript

I have a generic function that returns URLs. (It's a plugin function that returns URLs to resources [images, stylesheets] within a plugin).
I use GET parameters in those URLs.
If I want to use these URLs within an HTML page, I need to mask ampersands as &amp; to pass W3C validation:
/plugin.php?plugin=xyz&amp;resource=stylesheet&amp;....
But if I use the URL as the "url" parameter of an AJAX call, the escaped ampersand is not interpreted correctly, which breaks my calls.
Can I do something to get &amp; to work in AJAX calls?
I would very much like to avoid adding parameters to the URL-generating function (intendedUse="ajax" or whatever) or manipulating the URL in JavaScript, as this plugin model will be re-used many times (and possibly by many people) and I want it to be as simple as possible.

It seems to me that you're running into the problem of having one piece of your application cross multiple layers. In this case it's the plugin.
URLs use the & token to separate key/value pairs in the query string (the general URL syntax is specified by RFC 1738). However, & is a reserved character in HTML and should therefore be escaped as &amp;. Since escaping the ampersands is an artifact of HTML, your plugin should probably not escape them directly. Instead, you should have a separate function that escapes a canonical URL so that it can be embedded in HTML markup.
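A minimal sketch of that separation, assuming the plugin can build its URLs with the standard URLSearchParams API. pluginResourceUrl and escapeHtmlAttr are hypothetical names, not part of any existing plugin system. The plugin always returns the canonical URL, and escaping happens only at the point where the URL is written into markup:

```javascript
// Hypothetical plugin helper: always returns the canonical, unescaped URL.
function pluginResourceUrl(plugin, resource) {
  const params = new URLSearchParams({ plugin: plugin, resource: resource });
  return "/plugin.php?" + params.toString();
}

// Escape a string for use inside a double-quoted HTML attribute.
function escapeHtmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const url = pluginResourceUrl("xyz", "stylesheet");
// For AJAX, use `url` as-is; for markup, escape it where it is embedded.
const markup = '<link rel="stylesheet" href="' + escapeHtmlAttr(url) + '">';
```

The same url string can be passed straight to an AJAX call; only code that generates HTML needs to call escapeHtmlAttr.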

The only place that this is likely to actually happen is if you are:
Using XHTML
Serving it as text/html
Using inline <script>
This is not a happy combination, and the solution is in the spec.
Use external scripts if your script uses < or & or ]]> or --.
The XHTML media types note includes the same advice, but also provides a workaround if you choose to ignore it.

Try returning JSON instead of just a string; that way your JavaScript can read the URL value from an object, and you shouldn't have that issue. Other than that, try simply HTML-decoding the string, using something like:
function decodeHTML (str)
{
// A <textarea>'s content is parsed as text, so entities such as &amp; are
// decoded, but no elements (and no event handlers) are created.
var textarea = document.createElement('textarea');
textarea.innerHTML = str;
return textarea.value;
};
Obviously you'll want to make sure you remove any reference to DOM elements you might create (which I've not done here to simplify the example).
I use this technique in the AJAX sites I create at my work and have used it many times to solve this problem.
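If the page only needs to undo the handful of entities that typical escaping functions produce, a DOM-free alternative is a small lookup-based decoder. This is a deliberately limited sketch (decodeBasicEntities is a hypothetical name), not a full HTML entity decoder: named entities such as &nbsp; and numeric references other than &#39; are left untouched.

```javascript
// Decode only the five entities that common HTML-escaping functions emit.
// A single regex pass means "&amp;lt;" becomes "&lt;", not "<".
function decodeBasicEntities(text) {
  var entities = { amp: "&", lt: "<", gt: ">", quot: '"', "#39": "'" };
  return String(text).replace(/&(amp|lt|gt|quot|#39);/g, function (match, name) {
    return entities[name];
  });
}

var ajaxUrl = decodeBasicEntities("/plugin.php?plugin=xyz&amp;resource=stylesheet");
```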

When you have markup of the form:
<a href="?a=1&amp;b=2">
Then the value of the href attribute is ?a=1&b=2. The &amp; is only an escape sequence in HTML/XML and doesn't affect the value of the attribute. This is similar to:
<a href="&lt;&gt;">
Where the value of the attribute is <>.
If, instead, you have code of the form:
<script>
var s = "?a=1&b=2";
</script>
Then you can use a JavaScript function:
<script>
var amp = String.fromCharCode(38);
var s = "?a=1"+amp+"b=2";
</script>
This allows code that would otherwise only be valid HTML or only valid XHTML to be valid in both. (See Dorwald's comments for more info.)
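JavaScript string escape sequences give the same result as String.fromCharCode(38) without a literal ampersand in the source, because the JavaScript engine, not the HTML parser, turns them into characters. A small comparison of the three forms:

```javascript
// Three ways to build "?a=1&b=2" without a literal "&" in the script source.
var amp = String.fromCharCode(38);      // the approach from the answer above
var viaCharCode = "?a=1" + amp + "b=2";
var viaUnicodeEscape = "?a=1\u0026b=2"; // \u0026 is U+0026, the ampersand
var viaHexEscape = "?a=1\x26b=2";       // \x26 is the same code point
```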

Related

Why does the javascript: protocol decode the URL automatically?

I am confused about why the javascript: protocol decodes the encoded URL. For example:
press
function myFunction(id)
{
alert(id); //it will generate =cDO4w67epn64o76
}
I am using these strings in encryption and decryption.
Please give me the real reason as well as a solution (the reason is very important to me). I know I can replace the (=) sign, but I am afraid the rest of the encoded strings will also be decoded by the wrapper.
Note: in PHP, the $_GET and $_REQUEST superglobals are URL-decoded automatically.
Because it's in an href attribute, where a URL is expected, the browser is "normalizing" the URI encoding of the "URL" (which is using the javascript: pseudo-scheme).
You can put it in a different attribute and then get that, like so:
function myFunction(element) {
console.log(element.getAttribute("data-value")); //it will generate =cDO4w67epn64o76
}
press
...although I discourage using onclick="..." handlers. Instead:
function linkHandler(e) {
console.log(this.getAttribute("data-value"));
e.preventDefault();
}
var links = document.querySelectorAll("a[data-value]");
Array.prototype.forEach.call(
links,
function(link) {
link.addEventListener("click", linkHandler, false);
}
);
press
A URL using the javascript: scheme is still a URL.
You've attempted to store a URL in a JavaScript string in a URL.
When decoding the outside URL into JavaScript, the percent encoded characters are decoded.
To do what you are attempting you need to convert any special characters (like %) in the JavaScript to URL encoding:
test
You should only use this for creating bookmarklets though.
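A sketch of that extra encoding step, assuming the script source is built in JavaScript: encodeURIComponent turns each % into %25, so the single percent-decoding the browser applies to a javascript: URL restores the original text. toJavascriptUrl is a hypothetical helper name, and decodeURIComponent is used only to approximate the browser's decoding step.

```javascript
// Build a javascript: URL whose script survives one round of percent-decoding.
// encodeURIComponent escapes "%" as "%25" (it leaves ( ) ' unescaped).
function toJavascriptUrl(source) {
  return "javascript:" + encodeURIComponent(source);
}

var source = "myFunction('%3DcDO4w67epn64o76')";
var href = toJavascriptUrl(source);
// href is "javascript:myFunction('%253DcDO4w67epn64o76')"
```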
If you want to run JavaScript when something is clicked, then use a click event handler. You could use an onclick attribute, but addEventListener is the modern approach (for values of modern equal to "not the 1990s").
Likewise, if you aren't linking somewhere, don't use a link. Use a button instead.

What is the right way to safely and accurately insert user-provided URL data into an HTML5 document?

Given an arbitrary customer input in a web form for a URL, I want to generate a new HTML document containing that URL within an href. My question is how am I supposed to protect that URL within my HTML.
What should be rendered into the HTML for the following URLs that are entered by an unknown end user:
http://example.com/?file=some_19%affordable.txt
http://example.com/url?source=web&last="f o o"&bar=<
https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22
If we assume that the URLs are already uri-encoded, which I think is reasonable if they are copying it from a URL bar, then simply passing it to attr() produces a valid URL and document that passes the Nu HTML checker at validator.w3.org/nu.
To see it in action, we set up a JS fiddle at https://jsfiddle.net/kamelkev/w8ygpcsz/2/ where replacing the URLs in there with the examples above can show what is happening.
For future reference, this consists of an HTML snippet
<a>My Link</a>
and this JS:
$(document).ready(function() {
$('a').attr('href', 'http://example.com/request.html?data=>');
$('a').attr('href2', 'http://example.com/request.html?data=<');
alert($('a').get(0).outerHTML);
});
So with URL 1, it is not possible to tell if it is URI encoded or not by looking at it mechanically. You can surmise based on your human knowledge that it is not, and is referring to a file named some_19%affordable.txt. When run through the fiddle, it produces
My Link
Which passes the HTML5 validator no problem. It likely is not what the user intended though.
The second URL is clearly not URI encoded. The question becomes what is the right thing to put into the HTML to prevent HTML parsing problems.
Running it through the fiddle, Safari 10 produces this:
My Link
and pretty much every other browser produces this:
My Link
Neither of these passes the validator. Three complaints are possible: the literal double quote (from un-escaping HTML), the spaces, or the trailing < character (also from un-escaping HTML). It just shows you the first of these it finds. This is clearly not valid HTML.
Two ways to try to fix this are: a) HTML-escape the URL before giving it to attr(). This, however, results in every & becoming &amp;, and entities such as &amp; and &lt; become double-escaped by attr(), so the URL in the document is entirely inaccurate. It looks like this:
My Link
The other is to URI-encode it before passing to attr(), which does result in a proper validating URL which actually clicks to the intended destination. It looks like this:
My Link
Finally, for the third URL, which is properly URI encoded, the proper HTML that validates does come out.
My Link
and it does what the user would expect to happen when clicked.
Based on this, the algorithm should be:
if url is encoded then
pass as-is to attr()
else
pass encodeURI(url) to attr()
However, based on these two earlier discussions (and indeed example URL 1), there seems to be no reliable way to detect that a URL is already encoded:
How to find out if string has already been URL encoded?
How to know if a URL is decoded/encoded?
If we bypass the attr() method and forcibly insert the HTML-escaped version of example URL 2 into the document structure, it would look like this:
My Link
It looks like valid HTML, yet it fails the HTML5 validator because it unescapes to invalid URL characters. Browsers, however, don't seem to mind it. Unfortunately, if you do any other manipulation of the object, the browser will re-escape all the &'s anyway.
As you can see, this is all very confusing. This is the first time we're using the browser itself to generate the HTML, and we are not sure if we are getting it right. Previously, we did it server side using templates, and only did the HTML-escape filter.
What is the right way to safely and accurately insert user-provided URL data into an HTML5 document (using JavaScript)?
If you can assume the URL is either encoded or not encoded, you may be able to get away with something along the lines of this. Try to decode the URL, treat an error as the URL not being encoded and you should be left with a decoded URL.
<script>
var inputurl = 'http://example.com/?file=some_19%affordable.txt';
var myurl;
try {
myurl = decodeURI(inputurl);
}
catch(error) {
myurl = inputurl;
}
console.log(myurl);
</script>
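A different approach, swapped in here rather than taken from the answer above, is to let the WHATWG URL parser (the URL constructor, available in modern browsers and Node.js) normalize the input. It percent-encodes characters that are not allowed in the query, such as spaces, " and <, while leaving existing %XX sequences alone, and it exposes the scheme, so javascript: and other non-http(s) URLs can be rejected. It does not resolve the ambiguity of URL 1: some_19%affordable.txt is kept as-is. normalizeUserUrl is a hypothetical name, and this is a sketch rather than a complete sanitization policy.

```javascript
// Returns a normalized absolute http(s) URL string, or null if the input
// cannot be parsed or uses another scheme (e.g. javascript:).
function normalizeUserUrl(input) {
  let parsed;
  try {
    parsed = new URL(String(input));
  } catch (e) {
    return null; // not an absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return null;
  }
  return parsed.href; // disallowed characters are now percent-encoded
}
```

The returned string can then be set with $('a').attr('href', url) or element.setAttribute('href', url); the browser takes care of HTML-escaping when the DOM is serialized.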

Javascript regex to replace ampersand in all links href on a page

I've been going through and trying to find an answer to this question that fits my need, but either I'm too much of a noob to make other use cases work, or they're not specific enough for my case.
Basically, I want to use JavaScript/jQuery to replace any and all ampersands (&) that may occur in a link's href on a web page with just the word "and". I've tried a couple of different versions of this with no luck:
var link = $("a").attr('href');
link.replace(/&/g, "and");
Thank you
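Your current code reads the href of the first matched link into a string and calls replace() on it, but replace() returns a new string (it does not modify the original), and nothing writes the result back to the element(s) in the DOM.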
You can instead achieve what you need by providing a function to attr() which will be executed against all elements in the matched set. Try this:
$("a").attr('href', function(i, value) {
return value.replace(/&/g, "and");
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
link
link
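For pages without jQuery, a rough vanilla-DOM equivalent loops over the links and writes each new value back with setAttribute. replaceInHrefs is a hypothetical helper; it assumes a NodeList with forEach (available in current browsers) or any array of objects with getAttribute/setAttribute.

```javascript
// Hypothetical helper: rewrite the href of every link in `links`.
function replaceInHrefs(links, search, replacement) {
  links.forEach(function (link) {
    var href = link.getAttribute("href");
    if (href !== null) { // skip <a> elements without an href
      link.setAttribute("href", href.replace(search, replacement));
    }
  });
}

// In a browser:
// replaceInHrefs(document.querySelectorAll("a[href]"), /&/g, "and");
```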
Sometimes when replacing &amp;, I've found that even though I replaced &, I still have amp; left over. There is a fix for this:
var newUrl = "@Model.UrlToRedirect".replace(/&/gi, '%').replace(/%amp;/gi, '&');
With this solution you replace & twice, and it works. In my particular case, in an MVC app (window.location.href = @Model.UrlToRedirect), the URL was already partially encoded and had a query string. I tried encoding/decoding, the C# Uri class, escape(), everything, before coming up with this solution. The problem with my logic above is that other things could blow up the query string later. One solution is to put a hidden field or input on the form like this:
<input type="hidden" value="@Model.UrlToRedirect" id="url-redirect" />
then in your JavaScript:
window.location.href = document.getElementById("url-redirect").value;
This way, JavaScript won't take the C# string and change it.
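To illustrate how the two-step trick can blow up the query string: it turns every bare & into %, and only the & that belonged to &amp; is restored. If the only problem is the &amp; entity, a single targeted replace avoids that. Both function names are hypothetical, and the URL is an invented example.

```javascript
// The two-step trick: every "&" becomes "%", then "%amp;" becomes "&".
function twoStepDecode(url) {
  return url.replace(/&/gi, "%").replace(/%amp;/gi, "&");
}

// A targeted alternative: only the "&amp;" entity is touched.
function decodeAmpEntity(url) {
  return url.replace(/&amp;/g, "&");
}

var example = "/redirect?x=1&amp;y=2&z=3";
// twoStepDecode(example)   -> "/redirect?x=1&y=2%z=3" (the bare & is lost)
// decodeAmpEntity(example) -> "/redirect?x=1&y=2&z=3"
```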

What is dangerous in an HTML textarea?

I'm working on the profile section of my users.
They can define several things including their description ("About me") in a textarea, with a max of 400 characters.
In this description, I want to let my users use Font Awesome and Bootstrap icons. I also let them use JS tags (but not PHP ones). I guess this is pretty dangerous, so I want to know:
Is letting people use JS tags dangerous? I know I must block functions like $.ajax, but maybe there are other things too.
Does a function that blocks strings containing JS or PHP code exist in JS or jQuery?
Is letting people use HTML tags and attributes dangerous for my site?
Thank you !
As long as you escape all the tags before saving the form, I think it's all good.
You can do this with the following function:
function escapeTags(value){
return $('<div/>').text(value).html();
}
For example, <script>alert("hello world")</script> will become &lt;script&gt;alert("hello world")&lt;/script&gt;.
Also, you can do this with javascript only:
function htmlEntities(str) {
return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}
Source: https://css-tricks.com/snippets/javascript/htmlentities-for-javascript/
...if you take a look at the comments there, you'll see that there's also a function that reverses my escapeTags function:
// Encode/decode htmlentities
// $j is assumed to be jQuery in noConflict mode, e.g. var $j = jQuery.noConflict();
function krEncodeEntities(s){
return $j("<div/>").text(s).html();
}
function krDencodeEntities(s){
return $j("<div/>").html(s).text();
}
Yes, it's in fact very dangerous, and should only be allowed on a very limited set of "tags" or function calls.
Several systems, such as BBCode, exist specifically to address these issues. I suggest implementing one of those.
They're easier to validate, and it is fairly easy to add new features to them.
It should be less work than validating actual js and php code and figuring out whether or not it is trying to do something malicious.
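A minimal sketch of the whitelist idea: escape everything the user typed, then convert a small, fixed set of BBCode-style tags into HTML. The tag set ([b], [i], [icon=NAME]) and the Font Awesome 4-style class names are assumptions for illustration, not an existing library's syntax.

```javascript
// Escape all HTML first so nothing the user typed is treated as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical whitelist: [b]...[/b], [i]...[/i] and [icon=NAME].
function renderProfileMarkup(input) {
  let html = escapeHtml(input);
  html = html.replace(/\[b\]([\s\S]*?)\[\/b\]/g, "<strong>$1</strong>");
  html = html.replace(/\[i\]([\s\S]*?)\[\/i\]/g, "<em>$1</em>");
  // Icon names may only contain letters, digits and hyphens.
  html = html.replace(/\[icon=([a-z0-9-]+)\]/gi,
    '<i class="fa fa-$1" aria-hidden="true"></i>');
  return html;
}
```

Anything not on the whitelist, including <script> tags, stays escaped text, and the restricted icon-name pattern keeps user input from breaking out of the class attribute.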

JavaScript multiline strings and templating?

I have been wondering if there is a way to define multiline strings in JavaScript like you can do in languages like PHP:
var str = "here
goes
another
line";
Apparently this breaks up the parser. I found that placing a backslash \ in front of the line feed solves the problem:
var str = "here\
goes\
another\
line";
Or I could just close and reopen the string quotes again and again.
The reason I am asking is that I am making JavaScript-based UI widgets that use HTML templates written in JavaScript. It is painful to type HTML in strings, especially if you need to open and close quotes all the time. What would be a good way to define HTML templates within JavaScript?
I am considering using separate HTML files and a compilation system to make everything easier, but the library is distributed to other developers, so the HTML templates have to be easy for them to include.
No, that's basically what you have to do to get multiline strings.
But why define the templates in JavaScript anyway? Why not just put them in a file and have an Ajax call load them into a variable when you need them?
For instance (using jQuery):
$.get('/path/to/template.html', function(data) {
alert(data); //will alert the template code
});
@slebetman, thanks for the detailed example.
Quick comment on the substitute_strings function.
I had to revise
str.replace(n,substitutions[n]);
to be
str = str.replace(n,substitutions[n]);
to get it to work. (jQuery version 1.5? - it is pure javascript though.)
Also, when I had the following situation in my template:
$CONTENT$ repeated twice $CONTENT$ like this
I had to do additional processing to get it to work:
str = str.replace(new RegExp(n, 'g'), substitutions[n]);
And I had to avoid $ (a regex special character) as the delimiter and use # instead.
Thought I would share my findings.
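Both workarounds can be avoided by escaping the key before building the RegExp and by passing a replacer function, so that $ is safe as a delimiter and $ sequences in the value are not treated as replacement patterns. A sketch; substituteAll and escapeRegExp are hypothetical names (escapeRegExp follows the pattern commonly shown in MDN's regular-expressions guide).

```javascript
// Escape characters that have special meaning in a RegExp pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every occurrence of every key in `substitutions`.
function substituteAll(str, substitutions) {
  for (const key of Object.keys(substitutions)) {
    const pattern = new RegExp(escapeRegExp(key), "g");
    // A replacer function keeps "$&", "$1", etc. in the value literal.
    str = str.replace(pattern, function () {
      return String(substitutions[key]);
    });
  }
  return str;
}
```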
There are several templating systems in javascript. However, my personal favorite is one I developed myself using ajax to fetch XML templates. The templates are XML files which makes it easy to embed HTML cleanly and it looks something like this:
<template>
<!-- XML allows only one root element; the name "template" is arbitrary -->
<title>This is optional</title>
<body><![CDATA[
HTML content goes here, the CDATA block prevents XML errors
when using non-xhtml html.
<div id="more">
$CONTENT$ may be substituted using replace() before being
inserted into $DOCUMENT$.
</div>
]]></body>
<script><![CDATA[
/* javascript code to be evaled after template
* is inserted into document. This is to get around
* the fact that this templating system does not
* have its own turing complete programming language.
* Here's an example use:
*/
if ($HIDE_MORE$) {
document.getElementById('more').style.display = 'none';
}
]]></script>
</template>
And the javascript code to process the template goes something like this:
function insertTemplate (url_to_template, insertion_point, substitutions) {
// The Ajax call depends on the library you're using; this is my own style:
ajax(url_to_template, function (request) {
var xml = request.responseXML;
// getElementsByTagName() always returns a (possibly empty) collection,
// so test .length, and pass the element's text rather than the element.
var title = xml.getElementsByTagName('title');
if (title.length) {
insertion_point.innerHTML += substitute_strings(title[0].textContent, substitutions);
}
var body = xml.getElementsByTagName('body');
if (body.length) {
insertion_point.innerHTML += substitute_strings(body[0].textContent, substitutions);
}
var script = xml.getElementsByTagName('script');
if (script.length) {
eval(substitute_strings(script[0].textContent, substitutions));
}
});
}
function substitute_strings (str, substitutions) {
for (var n in substitutions) {
// replace() returns a new string; it does not modify str in place.
str = str.replace(n, substitutions[n]);
}
return str;
}
The way to call the template would be:
insertTemplate('http://path.to.my.template', myDiv, {
'$CONTENT$' : "The template's content",
'$DOCUMENT$' : "the document",
'$HIDE_MORE$' : 0
});
The $ sign for substituted strings is merely a convention; you may use %, # or whatever delimiters you prefer. It's just there to make the part to be substituted unambiguous.
One big advantage to using substitutions on the javascript side instead of server side processing of the template is that this allows the template to be plain static files. The advantage of that (other than not having to write server side code) is that you can then set the caching policy for the template to be very aggressive so that the browser only needs to fetch the template the first time you load it. Subsequent use of the template would come from cache and would be very fast.
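The HTTP caching described above can be complemented by an in-memory cache, so that a template requested several times on one page is fetched only once. A sketch, assuming templates are loaded through a promise-returning function; createTemplateLoader and fetchText are hypothetical names, and this is not the author's implementation.

```javascript
// Hypothetical helper: wraps a promise-returning fetchText(url) so each
// template URL is requested at most once per page load.
function createTemplateLoader(fetchText) {
  const cache = new Map();
  return function loadTemplate(url) {
    if (!cache.has(url)) {
      const pending = Promise.resolve(fetchText(url)).catch(function (err) {
        cache.delete(url); // allow a failed request to be retried later
        throw err;
      });
      cache.set(url, pending);
    }
    return cache.get(url);
  };
}

// In a browser, fetchText could be:
//   function (url) { return fetch(url).then(function (r) { return r.text(); }); }
```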
Also, this is a very simple example of the implementation to illustrate the mechanism. It's not what I'm using. You can modify this further to do things like multiple substitution, better handling of script block, handle multiple content blocks by using a for loop instead of just using the first element returned, properly handling HTML entities etc.
The reason I really like this is that the HTML is simply HTML in a plain text file. This avoids quoting hell and horrible string concatenation performance issues that you'll usually find if you directly embed HTML strings in javascript.
I think I found a solution I like.
I will store templates in files and fetch them using AJAX. This is for the development stage only. For production, the developer has to run a compiler once that bundles all templates with the source files. It also compacts the JavaScript and CSS and combines everything into a single file.
The biggest problem now is how to teach other developers to do that. I need to make it easy to do, and easy to understand what they are doing and why.
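One way such a compile step could work, assuming a Node.js build script: each template file's contents are turned into a JavaScript string literal with JSON.stringify, which takes care of quotes, backslashes and newlines. compileTemplates and the TEMPLATES variable are hypothetical names.

```javascript
// Hypothetical build step: turn { name: htmlSource } into JavaScript source
// that defines a TEMPLATES object. JSON.stringify produces valid string
// literals, so the HTML can be written naturally in separate files.
function compileTemplates(templates) {
  const entries = Object.keys(templates).sort().map(function (name) {
    return "  " + JSON.stringify(name) + ": " + JSON.stringify(templates[name]);
  });
  return "var TEMPLATES = {\n" + entries.join(",\n") + "\n};\n";
}
```

A build script could read the files with fs.readFileSync(path, "utf8"), pass the resulting object to compileTemplates, and write the output alongside the other bundled sources.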
You could also use \n to generate newlines. The HTML would, however, be on a single line and difficult to edit. But if you generate the JS using PHP or something similar, it might be an alternative.
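In engines that support ES2015, template literals (backtick strings) allow literal newlines and ${...} interpolation, which avoids both the trailing backslashes and the \n escapes. Interpolated values are not HTML-escaped, so user-supplied values still need escaping. A short sketch:

```javascript
// Template literals may span several lines and interpolate expressions.
const title = "Profile";
const html = `<div class="widget">
  <h2>${title}</h2>
</div>`;
```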
