I am trying to pass some text (commentText in the code below) to a PHP page through an AJAX POST request, together with some flags (what=add in the code below) that tell the PHP page what the text is.
I used the POST method because it lets me easily retrieve the individual fields of the URL-encoded data in the PHP page:
The JS script:
xmlhttp.open("POST", "/comment.php", true);
xmlhttp.setRequestHeader("Content-type","application/x-www-form-urlencoded");
xmlhttp.send("what=add&comment=" + commentText);
The PHP file:
if ($_POST['what'] == "add")
{
print_r($_POST['comment']);
exit();
...
}
The problem with this approach is that, because of the URL encoding, the formatting of the text (comment in the code above) is gone by the time it reaches the PHP page. For instance, if commentText is:
this is some comment.
And I have another line.
In the PHP file I get:
$_POST['comment'] -> "this is some comment. And I have another line."
The \n is gone. To be clear, by "formatting" I mean, in this particular case, the line break (which is all I am after for now).
What would be a solution to this problem? I guess I could pass the text as plain text, but then I would lose the ability to use $_POST[] to easily retrieve the different fields. Does that mean I have to pass the data as plain text and encode the fields within that text myself? Is this the only solution, or is there a better one?
While I really appreciated everyone's input, for the record I wanted not so much to add my own answer as to expand a little on some of the answers and describe what I ended up doing.
As Quentin suggests, though without making it entirely clear, regardless of whether the data is passed via the URL or in any other form, if this text eventually gets displayed in the browser it will be HTML text, and in HTML, as he said, every \n is treated as a space.
Thus regardless of what you are trying to do, if you get something like this in your text editor:
this is a test
on two lines
It will be rendered like that in the browser: this is a test on two lines.
Again, that is true regardless of how you process the data (in my case, passing it to a PHP page using AJAX and the POST method). Eventually the PHP page returns the text to the JS script, where it becomes HTML, so the two lines are displayed on one line.
SOLUTION:
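I am sure there are other ways, but the one I used was indeed, as suggested, to parse the string and format it using HTML tags. Literally something like this:
var output_text = '';
for (var i = 0; i < text.length; ++i) {
  if (text.charAt(i) == '\n') {
    // Replace each newline with an HTML line break.
    output_text += '<br/>';
  }
  // else if ... (other characters to convert; elided in the original)
  else {
    // Copy every other character unchanged.
    output_text += text.charAt(i);
  }
}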
Then I passed output_text to the PHP page. If it is not a problem for the PHP page to receive an HTML-formatted string, that is fine; in the worst case, if you also need to store the string before the HTML formatting, you can always pass both strings to the PHP page via POST (text and output_text in my example).
PS: downvoting the question wasn't necessary; if you do downvote, please explain why.
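The loop above copies every other character as-is, so a < or & typed into the comment would also be treated as markup once the text is displayed as HTML. A minimal sketch of the same idea that escapes HTML special characters first and then turns each line break into <br>; the function name textToHtml is hypothetical, and treating \r\n, \r and \n all as line breaks is an assumption:

```javascript
// Hypothetical helper: convert plain text into HTML that keeps its line breaks.
function textToHtml(text) {
  return String(text)
    // Escape HTML special characters first so user text cannot inject markup.
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    // Then turn each line break (Windows, old Mac or Unix style) into <br>.
    .replace(/\r\n|\r|\n/g, "<br>");
}

console.log(textToHtml("this is a test\non two lines"));
// prints: this is a test<br>on two lines
```

The order matters: escaping has to happen before the <br> substitution, otherwise the inserted tags would be escaped as well.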
The \n is gone.
This has nothing to do with URL Encoding.
HTML treats, by default, any kind of whitespace as "a space".
Use a <pre> element, replace the new lines with <br> elements or pick another method to change that.
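The point that URL encoding is not what drops the line break can be checked outside the browser. A minimal sketch, assuming Node.js (URLSearchParams is also a global in browsers) and reusing the field names from the question; encoding the body this way also keeps a stray & or = inside the comment from breaking the fields:

```javascript
// Build an application/x-www-form-urlencoded body like the one the question sends.
const comment = "this is some comment.\nAnd I have another line.";
const body = new URLSearchParams({ what: "add", comment: comment }).toString();

console.log(body);
// prints: what=add&comment=this+is+some+comment.%0AAnd+I+have+another+line.

// Decoding the body (as PHP does when it fills $_POST) gives back the \n unchanged.
const decoded = new URLSearchParams(body).get("comment");
console.log(decoded === comment); // prints: true
```

The newline travels as %0A and comes back intact; it only "disappears" later, when the text is rendered as HTML.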
Try adding the CSS rule white-space: pre; to the target element, or replacing \n with <br> in responseText:
var commentText = document.getElementsByTagName("pre")[0].innerText;
var xmlhttp = new XMLHttpRequest();
xmlhttp.open("POST", "/echo/html/", true);
xmlhttp.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
xmlhttp.onload = function() {
  if (xmlhttp.status === 200) {
    var text = xmlhttp.responseText;
    // Setting innerText converts each \n in the response into a <br>.
    document.body.innerText = text;
  }
};
xmlhttp.send("html=" + encodeURIComponent(commentText));
JSFiddle: http://jsfiddle.net/aLesvqfk/1/
Given an arbitrary URL that a customer enters in a web form, I want to generate a new HTML document containing that URL within an href. My question is how I am supposed to protect that URL within my HTML.
What should be rendered into the HTML for each of the following URLs, entered by an unknown end user:
http://example.com/?file=some_19%affordable.txt
http://example.com/url?source=web&last="f o o"&bar=<
https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22
If we assume that the URLs are already URI-encoded, which I think is reasonable if users copy them from a URL bar, then simply passing them to jQuery's attr() produces a valid URL and a document that passes the Nu HTML checker at validator.w3.org/nu.
To see it in action, we set up a JSFiddle at https://jsfiddle.net/kamelkev/w8ygpcsz/2/, where replacing the URLs in it with the examples above shows what is happening.
For future reference, this consists of an HTML snippet
<a>My Link</a>
and this JS:
$(document).ready(function() {
$('a').attr('href', 'http://example.com/request.html?data=>');
$('a').attr('href2', 'http://example.com/request.html?data=<');
alert($('a').get(0).outerHTML);
});
With URL 1, it is not possible to tell mechanically, just by looking at it, whether or not it is URI-encoded. Using human knowledge you can surmise that it is not, and that it refers to a file named some_19%affordable.txt. When run through the fiddle, it produces:
My Link
This passes the HTML5 validator without any problem, though it is likely not what the user intended.
The second URL is clearly not URI encoded. The question becomes what is the right thing to put into the HTML to prevent HTML parsing problems.
Running it through the fiddle, Safari 10 produces this:
My Link
and pretty much every other browser produces this:
My Link
Neither of these passes the validator. There are three possible complaints: the literal double quote (from un-escaping the HTML), the spaces, or the trailing < character (also from un-escaping the HTML). The validator only shows the first one it finds. This is clearly not valid HTML.
There are two ways to try to fix this. (a) HTML-escape the URL before giving it to attr(). However, this turns every & into &amp;, and the resulting entities such as &amp; and &lt; are then escaped a second time by attr() (becoming &amp;amp; and &amp;lt;), so the URL in the document is entirely inaccurate. It looks like this:
My Link
(b) URI-encode it before passing it to attr(). This does result in a properly validating URL that actually clicks through to the intended destination. It looks like this:
My Link
Finally, the third URL, which is properly URI-encoded, does produce correct HTML that validates:
My Link
and it does what the user would expect to happen when clicked.
Based on this, the algorithm should be:
if url is encoded then
pass as-is to attr()
else
pass encodeURI(url) to attr()
However, based on these two prior discussions, it seems impossible to determine affirmatively whether a URL "is encoded" (indeed, see example URL 1):
How to find out if string has already been URL encoded?
How to know if a URL is decoded/encoded?
If we bypass the attr() method and forcibly insert the HTML-escaped version of example URL 2 into the document structure, it would look like this:
My Link
This looks like valid HTML, yet it fails the HTML5 validator because it unescapes to invalid URL characters. Browsers, however, don't seem to mind it. Unfortunately, if you manipulate the element in any other way, the browser re-escapes all the &'s anyway.
As you can see, this is all very confusing. This is the first time we're using the browser itself to generate the HTML, and we are not sure we are getting it right. Previously, we did it server-side using templates and only applied the HTML-escape filter.
What is the right way to safely and accurately insert user-provided URL data into an HTML5 document (using JavaScript)?
If you can assume the URL is either fully encoded or not encoded at all, you may be able to get away with something along these lines: try to decode the URL, treat an error as meaning the URL was not encoded, and you should be left with a decoded URL.
<script>
var inputurl = 'http://example.com/?file=some_19%affordable.txt';
var myurl;
try {
myurl = decodeURI(inputurl);
}
catch(error) {
myurl = inputurl;
}
console.log(myurl);
</script>
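A different approach, sketched under assumptions rather than taken from the answer above: instead of guessing whether the input is encoded, let the WHATWG URL parser (the global URL class in browsers and Node.js) normalize it. It percent-encodes characters that are not allowed, such as spaces, quotes and <, while leaving existing %XX sequences untouched, so example URL 3 is not double-encoded. The name safeHref and the rule that only http: and https: links are accepted are illustrative choices:

```javascript
// Hypothetical helper: normalize user-entered URL text for use as an href.
// Returns null for unparseable input or non-http(s) schemes such as "javascript:".
function safeHref(input) {
  let url;
  try {
    // The URL parser percent-encodes disallowed characters (spaces, quotes, <, >)
    // but leaves existing %XX sequences alone, so encoded URLs are not re-encoded.
    url = new URL(String(input).trim());
  } catch (err) {
    return null; // not an absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url.href;
}

console.log(safeHref('http://example.com/url?source=web&last="f o o"&bar=<'));
// prints: http://example.com/url?source=web&last=%22f%20o%20o%22&bar=%3C
```

In the browser you would then assign the result with element.setAttribute("href", href) (or jQuery's attr()) and let the DOM handle attribute escaping. This does not resolve the ambiguity of example URL 1: some_19%affordable.txt is kept as-is, because %af is a syntactically valid escape.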
I have a web app in Node.js/MySQL where users can upload their stories, which they write in an HTML textarea. Now I'm trying to get an uploaded story from the database into a script tag using EJS so I can do further processing:
<script>
var text = "<%=story.Content%>",
result = anchorme.js(text);
document.getElementById('story-content').innerHTML = twemoji.parse(result);
</script>
The problem is that if the user pressed Enter to start a new line while writing, I get an error at the text variable and nothing is printed. How do I fix this?
If you view source on the page so that you can see how the browser receives it, you'll see something like this - note the line feeds:
var text = "I am a story over multiple lines
and that's what javascript gets confused about
because it can't figure out where the closing quotes are.
Let's not even go into what could happen when someone uses quotes!"
So you really just need a way to pass the story content to javascript without it breaking the page. Instead of just pushing out the string like this...
var text = "<%=story.Content%>"
...you can pass data to JavaScript in JSON format, which escapes line breaks as \r and \n, quotes as \", and so on. Note the use of <%- rather than <%= here, because you don't want it to output HTML-escaped text:
var text = <%-JSON.stringify({ content: story.Content })%>.content;
That passes an object with a nicely escaped string in it to your inline script for it to carry on processing.
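One caveat: JSON.stringify does not escape <, so a story that contains </script> would still end the inline script early, even with the JSON approach. A minimal sketch of a helper that also escapes <, > and &, assuming you can pass functions to the template as EJS locals; the name toInlineScriptJson is hypothetical:

```javascript
// Hypothetical helper: serialize a value for embedding inside an inline <script>.
function toInlineScriptJson(value) {
  // JSON.stringify already escapes quotes, backslashes and line breaks.
  return JSON.stringify(value).replace(/[<>&\u2028\u2029]/g, function (ch) {
    // Replace characters that could close the <script> element, plus U+2028/U+2029
    // (rejected in string literals by older engines), with \uXXXX escapes.
    return "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
  });
}

const story = 'Line one\nLine "two" </script><script>alert(1)</script>';
console.log(toInlineScriptJson(story));
```

In the template this would be used as var text = <%- toInlineScriptJson(story.Content) %>; after passing the function in, for example with res.render("story", { story: story, toInlineScriptJson: toInlineScriptJson }) in Express (an assumption about how the view is rendered).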
I'm having an issue with some code. I have a textarea where the user can enter the title of an article, and I would like this title to be on a single line. That's why I wrote a script that prevents users from pressing the Return key. But they can bypass this: if they copy and paste a line break, they can still enter one. So, is there a way to detect a line break? I suppose this can be done with a regular expression matching \r or \n. However, I tried this:
var enteredText = $('textarea[name="titleIdea"]').val();
var match = /\r|\n/.exec(enteredText);
if (match) {
alert('working');
}
and it doesn't work, for some unknown reason. I think the line var enteredText = $('textarea[name="titleIdea"]').val(); is the problem, because when I alert() the variable it shows nothing. Strangely, though, when I alert $('textarea[name="titleIdea"]').val() directly, instead of the enteredText variable, it shows the content.
if they copy/past the line break they could enter a line break
That's why you shouldn't even worry about preventing them from entering it - just don't save it. Remove it on the blur and input events if you really want to, but the only time it actually matters is before you save it to the database (or whatever you are using).
$('textarea[name="titleIdea"]').on('blur input', function() {
$(this).val($(this).val().replace(/(\r\n|\n|\r)/gm,""));
});
And, as other people have already mentioned, if they can't do line breaks, you shouldn't be using a textarea.
I assume your problem is with the paste event.
If I guessed correctly, here is my snippet:
$(function () {
$('textarea[name="titleIdea"]').on('paste', function(e) {
var data;
if (window.clipboardData) { // for IE
data = window.clipboardData.getData('Text');
} else {
data = e.originalEvent.clipboardData.getData('Text');
}
var match = /\r|\n/.exec(data);
if (match) {
alert('working');
console.log(data);
}
})
});
<script src="https://code.jquery.com/jquery-1.12.4.min.js"></script>
<textarea name="titleIdea">
</textarea>
This needs to be handled in the backend. Even if you use the recommended appropriate HTML input type of text (instead of textarea), you still do not remove the possibility of return chars getting saved.
The two other answers use JavaScript, which is technically the domain of this question. However, this cannot be solved with JavaScript alone! That approach assumes the input will always come from the form you created, with the JS function working perfectly.
The only way to avoid specific characters being inserted into your database is to parse and clean the data in the backend language prior to inserting into your database.
For example, if you are using PHP, you could run a similar regex that strips out the \r and \n characters before the value goes into processing.
Javascript only helps the UX in this case (the user sees what they will be saving). But the only way to ensure you have data integrity is to validate it on the server side.
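The server-side cleanup could look like this if the backend were Node.js (the answer's example language is PHP; the function name normalizeTitle and the choice to replace line breaks with a single space rather than delete them are assumptions):

```javascript
// Hypothetical server-side sanitizer for a single-line title field.
function normalizeTitle(input) {
  return String(input)
    // Collapse any run of CR, LF or Unicode line/paragraph separators into one space,
    // so "Part one\r\nPart two" does not become "Part onePart two".
    .replace(/[\r\n\u2028\u2029]+/g, " ")
    // Drop leading and trailing whitespace left over from the replacement.
    .trim();
}

console.log(normalizeTitle("My\r\nTitle\n")); // prints: My Title
```

Run it on the value taken from the request body, just before the database insert; any client-side stripping then only serves the user experience.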
When I let users insert data through the JS innerHTML property, like this:
element.innerHTML = "User provided variable";
I understood that in order to prevent XSS, I have to HTML encode, and then JS encode the user input because the user could insert something like this:
<img src=a onerror='alert();'>
Encoding only as HTML or only as JS would not help because, as I understand it, innerHTML decodes the input before inserting it into the page. With HTML+JS encoding, I noticed that only the JS encoding gets decoded, while the HTML encoding remains.
But I was able to achieve the same by double encoding into HTML.
My question is: Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?
Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?
Sure.
Assuming the "user provided data" is populated in your JavaScript by the server, then you will have to JS encode to get it there.
The following is pseudocode on the server side, but JavaScript on the front end:
var userProdividedData = "<%=serverVariableSetByUser %>";
element.innerHTML = userProdividedData;
As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo, this results in the following JavaScript being rendered:
var userProdividedData = "foo";
element.innerHTML = userProdividedData;
So far no problems.
Now say a malicious user supplies the value "; alert("xss attack!");//. This would be rendered as:
var userProdividedData = ""; alert("xss attack!");//";
element.innerHTML = userProdividedData;
which would result in an XSS exploit where the code is actually executed in the first line of the above.
To prevent this, as you say, you JS encode. Rule #3 of the OWASP XSS prevention cheat sheet says:
Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute.
So, to secure against this, your code would be:
var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
element.innerHTML = userProdividedData;
where JsEncode encodes as per the OWASP recommendation.
This would prevent the above attack as it would now render as follows:
var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
element.innerHTML = userProdividedData;
Now you have secured your JavaScript variable assignment against XSS.
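JsEncode is left abstract in this answer. A minimal sketch of such an encoder, assuming the server side is JavaScript (Node.js); the name jsEncode is hypothetical, and escaping characters at or above 256 as \uXXXX goes slightly beyond the rule quoted above:

```javascript
// Hypothetical encoder following OWASP XSS prevention rule #3:
// every non-alphanumeric character is replaced by a JavaScript escape sequence.
function jsEncode(value) {
  return String(value).replace(/[^a-zA-Z0-9]/g, function (ch) {
    const code = ch.charCodeAt(0);
    if (code < 256) {
      // \xHH for characters below 256, as the cheat sheet recommends.
      return "\\x" + code.toString(16).padStart(2, "0");
    }
    // \uXXXX for everything else (each UTF-16 code unit separately).
    return "\\u" + code.toString(16).padStart(4, "0");
  });
}

console.log(jsEncode('"; alert("xss attack!");//'));
// prints: \x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f
```

Because a backslash is itself encoded as \x5c, the lone \ input discussed further down can no longer swallow the closing quote.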
However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply be converted into its \xHH escape equivalent as above.
However the line
element.innerHTML = userProdividedData;
would cause alert('xss attack') to be executed when the browser renders the inner HTML. This resembles a DOM-based XSS attack, since it uses rendered JavaScript rather than HTML; however, because it passes through the server, it is still classed as reflected or stored XSS, depending on where the value is initially set.
This is why you would need to HTML encode too. This can be done via a function such as:
function escapeHTML (unsafe_str) {
  // Replace & first so the entities added below are not escaped again.
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\"/g, '&quot;')
    .replace(/\'/g, '&#x27;')
    .replace(/\//g, '&#x2F;');
}
making your code
element.innerHTML = escapeHTML(userProdividedData);
Alternatively, this could be done via jQuery's text() function.
Update regarding question in comments
I just have one more question: You mentioned that we must JS encode because an attacker could enter "; alert("xss attack!");//. But if we used HTML encoding instead of JS encoding, wouldn't that also HTML encode the " sign and make this attack impossible, because we would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);//";
I take your question to mean the following: rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place and leave it at that?
Because an attacker could encode an attack such as <img src="xx" onerror="alert('xss attack')" /> entirely in the \xHH format to insert their payload. When the JavaScript string is evaluated, this produces the HTML sequence of the attack without using any of the characters that HTML encoding would affect.
There are some other attacks too: If the attacker entered \ then they could force the browser to miss the closing quote (as \ is the escape character in JavaScript).
This would render as:
var userProdividedData = "\";
which would trigger a JavaScript error because the string literal is never terminated. This could cause a denial of service to the application if it is rendered in a prominent place.
Additionally say there were two pieces of user controlled data:
var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";
the user could then enter \ in the first and ;alert('xss');// in the second. This would change the string concatenation into one big assignment, followed by an XSS attack:
var userProdividedData = "\" + ' - ' + ";alert('xss');//";
Because of edge cases like these, it is recommended to follow the OWASP guidelines, as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML-encoded values solves this; however, there are other reasons to use JS encoding followed by HTML encoding when rendering content in this manner, because this method also works for data in attribute values:
<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">
Regardless of whether it is single- or double-quoted:
<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>
Or even unquoted:
<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>
If you HTML-encoded as mentioned in your comment, you would get an entity value:
onclick='var userProdividedData ="";"' (shortened version)
the code is actually run via the browser's HTML parser first, so userProdividedData would be
";;
instead of
";
so when you add it to the innerHTML call you would have XSS again. Note that <script> blocks are not processed via the browser's HTML parser, except for the closing </script> tag, but that's another story.
It is always wise to encode as late as possible, as shown above. Then, if you need to output the value in a context that does not render HTML (for example, an actual alert box), it will still display correctly.
That is, with the above I can call
alert(serverVariableSetByUser);
just as easily as setting HTML
element.innerHTML = escapeHTML(userProdividedData);
In both cases it will be displayed correctly, without certain characters disrupting the output or causing undesirable code execution.
A simple way to make sure the contents of your element is properly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:
element.textContent = "User provided variable with <img src=a>";
Another option is to use innerHTML only after you have encoded (preferably on the server if you get the chance) the values you intend to use.
I have faced this issue in my ASP.NET Webforms application. The fix to this is relatively simple.
Install the HtmlSanitizationLibrary package from the NuGet Package Manager and reference it in your application. In the code-behind, use the Sanitizer class as follows.
For example, if the current code looks something like this,
YourHtmlElement.InnerHtml = "Your HTML content";
Then, replace this with the following:
string unsafeHtml = "Your HTML content";
YourHtmlElement.InnerHtml = Sanitizer.GetSafeHtml(unsafeHtml);
This fix removes the vulnerability reported by Veracode while still making sure the string is rendered as HTML. HTML-encoding the string in the code-behind instead would display it as literal, encoded text rather than raw HTML, because it is encoded before rendering begins.
I'm not sure what the terminology is - but what I would like to do is this:
Using PHP, I would create a dynamic link for users to click that would indicate where they clicked it from. (I know how to do this)
I just don't know what the URL needs to look like to change the contents of a textarea on the target page.
So something like: http://website.com?document.getElementByName'your-message'.innerHTML='test'
Except that clearly doesn't work. Should I instead put a variable in the URL (I don't know how to do that either) and have the JavaScript on the target page change the textarea content?
Basically I just need it to put one line of text in it. "I came from page x" I'm also willing to change the textarea to an input field if that makes things easier.
That's called a query string: website.com?variable1=value1&variable2=value2&...
Here's an example with plain old JavaScript: http://www.bloggingdeveloper.com/post/JavaScript-QueryString-ParseGet-QueryString-with-Client-Side-JavaScript.aspx
Also see: How can I get query string values in JavaScript?
You can format your url like this:
www.example.com/?name=john%20blah&age=27&something=meh
then you can parse out the parameters with javascript
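var parameterArray = location.search.slice(1).split("&");
var parameterObject = {};
for (var i = 0; i < parameterArray.length; i++) {
  var pair = parameterArray[i].split("=");
  // Decode escapes such as %20 so "john%20blah" becomes "john blah".
  parameterObject[decodeURIComponent(pair[0])] = decodeURIComponent(pair[1] || "");
}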
then you can populate the fields with the data
nameTxtBox.value = parameterObject.name;
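Modern browsers (and Node.js) ship URL and URLSearchParams, which handle the encoding and decoding shown above for you. A minimal sketch, where the parameter name from, the page URL and the helper names buildLink and readParam are all hypothetical (the question builds the link in PHP; it is shown in JavaScript here only to keep one language):

```javascript
// Source side: build a link that records where the user came from.
function buildLink(base, fromPage) {
  const url = new URL(base);
  url.searchParams.set("from", fromPage); // encodes spaces, &, = and so on
  return url.href;
}

// Target side: read one parameter from a query string such as location.search.
function readParam(search, name) {
  // URLSearchParams strips a leading "?" and decodes %20 and "+" back to spaces.
  return new URLSearchParams(search).get(name);
}

const link = buildLink("http://website.com/contact", "page x");
console.log(link); // prints: http://website.com/contact?from=page+x
console.log(readParam(new URL(link).search, "from")); // prints: page x
```

On the target page, document.querySelector('textarea[name="your-message"]').value = "I came from " + readParam(location.search, "from"); would fill the field; readParam returns null when the parameter is missing, so you may want to check for that first.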