Why is my textarea manipulating javascript insertion of text? - javascript

I have a textarea designed to hold some script tags extracted from an AJAX response. When I append the text to the textarea, it is mutated, I'm assuming to conform to the text box's rules.
In this case, the code to add the script to the text area is as follows:
for( var i = 0; i < ResponseScripts.length; i++ )
{
JavascriptContent.innerText = JavascriptContent.innerText + "\r\n" + ResponseScripts[i].outerHTML;
}
Rather than +=, I tried this more 'direct' route, but the issue still prevails.
Effectively, what I'm ending up with in the DOM is the following ( literally ):
<br>
"<script type="text/javascript" src="/tinymce/js/tinymce/tinymce.min.js"></script>"
I've tried using a substring of the element's outerHTML, but I just end up with malformed script tags, as if the text box itself is manipulating the text on input.
Is there a way to modify a text box's behavior to adapt to this, so that I end up with:
// Literal \r\n*
<script type="text/javascript" src="/tinymce/js/tinymce/tinymce.min.js"></script>
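One possible approach, as a minimal sketch rather than a confirmed fix: in browsers, assigning to innerText turns line breaks into <br> elements, whereas the text shown in a textarea is controlled by its value property. The quotes seen in the DOM inspector are most likely just how it displays a text node. The helper name joinScriptMarkup is hypothetical; ResponseScripts and JavascriptContent are the names from the question.

```javascript
// Hypothetical helper: builds the text for the textarea from a list of
// objects that expose outerHTML (the script elements in the question).
function joinScriptMarkup(scripts) {
  var lines = [];
  for (var i = 0; i < scripts.length; i++) { // note: <, not <=
    lines.push(scripts[i].outerHTML);
  }
  // One script tag per line, separated by a literal \r\n.
  return lines.join("\r\n");
}

// In the browser (assumption: JavascriptContent is the textarea element):
// JavascriptContent.value = joinScriptMarkup(ResponseScripts);
```

Building the whole string first and assigning it once also avoids re-reading the textarea on every loop iteration.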

Related

How to work around setting innerHTML causing escape sequences to expand?

I am trying to avoid a cross-site scripting vulnerability on my server. Before any user-inputted string is embedded within HTML or sent to client-side JavaScript code, it is escaped ('<' replaced with '&lt;', '&' replaced with '&amp;', etc.). When embedding into HTML this works mostly fine; the HTML code produced does not contain any HTML elements inside the user-provided string. However, when the client-side JavaScript inserts HTML into the document, the escape sequences get expanded back into their special characters, which can result in user-inputted tags appearing in the document HTML. Here's approximately what I'm doing in client-side JavaScript:
// response_data received from XMLHttpRequest and parsed as JSON
var s = "";
for (var i = 0; i < response_data.length; ++i) {
s += "<p>";
s += response_data[i];
s += "</p>";
}
console.log(s);
elem.innerHTML = s;
Suppose the user inputted the string "abcde <script>alert("Hello!");</script>" earlier. Then response_data could be ["abcde &lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;"]. The print to console shows s to be "<p>abcde &lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;</p>". However, when I assign elem.innerHTML, I can see in Inspect Element that the inner HTML of the element is actually <p>abcde <script>alert("Hello!");</script></p>! I don't think it executed, probably because of some browser security features regarding script tags within p tags, but it's obviously not very good. How do I work around this?
Code snippet (run and inspect element over the text created, it shows a script tag within the p tag):
var div_elem = document.querySelector("div");
div_elem.innerHTML = "<p>&lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;</p>";
<html>
<head></head>
<body>
<div></div>
</body>
</html>
Use innerText. It's like innerHTML, but the value is treated as pure text and the HTML entities won't be decoded.
Edit:
Set innerHTML to the p tags, then set the actual text using innerText on the tag:
elem.innerHTML = "<p></p>";
elem.childNodes[0].innerText = s;
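If building a markup string is still preferred, one alternative sketch is to escape each item on the client right before concatenation, so that innerHTML displays it exactly as received. The names escapeHtml and buildParagraphs are hypothetical, and this assumes the received strings are meant to be shown literally.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML
// text and attribute values. The ampersand must be replaced first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Wraps each item in a <p> element, escaping the item itself.
function buildParagraphs(items) {
  var s = "";
  for (var i = 0; i < items.length; ++i) {
    s += "<p>" + escapeHtml(items[i]) + "</p>";
  }
  return s;
}

// In the browser (names from the question):
// elem.innerHTML = buildParagraphs(response_data);
```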

jquery & doc.createElement throws error for <

Both $("<test") and document.createElement("<test") throw an error because of the < character in the text. I do not want to replace the character and wanted to see whether there is an option to create a DOM or jQuery object from such text. I know replacing it will work, but the code is pre-existing and is written on the assumption that the input is either simple text (a text node) or an HTML tag (like span), so the error occurs because it fails to check whether the input is a proper self-closing HTML tag.
I am thinking of parsing it into an XML node and then checking whether the child node is a text node before trying to create the jQuery object. However, I am looking for suggestions and the best approach to tackle this issue. I know replacing < will work and that there is no need to check plain text for attributes, but since the code is dynamic it sometimes retrieves plain text and sometimes a valid HTML tag, which is why this issue appears.
I am not sure what your exact end goal is, but basically you need to do something like this:
function makeElemHack( str ) {
var div = $("<div>").html(str); //create a div and add the html
var html = div.html(); //read the html
if (!html.length) { //if the html has no length the str was invalid
div.html(str.replace(/</g,"&lt;")); //escape the < like text should be
//div.text(str); //or you can just add it as plain text
}
return div; //with the div wrapper
//return div.contents(); //without the div wrapper
}
var bd = $("body");
bd.append( makeElemHack("<p>Hello</p>") );
bd.append( makeElemHack("1<0") );
bd.append( makeElemHack("<booo") );
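The same decision can be sketched without the DOM, as a rough heuristic only: treat the input as markup when it starts with a complete opening tag, and escape the < otherwise. The names looksLikeTag, escapeLt and toMarkup are hypothetical, and the regular expression is an assumption about what counts as a "proper" tag, not a real HTML parser.

```javascript
// Heuristic: true when the string starts with a complete opening tag
// such as <p>, <span class="x"> or <br/>.
function looksLikeTag(str) {
  return /^\s*<[a-zA-Z][^<>]*>/.test(str);
}

// Escapes every < so the string can be inserted as plain text.
function escapeLt(str) {
  return str.replace(/</g, "&lt;");
}

// Returns the string unchanged when it looks like markup, escaped otherwise.
function toMarkup(str) {
  return looksLikeTag(str) ? str : escapeLt(str);
}
```

With jQuery, the result of toMarkup could then be passed to .html() as in the answer above.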

How can I Strip all regular html tags except <a></a>, <img>(attributes inside) and <br> with javascript?

When a user creates a message there is a multibox, and this multibox is connected to a design panel which lets users change fonts, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.
Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)
It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links, images and line breaks.
This means that all HTML tags should be stripped except <a></a>, <img> and <br> tags.
It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:
<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">
How can I accomplish this with javascript?
I used to use the following code-behind C# code, which worked perfectly, but it strips all HTML tags except the <br> tag.
public string Strip(string text)
{
return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}
Any kind of help is much appreciated.
Does this do what you want? http://jsfiddle.net/smerny/r7vhd/
$("body").find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
Basically select everything except a, img, br and replace them with their content.
Smerny's answer works well except when the HTML structure is like this:
var s = '<div><div>Link<span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
console.log($s.html());
The live code is here: http://jsfiddle.net/btvuut55/1/
This happens when there are two or more wrappers on the outside (two divs in the example above), because jQuery reaches the outermost div first, and its innerHTML, which still contains the span, is retained.
This answer $('#container').find('*:not(br,a,img)').contents().unwrap() fails to deal with tags with empty content.
A working solution is simple: loop from the innermost element outwards:
var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
var e = $elements[i];
$(e).replaceWith(e.innerHTML);
}
The working copy is: http://jsfiddle.net/btvuut55/3/
With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:
$('#container').find('*:not(br,a,img)').contents().unwrap()
I think it would be better to extract the good tags. It is easier to match a few tags than to remove all the other elements and every HTML possibility. Try something like this; I tested it and it works fine:
// the following regex matches the good tags with attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";
var match = ptt.exec(input);
while (match) {
result += match[0]; // match[0] is the full matched text
match = ptt.exec(input);
}
// result will contain the clean HTML with only the good tags
console.log(result);
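For comparison, the C# regex from the question can be ported to JavaScript and extended to keep a, img and br. This is a minimal sketch under the assumption that the input is reasonably well-formed; the function name stripTagsExceptAllowed is hypothetical. It only removes tags, so attributes on the kept tags (including event handlers) survive, and it should not be treated as a security sanitizer.

```javascript
// Removes every tag except <a ...>, </a>, <img ...> and <br>.
// Text content of removed tags is kept.
function stripTagsExceptAllowed(html) {
  // The negative lookahead skips tags whose name is exactly a, img or br,
  // optionally preceded by a slash for closing tags.
  return html.replace(/<(?!\/?(?:a|img|br)[\s\/>])[^<>]+>/gi, "");
}

var cleaned = stripTagsExceptAllowed(
  '<div><span style="color:red">Hi</span> <a href="/x">link</a><br></div>'
);
console.log(cleaned);
```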

HTML textarea ignores 1st new line character, why?

Could you explain why this:
<script type="text/javascript">
document.write("<textarea cols='10' rows='10'>" + "\nhello\nbabe\n" + "</textarea>");
</script>
renders a textarea with one new line at the bottom, but NO new line at the top?
Tested IE8, FF11, Safari 5.1, Chrome 24
And it's not a JS issue: even when you write the HTML directly in the page you get the same result, i.e.
<textarea cols='10' rows='10'>
hello
babe
</textarea>
The 1st new line is still missing!!!
I need to add another new line at the top in order to show one:
document.write("<textarea cols='10' rows='10'>" + "\n\nhello\nbabe\n" + "</textarea>");
When writing inside of XHTML use proper entities.
<textarea>&#13;hello</textarea>
If a text node begins with white space (space, new line) it will be ignored by HTML parsers. Encoding the new line into a proper HTML entity forces the parser to acknowledge it.
&#13; == carriage return
Answering the question "Why": this is specified in the HTML 5 specification, in the chapter that describes how the DOM tree is created from the tags found in an HTML document.
In the current HTML 5 living standard it is "12.2 Parsing HTML documents" > "12.2.6 Tree construction" > "12.2.6.4 The rules for parsing tokens in HTML content" > "12.2.6.4.7 The "in body" insertion mode".
(In HTML 5.2 the same section is numbered 8.2.5.4.7).
Scroll down to the item "A start tag whose tag name is "textarea"":
A start tag whose tag name is "textarea"
Run these steps:
1. Insert an HTML element for the token.
2. If the next token is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of textarea elements are ignored as an authoring convenience.)
3. Switch the tokenizer to the RCDATA state.
...
The algorithm deals with LF characters only, because CR characters are handled earlier.
(Historically, looking into obsolete HTML 4.01 specification:
Its Chapter 17.7 "The TEXTAREA element" has an example that shows that text content for a textarea starts from a new line.
Appendix B.3.1 Line breaks (informative) explains that such behaviour originates from SGML.)
A line break character before </textarea> end tag is not ignored nowadays, in HTML 5.
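Step 2 of the rule quoted above can be modelled in a few lines. This is only an illustrative sketch: both function names are hypothetical, and it ignores CR handling and the escaping of & and < that real markup needs.

```javascript
// Models the parser rule: a single LF directly after the <textarea>
// start tag is dropped; everything else is kept.
function textareaValueFromMarkupContent(raw) {
  return raw.charAt(0) === "\n" ? raw.slice(1) : raw;
}

// Hypothetical serializer: always emit one extra LF for the parser to
// drop, so a leading newline in the value survives the round trip.
function textareaContentFor(value) {
  return "\n" + value;
}

// The content from the question loses its first newline:
// textareaValueFromMarkupContent("\nhello\nbabe\n") gives "hello\nbabe\n"
```

Always prepending one LF when generating the markup avoids having to check whether the value starts with a newline.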
If possible, change your code to have the textarea pre-defined as html, then write the string like this instead:
HTML:
<textarea cols='10' rows='10' id='test'></textarea>
Script:
document.getElementById('test').innerHTML = '\nhello\nbabe\n';
That should preserve white-space. Optionally you can add a css rule:
textarea {
white-space:pre;
}
A fiddle to play with:
http://jsfiddle.net/RFLwH/1/
Update:
The OP tested this in IE8, where it does not work; it appears to be a limitation/bug in this browser. IE8 does actually use CR+LF if you manually insert a line feed at the top, but when it is set programmatically it is completely ignored by the browser.
Add this to the html to do a test:
<span onclick="this.innerHTML = escape(document.getElementById('test').innerHTML);">
Get textarea content
</span>
You can see the string returned is:
%0D%0Ahello%20babe%20
meaning the CR+LF is there (the other line-feeds are converted to spaces - but inserting a space at the beginning does not help either). I guess there is nothing you can do about this behavior; the browser is obsolete (but unfortunately still widely used so this can be a problem for those users if this is essential).
Add a whitespace character before the first "\n", like this:
<script type="text/javascript">
document.write("<textarea cols='10' rows='10'>" + " \nhello\nbabe\n" + "</textarea>");
</script>
or
<textarea cols='10' rows='10'> <!-- whitespace here -->
hello
babe
</textarea>
otherwise it won't work.
Update:
Later, on the server side, you can remove the leading whitespace with
$str = ltrim ($str,' ');
or
$str2 = substr($str, 4);
if it is PHP.
It should be a \n\r at the top:
document.write("<textarea cols='10' rows='10'>" + "\n\rhello\nbabe\n" + "</textarea>");
In the end I went with this server-side solution:
double the leading newline (only the first one!) before outputting the value into the textarea:
if(str_startswith("\r\n",$value))
{
$value = "\r\n".$value;
}elseif(str_startswith("\n",$value))
{
$value = "\n".$value;
}elseif(str_startswith("\r",$value))
{
$value = "\r".$value;
}
function str_startswith($start, $string)
{
if(mb_strlen($start)>mb_strlen($string))
return FALSE;
return (mb_substr($string, 0,mb_strlen($start))==$start);
}

Indesign Server Scripting Textarea.Contents

I am creating a Javascript script to use with Indesign Server (CS3).
I am trying to find all textareas within a document and read their contents.
I can easily loop through all the textareas using the functions provided by Adobe.
However, when I try to get the content of a textarea, I only get the content that is visible within that textarea, not the out port text.
document.TextAreas[0].contents
In other words, if the InDesign document contains a textarea with a little plus sign, indicating that there is more text that did not fit, then my script does not return the hidden text.
Or, to put it another way: can I get the entire content when the 'overflows' property of the 'textarea' is false?
Full code:
function FindAllTextBoxes(){
var alertMessage = ""; // start empty so += does not prepend "undefined"
for (var myCounter = myDoc.textFrames.length-1; myCounter >= 0; myCounter--) {
var myTextFrame = myDoc.textFrames[myCounter];
alertMessage += "\nTextbox content: " + myTextFrame.contents;
alertMessage += "\nOverflow:" + myTextFrame.overflows;
alert(alertMessage);
}
}
How can I read the full content of the Textarea?
A little late, but just came across this. This is tested with InDesign CS5 - the following line will get all of the overflown text from a TextFrame:
var content = myTextFrame.parentStory.contents;
Hope this helps!
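Putting the question's loop and this answer together might look like the sketch below. It has not been run against InDesign Server; collectFrameText is a hypothetical name, and only the properties already used in this thread (textFrames, overflows, contents, parentStory.contents) are assumed. Note that parentStory.contents returns the whole story, which may span several threaded frames.

```javascript
// Assumption: doc is an InDesign Document object (myDoc in the question).
function collectFrameText(doc) {
  var result = [];
  for (var i = 0; i < doc.textFrames.length; i++) {
    var frame = doc.textFrames[i];
    // For overflowing frames, read the full story instead of the
    // visible portion only.
    result.push(frame.overflows ? frame.parentStory.contents : frame.contents);
  }
  return result;
}
```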
