Get raw HTML from a div using js? - javascript

I'm working on a website where users can create and save their own HTML forms. Instead of inserting form elements and ids one by one in the database, I was thinking of using js (preferably jquery) to just get the form's HTML (as source code) and insert it in a text column via mysql.
For example I have a form in a div
<div class="new_form">
<form>
Your Name:
<input type="text" name="something" />
About You:
<textarea name=about_you></textarea>
</form>
</div>
With js is it possible to get the raw HTML within the "new_form" div?

To get all HTML inside the div
$(".new_form").html()
To get only the text it would be
$(".new_form").text()
You might need to validate the HTML; this question might help you (it's in C#, but you can get the idea).

Yes, it is. You use the innerHTML property of the div. Like this:
var myHTML = document.querySelector('.new_form').innerHTML; // new_form is a class, not an id

Note when you use innerHTML or html() as above you won't get the exact raw HTML you put in. You'll get the web browser's idea of what the current document objects should look like serialised into HTML.
There will be browser differences in the exact format that comes out, in areas like name case, spacing, attribute order, which characters are &-escaped, and attribute quoting. IE, in particular, can give you invalid HTML where attributes that should be quoted aren't. IE will also, incorrectly, output the current values of form fields in their value attributes.
You should also be aware of the cross-site-scripting risks involved in letting users submit arbitrary HTML. If you are to make this safe you will need some heavy duty HTML ‘purification’.

Related

what is best practice to show typed data - textarea vs p tag

I have a form in which users create a profile. In the form there is a textarea in which they put their blurb, description, or whatever.
Later, when I want to show the profile on a view-only screen, what is best practice? To use a <p> tag or a <textarea> tag?
It appears I lose the paragraph breaks when I display the data in a <p> tag.
If the best practice is to use a readonly textarea for viewing, how can one dynamically adjust the rows depending on the length of the text?
Textareas are designed for accepting input from users, not displaying data back to them.
You need to process the submitted data before displaying it on the user profile. Typically this would involve formatting (like splitting the content on double new lines and then wrapping each part of the result with paragraphs) and implementing protection against XSS attacks.
For formatting, you might consider using a Markdown engine (similar to what Stack Overflow does).
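The escape-then-format step described in this answer can be sketched in a few lines of JavaScript. This is a minimal sketch, not a full sanitizer: `escapeHtml` and `formatBlurb` are hypothetical helper names, and it assumes the blurb is plain text that should never contain live markup.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: escape first, then wrap blank-line-separated blocks
// in <p> elements and turn the remaining single newlines into <br>.
function formatBlurb(raw) {
  return escapeHtml(raw)
    .replace(/\r\n?/g, "\n") // normalize Windows and old Mac line endings
    .split(/\n{2,}/) // a run of two or more newlines separates paragraphs
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => "<p>" + block.replace(/\n/g, "<br>") + "</p>")
    .join("\n");
}

console.log(formatBlurb("Hi, I'm <b>Ann</b>.\nI paint.\n\nSecond paragraph."));
```

Escaping before inserting the `<p>` and `<br>` tags matters: doing it the other way round would escape the tags the formatter just added.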
Use the <p> tag.
That works, but you should pass the contents through at least two functions before output, in this order:
1) htmlentities() - to protect against XSS attacks and print any HTML symbols as text
2) nl2br() - to add <br/> tags (HTML line breaks) next to \n symbols (new lines in the text)
You can look at the <pre> tag as well; then you do not need the nl2br() function.

Can I actually use this snippet of code in my upload form?

My website has an upload page which has a form and one of the inputs is meant for tags like this:
<div class="form-group">
<label class='label' for="artwork-tags">Tags</label>
<input class='input' type="text" name="artwork-tags" placeholder="Tags" value='{{ Request::old('artwork-tags') }}'>
@include('partials.invalid', ['field' => 'artwork-tags'])
</div>
I then get the tags on the server side using:
$tagsRaw = $request->input('artwork-tags');
This is where my actual question starts. I found a snippet of code used for tags input styling which separates the written tags in their own containers after you type a comma (,). However, as you can see in the codepen, the tags input is just a:
<div class="tags-input"></div>
No form, no inputs, no submit, no nothing. This is why I'm wondering how would I even get the tags that are written inside that div on the server side?
Codepen - https://codepen.io/juliendargelos/pen/MJjJZm
Looks like a possible loading problem to me. I pulled the code and added it to a page and it worked fine.
So, as this may be more of a JavaScript question and less of a Laravel one, I'd suggest you pull up the page source and make sure both the CSS and the JS are loaded.
The widget is driven by the JS, but you'll need to handle the incoming items on the server side. The easiest way to set this up is to dump your request object (dd($request->all())) and see which variable carries the tags. On the server side, you will need to write code that accepts those tags and checks whether they already exist. If a tag does not exist, create it in the database and grab its new ID. If it already exists, get the existing tag's ID from the database based on a string match.
Something like:
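$existing_tags = \App\Tag::pluck('name', 'id')->toArray(); // [id => name]
foreach ((array) $request->get('tags') as $name) {
    // array_search returns the matching tag's ID, or false if the name is new
    $tag_id = array_search($name, $existing_tags);
    if ($tag_id === false) {
        $tag_id = \App\Tag::create(['name' => $name])->id; // assumes 'name' is fillable
    }
    // attach $tag_id to the artwork here
}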
The answer to your question of no form, no inputs, etc, is likely that you haven't pulled in the whole js lib or any dependencies if the above dump doesn't work. Also, it's not clear from your question what point you are at, but remember, this is only part of the form code. You'll still need to supply the normal Laravel route and the standard rest of the form (e.g. submit button and so forth).
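The find-or-create logic from this answer can also be sketched independently of Laravel. The sketch below is in JavaScript, with a `Map` standing in for the tags table; `resolveTagIds`, the `createTag` callback, and the assumption that the tags arrive as one comma-separated string are all hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch: existingTags is a Map of tag name -> ID (a stand-in for
// the tags table) and createTag(name) stores a new tag and returns its new ID.
function resolveTagIds(rawInput, existingTags, createTag) {
  const names = rawInput
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  const ids = [];
  // A Set drops duplicate names so the same tag is never created twice.
  for (const name of new Set(names)) {
    if (!existingTags.has(name)) {
      existingTags.set(name, createTag(name));
    }
    ids.push(existingTags.get(name));
  }
  return ids;
}

const existing = new Map([["portrait", 1], ["oil", 2]]);
let nextId = 3;
const ids = resolveTagIds("oil, landscape, ,oil", existing, () => nextId++);
console.log(ids); // [ 2, 3 ]
```

In the Laravel controller the same steps would run against the database instead of a `Map`, and the resulting IDs would then be attached to the artwork.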

HTML Purifier to clean event attributes

I'm working to address some stored XSS vulnerabilities and I am using HTML Purifier. I have an input box on the page, and if I type '" onclick="alert(1); the code is saved to the database and executed on the client. This happens even after running the input and output through HTML Purifier. It seems as if HTML Purifier only strips these attributes when they are included within an HTML tag. I'm wondering if there is some config for HTML Purifier that will strip just the event attributes, or any other suggestions on how to clean these up.
HTML Purifier is purely intended for use on content which will be used as HTML on a page. It is not appropriate for validating content which, for example, will go in an attribute for an HTML element.
You can use some internal APIs of HTML Purifier to validate content for this case. However, for the example quoted in the comments, all you need is htmlspecialchars to do the right thing. The right choice of validator depends on what attribute you put the content in.
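To see why escaping is enough in the attribute case, here is a small JavaScript sketch. It assumes the stored value is echoed back into a double-quoted value attribute of an input, which the question does not actually show; `escapeAttr` is a hypothetical helper that does roughly what PHP's htmlspecialchars does with ENT_QUOTES.

```javascript
// Hypothetical helper, roughly equivalent to htmlspecialchars($value, ENT_QUOTES).
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

const payload = '" onclick="alert(1);';

// Unescaped: the leading quote closes value="" and onclick becomes a real attribute.
const unsafe = '<input type="text" value="' + payload + '">';
// Escaped: the quotes are entities, so the whole payload stays inside the value.
const safe = '<input type="text" value="' + escapeAttr(payload) + '">';

console.log(unsafe); // <input type="text" value="" onclick="alert(1);">
console.log(safe);   // <input type="text" value="&quot; onclick=&quot;alert(1);">
```

The payload contains no tags at all, which is why a tag-oriented cleaner has nothing to strip; the fix belongs at the point where the value is written into the attribute.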

How does a browser render this inline JavaScript within an encoded tag?

I was trying to perform a reflected XSS attack on a tutorial website. The webpage basically consists of a form with an input field and a submit button. On submitting the form, the contents of the input field are displayed on the same webpage.
I figured out that the website is blacklisting script tag and some of the JavaScript methods in order to prevent an XSS attack. So, I decided to encode my input and then tried submitting the form. I tried 2 different inputs and one of them worked and the other one didn't.
When I tried:
<body onload="&#97lert('Hi')"></body>
It worked and an alert box was displayed. However, when I encoded some characters in the HTML tag, something like:
&#60body onload="&#97lert('Hi')"&#62&#60/body&#62
It didn't work! It simply printed <body onload="alert('Hi')"></body> as it is on the webpage!
I know that browsers execute inline JavaScript as they parse an HTML document (please correct me if I'm wrong). But I'm not able to understand why the browser behaved differently for the two inputs that I've mentioned.
-------------------------------------------------------------Edit---------------------------------------------------------
I tried the same with a more basic XSS tutorial with no XSS protection. Again:
<script>alert("Hi")</script> -> Worked!
&#60s&#99ript&#62&#97lert("Hi")&#60/s&#99ript&#62 -> Didn't work! (Got printed as string on the Web Page)
So basically, if I encode anything in JavaScript, it works. But if I'm encoding anything that is HTML, it's not executing the JavaScript within that HTML!
I can't come up with the words to describe this properly, so I'll just give you an example. Let's say we have this string:
<div>Hello World! &lt;span id="foo"&gt;Foobar&lt;/span&gt;</div>
When this gets parsed, you end up with a div element that contains the text:
Hello World! <span id="foo">Foobar</span>
Note, while there is something that looks like html inside the text, it is still just text, not html. For that text to become html, it would have to be parsed again.
Attributes work a little bit differently: HTML entities in attribute values do get decoded the first time.
tl;dr:
if the service you are using is stripping out tags, there's nothing you can do about it unless the script is poorly written in a way that results in the string getting parsed twice.
Demo: http://jsfiddle.net/W6UhU/ (note how, after setting the div's inner HTML equal to its inner text, the span becomes an HTML element rather than a string).
When an HTML page says &#60body, it treats it the same as if it said &lt;body
That is, it just displays the encoded characters, doesn't parse them as HTML. So you're not creating a new tag with onload attributes http://jsfiddle.net/SSfNw/1/
alert(document.body.innerHTML);
// When an HTML page says &lt;body, it treats it the same as if it said &lt;body
So in your case, you're never creating a body tag, just content that ends up getting moved into the body tag http://jsfiddle.net/SSfNw/2/
alert(document.body.innerHTML)
// &lt;body onload="alert('Hi')"&gt;&lt;/body&gt;
In the case <body onload="&#97lert('Hi')"></body>, the parser is able to create the body tag, once within the body tag, it's also able to create the onload attribute. Once within the attribute, everything gets parsed as a string.
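The order of operations described in these answers can be illustrated with a toy tokenizer. This is a deliberately simplified model, not the real HTML parsing algorithm: `toyTokenize` and `decodeNumericRefs` are hypothetical names, only numeric character references and double-quoted attributes are handled, and the point is just that tags are recognized from literal `<` characters before any reference is decoded.

```javascript
// Decode numeric character references such as "&#60" or "&#60;".
function decodeNumericRefs(s) {
  return s.replace(/&#(\d+);?/g, (_, code) => String.fromCodePoint(Number(code)));
}

// Toy model: step 1 finds tags by their literal "<" and ">" characters,
// step 2 decodes references inside text and attribute values afterwards.
function toyTokenize(input) {
  const tokens = [];
  // Either a start/end tag beginning with a literal "<", a run of text, or a stray "<".
  const pattern = /<\/?([a-zA-Z][^\s>]*)([^>]*)>|[^<]+|</g;
  let match;
  while ((match = pattern.exec(input)) !== null) {
    if (match[1] !== undefined) {
      const attrs = {};
      const attrPattern = /([^\s=]+)="([^"]*)"/g;
      let attr;
      while ((attr = attrPattern.exec(match[2])) !== null) {
        attrs[attr[1]] = decodeNumericRefs(attr[2]);
      }
      const type = match[0].startsWith("</") ? "endTag" : "startTag";
      tokens.push({ type, name: match[1], attrs });
    } else {
      // Decoded text may look like markup, but it is never tokenized again.
      tokens.push({ type: "text", value: decodeNumericRefs(match[0]) });
    }
  }
  return tokens;
}

console.log(toyTokenize(`<body onload="&#97lert('Hi')"></body>`));
console.log(toyTokenize(`&#60body onload="&#97lert('Hi')"&#62&#60/body&#62`));
```

In this model the first input produces a start tag whose onload attribute decodes to alert('Hi'), followed by an end tag, while the second input produces a single text token whose value merely looks like a body tag.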

Possible to create custom "DOMs" by loading HTML from string in Javascript?

I'm trying to parse HTML in the browser. The browser receives 2 HTML files as strings, eg. HTML1 and HTML2.
I now need to parse these "documents" just as one would parse the current document. This is why I was wondering if it is possible to create custom documents based on these HTML strings (these strings are provided by the server or user).
So that for example the following would be valid:
$(html1Document).$("#someDivID")...
If anything is unclear, please ask me to clarify more.
Thanks.
var $docFragment = $(htmlString);
$docFragment.find("a"); // all anchors in the HTML string
Note that this ignores any document structure tags (<html>, <head> and <body>), but any contained tags will be available.
With jQuery you can do this:
$(your_document_string).someParsingMethod().another();
You can always append your HTML to some hidden div (through innerHTML or jQuery .html(..)). It won't be treated exactly as a new document, but you will still be able to search its contents.
It has a few side-effects, though. For example, if your html defines any script tags, they'll be loaded. Also, browser may (and probably will) remove html, body and similar tags.
edit
If you specifically need title and similar tags, you may try iframe loading content from your server.
