Validate an html document that I wrote with document.write()

Ok first off let me state that I know I should never do this under any circumstances for a real site. Ok. That's out of the way.
One of my coworkers was going off that Javascript is not a "real" programming language (his definition of "real" seems to be "it compiles"), because it depends on other languages to do its thing.
I told him I could write a website using nothing but javascript.
I am sure that this can be done, using document.write('') to get the doctype, and some script to create a DOM and styles... but the problem is that the validator checks the page without running JS, so I can't show him that what the browser ends up looking at does in fact validate.
Anyone know of a way I can validate the actual source the browser is using instead of the javascript that initially loaded?

If you really want to demonstrate that JS is a "real" language, then you would probably be better off not using a browser as the foundation. A node.js server would allow you to generate an HTML document (using document.write if you like, but the DOM is also an option, and people have used client-side libraries to manipulate a document in node).
Since the JS runs on the server, you can get the actual source from the browser via view-source, or point the validator directly at the URI (so long as it is either public or you install a local copy of the validator).
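As a rough sketch of that server-side route, assuming Node's built-in http module: the names renderPage and escapeText, the port 8080 and the "serve" argument are placeholders made up for this example, not anything from the question.

```javascript
const http = require("http");

// Escape text so it can be placed safely inside element content.
function escapeText(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a complete HTML5 document as a string, using nothing but JavaScript.
function renderPage(title, paragraphs) {
  const body = paragraphs.map((p) => "<p>" + escapeText(p) + "</p>").join("\n");
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    "<title>" + escapeText(title) + "</title>",
    "</head>",
    "<body>",
    body,
    "</body>",
    "</html>",
  ].join("\n");
}

const server = http.createServer((request, response) => {
  response.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
  response.end(renderPage("Built with JS", ["Every byte of this page came from JavaScript."]));
});

// Only start listening when asked to, e.g. `node site.js serve` (hypothetical file name).
if (process.argv[2] === "serve") {
  server.listen(8080);
}
```

With the server running, view-source in the browser shows exactly the string renderPage returned, so that is also what a validator pointed at the URI would see.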

Load the site in Firefox with Firebug installed. Fire up the "HTML" view, right-click on the <html> node and select "copy HTML".

The closest you get using JavaScript:
var generatedHTML = document.documentElement.innerHTML;
//Retrieves everything within the (missing)HTML tags.
//The only missing parts are DOCTYPE and the <html> itself
var txt = document.createElement("textarea");
txt.style.cssText = "width:99%;height:99%;position:fixed;z-index:999;top:0;left:0";
txt.value = generatedHTML;
txt.ondblclick = function(){this.parentNode.removeChild(this)};
//Adding a simple function to easily remove the textarea once finished
document.body.appendChild(txt);
Bookmarklet (I have slightly adjusted the code to be compact):
javascript:void(function(){var t=document.createElement("textarea");t.style.cssText = "width:99%;height:99%;position:fixed;z-index:999;top:0;left:0";t.value=document.documentElement.innerHTML;t.ondblclick=function(){t.parentNode.removeChild(t)};document.body.appendChild(t)})()
Focus the generated textarea
Manually add the DOCTYPE + <html> tags
Copy the contents of the textarea to the validator at: http://validator.w3.org/#validate-by-input
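The manual "add the DOCTYPE + <html> tags" step could also be scripted. This is only a sketch: wrapForValidator is a made-up helper, and its default doctype and lang attribute are assumptions you would adjust to match the page being checked.

```javascript
// Rebuild a full document from the string that documentElement.innerHTML returns.
// The doctype and the <html> attributes are not part of innerHTML, so they are
// passed in; the defaults are assumptions, not values read from the page.
function wrapForValidator(innerHtml, doctype = "<!DOCTYPE html>", htmlAttributes = 'lang="en"') {
  const openTag = htmlAttributes ? "<html " + htmlAttributes + ">" : "<html>";
  return doctype + "\n" + openTag + innerHtml + "</html>";
}

// In the browser: wrapForValidator(document.documentElement.innerHTML)
```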

Related

forbid javascript in html file

Following problem:
I've been given a file containing HTML and possibly some script code.
Now I want to edit the file so that no script gets executed when the file is opened in a browser.
My question is: What do I have to do?
What ways are there to place a script inside HTML so that it gets executed? I know there is the script tag, and you could also do it with an iframe, but what else is possible?
I definitely want to prevent any kind of script execution. How can I achieve this?

Have a look at an established, well-tested HTML filter library such as http://htmlpurifier.org/, which uses a whitelist to filter out possibly malicious code. Don't rely on the filtered HTML documents being secure from any JavaScript, though: time and time again browsers are updated and new ways to sneak in JavaScript are discovered.
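To give a feel for the "what else is possible" part of the question, here is a small sketch that flags a few well-known script entry points in an HTML string. The list and the name listScriptVectors are illustrative assumptions; it is deliberately incomplete, it only reports and does not clean anything, and it is no substitute for a whitelist filter like the one above.

```javascript
// A few well-known ways script can enter an HTML file.
// Illustrative and incomplete: this reports matches, it does not sanitize.
const SCRIPT_VECTORS = [
  { label: "script element", pattern: /<script\b/i },
  { label: "inline event handler attribute", pattern: /<[^>]+\son[a-z]+\s*=/i },
  { label: "javascript: URL", pattern: /javascript\s*:/i },
  { label: "embedded frame or object", pattern: /<(iframe|object|embed)\b/i },
];

// Return the labels of every vector whose pattern occurs in the given HTML.
function listScriptVectors(html) {
  return SCRIPT_VECTORS
    .filter((vector) => vector.pattern.test(html))
    .map((vector) => vector.label);
}
```

A regex scan like this is easy to fool, which is exactly why the whitelist approach (keep only what is known to be harmless) is preferred over trying to enumerate everything dangerous.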

Getting access to the original HTML in HtmlUnit HtmlElement?

I am using HtmlUnit to read content from a web site.
Everything works perfectly to the point where I am reading the content with:
HtmlDivision div = page.getHtmlElementById("my-id");
Even div.asText() returns the expected String object, but I want to get the original HTML inside <div>...</div> as a String object. How can I do that?
I am not willing to change HtmlUnit to something else, as the web site expects the client to run JavaScript, and HtmlUnit seems to be capable of doing what is required.

If by original HTML you mean the HTML code that HTMLUnit has already formatted then you can use div.asXml(). Now, if you really are looking for the original HTML the server sent you then you won't find a way to do so (at least up to v2.14).
Now, as a workaround, you could get the whole text of the page that the server sent you with this answer: How to get the pure raw HTML of a page in HTMLUnit while ignoring JavaScript and CSS?
As a side note, you should probably think twice about why you need the HTML code. HTMLUnit will let you get the data from the code, so there shouldn't be any need to store the source code rather than the information contained in it. Just my 2 cents.

Is there a way to save JavaScript DOM manipulations?

For example if JavaScript performs a bunch of manipulations on a table, the new HTML will not be visible via View -> Source. Is there some way to capture JavaScript manipulations and save everything as a plain HTML document?

The easiest way is to call
document.documentElement.outerHTML
This will get the same output as view source, except that it will have all the DOM manipulations visible. It will probably be missing the DOCTYPE, however. I noticed that the WebKit console prints the doctype fine, but outerHTML does not include it, so you'll have to rebuild that part yourself (document.doctype exposes its name and identifiers).
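For the missing DOCTYPE, one possible sketch is to rebuild the line from the DocumentType node. The helper name doctypeToString is made up; the name, publicId and systemId properties it reads are the standard ones on document.doctype.

```javascript
// Turn a DocumentType-like object into its source form.
// In a browser the argument would be document.doctype, which carries
// name, publicId and systemId (or is null if the page has no doctype).
function doctypeToString(doctype) {
  if (!doctype) return "";
  let text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    text += " SYSTEM";
  }
  if (doctype.systemId) {
    text += ' "' + doctype.systemId + '"';
  }
  return text + ">";
}

// In the browser:
//   doctypeToString(document.doctype) + "\n" + document.documentElement.outerHTML
```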
A little bookmarklet that you can add to your browser to view the dom:
javascript:(function(){win=open(%22about:blank%22,%20%22View%20DOM%20Source%22,%20%22menubar=no,resizable=yes,status=no,toolbar=no%22);win.document.write(%22<pre>%22%20+%20document.documentElement.outerHTML.split(%22&%22).join(%22&amp;%22).split(%20%22<%22).join(%22&lt;%22).split(%22>%22).join(%22&gt;%22)%20+%20%22</pre>%22);win.focus();})()
(Sorry, can't post a Javascript Link).
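Unpacked, the escaping that the bookmarklet applies before writing the markup into the pre element amounts to the following (escapeForPre is just a name picked for this sketch):

```javascript
// Same replacements as the bookmarklet, in the same order.
// "&" has to be handled first so the entities added afterwards
// are not escaped a second time.
function escapeForPre(html) {
  return html
    .split("&").join("&amp;")
    .split("<").join("&lt;")
    .split(">").join("&gt;");
}
```

Without this step the new window would render the markup instead of showing it as text.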

You can view it in a DOM inspector like Firebug or IE Developer tools

You could use prompt("test",document.body.innerHTML); and copy & paste the content.

You can access the serialized current state of the table with innerHTML.
var table = document.getElementById("mytable");
table.innerHTML; // "<tbody><tr><td>..."
table.parentNode.innerHTML; // serializes everything inside the table's parent, including the <table> tag itself

I know this question is quite old, but I came across a method to capture and save DOM manipulations using shelljs and thought I would list it here in case anyone is interested.
Once all DOM manipulations are complete:
var shell = require('shelljs');
var data = window.document.getElementsByTagName('html')[0].innerHTML;
shell.echo(data).to("your/original/file.html");
That simple.
I found it useful especially in node.js related DOM manipulations with jsdom (which apparently doesn't save DOM manipulations on its own).
Note: This will overwrite original file.
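If pulling in shelljs only for the write feels heavy, the same save step can be sketched with Node's built-in fs module. The helper name saveSnapshot is made up for this example, and like the version above it replaces the target file.

```javascript
const fs = require("fs");

// Write a serialized document to disk, replacing whatever the file held before.
// `html` is expected to be the string taken from the DOM once manipulation is done.
function saveSnapshot(html, filePath) {
  fs.writeFileSync(filePath, html, "utf8");
  return filePath;
}
```

Note that innerHTML of the html element leaves out the <html> tag and the doctype. With jsdom, recent versions offer dom.serialize(), which returns the whole document and may be a better input.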

Javascript redirect to dynamically created HTML

I have a javascript routine that dynamically creates an HTML page, complete with its own head and script tags.
If I take the contents of the string and save it to a file, and view the file in a browser, all is well, but if I try document.write(newHTML), it doesn't behave the same. The javascript in the header of the dynamic newHTML is quite complicated, and I cannot include it here... But please believe me that it works great if I save it to a file, but not if I try to replace the current page with it using document.write. What possible pitfalls could be contributing to this that I'm not considering? Do I possibly need to delete the existing script tags in the existing header first? Do I need to manually re-call onLoad??
Again, it works great when the string is saved to, for example, 'sample.html' and browsed to, but if I set var Samp="[REAL HTML HERE]"; and then say document.write(Samp); document.close(); the javascript routines are not executing correctly.
Any hints as to what I could be missing?
Is there another/better way to dynamically replace the content of the page, other than document.write?
Could I somehow redirect to the new page despite the fact that it doesn't exist on disk or on a server, but is only in a string in memory? I would hate to have to upload the entire file to my server simply to download it again to view it.
How can I, using javascript, replace the current content of the current page with entirely new content including complex client-side javascripting, dynamically, and always get exactly the same result as if I saved the string to the server as an html file and redirected to it?
How can I 'redirect' to an HTML file that only exists as a client-side string?

You can do this:
var win = window.open(""); // open a new window and write to it
var html = generate_html();
win.document.write(html);
win.document.close();
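Another option for "redirecting" to a page that exists only as a string, offered as a sketch rather than a tested fix for this particular page, is to wrap the string in a Blob and navigate to its object URL. Blob and URL.createObjectURL are standard browser APIs; htmlToObjectUrl is a made-up helper name.

```javascript
// Turn an HTML string into a URL the browser can navigate to.
// Blob and URL.createObjectURL are standard in browsers; recent Node versions have them too.
function htmlToObjectUrl(html) {
  const blob = new Blob([html], { type: "text/html" });
  return URL.createObjectURL(blob);
}

// In the browser, navigating there loads the string as a fresh page:
//   window.location.href = htmlToObjectUrl(newHTML);
```

Because the browser then parses the string through its normal page-load path, scripts in the head should run much as they would for a file served from disk, which is the behavior document.write into an already loaded page does not guarantee.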

Maybe the eval() function would help here? It's hard to give an answer without seeing the code.

Never tried this, but I think it should be possible. Some thoughts on what might make it work:
Make sure the document containing your JS is sent with the correct headers / MIME type / doctype.
Serve the JavaScript in a valid way, for example by sending a W3C-valid page containing the script tag.
Maybe then it works. If not, try to erase the current HTML before writing the new one.
Also, it might be helpful to look at how others have managed to accomplish this task. If I remember correctly, the Google page is also essentially a short HTML page with a bunch of JS.

Is there a way to validate the HTML of a page after AJAX operations are performed on it?

I'm writing a web app that inserts and modifies HTML elements via AJAX using jQuery. It works very nicely, but I want to be sure everything is ok under the bonnet. When I inspect the source of the page in IE or Chrome it shows me the original document markup, not what has changed since my AJAX calls.
I love using the W3C validator to check my markup, as it occasionally reminds me that I've forgotten to close a tag etc. How can I use this to check the markup of my page after the original source served from the server has been changed via JavaScript?
Thank you.

Use the developer tools in Chrome to explore the DOM: they will show you all the HTML you've added with JavaScript.
You can then copy it and paste it into any validator you want.
Or, instead of inserting the markup with jQuery, send it to the console; that way the browser does not get a chance to close tags for you.
console.log(myHTML)

Both previous answers make good points about the fact the browser will 'fix' some of the html you insert into the DOM.
Back to your question, you could add the following to a bookmark in your browser. It will write out the contents of the DOM to a new window; copy and paste that into a validator.
javascript:window.open("").document.open("text/plain", "").write(document.documentElement.outerHTML);

If you're just concerned about well-formedness (missing closing tags and such), you probably just want to check the structure of the chunks AJAX is inserting. (Once it's part of the DOM, it's going to be well-formed... just not necessarily the structure you intended.) The simplest way to do that would probably be to attempt to parse it using an XML library. (one with an HTML mode that can be made strict, if you're not using XHTML)
Actual validation (Testing the "You can't put tag X inside tag Y" rules which browsers generally don't care too much about) is a lot trickier and, depending on how much effort you're willing to put into it, may not be worth the trouble. (Because, if you validate them in isolation, you'll get a lot of "This is just a fragment" false positives)
Whichever you decide to use, you need to grab the AJAX responses before the browser parses them if you want a reliable test result. (While they're still just a string of text rather than a DOM tree)
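A rough sketch of the well-formedness check described above, run on the response string before it reaches the DOM. It is a strict, XML-style tag balance check written by hand for illustration (findUnbalancedTags is a made-up name), not a validator: it ignores attribute values that contain ">" and it treats HTML's optional end tags as errors. In a browser, DOMParser with the "application/xml" type is one real parser that could take its place.

```javascript
// Void elements never take a closing tag in HTML.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Report tags in a fragment that are never closed or closed without being opened.
function findUnbalancedTags(fragment) {
  const problems = [];
  const stack = [];
  const withoutComments = fragment.replace(/<!--[\s\S]*?-->/g, "");
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(withoutComments)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3] === "/";
    if (VOID_TAGS.has(name) || isSelfClosing) continue;
    if (!isClosing) {
      stack.push(name);
      continue;
    }
    const openIndex = stack.lastIndexOf(name);
    if (openIndex === -1) {
      problems.push("unexpected </" + name + ">");
      continue;
    }
    // Everything opened after the matching tag was never closed.
    while (stack.length > openIndex + 1) {
      problems.push("unclosed <" + stack.pop() + ">");
    }
    stack.pop();
  }
  // Whatever is still open at the end of the fragment was never closed.
  while (stack.length > 0) {
    problems.push("unclosed <" + stack.pop() + ">");
  }
  return problems;
}
```

Called on the raw AJAX response (for example inside the success callback, before the string is handed to jQuery), an empty result means every tag that was opened was also closed in the expected order.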
