I have a little testcase over at:
http://jsfiddle.net/9xwUx/1/
The code boils down to the following (given a node with id "target"):
var string = '<div class="makeitpink">this should be pink, but is not</div>';
var parser = new DOMParser();
var domNode = parser.parseFromString(string,"text/xml");
document.getElementById("target").appendChild(domNode.firstChild);
If you run the testcase, inspect the target node in Firebug or the Chrome web inspector, select any node within the body tag of jsfiddle's iframe, choose "Edit as HTML", add a random character anywhere as text (not as an attribute of a DOM node, to be clear), and save, the style is applied. But not before that.
To say that I'm confused is an understatement.
Can anybody please clarify what is going on here?
Thanks.
You can change the mime type to text/html and do the following:
function parseHtml(markup) {
  var parser = new DOMParser();
  var doc = parser.parseFromString(markup, 'text/html');
  return doc.body.firstChild;
}
I didn't test on every browser but it works on Chrome and Firefox. I don't see any reason it wouldn't work elsewhere.
A bit late, but the reason is that you parsed the string with the text/xml option, which means the results are XML nodes rather than HTML elements, and the CSS is not applied to them. When you right-click and choose "Edit as HTML", the browser reinterprets them as HTML, and the change to the element causes a redraw that reapplies the CSS.
I've been parsing my HTML using the relatively hackish, yet definitely working, method of creating a temporary element and setting its innerHTML property, which makes the browser do the parsing instead:
// fragment is some HTML partial, given as a string
function parseFragment(fragment) {
  var temp = document.createElement("div");
  temp.innerHTML = fragment;
  return temp.firstChild;
}
You've already noted this in your jsfiddle. Basically, it boils down to the output of DOMParser being an instance of XMLDocument when you use the text/xml option.
I need to get links from the JavaScript on a page. I'm using jsoup, but it doesn't work.
I need to get this link from the page source. Can anyone help me with how to do it?
String url = "http://www.cda.pl/video/149016ec/Rybki-z-ferajny-2004-1080p-Dubbing-pl";
Document doc = Jsoup.connect(url).get();
Elements scriptElements = doc.getElementsByTag("script");
for (Element element :scriptElements ){
for (DataNode node : element.dataNodes()) {
System.out.println(node.getWholeData());
}
System.out.println("-------------------");
}
I marked the URLs I want to get on the screenshot.
You can use this code:
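String url = "http://www.cda.pl/video/149016ec/Rybki-z-ferajny-2004-1080p-Dubbing-pl";
Document doc = Jsoup.connect(url).get();
//we pick the script node (first <script> that is a direct child of #player)
Element script = doc.select("#player > script").first();
if (script == null) throw new IllegalStateException("no <script> inside #player");
String text = script.html();
//then we parse the script for the desired uri
final String prefix = "l='";
int start = text.indexOf(prefix);
if (start < 0) throw new IllegalStateException("prefix " + prefix + " not found");
int p1 = start + prefix.length();
int p2 = text.indexOf("'", p1);
if (p2 < 0) throw new IllegalStateException("closing quote not found");
String uri = text.substring(p1, p2);
System.out.println(uri);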
It will give the desired output:
http://vgra001.cda.pl/lqcc6f8b3c8f76d1b58c1234813fcf67c7.mp4?st=SjoQ8DDcnH7pW8_XNNkA3w&e=1416438406
Please note that this is only an example; you will need to do error checking.
Now the explanation:
You almost had it. You already had the location of the code with the URI, so finding the important script node was easy: you can see a <div class="wrapqualitybtn"> near the script tag, so you can find the div that contains both your script tag and that div (the <div id="player" ... >, which is the script tag's parent node).
Once you have the script node, you only need to do string parsing. Parsing JavaScript code can be risky, because a small change in the code can break your parser, but I think in this case looking for l=' is a solid bet.
A couple of tips:
When a page uses jQuery, you can use jQuery in the browser console too! If you enter $('#player > script')[0] in the browser console, you will see your script tag.
You can search through the DOM of a page in your browser's developer tools (F12), then right-click a node and choose Copy CSS Path (in Chrome; Firefox has something similar), and you will get a selector usable in jsoup.
For more resilient script parsing, you could use regular expressions instead of a plain indexOf search.
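As a minimal sketch of the regular-expression idea, here it is in JavaScript (the language of the other examples on this page); the same pattern should carry over to java.util.regex. The function name extractQuotedAssignment is hypothetical, and the sketch assumes the value is single-quoted and contains no escaped quotes.

```javascript
// Hypothetical helper: find `name='value'` in script source and return value.
// Returns null when nothing matches instead of a garbage substring.
function extractQuotedAssignment(source, name) {
  // Escape regex metacharacters in the variable name.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \b keeps "l=" from matching inside e.g. "url=";
  // \s* tolerates whitespace around the "=".
  const pattern = new RegExp("\\b" + escaped + "\\s*=\\s*'([^']*)'");
  const match = pattern.exec(source);
  return match ? match[1] : null;
}
```

Compared with indexOf, this tolerates extra spaces around "=" and won't match the same letters inside a longer variable name.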
I hope it will help, excuse me for the verbosity.
In an app I receive some HTML text: since the app can't display (interpret) HTML, I need to remove any HTML tag and entity from the string I receive from the server.
I tried the following, but it removes only HTML tags, not entities (e.g. &nbsp;):
stringFromServer.replace(/(<([^>]+)>)/ig,"");
Any help is appreciated.
Disclaimer: I need a pure JavaScript solution (no jQuery, Underscore, etc.).
[UPDATE] I'm reading all your answers now and I forgot to mention that I'm using JavaScript BUT the environment is not a web page, so I have no DOM.
You can try something like this:
var placeholder = document.createElement('div');
placeholder.innerHTML = stringFromServer;
var theText = placeholder.innerText;
.innerText only grabs text content from the element.
However, since it appears you don't have access to any DOM manipulation at all, you're probably going to have to use some kind of HTML parser, like these:
https://www.npmjs.org/package/htmlparser
http://ejohn.org/blog/pure-javascript-html-parser/
A solution without using regexes or phantom divs can be found on Mozilla's MDN.
I put the code in a JSfiddle here:
var sMyString = "<a id=\"a\"><b id=\"b\">hey!<\/b><\/a>";
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
// print the name of the root element or error message
alert(oDOM.documentElement.nodeName == "parsererror" ?
"error while parsing" : oDOM.documentElement.textContent);
Alternatively, parse the HTML snippet in a new document and do your dom manipulations from that (if you'd rather keep it separate from the current document):
var tmpDoc=document.implementation.createHTMLDocument("");
tmpDoc.body.innerHTML="<a href='#'>some text</a><p style=''> more text</p>";
tmpDoc.body.textContent;
tmpDoc.body.textContent evaluates to:
some text more text
stringFromServer.replace(/(<([^>]+)>|&[^;]+;)/ig, "")
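The regex above deletes entities outright, so "Tom &amp; Jerry" becomes "Tom  Jerry". If you'd rather keep the characters, here is a DOM-free sketch in plain JavaScript (so it should also work outside a web page) that strips tags first and then decodes numeric entities plus a small, assumed set of named ones. htmlToText and NAMED are hypothetical names; a complete decoder would need the full HTML entity table, and like any regex approach this is not a real HTML parser (a ">" inside an attribute value will confuse it).

```javascript
// Hypothetical entity table; only these named entities are decoded,
// so unknown ones such as &copy; are left as-is.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'", nbsp: "\u00A0" };

function htmlToText(html) {
  // Strip tags first, so decoded text like "&lt;b&gt;" is not
  // mistaken for a tag afterwards.
  const withoutTags = html.replace(/<[^>]*>/g, "");
  return withoutTags.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (entity, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Out-of-range numbers would make fromCodePoint throw; keep them verbatim.
      return code <= 0x10ffff ? String.fromCodePoint(code) : entity;
    }
    // Entity names are case-sensitive, so look them up exactly as written.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : entity;
  });
}
```

Note that &nbsp; decodes to a non-breaking space (U+00A0); replace it with a plain space afterwards if your app needs that.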
I'm working on an add-on using Mozilla's Add-on SDK, and I've come across the need to HTML-encode some text (swap out ampersands and other special characters for their entity equivalents, e.g. & becomes &amp;). You can do this in JavaScript using the DOM by calling document.createElement() and adding text to it (provoking the browser to encode the text). The trouble is that in the privileged code (main.js) there is no DOM, so there is no way to access these features, or even to use a library like jQuery. Is there a best practice here? How can I get access to features that would typically require a global document object from main.js?
If I understood correctly, you want to replace HTML entities (&amp; and similar) with the actual characters. And your solution so far was:
var text = "foo&amp;bar";
var element = document.createElement('foo');
element.innerHTML = text;
text = element.textContent;
Instead of using the DOM of your document (and risking running some script unintentionally) you can use DOMParser - it will parse text without any side-effects. Unfortunately, accessing DOMParser from main.js requires chrome authority but other than that the code is straightforward:
var text = "foo&amp;bar";
var {Cc, Ci} = require("chrome");
var parser = Cc["@mozilla.org/xmlextras/domparser;1"]
.createInstance(Ci.nsIDOMParser);
text = parser.parseFromString(text, "text/html").documentElement.textContent;
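The question itself asks for the opposite direction, encoding text so that & becomes &amp;. That direction needs no DOM and no chrome authority at all. A minimal plain-JavaScript sketch (escapeHtml is a hypothetical name) replaces the five characters that are significant in HTML markup:

```javascript
// Hypothetical helper: escape characters that are significant in HTML text
// and attribute values. "&" must be replaced first, otherwise the "&" in
// entities produced by the later replacements would be escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Because it is pure string manipulation, it can run in main.js as-is.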
The following code [jsfiddle]...
var div = document.createElement("div");
div.innerHTML = "<foo>This is a <bar /> test. <br> Another test.</foo>";
alert(div.innerHTML);
...shows this parsed structure:
<foo>This is a <bar> test. <br> Another test.</bar></foo>
I.e., the browser knows that <br> has no closing tag, but since <bar> is an unknown tag to the browser, it assumes that it needs a closing tag.
I know that the /> (solidus) syntax is ignored in HTML5 and invalid in HTML4, but I would still like to somehow teach the browser that <bar> does not need an end tag, so that I can omit it. Is that possible?
Yes, I'm trying to (temporarily) misuse the HTML code for custom tags, and I have my specific reasons to do that. After all, browsers should ignore unknown tags and treat them just like unstyled inline tags, so I should not break anything as long as I can make sure the tag names won't ever be used in real HTML standards.
You'd have to use Object.defineProperty on HTMLElement.prototype to override the innerHTML setter and getter with your own innerHTML implementation that treats the elements you want as void. Look here for how innerHTML and the HTML parser are implemented by default.
Note though that Firefox sucks at inheritance when it comes to defining stuff on HTMLElement.prototype where it filters down to HTMLDivElement for example. Things should work fine in Opera though.
In other words, what elements are void depends on the HTML parser. The parser follows this list and innerHTML uses the same rules mostly.
So, in other words, unless you want to create your own innerHTML implementation in JS, you probably should just forget about this.
You can use the live DOM viewer, though, to show others how certain markup is parsed. You'll then probably notice that an enclosing element's end tag implicitly closes the still-open element.
I have some outdated innerHTML getter (not setter though) code here that uses a void element list. That may give you some ideas. But, writing a setter implementation might be more difficult.
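To illustrate only the getter side (not the much harder parsing side), here is a small sketch of a serializer that consults a void-element list. It works on a hypothetical plain-object tree rather than real DOM nodes so it can run anywhere, ignores attributes, and adds the question's bar to the current HTML spec's void list as an assumption.

```javascript
// Void elements from the current HTML spec, plus the question's custom
// "bar" (an assumption for illustration).
const VOID = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr", "bar"]);

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical tree format: a node is either a string (text)
// or an object { tag, children }.
function serialize(node, voidTags = VOID) {
  if (typeof node === "string") return escapeText(node);
  const open = "<" + node.tag + ">";
  // Void elements are written without children or an end tag.
  if (voidTags.has(node.tag)) return open;
  const inner = (node.children || []).map((c) => serialize(c, voidTags)).join("");
  return open + inner + "</" + node.tag + ">";
}
```

Passing a different set as the second argument shows how the output changes when bar is not treated as void.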
On the other hand, if you use createElement() and appendChild() etc. instead of innerHTML, you shouldn't have to worry about this and the native innerHTML getter will output the unknown elements with end tags.
Note though, you can treat the unknown element as xml and use XMLSerializer() and DOMParser() to do things:
var x = document.createElement("test");
var serializer = new XMLSerializer();
alert(serializer.serializeToString(x));
var parser = new DOMParser();
var doc = parser.parseFromString("<test/>", "application/xml");
var div = document.createElement("div");
div.appendChild(document.importNode(doc.documentElement, true));
alert(serializer.serializeToString(div));
It's not exactly what you want, but something you can play with. (Test that in Opera instead of Firefox to see the difference with xmlns attributes. Also note that Chrome doesn't do like Opera and Firefox.)
I'm creating a document fragment as follows:
var aWholeHTMLDocument = '<!doctype html> <html><head></head><body><h1>hello world</h1></body></html>';
var frag = document.createDocumentFragment();
frag.innerHTML = aWholeHTMLDocument;
The variable aWholeHTMLDocument contains a long string that is the entire html document of a page, and I want to insert it inside my fragment in order to generate and manipulate the DOM dynamically.
My question is, once I have added that string to frag.innerHTML, shouldn't it load this string and convert it to a DOM object?
After setting innerHTML, shouldn't I have access to the DOM through a property?
I tried frag.childNodes but it doesn't seem to contain anything, and all I want is to just access that newly created DOM.
While DocumentFragment does not support innerHTML, <template> does.
The content property of a <template> element is a DocumentFragment so it behaves the same way. For example, you can do:
var tpl = document.createElement('template');
tpl.innerHTML = '<tr><td>Hello</td><td>world</td></tr>';
document.querySelector('table').appendChild(tpl.content);
The above example is important because you could not do this with innerHTML and e.g. a <div>, because a <div> does not allow <tr> elements as children.
NOTE: A DocumentFragment will still strip the <head> and <body> tags, so it won't do what you want either. You really need to create a whole new Document.
You can't set the innerHTML of a document fragment like you would do with a normal node, that's the problem. Adding a standard div and setting the innerHTML of that is the common solution.
DocumentFragment inherits from Node, but not from Element, which is where the .innerHTML property is defined.
In your case I would use the <template> tag. It inherits from Element, and it has a nifty HTMLTemplateElement.content property that gives you a DocumentFragment.
Here's a simple helper method you could use:
export default function StringToFragment(string) {
var renderer = document.createElement('template');
renderer.innerHTML = string;
return renderer.content;
}
I know this question is old, but I ran into the same issue while playing with a document fragment because I didn't realize that I had to append a div to it and use the div's innerHTML to load strings of HTML in and get DOM Elements from it. I've got other answers on how to do this sort of thing, better suited for whole documents.
In firefox (23.0.1) it appears that setting the innerHTML property of the document fragment doesn't automatically generate the elements. It is only after appending the fragment to the document that the elements are created.
To create a whole document, use the document.implementation methods if they're supported. I've had success doing this in Firefox, but I haven't really tested it in other browsers. You can look at HTMLParser.js in the AtropaToolbox for an example of using the document.implementation methods. I've used this bit of script to fetch pages with XMLHttpRequest and then manipulate them or extract data from them. Scripts in the page are not executed, though; that is what I wanted, but it may not be what you want. The reason I went with this rather verbose method instead of using the parsing available directly from the XMLHttpRequest object was that I ran into quite a bit of trouble with parsing errors at the time, and I wanted to specify that the doc should be parsed as HTML 4 Transitional, because that seems to accept all kinds of sloppy markup and still produce a DOM.
There is also a DOMParser available which may be easier for you to use. There is an implementation by Eli Grey on the page at MDN for browsers that don't have the DOMParser but do support document.implementation.createHTMLDocument. The specs for DOMParser specify that scripts in the page are not executed and the contents of noscript tags be rendered.
If you really need scripts enabled in the page, you could create an iframe with 0 height, 0 width, no borders, etc. It would still be in the page, but you could hide it pretty well.
There's also the option of using window.open() with document.write, DOM methods, or whatever you like. Some browsers even let you use data URIs now.
var x = window.open( 'data:text/html;base64,' + btoa('<h1>hi</h1>') );
// wait for the document to load. It only takes a few milliseconds
// but we'll wait for 5 seconds so you can watch the child window
// change.
setTimeout(function () {
console.log(x.document.documentElement.outerHTML);
x.console.log('this is the console in the child window');
x.document.body.innerHTML = 'oh wow';
}, 5000);
So, you do have a few options for creating whole documents offscreen/hidden and manipulating them, all of which support loading the document from strings.
There's also phantomjs, an awesome project producing a headless scriptable web browser based on webkit. You'll have access to the local filesystem and be able to do pretty much whatever you want. I don't really know what you're trying to accomplish with your full page scripting and manipulation.
For a Firefox add-on, it probably makes more sense to use the document.implementation.createHTMLDocument method, and then go from the DOM that gives you.
With a document fragment, you would append elements that you had created with document.createElement('yourElement'); aWholeHTMLDocument is merely text. Also, unless you're using frames, I'm not sure why you would need to create the whole HTML document; just use what is inside the <body> tags.
Use appendChild
see https://developer.mozilla.org/en-US/docs/Web/API/Document/createDocumentFragment
var fragment = document.createDocumentFragment();
var heading = document.createElement('h1'); // any element you build
heading.textContent = 'hello world';
fragment.appendChild(heading);
document.querySelector('blah').appendChild(fragment);
Here is a solution for converting a HTML string into a DOM object:
let markup = '<!doctype html><html><head></head><body><h1>hello world</h1></body></html>';
let range = document.createRange();
let fragment = range.createContextualFragment(markup); //Creates a DOM object
The string does not need to be a complete HTML document.
Use querySelector() to get a child of the document fragment (you probably want the body, or some child of the body). Then get the innerHTML.
document.body.innerHTML = aWholeHTMLDocument.querySelector("body").innerHTML
or
aWholeHTMLDocument.querySelector("body").childNodes;
See https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment.querySelector