So I know I can write my own HTML-encoding function like this:
function getHTMLEncode(t) {
return t.toString().replace(/&/g,"&").replace(/"/g,""").replace(/</g,"<").replace(/>/g,">");
}
But I was wondering if there were any native facility for this that is available to XPCOM components. I'm writing a component, not an overlay, so I don't have a DOM around to do tricks like creating a DOM element and setting its innerHTML.
The answer appears to be no - there is no built in function in Firefox to HTML-encode a string from an XPCOM component.
In theory you could create an XML document, use that to create an HTML div, set its text content to your unencoded string and read off its innerHTML. Note that this only encodes lt, gt and amp characters, not quot.
Related
I know that html-entities like or ö or ð can not be used inside a css like this:
div.test:before {
content:"text with html-entities like ` ` or `ö` or `ð`";
}
There is a good question with good answers dealing with this problem: Adding HTML entities using CSS content
But I am reading the strings that are put into the css-content from a server via AJAX. The JavaScript running at the users client receives text with embedded html-entities and creates style-content from it instead of putting it as a text-element into an html-element's content. This method helps against thieves who try to steal my content via copy&paste. Text that is not part of the html-document (but part of css-content) is really hard to copy. This method works fine. There is only this nasty problem with that html-entities.
So I need to convert html-entities into unicode escape-sequences at runtime. I can do this either on the server with a perl-script or on the client with JavaScript, But I don't want to write a subroutine that contains a complete list of all existing named entities. There are more than 2200 named entities in html5, as listed here: http://www.w3.org/TR/2011/WD-html5-20110113/named-character-references.html And I don't want to change my subroutine every time this list gets changed. (Numeric entities are no problem.)
Is there any trick to perfom this conversion with javascript? Maybe by adding, reading and removing content to the DOM? (I am using jQuery)
I've found a solution:
var text = 'Text that contains html-entities';
var myDiv = document.createElement('div');
$(myDiv).html(text);
text = $(myDiv).text();
$('#id_of_a_style-element').html('#id_of_the_protected_div:before{content:"' + text + '"}');
Writing the Question was half way to get this answer. I hope this answer helps others too.
The following code [jsfiddle]...
var div = document.createElement("div");
div.innerHTML = "<foo>This is a <bar /> test. <br> Another test.</foo>";
alert(div.innerHTML);
...shows this parsed structure:
<foo>This is a <bar> test. <br> Another test.</bar></foo>
i.e. the browser knows that <br> has no closing tag but since <bar> is an unknown tag to the browser, it assumes that it needs an closing tag.
I know that the /> (solidus) syntax is ignored in HTML5 and invalid in HTML4, but anyway would like to teach somehow the browser that <bar> does not need an ending tag and I can omit it. Is that possible?
Yes, I'm trying to (temporarily) misuse the HTML code for custom tags and I have my specific reasons to do that. After all, browsers should ignore unknown tags and treat them just like unstyled inline tags, so I should not break anything as long I can make sure the tag names won't ever be used in real HTML standards.
You'd have to use Object.defineProperty on HTMLElement.prototype to override the innerHTML setter and getter with your own innerHTML implementation that treats the elements you want as void. Look here for how innerHTML and the HTML parser is implemented by default.
Note though that Firefox sucks at inheritance when it comes to defining stuff on HTMLElement.prototype where it filters down to HTMLDivElement for example. Things should work fine in Opera though.
In other words, what elements are void depends on the HTML parser. The parser follows this list and innerHTML uses the same rules mostly.
So, in other words, unless you want to create your own innerHTML implementation in JS, you probably should just forget about this.
You can use the live DOM viewer though to show others how certain markup is parsed. You'll then probably notice that same end tags will implicitly close the open element.
I have some outdated innerHTML getter (not setter though) code here that uses a void element list. That may give you some ideas. But, writing a setter implementation might be more difficult.
On the other hand, if you use createElement() and appendChild() etc. instead of innerHTML, you shouldn't have to worry about this and the native innerHTML getter will output the unknown elements with end tags.
Note though, you can treat the unknown element as xml and use XMLSerializer() and DOMParser() to do things:
var x = document.createElement("test");
var serializer = new XMLSerializer();
alert(serializer.serializeToString(x));
var parser = new DOMParser();
var doc = parser.parseFromString("<test/>", "application/xml");
var div = document.createElement("div");
div.appendChild(document.importNode(doc.documentElement, true));
alert(serializer.serializeToString(div));
It's not exactly what you want, but something you can play with. (Test that in Opera instead of Firefox to see the difference with xmlns attributes. Also note that Chrome doesn't do like Opera and Firefox.)
I'm creating a document fragment as follow:
var aWholeHTMLDocument = '<!doctype html> <html><head></head><body><h1>hello world</h1></body></html>';
var frag = document.createDocumentFragment();
frag.innerHTML = aWholeHTMLDocument;
The variable aWholeHTMLDocument contains a long string that is the entire html document of a page, and I want to insert it inside my fragment in order to generate and manipulate the DOM dynamically.
My question is, once I have added that string to frag.innerHTML, shouldn't it load this string and convert it to a DOM object?
After setting innerHTML, shouldn't I have access to the DOM through a property?
I tried frag.childNodes but it doesn't seem to contain anything, and all I want is to just access that newly created DOM.
While DocumentFragment does not support innerHTML, <template> does.
The content property of a <template> element is a DocumentFragment so it behaves the same way. For example, you can do:
var tpl = document.createElement('template');
tpl.innerHTML = '<tr><td>Hello</td><td>world</td></tr>';
document.querySelector('table').appendChild(tpl.content);
The above example is important because you could not do this with innerHTML and e.g. a <div>, because a <div> does not allow <tr> elements as children.
NOTE: A DocumentFragment will still strip the <head> and <body> tags, so it won't do what you want either. You really need to create a whole new Document.
You can't set the innerHTML of a document fragment like you would do with a normal node, that's the problem. Adding a standard div and setting the innerHTML of that is the common solution.
DocumentFragment inherits from Node, but not from Element that contains the .innerHTML property.
In your case I would use the <template> tag. In inherits from Element and it has a nifty HTMLTemplateElement.content property that gives you a DocumentFragment.
Here's a simple helpermethod you could use:
export default function StringToFragment(string) {
var renderer = document.createElement('template');
renderer.innerHTML = string;
return renderer.content;
}
I know this question is old, but I ran into the same issue while playing with a document fragment because I didn't realize that I had to append a div to it and use the div's innerHTML to load strings of HTML in and get DOM Elements from it. I've got other answers on how to do this sort of thing, better suited for whole documents.
In firefox (23.0.1) it appears that setting the innerHTML property of the document fragment doesn't automatically generate the elements. It is only after appending the fragment to the document that the elements are created.
To create a whole document use the document.implementation methods if they're supported. I've had success doing this on Firefox, I haven't really tested it out on other browsers though. You can look at HTMLParser.js in the AtropaToolbox for an example of using document.implementation methods. I've used this bit of script to XMLHttpRequest pages and manipulate them or extract data from them. Scripts in the page are not executed though, which is what I wanted though it may not be what you want. The reason I went with this rather verbose method instead of trying to use the parsing available from the XMLHttpRequest object directly was that I ran into quite a bit of trouble with parsing errors at the time and I wanted to specify that the doc should be parsed as HTML 4 Transitional because it seems to take all kinds of slop and produce a DOM.
There is also a DOMParser available which may be easier for you to use. There is an implementation by Eli Grey on the page at MDN for browsers that don't have the DOMParser but do support document.implementation.createHTMLDocument. The specs for DOMParser specify that scripts in the page are not executed and the contents of noscript tags be rendered.
If you really need scripts enabled in the page you could create an iFrame with 0 height, 0 width, no borders, etc. It would still be in the page but you could hide it pretty well.
There's also the option of using window.open() with document.write, DOM methods or whatever you like. Some browsers even let you do data URI's now.
var x = window.open( 'data:text/html;base64,' + btoa('<h1>hi</h1>') );
// wait for the document to load. It only takes a few milliseconds
// but we'll wait for 5 seconds so you can watch the child window
// change.
setTimeout(function () {
console.log(x.document.documentElement.outerHTML);
x.console.log('this is the console in the child window');
x.document.body.innerHTML = 'oh wow';
}, 5000);
So, you do have a few options for creating whole documents offscreen/hidden and manipulating them, all of which support loading the document from strings.
There's also phantomjs, an awesome project producing a headless scriptable web browser based on webkit. You'll have access to the local filesystem and be able to do pretty much whatever you want. I don't really know what you're trying to accomplish with your full page scripting and manipulation.
For a Firefox add-on, it probably makes more sense to use the document.implementation.createHTMLDocument method, and then go from the DOM that gives you.
With a document fragment you would append elements that you had created with document.createElement('yourElement'). aWholeHTMLDocument is merely text. Also, unless your using frames I'm not sure why you would need to create the whole HTML document just use what is inside the <body> tags.
Use appendChild
see https://developer.mozilla.org/en-US/docs/Web/API/Document/createDocumentFragment
var fragment = document.createDocumentFragment();
... fragment.appendChild(some element);
document.querySelector('blah').appendChild(fragment);
Here is a solution for converting a HTML string into a DOM object:
let markup = '<!doctype html><html><head></head><body><h1>hello world</h1></body></html>';
let range = document.createRange();
let fragment = range.createContextualFragment(markup); //Creates a DOM object
The string does not need to be a complete HTML document.
Use querySelector() to get a child of the document fragment (you probably want the body, or some child of the body). Then get the innerHTML.
document.body.innerHTML = aWholeHTMLDocument.querySelector("body").innerHTML
or
aWholeHTMLDocument.querySelector("body").childNodes;
See https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment.querySelector
For example in javascript code running on the page we have something like:
var data = '<html>\n <body>\n I want this text ...\n </body>\n</html>';
I'd like to use and at least know if its possible to get the text in the body of that html string without throwing the whole html string into the DOM and selecting from there.
First, it's a string:
var arbitrary = '<html><body>\nSomething<p>This</p>...</body></html>';
Now jQuery turns it into an unattached DOM fragment, applying its internal .clean() method to strip away things like the extra <html>, <body>, etc.
var $frag = $( arbitrary );
You can manipulate this with jQuery functions, even if it's still a fragment:
alert( $frag.filter('p').get() ); // says "<p>This</p>"
Or of course just get the text content as in your question:
alert( $frag.text() ); // includes "This" in my contrived example
// along with line breaks and other text, etc
You can also later attach the fragment to the DOM:
$('div#something_real').append( $frag );
Where possible, it's often a good strategy to do complicated manipulation on fragments while they're unattached, and then slip them into the "real" page when you're done.
The correct answer to this question, in this exact phrasing, is NO.
If you write something like var a = $("<div>test</div>"), jQuery will add that div to the DOM, and then construct a jQuery object around it.
If you want to do without bothering the DOM, you will have to parse it yourself. Regular expressions are your friend.
It would be easiest, I think, to put that into the DOM and get it from there, then remove it from the DOM again.
Jquery itself is full of tricks like this. It's adding all sorts off stuff into the DOM all the time, including when you build something using $('<p>some html</p>'). So if you went down that road you'd still effectively be placing stuff into the DOM then removing it again, temporarily, except that it'd be Jquery doing it.
John Resig (jQuery author) created a pure JS HTML parser that you might find useful. An example from that page:
var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1
Buuuut... This question contains a constraint that I think you need to be more critical of. Rather than working around a hard-coded HTML string in a JS variable, can you not reconsider why it's that way in the first place? WHAT is that hard-coded string used for?
If it's just sitting there in the script, re-write it as a proper object.
If it's the response from an AJAX call, there is a perfectly good jQuery AJAX API already there. (Added: although jQuery just returns it as a string without any ability to parse it, so I guess you're back to square one there.)
Before throwing it in the DOM that is just a plain string.
You can sure use REGEX.
I have a generic function that returns URLs. (It's a plugin function that returns URLs to resources [images, stylesheets] within a plugin).
I use GET parameters in those URLs.
If I want to use these URLs within a HTML page, to pass W3C validation, I need to mask ampersands as &
/plugin.php?plugin=xyz&resource=stylesheet&....
but, if I want to use the URL as the "url" parameter for a AJAX call, the ampersand is not interpreted correctly, screwing up my calls.
Can I do something get & work in AJAX calls?
I would very much like to avoid adding parameters to th URL generating function (intendedUse="ajax" or whatever) or manipulating the URL in Javascript, as this plugin model will be re-used many times (and possibly by many people) and I want it as simple as possible.
It seems to me that you're running into the problem of having one piece of your application cross multiple layers. In this case it's the plugin.
A URL as specified by RFC 1738 states that a URL should use a & token to separate key/value pairs from one another. However ampersand is a reserved token in HTML and therefore should be escaped into &. Since escaping the ampersands is an artifact of HTML, your plugin should probably not be escaping them directly. Instead you should have a function or something that escapes a canonical URL so that it can be embedded in HTML markup.
The only place that this is likely to actually happen is if you are:
Using XHTML
Serving it as text/html
Using inline <script>
This is not a happy combination, and the solution is in the spec.
Use external scripts if your script
uses < or & or ]]> or --.
The XHTML media types note includes the same advice, but also provides a workaround if you choose to ignore it.
Try returning JSON instead of just a string, that way your Javascript can read the URL value as an object, and you shouldn't have that issue. Other than that, try simply HTML decoding the string, using something like:
function escapeHTML (str)
{
var div = document.createElement('div');
var text = document.createTextNode(str);
div.appendChild(text);
return div.innerHTML;
};
Obviously you'll want to make sure you remove any reference to DOM elements you might create (which I've not done here to simplify the example).
I use this technique in the AJAX sites I create at my work and have used it many times to solve this problem.
When you have markup of the form:
<a href="?a=1&b=2">
Then the value of the href attribute is ?a=1&b=2. The & is only an escape sequence in HTML/XML and doesn't affect the value of the attribute. This is similar to:
<a href="<>">
Where the value of the attribute is <>.
If, instead, you have code of the form:
<script>
var s = "?a=1&b=2";
</script>
Then you can use a JavaScript function:
<script>
var amp = String.fromCharCode(38);
var s = "?a=1"+amp+"b=2";
</script>
This allows code that would otherwise only be valid HTML or only valid XHTML to be valid in both. (See Dorwald's comments for more info.)