I have the following code:
<input type="button" id="btnGetText" value="Get Text"/>
<div id="myDiv">
<p>First Line&alpha; and text.</p>
<p>Second Line <b>Bolded</b></p>
</div>
When I alert the HTML of #myDiv, the markup shows up in the alert, but &alpha; does not: it appears as α instead.
If I alert &alpha; on its own, it also gives α.
Here is the fiddle:
http://jsfiddle.net/getyoursuresh/hv2ed/
You are not alerting the original HTML. You are alerting a serialization of the DOM to HTML.
&alpha; and α are equivalent in HTML, so either is perfectly valid when designing a serialization algorithm. HTML 5 describes the algorithm browsers are supposed to use.
As described in the section starting "Escaping a string", non-breaking spaces should be serialized to character references while alphas are not.
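A minimal sketch of that escaping step, assuming text (non-attribute) mode. The function name escapeForSerialization is made up for illustration and is not a browser API. It shows why a non-breaking space comes back as a character reference while an alpha is left alone:

```javascript
// Hypothetical helper mirroring the "escaping a string" steps for text content
// (not attribute values). This is an illustration, not a real browser API.
function escapeForSerialization(text) {
  return text
    .replace(/&/g, '&amp;')       // ampersands first, so later output is not re-escaped
    .replace(/\u00A0/g, '&nbsp;') // non-breaking space becomes a character reference
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// The DOM stores characters, not the entities that were typed in the source.
const domText = 'First Line\u03B1\u00A0and text.';
console.log(escapeForSerialization(domText));
// "First Lineα&nbsp;and text."
```

The alpha (U+03B1) passes through untouched because the algorithm has no rule for it.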
If you want to work with your original HTML then you'll need to use XMLHttpRequest to refetch the original document from the server and then parse the raw text of the response yourself.
Edited:
The alpha character can be used directly in HTML even though it also has a character reference. The non-breaking space, by contrast, comes back as a character reference after processing, because the serialization algorithm escapes it.
The W3C validator doesn't like self-closing tags (those that end with “/>”) on non-void elements. (Void elements are those that may never contain any content.) Are they still valid in HTML5?
Some examples of accepted void elements:
<br />
<img src="" />
<input type="text" name="username" />
Some examples of rejected non-void elements:
<div id="myDiv" />
<span id="mySpan" />
<textarea id="someTextMessage" />
Note:
The W3C validator actually accepts void self-closing tags: the author originally had a problem because of a simple typo (\> instead of />); however, self-closing tags are not 100% valid in HTML5 in general, and the answers elaborate on the issue of self-closing tags across various HTML flavors.
(Theoretically) in HTML 4, <foo / (yes, with no > at all) means <foo> (which leads to <br /> meaning <br>> (i.e. <br>&gt;) and <title/hello/ meaning <title>hello</title>). I use the term "theoretically" because this is an SGML rule that browsers did a very poor job of supporting. There was so little support (I only ever saw it work in emacs-w3m) that the spec advises authors to avoid the syntax.
In XHTML, <foo /> means <foo></foo>. This is an XML rule that applies to all XML documents. That said, XHTML is often served as text/html which (historically at least) gets processed by browsers using a different parser than documents served as application/xhtml+xml. The W3C provides compatibility guidelines to follow for XHTML as text/html. (Essentially: Only use self-closing tag syntax when the element is defined as EMPTY (and the end tag was forbidden in the HTML spec)).
In HTML5, the meaning of <foo /> depends on the type of element:
On HTML elements that are designated as void elements (essentially "An element that existed before HTML5 and which was forbidden to have any content"), end tags are simply forbidden. The slash at the end of the start tag is allowed, but has no meaning. It is just syntactic sugar for people (and syntax highlighters) that are addicted to XML.
On other HTML elements, the slash is an error, but error recovery will cause browsers to ignore it and treat the tag as a regular start tag. This will usually end up with a missing end tag causing subsequent elements to be children instead of siblings.
Foreign elements (imported from XML applications such as SVG) treat it as self-closing syntax.
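The three cases above can be summarized in a small lookup. This is only an illustrative sketch: describeTrailingSlash is a hypothetical helper, the void-element list is the one commonly cited from the HTML5 spec, and whether an element is foreign is passed in by the caller instead of being detected.

```javascript
// Void elements as commonly listed for HTML5 (assumed list, lower-case names).
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'param', 'source', 'track', 'wbr',
]);

// Hypothetical helper: what does the "/" in "<foo />" mean in the HTML syntax?
function describeTrailingSlash(tagName, isForeign = false) {
  if (isForeign) {
    return 'self-closing';                // e.g. SVG or MathML elements
  }
  if (VOID_ELEMENTS.has(tagName.toLowerCase())) {
    return 'allowed, ignored';            // <br /> behaves exactly like <br>
  }
  return 'error, treated as start tag';   // <div /> behaves like <div>
}

console.log(describeTrailingSlash('br'));         // "allowed, ignored"
console.log(describeTrailingSlash('div'));        // "error, treated as start tag"
console.log(describeTrailingSlash('rect', true)); // "self-closing"
```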
A self-closing div will not validate. This is because a div is a normal element, not a void element.
According to the HTML5 spec, tags that cannot have any contents (known as void elements) can be self-closing*. This includes the following tags:
area, base, br, col, embed, hr, img, input,
link, meta, param, source, track, wbr
The "/" is completely optional on the above tags, however, so <img/> is not different from <img>, but <img></img> is invalid.
*Note: foreign elements can also be self-closing, but I don't think that's in scope for this answer.
In practice, using self-closing tags in HTML should work just like you'd expect. But if you are concerned about writing valid HTML5, you should understand how the use of such tags behaves within the two different syntax forms you can use. HTML5 defines both an HTML syntax and an XHTML syntax, which are similar but not identical. Which one is used depends on the media type sent by the web server.
More than likely, your pages are being served as text/html, which follows the more lenient HTML syntax. In these cases, HTML5 allows certain start tags to have an optional / before their terminating >. The / is ignored, so <hr> and <hr /> are identical. The HTML spec calls these "void elements", and gives a list of valid ones. Strictly speaking, the optional / is only valid within the start tags of these void elements; for example, <br /> and <hr /> are valid HTML5, but <p /> is not.
The HTML5 spec makes a clear distinction between what is correct for HTML authors and for web browser developers, with the second group being required to accept all kinds of invalid "legacy" syntax. In this case, it means that HTML5-compliant browsers will accept illegal self-closed tags, like <p />, and render them as you probably expect. But for an author, that page would not be valid HTML5. (More importantly, the DOM tree you get from using this kind of illegal syntax can be seriously screwed up; self-closed <span /> tags, for example, tend to mess things up a lot).
(In the unusual case that your server knows how to send XHTML files as an XML MIME type, the page needs to conform to the XHTML DTD and XML syntax. That means self-closing tags are required for those elements defined as such.)
HTML5 basically behaves as if the trailing slash is not there. There is no such thing as a self-closing tag in HTML5 syntax.
Self-closing tags on non-void elements like <p/>, <div/> will not work at all. The trailing slash will be ignored, and these will be treated as opening tags. This is likely to lead to nesting problems.
This is true regardless of whether there is whitespace in front of the slash: <p /> and <div /> also won't work for the same reason.
Self-closing tags on void elements like <br/> or <img src="" alt=""/> will work, but only because the trailing slash is ignored, and in this case that happens to result in the correct behaviour.
The result is, anything that worked in your old "XHTML 1.0 served as text/html" will continue to work as it did before: trailing slashes on non-void tags were not accepted there either whereas the trailing slash on void elements worked.
One more note: it is possible to represent an HTML5 document as XML, and this is sometimes dubbed "XHTML 5.0". In this case the rules of XML apply and self-closing tags will always be handled. It would always need to be served with an XML MIME type.
Self-closing tags are valid in HTML5, but not required.
<br> and <br /> are both fine.
I would be very careful with self-closing tags, as this example demonstrates:
// The parser ignores the slash, so the second <span> ends up nested inside the first.
var a = '<span/><span/>';
var d = document.createElement('div');
d.innerHTML = a;
console.log(d.innerHTML); // "<span><span></span></span>"
My gut feeling would have been <span></span><span></span> instead
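If the markup comes from an XML-minded source, one way to get the expected siblings is to expand self-closing tags before assigning to innerHTML. The following is a rough sketch under that assumption. expandSelfClosingTags is a hypothetical helper, and the regular expression is deliberately naive: it would be confused by a > inside an attribute value, by comments, and by script content.

```javascript
// Void elements keep a bare start tag; everything else gets an explicit end tag.
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'param', 'source', 'track', 'wbr',
]);

// Hypothetical helper: rewrite "<name attrs/>" into HTML-syntax equivalents.
function expandSelfClosingTags(html) {
  return html.replace(
    /<([A-Za-z][A-Za-z0-9-]*)([^<>]*?)\s*\/>/g,
    (match, name, attrs) =>
      VOID_TAGS.has(name.toLowerCase())
        ? `<${name}${attrs}>`            // <br /> becomes <br>
        : `<${name}${attrs}></${name}>`  // <span/> becomes <span></span>
  );
}

console.log(expandSelfClosingTags('<span/><span/>'));
// "<span></span><span></span>"
```

Assigning the expanded string to innerHTML should then produce two sibling spans instead of a nested pair.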
However -just for the record- this is invalid:
<address class="vcard">
<svg viewBox="0 0 800 400">
<rect width="800" height="400" fill="#000">
</svg>
</address>
And a slash here would make it valid again:
<rect width="800" height="400" fill="#000"/>
Are (non-void) self-closing tags valid in HTML5?
They are valid, but with a small modification.
Take a void element such as <br> as an example.
Even if you write <br/> or <br />, it will eventually be converted to <br> in the browser.
In self-closing tags ending with /> (with or without a space before it), the / (slash) and the white space are simply ignored.
Take an example and let's see how it looks in the browser.
<p>This is paragraph with <br><br> and <br/><br/> and then <br /><br />.</p>
In the browser, the code above ends up with every variant converted to <br>. So it's your choice whether to write the slash or not; both forms are completely valid.
I've got the html string that I'd like bind with knockout.js and display it in textarea, and of course allow submitting it after some editing. What is the proper way to achieve that?
When I use the html binding, I can bind the string one <br/> two and it displays the same in the textarea, but after submitting I get the encoded string one&lt;br/&gt;two, which isn't too bad as I can handle it later, but there's still the line-break issue.
Basically I'd like to preserve:
original line breaks
original html text
Now, when I pass:
<html>
<body>
using both the value and html bindings, I get a script exception in Knockout, as the new lines are not handled. Special characters are encoded as well:
Content: ko.observable("&lt;html&gt;
&lt;body&gt;
Any ideas?
You might want to use btoa to Base64-encode the actual value so that no munging goes on in submission. You would then Base64-decode it on the server (the counterpart of atob in the browser). Here's a little demo that lets you see how various treatments of HTML come out.
ko.applyBindings({stuff: ko.observable('<h1>Foo</h1>hi\nthere')});
<script src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.2.0/knockout-min.js"></script>
<textarea data-bind="value:stuff"></textarea>
<pre data-bind="text:stuff"></pre>
<hr />
<pre data-bind="html:stuff"></pre>
<hr />
<pre data-bind="text: btoa(stuff())"></pre>
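One caveat with btoa is that it only accepts characters in the Latin-1 range, so HTML containing other characters needs to be converted to UTF-8 bytes first. A minimal round-trip sketch, assuming an environment that provides btoa, atob, TextEncoder and TextDecoder (modern browsers and recent Node.js versions). The function names are made up for illustration:

```javascript
// Hypothetical helper: UTF-8 encode the text, then Base64-encode the bytes.
function encodeForSubmit(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one "binary string" character per byte
  }
  return btoa(binary);
}

// Hypothetical helper: reverse of encodeForSubmit.
function decodeSubmitted(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const original = '<h1>Foo</h1>hi\nthere \u03B1';
const encoded = encodeForSubmit(original);
console.log(encoded);                               // Base64 text without markup or line breaks
console.log(decodeSubmitted(encoded) === original); // true
```

Because the encoded value contains neither markup characters nor line breaks, both the original HTML and the original newlines survive the submission unchanged.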
I want to show the content of this <pre> tag using alert():
<pre> This is the IRNIC Whois server v1.6.2.<br> Available on web at http://whois.nic.ir/<br> Find the terms and conditions of use on http://www.nic.ir/<br> <br> This server uses UTF-8 as the encoding for requests and responses.<br><br> NOTE: This output has been filtered.<br><br> Information related to 'shirzadi.ir'<br><br><br>domain: shirzadi.ir<br>ascii: shirzadi.ir<br>remarks: (Domain Holder) Ehsan Shirzadi<br>remarks: (Domain Holder Address) Ferdowsi Blv , Mahdi 13 ST, Mashhad, Khorasan razavi, IR<br>holder-c: es372-irnic<br>admin-c: es372-irnic<br>tech-c: to52-irnic<br>bill-c: to52-irnic<br>nserver: wres1.irpowerweb.com<br>nserver: wres2.irpowerweb.com<br>last-updated: 2014-01-16<br>expire-date: 2017-10-08<br>source: IRNIC # Filtered<br><br>nic-hdl: es372-irnic<br>person: Ehsan Shirzadi<br>e-mail: ehsan.shirzadi#gmail.com<br>address: Ferdowsi Blv , Mahdi 13 ST, Mashhad, Khorasan razavi, IR<br>phone: +985117688851<br>fax-no: +989155066372<br>source: IRNIC # Filtered<br><br>nic-hdl: to52-irnic<br>org: Fanavarie Etelaate Towseye Saman (Mihannic)<br>e-mail: sales#mihannic.com<br>source: IRNIC # Filtered<br><br></pre>
When I read this content using xx = $('pre').text() and then alert(xx), there is no <br>, but when I hard-code this content in a variable and alert() it, I can see them. What is the problem here? Ultimately I want to split the contents by <br>.
Try $('pre').html() instead of text(). This should preserve <br> as well as other tags and HTML entities.
Edit
For completeness, as gillesc said, <br> and other tags will not be interpreted by alert() (it displays plain text only and does not support inner HTML). Therefore a combination of .html() and the replace method is required. Each <br> can be replaced by \n. The full code would look like this:
xx = $('pre').html().replace(/<br>/g, "\n");
alert(xx);
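Since the stated goal is to split the content on <br>, the intermediate newline step can also be skipped. A small sketch, assuming the markup may contain any of the usual spellings (<br>, <br/>, <br />). splitOnBr is a hypothetical helper name, and in the page the input would come from $('pre').html():

```javascript
// Hypothetical helper: split an HTML string at every <br>, <br/> or <br />.
function splitOnBr(html) {
  return html.split(/<br\s*\/?>/i);
}

const sample = 'domain: shirzadi.ir<br>ascii: shirzadi.ir<br />source: IRNIC';
const lines = splitOnBr(sample);
console.log(lines);
// [ 'domain: shirzadi.ir', 'ascii: shirzadi.ir', 'source: IRNIC' ]

// Joining with "\n" gives text that alert() can display with line breaks.
console.log(lines.join('\n'));
```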
text() will strip the tags out, so use html() instead. However, alert doesn't support tags, so you will need to convert your <br/> before sending it to alert.
See this post on how to do just that:
HTML Tags in Javascript Alert() method
Using text() keeps only the inner text, not the HTML markup.
Use html() and then replace each captured <br /> with the JS newline character, as described here: How replace HTML <br> with newline character "\n"
Do it like this:
<pre id="MyID">This is Test <br /> This is test</pre>
and the JavaScript code:
<script>
alert(document.getElementById('MyID').innerHTML);
</script>
This may be a duplicate; it's hard to tell because the key words contain "html" and "content" and even Bing and Google were returning a lot of false positives.
Bootstrap tooltips and popovers support HTML values for the data-content attribute when data-html=true. However, this isn't valid:
<input id="email" class="form-control" type="email"
data-bind="value: Email, valueUpdate: 'afterkeydown'"
data-container="body" data-toggle="popover" data-placement="bottom"
data-title="Email" data-html="true"
data-content="<p>This is <i>your</i> email address.</p>" />
because you can't just put html in the value of an attribute that is itself HTML. It may confuse the parser and is not permitted by the HTML specification.
While it seems to work with Internet Explorer, I really don't feel like testing with fifty different browsers and versions. It certainly does confuse the parser in the Visual Studio 2013 HTML editor. That editor thinks there's no closing quote.
I could dodge this by assigning the attribute from JavaScript in a separate file, but that's clumsy and defeats the separation of concerns.
So, what's the right way to mark this up?
As the accepted answer points out, you can't have a quote " inside a string quoted with ". This problem occurs often. If you want to display text that looks like HTML, how is the browser supposed to know what it should parse as HTML and what it should simply display?
For example, how do you get a browser to display the text <p></p>?
The answer is escaping. Instead of characters like " and <, you use placeholders like &quot; and &lt;.
However, escaping the quotes doesn't work here, precisely because the browser will not parse it as HTML. If you put escaped quotes in your HTML, they don't look like quotes to the browser, they look like text.
There is a different solution however: A string that is quoted with " can contain ' without problems. The following is valid:
data-content="<div id='string_in_string' ></div>"
This can be applied to your Bootstrap popovers. I've set up a fiddle that shows how the single-quoted strings are correctly parsed, while the escaped strings confuse the browser: https://jsfiddle.net/z4t2sud3/3/
This is the code inside the fiddle (the fiddle environment automatically imports Bootstrap, jQuery, etc.):
<mark data-content="
<button class=&quot;btnr&quot; type=&quot;button&quot;>
Doesn't work
</button>
<button class='btn btn-info' type='button'>
Works
</button>
" data-html="true" data-toggle="popover">
Popovered
</mark>
And be sure to activate the popover via Javascript:
$(function () {
$('[data-toggle="popover"]').popover()
})
You can put whatever you want in an HTML attribute as long as it is a valid HTML attribute value. What is a valid attribute value? One that does not contain tags, quotes and so on. So the solution is: escape the string before putting it inside the HTML attribute.
If you are using PHP: http://fi2.php.net/htmlspecialchars
Or Twig: http://twig.sensiolabs.org/doc/filters/escape.html
If you are using jQuery, $el.data('my_scaped_json') will convert it back to whatever it was originally, such as a JSON object or an HTML string: $( $el.data('my_scaped_html') );
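For the escaping approach, here is a minimal JavaScript sketch of what such an escape function does to a value before it is placed inside a double-quoted attribute. escapeAttribute is a hypothetical name, not part of jQuery or Bootstrap; server-side helpers such as htmlspecialchars perform a comparable substitution.

```javascript
// Hypothetical helper: make a string safe inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')  // first, so the references added below stay intact
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const content = '<p>This is <i>your</i> email address.</p>';
const markup =
  `<input id="email" data-html="true" data-content="${escapeAttribute(content)}">`;
console.log(markup);
// <input id="email" data-html="true" data-content="&lt;p&gt;This is &lt;i&gt;your&lt;/i&gt; email address.&lt;/p&gt;">
```

When the browser parses the attribute it decodes the character references again, so a script reading the attribute value gets the original HTML string back.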
Consider the following HTML page fragment:
<div id='myDiv'>
Line 1.<br />
Line 2<br />
These are <special> characters & must be escaped !##><>
</div>
<input type='button' value='click' id='myButton' />
<textarea id='myTextArea'></textarea>
<script>
$(document).ready(function () {
$('#myButton').click(function () {
var text = $('#myDiv').text();
$('#myTextArea').val(text);
});
});
</script>
First, there is a div element with id myDiv. It contains some text similar to what might be retrieved from a SQL database at runtime in my production web site.
Next, there is a button and a textarea. I want the text in myDiv to appear in the textarea when the button is clicked.
However, using the code I provided, the line-breaks are stripped out. What can I do about this, taking into consideration that escaping special characters is absolutely non-negotiable?
Your code works great for me in both Firefox and Chrome: http://jsfiddle.net/jYjRc/
However, if you have a client that doesn't do what you want, replace <br>s with newline characters.
Edit: Tested in IE7 and the code breaks. So I updated the fiddle with my suggestion: http://jsfiddle.net/jYjRc/1/
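The fiddle's contents are not reproduced here, but the suggestion of replacing <br>s with newline characters could look roughly like the following. It is a sketch under the assumption that the markup is read with .html() instead of .text(). htmlToPlainText is a hypothetical helper and only decodes the handful of character references that .html() itself produces when it escapes text; any other tags in the div would be left in place.

```javascript
// Hypothetical helper: turn simple markup (text plus <br>) into plain text.
function htmlToPlainText(html) {
  return html
    .replace(/\s*<br\s*\/?>\s*/gi, '\n') // each <br>, plus surrounding source whitespace, becomes one newline
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&nbsp;/g, '\u00A0')
    .replace(/&amp;/g, '&')              // last, so "&amp;lt;" ends up as "&lt;" and not "<"
    .trim();
}

// Roughly what .html() might return for a div like the one in the question.
const divHtml =
  '\nLine 1.<br>\nLine 2<br>\nThese are &lt;special&gt; characters &amp; must be escaped\n';
console.log(htmlToPlainText(divHtml));
// Line 1.
// Line 2
// These are <special> characters & must be escaped

// In the page: $('#myTextArea').val(htmlToPlainText($('#myDiv').html()));
```

Setting the result with .val() treats it as plain text, so the decoded characters are displayed in the textarea without being parsed as markup.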
Do your HTML like so:
<div id='myDiv'><pre>
Line 1.
Line 2
These are <special> characters & must be escaped !##><>
</pre></div>
And now .text() will return the text exactly as you specify it in the <pre> tag, even in IE.