I'm attempting to dynamically create a list of HTMLElements with data-* attributes that correspond to different HTML Entities, to be then picked up by CSS and used as pseudo element content like so:
li:after {
content: attr(data-code);
}
The catch is that for attr() to render the actual entity rather than the literal code, the code has to be prefixed with &#x; the usual \ escape doesn't work.
So the desired output HTML is something like so: <li data-code="&#xENTITY"></li>. When added directly to the HTML, this works exactly as expected with my CSS rule. The escaped entity is placed on the page in an :after pseudo-element and rendered as the entity icon.
Here's where things get curious...
As stated earlier, I'm trying to create and inject these lis dynamically through JavaScript (iterating through a list), and that's where the snag happens.
var entities = [{code: '&#x1F602;'}, ...];
for (var i = 0; i < entities.length; i++) {
var entity = entities[i],
listItem = document.createElement('li');
listItem.setAttribute('data-code', entity.code);
list.appendChild(listItem);
}
The li is correctly added to the DOM with the properly formatted entity set so it gets picked up by my CSS rule. However, rather than rendering the entity icon, the code is shown!
Note in the image above, the first item rendered is the HTML explicitly on the page. The second item is injected via JS (using the exact same code), then given an :after element by CSS. Chrome's web inspector even renders it differently!
Even curiouser still is that I can edit the HTML via WebInspector and inject the escaped data-* attribute manually - Chrome STILL renders the correct icon!
I'm at a loss here, so any guidance would be greatly appreciated!
The HTML entity notation will only be parsed by an HTML parser. If your data-code attributes are "born" in JavaScript, then you need to use the JavaScript notation for getting the Unicode characters you want. Instead of &#x263A; for a smiley face, in JavaScript you use \u263A (a backslash, a lower-case "u", and four hex digits).
Whether your data-code attributes are coded directly into your HTML source (with HTML entity notation) or else created in JavaScript (with JavaScript notation), by the time the attribute value is part of the DOM, it's Unicode.
Now, things get more complicated when you have characters outside the 16-bit range, because JavaScript is kind-of terrible at dealing with that. You can look up your code point(s) at http://www.fileformat.info/info/unicode/ and that'll give you the "C/C++/Java" UTF-16 code pair you need. For example, your "tears of joy" face is the pair "\uD83D\uDE02".
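As a rough sketch of the arithmetic behind that lookup, a code point above U+FFFF can be split into its UTF-16 pair by hand. The helper name toSurrogatePair is made up for illustration; engines that support ES2015 also provide String.fromCodePoint, which builds the same string.

```javascript
// Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
  return String.fromCharCode(high, low);
}

var tearsOfJoy = toSurrogatePair(0x1F602); // same string as "\uD83D\uDE02"
// In the question's loop this value would be passed to setAttribute:
// listItem.setAttribute('data-code', tearsOfJoy);
```

Either way, the attribute value ends up holding the character itself rather than the entity text, which is what attr() needs.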
I'm looking for a way to find a specific string in the visible text of a page and then wrap that string in <em> tags. I have tried using HTML Agility Pack and had some success with a Regex.Replace, but if the string appears within a URL or an image name it also gets replaced, which I do not want, as this obviously breaks the link or image URL.
An example attempt:
var markup = Encoding.UTF8.GetString(buffer);
var replaced = Regex.Replace(markup, "product-xs", " <em>product</em>-xs", RegexOptions.IgnoreCase);
var output = Encoding.UTF8.GetBytes(replaced);
_stream.Write(output, 0, output.Length);
This does not work as it would replace a <a href="product/product-xs"> with <a href="product/<em>product</em>-xs"> - which I don't want.
The string is coming from a text string value within a CMS so the user can't wrap the words there and ideally, I want to catch all instances of the word that are already published.
Ideally I would want to exclude <title> tags, <img> tags and <a> tags, everything else should get the wrapped tag.
Before I used the HTML Agility Pack, a fellow front end dev tried it with JavaScript but that had an unexpected impact on dropdown menus.
If you need any more info, just ask.
You can use HTML Agility Pack to select only the text nodes (i.e. the text that exists between any two tags) with a bit of XPath and modify them like this.
Looking only in body will exclude <title>, <meta> etc. The not() excludes script tags; you can exclude others in the same way (or check the parent node in the loop).
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()"))
{
var newNode = htmlDoc.CreateTextNode(node.InnerText.Replace("product-xs", "<em>product</em>-xs"));
node.ParentNode.ReplaceChild(newNode, node);
}
I've used a simple replace; a regex will work fine too. It's probably best to check the performance of each approach and choose whichever works best for your use case.
I'm having trouble using jQuery to retrieve an HTML string returned by Django. In a Django template I have the line
{{ generated_files.exercise_instructions.content | safe}}
which when expanded contains the string 2. Määritä luvun <span class="inline-math"> -6 </span> vastaluku., as shown in the picture below:
I would like to be able to use KaTeX to render the contents of the span element as math, but
$(".inline-math").each(
function(i, element) {
console.log("Rivimatikkaa: " + element.innerHTML);
katex.render(element.innerHTML,element);
}
);
does nothing towards this end. The span element of class inline-math is not detected by jQuery, as the console.log(element.innerHTML) does nothing, nor is the math rendered.
My question therefore is, how do I detect a string, that contains LaTeX returned by Django, in order to render it as math using KaTeX?
P.S.
As an interesting sidenote, here are pictures of the page source with and without the safe filter in the Django tag.
safety on
safety off
Notice the extra ampersands. That is the only difference, however. The string is not interpreted as HTML in either case.
I have this code in my project.
var quickmode_list = "";
quickmode_list += '<div style="height:100px;width:500px;margin-top:0%;margin-right:0%"value="'
+ quicksetup_item
+'"class="quickmode_block quick_list"><center style="margin-top:20px"><font size="5" style="margin-left:-14%;">'
+quicksetup_item
+'</font></center></div>';
<div id="quickmode_table">
</div>
and I append this variable to a tag like this
$('#quickmode_table').append(quickmode_list);
It does show up in the browser, and the element appears with class="quickmode_block", but when I run alert($('.quickmode_block').length);
it returns 0. What is going wrong, given that the element appears to have the class quickmode_block but I can't select it by class?
This is because the DOM tree did not manage to refresh between your two JavaScript instructions (appending and alerting).
A better solution would be to use element creators (jQuery has them); then you have a handle to the new element out of the box, so you can access it and count how many there are. It also performs better than generating HTML strings and querying the tree.
If quicksetup_item does not end with space, there would be no space before the class attribute.
According to the HTML specification (https://www.w3.org/TR/html5/syntax.html#start-tags): "Attributes must be separated from each other by one or more space characters." Sometimes a browser will tolerate this syntax error by inserting a space between attributes automatically. In that case, the code will run successfully.
However, it is recommended to follow the HTML specification and add the spaces, rather than rely on browser tolerance.
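One way to make the spacing hard to get wrong is to join the attributes with a space instead of concatenating them by hand. This is only a sketch: buildQuickmodeBlock is a hypothetical helper, the markup is copied from the question, and it assumes quicksetup_item contains no quotes or markup that would need escaping.

```javascript
// Hypothetical helper: build the block's markup with a space before every attribute.
// Assumes `item` contains no quotes or markup (nothing is escaped here).
function buildQuickmodeBlock(item) {
  var attributes = [
    'style="height:100px;width:500px;margin-top:0%;margin-right:0%"',
    'value="' + item + '"',
    'class="quickmode_block quick_list"'
  ];
  return '<div ' + attributes.join(' ') + '>' +
    '<center style="margin-top:20px">' +
    '<font size="5" style="margin-left:-14%;">' + item + '</font>' +
    '</center></div>';
}

var quickmode_list = buildQuickmodeBlock('demo'); // 'demo' is a placeholder value
```

The resulting string can be appended exactly as before with $('#quickmode_table').append(quickmode_list);.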
I am currently looking for a solution to find and list out any unclosed HTML tags from an arbitrary slice of raw HTML. I don't feel like this should be an awful problem, but I cannot seem to find something that does it in JS. Unfortunately, this needs to be client-side since it is being used for rendering annotations to HTML pages. Obviously, annotations are somewhat nasty business, since they select or apply formatting that may apply to only part of an HTML element (i.e., a markup overlaid onto an existing HTML markup).
One simple use-case is where you might want to only render part of an HTML page, but then inject the rest later. For example, imagine a hypothetical segment:
<p>This is my text <StartDelayedInject/> with a comment I added. </p>
<p> But it doesn't exist until now. </p> <StopDelayedInject/>
I'll be doing some pre-processing to rebuild the HTML so that I wrap partial elements into span-type elements that apply the appropriate formatting. Initially this would be parsed in the form:
<p><span>This is my text</span></p>
After some user action, it would then be modified to a form such as:
<p><span>This is my text</span><span>with a comment I added.</span></p>
<p>But it doesn't exist until now.</p>
This is a very simplified example case (obviously things like ul elements and tables get hairier), but gives the general principle. However, to do this effectively, I need to be able to check a segment of HTML and figure out there are tags that have opened (but not closed). If I know that information, I can wrap the last unterminated text data into a span, close the unclosed tag, and know to return to that point to inject the remainder of the content when needed. However, I need to know the tags that were still open, so that when I inject or modify another segment of content, I can make sure to put it in the right place (e.g., get "with a comment I added." in the first paragraph).
From my understanding of context-free grammars, this should be a relatively trivial task. Each time you open/enter or close/exit a tag, you could just keep a stack of the tags opened but not yet closed. With that said, I'd much rather use a library that's a bit more of a mature solution than make a naive parser for that purpose. I'd assume there's some JS HTML parser around that would do this, right? Plenty of them know how to close tags, so clearly at some point they calculated this.
The problem is that JavaScript only has access to the html in two ways:
In a sense that each element is an object with properties and methods created by the browser on page load.
In a sense that it is a string of text.
Using the first method of interfacing with html, there is no way to detect unclosed tags as you only have access to the objects that the browser creates for you after it parses the html.
Using the second method, you would have to run the entire string of html through an html parser. Some people might assume you could do it simply with regexp, however, this is not feasible. I refer you to this fantastic stackoverflow question.
Even if you found a really robust html parser to use, you would still run into the problem created by the fact that, before your JavaScript even touches it, the browser will have attempted to parse the potentially broken html and there could be errors everywhere.
Edit:
If you like the parser idea, John Resig created this example one you might want to reference.
Not perfect but here's my quick method for checking for mismatch between open/close tags:
function find_unclosed_tags(str) {
str = str.toLowerCase();
var tags = ["a", "span", "div", "ul", "li", "h1", "h2", "h3", "h4", "h5", "h6", "p", "table", "tr", "td", "b", "i", "u"];
var mismatches = [];
tags.forEach(function(tag) {
var pattern_open = '<'+tag+'( |>)';
var pattern_close = '</'+tag+'>';
var diff_count = (str.match(new RegExp(pattern_open,'g')) || []).length - (str.match(new RegExp(pattern_close,'g')) || []).length;
if(diff_count != 0) {
mismatches.push("Open/close mismatch for tag " + tag + ".");
}
});
return mismatches;
}
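For the stack approach described in the question, a minimal sketch might look like the following. The name openTagStack and the list of void elements are assumptions for illustration, and the regex tokenizer ignores comments, script contents, attribute values containing >, and HTML's implied end tags, so treat it as a starting point rather than a real parser.

```javascript
// Void elements never take a closing tag, so they are never pushed onto the stack.
var VOID_TAGS = ["area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"];

// Hypothetical helper: returns the tags still open at the end of `html`, outermost first.
function openTagStack(html) {
  var stack = [];
  // Matches <tag ...>, </tag> and <tag ... />.
  var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(\/?)>/g;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    var isClosing = match[1] === "/";
    var name = match[2].toLowerCase();
    var isSelfClosing = match[3] === "/" || VOID_TAGS.indexOf(name) !== -1;
    if (isClosing) {
      var index = stack.lastIndexOf(name);
      if (index !== -1) {
        stack.length = index; // drop the tag and anything left open inside it
      }
    } else if (!isSelfClosing) {
      stack.push(name);
    }
  }
  return stack;
}
```

Unlike the counting version, this keeps the nesting order, so openTagStack('<ul><li>one</li><li>two') returns ["ul", "li"], which tells you which tags to close and where to resume injecting later.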
How would I get the raw HTML of the selected content on a page using Javascript? For the sake of simplicity, I'm sticking with browsers supporting window.getSelection.
Here is an example; the content between both | represent my selection.
<p>
The <em>quick brown f|ox</em> jumps over the lazy <strong>d|og</strong>.
</p>
I can capture and alert the normalized HTML with the following Javascript.
var selectionRange = window.getSelection().getRangeAt(0),
selectionContents = selectionRange.cloneContents(),
fragmentContainer = document.createElement('div');
fragmentContainer.appendChild(selectionContents);
alert(fragmentContainer.innerHTML);
In the above example, the alerted contents would collapse the trailing elements and return the string <em>ox</em> jumps over the lazy <strong>d</strong>.
How might I return the string ox</em> jumps over the lazy <strong>d?
You would have to effectively write your own HTML serialiser.
Start at the selectionRange.startContainer/startOffset and walk the tree forwards from there until you get to endContainer/endOffset, outputting HTML markup from the nodes as you go, including open tags and attributes when you walk into an Element and close tags when you go up a parentNode.
Not much fun, especially if you are going to have to support the very different IE<9 Range model at some point...
(Note also that you won't be able to get the completely raw original HTML, because that information is gone. Only the current DOM tree is stored by the browser, and that means details like tag case, attribute order, whitespace, and omitted implicit tags will differ between the source and what you get out.)
Looking at the APIs, I don't think you can extract the HTML without it being converted to a DocumentFragment, which by default will close any open tags to make it valid HTML.
See Converting Range or DocumentFragment to string for a similar Q.