JavaScript: how to convert an element into an HTML-evaluated string?

Here's a simplified example of what I'd like to do:
var footnote = somewhere.innerHTML // This is <q>the note</q>.
var result = ???(footnote)
target.setAttribute("title", result) // This is "the note".
I've tried various methods and functions for the "???", but end up with either the raw tags displayed in the title, or with plain text and no quotation marks.
Other than processing all the inner tags myself, is there a way to convert an element into a string that contains how it would appear when HTML expanded?
Clarification:
I thought it was obvious from the "I have" and "I want" values indicated in the code comments, but this is what I want to do:
I have an element (say a <p> if you need a specific type)
that has content "This is <q>the note</q>."
I want something that will convert it into a string suitable for use in a title="..." attribute in some other element.
Displayable internal tags (in this specific example <q>) need to be HTML-interpreted so that they display as actual quotation marks, ideally handling nested quotations.
innerHTML conversion to string leaves the raw tags in place.
innerText conversion to string ignores the tags and produces no quotation marks.
Is there some other way of doing the HTML interpretation other than by writing my own function to process it?

When you add a q element, you're actually adding an element whose text node (this is what you get with textContent, innerText, etc.) is surrounded by two CSS pseudo-elements: the opening and closing quotation marks.
Neither pseudo-elements nor pseudo-classes appear in the document source or document tree. They don't actually exist in the DOM and are therefore not selectable and won't show up in any of the element's property values.
In short, using <q></q> is more semantic markup, but if you need those quotation marks to exist outside the rendered view, you may want to use the traditional " character instead.
example:
let p = document.querySelector("p"), div = document.querySelector("div");
div.title = p.textContent;
console.log(div.title);
<p>"Example Text"</p>
<div></div>
Additionally, though I will say that I don't recommend it, if you really wanted to keep what you have and you're not too concerned with optimization you could simply use a replace:
let p = document.querySelector("p"), div = document.querySelector("div");
div.title = p.innerHTML.replace(/<q>|<\/q>/gmi, '"');
console.log(div.title);
<p><q>Example Text</q></p>
<div></div>
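If you need nested quotations handled without a regex over the serialized HTML, one option is to walk the child nodes yourself and emit quotation marks wherever a q element opens and closes. The following is only a sketch: renderQuotes and QUOTE_PAIRS are hypothetical names, and the curly English-style marks are an assumption (what the browser actually displays depends on the CSS quotes property and the document language).

```javascript
// Assumed quotation marks per nesting level: outer double, inner single.
const QUOTE_PAIRS = [["\u201C", "\u201D"], ["\u2018", "\u2019"]];

// Hypothetical helper: returns the text of a DOM subtree, adding
// quotation marks around the content of every <q> element.
function renderQuotes(node, depth = 0) {
  if (node.nodeType === 3) {
    // Text node: its value is used as-is.
    return node.nodeValue;
  }
  if (node.nodeType !== 1) {
    // Comments and other non-element nodes contribute nothing.
    return "";
  }
  const isQuote = node.tagName.toLowerCase() === "q";
  // Children of a <q> are one nesting level deeper.
  const inner = Array.from(node.childNodes)
    .map(child => renderQuotes(child, isQuote ? depth + 1 : depth))
    .join("");
  if (!isQuote) {
    return inner;
  }
  // Alternate between the assumed pairs as the nesting gets deeper.
  const [open, close] = QUOTE_PAIRS[depth % QUOTE_PAIRS.length];
  return open + inner + close;
}
```

With the variables from the question, the call would be target.setAttribute("title", renderQuotes(somewhere)). The function only reads nodeType, nodeValue, tagName and childNodes, so it does not modify the document.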

Related

How to use querySelectorAll with a variable?

I have two HTML elements that are alternatives of each other and I am trying to write a JS function that removes one if the other is present (they originated as words within <sic> and <corr> beneath <choice> in a TEI document). In my transformation, they are both assigned a code (not an #id: #id is randomly generated and has to remain so for other purposes) with a unique prefix:
<a id="abc" choicePOS="sic0">Element1</a>
<a id="xyz" choicePOS="corr0">Element2</a>
In a JS function that 'belongs' to Element1, I want to select Element2 so as to remove it. This is what I have tried (el is element1):
var choicePOS = el.getAttribute("choicePOS").slice(3); // produces 0
var corrID = "corr" + choicePOS; // produces corr0
var corr = document.querySelectorAll("a[choicePOS=corrID]");
This fails, presumably because the corrID variable in the last line is in quote marks and is being taken as a string. I have read various tutorials on CSS selectors and can't find any guidance on how to use them with a variable attribute value. Is this possible? If so, how? If not, any alternatives?
EDIT: A number of other questions relating how to concatenate strings with variables in JS have been suggested as duplicates of this one. To clarify, I am asking specifically about querySelectorAll, as I cannot find any examples this being used with variables. If the answer is that its selector is to be treated as any other JS string (i.e. variables can be concatenated in), then that is perfectly satisfactory.
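The selector is an ordinary JS string, so you can use a template literal to interpolate the variable. Quoting the attribute value keeps the selector valid even when the value is not a plain CSS identifier:
var corr = document.querySelectorAll(`a[choicePOS="${corrID}"]`);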
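If the attribute value can contain arbitrary characters, it may be worth building the selector through a small helper that quotes and escapes the value. This is a minimal sketch; attrSelector is a hypothetical name, and it only escapes double quotes and backslashes, which is an assumption about the values you expect (newlines, for example, are not handled).

```javascript
// Hypothetical helper: builds an attribute selector such as a[choicePOS="corr0"].
// Only double quotes and backslashes in the value are escaped.
function attrSelector(tag, attr, value) {
  const escaped = String(value).replace(/["\\]/g, "\\$&");
  return `${tag}[${attr}="${escaped}"]`;
}

// Same steps as in the question, using a literal in place of el.getAttribute("choicePOS").
const choicePOS = "sic0".slice(3);      // "0"
const corrID = "corr" + choicePOS;      // "corr0"
const selector = attrSelector("a", "choicePOS", corrID);
console.log(selector);                  // a[choicePOS="corr0"]
```

In the page, the result would then be passed straight to document.querySelectorAll(selector).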

Adjust regex to ignore anything else inside link HTML tags [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 6 years ago.
So I have this regex:
<a(?:.*)href="(.*)"(?:.*)>(.*)<\/a>
So far I have been able to get it to match HTML link tags that have extra attributes in them. Like classes and targets and so on, which works.
What I now want to do, is to adjust it so it matches and ignores any other tags inside the link itself (if there is any), as I only want the text of the link along with the address. I am unsure about the best way to do this.
Always Use DOM Parsing instead of regex
This has been suggested many times. Judging by the comments on the increasingly complicated regex that is forming, it would be easier to just examine the DOM. Take the following for example:
function fragmentFromString(strHTML) {
    return document.createRange().createContextualFragment(strHTML);
}
let html = `<a data-popup-text="take me to <p>Testing <span>This</span></p>`;
let fragment = fragmentFromString(html);
let aTags = Array.from(fragment.querySelectorAll('a'));
aTags = aTags.map(a => {
    return {
        href: a.href,
        text: a.textContent
    }
});
console.log(aTags);
The above turns a string of HTML into actual DOM nodes inside a fragment. You would still need to append that fragment somewhere to display it, but the point is that you can now query the a tags. The code gives you an array of objects containing the data for each a tag: its href value and its textContent, minus all the HTML.
Original answer. Don't use it, it stays to serve as context to the real problem:
I changed this a little to use a non-greedy format (.*?). It also avoids ending early because of closing HTML inside an attribute, as pointed out by @Gaby aka G. Petrioli.
<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>
Check out the JS fiddle

Convert textNode content to a string

I'm having a problem with a text node that I can't convert to a string.
I'm trying to scrape a site and extract certain information from it, and when I use an XPath expression to find the text I'm after, I get a text node back.
When I look in the Chrome developer tools, I can see that the text node itself contains the text I'm after, but how do I convert the text node to plain text?
Here is the line of code I use:
abstracts = ZU.xpath(doc, '//*[@id="abstract"]/div/div/par/text()');
I have tried things like .innerHTML, toString and textContent, but nothing has worked so far.
I usually use Text.wholeText when I want the content of a text node as a string. A text node is an object, not the string itself, so toString or innerHTML will not give you its text.
Example: from https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText
The Text.wholeText read-only property returns the full text of all Text nodes logically adjacent to the node. The text is concatenated in document order. This allows you to specify any text node and obtain all adjacent text as a single string.
Syntax
str = textnode.wholeText;
Notes and example:
Suppose you have the following simple paragraph within your webpage (with some whitespace added to aid formatting throughout the code samples here), whose DOM node is stored in the variable para:
<p>Thru-hiking is great! <strong>No insipid election coverage!</strong>
However, <a href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
You decide you don’t like the middle sentence, so you remove it:
para.removeChild(para.childNodes[1]);
Later, you decide to rephrase things to, “Thru-hiking is great, but casting a ballot is tricky.” while preserving the hyperlink. So you try this:
para.firstChild.data = "Thru-hiking is great, but ";
All set, right? Wrong! What happened was you removed the strong element, but the removed sentence’s element separated two text nodes. One for the first sentence, and one for the first word of the last. Instead, you now effectively have this:
<p>Thru-hiking is great, but However, <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
You’d really prefer to treat all those adjacent text nodes as a single one. That’s where wholeText comes in: if you have multiple adjacent text nodes, you can access the contents of all of them using wholeText. Let’s pretend you never made that last mistake. In that case, we have:
assert(para.firstChild.wholeText == "Thru-hiking is great! However, ");
wholeText is just a property of text nodes that returns the string of data making up all the adjacent (i.e. not separated by an element boundary) text nodes combined.
Now let’s return to our original problem. What we want is to be able to replace the whole text with new text. That’s where replaceWholeText() comes in:
para.firstChild.replaceWholeText("Thru-hiking is great, but ");
We’re removing every adjacent text node (all the ones that constituted the whole text) but the one on which replaceWholeText() is called, and we’re changing the remaining one to the new text. What we have now is this:
<p>Thru-hiking is great, but <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>
Some uses of the whole-text functionality may be better served by using Node.textContent, or the longstanding Element.innerHTML; that’s fine and probably clearer in most circumstances. If you have to work with mixed content within an element, as seen here, wholeText and replaceWholeText() may be useful.
More info: https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText
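One possible reason the properties appeared not to work in the question: if the XPath helper returns a list of nodes rather than a single node (an assumption here, not something the question confirms), then textContent has to be read on each item, not on the list. A minimal sketch, with textNodesToString as a hypothetical name:

```javascript
// Hypothetical helper: turns an array (or array-like) of nodes into one string.
// textContent exists on text nodes and on elements, so either kind is accepted.
function textNodesToString(nodes, separator = " ") {
  return Array.from(nodes)
    .map(node => (node.textContent || "").trim())
    .filter(text => text.length > 0)   // drop whitespace-only nodes
    .join(separator);
}
```

Under that assumption, the call would look like textNodesToString(abstracts), where abstracts is the result from the question's line of code.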

How can I separately retrieve the HTML that's before and after a child element inside a parent element?

We're writing a web app that relies on Javascript/jQuery. It involves users filling out individual words in a large block of text, kind of like Mad Libs. We've created a sort of HTML format that we use to write the large block of text, which we then manipulate with jQuery as the user fills it out.
Part of a block of text might look like this:
<span class="fillmeout">This is a test of the <span>NOUN</span> Broadcast System.</span>
Given that markup, I need to separately retrieve and manipulate the text before and after the inner <span>; we're calling those the "prefix" and "suffix".
I know that you can't parse HTML with simple string manipulation, but I tried anyway; I tried using split() on the <span> and </span> tags. It seemed simple enough. Unfortunately, Internet Explorer casts all HTML tags to uppercase, so that technique fails. I could write a special case, but the error has taught me to do this the right way.
I know I could simply use extra HTML tags to manually denote the prefix and suffix, but that seems ugly and redundant; I'd like to keep our markup format as lean and readable and writable as possible.
I've looked through the jQuery docs, and can't find a function that does exactly what I need. There are all sorts of functions to add stuff before and after and around and inside elements, but none that I can find to retrieve what's already there. I could remove the inner <span>, but then I don't know how I can tell what came before the deleted element apart from what came after it.
Is there a "right" way to do what I'm trying to do?
With simple string manipulations you can also use Regex.
That should solve your problem.
var array = $('.fillmeout').html().split(/<\/?span>/i);
Use your jQuery API! $('.fillmeout').children() and then you can manipulate that element as required.
http://api.jquery.com/children/
For completeness, I thought I should point out that the cleanest answer is to put the prefix and suffix text each in its own <span> like this, so that you can use jQuery selectors and methods to directly access the desired text:
<span class="fillmeout">
<span class="prefix">This is a test of the </span>
<span>NOUN</span>
<span class="suffix"> Broadcast System.</span>
</span>
Then, the code would be as simple as:
var fillme = $(".fillmeout").eq(0);
var prefix = fillme.find(".prefix").text();
var suffix = fillme.find(".suffix").text();
FYI, I would not call this level of simplicity "ugly and redundant" as you theorized. You're using HTML markup to delineate the text into separate elements that you want to separately access. That's just smart, not redundant.
By way of analogy, imagine you have toys of three separate colors (red, white and blue) and they are initially organized by color and you know that sometime in the future you are going to need to have them separated by color again. You also have three boxes to store them in. You can either put them all in one box now and manually sort them out by color again later or you can just take the already separated colors and put them each into their own box so there's no separation work to do later. Which is easier? Which is smarter?
HTML elements are like the boxes. They are containers for your text. If you want the text separated out in the future, you might as well put each piece of text into its own named container so it's easy to access just that piece of text in the future.
Several of these answers almost got me what I needed, but in the end I found a function not mentioned here: .contents(). It returns a jQuery collection of all child nodes, including text nodes, that I can then iterate over (recursively if needed) to find what I need.
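For illustration, here is roughly how a list of child nodes such as the one .contents() provides could be split into prefix and suffix. This is only a sketch under the assumption that the first element child is the blank to fill in; prefixAndSuffix is a hypothetical name.

```javascript
// Hypothetical helper: takes an array (or array-like) of child nodes and
// returns the text before and after the first element node.
function prefixAndSuffix(nodes) {
  const list = Array.from(nodes);
  // Concatenate the values of the text nodes (nodeType 3) in a slice.
  const joinText = part =>
    part.filter(n => n.nodeType === 3).map(n => n.nodeValue).join("");
  // Index of the first element node (nodeType 1), assumed to be the inner <span>.
  const pivot = list.findIndex(n => n.nodeType === 1);
  if (pivot === -1) {
    // No inner element: everything counts as prefix.
    return { prefix: joinText(list), suffix: "" };
  }
  return {
    prefix: joinText(list.slice(0, pivot)),
    suffix: joinText(list.slice(pivot + 1)),
  };
}
```

With jQuery, the call would be prefixAndSuffix($(".fillmeout").contents().get()); without jQuery, passing element.childNodes works the same way.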
I'm not sure if this is the 'right' way either, but you could replace the SPANs with an element you could consistently split the string on:
jQuery('.fillmeout span').replaceWith('|');
http://api.jquery.com/replaceWith/
http://jsfiddle.net/mdarnell/P24se/
You could use
$('.fillmeout span').get(0).previousSibling.textContent
$('.fillmeout span').get(0).nextSibling.textContent
This works in IE9, but sadly not in IE versions older than 9.
Based on your example, you could use your target as a delimiter to split the sentence.
var str = $('.fillmeout').html();
str = str.split('<span>NOUN</span>');
This would return an array of ["This is a test of the ", " Broadcast System."]. Here's a jsFiddle example.
You could just use the nextSibling and previousSibling native JavaScript (coupled with jQuery selectors):
$('.fillmeout span').each(
    function(){
        var prefix = this.previousSibling.nodeValue,
            suffix = this.nextSibling.nodeValue;
    });
JS Fiddle proof of concept.
References:
each().
node.nextSibling.
node.previousSibling.
If you want to use the DOM instead of parsing the HTML yourself and you can't put the desired text in its own elements, then you will need to look through the DOM for text nodes and find the text nodes before and after the span tag.
jQuery isn't a whole lot of help when dealing with text nodes instead of element nodes, so the work is mostly done in plain JavaScript like this:
$(".fillmeout").each(function() {
    var node = this.firstChild, prefix = "", suffix = "", foundSpan = false;
    while (node) {
        if (node.nodeType == 3) {
            // if text node
            if (!foundSpan) {
                prefix += node.nodeValue;
            } else {
                suffix += node.nodeValue;
            }
        } else if (node.nodeType == 1 && node.tagName == "SPAN") {
            // if element and span tag
            foundSpan = true;
        }
        node = node.nextSibling;
    }
    // here prefix and suffix are the text before and after the first
    // <span> tag in the HTML
    // You can do with them what you want here
});
Note: This code does not assume that all text before the span is located in one text node and one text node only. It might be, but it also might not be so it collates all the text nodes together that are before and after the span tag. The code would be simpler if you could just reference one text node on each side, but it isn't 100% certain that that is a safe assumption.
This code also handles the case where there is no text before or after the span.
You can see it work here: http://jsfiddle.net/jfriend00/P9YQ6/

Best way to pick up text in a HTML element that is in the parent node only

I have, for example, markup like this
<div id="content">
<p>Here is some wonderful text, and here is a link. All links should have a `href` attribute.</p>
</div>
Now I want to be able to perform some regex replace on the text inside the p element, but not in any HTML, i.e. be able to match the href within backticks, but not inside the anchor element.
I thought about regex, but as the general consensus is, I shouldn't be using them to parse HTML.
My current method of doing this is like so: I've got a bunch of words in an array, and I am looping through them and making an object of data like so:
termsData[term] = {
regex: new RegExp('(\\b' + term + '\\b)', 'gmi'),
replaceWith: '<span>{TERM}</span>'
};
I then loop through it again, making the replacements like so:
var html = obj.html();
$.each(terms, function(i, term) {
// Replace each word in the HTML with the span
html = html.replace(termsData[term].regex, termsData[term].replaceWith.replace(/{TERM}/, '$1'));
});
obj.html(html);
Now, I did a lot of this last night at an ungodly hour, and copying and pasting it here makes me think I should refactor some of it.
So, as you should be able to tell, I want to replace plain text, but not anything inside an HTML tag.
What would be the best way to do it?
Note: The source code is coming from here if you'd like a better look.
You're right to not want to be processing HTML with regex. It's also bad news to be assigning huge chunks of .html(); apart from the performance drawbacks of serialising and reparsing a large amount of HTML, you'll also lose unserialisable data like event listeners, form data and JS properties/references.
See the findText function in this answer and call something like (assuming obj is a jQuery wrapper over your topmost node to search in):
findText(obj[0], /\b(term1|term2|term3)\b/g, function(node, match) {
    var span = document.createElement('span');
    node.splitText(match.index + match[0].length);
    span.appendChild(node.splitText(match.index));
    node.parentNode.insertBefore(span, node.nextSibling);
});
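The findText function itself is not reproduced here. As a rough idea of what such a helper could look like, the sketch below walks the tree recursively and reports every regex match found in a text node. It is an assumption about the shape of the function, not the code from the referenced answer, and it expects a pattern with the g flag (matchAll throws without it).

```javascript
// Sketch of a findText-style helper (not the original implementation).
// Calls callback(textNode, match) for every match of pattern in every
// text node below element. pattern must be a global (g) regular expression.
function findText(element, pattern, callback) {
  // Iterate backwards so the callback may split or insert nodes
  // without disturbing the children that are still to be visited.
  for (let i = element.childNodes.length - 1; i >= 0; i--) {
    const child = element.childNodes[i];
    if (child.nodeType === 1) {
      // Element: search its descendants.
      findText(child, pattern, callback);
    } else if (child.nodeType === 3) {
      // Text node: collect all matches first, then report them last to
      // first so earlier match indexes stay valid after each split.
      const matches = Array.from(child.data.matchAll(pattern));
      for (let j = matches.length - 1; j >= 0; j--) {
        callback(child, matches[j]);
      }
    }
  }
}
```

Because only text nodes are searched, attribute values such as an href are never touched, which is the behaviour the question asks for.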
