Remove text from around DOM node


I've selected a DOM node, and I want to do some processing if it is both immediately prefixed and suffixed with a $. So, selecting on <code> elements, I want to handle this case:
<p>I assert $<code>1 + 1 = 2</code>$, it's true!</p>
and turn it into this:
<p>I assert <code class="language-inline-math">1 + 1 = 2</code>, it's true!</p>
That is, if my selected DOM node is immediately preceded by some token and immediately succeeded by some token, I want to strip those tokens and do some processing on the node.
I have this working by manipulating innerHTML/outerHTML, but it feels wrong to be manipulating the DOM elements via the serialized HTML rather than the DOM API. Is there a method to accomplish this without writing to innerHTML?
// given a pre-selected `var el` DOM node
var parent = el.parentNode;
var inlineMath = "$" + el.outerHTML + "$";
if (parent.innerHTML.indexOf(inlineMath) !== -1) {
el.classList.add("language-inline-math");
parent.innerHTML = parent.innerHTML.replace("$" + el.outerHTML + "$", el.outerHTML);
}
To avoid the XY problem, here's the actual task I'm trying to solve:
I have some (CommonMark) markdown, and I'd like to introduce a lightweight extension syntax on top of a (CommonMark-compliant) markdown parser. For block equations, this is the obvious choice:
```math
1 + 1 = 2
```
which becomes
<pre><code class="language-math">1 + 1 = 2
</code></pre>
per the CommonMark spec. This can easily be found and then fed into a math display library from JavaScript. For inline math, however, the inline code syntax doesn't support the language class addition, so some additional syntax has to be put on top.
The reuse of code blocks is semantically useful, as they define a span where markdown doesn't do any processing. The common way of handling inline math for LaTeX/MathJax/KaTeX or other systems is via $-fencing. So I chose to take GitLab's syntax and use $ <no space> <inline-code-block> <no space> $ to represent an inline math equation.
Instead of
I assert $`1 + 1 = 2`$, it's true!
I could have people write
I assert `$1 + 1 = 2$`, it's true!
which would have a similar fallback in the event of no JavaScript, but the problem is that $code$ is something that people want to be able to write normally, thus I prefer the external fencing.
Given that I have a working solution, the proper answer may be "there isn't a better way than what you've done already". I feel there is a better way to do this without using the text-based serialized-HTML properties, but it's quite possible that I'm wrong and this is the best way to accomplish this task.

You can use the element.previousSibling and element.nextSibling properties to get the text nodes right before and after your element, and check whether they end and begin with a $. Then you can use the text node's splitText(n) method to split each text node in two: one with the $, and one with the rest. Then call remove() on the $ text nodes.
var n = e.nextSibling;
var p = e.previousSibling;
if (n && p && /^\$/.test(n.data) && /\$$/.test(p.data)) {
// Whatever you wanted to do with `e` here.
n.splitText(1); n.remove();
p.splitText(p.data.length - 1).remove();
}
(That's what I used here: https://github.com/m-ou-se/rust-horrible-katex-hack/blob/565ffd921d3c0eef2037aef58215c56ba1a09ddc/src/lib.rs#L10-L15)
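For anyone who wants the same check wrapped in a reusable function, here is a minimal sketch. The name unwrapDollarFence is made up for illustration, and instead of splitText/remove it trims the fence by writing to the text nodes' data property, which should have the same visible result. It also checks nodeType, on the assumption that a neighbouring element (rather than text) should not count as a fence.

```javascript
// Sketch only: `unwrapDollarFence` is a hypothetical name, not an existing API.
// Returns true if `el` was fenced by "$" on both sides and the fences were removed.
function unwrapDollarFence(el) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE
  var prev = el.previousSibling;
  var next = el.nextSibling;

  // Both neighbours must be text nodes; an element or a missing sibling disqualifies.
  if (!prev || !next || prev.nodeType !== TEXT_NODE || next.nodeType !== TEXT_NODE) {
    return false;
  }
  // The "$" must touch the element on both sides.
  if (prev.data.slice(-1) !== "$" || next.data.charAt(0) !== "$") {
    return false;
  }

  // Trim the fence characters by writing to the text nodes' `data`.
  prev.data = prev.data.slice(0, -1);
  next.data = next.data.slice(1);
  el.classList.add("language-inline-math");
  return true;
}

// Usage in a browser: process every <code> element on the page.
if (typeof document !== "undefined") {
  Array.prototype.forEach.call(document.querySelectorAll("code"), unwrapDollarFence);
}
```

The function only touches the two adjacent text nodes, so event listeners and other elements in the parent are left alone, unlike the innerHTML round trip.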

Related

Rails/Rspec/Capybara: Interpreting quotes for javascript string for execute script

Given that I need to set an element's selected index with javascript in capybara by the input name...
var element = document.querySelector("select[name='user[user_locations_attributes][0][location_attributes][state]']").selectedIndex = '50';
What is the proper way to interpret this as a string so it can be executed in Capybara with execute_script(function_name_string)? Because I keep getting syntax errors, unsure how to nest the " and ' quotations.

Easiest solution to your question is to use a heredoc
page.execute_script <<~JS
var element = document.querySelector("select[name='user[user_locations_attributes][0][location_attributes][state]']").selectedIndex = '50';
JS
Although if you have need for the element for anything else it's probably nicer to find the element in ruby and then just call execute_script on the element
el = find("select[name='user[user_locations_attributes][0][location_attributes][state]']")
el.execute_script('this.selectedIndex = 50;')
As a related question - is there a reason you're doing this via JS rather than just clicking on the correct option? If you're just scraping a page there's no issue, but if you're actually testing something this basically makes your test invalid since you could potentially be doing things a user couldn't
Since you commented that you are testing, you really shouldn't be doing this via JS, but should instead be using select or select_option. select takes the options string (which you should have - otherwise why have a select element in the first place)
select('the text of option', from: 'user[user_locations_attributes][0][location_attributes][state]')
select_option is called on the option element directly, which can be found in a number of ways, such as
find("select[name='user[user_locations_attributes][0][location_attributes][state]'] option:nth-child(50)").select_option

Javascript creating an HTML object vs creating an HTML string

Pretty simple question that I couldn't find an answer to, maybe because it's a non-issue, but I'm wondering if there is a difference between creating an HTML object using Javascript or using a string to build an element. Like, is it a better practice to declare any HTML elements in JS as JS objects or as strings and let the browser/library/etc parse them? For example:
jQuery('<div />', {'class': 'example'});
vs
jQuery('<div class="example"></div>');
(Just using jQuery as an example, but same question applies for vanilla JS as well.)
It seems like a non-issue to me but I'm no JS expert, and I want to make sure I'm doing it right. Thanks in advance!

They're both "correct". And both are useful at different times for different purposes.
For instance, in terms of page-speed, these days it's faster to just do something like:
document.body.innerHTML = "<header>....big string o' html text</footer>";
The browser will spit it out in an instant.
As a matter of safety, when dealing with user-input, it's safer to build elements, attach them to a documentFragment and then append them to the DOM (or replace a DOM node with your new version, or whatever).
Consider:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = "<p>" + userPost + "</p>";
commentList.innerHTML += paragraph;
Versus:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = document.createElement("p");
paragraph.appendChild( document.createTextNode(userPost) );
commentList.appendChild(paragraph);
One does bad things and one doesn't.
Of course, you don't have to create textNodes, you could use innerText or textContent or whatever (the browser will create the text node on its own).
But it's always important to consider what you're sharing and how.
If it's coming from anywhere other than a place you trust (which should be approximately nowhere, unless you're serving static pages, in which case, why are you building html?), then you should keep injection in mind -- only the things you WANT to be injected should be.
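As a rough sketch of the textContent variant mentioned above: the helper name appendComment and its parameters are invented for illustration, and the document is passed in as an argument only so the function has no hidden dependency on a global.

```javascript
// Sketch: `appendComment` and its parameters are hypothetical names.
// `doc` is passed in so the function does not depend on a global `document`.
function appendComment(doc, commentList, userPost) {
  var paragraph = doc.createElement("p");
  // textContent stores the string as text; it is never parsed as markup.
  paragraph.textContent = userPost;
  commentList.appendChild(paragraph);
  return paragraph;
}

// Usage in a browser (assumes an element with id "comments" exists):
// appendComment(document, document.getElementById("comments"), userPost);
```

Because the user's string is only ever assigned as text, any tags in it show up on the page as literal characters instead of becoming elements.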

Either can be preferable depending on your particular scenario; i.e., if everything is hard-coded, option 2 is probably better, as @camus said.
One limitation with the first option though, is that this
$("<div data-foo='X' />", { 'class': 'example' });
will not work. That overload expects a naked tag as the first parameter with no attributes at all.
This was reported here

1/ is better if your attributes depend on variables set before calling the $ function: you don't have to concatenate strings and variables. Aside from that, since you can do both, and it's just some JS code somebody else wrote, not a C++ DOM API hardcoded in the browser...

How can I separately retrieve the HTML that's before and after a child element inside a parent element?

We're writing a web app that relies on Javascript/jQuery. It involves users filling out individual words in a large block of text, kind of like Mad Libs. We've created a sort of HTML format that we use to write the large block of text, which we then manipulate with jQuery as the user fills it out.
Part of a block of text might look like this:
<span class="fillmeout">This is a test of the <span>NOUN</span> Broadcast System.</span>
Given that markup, I need to separately retrieve and manipulate the text before and after the inner <span>; we're calling those the "prefix" and "suffix".
I know that you can't parse HTML with simple string manipulation, but I tried anyway; I tried using split() on the <span> and </span> tags. It seemed simple enough. Unfortunately, Internet Explorer casts all HTML tags to uppercase, so that technique fails. I could write a special case, but the error has taught me to do this the right way.
I know I could simply use extra HTML tags to manually denote the prefix and suffix, but that seems ugly and redundant; I'd like to keep our markup format as lean and readable and writable as possible.
I've looked through the jQuery docs, and can't find a function that does exactly what I need. There are all sorts of functions to add stuff before and after and around and inside elements, but none that I can find to retrieve what's already there. I could remove the inner <span>, but then I don't know how I can tell what came before the deleted element apart from what came after it.
Is there a "right" way to do what I'm trying to do?

With simple string manipulations you can also use Regex.
That should solve your problem.
var array = $('.fillmeout').html().split(/<\/?span>/i);

Use your jQuery API! $('.fillmeout').children() and then you can manipulate that element as required.
http://api.jquery.com/children/

For completeness, I thought I should point out that the cleanest answer is to put the prefix and suffix text each in its own <span> like this, and then you can use jQuery selectors and methods to directly access the desired text:
<span class="fillmeout">
<span class="prefix">This is a test of the </span>
<span>NOUN</span>
<span class="suffix"> Broadcast System.</span>
</span>
Then, the code would be as simple as:
var fillme = $(".fillmeout").eq(0);
var prefix = fillme.find(".prefix").text();
var suffix = fillme.find(".suffix").text();
FYI, I would not call this level of simplicity "ugly and redundant" as you theorized. You're using HTML markup to delineate the text into separate elements that you want to separately access. That's just smart, not redundant.
By way of analogy, imagine you have toys of three separate colors (red, white and blue) and they are initially organized by color and you know that sometime in the future you are going to need to have them separated by color again. You also have three boxes to store them in. You can either put them all in one box now and manually sort them out by color again later or you can just take the already separated colors and put them each into their own box so there's no separation work to do later. Which is easier? Which is smarter?
HTML elements are like the boxes. They are containers for your text. If you want the text separated out in the future, you might as well put each piece of text into its own named container so it's easy to access just that piece of text in the future.

Several of these answers almost got me what I needed, but in the end I found a function not mentioned here: .contents(). It returns an array of all child nodes, including text nodes, that I can then iterate over (recursively if needed) to find what I need.

I'm not sure if this is the 'right' way either, but you could replace the SPANs with an element you could consistently split the string on:
jQuery('.fillmeout span').replaceWith('|');
http://api.jquery.com/replaceWith/
http://jsfiddle.net/mdarnell/P24se/

You could use
$('.fillmeout span').get(0).previousSibling.textContent
$('.fillmeout span').get(0).nextSibling.textContent
This works in IE9, but sadly not in IE versions earlier than 9.

Based on your example, you could use your target as a delimiter to split the sentence.
var str = $('.fillmeout').html();
str = str.split('<span>NOUN</span>');
This would return an array of ["This is a test of the ", " Broadcast System."]. Here's a jsFiddle example.

You could just use the nextSibling and previousSibling native JavaScript (coupled with jQuery selectors):
$('.fillmeout span').each(
function(){
var prefix = this.previousSibling.nodeValue,
suffix = this.nextSibling.nodeValue;
});
JS Fiddle proof of concept.
References:
each().
node.nextSibling.
node.previousSibling.

If you want to use the DOM instead of parsing the HTML yourself and you can't put the desired text in its own elements, then you will need to look through the DOM for text nodes and find the text nodes before and after the span tag.
jQuery isn't a whole lot of help when dealing with text nodes instead of element nodes so the work is mostly done in plain javascript like this:
$(".fillmeout").each(function() {
var node = this.firstChild, prefix = "", suffix = "", foundSpan = false;
while (node) {
if (node.nodeType == 3) {
// if text node
if (!foundSpan) {
prefix += node.nodeValue;
} else {
suffix += node.nodeValue;
}
} else if (node.nodeType == 1 && node.tagName == "SPAN") {
// if element and span tag
foundSpan = true;
}
node = node.nextSibling;
}
// here prefix and suffix are the text before and after the first
// <span> tag in the HTML
// You can do with them what you want here
});
Note: This code does not assume that all text before the span is located in one text node and one text node only. It might be, but it also might not be so it collates all the text nodes together that are before and after the span tag. The code would be simpler if you could just reference one text node on each side, but it isn't 100% certain that that is a safe assumption.
This code also handles the case where there is no text before or after the span.
You can see it work here: http://jsfiddle.net/jfriend00/P9YQ6/

Performance of split() over substr()

In a tutorial for building a CSS selector engine in JavaScript (visible for Tuts+ members here) the author uses the following code to remove everything in a string before the hash character:
// sel = "div#main li"
if (sel.indexOf("#") > 0) {
sel = sel.split("#");
sel = "#" + sel[sel.length -1];
}
While I'm a JavaScript beginner, I'm not a beginner programmer. And this seems such an overwhelming operation, like killing an ant with a cannon. I'd use something like:
sel.substr(sel.indexOf("#"));
Maybe even not enclosed in the if statement, which already uses indexOf(). So, as the author even wrote a book on JavaScript, there must be some secret that I'm not aware of: are there any advantages of using the former code? On performance maybe?

There's usually a wide variation of performance between different implementations, so testing would be needed. But if performance is really a consideration, I would bet that .split() is slower.
"Maybe even not enclosed with the if statement..."
But I would say that you shouldn't have it inline as you do. The .indexOf() will return -1 if no match is found, which will cause .substr to give you the last character of the string.
var sel = 'tester';
sel.substr(sel.indexOf("#")); // "r"
So keep the if statement...
var sel = 'tester',
idx = sel.indexOf("#"),
sub;
if( idx !== -1 ) {
sub = sel.substr(idx);
}

I'm not sure what the tutorial is trying to do, but sel="div#main li#first" is valid CSS and their code would return #first and sel.substr(sel.indexOf("#")); would return #main li#first. I'm guessing, but that could work in a loop where you work backwards through the CSS selectors.
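To make that difference concrete, here is a small sketch of both behaviours side by side. The function names are made up for this example, and it uses slice rather than substr, which behaves the same for a non-negative start index.

```javascript
// Hypothetical helpers, named only for this comparison.

// Keeps everything from the FIRST "#" onward (the substr/indexOf idea).
function fromFirstHash(sel) {
  var idx = sel.indexOf("#");
  return idx === -1 ? sel : sel.slice(idx);
}

// Keeps everything from the LAST "#" onward (what the split() version produces).
function fromLastHash(sel) {
  var idx = sel.lastIndexOf("#");
  return idx === -1 ? sel : sel.slice(idx);
}

console.log(fromFirstHash("div#main li#first")); // "#main li#first"
console.log(fromLastHash("div#main li#first"));  // "#first"
console.log(fromFirstHash("tester"));            // "tester" (no "#", left unchanged)
```

So if the tutorial really wants the last ID in the selector, lastIndexOf gets there without building an intermediate array.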

Real-life CSS selector engines use regular expressions for everything, and this seems to be the best way. The language provides us with a dedicated, powerful tool for string manipulation, so why not use it? In your case:
sub = sel.replace(/^.+?#/, "#")
does the job fast and without extra clutter.
Performance? In javascript we usually don't care much, because our applications are not time-critical. Nobody cares if it takes 0.1 or 0.01 sec to validate a form or to fade in a div.

Javascript .replace command replace page text?

Can the JavaScript command .replace replace text in any webpage? I want to create a Chrome extension that replaces specific words in any webpage to say something else (example cake instead of pie).

The .replace method is a string operation, so it's not immediately simple to run the operation on HTML documents, which are composed of DOM Node objects.
Use TreeWalker API
The best way to go through every node in a DOM and replace text in it is to use the document.createTreeWalker method to create a TreeWalker object. This is a practice that is used in a number of Chrome extensions!
// create a TreeWalker of all text nodes
var allTextNodes = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT),
// some temp references for performance
tmptxt,
tmpnode,
// compile the RE and cache the replace string, for performance
cakeRE = /cake/g,
replaceValue = "pie";
// iterate through all text nodes
while (allTextNodes.nextNode()) {
tmpnode = allTextNodes.currentNode;
tmptxt = tmpnode.nodeValue;
tmpnode.nodeValue = tmptxt.replace(cakeRE, replaceValue);
}
To replace parts of text with another element or to add an element in the middle of text, use DOM splitText, createElement, and insertBefore methods, example.
See also how to replace multiple strings with multiple other strings.
Don't use innerHTML or innerText or jQuery .html()
// the innerHTML property of any DOM node is a string
document.body.innerHTML = document.body.innerHTML.replace(/cake/g,'pie')
It's generally slower (especially on mobile devices).
It effectively removes and replaces the entire DOM, which is not awesome and could have some side effects: it destroys all event listeners attached in JavaScript code (via addEventListener or .onxxxx properties) thus breaking the functionality partially/completely.
This is, however, a common, quick, and very dirty way to do it.
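If you need to swap several words at once, the tree-walker loop can stay the same and only the string step changes. The following is a sketch under a few assumptions: the helper names (escapeRegExp, makeReplacer, replaceWords) and the word list are invented for illustration, the keys are plain words, and matching is case-sensitive and limited to whole words.

```javascript
// Hypothetical helpers; the names are illustrative only.

// Escape regex metacharacters so a word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one function that replaces every key of `map` with its value.
function makeReplacer(map) {
  var words = Object.keys(map).map(escapeRegExp);
  if (words.length === 0) {
    return function (text) { return text; }; // nothing to replace
  }
  // One alternation, compiled once; \b keeps "cakes" from matching "cake".
  var re = new RegExp("\\b(?:" + words.join("|") + ")\\b", "g");
  return function (text) {
    return text.replace(re, function (match) { return map[match]; });
  };
}

var replaceWords = makeReplacer({ cake: "pie", tea: "coffee" });

// In a browser, apply it to every text node under <body>.
if (typeof document !== "undefined" && document.body) {
  var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  while (walker.nextNode()) {
    walker.currentNode.nodeValue = replaceWords(walker.currentNode.nodeValue);
  }
}
```

Compiling a single alternation means each text node is scanned once, rather than once per word.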

Ok, so the createTreeWalker method is the RIGHT way of doing this and it's a good way. I unfortunately needed to do this to support IE8 which does not support document.createTreeWalker. Sad Ian is sad.
If you want to do this with a .replace on the page text using a non-standard innerHTML call like a naughty child, you need to be careful because it WILL replace text inside a tag, leading to XSS vulnerabilities and general destruction of your page.
What you need to do is only replace text OUTSIDE of tag, which I matched with:
var search_re = new RegExp("(?:>[^<]*)(" + stringToReplace + ")(?:[^>]*<)", "gi");
Gross, isn't it? You may want to mitigate any slowness by replacing some results and then sticking the rest in a setTimeout call like so:
// replace some chunk of stuff, the first section of your page works nicely
// if you happen to have that organization
//
setTimeout(function() { /* replace the rest */ }, 10);
which will return immediately after replacing the first chunk, letting your page continue with its happy life. For your replace calls, you're also going to want to replace large chunks in a temp string
var tmp = element.innerHTML.replace(search_re, whatever);
/* more replace calls, maybe this is in a for loop, i don't know what you're doing */
element.innerHTML = tmp;
so as to minimize reflows (when the page recalculates positioning and re-renders everything). For large pages, this can be slow unless you're careful, hence the optimization pointers. Again, don't do this unless you absolutely need to. Use the createTreeWalker method zetlen has kindly posted above.

Have you tried something like this?
$('body').html($('body').html().replace('pie','cake'));
