Find Custom Elements that start with a specified prefix - javascript

I have a bunch of Custom Elements whose tag names begin with 'food-cta-'. I am looking for a way in JavaScript/jQuery to select these elements, similar to how I can use $('*[class^="food-cta-"]') to select all elements whose class attribute starts with food-cta-. Is it possible to search for elements whose tag name starts with 'food-cta-'?
Note that I will be injecting this search onto the page, so I won't have access to Angular.
Example of Custom Elements:
<food-cta-download>
<food-cta-external>
<food-cta-internal>
EDIT: The code I am looking at looks like:
<food-cta-download type="primary" description="Download Recipe">
<img src="">
<h2></h2>
<p></p>
</food-cta-download>
The app uses AngularJS to create the Custom Elements, which I believe are called directives.

You can use XPath with the expression
//*[starts-with(name(),'food-cta-')]
Where
//* is wildcard for all nodes
starts-with() is an XPath function that tests whether a string starts with a given value
name() gets the QName (node name)
and 'food-cta-' is the search term
Pass it into document.evaluate and you will get an XPathResult that gives you the matched nodes.
var result = document.evaluate( "//*[starts-with(name(),'food-cta-')]", document, null, XPathResult.ANY_TYPE, null );
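Note that you can use any node as the context node; you do not need to use document. For instance, you could pass a container div instead. In that case, start the expression with a dot (.//*), because an expression that begins with // always searches from the document root, whatever context node you pass:
var container = document.getElementById("container");
var result = document.evaluate( ".//*[starts-with(name(),'food-cta-')]", container, null, XPathResult.ANY_TYPE, null );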
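If the prefix might contain quote characters, it has to be turned into a valid XPath 1.0 string literal, and XPath 1.0 has no escape sequences. The sketch below builds the relative expression safely and, in a browser, copies an ORDERED_NODE_SNAPSHOT_TYPE result into a plain array. The function names xpathLiteral, tagPrefixXPath and findByTagPrefix are hypothetical. It assumes an HTML document, where name() yields lowercase tag names (as the demo relies on), so pass the prefix in lowercase.

```javascript
// XPath 1.0 string literals have no escape sequences, so choose a quote
// character the value does not contain, or fall back to concat().
function xpathLiteral(value) {
  if (!value.includes("'")) return "'" + value + "'";
  if (!value.includes('"')) return '"' + value + '"';
  const pieces = value.split("'").map((piece) => "'" + piece + "'");
  return "concat(" + pieces.join(", \"'\", ") + ")";
}

// The leading "." makes the path relative to the context node; a path that
// starts with "//" always searches from the document root.
function tagPrefixXPath(prefix) {
  return ".//*[starts-with(name(), " + xpathLiteral(prefix) + ")]";
}

// Browser-only: returns the matching elements below contextNode, in document order.
function findByTagPrefix(prefix, contextNode = document) {
  const doc = contextNode.ownerDocument || contextNode;
  const snapshot = doc.evaluate(tagPrefixXPath(prefix), contextNode, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

A snapshot is used instead of ANY_TYPE so the result can be indexed and stays valid if the DOM changes afterwards; for example, findByTagPrefix("food-cta-", document.getElementById("container")) should return the food-cta-* elements inside the container.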
Demo
let result = document.evaluate( "//*[starts-with(name(),'food-cta-')]", document, null, XPathResult.ANY_TYPE, null );
let nodes = [];
let anode = null;
while( (anode = result.iterateNext()) ){
nodes.push( anode.nodeName );
}
console.log(nodes);
<div id="container">
<br>
<food-cta-download type="primary" description="Download Recipe">
<img src="">
<h2></h2>
<p></p>
</food-cta-download>
<span>Some span</span>
<food-cta-something>
<img src="">
<h2></h2>
<p></p>
</food-cta-something>
<div>In between
<food-cta-sub>
<img src="">
<h2></h2>
<p></p>
</food-cta-sub>
</div>
<food-cta-hello>
<img src="">
</food-cta-hello>
<food-cta-whattt>
</food-cta-whattt>
</div>

Try this:
let ctaElements = $('*')
.filter((index, element) => /^FOOD-CTA-/.test(element.tagName));
Note that .tagName returns the name in uppercase for elements in an HTML document, which is why the pattern is uppercase. This gives you a jQuery object containing the elements you want. It traverses the entire DOM, though, so it will be slow.
This uses the "all selector".
Caution: The all, or universal, selector is extremely slow, except when used by itself.
You can also traverse less than the entire DOM by specifying something like $("body *"). I'm not sure where you have put the Custom Elements, or where they're allowed.
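Without jQuery, the same filter can be applied to the result of document.querySelectorAll. A minimal sketch; the helper names filterByTagPrefix and findInPage are hypothetical. It compares case-insensitively because tagName is uppercase for HTML elements, and it defaults to the "body *" scope mentioned above so elements in <head> are not checked.

```javascript
// Keep only the elements whose tag name starts with `prefix`, ignoring case
// (tagName is uppercase for HTML elements, e.g. "FOOD-CTA-DOWNLOAD").
function filterByTagPrefix(elements, prefix) {
  const wanted = prefix.toUpperCase();
  return Array.from(elements).filter((el) =>
    el.tagName.toUpperCase().startsWith(wanted));
}

// Browser-only: narrow the scope with a selector before filtering.
function findInPage(prefix, scopeSelector = "body *") {
  return filterByTagPrefix(document.querySelectorAll(scopeSelector), prefix);
}
```

findInPage("food-cta-") returns a plain array rather than a jQuery object; wrap it in $(...) if you need jQuery methods afterwards.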
As an aside, I wouldn't use Custom Elements: microformats are a better idea, at least for now. They're also better supported, and they're less likely to change.

You probably just have to go to the elements in question and check whether their tagName begins with the given string:
var myPrefix = "mycustom-thing-";
$("body").children().each(function() {
if (this.tagName.substr(0, myPrefix.length).toLowerCase() == myPrefix) {
console.log(this.innerHTML); // or what ever
}
})
https://jsfiddle.net/svArtist/duLo2d0z/
EDIT: Included for efficiency's sake:
If you can predict where the elements will be, you can of course specify that circumstance.
In my example, the elements in question were direct children of body, so I could use .children() to get them. This does not traverse lower levels.
Reduce the need for traversal by the following:
Start on the lowest needed level ($("#specific-id") rather than $("body"))
If the elements are all to be found as direct children of a container:
Use .children() on the container to obtain just the immediate children
Otherwise:
Use .find("*")
If you can tell something about the containing context, filter by that
For example $("#specific-id").find(".certain-container-class .child-class *")
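The same idea without jQuery, as a sketch: collectByTagPrefix is a hypothetical helper that starts from whatever element you pass in (ideally the lowest needed level) and either checks only its immediate children, like .children(), or walks the whole subtree, like .find("*"). It only relies on each element's tagName and children properties.

```javascript
// Hypothetical helper: collect the elements below `root` whose tag name starts
// with `prefix` (case-insensitive), in document order.
// deep === false checks only the immediate children (like .children());
// deep === true walks the whole subtree (like .find("*")).
function collectByTagPrefix(root, prefix, deep = true) {
  const wanted = prefix.toLowerCase();
  const found = [];
  const visit = (parent) => {
    for (const child of Array.from(parent.children)) {
      if (child.tagName.toLowerCase().startsWith(wanted)) found.push(child);
      if (deep) visit(child);
    }
  };
  visit(root);
  return found;
}
```

For example, collectByTagPrefix(document.getElementById("specific-id"), "food-cta-", false) would only look at the direct children of that container.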

Why not extend jQuery selectors?
$(':tag(^=food-cta-)')
Would be possible with the following implementation:
$.expr[':'].tag = function tag(element, index, match) {
// prepare dummy attribute
// avoid string processing when possible by using element.localName
// instead of element.tagName.toLowerCase()
tag.$.attr('data-tag', element.localName || element.tagName.toLowerCase());
// in :tag(`pattern`), match[3] = `pattern`
var pattern = tag.re.exec(match[3]);
// [data-tag`m`="`pattern`"]
var selector = '[data-tag' + (pattern[1] || '') + '="' + pattern[2] + '"]';
// test if custom tag selector matches element
// using dummy attribute polyfill
return tag.$.is(selector);
};
// dummy element to run attribute selectors on
$.expr[':'].tag.$ = $('<div/>');
// cache RegExp for parsing ":tag(m=pattern)"
$.expr[':'].tag.re = /^(?:([~\|\^\$\*])?=)?(.*)$/;
// some tests
console.log('^=food-cta-', $(':tag(^=food-cta-)').toArray());
console.log('*=cta-s', $(':tag(*=cta-s)').toArray());
console.log('food-cta-hello', $(':tag(food-cta-hello)').toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container">
<br>
<food-cta-download type="primary" description="Download Recipe">
<img src="">
<h2></h2>
<p></p>
</food-cta-download>
<span>Some span</span>
<food-cta-something>
<img src="">
<h2></h2>
<p></p>
</food-cta-something>
<div>In between
<food-cta-sub>
<img src="">
<h2></h2>
<p></p>
</food-cta-sub>
</div>
<food-cta-hello>
<img src="">
</food-cta-hello>
<food-cta-whattt>
</food-cta-whattt>
</div>
This supports a pseudo-CSS-style attribute selector with the syntax:
:tag(m=pattern)
Or just
:tag(pattern)
where m is ~,|,^,$,* and pattern is your tag selector.
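The dummy-element trick above delegates the operator semantics to jQuery's attribute selectors. If you only need the comparison itself, the operators can also be applied to the tag name string directly. This is a swapped-in alternative, not how the answer above works; tagMatches is a hypothetical helper that reuses the same parsing RegExp and approximates the CSS attribute-selector operators.

```javascript
// Same parser as $.expr[':'].tag.re: optional operator, "=", then the pattern.
const TAG_ARG_RE = /^(?:([~\|\^\$\*])?=)?(.*)$/;

// Apply (approximately) CSS attribute-selector operator semantics to a tag name.
function tagMatches(tagName, arg) {
  const [, op = "", pattern] = TAG_ARG_RE.exec(arg);
  const name = tagName.toLowerCase();
  switch (op) {
    case "^": return name.startsWith(pattern);
    case "$": return name.endsWith(pattern);
    case "*": return name.includes(pattern);
    case "~": return name.split(/\s+/).includes(pattern); // whitespace-separated word
    case "|": return name === pattern || name.startsWith(pattern + "-");
    default:  return name === pattern; // plain "pattern" or "=pattern"
  }
}
```

In a page, $('*').filter((i, el) => tagMatches(el.tagName, '^=food-cta-')) should select roughly the same elements as $(':tag(^=food-cta-)'), without registering a custom pseudo-selector.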

Related

Excluding inner tags from string using Regex

I have the following text:
If there would be more <div>matches<div>in</div> string</div>, you will merge them to one
How do I make a JS regex that will produce the following text?
If there would be more <div>matches in string</div>, you will merge them to one
As you can see, the additional <div> tag has been removed.
I would use a DOMParser to parseFromString into the more fluent HTMLDocument interface to solve this problem. You are not going to solve it well with regex.
const htmlDocument = new DOMParser().parseFromString("this <div>has <div>nested</div> divs</div>", "text/html");
htmlDocument.body.childNodes; // NodeList(2): [ #text, div ]
From there, the algorithm depends on exactly what you want to do. Solving the problem exactly as you described to us isn't too tricky: recursively walk the DOM tree; remember whether you've seen a tag yet; if so, exclude the node and merge its children into the parent's children.
In code:
const simpleExampleHtml = `<div>Hello, this is <p>a paragraph</p> and <div>some <div><div><div>very deeply</div></div> nested</div> divs</div> that should be eliminated</div>`
// Parse into an HTML document and take its <body>
const doc = new DOMParser().parseFromString(simpleExampleHtml, "text/html").body;
// Process a node, removing any tags that have already been seen
const processNode = (node, seenTags = []) => {
// If this is a text node, return it
if (node.nodeName === "#text") {
return node.cloneNode()
}
// If this node has been seen, return its children
if (seenTags.includes(node.tagName)) {
// flatMap flattens, in case the same node is repeatedly nested
// note that this is a newer JS feature and lacks IE11 support: https://caniuse.com/?search=flatMap
return Array.from(node.childNodes).flatMap(child => processNode(child, seenTags))
}
// If this node has not been seen, process its children and return it
const newChildren = Array.from(node.childNodes).flatMap(child => processNode(child, [...seenTags, node.tagName]))
// Clone the node so we don't mutate the original
const newNode = node.cloneNode()
// We can't directly assign to node.childNodes - append every child instead
newChildren.forEach(child => newNode.appendChild(child))
return newNode
}
// resultBody is an HTML <body> Node with the desired result as its childNodes
const resultBody = processNode(doc);
const resultText = resultBody.innerHTML
// <div>Hello, this is <p>a paragraph</p> and some very deeply nested divs that should be eliminated</div>
But make sure you know EXACTLY what you want to do!
There are lots of potential complications you could face with data that's more complex than your example. Here are some examples where the simple approach may not give you the desired result.
<!-- nodes where nested identical children are meaningful -->
<ul>
<li>Nested list below</li>
<li>
<ul>
<li>Nested list item</li>
</ul>
</li>
</ul>
<!-- nested nodes with classes or IDs -->
<span>A span with <span class="some-class">nested spans <span id="DeeplyNested" class="another-class">with classes and IDs</span></span></span>
<!-- places where divs are essential to the layout -->
<div class="form-container">
<form>
<div class="form-row">
<label for="username">Username</label>
<input type="text" name="username" />
</div>
<div class="form-row">
<label for="password">Password</label>
<input type="text" name="password" />
</div>
</form>
</div>
A simpler approach without regex: put the string into a p element, overwrite the first div's content with its own innerText (which drops any nested HTML tags), and finally read the content back with innerHTML:
let text = 'If there would be more <div>matches <div>in</div> string</div>, you will merge them to one';
const p = document.createElement('p');
p.innerHTML = text;
p.querySelector('div').innerText = p.querySelector('div').innerText;
console.log(p.innerHTML);

JavaScript evaluate XPATH within an element?

I am trying to get an element using document.evaluate() but also want to search only within a specific element. So for example:
const element = document.evaluate('.//p', ...); //I want this to return the Hello, World p element
<html>
<body>
<div id="someId">
<p>Hello, World!</p>
</div>
</body>
</html>
Is there any way I could pass in the div with id someId into the evaluate to only search within that scope?
I know I could write the whole XPath like .//div[@id="someId"]/p (which wouldn't work for my case) or use string concatenation, but I would like a cleaner way, such as passing the DOM element (or some object it contains) somewhere.
This is precisely the purpose of the second argument of document.evaluate():
contextNode specifies the context node for the query (see the XPath specification). It's common to pass document as the context node.
const someId = document.getElementById('someId');
const result = document.evaluate('.//p', someId, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)
console.log(result.snapshotItem(0)); // Hello, World!
console.log(result.snapshotItem(1)); // null
<div id="someId">
<p>Hello, World!</p>
</div>
<div>
<p>Goodbye, World!</p>
</div>
You can also use document for the contextNode, using the parent id within the query (like you asked). Like this:
const snapshotType = XPathResult.ORDERED_NODE_SNAPSHOT_TYPE;
const result1 = document.evaluate('.//div[@id="someId"]//p', document, null, snapshotType);
console.log(`result1 contains ${result1.snapshotLength} element, namely`, result1.snapshotItem(0))
//alternatively search for the actual text
const result2 = document.evaluate('.//p[contains(text(), "Hello")]', document, null, snapshotType);
console.log(`result2 contains ${result2.snapshotLength} element, namely`, result2.snapshotItem(0));
// finally, if the element order is fixed, you can take the first one
const result3 = document.evaluate('(.//p)[1]', document, null, snapshotType);
console.log(`result3 contains ${result3.snapshotLength} element, namely`, result3.snapshotItem(0));
<div id="someId">
<p>Hello, World!</p>
</div>
<div>
<p>Goodbye, World!</p>
</div>
<div>
<p>And I left</p>
</div>

querySelectorAll is not a function

I'm trying to find all oferts in the var articleFirst, but the error message in the console says that "querySelectorAll" is not a function. Why do I get that error?
This is my HTML:
<article class="first">
<div class="feature parts">
<div class="oferts">
<div class="heart icons"></div>
<h1>Build with passion</h1>
</div>
</div>
</article>
This is my JavaScript:
var articleFirst = document.querySelectorAll("article.first");
var oferts = articleFirst.querySelectorAll(".oferts");
Error:
Uncaught TypeError: articleFirst.querySelectorAll is not a function
Try this:
var articleFirst = document.querySelectorAll("article.first");
console.log(articleFirst)
var oferts = articleFirst[0].querySelectorAll(".oferts");
console.log(oferts)
The console output shows what is happening.
Or just do this:
document.querySelectorAll("article.first .oferts");
querySelectorAll is a method found on Element and Document nodes in the DOM.
You are trying to call it on the return value of a call to querySelectorAll, which returns a NodeList (an array-like object). You would need to loop over the NodeList and call querySelectorAll on each node in it in turn.
Alternatively, just use a descendant combinator in your initial call to it.
var oferts = document.querySelectorAll("article.first .oferts");
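The loop described above can be sketched like this; queryAllWithin is a hypothetical helper name. It accepts the NodeList returned by querySelectorAll (or any array-like of elements) and uses a Set so that an element matched through two nested containers is returned only once.

```javascript
// Run `selector` inside every element of a NodeList (or any array-like of
// elements) and merge the results; the Set drops elements matched through
// more than one container. Results keep the order in which they were first found.
function queryAllWithin(containers, selector) {
  const matches = new Set();
  for (const container of Array.from(containers)) {
    for (const el of Array.from(container.querySelectorAll(selector))) {
      matches.add(el);
    }
  }
  return Array.from(matches);
}
```

Usage: queryAllWithin(document.querySelectorAll("article.first"), ".oferts"). For a single query like this one, the descendant combinator is simpler; the loop is useful when the containers come from somewhere other than a selector.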
You need to use document.querySelector instead of document.querySelectorAll because the next query depends on a single HTMLElement but document.querySelectorAll returns a NodeList.
document.addEventListener('DOMContentLoaded', TestCtrl);
function TestCtrl() {
var firstArticle = document.querySelector('article.first');
console.log('oferts', firstArticle.querySelectorAll('.oferts'));
}
<article class="first">
<div class="feature parts">
<div class="oferts">
<div class="heart icons"></div>
<h1>Build with passion</h1>
</div>
</div>
</article>
A little verbose, but you could call querySelectorAll('article'), turn that NodeList into an array, and pick the first index, something like:
let articleList = document.querySelectorAll('article'); // makes a NodeList of all article tags on the webpage
let article = Array.from(articleList);
article[0];

Get child elements within casper.each

Using CasperJS 1.1 with the following code, I'm able to fetch useful DOM HTML from a web page.
casper.each(c.getElementsInfo(xpath), function(casper, element, j) {
var html = element["html"].trim();
if(html.indexOf('Phone') > -1) {
// what should I put here?
}
});
However, I want to access and obtain the child elements of the element. How can I achieve this? The element's HTML source (a.k.a. the value of html) is as follows:
Loop 1
<div class="fields">
Phone
</div>
<div class="values">
12345678 (Mr. Lee) </div>
Loop 2
<div class="fields">
Emergency Phone
</div>
<div class="values">
23456789 (Emergency)
</div>
Loop 3
<div class="fields">
Opening Hours
</div>
<div class="values">
9:00am-6:30pm(Weekday) /
Close on Sundays and Public Holidays(Can be booked)(Holiday)
</div>
Loop 4
<div class="fields">
Last Update
</div>
<div class="values">
11/06/14 </div>
The above HTML is badly formatted and contains a lot of whitespace.
The data I wanted to fetch is:
Phone: 12345678
Emergency Phone: 23456789 (Emergency)
Opening Hours: 9:00am-6:30pm(Weekday) / Close on Sundays and Public Holidays(Can be booked)(Holiday)
Last Update: 11/06/14
I tried a regex, but it gets too complicated.
I don't recommend doing this with regular expressions. It can be easily done with some selectors, but it has to be done in the page context (inside of the evaluate() callback), because DOM nodes cannot be passed to the outside.
CasperJS provides a helper function for matching DOM nodes by XPath with __utils__.getElementsByXPath() through the ClientUtils module that is always automatically inserted. The result of that function is an array, so the normal forEach() pattern applies. DOM nodes can be used as context nodes for selecting child elements with el.querySelector(".class").
var info = casper.evaluate(function(xpath){
var obj = {};
__utils__.getElementsByXPath(xpath).forEach(function(el){
obj[el.querySelector(".fields").textContent.trim()] =
el.querySelector(".values").textContent.trim();
});
return obj;
}, yourXPathString);
If you want to select elements based on a CSS selector use the following:
var info = casper.evaluate(function(cssSelector){
var obj = {};
__utils__.findAll(cssSelector).forEach(function(el){
obj[el.querySelector(".fields").textContent.trim()] =
el.querySelector(".values").textContent.trim();
});
return obj;
}, yourCssSelector);
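textContent.trim() only strips the ends of the string, so a value such as Opening Hours keeps its internal line breaks and indentation. A sketch of a cleanup step follows; collapseWhitespace and pairsToRecord are hypothetical names. It is written in ES5 because the evaluate() callback runs in the PhantomJS page context, which does not support most ES2015 syntax. Define the helpers inside the callback, since the callback is sent to the page as source code and cannot see functions from the surrounding CasperJS script.

```javascript
// Replace every run of whitespace (spaces, tabs, newlines) with one space
// and strip the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Turn [fieldText, valueText] pairs into { field: value } with cleaned strings.
function pairsToRecord(pairs) {
  var record = {};
  pairs.forEach(function (pair) {
    record[collapseWhitespace(pair[0])] = collapseWhitespace(pair[1]);
  });
  return record;
}
```

Inside the callback, each pair would come from el.querySelector(".fields").textContent and el.querySelector(".values").textContent for one matched element.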

Recursively get all HTML between two elements - excluding closing tags - in Javascript/Node.js

I need to be able to store certain elements separately in a database, but on retrieval rebuild the HTML for display. Our solution to this (open to suggestions) is to store leadingHTML and trailingHTML properties of the entry.
This should provide us the ability to be as flexible as we want-- but there's just one catch. I'm banging my head against the wall trying to write the code to parse the HTML. Take the following HTML for example:
<h1>this is leadingHTML</h1>
<h2>this is leadingHTML2</h2>
<p class='select' id='1'>A1</p>
<h1 >this is trailngHTML</h1>
<h2>this is trailngHTML2</h2>
<p class='select' id='2'>A2</p>
<h1>this is trailngHTML3</h1>
<h2>this is trailngHTML4</h2>
<p class='select' id='3'>A3</p>
<figure id='fig'>
<figCaption>
this is some text
<span class='select'>B1</span>
<div>some text <span class='select'>B2</span></div>
</figCaption>
<img class='select' alt='test' src='test.jpg'/>
<img class='select' alt='test' src='test.jpg'/>
<img class='select' alt='test' src='test.jpg'/>
</figure>
<p class="select">A4</p>
It's easy to get all the elements with class "select", but I could really use help getting the string of HTML that goes between those elements. For the element <p class='select' id='3'>A3</p>, I need a function that returns the following values:
element
<p class='select' id='3'>A3</p>
leadingHTML
leadingHTML = '<h1>this is trailngHTML3</h1><h2>this is trailngHTML4</h2>'
trailingHTML
trailingHTML = "<figure id='fig'><figCaption>this is some text"
This way, I can store the elements the way that is required of the project but still reconstruct the HTML for display.
We are using Node.js for the backend, so this will need to be written in JavaScript. After lots of frustration, I'm pretty convinced there's no way to do this without some ugly code.
Any help is much appreciated.
So far, this is what I've got (can't say I'm proud):
var checkChildren = function walk(node,state,func){
if (state.isPt===false){
var state=func(node,state);
}
else if(state.isPt===true){
return state;
}
node=$(node).children().first();
while (node.length>0 && state.isPt!==true){
state=walk(node,state,func);
node=$(node).next();
}
return state;
};
function getTrailing(start,html){
var checkFind = $(start).find('.pt');
if (checkFind.length>0){
//selector is in the child somewhere
state= { html: html, isPt: false};
var getChildHTML = checkChildren(start,state,function(node,state){
if ($(node).is($(checkFind).first())){
return { html: html, isPt: true,};
} else{
html=html+'<'+$(node)[0].name;
for (var key in $(node)[0].attribs){
html=html+" "+key+"='"+$(node)[0].attribs[key]+"'";
};
html=html+'>';
return { html: html, isPt: false,};
}
});
return getChildHTML;
} else{
return html;
}
}
var start1 = $("#fig");
var html = '';
test=getTrailing(start1,html);
and it's returning this:
{ html: '<figure id=\'fig\' class=\'test\' style=\'color:red;\'><figcaption class=\'test\' style=\'color:red;\'><span><div>',
isPt: true }
Update
To clarify-- the output may be invalid HTML. I simply need string of all the HTML between two elements of interest. If the second element of interest is a descendant, then the result will be invalid HTML (since the string is supposed to stop as soon as it finds the next element).
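One fragile spot in the attempt above is the attribute serialization: key='value' is concatenated without escaping, so a value containing an apostrophe (for example alt="it's") produces broken output. A sketch of that single step, assuming each node exposes name and attribs the way the question's code reads them; escapeAttr and openingTag are hypothetical names. It only covers the opening-tag step, not the traversal between the elements of interest.

```javascript
// Escape the characters that would break (or change the meaning of) a
// single-quoted attribute value.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/'/g, "&#39;");
}

// Build "<name key='value' ...>" with no closing tag, like the
// html=html+'<'+... lines in getTrailing.
function openingTag(name, attribs = {}) {
  const attrs = Object.keys(attribs)
    .map((key) => " " + key + "='" + escapeAttr(attribs[key]) + "'")
    .join("");
  return "<" + name + attrs + ">";
}
```

Inside the walker this would replace the string building with html = html + openingTag($(node)[0].name, $(node)[0].attribs).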
