Find first <a> tag whose href matches regex

I'm building a Chrome extension, and one thing it does is look for the first <a> tag in the current page whose href attribute matches a given regex. JS only.
I have several solutions in mind and I tried them, but each time the page freezes because of the solution I tried (i.e. if I comment out the lines doing this logic, the page loads correctly). So I need a fast solution.
Here is what I tried:
Solution 1: XPath
var reg = something; // the regex to look for
var result = document.evaluate(
  '//*[local-name()="a"][contains(@href, "rss") or contains(@href, "feed")]', // first filtering
  document, null, XPathResult.ANY_TYPE, null
);
var item;
while ((item = result.iterateNext())) {
  // second and real filtering (strings have no .matches(); RegExp#test does the check)
  if (reg.test(item.href))
    return item.href;
}
Page freezes.
Solution 2: Xpath using matches()
var result = document.evaluate(
"//*[local-name()='a'][matches(@href, my_regex)]", //first filtering
document, null, 0, null
);
var item;
while (item = result.iterateNext()) {
return item.href;
}
I tried hardcoding my_regex between single quotes, but got an error in the Chrome console (not a valid XPath expression). Even something as simple as [matches(@href, 'rss')] gives the same error. I suspect it's related to XPath 1.0 vs 2.0, but I didn't investigate for long.
Solution 3: document.body.innerHTML.match()
var match = document.body.innerHTML.match(reg); // innerHTML is already a string; null if no match
if (match)
  return match[0];
Page freezes.
So I'm running out of ideas. Maybe I'll investigate XPath's matches() further, but that's about it. Any thoughts?

Here's a solution that you can adapt to look for strings, regexps or both:
var string_match = "";                        // exact href to look for ("" skips this check)
var regexp_match = new RegExp("www.*", "i");
var filter = {
  acceptNode: function(node) {
    // SHOW_ELEMENT already limits the walk to elements; keep only <a> tags
    return node.tagName === "A" ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
  }
};
var tree_walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, filter);
while (tree_walker.nextNode()) {
  var href = tree_walker.currentNode.href;
  // an <a> without an href has href === "", so only compare when string_match is set
  if ((string_match && href === string_match) || regexp_match.test(href)) {
    console.log(tree_walker.currentNode);
    break;
  }
}
here's the fiddle: http://jsfiddle.net/59vFt/2/
I'm using document.createTreeWalker, which I think is lighter than collecting every element by tag name first, because the walk stops at the first match; getting the elements by tag name would also work, though.
Btw, innerHTML is terrible here: it serializes the whole page into one big string before you even start matching. Try to avoid it :P
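For the original problem, a simpler route than XPath or innerHTML may be to loop over the page's links and stop at the first href that passes the regex. This is a minimal sketch, assuming you pass in `document.querySelectorAll('a[href]')` from the content script; `findFirstMatchingHref` is a hypothetical name. It accepts any iterable of objects with an `href` string, so the logic can also be checked outside the browser.

```javascript
// Hypothetical helper: return the first href in `links` that matches `regex`,
// or null if none does. `links` can be any iterable of objects with an `href`
// string, e.g. document.querySelectorAll('a[href]') in a content script.
function findFirstMatchingHref(links, regex) {
  // Drop the g/y flags so RegExp#test does not carry lastIndex between links.
  const re = new RegExp(regex.source, regex.flags.replace(/[gy]/g, ''));
  for (const link of links) {
    if (re.test(link.href)) {
      return link.href; // stop at the first match, no further work
    }
  }
  return null;
}

// Browser usage:
// const feed = findFirstMatchingHref(document.querySelectorAll('a[href]'), /rss|feed/i);
```

Copying the regex without g/y matters because a global regex remembers `lastIndex` between `test()` calls, which can make later links fail to match.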

Related

Finding HTML tags by using `content`'s of them from a Google Chrome extension [duplicate]

How can I find DIV with certain text? For example:
<div>
SomeText, text continues.
</div>
Trying to use something like this:
var text = document.querySelector('div[SomeText*]').innerTEXT;
alert(text);
But of course it doesn't work. How can I do it?

OP's question is about plain JavaScript and not jQuery.
Although there are plenty of answers and I like @Pawan Nogariya's answer, please check this alternative out.
You can use XPath in JavaScript. More info in the MDN article here.
The document.evaluate() method evaluates an XPath query/expression, so you can pass XPath expressions to it, traverse the HTML document and locate the desired element.
In XPath you can select an element by its text node; the following gets the div whose text node is exactly "Hello World":
//div[text()="Hello World"]
To get an element that contains some text use the following:
//div[contains(., 'Hello')]
The contains() function in XPath takes two strings: the first is the string to search in (here ., the context node, converted to its text), and the second is the text to search for.
Check this plunk here for an example of XPath in JavaScript.
Here is a code snippet:
var headings = document.evaluate("//h1[contains(., 'Hello')]", document, null, XPathResult.ANY_TYPE, null);
var thisHeading = headings.iterateNext(); // null if no <h1> matched
if (thisHeading) {
  console.log(thisHeading); // prints the HTML element in the console
  console.log(thisHeading.textContent); // prints the text content in the console
  thisHeading.innerHTML += "<br />Modified contents";
}
As you can see, I can grab the HTML element and modify it as I like.
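One catch when building these expressions from user input: XPath 1.0 string literals have no escape character, so text containing both ' and " cannot be written as a single literal. A common workaround is to split on one quote type and rejoin the pieces with concat(). The sketch below assumes the resulting string is passed to document.evaluate(); `xpathLiteral` and `containsTextXPath` are hypothetical helper names.

```javascript
// Hypothetical helper: turn any JS string into a valid XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes("'")) return "'" + text + "'";
  if (!text.includes('"')) return '"' + text + '"';
  // Contains both quote types: split on ' and rebuild the string with concat().
  const parts = text.split("'").map(part => "'" + part + "'");
  return 'concat(' + parts.join(", \"'\", ") + ')';
}

// Hypothetical helper: XPath for `tag` elements whose text contains `text`.
function containsTextXPath(tag, text) {
  return '//' + tag + '[contains(., ' + xpathLiteral(text) + ')]';
}

// Browser usage:
// document.evaluate(containsTextXPath('div', 'SomeText'), document, null,
//   XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
```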

You could use this pretty simple solution:
Array.from(document.querySelectorAll('div'))
.find(el => el.textContent === 'SomeText, text continues.');
Array.from converts the NodeList to an array (there are other ways to do this, like the spread operator or slice).
Since the result is an array, you can use Array.prototype.find with any predicate. You could also check textContent with a regex or whatever you like.
Note that Array.from and Array.prototype.find are ES2015 features. To be compatible with older browsers like IE10 without a transpiler:
Array.prototype.slice.call(document.querySelectorAll('div'))
.filter(function (el) {
return el.textContent === 'SomeText, text continues.'
})[0];

Since you asked for plain JavaScript, you can do something like this:
function contains(selector, text) {
var elements = document.querySelectorAll(selector);
return Array.prototype.filter.call(elements, function(element){
return RegExp(text).test(element.textContent);
});
}
And then call it like this
contains('div', 'sometext'); // find "div" that contain "sometext"
contains('div', /^sometext/); // find "div" that start with "sometext"
contains('div', /sometext$/i); // find "div" that end with "sometext", case-insensitive
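A variation on contains() above, sketched under two assumptions worth stating: a plain string should match literally (RegExp('1.5') would otherwise also match "125"), and a g or y flag should be dropped, because RegExp(re) returns the very same object when re is already a regex, and test() on a global regex remembers lastIndex between elements. `filterByText` is a hypothetical name; it accepts any array-like of objects with textContent, such as `document.querySelectorAll('div')`.

```javascript
// Hypothetical helper: keep the elements whose textContent matches `pattern`.
// A string pattern is escaped and matched literally; a RegExp is copied without g/y.
function filterByText(elements, pattern) {
  const re = typeof pattern === 'string'
    ? new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    : new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ''));
  return Array.prototype.filter.call(elements, el => re.test(el.textContent));
}

// Browser usage: filterByText(document.querySelectorAll('div'), /^sometext/);
```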

This solution does the following:
Uses the ES6 spread operator to convert the NodeList of all divs to an array.
Provides output if the div contains the query string, not just if it exactly equals the query string (which happens for some of the other answers). e.g. It should provide output not just for 'SomeText' but also for 'SomeText, text continues'.
Outputs the entire div contents, not just the query string. e.g. For 'SomeText, text continues' it should output that whole string, not just 'SomeText'.
Allows for multiple divs to contain the string, not just a single div.
[...document.querySelectorAll('div')] // get all the divs in an array
.map(div => div.innerHTML) // get their contents
.filter(txt => txt.includes('SomeText')) // keep only those containing the query
.forEach(txt => console.log(txt)); // output the entire contents of those
<div>SomeText, text continues.</div>
<div>Not in this div.</div>
<div>Here is more SomeText.</div>

Coming across this in 2021, I found using XPATH too complicated (need to learn something else) for something that should be rather simple.
Came up with this:
function querySelectorIncludesText (selector, text){
return Array.from(document.querySelectorAll(selector))
.find(el => el.textContent.includes(text));
}
Usage:
querySelectorIncludesText('button', 'Send')
Note that I decided to use includes and not a strict comparison, because that's what I really needed, feel free to adapt.
You might need these polyfills if you want to support all browsers:
/**
* String.prototype.includes() polyfill
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/includes#Polyfill
* @see https://vanillajstoolkit.com/polyfills/stringincludes/
*/
if (!String.prototype.includes) {
String.prototype.includes = function (search, start) {
'use strict';
if (search instanceof RegExp) {
throw TypeError('first argument must not be a RegExp');
}
if (start === undefined) {
start = 0;
}
return this.indexOf(search, start) !== -1;
};
}

It's best to check whether the div has a parent element you can start from. If so, get the parent element and call element.querySelectorAll("div") on it. Once you have the NodeList, filter it on the innerText property. Assume that a parent element of the div we are querying has an id of container. You can normally access container directly as a global via its id, but let's do it the proper way.
var conty = document.getElementById("container"),
divs = conty.querySelectorAll("div"),
myDiv = [...divs].filter(e => e.innerText == "SomeText");
So that's it.

If you don't want to use jQuery or something like that, you can try this:
function findByText(rootElement, text){
var filter = {
acceptNode: function(node){
// look for nodes that are text_nodes and include the following string.
if(node.nodeType === document.TEXT_NODE && node.nodeValue.includes(text)){
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_REJECT;
}
}
var nodes = [];
var walker = document.createTreeWalker(rootElement, NodeFilter.SHOW_TEXT, filter);
while(walker.nextNode()){
//give me the element containing the node
nodes.push(walker.currentNode.parentNode);
}
return nodes;
}
//call it like
var nodes = findByText(document.body,'SomeText');
//then do what you will with nodes[];
for(var i = 0; i < nodes.length; i++){
//do something with nodes[i]
}
Once you have the nodes that contain the text in an array, you can do something with them, like alert each one or print it to the console. One caveat: this doesn't necessarily grab divs per se; it grabs the parent of each text node that contains the text you are looking for. Also, text split across several nodes (e.g. Some<b>Text</b>) won't be found.

Google has this as a top result for finding a node with certain text, so by way of update: in modern browsers a NodeList has its own forEach, so you don't have to convert it to an array first.
The solution can use forEach like so:
var elList = document.querySelectorAll(".some .selector");
elList.forEach(function(el) {
if (el.innerHTML.indexOf("needle") !== -1) {
// Do what you like with el
// The needle is case sensitive
}
});
This worked for me to do a find/replace inside a NodeList: a normal selector couldn't single out one node, so I had to check each node one by one for the needle.

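Use XPath and document.evaluate(), and make sure to use text() rather than . as the contains() argument; otherwise, with //*, every ancestor whose full text contains the string (up to the html element) matches too. Note that in XPath 1.0, contains(text(), ...) only checks the element's first text node.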
var headings = document.evaluate("//h1[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
Or, to ignore leading and trailing whitespace (normalize-space() also collapses runs of whitespace inside the text):
var headings = document.evaluate("//h1[contains(normalize-space(text()), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
Or, to match all tag types (div, h1, p, etc.):
var headings = document.evaluate("//*[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
Then iterate
let thisHeading;
while(thisHeading = headings.iterateNext()){
// thisHeading contains matched node
}
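If you would rather filter in JavaScript than in XPath, the equivalent of normalize-space() is to trim the text and collapse whitespace runs. XPath 1.0 treats only space, tab, CR and LF as whitespace, so this sketch uses that set rather than \s (which also covers non-breaking spaces); `normalizeSpace` is a hypothetical name.

```javascript
// Hypothetical helper mirroring XPath 1.0 normalize-space(): strip leading and
// trailing whitespace and collapse internal runs into a single space. Only
// #x20, #x9, #xD and #xA count as whitespace, as in the XPath 1.0 spec.
function normalizeSpace(text) {
  return text.replace(/[\x20\t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

// Browser usage: <h1> elements whose normalized text contains 'Hello'.
// [...document.querySelectorAll('h1')]
//   .filter(h => normalizeSpace(h.textContent).includes('Hello'));
```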

Here's the XPath approach but with a minimum of XPath jargon.
Regular selection based on element attribute values (for comparison):
// for matching <element class="foo bar baz">...</element> by 'bar'
var things = document.querySelectorAll('[class*="bar"]');
for (var i = 0; i < things.length; i++) {
things[i].style.outline = '1px solid red';
}
XPath selection based on text within element.
// for matching <element>foo bar baz</element> by 'bar'
var things = document.evaluate('//*[contains(text(),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
things.snapshotItem(i).style.outline = '1px solid red';
}
And here's a case-insensitive version, since text content tends to vary more than attribute values:
// for matching <element>foo bar baz</element> by 'bar' case-insensitively
var things = document.evaluate('//*[contains(translate(text(),"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz"),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
things.snapshotItem(i).style.outline = '1px solid red';
}

There are lots of great solutions here already. However, to provide a more streamlined solution, more in keeping with querySelector's behavior and syntax, I opted for a solution that extends Object with a couple of prototype functions. Both functions use regular expressions for matching text; however, a string can be provided as a loose search parameter.
Simply implement the following functions:
// find all elements with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// regex: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
Object.prototype.queryInnerTextAll = function(selector, regex) {
if (typeof(regex) === 'string') regex = new RegExp(regex, 'i');
const elements = [...this.querySelectorAll(selector)];
const rtn = elements.filter((e)=>{
return e.innerText.match(regex);
});
return rtn.length === 0 ? null : rtn
}
// find the first element with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// text: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
// returns: the first matching element, or null if there is none
Object.prototype.queryInnerText = function(selector, text) {
  const all = this.queryInnerTextAll(selector, text); // null when nothing matched
  return all === null ? null : all[0];
}
With these functions implemented, you can now make calls as follows:
document.queryInnerTextAll('div.link', 'go');
This would find all divs with the link class whose innerText contains the word go (e.g. Go Left, GO down, go right or It's Good).
document.queryInnerText('div.link', 'go');
This would work exactly as the example above except it would return only the first matching element.
document.queryInnerTextAll('a', /^Next$/);
Find all links with the exact text Next (case-sensitive). This will exclude links that contain the word Next along with other text.
document.queryInnerText('a', /next/i);
Find the first link that contains the word next, regardless of case (e.g. Next Page or Go to next).
e = document.querySelector('#page');
e.queryInnerText('button', /Continue/);
This searches within a container element for a button containing the text Continue (case-sensitive), e.g. Continue or Continue to Next, but not continue.

I had a similar problem.
Here's a function that returns all elements which include the text passed as an argument.
This works for me:
function getElementsByText(root, str, tag = '*') {
  // root: document, or any element to search within
  return [...root.querySelectorAll(tag)]
    .filter(
      el => (el.text && el.text.includes(str))
        || (el.children.length === 0 && el.outerText && el.outerText.includes(str)))
}

Since there is no practical limit on the length of text in a data attribute, use data attributes! Then you can use regular CSS selectors to select your element(s), like the OP wants.
for (const element of document.querySelectorAll("*")) {
element.dataset.myInnerText = element.innerText;
}
document.querySelector("*[data-my-inner-text='Different text.']").style.color="blue";
<div>SomeText, text continues.</div>
<div>Different text.</div>
Ideally, do the data-attribute setting on document load, and narrow down the querySelectorAll selector a bit for performance. Keep in mind the attribute is a snapshot: if the text changes later, the attribute has to be updated too.

I was looking for a way to do something similar using a Regex, and decided to build something of my own that I wanted to share if others are looking for a similar solution.
function getElementsByTextContent(tag, regex) {
const results = Array.from(document.querySelectorAll(tag))
.reduce((acc, el) => {
if (el.textContent && el.textContent.match(regex) !== null) {
acc.push(el);
}
return acc;
}, []);
return results;
}

How to compare if an HTML element exists in the node array?

selectedContentWrap: HTML nodes.
htmlVarTag: a string.
How do I check whether the HTML element exists in the nodes?
htmlVarTag is a string, and I don't understand how to convert it so I can check whether there is a tag like that, so that if there is, I can remove it.
Here is the output of my nodes stored in selectedContentWrap:
var checkingElement = $scope.checkIfHTMLinside(selectedContentWrap,htmlVarTag );
$scope.checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
var node = htmlVarTag.parentNode;
while (node != null) {
if (node == selectedContentWrap) {
return true;
}
node = node.parentNode;
}
return false;
}

Well, if you could paste the content of selectedContentWrap I would be able to test this code, but I think this would work:
var checkIfHTMLinside = function(selectedContentWrap, htmlVarTag) {
  for (const item of selectedContentWrap) {
    // nodeName is upper-case for HTML elements, so compare case-insensitively
    if (item.nodeName.toLowerCase() == htmlVarTag.toLowerCase()) {
      return true;
    }
  }
  return false;
}

The simplest option is angular.element, which provides a jQuery-compatible subset of methods (jqLite):
$scope.checkIfHTMLinside = function(selectedContentWrap, htmlVarTag) {
  // use filter() on the array and return whether anything was left
  return selectedContentWrap.filter(function(str) {
    // keep str if it contains at least one htmlVarTag element
    return angular.element('<div>').append(str).find(htmlVarTag).length > 0;
  }).length > 0;
};
It's still not 100% clear whether the objective is only to look for a specific tag or for any tags (i.e. to differentiate from text only), or, as casually mentioned, to actually remove the tag.
If you want to remove the tag, it's not clear whether you simply want to unwrap it or also remove its content... both are easily achieved using angular.element.

Try using: node.innerHTML and checking against that

Is it just me, or do you always figure it out 20 minutes after posting a question on Stack Overflow...
The answer: selectedContentWrap already holds a list of nodes, so all I need to do is compare them; a simple for loop with an if will do.
To compare the names I just need .nodeName, as that works cross-browser (correct me if I am wrong).
Some devs suggested a "dictionary of tag names and anonymous closures instead", but I couldn't find anything about it. If anyone has this library, could you please post it to the question?
Here is my code:
var node = selectedContentWrap;
console.log('node that is selectedwrapper', selectedContentWrap)
for (var i = 0; i < selectedContentWrap.length; i++) {
console.log('tag name is ',selectedContentWrap[i].nodeName);
var temptagname = selectedContentWrap[i].nodeName; // for debugging
if(selectedContentWrap[i].nodeName == 'B' ){
console.log('contains element B');
}
}
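Building on the nodeName comparison above, here is a small sketch that takes the tag name as a string (like htmlVarTag) and returns the matching nodes, so they can be removed afterwards. nodeName is upper-case for HTML elements, so both sides are upper-cased before comparing; `findNodesByTag` is a hypothetical name.

```javascript
// Hypothetical helper: return the nodes in `nodes` whose nodeName equals
// `tagName`, ignoring case (HTML elements report names such as "B").
function findNodesByTag(nodes, tagName) {
  const wanted = tagName.toUpperCase();
  return Array.from(nodes).filter(node => node.nodeName.toUpperCase() === wanted);
}

// Browser usage: remove every <b> in the list from the DOM.
// findNodesByTag(selectedContentWrap, 'b').forEach(node => node.remove());
```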

Javascript: getElementById vs getElementsById (both works on different pages)

I'm struggling with a really weird problem...
I have two pages (nearly identical) where I need to disable some selects. On one of them (say page A), I use getElementById to retrieve my element, and on the other (say page B) I use getElementsById (with an 's') to retrieve it, and it works in both cases.
What is weird is that if I use getElementsById on page A (with the 's'), I get the error "document.getElementsById is not a function", which is expected because this function (with the 's') doesn't normally exist. But I don't get this error on page B, and if I use getElementById (without the 's') on that page, it doesn't work!?
Can someone give me an explanation? (I'll lose the few hairs left on my head if this continues...)
Thanks in advance!
PS: Sorry for my poor English!
Edit: Here is the code of my pages:
Page A:
function controleDelaiFranchise (casChoix){
var estAvecGarantie = <bean:write property="avecGarantie" name="simulationAutonomeForm" filter="false"/>;
if(estAvecGarantie ==true){
if(casChoix == 'Emprunteur'){
document.getElementById("assDelaiFranchiseEmpr").disabled = false;
}
else {
if(casChoix == 'CoEmprunteur'){
document.getElementById("assDelaiFranchiseCoEmpr").disabled = false;
}
}
}
else{
if(casChoix == 'Emprunteur'){
document.getElementsById("assDelaiFranchiseEmpr").disabled = true;
}
else {
if(casChoix == 'CoEmprunteur'){
document.getElementById("assDelaiFranchiseCoEmpr").disabled = true;
}
}
}
}
Page B:
function controleDelaiFranchise (casChoix){
var estAvecGarantie = document.getElementsByName("estAvecGarantie")[0].value;
if(estAvecGarantie){
if(casChoix == 'Emprunteur'){
document.getElementsById("assDelaiFranchiseEmpr").disabled = false;
}
else {
if(casChoix == 'CoEmprunteur'){
document.getElementsById("assDelaiFranchiseCoEmpr").disabled = false;
}
}
} else {
if(casChoix == 'Emprunteur'){
document.getElementsById("assDelaiFranchiseEmpr").disabled = true;
}
else {
if(casChoix == 'CoEmprunteur'){
document.getElementsById("assDelaiFranchiseCoEmpr").disabled = true;
}
}
}
}
Edit 2:
OK, so when it was not working on page B (without the 's'), I had:
var estAvecGarantie = document.getElementsByName("estAvecGarantie")[0].value;
if(estAvecGarantie){ ... }
I replaced it with:
var estAvecGarantie = document.getElementsByName("estAvecGarantie")[0].value;
if(estAvecGarantie == true) { ... }
and now it works using getElementById without the 's'
But I still don't understand why it works at all with this damn 's'... So my problem is solved (ish), but if someone has an explanation for why I can use getElementsById() even though the function doesn't exist (and on one page only), I'm all ears, because I hate not understanding...

As described by James here, id values have to be unique in a document, so only one "element" can match, rather than multiple "elements".
That is why there is no s when selecting by id: an id selects only one element at a time.
However, methods that can return multiple elements do use the plural "Elements", such as getElementsByTagName.
Hope that clears your confusion

First things first:
Function names, or rather method names, in JavaScript are case-sensitive. This means that document.getElementById is not the same as document.getElementbyId.
The weird part:
document.getElementsById does not exist in JavaScript, so it can't work by default. The only way it can work is if somebody defined this function/method on the other page. A more likely explanation is that you made a typo on your second page: maybe you forgot the s and thought you hadn't. Can you try again?
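One thing worth checking in the question's Edit 2: an input's value is always a string, so neither check on page B compares booleans. Any non-empty string, including "false", is truthy, while `== true` converts both sides to numbers, so `"true" == true` is false. On page A, the bean:write tag presumably prints a bare true or false into the script, which is a real boolean, so the same checks behave differently there. The sketch below runs both checks on sample strings and adds an explicit comparison; the sample values and the `isTrueFlag` helper are assumptions, not from the original pages.

```javascript
// Compare how a string flag behaves under the two checks from Edit 2.
// `value` stands in for document.getElementsByName("estAvecGarantie")[0].value.
function describeFlag(value) {
  return {
    truthy: Boolean(value),   // if (estAvecGarantie)
    looseTrue: value == true, // if (estAvecGarantie == true)
  };
}

// Hypothetical helper: parse the flag explicitly instead of relying on coercion,
// assuming the hidden field holds "true" or "false".
function isTrueFlag(value) {
  return value.trim().toLowerCase() === 'true';
}
```

With these rules, `if (estAvecGarantie == true)` takes the else branch for both "true" and "false", which may explain why the change in Edit 2 seemed to fix things.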

javascript: not having to use all the IDs in a html file?

I want to use the same .js file for a bunch of HTML pages, but not every page uses all the IDs from this .js file. Right now, if one ID is missing from a page, none of the IDs show anything.
var yes = 'Yes';
var no = 'No';
var available = 'Available: ';
document.getElementById("001").innerHTML=available+yes;
document.getElementById("002").innerHTML=available+no;
document.getElementById("003").innerHTML=available+yes;
An HTML file with this works fine:
<div id="001"></div>
<div id="002"></div>
<div id="003"></div>
An HTML file with this, not so fine:
<div id="002"></div>
<div id="003"></div>
What can I do to make it run even though some IDs aren't used?
Complete noob here; there's probably a super simple solution(?), but I can't find it. Hopefully someone here can help me, without totally bashing me or telling me what bad practice this is and that I should rewrite it in some mega haxxor language I haven't even heard of :D
Thanks in advance!

While I'd question why you need incrementing numeric IDs like that, one solution would be to keep a map of IDs to values, then iterate over the map, checking each element for null.
var m = {
"001":yes,
"002":no,
"003":yes
};
for (var p in m) {
var el = document.getElementById(p);
if (el) // or if (el !== null)
el.innerHTML = available + m[p];
}

The document.getElementById() function returns null if no matching element is found, and setting innerHTML on null throws a TypeError that stops the current script. So you just have to test for null:
var el = document.getElementById("001");
if (el != null) el.innerHTML = available + yes;
The null test can be simplified to:
if (el) el.innerHTML = available + yes;
If element "001" is always going to be "yes", "002" is always going to be "no" and so forth then you can do this:
var results = {"001" : yes, "002" : no, "003" : yes};
var el, k;
for (k in results) {
el = document.getElementById(k);
if (el) el.innerHTML = available + results[k];
}

Wrap each assignment in an if (document.getElementById("id") !== null) check?

Wrap the whole thing in a try statement with a catch, and put the code that comes afterwards after it:
try {
  document.getElementById("001").innerHTML = available + yes;
  document.getElementById("002").innerHTML = available + no;
  document.getElementById("003").innerHTML = available + yes;
  // etc
} catch (e) {
  // an ID was missing; ignore the error
}
// any other code that comes after the id stuff
The catch swallows the error, so the rest of the script keeps running. Note that the lines inside the try after the failing one are skipped, so on a page without "001", "002" and "003" won't be filled either.

How might I determine the XPath of a DOM element?

In JavaScript, supposing I have a reference to an element, how do I retrieve an XPath expression that would select it?
Is there something like objElement.xpath?

Since Annibigi doesn't want to post the solution, I'll do it: See this snippet.

This is not XPATH related, but just to show you how you can get the parent/child relationship with a damn simple while loop.
var pathAt = function(node) {
var stack = [];
while(node.parentNode !== null) {
stack.unshift(node.tagName);
node = node.parentNode;
}
return stack.join('/');
}
// Usage : pathAt(document.getElementById('moo'));
// Outputs : "HTML/BODY/CENTER/TABLE/TBODY/TR/TD/TABLE/TBODY/TR/TD/TABLE/TBODY/TR/TD/TABLE/TBODY/TR/TD"
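The path above has no positional predicates, so as an XPath it usually matches many elements. A sketch that adds each element's index among same-named siblings, so the result can be fed back into document.evaluate(). It assumes HTML elements (names are lower-cased, the canonical form for HTML tags), ignores namespaces and ids, and relies only on nodeType, tagName, parentNode and previousElementSibling; `xpathFor` is a hypothetical name.

```javascript
// Hypothetical helper: build an absolute XPath such as /html[1]/body[1]/div[2]
// for `el`, counting preceding siblings with the same tag to get each index.
function xpathFor(el) {
  const steps = [];
  // Element nodes have nodeType 1; stop at the document (or a detached root).
  for (let node = el; node && node.nodeType === 1; node = node.parentNode) {
    let index = 1;
    for (let sib = node.previousElementSibling; sib; sib = sib.previousElementSibling) {
      if (sib.tagName === node.tagName) index++;
    }
    steps.unshift(node.tagName.toLowerCase() + '[' + index + ']');
  }
  return '/' + steps.join('/');
}

// Browser usage:
// const path = xpathFor(document.getElementById('moo'));
// document.evaluate(path, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null)
//   .singleNodeValue; // should be the same element
```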
