How to query node-html-parser path with classes - javascript

I am stuck with node-html-parser (https://www.npmjs.com/package/node-html-parser). I read the HTML into a local variable and I am trying to get to the following node (JS path copied from Chrome):
#container > section > div > div.profile__main > div.item.item__profile > div.item__profile__info.cf > div.item__profile__info__data > p
Unfortunately, I get stuck at div.profile__main.
(profile__main is a class on a div; the tag looks like <div class="profile__main" ...></div>.)
How do I query for this? So far I've only got this far:
var root = this.HTMLParser.parse(this.data)
root.querySelectorAll("#container")
.querySelectorAll("section")
.querySelectorAll("div")
.querySelector("div.profile__main") // Can't get this one; returns null
Thanks

const root = this.HTMLParser.parse(this.data)
const itemProfileInfoData = root.getElementsByTagName("div").find(div => div.attributes.class === "item__profile__info__data")
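// tagName may be upper case ("P") in recent node-html-parser versions, so compare case-insensitively
itemProfileInfoData.childNodes.filter(child => child.tagName && child.tagName.toLowerCase() === "p")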
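Comparing `div.attributes.class` with `===` only works when the class attribute holds exactly one class. Several elements on the copied path, such as `div.item.item__profile`, carry more than one, so a token-based check is safer. A minimal sketch working on the raw attribute string; `hasClasses` and `parseCompound` are hypothetical helpers, not node-html-parser API (recent versions also expose `classList.contains()`, if yours provides it):

```javascript
// Hypothetical helper: true if the space-separated class attribute contains
// every requested class, in any order and with any amount of whitespace.
function hasClasses(classAttr, ...wanted) {
  const tokens = (classAttr || "").split(/\s+/).filter(Boolean);
  return wanted.every((name) => tokens.includes(name));
}

// Split a compound selector part such as "div.item.item__profile"
// into its tag name and its list of classes.
function parseCompound(selector) {
  const [tag, ...classes] = selector.split(".");
  return { tag, classes };
}

const { tag, classes } = parseCompound("div.item.item__profile");
console.log(tag); // div
console.log(hasClasses("item item__profile", ...classes)); // true
console.log(hasClasses("item", ...classes)); // false
// An exact string comparison misses elements that have extra classes:
console.log("item item__profile" === "item__profile"); // false
```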

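Did you try something like this?
var root = this.HTMLParser.parse(this.data)
// querySelector returns a single element (or null); querySelectorAll returns an array,
// so only the last step should be querySelectorAll
root.querySelector(".item__profile__info__data")
  .querySelectorAll("p")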
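Chaining `querySelectorAll` calls as in the question breaks because, in current versions of node-html-parser, `querySelectorAll` returns a plain array of elements, and an array has no `querySelectorAll` method of its own. To walk the path one step at a time, you query the children of every current match and flatten the results. The sketch below shows that pattern on a hypothetical plain-object tree (`{ tag, classes, children }`) instead of real node-html-parser elements; `walkPath` and `childrenMatching` are illustrative names, not library API, and the starting node stands in for whatever `#container` already selected.

```javascript
// Hypothetical node shape used only for this sketch: { tag, classes, children }.
function matches(node, tag, requiredClasses) {
  return node.tag === tag && requiredClasses.every((c) => node.classes.includes(c));
}

// One child-combinator step ("parent > tag.class"): look at the direct children
// of every current match, keep the ones that match, and flatten into one array.
function childrenMatching(nodes, tag, requiredClasses = []) {
  return nodes.flatMap((n) => n.children.filter((c) => matches(c, tag, requiredClasses)));
}

// Apply steps such as [["section"], ["div"], ["div", ["profile__main"]]] in order.
function walkPath(roots, steps) {
  return steps.reduce((current, [tag, classes]) => childrenMatching(current, tag, classes), roots);
}

const el = (tag, classes = [], children = []) => ({ tag, classes, children });

// Stands in for the element matched by "#container".
const container = el("div", [], [
  el("section", [], [
    el("div", [], [
      el("div", ["profile__main"], [el("p")]),
      el("div", ["sidebar"]),
    ]),
  ]),
]);

const result = walkPath([container], [["section"], ["div"], ["div", ["profile__main"]]]);
console.log(result.length); // 1
```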

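Maybe the last element, the <p> tag, is loaded asynchronously by a script; in that case it won't be in the HTML you parse, because the parser only sees the static markup.
Please check the "view source" of the site that you are parsing.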

Related

jQuery/Cheerio: How to recursively get certain elements by name/tag?

I'm trying to make a bot that scrapes links from a website. I'm running into a weird error that I can't figure out. My guess is that the second if statement isn't doing its check, and my unfamiliarity with jQuery isn't helping.
Error:
element.each is not a function
const $ = load(html);
const html = $("#id");
const temp = [];
function recursive(element) {
if (element.name === "a") {
temp.push(element);
}
if (!element || element.children().length > 0 === false) {
return "DID NOT FIND IT OR NO CHILDREN FOUND";
}
return element.each((_, item) => recursive(item));
}
recursive(html);
return temp;
I've created a simple snippet demonstrating what you seem to be trying to accomplish with jQuery.
First, your check for the element's tag doesn't seem to work. I had to use .prop('tagName') to get the element's tag, and it is returned in all capital letters.
Your second if statement should work fine, but the .each() call didn't work as expected. You want to iterate through all the children and call the recursive function on each one, and the way you passed the child element didn't work.
The .each() method takes a callback that receives two parameters, which you used correctly. However, the item is a plain HTML node, so you have to wrap it with the jQuery function, as in $(item). That gives you a jQuery object you can work with.
const html = $("#test");
const temp = [];
function recursive(element) {
  // .prop("tagName") returns the tag name in upper case
  if (element.prop("tagName") === "A") {
    temp.push(element);
  }
  // Stop when the element has no child elements
  if (element.children().length === 0) {
    return "DID NOT FIND IT OR NO CHILDREN FOUND";
  }
  // Each child is a plain DOM node, so wrap it with $() before recursing
  return element.children().each((i, item) => recursive($(item)));
}
recursive(html);
console.log(temp)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="test">
<div class="headings">
<h1>Heading</h1>
Main Page
</div>
<div class="test-cls">
<button>Hello</button>
Test Page
</div>
</div>
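The question's code checks `element.name`, which is the tag-name property of the raw nodes Cheerio builds on (htmlparser2/domhandler), not of a wrapped `$()` selection. If you prefer to recurse over raw nodes instead of wrapping every child in `$()`, the pattern looks like the sketch below. The objects are hand-built to mimic that node shape (`type`, `name`, `children`) so the example runs without Cheerio; `collectByName` is a hypothetical helper, not Cheerio API.

```javascript
// Hand-built nodes mimicking domhandler's shape: elements have type "tag",
// a lowercase `name`, and a `children` array; text nodes have no children.
const tag = (name, children = []) => ({ type: "tag", name, children });
const text = (data) => ({ type: "text", data });

// Depth-first walk: check the current node, then recurse into its children.
// Matches are accumulated in `found` instead of a module-level array.
function collectByName(node, name, found = []) {
  if (!node) return found;
  if (node.type === "tag" && node.name === name) {
    found.push(node);
  }
  for (const child of node.children || []) {
    collectByName(child, name, found);
  }
  return found;
}

const root = tag("div", [
  tag("div", [tag("h1", [text("Heading")]), tag("a", [text("Main Page")])]),
  tag("div", [tag("button", [text("Hello")]), tag("p", [tag("a", [text("Nested")])])]),
]);

const links = collectByName(root, "a");
console.log(links.length); // 2
```

With Cheerio itself, `$("#id").find("a")` already returns every descendant anchor, so the manual recursion is mainly useful when you need custom logic at each level.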

How can I select a nested child element of a parent element that has a dynamic ID?

I'm currently toying around with Zendesk, and trying to make changes to a text element on the page. Sadly, Zendesk's elements are dynamic enough that the name of the main element will change on each page load.
Thankfully the structure of the element trees stay pretty static, but it's only the parent element's name that's changing:
#ember6030 > div.comment > div > p
#ember3483 > div.comment > div > p
Currently, here's where I'm at so far:
var item = document.querySelectorAll("[name^=ember] > div.comment > div > p");
var itemtext = item.innerHTML;
console.log(itemtext);
I'm sure I'm missing something, but wouldn't my selector var be correct?
Something like: find an element whose name begins with "ember", then follow the rest of the parent-child path as usual.
EDIT: Things are being a bit more stubborn than I thought, but I've got some extra details if this helps: For the div.comment > div > p elements, a handful of those load up. For right now, I'd like to try targeting just one, but if I can get the text contents of all these elements in console messages, that'd be awesome.
For those CSS paths you would use:
var item = document.querySelector("[id^=ember] > div.comment > div > p");
var itemtext = item.textContent;
console.log(itemtext);
since # is the CSS selector for the id attribute, not the name.
See also David Klinge's answer about NodeLists.
So the final code is:
var items = document.querySelectorAll("[id^=ember] > div.comment > div > p");
items.forEach(item => {
  var itemtext = item.textContent;
  console.log(itemtext);
});
Finally, for what you seem to be trying to do, you probably want textContent instead of innerHTML: textContent returns only the text, while innerHTML returns the markup as well, so you avoid false hits on tags, attributes, comments, etc.
document.querySelectorAll returns a NodeList that must be iterated over. You should be able to use a forEach to iterate over the elements querySelectorAll selects.
var items = document.querySelectorAll("[id^=ember] > div.comment > div > p");
items.forEach(item => {
  var itemtext = item.textContent;
  console.log(itemtext);
});
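To make concrete what `[id^=ember] > div.comment > div > p` requires, here is a sketch that checks the same conditions the way browsers evaluate selectors, from right to left: start at a candidate `<p>` and test each parent in turn. The node objects and their `parent` links are hypothetical stand-ins for DOM elements, built by hand so the logic runs outside a browser; `matchesEmberComment` is an illustrative name.

```javascript
// Hypothetical stand-ins for DOM elements: { tag, id, classes, parent }.
function makeNode(tag, { id = "", classes = [] } = {}, parent = null) {
  return { tag, id, classes, parent };
}

// Checks "[id^=ember] > div.comment > div > p" from right to left,
// following exactly one `parent` link per child combinator (">").
function matchesEmberComment(node) {
  if (!node || node.tag !== "p") return false;
  const inner = node.parent;
  if (!inner || inner.tag !== "div") return false;
  const comment = inner.parent;
  if (!comment || comment.tag !== "div" || !comment.classes.includes("comment")) return false;
  const outer = comment.parent;
  // [id^=ember]: any tag, as long as the id starts with "ember"
  return Boolean(outer) && outer.id.startsWith("ember");
}

const ember = makeNode("div", { id: "ember6030" });
const comment = makeNode("div", { classes: ["comment"] }, ember);
const inner = makeNode("div", {}, comment);
const p = makeNode("p", {}, inner);

const other = makeNode("div", { id: "main" });
const strayP = makeNode("p", {}, makeNode("div", {}, makeNode("div", { classes: ["comment"] }, other)));

console.log(matchesEmberComment(p)); // true
console.log(matchesEmberComment(strayP)); // false
```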

javascript if class x contains z get link of class y

I'm no JS expert, but I need to execute some JS in my AppleScript. I don't know if this is possible, as the HTML page contains several instances of this div class.
If nested div class ".product_card__title" contains "my search term"
Extract href link from nested class ".js-search-product-link"
From main div with the class ".product_card"
(An alternative version to the one accepted in this thread.)
My HTML:
<div class="product_card powersearch__product_card">
<a href="/shop/XYZ" class="js-search-product-link">
<div class="product_card__image" style="background-image:url(https://image.jpg);"></div>
<div class="product_card__title">SEARCH FOR THIS TITLE</div>
<div class="product_card__meta">€14</div></a></div>
What I have so far:
tell application "Safari"
open location "https://teespring.com/search?q=rocker"
delay 5
set theLinks to (do JavaScript "Array.prototype.slice.call(document.querySelectorAll('.product_card')).map(function(d,i){var title = d.querySelector('.product_card__title'),link = d.querySelector('a');if(title && link && /Rocker/gi.test(title.textContent)){return link.href}})")
end tell
return theLinks
Replace yourSearchTerm below with whatever you want to search for:
Array.prototype.slice.call(document.querySelectorAll(".product_card"))
  .map(function(d, i) {
    var title = d.querySelector(".product_card__title"),
        link = d.querySelector("a");
    // Case-insensitive match on the card title
    if (title && link && /yourSearchTerm/i.test(title.textContent)) {
      return link.href;
    }
  })
For every div with the class "product_card", it returns an array containing the href of each card whose title matches, and undefined for the others.
FIDDLE:
https://jsfiddle.net/ibowankenobi/gc6r2h3v/1/
As AppleScript returns the last global value, it might help to change the part where you set the theLinks variable:
set theLinks to (do JavaScript "someGlobal = Array.prototype.slice.call(document.querySelectorAll('.product_card')).map(function(d,i){var title = d.querySelector('.product_card__title'),link = d.querySelector('a');if(title && link && /Rocker/gi.test(title.textContent)){return link.href}})")
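Because the search term is pasted into a regular-expression literal, characters such as `.`, `(` or `+` in the term would change its meaning. The sketch below applies the same title test to plain `{ title, href }` records (a hypothetical stand-in for what the DOM code reads from each `.product_card`), escapes the term before building the `RegExp`, and drops the `undefined` entries that the `map` call leaves for non-matching cards. `findLinks` and `escapeRegExp` are illustrative names.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical card records standing in for the title text and link href
// read from each .product_card element.
function findLinks(cards, searchTerm) {
  const pattern = new RegExp(escapeRegExp(searchTerm), "i"); // case-insensitive, no "g" flag
  return cards
    .map((card) => (card.title && card.href && pattern.test(card.title) ? card.href : undefined))
    .filter((href) => href !== undefined);
}

const cards = [
  { title: "Rocker Tee", href: "/shop/rocker-tee" },
  { title: "Classic Hoodie", href: "/shop/hoodie" },
  { title: "ROCKER (Vintage)", href: "/shop/rocker-vintage" },
];

console.log(findLinks(cards, "rocker").join(", ")); // /shop/rocker-tee, /shop/rocker-vintage
console.log(findLinks(cards, "(Vintage)").length); // 1
```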

Select element by tag/classname length

I'd like to select an element using javascript/jquery in Tampermonkey.
The class name and the tag of the elements are changing each time the page loads.
So I'd have to use some form of regex, but I can't figure out how to do it.
This is what the HTML looks like:
<ivodo class="ivodo" ... </ivodo>
<ivodo class="ivodo" ... </ivodo>
<ivodo class="ivodo" ... </ivodo>
The tag is always the same as the class name.
It's always a random 4- or 5-letter "code".
I'm guessing it would be something like this:
$('[/^[a-z]{4,5}/}')
Could anyone please help me to get the right regexp?
You can't use regexp in selectors. You can pick some container and select its all elements and then filter them based on their class names. This probably won't be super fast, though.
I made a demo for you:
https://codepen.io/anon/pen/RZXdrL?editors=1010
html:
<div class="container">
<abc class="abc">abc</abc>
<abdef class="abdef">abdef</abdef>
<hdusf class="hdusf">hdusf</hdusf>
<ueff class="ueff">ueff</ueff>
<asdas class="asdas">asdas</asdas>
<asfg class="asfg">asfg</asfg>
<aasdasdbc class="aasdasdbc">aasdasdbc</aasdasdbc>
</div>
js (with jQuery):
const $elements = $('.container *').filter((index, element) => {
  // Keep elements whose class name is 4 or 5 characters long
  const len = element.className.length;
  return len === 4 || len === 5;
});
$elements.css('color', 'red');
The simplest way to do this would be to select those dynamic elements based on a fixed parent, for example:
$('#parent > *').each(function() {
// your logic here...
})
If the rules by which these tags are constructed are reliably as you state in the question, then you could select all elements and then filter out those which are not of interest, for example:
var $elements = $('*').filter(function() {
  // A 4- or 5-character class name that matches the tag name
  var len = this.className.length;
  return (len === 4 || len === 5) && this.className.toUpperCase() === this.tagName.toUpperCase();
});
Of course, you may want to select only the elements in some container(s) to begin with. If so, replace '*' with a more specific selector:
var $elements = $('someSelector *').filter(function() {
  // A 4- or 5-character class name that matches the tag name
  var len = this.className.length;
  return (len === 4 || len === 5) && this.className.toUpperCase() === this.tagName.toUpperCase();
});
You can also do this in vanilla JS. Check the output in the dev tools console.
<body>
<things class="things">things</things>
<div class="stuff">this is not the DOM element you're looking for</div>
</body>
JS
// Grab the body's child elements
var bodyChildren = document.body.children;
// Convert the children to an array and keep only elements whose tag name equals their class
var targets = [].filter.call(bodyChildren, function(el) {
  return el.tagName.toLowerCase() === el.classList.value.toLowerCase();
});
targets.forEach(function(el) {
// Do stuff
console.log(el)
})
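The regular expression from the question can't go inside a selector, but it can drive the filter step. A minimal sketch of that predicate, written against plain tag-name and class-name strings so it runs on its own (`isRandomCodeElement` is a hypothetical name); in the browser you would call it with `el.tagName` and `el.className` inside any of the filters above.

```javascript
// Matches a 4- or 5-letter lowercase "code" and nothing else (anchored at both ends).
const CODE_PATTERN = /^[a-z]{4,5}$/;

// True when the class name is such a code and equals the tag name.
// For HTML elements the DOM reports tagName in upper case, so compare in lower case.
function isRandomCodeElement(tagName, className) {
  const cls = className.trim().toLowerCase();
  return CODE_PATTERN.test(cls) && tagName.toLowerCase() === cls;
}

console.log(isRandomCodeElement("IVODO", "ivodo")); // true
console.log(isRandomCodeElement("UEFF", "ueff")); // true
console.log(isRandomCodeElement("DIV", "stuff")); // false
console.log(isRandomCodeElement("AASDASDBC", "aasdasdbc")); // false
```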

replace href attribute in anchor element

I need code to replace the href inside my <div>, or in the full HTML document (either works).
My content looks like this:
<div class="entry-content post_content">
<center><img src="http://mysite.com/img/redirect.png" class="attachment-medium" alt="redirect-150x150-700x30" width="700" height="30"></center>
<!---Content-->
</div>
I need to change http://mysite.com/go/ to http://mynewsite.com/go/
Any help? What should the code be? I've searched Stack Overflow, but no thread solves my problem.
I am assuming you have some frontend (CMS?) access and no direct file access.
You can put the following in the head section of every document:
function fixLinks() {
  var links = document.getElementsByTagName('a'); // Get all links
  for (var i = 0; i < links.length; i++) { // Loop through the links
    // Replace the old address prefix with the new one; other links are left unchanged
    links[i].href = links[i].href.replace('http://mysite.com/go/', 'http://mynewsite.com/go/');
  }
}
window.onload = fixLinks;
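A prefix check makes the replacement stricter: only URLs that start with the old address are rewritten, and everything else, including URLs that merely mention the old domain somewhere later, is left untouched. The sketch below shows that string logic on its own (`rewritePrefix` is a hypothetical helper); inside `fixLinks` you would apply it to each `links[i].href`.

```javascript
// Rewrite a URL only if it begins with oldPrefix; other URLs are returned unchanged.
function rewritePrefix(url, oldPrefix, newPrefix) {
  return url.startsWith(oldPrefix) ? newPrefix + url.slice(oldPrefix.length) : url;
}

const OLD = "http://mysite.com/go/";
const NEW = "http://mynewsite.com/go/";

console.log(rewritePrefix("http://mysite.com/go/page-1", OLD, NEW)); // http://mynewsite.com/go/page-1
console.log(rewritePrefix("http://example.com/?ref=mysite", OLD, NEW)); // http://example.com/?ref=mysite
```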
