Testing a selector for webscraping tool - javascript

I'm using the Web Scraper tool from webscraper.io.
I'm trying to scrape all of the issues from Marvel and add child selectors to scrape the issues, characters, illustrators, etc.
http://www.comicbookdb.com/publisher.php?ID=4#A
When I click on one of the titles, it instead clicks on the entire left side of the page (titles from A-M).
How do I modify the CSS selector in order to only choose specific titles? I tried changing the start URL but it didn't work.
Here is what they have:
table:nth-of-type(2)
td:nth-of-type(1)
a:nth-of-type(n+2)

You would iterate over each a element inside the parent table and look at the first letter of its text. What you do with that letter depends on your goal; I'm not sure whether you want to select only A-M or whether that is what the tool is already doing. If you only want titles starting with A, simply compare the first character to 'a' and select the element only if it matches.
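The filtering step described above can be sketched as follows. This is only a minimal illustration, assuming the links have already been collected into an array; the helper name filterByInitial and the sample titles are made up for the example and are not part of Web Scraper.

```javascript
// Hypothetical helper: keep only the links whose visible text starts with
// the given letter (case-insensitive). Each item only needs a `textContent`
// property, so this works on real <a> elements as well as plain objects.
function filterByInitial(links, letter) {
  const wanted = letter.toLowerCase();
  return links.filter((link) => {
    const text = (link.textContent || "").trim();
    return text.charAt(0).toLowerCase() === wanted;
  });
}

// Made-up data standing in for the <a> elements inside the table.
const sampleLinks = [
  { textContent: "Avengers" },
  { textContent: " alpha Flight" },
  { textContent: "Daredevil" },
];
const startingWithA = filterByInitial(sampleLinks, "a").map((l) => l.textContent.trim());
```

In a browser console, the array could be built with Array.from(document.querySelectorAll(...)) using whatever selector matches the title links on the page.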

Related

How to find a unique string within html and wrap it with a tag, but exclude links and urls

I'm looking for a way to find a specific string in the visible text of a page and wrap it in <em> tags. I have tried using HTML Agility Pack and had some success with Regex.Replace, but if the string appears within a URL it also gets replaced, which I do not want. If it's within an image name it gets replaced too, and this obviously breaks the link or image URL.
An example attempt:
var markup = Encoding.UTF8.GetString(buffer);
var replaced = Regex.Replace(markup, "product-xs", " <em>product</em>-xs", RegexOptions.IgnoreCase);
var output = Encoding.UTF8.GetBytes(replaced);
_stream.Write(output, 0, output.Length);
This does not work, as it would replace <a href="product/product-xs"> with <a href="product/<em>product</em>-xs">, which I don't want.
The string is coming from a text string value within a CMS so the user can't wrap the words there and ideally, I want to catch all instances of the word that are already published.
Ideally I would want to exclude <title> tags, <img> tags and <a> tags, everything else should get the wrapped tag.
Before I used the HTML Agility Pack, a fellow front end dev tried it with JavaScript but that had an unexpected impact on dropdown menus.
If you need any more info, just ask.
You can use HTML Agility Pack to select only the text nodes (i.e. the text that exists between any two tags) with a bit of XPath and modify them like this.
Looking only in body will exclude <title>, <meta> etc. The not() excludes script tags; you can exclude other tags in the same way (or check the parent node in the loop).
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()"))
{
var newNode = htmlDoc.CreateTextNode(node.InnerText.Replace("product-xs", "<em>product</em>-xs"));
node.ParentNode.ReplaceChild(newNode, node);
}
I've used a simple replace; a regex will work fine too. It's probably best to check the performance of each approach and choose whichever works best for your use case.

JavaScript DIV Editing Destroys Functionality of Other Elements

My website is built using a company's software called Inksoft, which leaves me very little room for customization, so I have to do many workarounds.
Here is my site's homepage.
The header on top of the page only has two links right now. "Products" and "Design Studio". My goal is to add an "About Us" link and "Buyers Guide" to the header as well.
I cannot add new content to the header using Inksoft's backend, so I coded a workaround that replaces the content of existing DIVs within the header with the text and links I want.
The only issue is that the responsive mobile nav loses functionality when this is implemented, as seen here on this test page.
The test page has the About Us in the top header, added by the use of this code:
<script>
$("#header-nav-designs").html('<document.write="<li id="header-nav-studio"><font color="#000000">About Us</font></li>');
</script>
So, the simplified question is: how do I implement this code without losing the responsive functionality of the nav bar?
The jQuery .html function will replace the HTML inside the target element. If you want to just append the one value, you likely want to .append to the element.
In addition, you aren't setting the HTML to a valid html string. You probably just want to get rid of the <document.write=" at the beginning of the string. The rest of it looks fine with just a cursory glance.
So:
<script>
$("#header-nav-designs").append('<li id="header-nav-studio"><font color="#000000">About Us</font></li>');
</script>
Edit:
After looking at it a little more, it appears as though the $('#header-nav-designs') that you are selecting is already an <li> which means you need to either select the parent <ul> list or you can use the jquery .after function instead.
<script>
$("#header-nav-designs").after('<li id="header-nav-studio"><font color="#000000">About Us</font></li>');
</script>
And as someone else commented above, you are getting an error on the page. It appears that you are trying to get an element with the id divID and then append some HTML to it, but there is no element with the id divID. The call to document.getElementById therefore returns null (element not found), and you get an error saying that the property innerHTML of null can't be read.
The element with id header-nav-designs, which your code refers to, has this CSS rule on line 170:
#header-nav-designs {display:none;}
With display:none; the element is hidden and the page is laid out as if the element were not there.
If I understand you correctly, your selector points to the wrong element. It should be $(".header-nav > ul"). document.write is not needed inside jQuery; you only need to pass a valid HTML string as the argument.
The jQuery html function erases the HTML that is already inside the element and replaces it with the HTML string given as the argument. Use append if you want to add more HTML without removing what is already in the element.
$(".header-nav > ul").append('<li><font color="#000000">About Us</font></li>');
$(".header-nav > ul").append('<li><font color="#000000">Buyers Guide</font></li>');

Angular 2 sentence highlighting on click event

So I use PDF.js to render pdf to html. On top there is a text layer.
What I want to implement is that when you click on a sentence, a class is added to that sentence. I want to do this in an Angular 4 component.
I have stumbled upon a problem here because the pdf is rendered to html by lines(every line is in a different div).
Example of pdf in html:
<div style="left: 86.0208px; top: 481.589px; font-size: 8.03709px; font-family: serif; transform: scaleX(1.00581);">
timestamp server to generate computational proof of the chronological order of transactions. The
</div>
<div style="left: 86.0208px; top: 490.899px; font-size: 8.03709px; font-family: serif; transform: scaleX(0.9335);">
system is secure as long as honest nodes collectively control more CPU power than any
</div>
Any idea how should I implement this functionality?
The main goal is to highlight the exact sentence that is clicked, and to do it by manipulating the HTML.
Here's an approach that depends heavily on what your HTML looks like and on which part you want to implement the sentence highlighting for. If it's just one text block with multiple lines at the top of the page and nothing more, I'd say that you can replace the whole block with an updated HTML block, maybe even a single <p>.
combining all to one big string
Find this part of the HTML created by PDF.js, iterate over all the child divs, and concatenate their text into one big string. One problem might be accessing the child divs. If the HTML is rendered by an Angular application, you can reference DOM elements by giving them template reference names like #textBlock. You can then access those elements with @ViewChild, which gives you an ElementRef whose nativeElement exposes DOM properties such as childNodes (and data on text nodes) for walking down an element's subtree. This may be helpful for extracting the text and concatenating the string.
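The concatenation step could look roughly like this. It is a sketch that assumes the text of each line div has already been read into an array of strings; joinLines is a hypothetical helper name, and the sample lines are taken from the question's HTML.

```javascript
// Hypothetical helper: merge the text of the per-line <div>s into one string.
// `lineTexts` is assumed to be an array of strings, one per rendered line.
function joinLines(lineTexts) {
  return lineTexts
    .map((text) => text.replace(/\s+/g, " ").trim()) // collapse line breaks and runs of spaces
    .filter((text) => text.length > 0)               // drop empty lines
    .join(" ");
}

const lines = [
  "timestamp server to generate computational proof of the chronological\norder of transactions. The",
  "system is secure as long\nas honest nodes collectively control more CPU\npower than any",
];
const bigTextBlock = joinLines(lines);
```

Joining with a single space keeps words from adjacent lines apart, but it does not rejoin words that the PDF hyphenated at a line end; that would need extra handling.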
split the text into sentences
The next step is to split this big string into sentences. Since sentences end with fixed punctuation marks like . ! ?, we can use a regular expression to split at the right spots. The string function match in combination with a regular expression should do the job here. As a result we want an array of sentences. The regex may look something like this, although I'm not 100% sure it works, because I just found it in this answer:
var bigTextBlock="Big text block. No more divs. Only a string";
var sentences = bigTextBlock.match( /[^\.!\?]+[\.!\?]+/g );
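One thing to watch with the pattern above: it requires a closing punctuation mark, so a trailing fragment such as "Only a string" is not returned. A possible variant, offered only as a sketch (splitSentences is a hypothetical helper name), makes the punctuation optional and trims each result:

```javascript
// Hypothetical helper: split text into sentences on . ! ? while keeping a
// trailing fragment that has no closing punctuation.
function splitSentences(text) {
  // One or more non-terminators, followed by zero or more terminators.
  const matches = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

const allSentences = splitSentences("Big text block. No more divs. Only a string");
```

Like the original regex, this is naive about abbreviations such as "e.g." and decimal numbers, which it will split in the middle. Because each sentence is trimmed, a space has to be re-added between sentences when they are rendered next to each other.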
remove the current divs
Now that's not too bad for a start. Next we want to remove the current divs and create new HTML elements. There are multiple ways to do this. In both cases we need a reference to the parent div of the text block divs from before, which we probably already have.
The first option is to set something like [innerHTML]. This removes the old divs and creates new ones, but it gets tricky when you want to implement an onclick action, because this way we bypass Angular.
The other way is to manipulate the children through your reference element. For this we can use Renderer2, which is injected as a service. You can do different things with it, like creating new elements, removing children and also attaching click listeners to nodes, which is what we probably need to do anyway. For now we only want to remove the old childNodes.
create adjusted html
As we now have every sentence isolated, we can create one big <p> element that contains a <span> for every sentence. This way we can give a span an extra CSS class when the user clicks inside that part of the text, and so get a highlight per sentence. As stated before, the HTML could be placed through [innerHTML] or by creating the elements as children of our reference. In both cases we need to use Renderer2 to make the <span> listen for a click. Here's some code that combines the span creation and adding the listener, both through Renderer2.
// Inside the component class. ElementRef, Renderer2 and ViewChild are imported from '@angular/core'.
@ViewChild('textBlock') textBlock: ElementRef;
constructor(private renderer: Renderer2, private router: Router) { }
createSpans(sentences: string[]) {
  sentences.forEach(sentence => {
    // create elements
    const span = this.renderer.createElement('span');
    const spanText = this.renderer.createText(sentence);
    // append the sentence text to the span
    this.renderer.appendChild(span, spanText);
    // append the span to the parent
    this.renderer.appendChild(this.textBlock.nativeElement, span);
    // listen for clicks on the span
    this.renderer.listen(span, 'click', () => {
      // add a highlight class via Renderer2
      this.renderer.addClass(span, 'highlighted');
    });
  });
}
I know this is a lot to do and it gets tricky in some parts, but this is probably how I would handle it. Again, it depends heavily on what your HTML currently looks like and what you want it to look like after the changes.

instant search jquery plugin

First of all this is how the script works: jsfiddle
The 'searchList' setting specifies the content that will be searched. In this case it is searchable tr. Finally, the 'searchItem' setting allows to dive in and specify an individual element to search. In this case, I use 'td'.
In this fiddle
I have a list of thumbnail images with some information. What I want is to be able to search for "something" and then show the image and the text related to that specific thumbnail.
Hope you understand what I'm trying to achieve.
Out on a limb, but your selector for searchList looks wrong:
'searchList' : '.imgscontainer portfolio-v6-items ',
Surely that second class should have a dot before it, and they're on the same element, so no space:
class="imgscontainer portfolio-v6-items"
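To make the difference concrete: a space between two class selectors means "descendant of", while writing them back to back means "same element". The sketch below derives the compound selector from a class attribute value; classAttributeToSelector is a hypothetical helper written for this illustration and is not part of the plugin.

```javascript
// Hypothetical helper: turn a class attribute value into a compound CSS
// selector that matches a single element carrying all of those classes.
function classAttributeToSelector(classAttr) {
  return classAttr
    .split(/\s+/)                      // class names are whitespace-separated
    .filter((name) => name.length > 0) // ignore leading/trailing whitespace
    .map((name) => "." + name)         // each class gets its own dot
    .join("");                         // no space: all classes on one element
}

const searchListSelector = classAttributeToSelector("imgscontainer portfolio-v6-items");
```

The resulting string is what the answer suggests passing as the 'searchList' setting instead of the original value.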

Jquery toggle mobile menu (remove href javascript)

I'm trying to make a jQuery toggle menu for a mobile website for one of my clients. I should tell you that I'm not experienced in JavaScript and have just started looking at it.
The current website is a WordPress website, so the menu structure is generated by WP.
Because it is generated by WP, I need to use JavaScript to add the +, - and > signs for toggling and, if an item has no children, for going directly to the page.
I use this JavaScript to add the spans with the desired icon. I've managed that so far.
http://jsfiddle.net/9Dvrr/9/
But there are still 2 problems I can't seem to figure out.
1. Remove the href from the "a" when the "li" has a "ul" child. This should remove the links from those items so that they only toggle (not link), letting you navigate straight through to the deepest level.
2. Currently the JavaScript is adding multiple spans with the icons. I can't seem to figure out why.
I've been struggling with this for a while now and was wondering if someone could help me with it.
In the jsfiddle you provided, you loop over the elements to add spans with a "+" or "-" sign inside, depending on the case. The thing is, the HTML you're starting with already has those spans in it, which is why you're seeing duplicates.
As you said you can't add those spans in the HTML because of your WP structure, I guess they come from a bad copy/paste you did while creating the jsfiddle. I removed them from the HTML and added a return false to prevent linking to another page when there is a ul inside the a tag.
http://jsfiddle.net/wzzGG/
Your first problem can be solved with the following:
$.each($('#menu-mobiel li'), function(i, value) {
    var $this = $(this);
    // only neutralise the link when this li contains a submenu
    if ($this.has('ul').length > 0) {
        $this.children('a').attr('href', 'javascript:');
    }
});
Your second problem is a bit harder for me to understand. Do you only want one + for items with submenus, and one > for items with a link?
