Why doesn't Diffbot see the price here? - javascript

I'm using Diffbot to scrape products. It gets things right on most sites, and when it doesn't, the Custom API usually lets me tweak the extraction until it is correct. However, a few cases are baffling me.
I know Diffbot doesn't execute JavaScript in the Custom API preview window, but for the Product endpoint it should always execute it when a request is made to the API (e.g. from the Diffbot client in a Python shell).
Foot Asylum
For products on this website, e.g. https://www.footasylum.com/hugo-boss-three-pack-tshirt-103678/, the offerPrice field is empty. I can see the price is in div#priceFrm, so I edited the field and added a custom selector pointing at it. However, even when making a new API call from the Python shell, the response is 'offerPrice': ''.
This price is obviously being added by JavaScript, but why can't Diffbot deal with that? What can I do about it?
I can also see that the price I want is in some JSON data inside a <script>. Normally I could just scrape it from there with //script[contains(text(), "dataLayer")]/text() followed by a regex. However, in another Diffbot custom field I defined the selector script:contains(dataLayer), and even this is blank.
Any ideas on getting the price for this product with Diffbot?
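For the "followed by a regex" step, here is a minimal sketch of what it could look like in JavaScript once you already have the script's text (this runs in your own code, not inside Diffbot). The helper name is hypothetical, and so is the assumption that the dataLayer fragment contains a "price" key; check the real page source for the actual key.

```javascript
// Hypothetical helper: pull a price out of the text of a dataLayer <script>.
// ASSUMPTION: the script text contains a JSON fragment like "price": "35.00";
// the key name "price" is a guess, so check the real page source first.
function extractDataLayerPrice(scriptText) {
  // Match "price": "35.00" or "price": 35.00 (quotes around the value optional)
  const match = /"price"\s*:\s*"?(\d+(?:\.\d+)?)"?/.exec(scriptText);
  return match ? Number(match[1]) : null;
}

// Made-up sample text, standing in for the output of the XPath above
const sample =
  'dataLayer.push({"product": {"price": "35.00", "currency": "GBP"}});';
console.log(extractDataLayerPrice(sample)); // 35
```

It returns null rather than throwing when no price is found, so a page with a different dataLayer layout fails quietly instead of crashing the scraper.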
Nike
I'm also trying to get the price from https://www.nike.com/gb/t/flyknit-trainer-shoe-GBXjsV/AH8396-600
The first problem is that the Custom API preview window just returns a 500 error, oddly.
Next, I edited the offerPrice field with a custom selector of div[data-test=product-price]; however, this field doesn't match anything, even when called from the client in a Python shell.
Footlocker
Finally, on this site https://www.footlocker.co.uk/en/p/jordan-1-flight-2-men-shoes-6671?v=314100340604#!searchCategory=all Diffbot cannot seem to get the product image.
The images are loaded by "scene7", and with XPath they can be found with //div[@class="s7thumb"][@data-namespace="s7classic"]/@style and then parsing out the background URL.
I tried to at least get the style attribute with diffbot using the selector div.s7thumb div[data-namespace=s7classic] and then adding the Attribute filter "style", but again nothing at all is returned.
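The "parsing out the background URL" step could be sketched like this in JavaScript, applied to the style value once you have it. It is a hypothetical post-processing helper and does not explain why the Diffbot selector returns nothing; it assumes the style holds a CSS url(...) value, quoted or not.

```javascript
// Hypothetical helper: read the image URL out of an inline style attribute.
// ASSUMPTION: the thumbnail's style uses url(...), optionally quoted.
function backgroundUrlFromStyle(style) {
  // Group 1 captures an optional opening quote; \1 requires the same quote to close it
  const match = /url\(\s*(['"]?)(.*?)\1\s*\)/i.exec(style);
  return match ? match[2] : null;
}

// Made-up style string; example.com stands in for the real scene7 host
const style = 'width: 56px; background-image: url("https://images.example.com/thumb.jpg");';
console.log(backgroundUrlFromStyle(style)); // https://images.example.com/thumb.jpg
```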

In some cases, rendering of certain elements will be blocked, either by Diffbot's renderer or by a target site's anti-bot measures. That's why Diffbot has X-eval functionality, which lets you add custom JavaScript to calls; it gets executed on the target site as if it were run from the browser console. In this case, something like the following helps:
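function() {
  start(); // Diffbot X-eval hook: begin custom processing
  setTimeout(function() {
    // wait 500 ms so the page's own scripts have rendered the price
    var price = document.querySelector('[itemprop="Offers"] [itemprop="price"]');
    var currency = document.querySelector('[itemprop="Offers"] [itemprop="priceCurrency"]').getAttribute("content");
    // clear any hiding styles and append a plain, visible copy of price + currency
    price.parentElement.setAttribute("style", "");
    price.parentElement.innerHTML += '<h1 class="thePrice">' + price.innerText + " " + currency + '</h1>';
    setTimeout(function() {
      end(); // Diffbot X-eval hook: processing finished, extraction can proceed
    }, 500);
  }, 500);
}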
This has been applied as a fix and the price returns now.

Related

How to submit a form and execute javascript simultaneously

As a follow-up to my last question, I have run into another problem. I am building a replica of the Google homepage. The aim is to show search results just like Google does and to store the search history in a database. To show results, I used this JavaScript:
const q = document.getElementById('form_search');
const google = 'https://www.google.com/search?q=';
const site = '';
function google_search(event) {
event.preventDefault();
const url = google + site + '+' + q.value;
const win = window.open(url, '_self');
win.focus();
}
document.getElementById("s-btn").addEventListener("click", google_search)
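Separately from the database issue, the script above concatenates q.value into the URL without encoding it, so a term containing & or # would be cut short. A small sketch of an encoded version (the helper name is hypothetical; in the click handler you would pass q.value):

```javascript
// Hypothetical helper mirroring google_search(): build the search URL with the
// term percent-encoded so spaces, & and # stay part of the query.
function buildGoogleSearchUrl(term, site = "") {
  const google = "https://www.google.com/search?q=";
  // The original joins site and term with '+', which Google reads as a space
  const query = site ? site + " " + term : term;
  return google + encodeURIComponent(query);
}

console.log(buildGoogleSearchUrl("php transactions"));
// https://www.google.com/search?q=php%20transactions
```

Inside google_search this would become window.open(buildGoogleSearchUrl(q.value), '_self').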
To create my form, I used the following HTML:
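<form method="POST" name="form_search" action="form.php">
<input type="text" id="form_search" name="form_search" placeholder="Search Google or type URL">
<button type="submit" id="s-btn">button1</button> <!-- button1: handled by the JavaScript above -->
<button type="submit">button2</button> <!-- button2: plain submit, no JavaScript -->
</form>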
The terms from the search bar are to be sent to a PHP file with the POST method. I have two buttons; let's call them button1 and button2. The JavaScript uses the id of button1, while button2 has no JavaScript and is simply a submit button.
The problem is that when I search using button1, the search results show up but no data is added to my database. When I search using button2, no results show up (obviously, because there is no JS for it), but the search term is added to my database. If I swap the id in the JavaScript, the outcome is also swapped. I need help making sure that searching with button1 both shows the results and saves the data in the database. If you need additional code, I will provide it. Please keep your answers limited to JavaScript, PHP, or HTML solutions; I have no experience with Ajax or jQuery. Any help is appreciated.
Tony, since there is limited code available, I'll go with what you stated in your question.
This is a design-pattern issue more than an event-handling issue.
Quoting Wikipedia: "A software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. Rather, it is a description or template for how to solve a problem that can be used in many different situations. Design patterns are formalized best practices that the programmer can use to solve common problems when designing an application or system."
So here is how things play out at present:
1. The form gets submitted to a specific URL, i.e. the one in its action attribute.
2. The requested PHP page receives the query string and lets you work with it.
Then, from there on, either:
3. you get results from the database and return a response, or
4. you put the search request into the database and return a success response.
Problem statement
If it's 3, the search request is not added to the database; if it's 4, the results for the search request are not returned.
Solution
You need to combine 3 and 4 into one processing block that always runs, regardless of what the search query is.
So our design could use a MySQL transaction, so that the whole group of queries runs as a single operation, for example:
// assuming $db is a PDO connection; 'insert query' and 'search query' are placeholders
$db->beginTransaction(); // tell MySQL to run the following queries as a single operation
$db->query('insert query'); // record the search term
$results = $db->query('search query')->fetchAll(); // fetch the matching rows as an array
$db->commit(); // reaching this line means no errors occurred, so the insert is now stored; if anything failed before this, MySQL won't record the data
if ($results) { echo json_encode($results); } else { echo 'Oops, no results found'; }
A slightly safer version:
try {
    $db->beginTransaction(); // run the following queries as a single operation
    $db->query('insert query'); // placeholder: record the search term
    $results = $db->query('search query')->fetchAll(); // placeholder: fetch matching rows
    $db->commit(); // all went fine, so the inserted row is now stored
    if ($results) { echo json_encode($results); } else { echo 'Oops, no results found'; }
} catch (\Throwable $e) {
    // an exception was thrown, so roll back the transaction (if one was started)
    if ($db->inTransaction()) {
        $db->rollBack();
    }
    echo 'Oops, the server could not process the request';
}
We have effectively combined the two query operations into one: the search term is always recorded in the database, and the database is always searched.

Netsuite getAttribute throws an error

What I am trying to accomplish is to get item information from the internal ID passed to getAttribute, but I am getting the following error:
Error processing dynamic tag getAttribute('item',362,'storedisplayname') : id paramter 2 must be an integer
Here is a sample of the code:
var itemIntId = 362;
var id = "<%=getAttribute('item',"+itemIntId+",'storedisplayname')%>";
console.log("ID: " + id);
Doing the following does not seem to change anything; it still gives me the same error:
var id = "<%=declareAttribute('item',"+itemIntId+",'storedisplayname')%>";
console.log("ID: " + id);
This is in a Presentation tab page, and I found the following info in the NetSuite help section:
getAttribute tag on Presentation tab throws error: Sometimes using the getAttribute() tag in a Presentation tab can throw error 'Error processing dynamic tag getAttribute('item',9047,'storeurl')'. In that case you have to use declareAttribute() to display the embedded tag on your Presentation tab. On the Presentation tab > Meta Tag HTML area, just add:
<%=declareAttribute('item',9047,'storeurl')%>
but that did not seem to help.
I'm sure I am just missing something simple, but I have been bashing my head against my desk for a few hours now trying to figure this out.
Well, after talking with NetSuite support and going back and forth on the code, it seems that getAttribute and declareAttribute cannot take a dynamic variable from JavaScript. Not sure why, as it's still an int, but I guess it is what it is.

connecting javascript to a Web API

I am new to the web development world, and I would like to connect an HTML page to a Web API through JavaScript, but I have not been successful.
I followed this tutorial to be able to make this connection : http://www.asp.net/web-api/overview/getting-started-with-aspnet-web-api/tutorial-your-first-web-api
All I need is to send some inputs from an HTML page to a Web API that takes these parameters and returns an object.
I am using this code:
$.getJSON("api/GeneratorController/setparameters/"+firstparameter+"/"+secondparameter+"/"+thirdparameter+"/"+fourthparameter+"/"+fifthparameter+"/"+sixthparameter,
function (data) {
alert(data); //never comes here
}).fail(function (jqXHR, textStatus, err) {
alert("All checks are correct, image was not generated. jqXHR = " + jqXHR.valueOf() + " textStatus=" + textStatus + " Error" + err);
});
It always goes into the fail handler; I attached the alert message that it produces.
Any reason why it is doing this?
@smartmeta (I fixed the typo, thanks): I followed your advice, and here is the output of the alert (as expected, the values I inserted are displayed):
Your url needs to start with your domain, not 'api/generatorcontroller/...'. If you are developing locally, something like http://localhost:[port]/api/generatorController/....
Also, Web API maps requests to HTTP verbs (GET, POST, PUT, DELETE, ...), not to method names like setparameters, unless you have something like an [ActionName("setparameters")] attribute above your Get() method.
Also, I am pretty sure you don't have a route set up to handle a URL with all those parameters. Since it seems you're using jQuery, what you want to look at is the jQuery.get documentation; the second example near the bottom shows where to place parameters. Web API will check for them in the body if they are not in the query string, so it would end up looking like:
$.getJSON("http://"+window.location.host+"/api/GeneratorController/setparameters", {parameter1: parameter1, parameter2:parameter2 ...});
Well, the first thing to check is to make sure that your server-side function is returning the values you expect. You can do this with Chrome's developer tools or with the Firebug Firefox extension, and I think IE10 has something equivalent, too. Go to the "net" tab, find the request corresponding to your API call, and take a look at what the server responded with.
Please add the line
alert("api/GeneratorController/setparameters/"+firstparemeter+"/"+secondparameter+"/"+thirdparameter+"/"+fourthparameter+"/"+fifthparameter+"/"+sixthparameter)
Then run your script, paste the output of the alert into a browser's address bar, and check whether your application handles that route.
By the way, I think you have a typo; I guess it should be firstparameter.
I assume you would like to do something like:
api/GeneratorController?foo=Bar
But since you are new to this, I would suggest first trying the example as it is. After that you can start changing details.
So I found the problem with my code.
Two things:
1- I shouldn't use the word "Controller" when I call my API; it should be api/Generator/...
2- The function name needs to start with "get" and not "set", since it "gets" the return value from the API.
Thanks everyone!
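Putting the fix together with the query-string suggestion from the answers, a sketch of building the request URL for the corrected api/Generator route. The helper name, the port, and the parameter names are hypothetical placeholders, not the real API's names.

```javascript
// Hypothetical helper: build the URL for the corrected route (api/Generator,
// without "Controller") and send the inputs as query-string parameters.
function buildGeneratorUrl(origin, params) {
  const query = new URLSearchParams(params).toString(); // encodes each value
  return origin + "/api/Generator?" + query;
}

const url = buildGeneratorUrl("http://localhost:12345", { first: "a", second: "b c" });
console.log(url); // http://localhost:12345/api/Generator?first=a&second=b+c
```

In the page, origin would be window.location.origin and the result can be passed to $.getJSON(url, function (data) { ... }). This assumes the controller's Get action accepts parameters with those names.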

Writing variables to a second HTML file with Javascript

I've got 2 HTML files and 1 javascript file, index.html, results.html and file.js
Inside index.html I retrieve user input, which is used to do some calculations inside the JavaScript file. The calculation starts when a button is pressed. Now I want the results to be displayed on results.html, so pressing the button should run the function and then go to the other page.
It all works fine to calculate and show the results on 1 page, but I don't know how to display the results on results.html.
This is what a piece of the code looks like:
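// Calculate the carbohydrate ("kool") requirement in grams from the energy,
// protein ("eiwit") and fat ("vet") requirements computed elsewhere
function berekenKoolBehoefte(){
koolBehoefte = (energieBehoefte - (eiwitBehoefte * 4) - vetBehoefte) / 4;
toonKool();
}
// Show ("toon") the carbohydrate result in kcal and grams
function toonKool(){
var uitkomstKool = document.getElementById("uitkomstKool");
uitkomstKool.innerHTML = (koolBehoefte * 4).toFixed(1) + " kcal" + " " + koolBehoefte.toFixed(1) + " gram";
}
// bereken ("calculate") is the button element
bereken.addEventListener('click', berekenKoolBehoefte, false);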
This displays everything on one page; now the output of toonKool() should be displayed inside results.html.
Your best bet in this case is to use:
server side - store the values on the server and get the results from there
cookies - if it's small data, simply put it inside a cookie. For more info on how to use cookies with plain JavaScript, look here
URL - attach your results to the URL and then parse them with JS. You can see how it's done here
iframe - you can attach your results in an iframe and access the data. This method is quite good, but for my taste it's awful
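The URL option could be sketched like this, assuming the results are plain numbers. The helper names and parameter names are hypothetical; index.html builds the link and results.html reads the values back from location.search.

```javascript
// Hypothetical helpers for the "attach your results to the URL" option.
// index.html: build a link to results.html that carries the computed values.
function resultsUrl(values) {
  return "results.html?" + new URLSearchParams(values).toString();
}

// results.html: read the values back from window.location.search.
function readResults(search) {
  const out = {};
  for (const [key, value] of new URLSearchParams(search)) {
    out[key] = Number(value); // values arrive as strings; convert back to numbers
  }
  return out;
}

// index.html, inside the click handler:
//   window.location.href = resultsUrl({ koolBehoefte: koolBehoefte });
// results.html, on load:
//   const { koolBehoefte } = readResults(window.location.search);
```

Query strings are visible in the address bar and limited in length, so this suits a handful of numbers rather than large data.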
I think you'd be better off doing this:
1) Create a form on your page around the button.
2) On form submit, do the calculations and put the result into a hidden field in the form (or post all the fields needed for the calculation to the server side). Then all the arguments you want to pass will be accessible on the server for the second page.
3) Display the second page using the GET or POST arguments from the previous one.
Cookies are something you may forget to clear, and they are much harder to manage.
Doing the calculation in an iframe is much worse than using Ajax. You could use Ajax to partially load the second page; then all JavaScript variables would remain.

Using jQuery on a string containing HTML

I'm trying to make a field similar to the Facebook share box, where you can enter a URL and it gives you data about the page: title, pictures, etc. I have set up a server-side service to get the HTML of the page as a string, and I am trying to just get the page title. I tried this:
function getLinkData(link) {
link = '/Home/GetStringFromURL?url=' + encodeURIComponent(link); // encode so the target URL's own ? and & survive
$.ajax({
url: link,
success: function (data) {
$('#result').html($(data).find('title').html());
$('#result').fadeIn('slow');
}
});
}
This doesn't work; however, the following does:
$(data).appendTo('#result');
var title = $('#result').find('title').html();
$('#result').html(title);
$('#result').fadeIn('slow');
But I don't want to write all the HTML to the page, as in some cases it redirects and does all sorts of nasty things. Any ideas?
Thanks
Ben
Try using filter rather than find:
$('#result').html($(data).filter('title').html());
To do this with jQuery, .filter is what you need (as lonesomeday pointed out):
$("#result").text($(data).filter("title").text());
However do not insert the HTML of the foreign document into your page. This will leave your site open to XSS attacks.
As has been pointed out, this depends on the browser's innerHTML implementation, so it does not work consistently.
Even better is to do all the relevant HTML processing on the server. Sending only the relevant information to your JS will make the client code vastly simpler and faster. You can whitelist safe/desired tags/attributes without ever worrying about dangerous content getting sent to your users. Processing the HTML on the server will not slow down your site. Your language already has excellent HTML parsers, so why not use them?
When you place an entire HTML document into a jQuery object, all but the content of the <body> gets stripped away.
If all you need is the content of the <title>, you could try a simple regex:
var match = /<title>([^<]+)<\/title>/.exec(data); // null if there is no <title>
var title = match ? match[1] : '';
alert(title);
Or using .split():
var title = data.split( '<title>' )[1].split( '</title>' )[0];
alert(title);
The alternative is to look for the title yourself. Fortunately, unlike most parse-your-own-HTML questions, finding the title is very easy because it doesn't allow any nested elements. Look in the string for something like <title>(.*)</title> and you should be set.
(yes yes yes I know never use regex on html, but this is an exceptionally simple case)
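Combining the regex idea above with a null check gives a sketch like the following. The helper name is hypothetical; it tolerates attributes and uppercase tags, but it does not decode entities such as &amp; and is not a general HTML parser.

```javascript
// Hypothetical helper: regex-based <title> extraction with a null check,
// so pages without a title don't throw. Case-insensitive; allows attributes.
function extractTitle(html) {
  const match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

console.log(extractTitle("<html><head><TITLE>Example Domain</TITLE></head></html>"));
// Example Domain
```

In the success callback, $('#result').text(extractTitle(data) || '') inserts the title as text rather than HTML, which avoids the XSS issue mentioned above.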
