I'm trying to scrape data from this URL: https://drive.getbigger.io/#/stores. However, I can't find the XPath of the text I want to export, which is the producers' offers.
First I tried the IMPORTXML function in Google Sheets:
=IMPORTXML(A1;"/html/body/flt-ruler-host/div[23]/p")
It returned an #N/A error: "the imported content is empty".
So I tried to scrape the website with add-ons and ParseHub, but every time I got a .csv file that doesn't contain the data I want to export.
I also can't find the right XPath for the data I'd like to scrape: when I use the inspection tool, the data isn't in the <body> part.
The XPath I used in my IMPORTXML function points to some code I found in the <body> part, close to the text I'd like to extract (the producers' offers).
It seems that the element I'm looking for is linked from the <head> part with some JavaScript code. Also, when I hover over the page with the selection tool to pick the data, it selects the whole page, maybe because there is a scrolling <div>.
So I wonder whether the website uses some kind of protection against scraping, or something else.
Please tell me:
Can I find the right XPath to scrape with the IMPORTXML function?
Should I extract the data with a Python script?
If the website blocks my attempts, how could I still get the data?
You won't be able to scrape anything with the IMPORTXML formula, since the website renders its content dynamically with JavaScript.
So yes, Python + Selenium (or a similar combination) can do the job. The website is unlikely to block you if you follow some basic rules (rotate the user-agent, add pauses between requests).
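The pause and user-agent advice applies whatever tool does the fetching. Below is a minimal sketch in JavaScript, assuming the HTTP call itself is passed in as a function (for example Node 18+'s built-in `fetch`). The user-agent strings and the 2-second default delay are placeholders, not recommended values. On this particular site the offers are rendered by JavaScript, so a plain fetch of the page will not contain them; the same throttling idea applies to Selenium page loads or to direct API calls.

```javascript
// Sketch of "polite" sequential fetching: a pause between requests and a
// rotating User-Agent header. The fetcher is injected, so this can wrap
// Node's fetch or any other HTTP client. UA strings are placeholders.
const USER_AGENTS = [
  "Mozilla/5.0 (placeholder UA 1)",
  "Mozilla/5.0 (placeholder UA 2)",
];

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function politeFetchAll(urls, fetcher, delayMs = 2000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    // Cycle through the User-Agent list, one entry per request.
    const headers = { "User-Agent": USER_AGENTS[i % USER_AGENTS.length] };
    results.push(await fetcher(urls[i], { headers }));
    // Wait before the next request (no wait after the last one).
    if (i < urls.length - 1) await sleep(delayMs);
  }
  return results;
}
```

Usage, for example: `politeFetchAll(urls, (url, options) => fetch(url, options).then((r) => r.text()))`. Requests are deliberately sequential rather than parallel, so the delay actually spaces them out.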
You would probably need these XPaths:
Product description:
//p[1][string-length(text())>5][parent::flt-dom-canvas]
Product price:
//p[3][contains(text(),"€") and not (contains(text(),","))][parent::flt-dom-canvas]
However, I think the most elegant way to get the data is probably to use the API the website relies on. With Google Sheets and a custom ImportJSON script, you can pull the results into a sheet (for example, the results for "fromage" as the query).
It won't work out of the box: you'll have to modify parts of the script, since as written it can't load a JSON response that is requested with POST and needs headers. In a nutshell, you need to build the payload, add headers to the request ("Bearer XXXXX"), and add a parameter to a function to retrieve the results.
All this depends on your objective and your expected output.
EDIT: For reference (constructing the payload, adding parameters), you can read:
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#fetchurl,-params
Also look at the Network tab of your browser's developer tools to find the URL of the API and the correct parameters to send.
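A minimal sketch of the request options for Google Apps Script's `UrlFetchApp.fetch(url, params)`, assuming a JSON POST with a bearer token. The token, the payload field `query` and the result fields are hypothetical: copy the real URL, body and headers from the request you see in the Network tab.

```javascript
// Build the params object UrlFetchApp.fetch(url, params) expects for a
// JSON POST with a bearer token. Payload shape is hypothetical.
function buildApiRequest(token, query) {
  return {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: "Bearer " + token },
    // Hypothetical body: it must match what the site itself sends.
    payload: JSON.stringify({ query: query }),
    muteHttpExceptions: true, // return error responses instead of throwing
  };
}

// Turn a list of result objects into a 2D array of rows for the sheet,
// using "" for missing fields. Field names are hypothetical.
function toRows(results, fields) {
  return results.map((item) => fields.map((f) => (f in item ? item[f] : "")));
}

// In Apps Script you would then call, for example (API_URL and TOKEN are
// placeholders):
//   const response = UrlFetchApp.fetch(API_URL, buildApiRequest(TOKEN, "fromage"));
//   const data = JSON.parse(response.getContentText());
```

Keeping the request builder separate from the fetch makes it easy to adjust the payload once you know the real parameters.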
I am just wrapping up a long-term project I have done for a company, but I am really stuck at this point.
I have a cool little page here: http://hagen-etc.com/test/buy/
It basically shows all their retailers in the right-hand div, and you can narrow down the results with the different options on the left side (JavaScript-based).
Everything works fine, but I have run into a problem. The person I am developing the site for has absolutely zero knowledge of programming and website management, so I need a simple way for her to make changes.
I have simplified the procedures in several other places on the website using shortcodes with Visual Composer and the Shortcoder plugin.
The problem here is that the JavaScript is in footer.php while the actual content is on a Page in the dashboard. How can I build a solution that lets her manage this in the blink of an eye? You can look at the source code at the link above if you like.
I would love some help with this, because I am having a hard time figuring out a solution. Maybe a plugin can even do this?
The different areas, countries, cities and retailers are written in HTML in the Page, while the same areas, countries, cities and retailers are also written in JavaScript in footer.php. I know I can move the JavaScript over to the Page, but she would still have to change both the JavaScript and the HTML.
I would like it to work with shortcodes in a structure like this:
[countryopen]
[areaopen]
[cityopen]
[retailer][retailer]
[cityclose]
[areaclose]
[countryclose]
How would I go about this? The HTML is at the top of the file while the JavaScript is at the bottom, so I can't really change both with a single shortcode. How would I do this, or is there a better solution?
So essentially you are trying to allow this person to manage locations? You can use Advanced Custom Fields (ACF) and/or custom post types in WordPress.
I would use a combination. Create a new custom post type in your functions.php and then, after installing the ACF plugin, create Location, Area, City and Retailer fields and assign them to the new post type.
Then, in the index "Page" you are working in now, you can write a query (for example with WP_Query) to output any of these locations onto the page.
I hope this helps. Let me know if I missed the point; the question is still a little unclear.
UPDATE: There are many good tutorials that walk you through creating a custom post type in your WordPress theme; WPBeginner and Smashing Magazine do a really good job of taking you through it step by step. It is worth knowing how to do this and understanding it as a basic part of WordPress: in Model-View-Controller terms, you are creating new views for your users to interact with.
After creating your new custom post type, which will look like any other post or page in the Edit view, you can use the ACF plugin to add new fields to it.
In the second section, called "Location", you define which type of posts these fields are attached to. You would set the rule to:
Post Type is equal to [Your New Post Type]
Your new post type would be "Locations" or "Retailers", or however you want to name it. Now, when you open the Edit view of the new custom post type, you will see these fields at the bottom. Lastly, you may want to remove any field you don't want your site manager entering information into, such as WordPress's native content editor or excerpt. You can do this by adding a few lines to functions.php after you have created the post type:
// Hide the main content editor on the "retailer" edit screen.
add_action('init', 'remove_editor_from_retailer');
function remove_editor_from_retailer() {
    remove_post_type_support( 'retailer', 'editor' );
}
This assumes that "retailer" is the slug of your custom post type.
You can't have a user updating data in a JavaScript file.
So what you need to do is separate the data from the functionality.
To do this, put a script tag in one of your WordPress template files and output the area, country, etc. data there as a JavaScript variable.
You can manage and fetch this data with any WordPress method you like. Anything that lets the user update data in the admin area, which you can then output in that script tag, will work: a plugin, a shortcode on a specific post, and so on.
Then, in your existing JavaScript file, remove the hard-coded data and read it from that variable instead.
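A minimal sketch of the JavaScript side, assuming WordPress prints the data as a global variable (for example with `wp_localize_script()` or an inline script tag). The variable name `retailerData`, its nested shape and the sample names are hypothetical; only the filtering function stays in the theme's JavaScript file.

```javascript
// Hypothetical shape of the data WordPress prints into a <script> tag.
// Only this object changes when the site owner edits retailers.
const retailerData = {
  countries: [
    {
      name: "Country A",
      areas: [
        {
          name: "Area 1",
          cities: [
            { name: "City 1", retailers: ["Retailer 1", "Retailer 2"] },
            { name: "City 2", retailers: ["Retailer 3"] },
          ],
        },
      ],
    },
  ],
};

// Return the retailers matching the optional country/area/city filters.
// A filter left as null matches everything at that level.
function findRetailers(data, { country = null, area = null, city = null } = {}) {
  const out = [];
  for (const c of data.countries) {
    if (country && c.name !== country) continue;
    for (const a of c.areas) {
      if (area && a.name !== area) continue;
      for (const ci of a.cities) {
        if (city && ci.name !== city) continue;
        for (const r of ci.retailers) {
          out.push({ country: c.name, area: a.name, city: ci.name, retailer: r });
        }
      }
    }
  }
  return out;
}
```

The left-hand filter options and the right-hand retailer list can then both be generated from the same object, so the HTML and the JavaScript can no longer drift apart.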
I need to pass some values from JavaScript to a plugin,
and I need the result from the plugin to come back to the JavaScript.
Can anybody guide me on how to approach this?
Please share some sample code and links showing how to invoke a plugin from JavaScript and pass parameters,
and how to get the result from the plugin back into JavaScript.
My plugin fires on RetrieveMultiple and I get a collection of records. Based on the plugin result, I need to use the result in my JavaScript.
If I am still not clear, please see the link below for exactly what I need.
Can you please guide me on how to proceed?
Thanks
You can't call a plugin directly from JavaScript.
But you can retrieve records from CRM, for example with OData. On the Retrieve / RetrieveMultiple message of the entity, you can register a plugin with your custom code and have it place its result in a field of the entity. Then read that field from the records you retrieved via OData. You can create a custom entity just for this plugin messaging mechanism.
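A minimal sketch of the JavaScript side of that mechanism, assuming the CRM 2011 REST (OData v2) endpoint and its verbose JSON format `{ d: { results: [...] } }` for collection queries. The entity set `new_pluginresultSet` and the field `new_Result` are hypothetical names for the custom entity and the field the plugin writes to; adjust the response handling if your endpoint responds differently.

```javascript
// Build an OData query URL against the CRM 2011 REST endpoint.
// Entity set and field names passed in are hypothetical examples.
function buildODataUrl(serverUrl, entitySet, selectFields, filter) {
  let url = serverUrl.replace(/\/$/, "") +
    "/XRMServices/2011/OrganizationData.svc/" + entitySet +
    "?$select=" + selectFields.join(",");
  if (filter) url += "&$filter=" + encodeURIComponent(filter);
  return url;
}

// Read the field the plugin filled in from a parsed collection response.
// Assumes the verbose OData v2 JSON shape { d: { results: [...] } }.
function readField(responseJson, field) {
  return responseJson.d.results.map((record) => record[field]);
}
```

The form script would request the URL (for example with XMLHttpRequest and an `Accept: application/json` header), parse the response, and pass it to `readField`.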
I would like to ask more for an opinion than a question:
What would the community recommend when you need to build a web page with lots of data, for example a product listing, with functionality such as buy (add to cart), sorting, etc., where you have to work with the data of the current product (price, title, image, link and other attributes)? How do you do it in your projects?
For example, we have a page with dozens of products, each with these attributes: price, title, description, image (URL), link (URL). How would you store the data so it can be used on user interaction? Personally, I've done it by putting each attribute in its own tag, something like:
<div class="product" data-product_id="123">
<div class="pr_title">Title</div>
<div class="pr_body">Body</div>
<div class="pr_img"><img src="http://www.www.www/img.png"></div>
<div class="pr_link"><a href="http://www.stackoverflow.com">Buy!</a></div>
</div>
This way my HTML structure serves presentation, and I work with the data in jQuery with something like:
var url = $('.product').find('.pr_link').find('a').attr('href');
But when the project grew and 10-15 more attributes were added to each element, getting the data of the current product became complicated and the code became mostly unreadable.
I thought of using the same structure but keeping the data in an object like:
var products = {
1: {
title: "abc",
description: "lorem ipsum",
price: 25.19,
img: "http://www.www.www/img.png",
link: "http://www.stackoverflow.com"
}
}
and keeping the markup as simple as possible, with only the elements and styles needed for the CSS design:
<div class="product" data-product_id="123">
<div class="title">Title</div>
<div>Body</div>
<img src="http://www.www.www/img.png">
<a href="http://www.stackoverflow.com">Buy!</a>
</div>
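so on click I would read the product's id and look it up in our "products" object:
// inside a click handler bound to an element within .product
var id = $(this).closest('.product').data('product_id');
var url = products[id].link;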
While this is the most convenient way to work, it requires a separate object.
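A minimal sketch of the "data in an object, markup only for presentation" idea, including the sorting mentioned in the question. Products are stored once, keyed by id, and the DOM only carries `data-product_id`. The second product and the helper names `getProduct` and `idsSortedBy` are hypothetical.

```javascript
// Products stored once, keyed by the same id used in data-product_id.
// The second entry is a hypothetical example.
const products = {
  123: { title: "abc", price: 25.19, link: "http://www.stackoverflow.com" },
  124: { title: "def", price: 9.5, link: "http://www.example.com" },
};

// Look up a product by the id read from data-product_id (number or string).
function getProduct(id) {
  const product = products[id];
  if (!product) throw new Error("Unknown product id: " + id);
  return product;
}

// Return product ids ordered by a numeric attribute such as "price".
function idsSortedBy(key, descending = false) {
  const ids = Object.keys(products).map(Number);
  ids.sort((a, b) => products[a][key] - products[b][key]);
  return descending ? ids.reverse() : ids;
}
```

In a click handler you would then write something like `getProduct($(this).closest('.product').data('product_id')).link`; sorting only reorders ids, and the DOM blocks can be re-appended in that order.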
Another idea was to keep all the data in data- attributes of the parent div, like:
<div class="product" data-product_id="123" data-title="abc" data-body="Body">
but as far as I know, jQuery doesn't work well with data attributes (natively).
So what do you suggest? Maybe you have even better ideas to share.
P.S. I tried to find information on this subject, but probably failed to phrase it well, so I found nothing. If there are links or similar questions on Stack Exchange sites, please feel free to post them. Thank you in advance!
You can use HTML5 data attributes to store product data. Since you have several properties to associate with each product block, you can JSON-encode the object and assign it to the top element, then access it on user interaction with that element or any of its children.
var product = {
title: "abc",
description: "lorem ipsum",
price: 25.19,
img: "http://www.www.www/img.png",
link: "http://www.stackoverflow.com"
};
$(selector).data('product',JSON.stringify(product));
Then, to retrieve the object, you can do this in any event callback:
var product = JSON.parse($(elem).data('product'));
In fact, both Facebook and Twitter have used data attributes to store data associated with stories and tweets. For example, here is some HTML from a Facebook story:
<li data-ft='{"qid":"5757245005920960301","mf_story_key":"7164261693049558180"}'
id="stream_story_4fe5d7d51bc415443080257">
You can see that Facebook stores JSON-encoded data in the data-ft attribute.
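If the server writes the JSON into the attribute, it has to be escaped so quotes or `<` in the data cannot break the markup. A minimal sketch of that server-side step in JavaScript; the helper names `escapeAttr` and `productBlock` are hypothetical. In the browser, the attribute value is decoded back to the original JSON, and jQuery's `.data('product')` parses a data- attribute value that looks like JSON into an object.

```javascript
// Escape characters that are special inside a quoted HTML attribute.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render a product block with the whole product JSON-encoded in one
// data- attribute, similar in spirit to Facebook's data-ft.
function productBlock(id, product) {
  return '<div class="product" data-product_id="' + escapeAttr(id) +
    '" data-product="' + escapeAttr(JSON.stringify(product)) + '"></div>';
}
```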
Similarly, here is an example of a tweet's HTML on Twitter:
<div data-tweet-id="216534496567230464" data-item-id="216534496567230464"
data-screen-name="onimitch" data-user-id="123682011" data-is-reply-to="">
So Twitter saves the data associated with a tweet in separate attributes such as data-tweet-id and data-user-id.
Since both of them handle a large amount of data, I think you can use either method to store your data without performance issues.
If you store data under individual keys, be aware of the automatic data conversion that .data() does, as @nnnnnn already mentioned in a comment.
Demo With .data() : http://jsfiddle.net/joycse06/vcFYj/
Demo With .attr() : http://jsfiddle.net/joycse06/vcFYj/1/
No need to use ids or references within the DOM. Keep it clean.
Just use jQuery's data() function to bind the data to the HTML elements. That way, when you click an element, you can get the object through `this` inside your click event handler.
// Loop through your elements in the DOM (or build them dynamically).
// productData is a placeholder for your own product object.
$('div.products').each(function () {
    $(this).data('product', productData);
});
// Assign one delegated handler and read the data object through `this`.
$(container).on('click', '.products', function () {
    console.log($(this).data('product'));
});
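To make "bind the data to the element" concrete without jQuery or a DOM: the idea is a lookup keyed by the element object itself rather than by an id string. jQuery's `.data()` setter keeps values in its own internal cache rather than in attributes; the sketch below is not its implementation, just the same idea expressed with a `WeakMap`. Plain objects stand in for DOM elements so it runs anywhere, and `setData`/`getData` are hypothetical names.

```javascript
// Data keyed by element identity. A WeakMap lets entries be garbage
// collected together with elements that are removed from the page.
const store = new WeakMap();

function setData(element, key, value) {
  if (!store.has(element)) store.set(element, {});
  store.get(element)[key] = value;
}

function getData(element, key) {
  const bag = store.get(element);
  return bag ? bag[key] : undefined;
}
```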
I don't recommend keeping your data in an object and filling the HTML from it,
for these reasons:
Your data is exposed to anyone who wants to copy it.
When your data is too big to fetch, your page may initially load without any data, which no web developer wants.
Again, when your data is too big, old computers (with around 512 MB of RAM) may freeze.
And when your data is too big, traversing or sorting it may take too much time.
I understand that your data stays static for at least about 5 minutes. What I recommend is:
Place your data with a server-side language (PHP, ASP.NET, Python, etc.).
Fetch data with separate queries when your script needs it.
Anything you don't need is a cost for your user; the user may have many pages open in the browser, and that can also cause freezes.
P.S. Any details will help me help you more.
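"Fetch data when your script needs it" can be sketched as a lazy loader with a cache: the page ships with product ids only, and full details are requested on first use. The URL pattern `/api/products/<id>` is hypothetical, and `fetchJson` is injected (in a browser, for example `url => fetch(url).then(r => r.json())`).

```javascript
// Create a loader that fetches each product at most once and reuses the
// pending or resolved promise on later calls.
function createProductLoader(fetchJson) {
  const cache = new Map();
  return function loadProduct(id) {
    if (!cache.has(id)) {
      // Hypothetical endpoint; adapt to your server-side route.
      const promise = fetchJson("/api/products/" + encodeURIComponent(id));
      // Drop failed requests from the cache so they can be retried later.
      promise.catch(() => cache.delete(id));
      cache.set(id, promise);
    }
    return cache.get(id);
  };
}
```

Caching the promise (not just the result) also means two quick clicks on the same product trigger only one request.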
You don't need to traverse the whole object tree.
How about adding IDs:
<div class="pr_link" id="id_link_123"><a href="http://www.stackoverflow.com">Buy!</a></div>
and retrieving them like this:
// ....
var id = 123;
// ....
var url = $("#id_link_" + id + " a").attr('href');
You should try AngularJS. It works well with jQuery and is easy to learn. AngularJS provides two-way data binding and extends HTML with new attributes and elements. Last but not least, it is an MVC framework maintained by Google. See more at http://angularjs.org
Although there are many excellent client-side MVC frameworks, check out Backbone.js, a powerful but easy-to-use framework for managing the interface between your data source (usually an HTTP server) and your DOM. In short, it keeps "your truth out of the DOM".
As a contrived example, if you are creating a simple address book app, you would probably have a client-side Contact Backbone model that roughly (or exactly) mirrors the model on your server.
Backbone handles requesting and parsing the contact data (JSON, XML, etc.) from the server (or localStorage, etc.) into these Contact model objects, and you provide a JS callback to update your DOM when changes occur. This also works in reverse: Backbone sends model changes back to the server when you call save(), which you can trigger immediately on each change, after some delay, or only when the user explicitly saves.
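The core mechanism described above (the model holds the data and views subscribe to changes) fits in a few lines. This is not Backbone's API, only a hand-rolled sketch of the idea; the `Model` class and its `onChange` method are hypothetical names.

```javascript
// Minimal observable model: the data lives here, and views subscribe to
// changes and re-render, so the DOM is never the source of truth.
class Model {
  constructor(attributes = {}) {
    this.attributes = { ...attributes };
    this.listeners = [];
  }
  get(key) {
    return this.attributes[key];
  }
  set(key, value) {
    const previous = this.attributes[key];
    if (previous === value) return; // no change, no notification
    this.attributes[key] = value;
    this.listeners.forEach((fn) => fn(key, value, previous));
  }
  onChange(fn) {
    this.listeners.push(fn);
  }
}
```

A view would register a callback with `onChange` that updates only its own part of the DOM; a real framework adds server sync, collections and event cleanup on top of this.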
Backbone might not be the right framework for you, but check them all out before you decide to roll your own solution. Even so, you could create your own client-side MVC layer if you feel you:
Can do a better job than the many excellent existing JavaScript MVC frameworks,
Have such a unique problem that only a bespoke solution will work for you, or
Find client-side MVC or JavaScript frameworks somehow scary or distasteful.
I have to agree with Enes: you shouldn't embed your data in the jQuery code or, as you already mentioned, your code will quickly become so complex as to be unmaintainable. An MVC framework will help with this.
A couple are mentioned in the answers above, such as http://angularjs.org and http://backbonejs.org. I can't vouch for either of them, but you should look at whether they help solve your problems.
I am currently rewriting a scrum wall application, which I originally developed in jQuery and raw PHP, in Agile Toolkit. With Agile Toolkit (tagged atk4 on Stack Overflow), you define models that map to database tables, HTML templates and views with placeholders where you want the data to go, and pages where you add models, forms and grids. It provides the link between JavaScript and PHP and has features including AJAX refreshes of views.
You could use HTML5 features, but only the latest browsers support them, so you may have problems if your users are not in a controlled environment (e.g. a single company with a standard web browser).
The combination of the DOM and jQuery is one of the most powerful things the Web has, and you are not taking full advantage of it.
Make the DOM work for you and organize it better. Your example can be optimized this way:
<div class="product" data-product_id="123">
<div class="pr_title">Title</div>
<div class="pr_body">Body</div>
<img class="product-image" src="http://www.www.www/img.png">
<a class="product-link" href="http://www.stackoverflow.com/">Buy!</a>
</div>
To select with jQuery, simply write:
var url = $('.product .product-link').attr('href');
I believe your problem is that you are forgetting how CSS selectors work: it's not necessary to traverse every level of the DOM.
You can also remove some of the DIVs in your example. A link doesn't have to be inline; a CSS rule can make it behave like a DIV.
You can use data- attributes in HTML through jQuery (as you already do). When you create the DIV from the server data, you can do something like this:
$('.product').data(yourProduct); // yourProduct would be the JSON representation of the data
To retrieve:
var url = $('.product').data('product-link'); // if the JSON object has a product-link property, that is
The caveat mentioned above, that your data is exposed to the browser, is of course still valid, so take care.
I want to give my users a small piece of JavaScript or HTML code that they can put on their site to show information about them, somewhat like Stack Overflow's new Flair feature.
I have an idea of how to code it. I was going to give them some JS along with HTML containing a DIV with id="MySite_Info". The JS would then call my site, pull some JSON or XML, and fill that DIV with the data.
Is there a better way to do this? Are there any examples online I should follow? What's the best way to create these JavaScript snippets? (I'm not sure what the proper name is.)
There are two basic options.
Images (and pictures of text suck)
JavaScript - as you described
The approach I would take would be to:
Dynamically generate the JS using a server side process. This would include data for the user (using a JSON generator to easily produce the data in a suitable format).
Build the badge using standard DOM methods
Find the element by its id (document.getElementById) and appendChild the generated badge
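The three steps above can be sketched as a server-side function (JavaScript, e.g. running in Node) that generates one script per user. The container id `MySite_Info` comes from the question; the user fields `name` and `score`, the class name and the badge text are hypothetical. Users would embed it with something like `<div id="MySite_Info"></div><script src="https://your-site.example/badge/123.js"></script>` (hypothetical URL), served as JavaScript.

```javascript
// Generate the badge script for one user: the data is embedded as JSON and
// the badge is built with standard DOM methods on the user's page.
function generateBadgeScript(user) {
  // JSON.stringify produces a valid JS literal; escaping "<" stops a value
  // such as "</script>" from closing an inline script tag early.
  const data = JSON.stringify(user).replace(/</g, "\\u003c");
  return [
    "(function () {",
    "  var user = " + data + ";",
    "  var badge = document.createElement('div');",
    "  badge.className = 'mysite-badge';",
    "  badge.appendChild(document.createTextNode(user.name + ' (' + user.score + ')'));",
    "  var target = document.getElementById('MySite_Info');",
    "  if (target) target.appendChild(badge);",
    "})();",
  ].join("\n");
}
```

Wrapping the generated code in a function expression keeps its variables out of the host page's global scope, and the `if (target)` check makes the script a no-op when the container is missing.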