Customize meta info based on the query string values - javascript

My website sells stuff and I would like to customize the page title and meta description in certain pages when certain items are viewed. I would want these custom titles and descripts to be listed when shared on other websites. E.g.: Twitter, FB, etc...
Basically I want to customize the title and description based on the query string values. How is this possible? I've looked a js based plugin or similar on github as well but had no luck.

The issue with JavaScript rendered pages is that there's no guarantee a scraper will pick up the meta tags. The headers are read before scripts are run - meaning to lower overhead they likely won't even bother executing it. Better if you write a script that writes customised meta tags to the HTML file.

Related

I can't find Xpath on this website or maybe wrong syntax

I'm trying to scrape data from this url https://drive.getbigger.io/#/stores, however I didn't find the Xpath of the text I want to export, which are the producer's offer.
Firstly I try the importxml function in Google sheet:
=IMPORTXML(A1;"/html/body/flt-ruler-host/div[23]/p")
and it gave me N/A error "the imported content is empty"
so I try to scrape this website with add-ons and Parsehub, and it gave me every time a .csv file where i can't find the data I want to export.
Also I can't find the right Xpath for the data I would like to scrape, when I use the inspection tool, the data isn't in the <body> part.
However the Xpath I use in my importXML function is some code I find in the <body> part and which is close of the text I'd like to extract (the producer's offer).
It seems that the Xpath I am looking for is linked in the <head> part with some JavaScript code, also when I hover the page with the selection tool in order to scrape the data it select the whole page, maybe because there is a "scroll <div>".
So I wonder if the website use some kind of protection against scraping or other.
Please guys tell me if :
I could find the right Xpath in order to scrape with the importXML function?
Should I extract the data with a python script?
if the website block my attempts, how could I do this?
You won't be able to scrape anything with IMPORTXML formula since the website uses dynamic rendering (javascript).
So yes, Python+Selenium (or other combinations) could do the job. The website won't block you if you follow some rules (switch user-agent, add pauses between requests).
You would probably need these XPath :
Product description :
//p[1][string-length(text())>5][parent::flt-dom-canvas]
Product price :
//p[3][contains(text(),"€") and not (contains(text(),","))][parent::flt-dom-canvas]
However, I think the most elegant way to get the data is probably to use the API the website relies upon. With GoogleSheets and a custom ImportJSON script, you can obtain something like this (result for "fromage" as query) :
It won't work out of the box, you'll have to modify some parts of the script since it won't load a JSON (called with POST) which needs headers in the request. In a nutshell, you need to construct the payload part, add headers to the request ("Bearer XXXXX"), and add a parameter to a function to retrieve the results.
All this depends on your objective and your expected output.
EDIT : For references (constructing the payload, adding parameters) you can read :
https://developers.google.com/apps-script/reference/url-fetch/url-fetch-app#fetchurl,-params
Look also the networktab of your browser developper tools in order to find : the url of the API and the correct parameters to send.

Fetching URL Metadata from JS

Most social media sites have a feature where you can type in a link and the site will generate a link preview of it. See example below from Google+
Let's say I'd like to build my own. I'm using Ruby on Rails as a web framework but that's irrelevant as I imagine I'll have to use JS to fetch this client-side right?
Where do I look for this data? I know it's usually in the <meta> tags, but is that standard? When I tried it for a few links only the description was in the <meta> tags. The image and title didn't match anything else in the meta tags.
How do I go about fetching a remote document asynchronously and parsing it's tags? If anyone could point me to an example I'd be grateful.
Thanks!
There are three common ways how authors might provide this data in HTML documents (from least expressive to most expressive):
Metadata in the head element: This is plain HTML, i.e.,
meta elements (with defined/registered values for the name attribute),
link elements (with defined/registered values for the rel attribute), and
the title element.
Microformats: Still using plain HTML, but together with specific class names. All Microformats are described in their wiki.
Structured data: Using extending/additional syntaxes (JSON-LD, Microdata, RDFa, …) and vocabularies (Schema.org, Open Graph Protocol, Dublin Core …).
You’ll typically find suitable parsers in your programming languages.
You’ll probably find that most sites make use of Open Graph Protocol (in RDFa), as this is used by Facebook and Twitter. Probably followed by Schema.org (in JSON-LD/Microdata/RDFa), as this is sponsored by the major search engines.
Note that 2. and 3. also allow authors to provide data about entities described on (or relevant to) the page, i.e., not every extracted data is suitable for link previews, so you have to take the context into account.

How to create WordPress page with multiple URLs?

Is it possible to create a WordPress page with multiple URLs?
I want one URL per section, all without having duplicate content in SEO.
We don't want to use anchors because they don't have individual rankings like URLs.
There are a lot of plugins that can easily create URLs in bulk for you. I recommend MPG: Multiple Pages Generator because you can customize the content of each page to avoid duplicate content and it's incredibly easy to manage all of your new pages once they’re created.
You just need to have all your products and their variable data in a CSV file and a template page. The plugin creates shortcodes for the information from your file and you just need to add those shortcodes to a template page, like:
Visit us at {{mpg_location}}
The best thing is that if you want to edit anything from your CSV or template page, you can do that anytime. Find out more about it here: https://mpg.porthas.com

Combine PHP file and page - Wordpress

I am just wrapping up a long term project I have done for a company, but I am really stuck at this point.
I have a cool little page here: http://hagen-etc.com/test/buy/
It is basically showing all their retailers in the right hand side div while you can narrow down the results with the different options on the left side (Javascript based).
Everything works just fine, but I have run into a problem. The thing is, the person I am developing the site for has absolutely zero knowledge about programming and website managment etc, and therefore I need a smart way for her to change it.
I have simplified the procedures several other places on the website using shortcodes with Visual Composer and Shortcoder-plugin.
The problem here is, The Javascript is in footer.php while the actual content is on a Page in the dashboard. How do I make a smart solution so she can easily manage this in a blink of an eye? You can take a look at the source code in the link above if you would like to.
Would love to get some help on this because I am having a hard time figuring out a solution. Maybe a plugin can even do this?
The different areas, countries, cities and retailers are written in HTML in the Page while reas, countries, cities and retailers are written in Javascript in the footer.php. I know I can move the Javascript over to the Page, but the problem is, she would still have to change both the Javascript and the HTML.
I would like it to work with Shortcodes in a structure like this:
[countryopen]
[areaopen]
[cityopen]
[retailer][retailer]
[cityclose]
[areaclose]
[countryclose]
How would I go about this? The HTML would be in the top of the file while the Javascript would be in the bottom. I cannot really change both things with just one shortcode. How would I do this or is there even a better solution?
So essentially you are trying to allow this person to manage locations? You can use Advanced Custom Fields for WordPress and/or custom posts types for WordPress.
I would use a combination. Create a new custom post type in your functions.php and then, after installing the ACF plugin, create Location, Area, City, and Retailer fields and assign them to the new post type.
Similarly, in the index "Page" that you are working in now, you can create a query to dump any of these Locations onto the page.
I hope this helps. Let me know if I missed the point here, the question is still a little unclear.
UPDATE: There are many great tutorials that will walk you through creating a custom post type in your WordPress theme. WPBeginner and Smashing Magazine do a really good job of bringing you through this step-by-step. It will be very helpful for you to know how to do this and to understand this as a basic part of WordPress's Model-View-Controller system, here you are creating new views for your users to interact with.
After creating your new custom post type, which will seem like any other post/page in the Edit view, you can use the ACF plugin to easily add new fields to this new custom post type:
In the second section called "Location" you can define what type of posts these fields should be appended to. You would make these inputs says:
Post Type is equal to [Your New Post Type]
Your new post type being "Locations" or "Retailers" or however you want to phrase that. Now, when you check out the Edit view of a new custom post type, you can see these new fields appended to the bottom. Lastly, you may want to remove any field that you wouldn't want your web manager adding information into like WordPress' native Description or Excerpt inputs. You can do this by adding a few lines to your functions.php after you have created the post type:
add_action('init', 'remove_editor_from_retailer');
function remove_editor_from_retailer() {
remove_post_type_support( 'retailer', 'editor' );
}
Granted that "Retailer" is the name of your custom post type.
You can't have a user updating data in a javascript file.
So what you need to do is split the data off from the functionality.
To do this, put a script tag in one of your Wordpress template files, and output the area, country etc. data there as a Javascript variable.
You can manage and fetch this data using any Wordpress method of your choice. Anything that allows the user to update data in the admin area which you can then output in your plugin will work. So a plugin, a shortcode on some specific post, etc. are possibilities.
Then, in your existing Javascript file, remove your hardcoded data and instead pull it from that variable.

How to mimic Facebook's "link share" functionality using node.js and javascript

so what I want to mimic is the link share feature Facebook provides. You simply enter in the URL and then FB automatically fetches an image, the title, and a short description from the target website. How would one program this in javascript with node.js and other javascript libraries that may be required? I found an example using PHP's fopen function, but i'd rather not include PHP in this project.
Is what I'm asking an example of webscraping? Is all I need to do is retrieve the data from inside the meta tags of the target website, and then also get the image tags using CSS selectors?
If someone can point me in the right direction, that'd be greatly appreciated. Thanks!
Look at THIS post. It discusses scraping with node.js.
HERE you have lots of previous info on scraping with javascript and jquery.
That said, Facebook doesn't actually guess what the title and description and preview are, they (at least most of the time) get that info from meta tags present in the sites that want to be more accessible to fb users.
Maybe you could make use of that existing metadata to pull titles, descriptions and img previews. The docs on the available metadata is HERE.
Yes web-scraping is required and that's the easy part. The hard part is the generic algo to find headings and relevant texts and images.
How to scrape
You can use jsdom to download and create a DOM structure in your server and scrape that using jquery on your server. You can find a good tutorial at blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs as suggested by #generalhenry above.
What to scrape
I guess a good way to find the heading would be:-
var h;
for(var i=6; i<=1; i++)
if(h = $('h'+i).first()){
break;
}
Now h will have the title or undefined if it fails. The alternative for this could be simply get the page's title tag. :)
As for the images. List all or first few images on that page which are reasonably large, i.e. so as to filter out sprites used for buttons, arrows, etc.
And while fetching the remote data make sure that ProcessExternalResources flag is off. This will ensure that script tags for ads do not pollute the fetched page.
And yes the relevant text would be in some tags after h.

Categories