I'm trying to set the Facebook Open Graph meta data for a page before FB scrapes it, on the client side.
Everything I've tried suggests that FB scrapes the page before any JS takes action. Is this correct? Is there any way to do this?
This is what I have in the HTML file:
<meta id="ogImage" content='http://www.blueglass.com/wordpress/wp-content/uploads/2012/04/stand-out-in-crowd.jpg' property='og:image' />
<meta id="ogDescription" content='testd' property='og:description' />
<meta id="ogTitle" content='testt' property='og:title' />
After the page loads, I want to change the metadata according to the URL parameters.
Thanks.
JavaScript is primarily a client-side technology (although server-side implementations exist) that is executed by the browser. When Facebook scrapes your page it only looks at the HTML structure and content; any JS inside your page will not be executed. Put your URL into the Facebook Linter debug tool to see exactly what the linter is seeing.
If you want to set the metadata dynamically, you'll need to do it via a server-side script such as PHP or ASP.NET, most likely by interrogating the URL query-string parameters and adjusting the meta tags accordingly.
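For illustration, here is a minimal sketch of that idea in Node (the answer mentions PHP or ASP.NET; the parameter names and defaults below are made up, and real code should escape the values before interpolating them):

// minimal Node sketch: read query parameters and render the og: tags
// server-side so the scraper sees the right values
const http = require('http');
const url = require('url');

http.createServer((req, res) => {
  const query = url.parse(req.url, true).query;
  // fall back to the static defaults when a parameter is missing
  const title = query.title || 'testt';
  const description = query.description || 'testd';
  res.setHeader('Content-Type', 'text/html');
  res.end(`<html><head>
    <meta property="og:title" content="${title}" />
    <meta property="og:description" content="${description}" />
  </head><body>...</body></html>`);
}).listen(8080);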
Related
I'm building an ASP.NET website (using HTML, C#, jQuery, ...).
The website is published.
Then I find a bug I need to fix in a JavaScript file.
I fix it and republish the website.
But the browser has cached the old files, so the user doesn't get the fixed version.
So, how can I clear the user's cache or browsing data manually (via JavaScript, jQuery, or C#) when I change a little code or fix a small bug in a JS file? I can't just tell the user "you must clear your browsing data to get the new version!"
Thanks.
This may not directly answer your question, but if you are using a bundler like webpack or a task runner like gulp, you can generate the file with a hash key.
For example, the bundler will generate a production version of the JS file as
myFile.d587bbd6e38337f5accd.js
d587bbd6e38337f5accd is the hash key generated by the bundler, and the reference to it is injected by the bundler itself. That way every release produces a JS file with a different hash key, so it won't be loaded from cache.
You can check this link to learn more about it.
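As a concrete example, here is a minimal webpack config sketch using content hashing (the entry path is a placeholder; webpack replaces [contenthash] with a hash of the file contents, so the filename only changes when the code does):

// webpack.config.js
const path = require('path');

module.exports = {
  entry: './src/myFile.js',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: '[name].[contenthash].js',
  },
};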
Alternatively, in plain JavaScript you can build a cache-busting URL by appending Date.now() as a query string, which solves the caching problem:
<script>
  // appending a timestamp makes every load look like a new URL to the browser
  var scriptUrl = "/site/js/script.js",
      cacheBustingUrl = scriptUrl + "?" + Date.now(),
      newScriptElement = document.createElement("script");
  newScriptElement.setAttribute("src", cacheBustingUrl);
  document.body.appendChild(newScriptElement);
</script>
This should append a new script element like
<script src="/site/js/script.js?1404918388711"></script>
to the end of the page, which the browser will load like any other script tag.
You can try forcing a refresh of your page using window.location.reload(true); in theory this ignores the cached files and retrieves fresh copies of your website for the user.
You could also create a check that refreshes the page when it's not up to date, as sketched below.
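A hypothetical sketch of such a check (the /version.txt endpoint and APP_VERSION constant are made-up names; you'd bake the current version in at build time):

// poll a small version file and force-reload when it changes
var APP_VERSION = '1.0.3'; // written at build/publish time

setInterval(function () {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/version.txt?' + Date.now()); // cache-bust this request too
  xhr.onload = function () {
    if (xhr.responseText.trim() !== APP_VERSION) {
      window.location.reload(true); // bypass the cache where supported
    }
  };
  xhr.send();
}, 60000); // check once a minute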
No browser will let you clear its cache programmatically; that would be a serious security issue. But you can discourage caching by adding these meta tags to your HTML:
<meta http-equiv='cache-control' content='no-cache'>
<meta http-equiv='expires' content='0'>
<meta http-equiv='pragma' content='no-cache'>
You can also turn off autocomplete on a form or textbox:
formname.setAttribute("autocomplete", "off");
I need to process HTML files that have corrupted scripts added to them via <script> tags.
I'm planning to remove all script tags present in the webpage via PhantomJS.
But on opening the webpage via webpage.open(), a PhantomJS parse error is thrown, since it cannot parse the JS content within the script tag.
Here is an example:
<html>
<head>
<script>
corrupted JS
if(dadadd
;
</script>
</head>
<body>
some content
</body>
</html>
Can someone suggest the right way to clean this webpage using PhantomJS?
It's not (easily) possible. You could download the static HTML (not by opening the page, but by making an Ajax request in page.evaluate()), change it according to your needs, then assign it to page.content.
This still might not work, because as soon as you assign it to page.content, you're telling PhantomJS to interpret this source as a page from an unknown domain (about:blank). Since the page source contains all kinds of links/scripts/stylesheets without a domain name, you'll have to change those too in order for the page to successfully load its resources.
It might be easier to just put a proxy between PhantomJS and the internet, with a custom rule that adjusts the page source to your needs.
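A rough sketch of the Ajax approach (the target URL is a placeholder); run it with phantomjs --web-security=false so the cross-domain XHR from about:blank is allowed:

var page = require('webpage').create();

page.open('about:blank', function () {
    var html = page.evaluate(function (url) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, false); // synchronous request
        xhr.send();
        return xhr.responseText;
    }, 'http://example.com/page-with-broken-js.html');

    // strip every <script> block before PhantomJS tries to parse the page
    page.content = html.replace(/<script[\s\S]*?<\/script>/gi, '');

    // ...inspect or save the cleaned page here...
    phantom.exit();
});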
I need to render a page without executing its JavaScript (while injecting my own script), showing the user how the page would look from a bot's point of view.
So far I have thought of loading the page using Ajax, removing all <script></script> tags from the loaded data, injecting my own <script></script> tags, and replacing the page HTML with the filtered data.
Are there any better ways of achieving this?
Maybe not a better way, but an alternative to using JavaScript for what you want:
You can write a (PHP) server-side script that uses file_get_contents() to get the original page contents, removes and replaces the JavaScript with PHP string functions (str_replace, substr_replace, preg_match), and then load that PHP script in an iframe.
See my related answer for more detail: https://stackoverflow.com/a/17262334/888177
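For comparison, the same idea sketched in Node instead of PHP (the target URL and injected script path are placeholders; the global fetch used here requires Node 18+):

const http = require('http');

http.createServer(async (req, res) => {
  // fetch the original page
  const html = await (await fetch('http://example.com/target-page')).text();
  // strip every <script> block, then inject our own
  const cleaned = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace('</body>', '<script src="/my-overlay.js"></script></body>');
  res.setHeader('Content-Type', 'text/html');
  res.end(cleaned); // point the iframe's src at this server
}).listen(8080);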
<meta http-equiv="refresh" content="5; url=http://example.com/">
Meta refresh.
EDIT:
So, here's something you can do:
Check out the jQuery plugin called fancybox.
What it does is load remote URL content into a neat popup div on the page. You can check whether you can modify its code to make it work the way you want.
Also, a quick heads-up: bots don't have cookies either, so stripping just the script tags won't do; you'll also have to disable cookies in the request.
I have a website which is fully Ajax-based (hash navigation).
Is there a way to refresh the Open Graph meta tags for Ajax-based websites using JavaScript?
(When I click on a link, the tags and their values should change.)
No. Open Graph markup must be present on HTML pages that are GETable with plain HTTP.
This is because when a user interacts with an OG object (likes it, performs an action on it, etc.) Facebook will perform an HTTP GET on the OG URL and expect to see OG tags returned in the markup.
The solution is to create canonical URLs for each of your objects. These URLs contain basic HTML markup including the OG tags.
On requests to these URLs, if the incoming user-agent string contains 'facebookexternalhit', you render the HTML; if it doesn't, you serve a 302 that redirects to your Ajax URL. On the Ajax URLs, your like buttons and any OG actions you publish should point to the canonical URL object (a rough sketch of this check follows the example below).
Example:
As a user, I'm on http://yoursite.com/#!/artists/monet. I click a like button or publish an action, but the like button's href parameter, or the URL of the object when you post the action, should be a web-hittable canonical URL for the object; in this case, perhaps http://yoursite.com/artists/monet
When a user with a browser hits http://yoursite.com/artists/monet you redirect them to http://yoursite.com/#!/artists/monet, but if the incoming user-agent says it is Facebook's scraper, you just return markup that represents the artist Monet.
For real world examples, see Deezer, Rdio and Mog who all use this design pattern.
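A minimal sketch of that user-agent check, using Express for illustration (the route, port, and rendered markup are made-up placeholders):

const express = require('express');
const app = express();

app.get('/artists/:slug', (req, res) => {
  const ua = req.get('User-Agent') || '';
  if (ua.indexOf('facebookexternalhit') !== -1) {
    // the scraper gets plain HTML containing the OG tags
    res.send('<html><head>' +
      '<meta property="og:title" content="Artist: ' + req.params.slug + '" />' +
      '<meta property="og:url" content="http://yoursite.com/artists/' + req.params.slug + '" />' +
      '</head><body></body></html>');
  } else {
    // normal browsers get bounced to the ajax version
    res.redirect(302, '/#!/artists/' + req.params.slug);
  }
});

app.listen(3000);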
A little bit more investigation led to the following findings:
Let's say you made an application with a hash that looks like this:
http://yoursite.com/#/artists/monet
The Facebook scraper will call your URL without the /#/artists/monet part. This is a problem because you have no way of knowing which information you have to render into the og: meta tags.
Then try the same with the hashbang URL Simon suggests:
http://yoursite.com/#!/artists/monet
Now you'll notice that the Facebook scraper respects the Google Ajax crawling specification and converts the #! to ?_escaped_fragment_=, so the URL looks like this:
http://yoursite.com/?_escaped_fragment_=/artists/monet
You can check this out for yourself with the Facebook debugger (https://developers.facebook.com/tools/debug):
1. Upload the PHP script below to your server.
2. Go to the Facebook debugger.
3. Enter the URL with the /#/ part.
4. Click 'See exactly what our scraper sees for your URL': no hash fragment.
5. Enter the URL again with /#!/.
6. Click 'See exactly what our scraper sees for your URL': the hash fragment has been turned into ?_escaped_fragment_=.
The script:
<html>
<head>
<title>Scraping</title>
</head>
<body>
<?php
// dump the request details so you can see exactly what the scraper sent
print_r($_SERVER);
?>
</body>
</html>
Or, summarized: always use /#!/ (hashbang) deep links ;)
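On the server, you can then read the _escaped_fragment_ parameter and render the matching OG tags. A hypothetical Express sketch (the lookup and markup are placeholders):

const express = require('express');
const app = express();

app.get('/', (req, res, next) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next(); // ordinary browser: serve the ajax app

  // e.g. fragment === '/artists/monet'; look up the object and render its tags
  res.send('<html><head>' +
    '<meta property="og:title" content="Object at ' + fragment + '" />' +
    '<meta property="og:url" content="http://yoursite.com/#!' + fragment + '" />' +
    '</head><body></body></html>');
});

app.listen(3000);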
I ran a quick test that seems to work. It depends on the fact that the FB scraper doesn't run JavaScript.
As most of my sites are static single-page apps with no server logic, I can generate the static pages quickly with tools such as Grunt and Gulp.
If you share http://wh-share-test.s3-website-eu-west-1.amazonaws.com/test
Facebook will scrape the test page's meta tags; when a user clicks the link, the JS redirects to /#/test for my single-page app to react and present the correct view.
Seems hacky, but it works:
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>This is a shared item</title>
</head>
<body>
<h1>This is a shared item page.</h1>
<script>
var path = window.location.pathname;
window.location = '/#' + path;
</script>
</body>
</html>
Can anyone help? I have been building a site using JavaScript, but the rest of the HTML content is static, i.e. images etc.
When I load my page in Firefox I have to clear the cache.
I remember a long time ago there was something you could add to the HTML to force a reload.
My question is: is this a good thing? I presume pages are cached for a reason, i.e. to cache images etc., but this stops my pages from refreshing.
And how do I do it?
I'd really appreciate any feedback.
If you want only the JS to be loaded afresh every time, and everything else to load from cache, you can add a version number to the JS include line like so:
<script src="scripts.js?v=5643" type="text/javascript"></script>
Change the version number (?v=num) each time you change the JS file. This forces the browser to fetch the JS file from the server.
Note: your actual file name stays the same - scripts.js
To disable caching for all files, if you're using Apache, put this in your httpd.conf (it requires mod_headers to be enabled):
<Directory "/home/website/cgi-bin/">
    Header set Cache-Control "max-age=0, no-store"
</Directory>
You can also put a meta tag on your html like so:
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="cache-control" content="no-cache" />
More info on this here
For web pages you can control how the page is cached via the HTTP headers. Look at Expires if you have a particular date on which the cache should expire, or Cache-Control for dynamic expiration based on when the page was requested. Here's a pretty good tutorial that covers how caching works and covers setup on the major web servers.
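For illustration, a bare Node sketch that sets both headers (the one-hour max-age is an arbitrary example):

const http = require('http');

http.createServer((req, res) => {
  res.setHeader('Cache-Control', 'max-age=3600'); // let clients cache for one hour
  res.setHeader('Expires', new Date(Date.now() + 3600 * 1000).toUTCString());
  res.end('hello');
}).listen(8080);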
Try pressing Ctrl+F5 when you load your page in Firefox; this should clear your browser's cache of the page and reload it cleanly for you.