I'm trying to familiarize myself with the concept of script tags. I'm making a Ruby on Rails app that does something as simple as alert "Hi" when a customer visits a page. I am testing this public app on a local server and I have the shopify_app gem installed. The app has been authenticated and I have access to the store's data. I've viewed the Shopify API documentation on script tags and I've looked at the Shopify Embedded App example that Shopify has on GitHub. The documentation details the properties of a script tag and gives examples of script tags with their properties defined, but says nothing about where to place the script tag in an application, or how to configure an environment so that the JS file referenced by the script tag actually loads.
I've discovered that a JS file added with a script tag will only work if the file is hosted online, so I've uploaded the JS file to Google Drive. I have the code for the script tag in the index action of my HomeController (the default page for the app). This is the code I'm using:
def index
  if response = request.env['omniauth.auth']
    sess = ShopifyAPI::Session.new(params[:shop], response[:credentials][:token])
    session[:shopify] = sess
    ShopifyAPI::Base.activate_session(sess)
    ShopifyAPI::ScriptTag.create(
      :event => "onload",
      :src => "https://drive.google.com/..."
    )
  end
end
I think the problem may be tied to request.env. The response is not being read from request.env['omniauth.auth'], and I believe that a valid response coming back may be required for the script tag to go through.
The method that I tried above is from the 2nd answer given in this topic: How to develop rails app for shopify with ScriptTags.
The first answer suggested using this code:
ShopifyAPI::Base.site = token
s = ShopifyAPI::ScriptTag.create(:events => "onload",:src => "your javascript url")
However, it doesn't say where to place these two lines of code in a Rails application. I tried putting the second line in a JS file in my Rails application, but it did not work.
I don't know if I'm encountering problems because I'm running the app on a local server or if there is something missing from the configuration of my application.
I'd appreciate it if anyone could point me in the right direction.
Try putting something like this in config/initializers/shopify_app.rb:
ShopifyApp.configure do |config|
  config.api_key = "xxx-xxxx-xxx-xxx"
  config.secret = "xxx-xxxx-xxx-xxx"
  config.scope = "read_orders, read_products"
  config.embedded_app = true
  config.scripttags = [
    {event: 'onload', src: 'https://yourdomain.herokuapp.com/javascripts/yourjs.js'}
  ]
end
Yes, you are correct that the JS file you want to include for your script tag needs to be publicly available. If you are using localhost for development, look into ngrok (for example, running ngrok http 3000 gives your local Rails server a public HTTPS URL).
Do yourself the favor of ensuring your callbacks use SSL when interacting with the Shopify API (i.e. configure your app with https://localhost/ as a callback setting in the Shopify app settings). I went through the trouble of configuring thin as the local web server with a self-signed SSL certificate.
With a proper setup you should be able to debug why the response is failing the omniauth check.
I'm new to the Shopify API(s), but not Rails. Their documentation leaves a lot to be desired.
Good luck to you sir,
I am currently writing a program that collects information from a sports website (it contains the history of some basketball matches). The problem is that the website uses Angular.js for dynamic HTML binding. Consequently, the HTML source code is full of unrendered template variables.
I need to find out the values of those variables in order to make my program work as I want. Is there any library or framework that could help me?
Edit: I am not limited to anything in particular, but I prefer a web app (MEAN, JS frameworks with node-webkit). If it can't be done that way, I can also code it in C++ or Java (or extend it further to Android with the NDK or SDK).
Disclaimer: This is not grey-hat stuff. I just need to do some web-scraping.
PhantomJS is a headless browser. It will allow you to use JavaScript to get the information you want.
Details:
It will browse to the page you want, execute the JavaScript like any browser, and have access to the page as if it were displayed to a normal user in a normal browser. Using JavaScript DOM traversal, you will be able to get the information you need. This is almost the same as automating the task of opening a browser console and executing JavaScript that pulls the information from the page.
While the example below is really simple, PhantomJS can do much more than just fetch a page's contents... it can click buttons, navigate to other pages, extract only the relevant information, capture the page as an image... Do not hesitate to refer to its Quick Start documentation to learn more about it.
Example script returning the complete HTML page after waiting 10 seconds for the AngularJS app to finish rendering the page:
Command line usage: phantomjs-1.9.1 this_script.js <url>
this_script.js (PhantomJS 2.0 may have different syntax in some cases):
var url = phantom.args[0];

// Returns the page's rendered HTML by evaluating JS inside the page context
function getDocumentElementAsHTML(page) {
    return page.evaluate(function() {
        return document.documentElement.innerHTML;
    });
}

var page = new WebPage();
page.settings.userAgent = "PhantomJS";
//page.onConsoleMessage = function (msg) { console.log(msg); };
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        phantom.exit();
    } else {
        // Give the AngularJS app time to render before dumping the HTML
        setTimeout(function () {
            console.log(getDocumentElementAsHTML(page));
            phantom.exit();
        }, 10000);
    }
});
PS: Waiting a fixed 10 seconds is not always a great solution; I used to periodically test for the existence of the elements I wanted to read, to be sure the JavaScript had finished loading, instead.
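For instance, a minimal polling sketch of that idea (assuming PhantomJS 1.6+, where page.evaluate accepts extra arguments; the .game-row selector is a hypothetical placeholder for whatever element tells you the Angular app has rendered):
// Poll the page until `selector` exists (or timeoutMs elapses), then call onReady(found)
function waitForSelector(page, selector, timeoutMs, onReady) {
    var start = Date.now();
    var timer = setInterval(function () {
        var found = page.evaluate(function (sel) {
            return document.querySelector(sel) !== null;
        }, selector);
        if (found || Date.now() - start > timeoutMs) {
            clearInterval(timer);
            onReady(found);
        }
    }, 250);
}

// Usage inside page.open's success branch, replacing the fixed setTimeout:
waitForSelector(page, '.game-row', 10000, function (found) {
    if (found) { console.log(getDocumentElementAsHTML(page)); }
    phantom.exit();
});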
Source: grey-hat stuff I did in the past
I'd say you'd want to look at http://phantomjs.org/, http://www.slimerjs.org/, and/or http://casperjs.org/.
Phantom & Slimer give you API access to WebKit and Gecko respectively. Casper adds a more user-friendly API on top.
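For a taste of the Casper flavor, here's a minimal sketch (the URL and the .score-table selector are hypothetical placeholders):
var casper = require('casper').create();

casper.start('http://example.com/matches', function () {
    // waitForSelector polls the DOM until the element appears (or a timeout is hit)
    this.waitForSelector('.score-table', function () {
        this.echo(this.getHTML('.score-table')); // print just the part we care about
    });
});

casper.run();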
Scenario:
I have an application made with AngularJS and Ionic for Cordova 3.5.
This application loads, through an iframe, a web page that takes the user through a step-by-step form. This web page is on another site.
The HTML is:
<div id="IframeContainer">
<iframe src="URL" style="width:100%;height:90%" onLoad="checkforclose(this);"></iframe>
</div>
This step-by-step form returns a result that the Cordova application needs in order to know what happened in the form. It can return JSON, text/plain, or even an HTML page that auto-posts to another site (this is linked with this unanswered question: Post and redirect FROM Web Api).
That said, in my Cordova application I have a JavaScript function that closes the iframe and takes back control of my application by detecting whether the URL contains the word "close". This is the code:
<script type="text/javascript">
    function checkforclose(pageURL) {
        var urlFrame = pageURL.contentWindow.location;
        if (urlFrame.href.indexOf('close') > -1) {
            window.location = "#/employees/";
        }
    }
</script>
Question:
Avoiding CORS (so I think I can't read the iframe content on load, or am I wrong?),
without using jQuery (AngularJS is welcome, plain JavaScript even more),
and taking back control of the application afterwards:
How can I get the data returned by the step-by-step external form?
UPDATE 1:
I tried coding an "onload" reader (CORS errors) and posting to a Cordova HTML page, but without any usable result.
A possible solution is web messaging (cross-document messaging). Here's a blog post where someone used this method to gain access to a mobile device's camera from an external page loaded in an iframe. Although that person had the opposite goal (getting data from Cordova to a page loaded in an iframe), they were able to accomplish cross-domain communication between a page in an iframe and Cordova, which is what I believe you are trying to do.
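A minimal sketch of the idea (the result payload and the origin shown are assumptions you'd adapt to your form):
// In the external form's final page (inside the iframe), when the form completes:
var result = { status: 'completed', answers: { /* ... */ } }; // hypothetical payload
// '*' is fine for a sketch; in production pass your app's origin instead
window.parent.postMessage(JSON.stringify(result), '*');

// In the Cordova page that hosts the iframe:
window.addEventListener('message', function (event) {
    // if (event.origin !== 'https://the-form-site.example') return; // verify the sender
    var data = JSON.parse(event.data);
    console.log('Form returned:', data);
    window.location = "#/employees/"; // take back control of the app
}, false);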
I am going to create crawlable AJAX with jQuery. How do I do it? Before, I had a website that used jQuery AJAX for searching my website, but nothing got indexed.
This is the new way that I use:
<a id="linkA" href="...">page 1</a>
And then I show the result by AJAX and don't allow the link to navigate:
javascript
$("body").on("click","#linkA",function(e){
e.preventDefault();
var href=$(this).attr('href');
$.ajax({
type:"POST",
url:"ajax/return.php",
data:({page:href}),
success:function(data){
$("body").html(data);
}
})
});
My questions:
1- Is the way that I am using correct?
2- Is this way crawlable?
I think the way you are using is correct, and it's a good way, but Google has an article about Making AJAX Applications Crawlable.
As long as the links you provide in the href attribute are also rendered correctly by the server when the browser accesses them directly, you're on the safe side. You should also use the HTML5 History API and pushState to reflect the URL of the page currently shown, so visitors can use their browser's history buttons, share links to pages, and bookmark them.
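A minimal sketch of how that could look in your click handler (this builds on your snippet; the state object and popstate handling are assumptions):
$("body").on("click", "#linkA", function (e) {
    e.preventDefault();
    var href = $(this).attr('href');
    $.post("ajax/return.php", {page: href}, function (data) {
        $("body").html(data);
        // reflect the loaded page in the address bar so history and bookmarks work
        history.pushState({page: href}, "", href);
    });
});

// reload the right content when the visitor uses the back/forward buttons
window.addEventListener("popstate", function (e) {
    if (e.state && e.state.page) {
        $.post("ajax/return.php", {page: e.state.page}, function (data) {
            $("body").html(data);
        });
    }
});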
Google and the other search engines normally won't execute your JavaScript; they will try to access the links you provide directly.
If your site has heavy scripts to load, or static parts like a header, footer, or menu, hijacking the links and loading only the needed content via JavaScript is a great way to improve loading / rendering speed.
I've been looking into ways to improve SEO for AngularJS apps that are hosted on a CDN like Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there (PhantomJS, prerender.io, seo.js, etc.) rely on a backend to recognise the ?_escaped_fragment_ URL that the crawler generates and then fetch the relevant page from elsewhere. Even grunt-html-snapshot ultimately needs you to do this, even though you generate the snapshot pages ahead of time.
This solution basically relies on using Cloudflare as a reverse proxy, which seems a bit of a waste given that most of the security apparatus their service provides is totally redundant for a static site. Setting up a reverse proxy myself, as suggested here, also seems problematic: it would require either (i) routing all the AngularJS apps I need static HTML for through one proxy server, which would potentially hamper performance, or (ii) setting up a separate proxy server for each app, at which point I may as well set up a backend, which isn't affordable at the scale I am working at.
Is there any way of doing this, or are statically hosted AngularJS apps with great SEO basically impossible until Google updates its crawlers?
Reposted on webmasters following John Conde's comments.
Actually this is a task that is indeed very troublesome, but I have managed to get SEO working nicely with my AngularJS SPA site (hosted on AWS S3) at http://www.jobbies.co/. The main idea is to pre-generate the content and populate it into the HTML. The templates will still be loaded when the page loads, and the pre-rendered content will then be replaced.
You can read more about my solution at http://www.ericluwj.com/2015/11/17/seo-for-angularjs-on-s3.html, but do note that there are a lot of conditions.
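As a rough sketch of that replacement step (the app-root container id and the user-agent test are assumptions for illustration, not taken verbatim from the linked post):
<script>
  // If a real browser (not a crawler) is loading the page, drop the
  // pre-rendered markup so the Angular app can render fresh content on top.
  if (!/bot|crawler|spider/i.test(navigator.userAgent)) {
    document.getElementById('app-root').innerHTML = '';
  }
</script>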
Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice URLs (no #) and everything, driven by Grunt with one simple command to run after each build:
grunt seo
It's still a puzzle of workarounds, but it works, and it's the best you can do. Thanks to @ericluwj and his blog post, which inspired me.
Overview
The goal & url structure
The goal is to create one HTML file per state in your Angular app. The only major assumption is that you remove the '#' from your URLs by using the HTML5 History API (which you should do!) and that all your paths are absolute or use Angular states. There are plenty of posts explaining how to do so.
URLs end with a trailing slash, like this:
http://yourdomain.com/page1/
Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off-topic here. I also made sure that every language has a different state and a different URL.
The SEO logic
Our goal is that when someone reaches your website through an HTTP request:
If it's a search engine crawler: keep it on the page containing the required HTML. The page also contains the Angular logic (e.g. to start your app), but the crawler cannot execute that, so it is intentionally stuck with the HTML you served and will index it.
For normal humans and intelligent machines: make sure Angular gets activated, erase the generated HTML, and start your app normally.
The grunt tasks
Here we go with the grunt tasks:
//grunt plugins you will need:
grunt.loadNpmTasks('grunt-prerender');
grunt.loadNpmTasks('grunt-replace');
grunt.loadNpmTasks('grunt-wait');
grunt.loadNpmTasks('grunt-aws-s3');

//The grunt tasks in the right order
grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
  grunt.task.run([
    'concurrent:seo' //Step 1: in parallel, launch the server and perform the so-called seotasks
  ]);
});

grunt.registerTask('seotasks', [
  'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
  'wait', //wait 1.5 sec to make sure the server is launched
  'prerender', //Step 2: create a snapshot of your website
  'replace', //Step 3: clean the mess
  'sitemap', //create a sitemap of your production environment
  'aws_s3:dev' //Step 4: upload
]);
Step 1: Launch local server with concurrent:seo
We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.
//grunt config
concurrent: {
  seo: [
    'connect:dist:keepalive', //launching a server and keeping it alive
    'seotasks' //now that we have a running server we can launch the SEO tasks
  ]
}
Step 2: Create a snapshot of your website with grunt prerender
The grunt-prerender plugin allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all pages of the localhost website we just launched.
//grunt config
prerender: {
  options: {
    sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
    //As you can see, the source urls allow for multiple languages, provided you have different states for different languages (see note below)
    urls: ['/', '/projects/', '/portal/', '/en/', '/projects/en/', '/portal/en/', '/fr/', '/projects/fr/', '/portal/fr/'], //this var can be dynamically updated, which is done in my case in the callback of the http task
    hashed: true,
    dest: 'dist/SEO/', //where your static html files will be stored
    timeout: 5000,
    interval: 5000, //taking a snapshot of how the page looks after 5 seconds
    phantomScript: 'basic',
    limit: 7 //number of pages processed simultaneously
  }
}
Step 3: Clean the mess with grunt replace
If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using Chrome, your directives will load twice. Therefore you need to redirect intelligent browsers to your home page before Angular gets activated (i.e., right after <head>).
//Add the script tag to redirect if we're not a search bot
replace: {
  dist: {
    options: {
      patterns: [
        {
          match: '<head>',
          //redirect to a clean page if not a bot (to your index.html at the root, basically)
          replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
          //note: your hashbang (#) will still work
        }
      ],
      usePrefix: false
    },
    files: [
      {expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''}
    ]
  }
}
Also make sure you have this code in your index.html on your ui-view element; it clears all the generated HTML directives BEFORE Angular starts.
<div ui-view autoscroll="true" id="ui-view"></div>
<!-- this script is needed to clear ui-view BEFORE angular starts to remove the static html that has been generated for search engines who cannot read angular -->
<script>
if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>
Step 4: Upload to AWS
You first upload your dist folder, which contains your build. Then you overwrite it with the files you pre-rendered and updated.
aws_s3: {
  options: {
    accessKeyId: "<%= aws.accessKeyId %>", //use the variables
    secretAccessKey: "<%= aws.secret %>", //you can also use env variables
    region: 'eu-west-1',
    uploadConcurrency: 5 //5 simultaneous uploads
  },
  dev: {
    options: {
      bucket: 'xxxxxxxx'
    },
    files: [
      {expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
      {expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true}
    ]
  }
}
That's it, you have your solution! Both humans and bots will be able to read your web app.
If you use ng-cloak in interesting ways, there could be a good solution.
I haven't tried this myself, but it should work in theory.
The solution is highly dependent on CSS, but it should work perfectly well.
For example you have three states in your angular app:
- index (pathname : #/)
- about (pathname : #/about)
- contact (pathname : #/contact)
The base case for index can be added in too, but will be tricky so I'll leave it out for now.
Make your HTML look like this:
<body>
  <div ng-app="myApp" ng-cloak>
    <!-- Your whole angular app goes here... -->
  </div>
  <div class="static">
    <div id="about" class="static-other">
      <!-- Your whole about content here... -->
    </div>
    <div id="contact" class="static-other">
      <!-- Your whole contact content here... -->
    </div>
    <div id="index" class="static-main">
      <!-- Your whole index content here... -->
    </div>
  </div>
</body>
(It's important that you put your index case last, if you want to make it more awesome.)
Next, make your CSS look something like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
Just that will probably work well enough for you anyway.
The ng-cloak directive will keep your Angular app hidden while Angular is not loaded, and will show your static content instead. Google will get your static content in the HTML.
As a bonus, end-users can also see well-styled static content while Angular loads.
You can then get more creative if you start using :target pseudo-selectors in your CSS. You can use actual links in your static content; just make them links to the various ids. So in the #index div make sure you have links to #about and #contact, as in the sketch below. Note the missing '/' in the links: HTML ids can't start with a slash.
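For instance, the static index content might contain links like these (a minimal sketch; the link texts are placeholders):
<div id="index" class="static-main">
  <!-- plain anchors targeting the static ids above; note: no leading slash -->
  <a href="#about">About us</a>
  <a href="#contact">Contact</a>
</div>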
Then make your CSS look like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
.static-other {display: none;}
.static-other:target {display: block;}
.static-other:target ~ .static-main {display: none;}
You now have a fully functioning static app WITH ROUTING that works before Angular starts up.
As an additional bonus, when Angular starts up it is smart enough to convert #about to #/about automatically, so the experience shouldn't break at all.
And, not to forget, the SEO problem is of course totally solved. I've not used this technique yet, as I've always had a server to configure, but I'm very interested in how it works out for you.
Hope this helps.
As AWS is offering Lambda@Edge as a service, we can handle this issue without Grunt or anything else (at least for basic stuff).
I tried Lambda@Edge and it worked as expected; in my case I just had all the routes rewritten to "/" in Lambda@Edge (except for files that are present in S3, like CSS, images, etc.).
The event I attached the Lambda to is the viewer request event, and the following is the code.
'use strict';

exports.handler = (event, context, callback) => {
    console.log("Event received is", JSON.stringify(event));
    console.log("Context received is", context);
    const request = event.Records[0].cf.request;
    if (request.uri.endsWith(".rt")) {
        console.log("URI is matching with .rt, the URI is ", request.uri);
        request.uri = "/";
    } else {
        console.log("URI is not ending with rt so letting it go URI is", request.uri);
    }
    console.log("Final request URI is", request.uri);
    callback(null, request);
};
Logs in CloudWatch are a little difficult to check, as the logs end up in the CloudWatch region nearest to the edge location that handled the request.
For example, though this Lambda is deployed/written for us-east, I see its logs in the ap-south region, as I am accessing CloudFront from Singapore.
I checked it with the 'Fetch as Google' option in Google Webmaster Tools, and the page is rendered and viewed as expected.
I've been looking for days to find a solution for this. As far as I know there isn't a nice solution to the problem. I hope Firebase will eventually enable user-agent redirects. If you have the money you could use MaxCDN Enterprise. They offer Edge Rules, which include redirects by user agent.
https://www.maxcdn.com/features/rules/