How to remotely fetch answer of phantomjs script run on heroku? - javascript

I want to fetch some information from a website using the phantomjs/casperjs libraries, as I'm looking for the HTML result after all the JavaScript on a site has run. I worked it out with the following code from this answer:
var page = require('webpage').create();
page.open('http://www.scorespro.com/basketball/', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var p = page.evaluate(function () {
            return document.getElementsByTagName('html')[0].innerHTML;
        });
        console.log(p);
    }
    phantom.exit();
});
And I also worked out how to get phantomjs/casperjs running on Heroku by following these instructions, so when I now run heroku run phantomjs theScriptAbove.js from the OS X terminal, I get the HTML of the given basketball scores website, as expected.
But what I actually want is to get the HTML text from within a Mac desktop application; this is why I was looking for a way to run the scripts on a web server like Heroku. So, my question is:
Is there any way to retrieve the HTML text (that my script prints as its result) remotely from within my Objective-C desktop application?
Or, put another way: how can I run my script remotely and get its output via POST/GET?
p.s.
I'm comfortable with Rails, so if there's a way to do this using Rails, I just need the basic idea of what I have to do and how to get the phantomjs script to communicate with Rails. But I suspect there might be an even simpler solution ...

If I understand you correctly, you're talking about interprocess communication, so that Phantom's result (the page HTML) can somehow be retrieved by the app.
Per the PhantomJS docs, you have a couple of options:
write the HTML to a file and pick up the file in your app
run the webserver module, issue a GET to Phantom, and have the Phantom script respond with the page HTML
see http://phantomjs.org/api/webserver/

Related

Can't load views in AppScript test deployment because of XFrameOptions 'sameorigin'

I am trying to deploy a Web App using Google AppScript with multiple views. I have an appCover.html with a few buttons and each button redirects to a different page. The app cover (or homepage) loads flawlessly but when I click on any of the buttons I get the error in the console:
Refused to display
'https://script.google.com/macros/s/sriptID/dev?v=newPage'
in a frame because it set 'X-Frame-Options' to 'sameorigin'
I have looked into Google's developer resources, and all the references I found say to add XFrameOptionsMode.ALLOWALL. So I did, but still no success. This is the function that is rendering each page:
function render(file, argsObject) {
    var tmp = HtmlService.createTemplateFromFile(file);
    if (argsObject) {
        var keys = Object.keys(argsObject);
        keys.forEach(key => {
            tmp[key] = argsObject[key]; // bracket notation, not tmp.key
        });
    }
    return tmp.evaluate().setXFrameOptionsMode(HtmlService.XFrameOptionsMode.ALLOWALL);
}
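One detail worth checking in a loop like the one above: copying dynamic keys onto an object requires bracket notation, because tmp.key would always assign a single literal property named "key". A minimal plain-JS sketch (function name is mine, not from the original code):

```javascript
// Copy every key of argsObject onto tmp using bracket notation,
// so each key name is resolved dynamically.
function applyArgs(tmp, argsObject) {
  Object.keys(argsObject).forEach(function (key) {
    tmp[key] = argsObject[key];
  });
  return tmp;
}
```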
Right now I am testing the deployment, so the URL I retrieve ends in /dev, and I try to route with a parameter like /dev?v=newPage. Does it make a difference trying to access these pages in a test deployment versus the actual deployment?
It's a personal app so I'm the only user.
Any other ideas on how to solve this?
The problem is that you are using the dev version.
Deploy your web app as an exec version:
https://script.google.com/macros/s/scriptID/exec
and then build your URL as
https://script.google.com/macros/s/scriptID/exec?v=newPage
Once you deploy your web app as an exec version, the method ScriptApp.getService().getUrl() will return the correct (exec) URL, which you can use as a variable to dynamically build your redirection to different pages / views.
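For example, the links to the different views can be built from that value instead of a hard-coded /dev URL. A minimal sketch (the helper name is mine; plain JavaScript, so it also runs in Apps Script, where baseUrl would come from ScriptApp.getService().getUrl()):

```javascript
// Build a view URL from the deployed base URL and a view name,
// e.g. .../exec + 'newPage' -> .../exec?v=newPage
function viewUrl(baseUrl, view) {
  return baseUrl + '?v=' + encodeURIComponent(view);
}
```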

Shopify App - Using Script Tags with Ruby on Rails Application

I'm trying to familiarize myself with the concept of using script tags. I'm making a ruby on rails app that does something as simple as alert "Hi" when a customer visits a page. I am testing this public app on a local server and I have the shopify_app gem installed. The app has been authenticated and I have access to the store's data. I've viewed the Shopify API documentation on using script tags and I've looked at the Shopify Embedded App example that Shopify has on GitHub. The documentation details the properties of a script tag and gives examples of script tags with their properties defined, but doesn't say anything about where to place the script tag in an application, or how to configure an environment so that the js file in the script tag will go through.
I've discovered that a js file being added with a script tag will only work if the js file is hosted online, so I've uploaded the js file to google drive. I have the code for the script tag in the index action of my HomeController (the default page for the app). This is the code I'm using:
def index
  if response = request.env['omniauth.auth']
    sess = ShopifyAPI::Session.new(params[:shop], response[:credentials][:token])
    session[:shopify] = sess
    ShopifyAPI::Base.activate_session(sess)
    ShopifyAPI::ScriptTag.create(
      :event => "onload",
      :src => "https://drive.google.com/..."
    )
  end
end
I think the problem may be tied to request.env. The response is not being read as request.env['omniauth.auth'], and I believe that the response coming back as valid may be required for the script tag to go through.
The method that I tried above is from the 2nd answer given in this topic: How to develop rails app for shopify with ScriptTags.
The first answer suggested using this code:
ShopifyAPI::Base.site = token
s = ShopifyAPI::ScriptTag.create(:event => "onload", :src => "your javascript url")
However, it doesn't say where to place both lines of code in a rails application. I tried putting the second line in a js file in my rails application, but it did not work.
I don't know if I'm encountering problems because I'm running the app on a local server or if there is something missing from the configuration of my application.
I'd appreciate it if anyone could point me in the right direction.
Try putting something like this in config/initializers/shopify_app.rb
ShopifyApp.configure do |config|
  config.api_key = "xxx-xxxx-xxx-xxx"
  config.secret = "xxx-xxxx-xxx-xxx"
  config.scope = "read_orders, read_products"
  config.embedded_app = true
  config.scripttags = [
    {event: 'onload', src: 'https://yourdomain.herokuapp.com/javascripts/yourjs.js'}
  ]
end
Yes, you are correct that the js file you want to include for your script tag must be publicly available; if you are using localhost for development, look into ngrok.
Do yourself the favor of ensuring your callbacks use SSL when interacting with the Shopify API (i.e. configure your app with https://localhost/ as a callback setting in the Shopify app settings). I went through the trouble of configuring thin as the web server locally with a self-signed SSL certificate.
With a proper set up you should be able to debug why the response is failing the omniauth check.
I'm new to the Shopify API(s), but not Rails. Their documentation leaves a lot to be desired.
Good luck to you sir,

Collecting data doesn't work with Angular.js websites

I am currently writing a program that collects information from a sports website (it contains the history of some basketball matches). The problem is that the website uses Angular.js for dynamic HTML binding. Consequently, the HTML source code contains lots of template variables.
I need to find out the values of the variables in order to make my program work as I want. Is there any library or framework that could help me?
Edit: I am not limited by anything, but I prefer a web app (MEAN stack, JS frameworks with node-webkit). If it can't be done, I can also code it in C++ or Java (or extend it further to Android with the NDK or SDK).
Disclaimer: This is not grey-hat stuff. I just need to do some web-scraping.
PhantomJS is a headless browser. It will allow you to use JavaScript to get the information you want.
Details:
It will browse to the page you want, execute the JavaScript like any browser, and have access to the page as if it were displayed to a normal user using a normal browser. Using JavaScript DOM traversal, you will be able to get the information you need. This is almost the same as automating the task of opening a console in a browser and executing JavaScript that gets the information from the page.
While the example below is really simple, it can do much more than just fetch page results... it can click buttons, navigate to other pages, extract only the relevant information, capture the page as an image... Do not hesitate to refer to its Quick Start documentation to learn more.
Example script returning the complete HTML page after waiting 10 seconds for the AngularJS to have finished calculating the page:
Command line usage: phantomjs-1.9.1 this_script.js <url>
this_script.js (PhantomJS 2.0 may have different syntax in some cases):
var url = phantom.args[0]

function getDocumentElementAsHTML(page) {
    return page.evaluate(function() {
        return document.documentElement.innerHTML
    })
}

var page = new WebPage()
page.settings.userAgent = "PhantomJS"
//page.onConsoleMessage = function (msg) { console.log(msg); }
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network')
        phantom.exit()
    } else {
        setTimeout(function(){
            console.log(getDocumentElementAsHTML(page))
            phantom.exit()
        }, 10000)
    }
});
PS: Waiting 10 seconds is not always a great solution; I used to periodically test for the existence of the elements I wanted to get information from, to be sure the JavaScript had finished loading, instead.
Source: grey-hat stuff I did in the past
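The periodic test mentioned in the PS can be a small generic polling helper. This is a sketch of my own (plain JavaScript, so it also works inside a PhantomJS script; the selector in the commented usage is hypothetical):

```javascript
// Poll testFx every 100 ms until it returns true (then call onReady(null))
// or until timeoutMs elapses (then pass onReady an Error).
function waitFor(testFx, onReady, timeoutMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (testFx()) {
      clearInterval(timer);
      onReady(null);
    } else if (Date.now() - start > timeoutMs) {
      clearInterval(timer);
      onReady(new Error('waitFor timed out'));
    }
  }, 100);
}

// Inside PhantomJS, one would poll the page, e.g.:
// waitFor(function () {
//   return page.evaluate(function () {
//     return document.querySelector('.score-row') !== null; // hypothetical selector
//   });
// }, function (err) { /* print the HTML, then phantom.exit() */ }, 10000);
```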
I'd say you'd want to look at http://phantomjs.org/, http://www.slimerjs.org/, and/or http://casperjs.org/.
Phantom and Slimer give you API access to WebKit and Gecko respectively. Casper adds a more user-friendly API on top.

How do I prevent Javascript from mutating a page in Selenium? How do I download the original page source? [duplicate]

This question already has answers here:
getting the raw source from Firefox with javascript
(3 answers)
Closed 8 years ago.
I'm not using Selenium to automate testing, but to automate saving AJAX pages that inject content, even if they require prior authentication to access.
I tried
tl;dr: I tried multiple tools for downloading sites with AJAX and gave up because they were hard to work with or simply didn't work. I'm resorting to Selenium after trying out WebHTTrack (whose GUI wasn't able to start up on my Ubuntu machine + was a headache to provide authentication with in interactive-terminal mode), wget (which didn't download any of the scripts or stylesheets included on my page; see the bottom for what I tried with wget)... and then I finally gave up after a promising post on a Mozilla XULRunner AJAX scraper called Crowbar, which simply seg-faulted on me. So...
I ended up making my own broken thing in Node.js and selenium-webdriver.
My NodeJS script uses selenium-webdriver npm module which is "officially supported by the main project" to:
provide login information + do necessary button-clicking & typing for authentication
download all JS and CSS referenced on target page
download the target page with the original JS/CSS file links changed to local file paths
Now when I view my test page locally I see double of many page elements because the target site loads HTML snippets into the page each time it's loaded. I use this to download my target page right now:
var $;
var getTarget = function () {
    return driver.getPageSource().then(function (source) {
        $ = cheerio.load(source.toString());
    });
};
var targetHtmlDest = 'test.html';
var writeTarget = function () {
    fs.writeFile(targetHtmlDest, $.html());
};
driver.get(targetSite)
    .then(authenticate)
    .then(getTarget)
    .then(downloadResources) // defined elsewhere
    .then(writeTarget);
driver.quit();
The problem is that the page source I get is the already modified page source, instead of the original one. Trying to run alert("x");window.stop(); within driver.executeAsyncScript() and driver.executeScript() does nothing.
Perhaps using curl to get the page (you can pass authentication on the command line) will get you the bare source?
Otherwise, you may be able to turn off JavaScript in your test browsers to prevent JS actions from firing.

Is there a way to use a command line tool to view the JavaScript interpreted source of a web page?

Is there a command line tool that allows you to get the JavaScript-interpreted source of a web page, similar to how you can see the interpreted code in Firebug on Firefox?
I would like to use curl or a similar tool to request a web page. The catch is that I would like to see how the code has been modified by JavaScript. For instance, if the DOM has been changed or an element has been modified, I would like to see the modified version. I know Firebug does this for Firefox, but I am looking for a way to script the process.
Have you looked into tools like PhantomJS for running the tests? Many of them support running as a "headless" browser, which lets you render pages and run JS against the rendered page without having to actually run a full browser. It doesn't use curl, but I don't see why that should be a requirement.
For instance:
$ phantomjs save_page.js http://example.com
with save_page.js:
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function () {
    console.log(page.content);
    phantom.exit();
});
