I am trying to take the value of an input, use AJAX to submit these variables to a PHP function, call PhantomJS from said PHP function with the arguments passed from AJAX, and return the result back to the HTML page. Passing the variables to the PHP file works perfectly fine; the problem arises when calling PhantomJS with my script followed by the three arguments.
This is the script on my PHP page to call PhantomJS
echo json_encode(array("abc" => shell_exec('/Applications/XAMPP/htdocs/scripts/phantom/bin/phantomjs /Applications/XAMPP/htdocs/scripts/phantom/examples/test.js 2>&1',$website)));
This is the script referenced in the shell script:
var args = require('system').args;
args.forEach(function(arg, i) {
console.log(i+'::'+arg);
});
var page = require('webpage').create();
var address = args[1];
page.open(address, function () {
console.log("Done")
});
As you can see, it should be a relatively simple process, except nothing at all is being echoed. Permissions for each file are more than adequate, and I am sure these files are executing because if I change the shell script to run hello.js, everything echoes and logs perfectly.
ALSO NOTE This script is executing on my web server, so I am not 100% certain there IS a system variable.
Any ideas?
First issue: shell_exec() takes a single argument (see the PHP documentation). Your example, however, passes the shell argument ($website) as a second argument to shell_exec() instead of appending it to the command string.
Corrected Example:
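// escapeshellarg() quotes the value so spaces or & in the URL cannot split or alter the command
$shellReturn = shell_exec("/Applications/XAMPP/htdocs/scripts/phantom/bin/phantomjs /Applications/XAMPP/htdocs/scripts/phantom/examples/test.js " . escapeshellarg($website));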
echo json_encode(array("abc" => $shellReturn));
For simplicity I left out the redirect of the error stream (2>&1). In addition, I would suggest you pass the arguments as JSON wrapped in base64 encoding. This prevents spaces or other special characters in a URL from splitting it into multiple arguments. Once PhantomJS receives the system args, use atob() to bring the JSON back and iterate over the JSON object rather than the raw string arguments.
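A minimal sketch of that base64-wrapped JSON idea, written in JavaScript so both directions can be seen side by side. The helper names encodeArgs and decodeArgs are made up for illustration, and it assumes atob()/btoa() are available as globals and that the values contain only ASCII/Latin-1 characters. On the PHP side the encoding step would be along the lines of base64_encode(json_encode($args)).

```javascript
// Hypothetical helpers; assumes global btoa()/atob() and Latin-1-safe input.

// Pack all arguments into one shell-safe token.
function encodeArgs(args) {
  return btoa(JSON.stringify(args));
}

// Reverse step, as it would run inside the PhantomJS script on system.args[1].
function decodeArgs(encoded) {
  return JSON.parse(atob(encoded));
}

var packed = encodeArgs({ url: 'http://www.example.com/a page?x=1&y=2', width: 1024 });
var unpacked = decodeArgs(packed);
console.log(unpacked.url);
```

Because base64 output only uses letters, digits, +, / and =, the whole payload travels as a single command-line argument no matter what the original values contain.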
I would also point you towards this project: https://github.com/merlinthemagic/MTS. Under the hood is an instance of PhantomJS; the project just wraps its functionality.
$myUrl = "http://www.example.com"; //replace with content of your $website variable
$windowObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow($myUrl);
//if you want the DOM or maybe a screenshot, at any point run:
$dom = $windowObj->getDom();
$imageData = $windowObj->screenshot();
Related
Is there a way I can run a php function through a JS function?
something like this:
<script type="text/javascript">
function test(){
document.getElementById("php_code").innerHTML="<?php
query("hello"); ?>";
}
</script>
<a href="#" style="display:block; color:#000033; font-family:Tahoma; font-size:12px;"
onclick="test(); return false;"> test </a>
<span id="php_code"> </span>
I basically want to run the php function query("hello"), when I click on the href called "Test" which would call the php function.
This is, in essence, what AJAX is for. Your page loads, and you add an event to an element. When the user causes the event to be triggered, say by clicking something, your Javascript uses the XMLHttpRequest object to send a request to a server.
After the server responds (presumably with output), another Javascript function/event gives you a place to work with that output, including simply sticking it into the page like any other piece of HTML.
You can do it "by hand" with plain Javascript, or you can use jQuery. Depending on the size of your project and particular situation, it may be simpler to just use plain Javascript.
Plain Javascript
In this very basic example, we send a request to myAjax.php when the user clicks a link. The server will generate some content, in this case "hello world!". We will put it into the HTML element with the id output.
The javascript
// handles the click event for link 1, sends the query
function getOutput() {
getRequest(
'myAjax.php', // URL for the PHP file
drawOutput, // handle successful request
drawError // handle error
);
return false;
}
// handles drawing an error message
function drawError() {
var container = document.getElementById('output');
container.innerHTML = 'Bummer: there was an error!';
}
// handles the response, adds the html
function drawOutput(responseText) {
var container = document.getElementById('output');
container.innerHTML = responseText;
}
// helper function for cross-browser request object
function getRequest(url, success, error) {
var req = false;
try{
// most browsers
req = new XMLHttpRequest();
} catch (e){
// IE
try{
req = new ActiveXObject("Msxml2.XMLHTTP");
} catch(e) {
// try an older version
try{
req = new ActiveXObject("Microsoft.XMLHTTP");
} catch(e) {
return false;
}
}
}
if (!req) return false;
if (typeof success != 'function') success = function () {};
if (typeof error!= 'function') error = function () {};
req.onreadystatechange = function(){
if(req.readyState == 4) {
return req.status === 200 ?
success(req.responseText) : error(req.status);
}
}
req.open("GET", url, true);
req.send(null);
return req;
}
The HTML
<a href="#" onclick="return getOutput();">test</a>
<div id="output">waiting for action</div>
The PHP
<?php
// file myAjax.php
echo 'hello world!';
?>
Try it out: http://jsfiddle.net/GRMule/m8CTk/
With a javascript library (jQuery et al)
Arguably, that is a lot of Javascript code. You can shorten that up by tightening the blocks or using more terse logic operators, of course, but there's still a lot going on there. If you plan on doing a lot of this type of thing on your project, you might be better off with a javascript library.
Using the same HTML and PHP from above, this is your entire script (with jQuery included on the page). I've tightened up the code a little to be more consistent with jQuery's general style, but you get the idea:
// handles the click event, sends the query
function getOutput() {
$.ajax({
url:'myAjax.php',
// success runs only when the request succeeds (complete would also run after an error)
success: function (data) {
$('#output').html(data);
},
error: function () {
$('#output').html('Bummer: there was an error!');
}
});
return false;
}
Try it out: http://jsfiddle.net/GRMule/WQXXT/
Don't rush out for jQuery just yet: adding any library is still adding hundreds or thousands of lines of code to your project just as surely as if you had written them. Inside the jQuery library file, you'll find similar code to that in the first example, plus a whole lot more. That may be a good thing, it may not. Plan, and consider your project's current size and future possibility for expansion and the target environment or platform.
If this is all you need to do, write the plain javascript once and you're done.
Documentation
AJAX on MDN - https://developer.mozilla.org/en/ajax
XMLHttpRequest on MDN - https://developer.mozilla.org/en/XMLHttpRequest
XMLHttpRequest on MSDN - http://msdn.microsoft.com/en-us/library/ie/ms535874%28v=vs.85%29.aspx
jQuery - http://jquery.com/download/
jQuery.ajax - http://api.jquery.com/jQuery.ajax/
PHP is evaluated at the server; javascript is evaluated at the client/browser, thus you can't call a PHP function from javascript directly. But you can issue an HTTP request to the server that will activate a PHP function, with AJAX.
The only way to execute PHP from JS is AJAX.
You can send data to the server (e.g. GET /ajax.php?do=someFunction)
then in ajax.php you write:
function someFunction() {
echo 'Answer';
}
if ($_GET['do'] === "someFunction") {
someFunction();
}
and then catch the answer with JS (I'm using jQuery for making AJAX requests).
You will probably need some format for the answer, such as JSON or XML; JSON is the easier one to use with JavaScript. In PHP you can use json_encode($array), which takes an array as its argument.
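As a rough, library-free illustration of the client side of this (not the jQuery code the answer refers to), the two pieces that matter are building the request URL with the do parameter and defensively parsing the JSON that comes back. The helper names buildCallUrl and parseAnswer are hypothetical, and the sketch assumes URLSearchParams is available (modern browsers, Node).

```javascript
// Hypothetical helpers for illustration only.

// Build e.g. /ajax.php?id=7&do=someFunction
function buildCallUrl(endpoint, functionName, params) {
  var query = new URLSearchParams(params || {});
  query.set('do', functionName); // tells ajax.php which function to run
  return endpoint + '?' + query.toString();
}

// Parse the body produced by json_encode() without throwing on bad input.
function parseAnswer(responseText) {
  try {
    return { ok: true, data: JSON.parse(responseText) };
  } catch (e) {
    return { ok: false, error: 'Response was not valid JSON' };
  }
}

console.log(buildCallUrl('/ajax.php', 'someFunction', { id: 7 }));
console.log(parseAnswer('{"answer":"Answer"}').data.answer);
```

The resulting URL can be handed to whatever request helper you already use, and the parsed data then goes into the page.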
I recently published a jQuery plugin which allows you to make PHP function calls in various ways: https://github.com/Xaxis/jquery.php
Simple example usage:
// Both .end() and .data() return data to variables
var strLenA = P.strlen('some string').end();
var strLenB = P.strlen('another string').end();
var totalStrLen = strLenA + strLenB;
console.log( totalStrLen ); // 25
// .data Returns data in an array
var data1 = P.crypt("Some Crypt String").data();
console.log( data1 ); // ["$1$Tk1b01rk$shTKSqDslatUSRV3WdlnI/"]
I have a way to make a Javascript call to a PHP function written on the page (client-side script). The PHP part 'to be executed' only runs on the server side, when the page is loaded or refreshed. You avoid 'some' server-side resources. So, manipulating the DOM:
<?PHP
echo "You have executed the PHP function after loading or refreshing the page<br>";
echo "<i><br>The server programmatically, after accessing the command line resources on the server-side, copied the 'Old Content' from the 'texto.txt' file and then changed 'Old Content' to 'New Content'. Finally sent the data to the browser.<br><br>But if you execute the PHP function n times your page always displays 'Old Content' n times, even though the file content is always 'New Content', which is demonstrated (proof 1) by running the 'cat texto.txt' command in your shell. Displaying this text on the client side proves (proof 2) that the browser executed the PHP function 'overflying' the PHP server-side instructions, and this is because the browser engine has restricted, unobtrusively, the execution of scripts on the client-side command line.<br><br>So, the server responds only by loading or refreshing the page, and after an Ajax call function or a PHP call via an HTML form. The rest happens on the client-side, presumably through some form of 'RAM-caching'</i>.<br><br>";
function myPhp(){
echo"The page says: Hello world!<br>";
echo "The page says that the Server '<b>said</b>': <br>1. ";
echo exec('echo $(cat texto.txt);echo "Hello world! (New content)" > texto.txt');echo "<br>";
echo "2. I have changed 'Old content' to '";
echo exec('echo $(cat texto.txt)');echo ".<br><br>";
echo "Proofs 1 and 2 say that if you want to make a new request to the server, you can do: 1. reload the page, 2. refresh the page, 3. make a call through an HTML form and PHP code, or 4. do a call through Ajax.<br><br>";
}
?>
<div id="mainx"></div>
<script>
function callPhp(){
var tagDiv1 = document.createElement("div");
tagDiv1.id = 'contentx';
tagDiv1.innerHTML = "<?php myPhp(); ?>";
document.getElementById("mainx").appendChild(tagDiv1);
}
</script>
<input type="button" value="CallPHP" onclick="callPhp()">
Note: The texto.txt file has the content 'Hello world! (Old content)'.
The 'fact' is that whenever I click the 'CallPhp' button I get the message 'Hello world!' printed on my page. Therefore, a server-side script is not always required to execute a PHP function via Javascript.
But the execution of the bash commands only happens while the page is loading or refreshing, never because of that kind of Javascript apparent-call raised before. Once the page is loaded, the execution of bash scripts requires a true-call (PHP, Ajax) to a server-side PHP resource.
So, If you don't want the user to know what commands are running on the server:
You 'should' use the execution of the commands indirectly through a PHP script on the server-side (PHP-form, or Ajax on the client-side).
Otherwise:
If the output of commands on the server-side is not delayed:
You 'can' use the execution of the commands directly from the page (less 'cognitive' resources—less PHP and more Bash—and less code, less time, usually easier, and more comfortable if you know the bash language).
Otherwise:
You 'must' use Ajax.
I am trying to get the HTML (ie what you see initially when the page completes loading) for some web-page URI. Stripping out all error checking and assuming static HTML, it's a single line of code:
function GetDisplayedHTML($uri) {
return file_get_contents($uri);
}
This works fine for static HTML, and is easy to extend by simple parsing, if the page has static file dependencies/references. So tags like <script src="XXX">, <a href="XXX">, <img src="XXX">, and CSS, can also be detected and the dependencies returned in an array, if they matter.
But what about web pages where the HTML is dynamically created using events/AJAX? For example suppose the HTML for the web page is just a brief AJAX-based or OnLoad script that builds the visible web page? Then parsing alone won't work.
I guess what I need is a way from within PHP, to open and render the http response (ie the HTML we get at first) via some javascript engine or browser, and once it 'stabilises', capture the HTML (or static DOM?) that's now present, which will be what the user's actually seeing.
Since such a webpage could continually change itself, I'd have to define "stable" (OnLoad or after X seconds?). I also don't need to capture any timer or async event states (ie "things set in motion that might cause web page updates at some future time"). I only need enough of the DOM to represent the static appearance the user could see, at that time.
What would I need to do, to achieve this programmatically in PHP?
To render a page with JavaScript you need a browser engine. PhantomJS was created for tasks like this. Here is a simple script to run with PhantomJS:
var webPage = require('webpage');
var page = webPage.create();
var system = require('system');
var args = system.args;
if (args.length === 1) {
console.log('First argument must be page URL!');
phantom.exit(1); // without this, PhantomJS keeps running and the caller hangs
} else {
page.open(args[1], function (status) {
window.setTimeout(function () { //Wait for scripts to run
var content = page.content;
console.log(content);
phantom.exit();
}, 500);
});
}
It returns resulting HTML to console output.
You can run it from console like this:
./phantomjs.exe render.js http://yandex.ru
Or you can use PHP to run it:
<?php
$path = dirname(__FILE__);
$html = shell_exec($path . DIRECTORY_SEPARATOR . 'phantomjs.exe render.js http://phantomjs.org/');
echo htmlspecialchars($html);
My PHP code assumes that the PhantomJS executable is in the same directory as the PHP script.
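The fixed 500 ms delay in the script above is a guess at when the page has "stabilised". One possible alternative, sketched here under assumptions, is to sample the rendered HTML repeatedly and treat the page as stable once several consecutive samples are identical. The helper name createStabilityTracker and the idea of comparing whole snapshots are my own illustration, not part of PhantomJS.

```javascript
// Hypothetical helper: reports "stable" once the same snapshot has been
// observed requiredRepeats times in a row.
function createStabilityTracker(requiredRepeats) {
  var last = null;
  var repeats = 0;
  return {
    observe: function (snapshot) {
      if (snapshot === last) {
        repeats += 1;
      } else {
        // Content changed: start counting again from this snapshot
        last = snapshot;
        repeats = 1;
      }
      return repeats >= requiredRepeats;
    }
  };
}

var tracker = createStabilityTracker(3);
console.log(tracker.observe('<p>loading</p>'));
```

In the PhantomJS script, one could call tracker.observe(page.content) from a repeating window.setTimeout and print the content once it returns true, with an overall time limit as a fallback for pages that never stop changing.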
I'm developing an application using AngularJS. Everything seemed fine until I ran into something that is giving me a headache: SEO.
From many references, I found out that getting AJAX content crawled and indexed by Googlebot or Bingbot 'is not that easy', since the crawlers don't render JavaScript.
Currently I need a solution using PHP. I use the PHP Slim Framework, so my main file is index.php, which contains a function to echo the content of my index.html. My question is:
Is it possible to make a snapshot of rendered Javascript in HTML?
My strategy is:
If the request query string contains _escaped_fragment_, the application will generate a snapshot and give that snapshot as response instead of the exact file.
Any help would be appreciated. Thanks.
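To make the strategy concrete: under the AJAX crawling scheme, a crawler requests ?_escaped_fragment_=VALUE in place of the #!VALUE URL a user would see, so the server has to map the former back to the latter before rendering a snapshot. The check itself is language-independent; it is sketched here in JavaScript, with toPrettyUrl as a hypothetical helper name and the standard URL class assumed to be available.

```javascript
// Hypothetical helper: returns the "#!" URL to snapshot, or null when the
// request is a normal one and the regular index.html should be served.
function toPrettyUrl(requestUrl) {
  var url = new URL(requestUrl);
  var fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) {
    return null; // not a crawler snapshot request
  }
  url.searchParams.delete('_escaped_fragment_');
  url.hash = '#!' + fragment;
  return url.toString();
}

console.log(toPrettyUrl('http://example.com/?_escaped_fragment_=/about'));
console.log(toPrettyUrl('http://example.com/'));
```

The URL returned for a crawler request is what would be handed to the rendering step, and its output is returned as the response.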
After a lot of searching and research, I finally managed to solve my problem by combining PHP with PhantomJS (version 2.0). I use PHP's exec() function to run PhantomJS, and a JavaScript file to get the content of the targeted URL. Here are the snippets:
index.php
// Let's assume that you have a bin folder under your root folder directory which contains phantomjs.exe and content.js
$script = __DIR__ ."/bin/content.js";
$target = "http://www.kincir.com"; // target URL
$cmd = __DIR__."/bin/phantomjs.exe $script $target";
exec($cmd, $output);
return implode("", $output);
content.js
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();
var url = system.args[1]; // This will get the second argument from $cmd, in this example, it will be the value of $target on index.php which is "http://www.kincir.com"
page.open(url, function (status) {
page.onLoadFinished = function () { // Make sure to return the content of the page once the page is finish loaded
var content = page.content;
console.log(content);
phantom.exit();
};
});
I recently published a project that gives PHP access to a browser. Get it here: https://github.com/merlinthemagic/MTS. It also relies on PhantomJS.
After downloading and setup you would simply use the following code:
$myUrl = "http://www.example.com";
$windowObj = \MTS\Factories::getDevices()->getLocalHost()->getBrowser('phantomjs')->getNewWindow($myUrl);
//now you can either retrieve the DOM and parse it, like this:
$domData = $windowObj->getDom();
//this project also lets you manipulate the live page. Click, fill forms, submit etc.
I wrote a Perl script that handles some data automatically. However, I face a problem when I try to call the script from my Thunderbird extension, which is naturally written in JavaScript.
var file = Components.classes["#mozilla.org/file/local;1"]
.createInstance(Components.interfaces.nsILocalFile);
file.initWithPath("/usr/bin/perl");
// create an nsIProcess
var process = Components.classes["#mozilla.org/process/util;1"]
.createInstance(Components.interfaces.nsIProcess);
process.init(file);
// Run the process.
// If first param is true, calling thread will be blocked until
// called process terminates.
// Params are used to pass command-line arguments
// to the process
var args = ["package/myperlscript.pl", "some arguments"];
process.run(true, args, args.length);
I guess I have the Perl script placed at the wrong location. I tried various ones, but I could not get it to work. If that is my major mistake, what is the base path that the JavaScript file expects?
I have about 100 static HTML pages that all follow the same HTML structure. I want to apply some DOM manipulations to each of these files, and then save the resulting HTML.
These are the manipulations I want to apply:
# [start]
$("h1.title, h2.description", this).wrap("<hgroup>");
if ( $("h1.title").height() < 200 ) {
$("div.content").addClass('tall');
}
# [end]
# SAVE NEW HTML
The first line (.wrap()) I could easily do with a find and replace, but it gets tricky when I have to determine the calculated height of an element, which can't easily be determined without JavaScript.
Does anyone know how I can achieve this? Thanks!
While the first part could indeed be solved in "text mode" using regular expressions or a more complete DOM implementation in JavaScript, for the second part (the height calculation), you'll need a real, full browser or a headless engine like PhantomJS.
From the PhantomJS homepage:
PhantomJS is a command-line tool that packs and embeds WebKit.
Literally it acts like any other WebKit-based web browser, except that
nothing gets displayed to the screen (thus, the term headless). In
addition to that, PhantomJS can be controlled or scripted using its
JavaScript API.
A schematic instruction (which I admit is not tested) follows.
In your modification script (say, modify-html-file.js) open an HTML page, modify its DOM tree, and console.log the HTML of the root element:
var page = new WebPage();
page.open(encodeURI('file://' + phantom.args[0]), function (status) {
if (status === 'success') {
var html = page.evaluate(function () {
// your DOM manipulation here
return document.documentElement.outerHTML;
});
console.log(html);
}
phantom.exit();
});
Next, save the new HTML by redirecting your script's output to a file:
#!/bin/bash
mkdir -p modified
for i in *.html; do
# use the loop variable (not $1) and pass an absolute path for the file:// URL
phantomjs modify-html-file.js "$PWD/$i" > "modified/$i"
done
I tried PhantomJS as in katspaugh's answer, but ran into several issues trying to manipulate pages. My use case was modifying the static HTML output of Doxygen, without modifying Doxygen itself. The goal was to reduce delivered file size by removing unnecessary elements from the page, and to convert it to HTML5. Additionally, I wanted to use jQuery to access and modify elements more easily.
Loading the page in PhantomJS
The APIs appear to have changed drastically since the accepted answer. Additionally, I used a different approach (derived from this answer), which will be important in mitigating one of the major issues I encountered.
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
// Reading the page's content into your "webpage"
// This automatically refreshes the page
page.content = fs.read(system.args[1]);
// Make all your changes here
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Preventing JavaScript from Running
My page uses Google Analytics in the footer, and now the page is modified beyond my intention, presumably because javascript was run. If we disable javascript, we can't actually use jQuery to modify the page, so that isn't an option. I've tried temporarily changing the tag, but when I do, every special character is replaced with an html-escaped equivalent, destroying all javascript code on the page. Then, I came across this answer, which gave me the following idea.
var rawPageString = fs.read(system.args[1]);
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
// Make all your changes here
rawPageString = page.content;
rawPageString = rawPageString.replace(/<script type='foo\/bar'/g, "<script");
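The two replace calls above only cover script tags written in exactly those two forms. A more general variant, offered as an untested-against-PhantomJS sketch, neutralises every opening script tag and parks any existing type attribute so it can be put back afterwards. The helper names disableScripts and restoreScripts and the data-orig-type attribute are my own inventions; the restore step accepts either quote style in case the page is re-serialised with different quotes.

```javascript
// Hypothetical helpers; "foo/bar" is the same placeholder type used above.

// Give every <script ...> tag a non-executable type, keeping the original
// type (if any) in a data-orig-type attribute.
function disableScripts(html) {
  return html.replace(/<script\b([^>]*)>/gi, function (match, attrs) {
    var parked = attrs.replace(/(\s)type(\s*=)/i, '$1data-orig-type$2');
    return '<script type="foo/bar"' + parked + '>';
  });
}

// Undo disableScripts: drop the placeholder type and restore the parked one.
function restoreScripts(html) {
  return html
    .replace(/(<script\b[^>]*?)\stype=(["'])foo\/bar\2/gi, '$1')
    .replace(/(<script\b[^>]*?\s)data-orig-type(\s*=)/gi, '$1type$2');
}

var sample = '<script type="text/javascript" src="a.js"></script><script>var x = 1 < 2;</script>';
console.log(disableScripts(sample));
console.log(restoreScripts(disableScripts(sample)) === sample);
```

This is still string-based, so it shares the usual limits of handling HTML with regular expressions, for example a ">" inside an attribute value.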
Adding jQuery
There's actually an example on how to use jQuery. However, I thought an offline copy would be more appropriate. Initially I tried using page.includeJs as in the example, but found that page.injectJs was more suitable for the use case. Unlike includeJs, there's no <script> tag added to the page context, and the call blocks execution which simplifies the code. jQuery was placed in the same directory I was executing my script from.
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type more easily here
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Putting it All Together
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
var rawPageString = fs.read(system.args[1]);
// Prevent in-page javascript execution
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Using it from the command line:
phantomjs modify-html-file.js "input_file.html" "output_file.html"
Note: This was tested and working with PhantomJS 2.0.0 on Windows 8.1.
Pro tip: If speed matters, you should consider iterating the files from within your PhantomJS script rather than a shell script. This will avoid the latency that PhantomJS has when starting up.
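A rough sketch of that idea, assuming the per-file work can be wrapped in a callback: a small runner that handles the files one after another inside a single process. processSequentially is a hypothetical helper, not a PhantomJS API; in a PhantomJS script the worker would hold the page.content / fs.write steps shown above, and the final callback would call phantom.exit().

```javascript
// Hypothetical helper: calls worker(item, next) for each item in order and
// done(results) once every worker has called next(result).
function processSequentially(items, worker, done) {
  var results = [];
  var index = 0;
  function next(result) {
    if (index > 0) {
      results.push(result); // result reported by the previous worker
    }
    if (index >= items.length) {
      done(results);
      return;
    }
    var item = items[index];
    index += 1;
    worker(item, next);
  }
  next();
}

// Stand-in worker; a real one would load, modify and write each file.
processSequentially(['a.html', 'b.html'], function (file, next) {
  next('modified/' + file);
}, function (written) {
  console.log(written.join(', '));
});
```

Running the files in sequence rather than all at once keeps a single page object busy and avoids paying the PhantomJS start-up cost for each file.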
You can get your modified content with $('html').html() (or a more specific selector if you don't want stuff like head tags), then submit it as a big string to your server and write the file server side.