WebExtension how to send data from popup.html to content scripts [duplicate] - javascript

I have a text area and a button in my Chrome extension popup. I want users to input desired text in the text area. Then, once they click the button, it will inject a content script to change the text of the fields on the current page that have <textarea class="comments"> to the text that user entered in the <textarea> in the Chrome extension popup.
My question is, how can I get the text from the <textarea> in my popup.html and pass it from the popup.js to the content script?
This is what I have currently:
popup.html:
<!doctype html>
<html>
<head><title>activity</title></head>
<body>
<button id="clickactivity3">N/A</button>
<textarea rows="4" cols="10" id="comments" placeholder="N/A Reason..."></textarea>
<script src="popup.js"></script>
</body>
</html>
popup.js:
function injectTheScript3() {
chrome.tabs.query({active: true, currentWindow: true}, function(tabs) {
// query the active tab, which will be only one tab
//and inject the script in it
chrome.tabs.executeScript(tabs[0].id, {file: "content_script3.js"});
});
}
document.getElementById('clickactivity3').addEventListener('click', injectTheScript3);
content_script3:
//returns a node list which is good
var objSelectComments = document.querySelectorAll('.comments');
//desired user input how?
function setCommentsValue(objSelectComments,){
//This will be for loop to iterate among all the text fields on the page, and apply
// the text to each instance.
for (var i=0; i<objSelectComments.length; i++) {
objSelectComments[i]= //user's desired text
}

There are three general ways to do this:
Use chrome.storage.local (MDN) to pass the data (set prior to injecting your script).
Inject code prior to your script which sets a variable with the data (see detailed discussion for possible security issue).
Use message passing (MDN) to pass the data after your script is injected.
Use chrome.storage.local (set prior to executing your script)
Using this method maintains the execution paradigm you are using of injecting a script that performs a function and then exits. It also does not have the potential security issue of using a dynamic value to build executing code, which is done in the second option below.
From your popup script:
Store the data using chrome.storage.local.set() (MDN)
In the callback for chrome.storage.local.set(), call tabs.executeScript() (MDN)
var updateTextTo = document.getElementById('comments').value;
chrome.storage.local.set({
updateTextTo: updateTextTo
}, function () {
chrome.tabs.executeScript({
file: "content_script3.js"
});
});
From your content script:
Read the data from chrome.storage.local.get() (MDN)
Make the changes to the DOM
Invalidate the data in storage.local (e.g. remove the key: chrome.storage.local.remove() (MDN)).
chrome.storage.local.get('updateTextTo', function (items) {
assignTextToTextareas(items.updateTextTo);
chrome.storage.local.remove('updateTextTo');
});
function assignTextToTextareas(newText){
if (typeof newText === 'string') {
Array.from(document.querySelectorAll('textarea.comments')).forEach(el => {
el.value = newText;
});
}
}
See: Notes 1 & 2.
Inject code prior to your script to set a variable
Prior to executing your script, you can inject some code that sets a variable in the content script context which your primary script can then use:
Security issue:
The following uses "'" + JSON.stringify().replace(/\\/g,'\\\\').replace(/'/g,"\\'") + "'" to encode the data into text which will be proper JSON when interpreted as code, prior to putting it in the code string. The .replace() methods are needed to A) have the text correctly interpreted as a string when used as code, and B) quote any ' which exist in the data. It then uses JSON.parse() to return the data to a string in your content script. While this encoding is not strictly required, it is a good idea as you don't know the content of the value which you are going to send to the content script. This value could easily be something that would corrupt the code you are injecting (i.e. The user may be using ' and/or " in the text they entered). If you do not, in some way, escape the value, there is a security hole which could result in arbitrary code being executed.
From your popup script:
Inject a simple piece of code that sets a variable to contain the data.
In the callback for chrome.tabs.executeScript() (MDN), call tabs.executeScript() to inject your script (Note: tabs.executeScript() will execute scripts in the order in which you call tabs.executeScript(), as long as they have the same value for runAt. Thus, waiting for the callback of the small code is not strictly required).
var updateTextTo = document.getElementById('comments').value;
chrome.tabs.executeScript({
code: "var newText = JSON.parse('"
+ JSON.stringify(updateTextTo).replace(/\\/g,'\\\\').replace(/'/g,"\\'") + "';"
}, function () {
chrome.tabs.executeScript({
file: "content_script3.js"
});
});
From your content script:
Make the changes to the DOM using the data stored in the variable
if (typeof newText === 'string') {
Array.from(document.querySelectorAll('textarea.comments')).forEach(el => {
el.value = newText;
});
}
See: Notes 1, 2, & 3.
Use message passing (MDN) (send data after content script is injected)
This requires your content script code to install a listener for a message sent by the popup, or perhaps the background script (if the interaction with the UI causes the popup to close). It is a bit more complex.
From your popup script:
Determine the active tab using tabs.query() (MDN).
Call tabs.executeScript() (MDN)
In the callback for tabs.executeScript(), use tabs.sendMessage() (MDN) (which requires knowing the tabId), to send the data as a message.
var updateTextTo = document.getElementById('comments').value;
chrome.tabs.query({active: true, currentWindow: true}, function(tabs) {
chrome.tabs.executeScript(tabs[0].id, {
file: "content_script3.js"
}, function(){
chrome.tabs.sendMessage(tabs[0].id,{
updateTextTo: updateTextTo
});
});
});
From your content script:
Add a listener using chrome.runtime.onMessage.addListener() (MDN)
Exit your primary code, leaving the listener active. You could return a success indicator, if you choose.
Upon receiving a message with the data:
Make the changes to the DOM
Remove your runtime.onMessage listener
#3.2 is optional. You could keep your code active waiting for another message, but that would change the paradigm you are using to one where you load your code and it stays resident waiting for messages to initiate actions.
chrome.runtime.onMessage.addListener(assignTextToTextareas);
function assignTextToTextareas(message){
newText = message.updateTextTo;
if (typeof newText === 'string') {
Array.from(document.querySelectorAll('textarea.comments')).forEach(el => {
el.value = newText;
});
}
chrome.runtime.onMessage.removeListener(assignTextToTextareas); //optional
}
See: Notes 1 & 2.
Note 1: Using Array.from() is fine if you are not doing it many times and are using a browser version which has it (Chrome >= version 45, Firefox >= 32). In Chrome and Firefox, Array.from() is slow compared to other methods of getting an array from a NodeList. For a faster, more compatible conversion to an Array, you could use the asArray() code in this answer. The second version of asArray() provided in that answer is also more robust.
Note 2: If you are willing to limit your code to Chrome version >= 51 or Firefox version >= 50, Chrome has a forEach() method for NodeLists as of v51. Thus, you don't need to convert to an array. Obviously, you don't need to convert to an Array if you use a different type of loop.
Note 3: While I have previously used this method (injecting a script with the variable value) in my own code, I was reminded that I should have included it here when reading this answer.

Related

call PHP function inside JS [duplicate]

Is there a way I can run a php function through a JS function?
something like this:
<script type="text/javascript">
function test(){
document.getElementById("php_code").innerHTML="<?php
query("hello"); ?>";
}
</script>
<a href="#" style="display:block; color:#000033; font-family:Tahoma; font-size:12px;"
onclick="test(); return false;"> test </a>
<span id="php_code"> </span>
I basically want to run the php function query("hello"), when I click on the href called "Test" which would call the php function.
This is, in essence, what AJAX is for. Your page loads, and you add an event to an element. When the user causes the event to be triggered, say by clicking something, your Javascript uses the XMLHttpRequest object to send a request to a server.
After the server responds (presumably with output), another Javascript function/event gives you a place to work with that output, including simply sticking it into the page like any other piece of HTML.
You can do it "by hand" with plain Javascript , or you can use jQuery. Depending on the size of your project and particular situation, it may be more simple to just use plain Javascript .
Plain Javascript
In this very basic example, we send a request to myAjax.php when the user clicks a link. The server will generate some content, in this case "hello world!". We will put into the HTML element with the id output.
The javascript
// handles the click event for link 1, sends the query
function getOutput() {
getRequest(
'myAjax.php', // URL for the PHP file
drawOutput, // handle successful request
drawError // handle error
);
return false;
}
// handles drawing an error message
function drawError() {
var container = document.getElementById('output');
container.innerHTML = 'Bummer: there was an error!';
}
// handles the response, adds the html
function drawOutput(responseText) {
var container = document.getElementById('output');
container.innerHTML = responseText;
}
// helper function for cross-browser request object
function getRequest(url, success, error) {
var req = false;
try{
// most browsers
req = new XMLHttpRequest();
} catch (e){
// IE
try{
req = new ActiveXObject("Msxml2.XMLHTTP");
} catch(e) {
// try an older version
try{
req = new ActiveXObject("Microsoft.XMLHTTP");
} catch(e) {
return false;
}
}
}
if (!req) return false;
if (typeof success != 'function') success = function () {};
if (typeof error!= 'function') error = function () {};
req.onreadystatechange = function(){
if(req.readyState == 4) {
return req.status === 200 ?
success(req.responseText) : error(req.status);
}
}
req.open("GET", url, true);
req.send(null);
return req;
}
The HTML
test
<div id="output">waiting for action</div>
The PHP
// file myAjax.php
<?php
echo 'hello world!';
?>
Try it out: http://jsfiddle.net/GRMule/m8CTk/
With a javascript library (jQuery et al)
Arguably, that is a lot of Javascript code. You can shorten that up by tightening the blocks or using more terse logic operators, of course, but there's still a lot going on there. If you plan on doing a lot of this type of thing on your project, you might be better off with a javascript library.
Using the same HTML and PHP from above, this is your entire script (with jQuery included on the page). I've tightened up the code a little to be more consistent with jQuery's general style, but you get the idea:
// handles the click event, sends the query
function getOutput() {
$.ajax({
url:'myAjax.php',
complete: function (response) {
$('#output').html(response.responseText);
},
error: function () {
$('#output').html('Bummer: there was an error!');
}
});
return false;
}
Try it out: http://jsfiddle.net/GRMule/WQXXT/
Don't rush out for jQuery just yet: adding any library is still adding hundreds or thousands of lines of code to your project just as surely as if you had written them. Inside the jQuery library file, you'll find similar code to that in the first example, plus a whole lot more. That may be a good thing, it may not. Plan, and consider your project's current size and future possibility for expansion and the target environment or platform.
If this is all you need to do, write the plain javascript once and you're done.
Documentation
AJAX on MDN - https://developer.mozilla.org/en/ajax
XMLHttpRequest on MDN - https://developer.mozilla.org/en/XMLHttpRequest
XMLHttpRequest on MSDN - http://msdn.microsoft.com/en-us/library/ie/ms535874%28v=vs.85%29.aspx
jQuery - http://jquery.com/download/
jQuery.ajax - http://api.jquery.com/jQuery.ajax/
PHP is evaluated at the server; javascript is evaluated at the client/browser, thus you can't call a PHP function from javascript directly. But you can issue an HTTP request to the server that will activate a PHP function, with AJAX.
The only way to execute PHP from JS is AJAX.
You can send data to server (for eg, GET /ajax.php?do=someFunction)
then in ajax.php you write:
function someFunction() {
echo 'Answer';
}
if ($_GET['do'] === "someFunction") {
someFunction();
}
and then, catch the answer with JS (i'm using jQuery for making AJAX requests)
Probably you'll need some format of answer. See JSON or XML, but JSON is easy to use with JavaScript. In PHP you can use function json_encode($array); which gets array as argument.
I recently published a jQuery plugin which allows you to make PHP function calls in various ways: https://github.com/Xaxis/jquery.php
Simple example usage:
// Both .end() and .data() return data to variables
var strLenA = P.strlen('some string').end();
var strLenB = P.strlen('another string').end();
var totalStrLen = strLenA + strLenB;
console.log( totalStrLen ); // 25
// .data Returns data in an array
var data1 = P.crypt("Some Crypt String").data();
console.log( data1 ); // ["$1$Tk1b01rk$shTKSqDslatUSRV3WdlnI/"]
I have a way to make a Javascript call to a PHP function written on the page (client-side script). The PHP part 'to be executed' only occurs on the server-side on load or refreshing'. You avoid 'some' server-side resources. So, manipulating the DOM:
<?PHP
echo "You have executed the PHP function 'after loading o refreshing the page<br>";
echo "<i><br>The server programmatically, after accessing the command line resources on the server-side, copied the 'Old Content' from the 'text.txt' file and then changed 'Old Content' to 'New Content'. Finally sent the data to the browser.<br><br>But If you execute the PHP function n times your page always displays 'Old Content' n times, even though the file content is always 'New Content', which is demonstrated (proof 1) by running the 'cat texto.txt' command in your shell. Displaying this text on the client side proves (proof 2) that the browser executed the PHP function 'overflying' the PHP server-side instructions, and this is because the browser engine has restricted, unobtrusively, the execution of scripts on the client-side command line.<br><br>So, the server responds only by loading or refreshing the page, and after an Ajax call function or a PHP call via an HTML form. The rest happens on the client-side, presumably through some form of 'RAM-caching</i>'.<br><br>";
function myPhp(){
echo"The page says: Hello world!<br>";
echo "The page says that the Server '<b>said</b>': <br>1. ";
echo exec('echo $(cat texto.txt);echo "Hello world! (New content)" > texto.txt');echo "<br>";
echo "2. I have changed 'Old content' to '";
echo exec('echo $(cat texto.txt)');echo ".<br><br>";
echo "Proofs 1 and 2 say that if you want to make a new request to the server, you can do: 1. reload the page, 2. refresh the page, 3. make a call through an HTML form and PHP code, or 4. do a call through Ajax.<br><br>";
}
?>
<div id="mainx"></div>
<script>
function callPhp(){
var tagDiv1 = document.createElement("div");
tagDiv1.id = 'contentx';
tagDiv1.innerHTML = "<?php myPhp(); ?>";
document.getElementById("mainx").appendChild(tagDiv1);
}
</script>
<input type="button" value="CallPHP" onclick="callPhp()">
Note: The texto.txt file has the content 'Hello world! (Old content).
The 'fact' is that whenever I click the 'CallPhp' button I get the message 'Hello world!' printed on my page. Therefore, a server-side script is not always required to execute a PHP function via Javascript.
But the execution of the bash commands only happens while the page is loading or refreshing, never because of that kind of Javascript apparent-call raised before. Once the page is loaded, the execution of bash scripts requires a true-call (PHP, Ajax) to a server-side PHP resource.
So, If you don't want the user to know what commands are running on the server:
You 'should' use the execution of the commands indirectly through a PHP script on the server-side (PHP-form, or Ajax on the client-side).
Otherwise:
If the output of commands on the server-side is not delayed:
You 'can' use the execution of the commands directly from the page (less 'cognitive' resources—less PHP and more Bash—and less code, less time, usually easier, and more comfortable if you know the bash language).
Otherwise:
You 'must' use Ajax.

chrome.tabs.executeScript and injection only into pages that pass matches filter in manifest.json

I'm attempting to perform programmatic injection of my content script into open tabs after my Chrome extension is reloaded or updated.
My script may call the following method for an arbitrary tab:
var manifest = chrome.app.getDetails();
var scripts = manifest.content_scripts[0].js;
chrome.tabs.executeScript(nTabID, {
file: scripts[0]
});
This works, except when I try to load it into a page that was not supposed to have a content script running according to the matches clause in the manifest.json. I get the following exception:
Cannot access contents of url "actual-url-here". Extension manifest
must request permission to access this host.
So my question. Is there a way to parse the page URL and see if it matches matches clause from manifest.json and prevent calling chrome.tabs.executeScript for unnecessary URL?
PS. I understand that one "hacky" solution is to catch-and-ignore exceptions. So I'm not asking for it.
When you use chrome.tabs.query for a list of tabs, use the url attribute to filter by a match patterns. As of Chrome 39, this key also supports an array of match patterns. If you need to support Chrome 38 or earlier, or if you got the tabs without chrome.tabs.query, use the parse_match_pattern function from this answer to filter tabs. To use it, copy that function and include it within your (background) page (e.g. by pasting it before the following snippet).
var content_scripts = chrome.runtime.getManifest().content_scripts;
// Exclude CSS files - CSS is automatically inserted.
content_scripts = content_scripts.filter(function(content_script) {
return content_script.js && content_script.js.length > 0;
});
content_scripts.forEach(function(content_script) {
try {
// NOTE: an array of patterns is only supported in Chrome 39+
chrome.tabs.query({
url: content_script.matches
}, injectScripts);
} catch (e) {
// NOTE: This requires the "tabs" permission!
chrome.tabs.query({
}, function(tabs) {
var parsed = content_script.matches.map(parse_match_pattern);
var pattern = new RegExp(parsed.join('|'));
tabs = tabs.filter(function(tab) {
return pattern.test(tab.url);
});
injectScripts(tabs);
});
}
function injectScripts(tabs) {
tabs.forEach(function(tab) {
content_script.js.forEach(function(js) {
chrome.tabs.executeScript(tab.id, {
file: js
});
});
});
}
});
The previous snippet inserts a content script in all tabs. It is your responsibility to make sure that inserting the script does not conflict with an earlier/later instance of your script.
Mimicking the all_frames and match_about_blank functionality is slightly more complex, because the chrome.tabs.executeScript API cannot be used to target specific frames (crbug.com/63979). If you want to inject in frames as well, then you have to insert in every tab (because there might be a frame under the non-matching top-level frame that matches the URL) and check the page's URL within the content script.
Finally, note that your content script must also deal with the fact that it may run at a point different from "run_at". In particular, content scripts that rely on "run_at":"document_start" might fail to work because calling chrome.tabs.executeScript will cause a script to be injected far past the document_start phase.

Chrome extension regarding injected script + localstorage

I am puzzling my way through my first 'putting it all together' Chrome extension, I'll describe what I am trying to do and then how I have been going about it with some script excerpts:
I have an options.html page and an options.js script that lets the user set a url in a textfield -- this gets stored using localStorage.
function load_options() {
var repl_adurl = localStorage["repl_adurl"];
default_img.src = repl_adurl;
tf_default_ad.value = repl_adurl;
}
function save_options() {
var tf_ad = document.getElementById("tf_default_ad");
localStorage["repl_adurl"] = tf_ad.value;
}
document.addEventListener('DOMContentLoaded', function () {
document.querySelector('button').addEventListener('click', save_options);
});
document.addEventListener('DOMContentLoaded', load_options );
My contentscript injects a script 'myscript' into the page ( so it can have access to the img elements from the page's html )
var s = document.createElement('script');
s.src = chrome.extension.getURL("myscript.js");
console.log( s.src );
(document.head||document.documentElement).appendChild(s);
s.parentNode.removeChild(s);
myscript.js is supposed to somehow grab the local storage data and that determines how the image elements are manipulated.
I don't have any trouble grabbing the images from the html source, but I cannot seem to access the localStorage data. I realize it must have to do with the two scripts having different environments but I am unsure of how to overcome this issue -- as far as I know I need to have myscript.js injected from contentscript.js because contentscript.js doesn't have access to the html source.
Hopefully somebody here can suggest something I am missing.
Thank you, I appreciate any help you can offer!
-Andy
First of all: You do not need an injected script to access the page's DOM (<img> elements). The DOM is already available to the content script.
Content scripts cannot directly access the localStorage of the extension's process, you need to implement a communication channel between the background page and the content script in order to achieve this. Fortunately, Chrome offers a simple message passing API for this purpose.
I suggest to use the chrome.storage API instead of localStorage. The advantage of chrome.storage is that it's available to content scripts, which allows you to read/set values without a background page. Currently, your code looks quite manageable, so switching from the synchronous localStorage to the asynchronous chrome.storage API is doable.
Regardless of your choice, the content script's code has to read/write the preferences asynchronously:
// Example of preference name, used in the following two content script examples
var key = 'adurl';
// Example using message passing:
chrome.extension.sendMessage({type:'getPref',key:key}, function(result) {
// Do something with result
});
// Example using chrome.storage:
chrome.storage.local.get(key, function(items) {
var result = items[key];
// Do something with result
});
As you can see, there's hardly any difference between the two. However, to get the first to work, you also have to add more logic to the background page:
// Background page
chrome.extension.onMessage.addListener(function(message, sender, sendResponse) {
if (message.type === 'getPref') {
var result = localStorage.getItem(message.key);
sendResponse(result);
}
});
On the other hand, if you want to switch to chrome.storage, the logic in your options page has to be slightly rewritten, because the current code (using localStorage) is synchronous, while chrome.storage is asynchronous:
// Options page
function load_options() {
chrome.storage.local.get('repl_adurl', function(items) {
var repl_adurl = items.repl_adurl;
default_img.src = repl_adurl;
tf_default_ad.value = repl_adurl;
});
}
function save_options() {
var tf_ad = document.getElementById('tf_default_ad');
chrome.storage.local.set({
repl_adurl: tf_ad.value
});
}
Documentation
chrome.storage (method get, method set)
Message passing (note: this page uses chrome.runtime instead chrome.extension. For backwards-compatibility with Chrome 25-, use chrome.extension (example using both))
A simple and practical explanation of synchronous vs asynchronous ft. Chrome extensions

Applying DOM Manipulations to HTML and saving the result?

I have about 100 static HTML pages that I want to apply some DOM manipulations to. They all follow the same HTML structure. I want to apply some DOM manipulations to each of these files, and then save the resulting HTML.
These are the manipulations I want to apply:
# [start]
$("h1.title, h2.description", this).wrap("<hgroup>");
if ( $("h1.title").height() < 200 ) {
$("div.content").addClass('tall');
}
# [end]
# SAVE NEW HTML
The first line (.wrap()) I could easily do with a find and replace, but it gets tricky when I have to determine the calculated height of an element, which can't be easily be determined sans-JavaScript.
Does anyone know how I can achieve this? Thanks!
While the first part could indeed be solved in "text mode" using regular expressions or a more complete DOM implementation in JavaScript, for the second part (the height calculation), you'll need a real, full browser or a headless engine like PhantomJS.
From the PhantomJS homepage:
PhantomJS is a command-line tool that packs and embeds WebKit.
Literally it acts like any other WebKit-based web browser, except that
nothing gets displayed to the screen (thus, the term headless). In
addition to that, PhantomJS can be controlled or scripted using its
JavaScript API.
A schematic instruction (which I admit is not tested) follows.
In your modification script (say, modify-html-file.js) open an HTML page, modify it's DOM tree and console.log the HTML of the root element:
var page = new WebPage();
page.open(encodeURI('file://' + phantom.args[0]), function (status) {
if (status === 'success') {
var html = page.evaluate(function () {
// your DOM manipulation here
return document.documentElement.outerHTML;
});
console.log(html);
}
phantom.exit();
});
Next, save the new HTML by redirecting your script's output to a file:
#!/bin/bash
mkdir modified
for i in *.html; do
phantomjs modify-html-file.js "$1" > modified/"$1"
done
I tried PhantomJS as in katspaugh's answer, but ran into several issues trying to manipulate pages. My use case was modifying the static html output of Doxygen, without modifying Doxygen itself. The goal was to reduce delivered file size by remove unnecessary elements from the page, and convert it to HTML5. Additionally I also wanted to use jQuery to access and modify elements more easily.
Loading the page in PhantomJS
The APIs appear to have changed drastically since the accepted answer. Additionally, I used a different approach (derived from this answer), which will be important in mitigating one of the major issues I encountered.
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
// Reading the page's content into your "webpage"
// This automatically refreshes the page
page.content = fs.read(system.args[1]);
// Make all your changes here
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Preventing JavaScript from Running
My page uses Google Analytics in the footer, and now the page is modified beyond my intention, presumably because javascript was run. If we disable javascript, we can't actually use jQuery to modify the page, so that isn't an option. I've tried temporarily changing the tag, but when I do, every special character is replaced with an html-escaped equivalent, destroying all javascript code on the page. Then, I came across this answer, which gave me the following idea.
var rawPageString = fs.read(system.args[1]);
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
// Make all your changes here
rawPageString = page.content;
rawPageString = rawPageString.replace(/<script type='foo\/bar'/g, "<script");
Adding jQuery
There's actually an example on how to use jQuery. However, I thought an offline copy would be more appropriate. Initially I tried using page.includeJs as in the example, but found that page.injectJs was more suitable for the use case. Unlike includeJs, there's no <script> tag added to the page context, and the call blocks execution which simplifies the code. jQuery was placed in the same directory I was executing my script from.
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type more easily here
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Putting it All Together
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
var rawPageString = fs.read(system.args[1]);
// Prevent in-page javascript execution
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Using it from the command line:
phantomjs modify-html-file.js "input_file.html" "output_file.html"
Note: This was tested and working with PhantomJS 2.0.0 on Windows 8.1.
Pro tip: If speed matters, you should consider iterating the files from within your PhantomJS script rather than a shell script. This will avoid the latency that PhantomJS has when starting up.
you can get your modified content by $('html').html() (or a more specific selector if you don't want stuff like head tags), then submit it as a big string to your server and write the file server side.

How to use javascript to get information from the content of another page (same domain)?

Let's say I have a web page (/index.html) that contains the following
<li>
<div>item1</div>
details
</li>
and I would like to have some javascript on /index.html to load that
/details/item1.html page and extract some information from that page.
The page /details/item1.html might contain things like
<div id="some_id">
picture
map
</div>
My task is to write a greasemonkey script, so changing anything serverside is not an option.
To summarize, javascript is running on /index.html and I would
like to have the javascript code to add some information on /index.html
extracted from both /index.html and /details/item1.html.
My question is how to fetch information from /details/item1.html.
I currently have written code to extract the link (e.g. /details/item1.html)
and pass this on to a method that should extract the wanted information (at first
just .innerHTML from the some_id div is ok, I can process futher later).
The following is my current attempt, but it does not work. Any suggestions?
function get_information(link)
{
var obj = document.createElement('object');
obj.data = link;
document.getElementsByTagName('body')[0].appendChild(obj)
var some_id = document.getElementById('some_id');
if (! some_id) {
alert("some_id == NULL");
return "";
}
return some_id.innerHTML;
}
First:
function get_information(link, callback) {
var xhr = new XMLHttpRequest();
xhr.open("GET", link, true);
xhr.onreadystatechange = function() {
if (xhr.readyState === 4) {
callback(xhr.responseText);
}
};
xhr.send(null);
}
then
get_information("/details/item1.html", function(text) {
var div = document.createElement("div");
div.innerHTML = text;
// Do something with the div here, like inserting it into the page
});
I have not tested any of this - off the top of my head. YMMV
As only one page exists in the client (browser) at a time and all other (virtual/possible) pages are on the server, how will you get information from another page using JavaScript as you will have to interact with the server at some point to retrieve the second page?
If you can, integrate some AJAX-request to load the second page (and parse it), but if that's not an option, I'd say you'll have to load all pages that you want to extract information from at the same time, hide the bits you don't want to show (in hidden DIVs?) and then get your index (or whoever controls the view) to retrieve the needed information from there ... even though that sounds pretty creepy ;)
You can load the page in a hidden iframe and use normal DOM manipulation to extract the results, or get the text of the page via AJAX, grab the part between <body...>...</body>¨ and temporarily inject it into a div. (The second might fail for some exotic elements like ins.) I would expect Greasemonkey to have more powerful functions than normal Javascript for stuff like that, though - it might be worth to thumb through the documentation.

Categories