I was asked to add this code to my pitch pages by the vendor I sell through:
<script>
(function() {
var p = '/?vendor=2knowmysel&time=' + new Date().getTime();
var cb = document.createElement('script'); cb.type = 'text/javascript';
cb.src = '//header.clickbank.net' + p;
document.getElementsByTagName('head')[0].appendChild(cb);
})();
</script>
The code should let the page load within a header that has a red logo by clickbank. When I added the code in the head section nothing happened.
Next I tried to isolate the problem by posting on a blank html page (away from drupal) which is http://www.2knowmyself.com/testpage.htm.
But the frame doesn't show up.
What's wrong in here? Given clickbank claim the code is perfect.
Here's what your code does:
<script>
// the following line creates an anonymous immediately-invoked function
(function() {
// this will return a string named 'p', which contains the vendors ID and current time
var p = '/?vendor=2knowmysel&time=' + new Date().getTime();
// this creates a new 'script' tag for HTML, name it 'cb', and tells the code it's for JavaScript
var cb = document.createElement('script'); cb.type = 'text/javascript';
// this will take a url with the address and the query string, which you named 'p' earlier, and set it as the source for 'cb'
cb.src = '//header.clickbank.net' + p;
// now you'll insert 'cb' to the HTML, so it'll load the JavaScript file into it
document.getElementsByTagName('head')[0].appendChild(cb);
// the function won't run automatically upon declaration, so you use parenthesis to tell it to run
})();
</script>
Summing it up, it bassically sends the vendor's ID and current time to the given server, and expects a JavaScript file in return from it; it'll then load this file into your HTML document.
Currently, it seems not to be working because this server is getting the information from your page but not sending the JavaScript file back to it. When they adjust it to answer with the right file, you'll see it run accordingly.
EDIT: (to answer your Final Question)
Up to this point, I can see that their server isn't sending the expected JS file back at your page, so it doesn't work. If you want to check this by yourself, please use a JS debugger or a network monitor in your browser (most of the modern webbrowsers come with these built-in, try pressing F12 then reloading the page).
If you want to check whether iframes work on your server, you may contact its administrator or try to embed an iframe in the page yourself. Paste the following code into the document. If you see SO homepage, it works. Otherwise, it'll show nothing. If you see Your browser does not support iframes., then you might have to update your web browser and check it again.
<iframe src="http://stackoverflow.com" width="300" height="300">
<p>Your browser does not support iframes.</p>
</iframe>
Related
I'm trying to get PhantomJS to take an html string and then have it render the full page as a browser would (including execution of any javascript in the page source). I need the resulting html result as a string. I have seen examples of page.open which is of no use since I already have the page source in my database.
Do I need to use page.open to trigger the javascript rendering engine in PhantomJS? Is there anyway to do this all in memory (ie.. without page.open making a request or reading/writing html source from/to disk?
I have seen a similar question and answer here but it doesn't quite solve my issue. After running the code below, nothing I do seems to render the javascript in the html source string.
var page = require('webpage').create();
page.setContent('raw html and javascript in this string', 'http://whatever.com');
//everything i've tried from here on doesn't execute the javascript in the string
--------------Update---------------
Tried the following based on the suggestion below but this still does not work. Just returns the raw source that I supplied with no javascript rendered.
var page = require('webpage').create();
page.settings.localToRemoteUrlAccessEnabled = true;
page.settings.webSecurityEnabled = false;
page.onLoadFinished = function(){
var resultingHtml = page.evaluate(function() {
return document.documentElement.innerHTML;
});
console.log(resultingHtml);
//console.log(page.content); // this didn't work either
phantom.exit();
};
page.url = input.Url;
page.content = input.RawHtml;
//page.setContent(input.RawHtml, input.Url); //this didn't work either
The following works
page.onLoadFinished = function(){
console.log(page.content); // rendered content
};
page.content = "your source html string";
But you have to keep in mind that if you set the page from a string, the domain will be about:blank. So if the html loads resources from other domains, then you should run PhantomJS with the --web-security=false --local-to-remote-url-access=true commandline options:
phantomjs --web-security=false --local-to-remote-url-access=true script.js
Additionally, you may need to wait for the completion of the JavaScript execution which might be not be finished when PhantomJS thought it finished. Use either setTimeout() to wait a static amount of time or waitFor() to wait for a specific condition on a page. More robust ways to wait for a full page are given in this question: phantomjs not waiting for “full” page load
The setTimeout made it work even though I'm not excited to wait a set amount of time for each page. The waitFor approach that is discussed here doesn't work since I have no idea what elements each page might have.
var system = require('system');
var page = require('webpage').create();
page.setContent(input.RawHtml, input.Url);
window.setTimeout(function () {
console.log(page.content);
phantom.exit();
}, input.WaitToRenderTimeInMilliseconds);
Maybe not the answer you want, but using PhantomJsCloud.com you can do it easily, Here's an example: http://api.phantomjscloud.com/api/browser/v2/a-demo-key-with-low-quota-per-ip-address/?request={url:%22http://example.com%22,content:%22%3Ch1%3ENew%20Content!%3C/h1%3E%22,renderType:%22png%22,scripts:{domReady:[%22var%20hiDiv=document.createElement%28%27div%27%29;hiDiv.innerHTML=%27Hello%20World!%27;document.body.appendChild%28hiDiv%29;window._pjscMeta.scriptOutput={Goodbye:%27World%27};%22]},outputAsJson:false} The "New Content!" is the content that replaces the original content, and the "Hello World!" is placed in the page by a script.
If you want to do this via normal PhantomJs, you'll need to use the injectJs or includeJs functions, after the page content is loaded.
i'm trying to migrate from feedly as it is unacceptable (at least to me) that a search query is (fully) enabled only by a pro version.
Anyhow, to export my lengthy list of "saved for later" i found some lovely scripts:
Simple script that exports a users "Saved For Later" list out of Feedly as a JSON string and feedly-to-pocket. where i am instructed to:
You must switch off SSL (http rather than https) or jQuery won't load!
so i though i did by adding (ubuntu 14.04/chrome 40 x64)
--ssl-version-min=tls1
to my /usr/share/applications/google-chrome.desktop file (all lines starting with Exec=). However when i try to run it in the browser console i get
This request has been blocked; the content must be served over HTTPS.
So, any suggestions? (also, excuse me for noobness)
Go to your Feedly "saved" list and scroll down until all articles have loaded.
Open console and paste the following Javascript into it:
function loadJQuery() {
script = document.createElement('script');
script.setAttribute('src', '//code.jquery.com/jquery-2.1.3.js');
script.setAttribute('type', 'text/javascript');
script.onload = loadSaveAs;
document.getElementsByTagName('head')[0].appendChild(script);
}
function loadSaveAs() {
saveAsScript = document.createElement('script');
saveAsScript.setAttribute('src', 'https://cdn.rawgit.com/eligrey/FileSaver.js/5733e40e5af936eb3f48554cf6a8a7075d71d18a/FileSaver.js');
saveAsScript.setAttribute('type', 'text/javascript');
saveAsScript.onload = saveToFile;
document.getElementsByTagName('head')[0].appendChild(saveAsScript);
}
function saveToFile() {
// Loop through the DOM, grabbing the information from each bookmark
map = jQuery(".entry.quicklisted").map(function(i, el) {
var $el = jQuery(el);
var regex = /Published:(.*)(.*)/i;
return {
title: $el.attr("data-title"),
url: $el.attr("data-alternate-link"),
summary: $el.find(".summary")[0].innerHTML,
time: regex.exec($el.find("span.ago").attr("title"))[1]
};
}).get(); // Convert jQuery object into an array
// Convert to a nicely indented JSON string
json = JSON.stringify(map, undefined, 2);
var blob = new Blob([json], {type: "text/plain;charset=utf-8"});
saveAs(blob, "FeedlySavedForLater" + Date.now().toString() + ".txt");
}
loadJQuery()
Source: Feedly-Export-Save4Later
Not javascript but here is how I saved a html page with all the links and excerpts...
Open the saved pages in feedly in chrome
scroll down so they are all there
inspect any element (the top article is a good choice) so it opens the generated html
find the div id="section0_column0" node
right-click & copy it
paste into Notepad++
this html is untidy so carry on...
Do a Regex find & replace
find: (?s)<div id=.+?_main.+?>.+?(<a href=")(.+?)(").+?sans-serif">(.+?)</span>.+?</div>.+?</div>.+?</div>
replace: <div>$1$2$3>$2</a></div> <div> $4<br /> <br /></div>
save the html page.
open it in Chrome
Posted the question in the jquery forum and the solution was rather simple (remove http from attribute string)
line 34 should be
script.setAttribute('src', '//code.jquery.com/jquery-latest.min.js');
So to close the loop - for a full searchable/archived list of links not only by title/url but context also(!) you can:
Follow the instructions in https://github.com/ShockwaveNN/feedly-to-pocket (with the correction suggested by kind stranger jakecigar and you also have to register a pocket app (obtain consumer key) for the ruby script to work)
Export html list from your pocket account
Import pocket list to a Kifi library
and at last feedly-free with my personal search engine
I know I'm a bit late to the party but Ive been hunting around for a few days to find a reasonably simple solution. None of which have been listed clearly or concisely on stack overflow or elsewhere on the web. I have in fact found a much easier way to do this.
Use this java script from this Gist just as it instructs https://gist.github.com/ShockwaveNN/a0baf2ca26d1711f10e2 (Note this is referenced above and found through the link #gep shared in step one)
Once the JS as completed running it will download a text file. (It does still run successfully and on large numbers, I just exported almost 2500 articles)
Create a blank test.json in SublimeText.
Copy all entries from your exported text file into this json file
Weirdly it does seem you need to copy and past as I tried just renaming the text file and when I did that I received errors on the next step
Make sure you are signed into pocket
Go here: https://getpocket.com/import/springpad
Select your newly created test.json
Upload
Note: On large uploads the import page fails to refresh (this did not seem to be an issue as all my articles did make it into my account)
This allows you to directly upload json into your pocket account. Thus no more messing around with random supposed other fixes. I hope this make it a lot easier for everyone in the future.
I'm trying to do this by using a Tampermonkey Script. However I'm open to new approaches...
What I want to do is extract some data (data-video), from a specific <div>. However this data is not available under the HTML code of the page, but it's available under Dev Tools -> Resources and then on Frames.
Anyone knows if it's possible to get that information available under DevTools? And how can I do that?
Comparative between the two pages can be found here: "Original HTML PAGE" and "HTML PAGE under DevTools"
On the first hyperlink the id=video-canvas cannot be seen, however it's on the <object type="application/x-shockwave-flash(...)
As you state in your question the data you're looking for is available in DevTools under the "Resources" tab in the "Frames" folder. What you are looking at there is the Source HTML, similar to View Source.
The code you want, is what is getting replaced. It appears the site is using the JW Player Plugin, which is replacing the <div id="video-canvas"> with the appropriate HTML for the device / browser detected to play the video. With all of my browsers on my Mac, they are being forced to use the Flash, even when it's disabled. When using my iPhone, which can't play flash , and inspecting the page it uses JW's own custom video element. It appears that it must be storing the file location in memory since it is not in the generated markup.
I am able to run through the console in the dev tools and access their JS class. It appears i can call jwplayer._tracker , which has an object b . Object b has an object AlWv3iHmEeOzwBIxOUCPzg This object seems to be consistent each time i check between different browsers, you can use the for loop inmy first example to get the correct value but tirmming it down to .b Following that object is e and in e is the object http://i.n.jwpltx.com/v1.... really long string that appears to contain a url, so it will need to parsed.
So to get the HTML string i ran
for ( var loc in jwplayer._tracker.b.AlWv3iHmEeOzwBIxOUCPzg.e){
loc
}
so if we put that in a function to parse the string and return a value
function getSubURL(){
var initURL;
for ( var loc in jwplayer._tracker.b.AlWv3iHmEeOzwBIxOUCPzg.e){
initURL = loc;
}
//look for 'mp4:' this is in front of the file path
var start = initURL.indexOf("mp4%3A");
//look for the .mp4 for the end of the file name
var stop = initURL.indexOf(".mp4");
//grab the string between
//start+6 to remove characters used to find it
//and stop+4 to include characters used to find it
var subPath = (initURL.substring((start+6),(stop+4))).split("%2F").join("/");
return subPath;
}
//and run it
getSubURL();
it will return ciencia/astronomia/fimsol.mp4
you can run this from your console, but I am unaware of how you can use this in Tamper Monkey, but i think it gets ya a lot closer to what you wanted.
This is the approach I've used to solve my problem... I couldn't grab the code I want under Dev Tools, but I find a way to get the data from jwplayer with the function getPlaylistItem. And this is how I get the url filename of each video:
function getFilename(filename) {
var filename;
if(jwplayer().getPlaylistItem){
filename = jwplayer().getPlaylistItem()['file'];
}
else{
return filename;
}
filename = filename.substring(filename.indexOf("/mp4:") + 5);
return filename;
}
I have about 100 static HTML pages that I want to apply some DOM manipulations to. They all follow the same HTML structure. I want to apply some DOM manipulations to each of these files, and then save the resulting HTML.
These are the manipulations I want to apply:
# [start]
$("h1.title, h2.description", this).wrap("<hgroup>");
if ( $("h1.title").height() < 200 ) {
$("div.content").addClass('tall');
}
# [end]
# SAVE NEW HTML
The first line (.wrap()) I could easily do with a find and replace, but it gets tricky when I have to determine the calculated height of an element, which can't be easily be determined sans-JavaScript.
Does anyone know how I can achieve this? Thanks!
While the first part could indeed be solved in "text mode" using regular expressions or a more complete DOM implementation in JavaScript, for the second part (the height calculation), you'll need a real, full browser or a headless engine like PhantomJS.
From the PhantomJS homepage:
PhantomJS is a command-line tool that packs and embeds WebKit.
Literally it acts like any other WebKit-based web browser, except that
nothing gets displayed to the screen (thus, the term headless). In
addition to that, PhantomJS can be controlled or scripted using its
JavaScript API.
A schematic instruction (which I admit is not tested) follows.
In your modification script (say, modify-html-file.js) open an HTML page, modify it's DOM tree and console.log the HTML of the root element:
var page = new WebPage();
page.open(encodeURI('file://' + phantom.args[0]), function (status) {
if (status === 'success') {
var html = page.evaluate(function () {
// your DOM manipulation here
return document.documentElement.outerHTML;
});
console.log(html);
}
phantom.exit();
});
Next, save the new HTML by redirecting your script's output to a file:
#!/bin/bash
mkdir modified
for i in *.html; do
phantomjs modify-html-file.js "$1" > modified/"$1"
done
I tried PhantomJS as in katspaugh's answer, but ran into several issues trying to manipulate pages. My use case was modifying the static html output of Doxygen, without modifying Doxygen itself. The goal was to reduce delivered file size by remove unnecessary elements from the page, and convert it to HTML5. Additionally I also wanted to use jQuery to access and modify elements more easily.
Loading the page in PhantomJS
The APIs appear to have changed drastically since the accepted answer. Additionally, I used a different approach (derived from this answer), which will be important in mitigating one of the major issues I encountered.
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
// Reading the page's content into your "webpage"
// This automatically refreshes the page
page.content = fs.read(system.args[1]);
// Make all your changes here
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Preventing JavaScript from Running
My page uses Google Analytics in the footer, and now the page is modified beyond my intention, presumably because javascript was run. If we disable javascript, we can't actually use jQuery to modify the page, so that isn't an option. I've tried temporarily changing the tag, but when I do, every special character is replaced with an html-escaped equivalent, destroying all javascript code on the page. Then, I came across this answer, which gave me the following idea.
var rawPageString = fs.read(system.args[1]);
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
// Make all your changes here
rawPageString = page.content;
rawPageString = rawPageString.replace(/<script type='foo\/bar'/g, "<script");
Adding jQuery
There's actually an example on how to use jQuery. However, I thought an offline copy would be more appropriate. Initially I tried using page.includeJs as in the example, but found that page.injectJs was more suitable for the use case. Unlike includeJs, there's no <script> tag added to the page context, and the call blocks execution which simplifies the code. jQuery was placed in the same directory I was executing my script from.
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type more easily here
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Putting it All Together
var system = require('system');
var fs = require('fs');
var page = require('webpage').create();
var rawPageString = fs.read(system.args[1]);
// Prevent in-page javascript execution
rawPageString = rawPageString.replace(/<script type="text\/javascript"/g, "<script type='foo/bar'");
rawPageString = rawPageString.replace(/<script>/g, "<script type='foo/bar'>");
page.content = rawPageString;
page.injectJs("jquery-2.1.4.min.js");
page.evaluate(function () {
// Make all changes here
// Remove the foo/bar type
$("script[type^=foo]").removeAttr("type");
});
fs.write(system.args[2], page.content, 'w');
phantom.exit();
Using it from the command line:
phantomjs modify-html-file.js "input_file.html" "output_file.html"
Note: This was tested and working with PhantomJS 2.0.0 on Windows 8.1.
Pro tip: If speed matters, you should consider iterating the files from within your PhantomJS script rather than a shell script. This will avoid the latency that PhantomJS has when starting up.
you can get your modified content by $('html').html() (or a more specific selector if you don't want stuff like head tags), then submit it as a big string to your server and write the file server side.
Basically, everytime anybody does anything on this website, I need to retrieve a new javascript from the server (it is a complex math thing, and I am not concerned with speed).
I make the AJAX call, and stick it in the browser in a tag, like this:
getandplaceajax('id=showtotals','main'); // The first is the URL parameter, and the second is the ID of the <div> tag.
While I am doing this, I would like to re-write the java-script file, on the server, and then reload it. It is like this, in the tag.
<script type="text/javascript" src="randomfilename.js"></script>
After this I refresh my object thusly, by retreiving new XML data:
object1.loadXML("http://mywebsite/mydata.xml",
function(xml, url) {eventSource.loadXML(xml,url); });
How do I tell the browser to re-load the java-script file (force a re-load, on demand)?
I tried to interactively load the java-script into the portion of the page, but this is an iffy situation given that AJAX is asynchronous and unpredictable in the event chain.
And I am not doing page loads, so referencing the java-script with a unique number parameter (to prevent caching) isn't an option.
An extra $50 in the church offering plate next Sunday, in your name, for a solution.
Thanks in advance
Jeff
There is no need to make an Ajax call if you just need to load a JavaScript file. You can just append a new script tag to the page.
var scr = document.createElement("script");
scr.type="text/javascript";
scr.src="myFile.js";
document.body.appendChild(scr)
If you are calling the same file each time you need to force it to fetch the new file by adding a querystring value
scr.src="myFile.js?ts=" + new Date().getTime();
On the server you can request a php, servlet, .NET page, etc and have it return the JavaScript code. It does not have to be a js file. Just set the content type to be JavaScript and your browser will not care.
If you are not using any library, you can use what Facebook is doing
http://developers.facebook.com/docs/reference/javascript/
var e = document.createElement('script'); e.async = true;
e.src = document.location.protocol +
'//connect.facebook.net/en_US/all.js';
document.getElementById('fb-root').appendChild(e);
to load in a new script this way. You can also change the src of the script tag, to something like 'myfile.js?timestamp=' + (new Date()).getTime(), or you can return javascript with <script> tag around it and put it into a div or return javascript and eval them, similar to how RJS is handled.