Zombie.js - Downloading files support

I'm trying to handle download prompts in Zombie.js. Looking through the API, I don't see anything indicating how to do so.
Basically, what I'm trying to do is navigate through a website that requires authentication, then click a button on the site (no href) that automatically starts a download. The downloaded file will then be renamed and sent to a specified folder.
Is there a way to achieve this?

Zombie.js doesn't seem to provide a method to directly do what you want, but internally it uses request to download files, then emits a response event which you can listen for (see resources.coffee):
var browser = new Zombie();
browser.on('response', function(request, response) {
  browser.response = response;
});
browser.visit('http://test.com/', function() {
  browser.clickLink('Download the file', function() {
    // the 'response' handler should have run by now
    var fileContents = browser.response.body;
  });
});
This seems to work pretty well for me.
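For the renaming and saving part of the question, here is a minimal Node.js sketch. It assumes that `browser.response.body` from the snippet above is a Buffer or a string; `saveDownload`, the folder, and the file name are made-up names for illustration and are not part of Zombie.js.

```javascript
var fs = require('fs');
var path = require('path');

// Writes a downloaded body to targetDir/newName and returns the full path.
// `body` is assumed to be a Buffer or a string (e.g. browser.response.body).
function saveDownload(body, targetDir, newName) {
  // path.basename strips any directory part, so newName cannot escape targetDir
  var safeName = path.basename(newName);
  // Create the target folder if it does not exist yet (needs Node 10.12+)
  fs.mkdirSync(targetDir, { recursive: true });
  var fullPath = path.join(targetDir, safeName);
  fs.writeFileSync(fullPath, body);
  return fullPath;
}

// Hypothetical usage inside the clickLink callback:
// saveDownload(browser.response.body, './downloads', 'report.pdf');
```

The synchronous `fs` calls keep the sketch short; inside a long-running script the asynchronous variants may be a better fit.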

Possibly try PhantomJS: http://phantomjs.org
You should be able to manipulate the DOM to trigger the download:
https://github.com/ariya/phantomjs/wiki/Page-Automation
You might have to write a separate script to do the file renaming.

As far as I know, and from taking a detailed look at the API of Zombie.js, I'd say no, this is not possible.
I know it's not the answer you hoped for, but truth is not always nice.

Related

Is there an alternative to preprocessorScript for Chrome DevTools extensions?

I want to create a custom profiler for JavaScript as a Chrome DevTools Extension. To do so, I'd have to instrument all JavaScript code of a website (parse to AST, inject hooks, generate new source). This should have been easily possible using chrome.devtools.inspectedWindow.reload() and its parameter preprocessorScript, described here: https://developer.chrome.com/extensions/devtools_inspectedWindow.
Unfortunately, this feature has been removed (https://bugs.chromium.org/p/chromium/issues/detail?id=438626) because nobody was using it.
Do you know of any other way I could achieve the same thing with a Chrome Extension? Is there any other way I can replace an incoming JavaScript source with a changed version? This question is very specific to Chrome Extensions (and maybe extensions to other browsers). I'm asking this as a last resort before going a different route (e.g. a dedicated app).

Use the Chrome Debugging Protocol.
First, use DOMDebugger.setInstrumentationBreakpoint with eventName: "scriptFirstStatement" as a parameter to add a break-point to the first statement of each script.
Second, in the Debugger Domain, there is an event called scriptParsed. Listen to it and if called, use Debugger.setScriptSource to change the source.
Finally, call Debugger.resume each time after you edited a source file with setScriptSource.
Example in semi-pseudo-code:
// Prevent code being executed
cdp.sendCommand("DOMDebugger.setInstrumentationBreakpoint", {
  eventName: "scriptFirstStatement"
});
// Enable Debugger domain to receive its events
cdp.sendCommand("Debugger.enable");
cdp.addListener("message", (event, method, params) => {
  // Script is ready to be edited
  if (method === "Debugger.scriptParsed") {
    cdp.sendCommand("Debugger.setScriptSource", {
      scriptId: params.scriptId,
      scriptSource: `console.log("edited script ${params.url}");`
    }, (err, msg) => {
      // After editing, resume code execution.
      cdp.sendCommand("Debugger.resume");
    });
  }
});
The implementation above is not ideal. It should probably listen to the breakpoint event, get to the script using the associated event data, edit the script and then resume. Listening to scriptParsed and then resuming the debugger are two things that shouldn't be together, it could create problems. It makes for a simpler example, though.
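A rough sketch of the pause-driven variant described in the previous paragraph might look like this. It reuses the same hypothetical `cdp` wrapper as the example (`sendCommand(method, params, callback)` and `addListener("message", handler)`); `instrumentOnPause` and `transform` are names invented here. It assumes the paused script can be identified from the top call frame of the `Debugger.paused` event, and I have not verified that Chrome accepts an edit to a script that is paused at its first statement.

```javascript
// `cdp` is the hypothetical wrapper from the example above.
// `transform` is a hypothetical function: (source, url) -> instrumented source.
function instrumentOnPause(cdp, transform) {
  const urls = {}; // scriptId -> url, remembered from Debugger.scriptParsed

  cdp.addListener("message", (event, method, params) => {
    if (method === "Debugger.scriptParsed") {
      urls[params.scriptId] = params.url;
      return;
    }
    if (method !== "Debugger.paused") return;

    // The top call frame identifies the script that is about to run.
    const frame = params.callFrames && params.callFrames[0];
    if (!frame) return cdp.sendCommand("Debugger.resume");
    const scriptId = frame.location.scriptId;

    cdp.sendCommand("Debugger.getScriptSource", { scriptId }, (err, result) => {
      if (err) return cdp.sendCommand("Debugger.resume");
      cdp.sendCommand("Debugger.setScriptSource", {
        scriptId,
        scriptSource: transform(result.scriptSource, urls[scriptId])
      }, () => {
        // Resume whether or not the edit succeeded, so the page never hangs.
        cdp.sendCommand("Debugger.resume");
      });
    });
  });

  // Enable the Debugger domain first, then break on every script's first statement.
  cdp.sendCommand("Debugger.enable");
  cdp.sendCommand("DOMDebugger.setInstrumentationBreakpoint", {
    eventName: "scriptFirstStatement"
  });
}
```

Compared with the scriptParsed version, resuming here is tied to the pause that actually happened, so a script that is parsed without pausing never triggers a stray `Debugger.resume`.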

On HTTP you can use the chrome.webRequest API to redirect requests for JS code to data URLs containing the processed JavaScript code.
However, this won't work for inline script tags. It also won't work on HTTPS, since the data URLs are considered unsafe. And data URLs can't be longer than 2MB in Chrome, so you won't be able to redirect to large JS files.
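As a rough illustration of the redirect idea, the sketch below builds a data URL from already processed code and wraps it in a listener suitable for chrome.webRequest.onBeforeRequest. It assumes blocking webRequest listeners are available to the extension (the `webRequest` and `webRequestBlocking` permissions); `toScriptDataUrl`, `makeRedirectListener`, and `lookupProcessed` are hypothetical names, and the 2MB limit is simply the figure quoted above.

```javascript
// 2 MB, the Chrome limit quoted in this answer (not verified here).
var MAX_DATA_URL_LENGTH = 2 * 1024 * 1024;

// Returns a data: URL carrying `code`, or null when the URL would be too long.
function toScriptDataUrl(code) {
  var url = "data:text/javascript;charset=utf-8," + encodeURIComponent(code);
  return url.length <= MAX_DATA_URL_LENGTH ? url : null;
}

// Builds a blocking onBeforeRequest listener. `lookupProcessed` is a
// hypothetical synchronous function: request URL -> processed source,
// or undefined when the script should be left alone.
function makeRedirectListener(lookupProcessed) {
  return function (details) {
    var code = lookupProcessed(details.url);
    var dataUrl = code === undefined ? null : toScriptDataUrl(code);
    // Returning {} leaves the request untouched.
    return dataUrl ? { redirectUrl: dataUrl } : {};
  };
}

// Hypothetical registration in the extension's background page:
// chrome.webRequest.onBeforeRequest.addListener(
//   makeRedirectListener(lookup),
//   { urls: ["http://*/*"], types: ["script"] },
//   ["blocking"]
// );
```

Because the listener has to answer synchronously, the processed source must already be available when the request is intercepted, which is why the lookup is modelled as a plain function here.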
If the exact order of execution of each script isn't important you could cancel the script requests and then later send a message with the script content to the page. This would make it work on HTTPS.
To address both issues you could redirect the HTML page itself to a data URL, in order to gain more control. That has a few negative consequences though:
1. Can't reload page because URL is fixed to data URL
2. Need to add or update <base> tag to make sure stylesheet/image URLs go to the correct URL
3. Breaks ajax requests that require cookies/authentication (not sure if this can be fixed)
4. No support for localStorage on data URLs
Not sure if this works: in order to fix #1 and #4 you could consider setting up an HTML page within your Chrome extension and then using that as the base page instead of a data URL.
Another idea that may or may not work: Use chrome.debugger to modify the source code.

Collecting data doesn't work with Angular.js websites

I am currently writing a program that collects information from a sports website (it contains the history of some basketball matches). The problem is that the website uses Angular.js for dynamic HTML binding. Consequently, the HTML source code contains lots of unresolved template variables.
I need to find out the values of the variables in order to make my program work as I want. Is there any library or framework that could help me?
Edit: I am not limited by anything, but I prefer a web app (MEAN, JS frameworks with node-webkit). If it can't be done, I can also code it in C++ or Java (or extend it further to Android with NDK or SDK)
Disclaimer: This is not grey-hat stuff. I just need to do some web-scraping.

PhantomJS is a headless browser. It will allow you to use JavaScript to get the information you want.
Details:
It will browse to the page you want, execute the JavaScript like any browser, and have access to the page as if it were displayed to a normal user using a normal browser. Using JavaScript DOM traversal, you will be able to get the information you need. This is almost the same as automating the task of opening a console in a browser and executing JavaScript that gets the information from the page.
While the example below is really simple, PhantomJS can do much more than just get the page results: it can click buttons, navigate to other pages, extract only relevant information, render the page as an image... Do not hesitate to refer to its Quick Start documentation to learn more about it.
Example script returning the complete HTML page after waiting 10 seconds for the AngularJS to have finished calculating the page:
Command line usage (the script reads the URL from its first argument): phantomjs-1.9.1 this_script.js <url>
this_script.js (PhantomJS 2.0 may have different syntax in some cases):
var url = phantom.args[0]
function getDocumentElementAsHTML(page) {
  return page.evaluate(function() {
    return document.documentElement.innerHTML
  })
}
var page = new WebPage()
page.settings.userAgent = "PhantomJS"
//page.onConsoleMessage = function (msg) { console.log(msg); }
page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Unable to access network')
    phantom.exit()
  } else {
    setTimeout(function(){
      console.log(getDocumentElementAsHTML(page))
      phantom.exit()
    },10000)
  }
});
PS: Waiting 10 seconds is not always a great solution. Instead, I used to periodically test for the existence of the elements I wanted to get information from, to be sure the JavaScript had finished loading.
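The polling approach from the PS could be sketched roughly like this. `waitFor` is a helper name chosen for illustration, not a PhantomJS API, and the `.score` selector in the usage comment is made up. The code sticks to ES5 so that it should also run under PhantomJS 1.9.

```javascript
// Polls `condition` every `intervalMs` until it returns true, then calls `onReady`.
// Calls `onTimeout` instead if `timeoutMs` elapses before the condition holds.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Hypothetical usage in place of the fixed setTimeout in the script above:
// waitFor(function () {
//   return page.evaluate(function () {
//     return document.querySelector('.score') !== null;
//   });
// }, function () {
//   console.log(getDocumentElementAsHTML(page));
//   phantom.exit();
// }, function () {
//   console.log('Timed out waiting for the page');
//   phantom.exit(1);
// }, 10000, 250);
```

The timeout keeps the script from hanging forever when the element never appears, for example after a site redesign.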
Source: grey-hat stuff I did in the past

I'd say you'd want to look at http://phantomjs.org/, http://www.slimerjs.org/, and/or http://casperjs.org/.
Phantom & Slimer give you API access to Webkit and Gecko respectively. Casper adds a more user friendly API over the top.

Is it possible to use actual image from Phonegap plugin Canvas2Image?

I am using the Phonegap plugin Canvas2Image: https://github.com/devgeeks/Canvas2ImagePlugin and I was wondering if it would be possible to get the rendered image in the callback?
I am using the code out of their documentation:
window.canvas2ImagePlugin.saveImageDataToLibrary(
function(msg){
console.log(msg);
},
function(err){
console.log(err);
},
document.getElementById('myCanvas')
);
This works great to save to the library as intended, but if I want to use the actual photo.png or whatever is generated, can I do this? If not, is there a way in the callback to get the local filesystem URL to the image?

The first function, the one with the msg parameter, receives your file URL. However, that callback currently only fires on Android; iOS is not working at this time, and there is an open issue about it on the GitHub site. I haven't verified it, but someone said that the question "iPhone: How do I get the file path of an image saved with UIImageWriteToSavedPhotosAlbum()?" helps to fix the issue.
GitHub issue: https://github.com/devgeeks/Canvas2ImagePlugin/issues/38

gapi.client.load not working

I have the following code, which is supposed to be a simple example of using the google api javascript client, and simply displays the long-form URL for a hard-coded shortened URL:
<script>
  function appendResults(text) {
    var results = document.getElementById('results');
    results.appendChild(document.createElement('P'));
    results.appendChild(document.createTextNode(text));
  }
  function makeRequest() {
    console.log('Inside makeRequest');
    var request = gapi.client.urlshortener.url.get({
      'shortUrl': 'http://goo.gl/fbsS'
    });
    request.execute(function(response) {
      appendResults(response.longUrl);
    });
  }
  function load() {
    gapi.client.setApiKey('API_KEY');
    console.log('After attempting to set API key');
    gapi.client.load('urlshortener', 'v1', makeRequest);
    console.log('After attempting to load urlshortener');
  }
</script>
<script src="https://apis.google.com/js/client.js?onload=load"></script>
except with an actual API key instead of the text 'API_KEY'.
The console output is simply:
After attempting to set API key
After attempting to load urlshortener
but I never see 'Inside makeRequest', which is inside the makeRequest function, which is the callback function for the call to gapi.client.load, leading me to believe that the function is not working (or failing to complete).
Can anyone shed some light on why this might be so and how to fix it?
Thanks in advance.

After spending hours googling the problem, I found out that it happened because I was running this file from the local machine and not from a server.
When you run the above code in Chrome, you get this error in the developer console: "Unable to post message to file://. Recipient has origin null."
For some reason the JavaScript loads only when running on an actual server or something like XAMPP or WAMP.
If any expert can shed some light on why this happens, I would be really grateful to learn.
Hope this helps other newbies like me out there :D
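If installing XAMPP or WAMP feels heavy, one alternative is a tiny Node.js static server so that the page gets an http:// origin. This is only a sketch under that assumption; `createStaticServer`, the content-type table, and port 8080 are illustrative choices, not anything the Google client requires.

```javascript
var http = require('http');
var fs = require('fs');
var path = require('path');

// A few content types; anything else is served as a generic binary stream.
var CONTENT_TYPES = { '.html': 'text/html', '.js': 'text/javascript', '.css': 'text/css' };

// Creates (but does not start) an HTTP server that serves files from rootDir.
function createStaticServer(rootDir) {
  var root = path.resolve(rootDir);
  return http.createServer(function (req, res) {
    var urlPath;
    try {
      // Drop the query string and decode percent-escapes
      urlPath = decodeURIComponent(req.url.split('?')[0]);
    } catch (e) {
      res.writeHead(400);
      return res.end('Bad request');
    }
    if (urlPath === '/') urlPath = '/index.html';
    var filePath = path.join(root, urlPath);
    // Refuse paths that escape rootDir (e.g. "/../secret")
    if (filePath.indexOf(root + path.sep) !== 0) {
      res.writeHead(403);
      return res.end('Forbidden');
    }
    fs.readFile(filePath, function (err, data) {
      if (err) {
        res.writeHead(404);
        return res.end('Not found');
      }
      var type = CONTENT_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}

// Hypothetical usage: serve the current folder, then open
// http://localhost:8080/ instead of the file:// path.
// createStaticServer('.').listen(8080);
```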

Short answer (http://code.google.com/p/google-api-javascript-client/issues/detail?id=46):
The JS Client does not currently support making requests from a file:// origin.
Long answer (http://en.wikipedia.org/wiki/Same_origin_policy):
The behavior of same-origin checks and related mechanisms is not well-defined
in a number of corner cases, such as for protocols that do not have a clearly
defined host name or port associated with their URLs (file:, data:, etc.).
This historically caused a fair number of security problems, such as the
generally undesirable ability of any locally stored HTML file to access all
other files on the disk, or communicate with any site on the Internet.

how to cancel http request using javascript

I have a page with an event handler attached to an onclick event. When the event fires, it passes the contents of a textbox to a GET request. Since the URL is not in the same domain, I create a script tag and set the URL as its source, like this:
elem.onclick=fire;
function fire()
{
  var text=document.getElementById('text').value;
  var script=document.createElement("script");
  script.className="temp";
  script.src="some url"+"?param="+text;
  document.body.appendChild(script);
}
Now, if that event is fired more than once, I want to cancel all the previous GET requests (because they might still be receiving a response) and make the GET request with the latest text. For this I need to cancel the previous requests.
I tried
document.body.removeChild(script);
script.src=null;
but this does not work in Firefox (I am using Firefox 5), although it works in Google Chrome. Does anyone know if these requests can be cancelled in Firefox and, if so, how?
UPDATE
As suggested by Alfred, I used window.stop to cancel a request, but it does not cancel the request; it only hangs it. When I look in Firebug, the request appears to have been made but there is no response.

The solution is simple: for creating HTTP requests, use <img> instead of <script> element. Also you always have to change the src attribute of the same element.
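var img;
function fire()
{
  var text = document.getElementById('text').value;
  // reuse one Image object so that changing src aborts the pending request
  var im = img || (img = new Image());
  // encode the text so that characters like & or # do not break the query string
  im.src = "url"+"?param="+encodeURIComponent(text);
}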
You can verify that it actually works as follows: make the URL you request respond very slowly (you can ensure this using e.g. PHP's sleep function). Then open the Net tab in Firebug. If you click the button multiple times, you'll see that all incomplete requests are aborted.

This is entirely shooting from the hip, but if the script tag has not finished loading you can probably simply script.parentElement.removeChild( script ). That is more or less what mootools does anyway. (Technically, they replace /\s+/ with ' ' first, but that does not seem to be terribly important).

Would it be ok for you to use a JS framework? If so, MooTools has this functionality built into its Request.JSONP object

I'm not sure if this is what you're looking for, but it seems like a similar issue:
http://www.velocityreviews.com/forums/t506018-how-to-cancel-http-request-from-javascript.html

To get around the cross-domain issue, you might be able to use CORS instead (assuming you can change what's on the server):
http://hacks.mozilla.org/2009/07/cross-site-xmlhttprequest-with-cors/
If you do this, you could then use the more standard XMLHttpRequest's abort() function.
CORS is compatible with all the major modern browsers except Opera (http://caniuse.com/cors).
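Assuming the server does send CORS headers, the abort() approach could be sketched as below. `createLatestOnlyRequester` is a made-up helper name; the XMLHttpRequest constructor is passed in as a parameter only so that the sketch can be exercised outside a browser.

```javascript
// Returns a fire(url, onDone) function that keeps at most one request in flight.
// `XhrCtor` is XMLHttpRequest in a browser.
function createLatestOnlyRequester(XhrCtor) {
  var current = null; // the request that is still pending, if any
  return function fire(url, onDone) {
    if (current) current.abort(); // cancel the previous, still-pending request
    var xhr = new XhrCtor();
    current = xhr;
    xhr.onload = function () {
      if (current === xhr) current = null;
      onDone(xhr.responseText);
    };
    xhr.onerror = xhr.onabort = function () {
      if (current === xhr) current = null;
    };
    xhr.open('GET', url, true);
    xhr.send();
  };
}

// Hypothetical usage with the handler from the question:
// var fire = createLatestOnlyRequester(XMLHttpRequest);
// elem.onclick = function () {
//   var text = document.getElementById('text').value;
//   fire('some url' + '?param=' + encodeURIComponent(text), function (body) {
//     console.log(body);
//   });
// };
```

Only the most recent click can reach `onDone`, because every earlier request is aborted before the new one is sent.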
