Is it possible to autorun a script, for example from Scratchpad, in Firefox?
I want to add a button to a website without making an extension, because I don't want to deal with XUL and RDF.
Or could I make an add-on containing only a JS file?
If you use Python, you can use Selenium to mimic a person clicking in the browser by writing a Python script.
Besides that, Python's mechanize is also a good module for automating interactions with a website.
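For example, a minimal Selenium sketch in Python (the URL and the button id are made up for illustration):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://example.com')  # hypothetical page
driver.find_element_by_id('my-button').click()  # mimic a user's click
driver.quit()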
I need to scrape a site with Python. I obtain the source HTML with the urllib module, but I also need to scrape some HTML that is generated by a JavaScript function (which is included in the HTML source). What this function does on the site is output some HTML when you press a button. How can I "press" this button with Python code? Can Scrapy help me? I captured the POST request with Firebug, but when I try to pass it on the URL I get a 403 error. Any suggestions?
In Python, I think Selenium 1.0 is the way to go. It’s a library that allows you to control a real web browser from your language of choice.
You need to have the web browser in question installed on the machine your script runs on, but it looks like the most reliable way to programmatically interrogate websites that use a lot of JavaScript.
Since there is no comprehensive answer here, I'll go ahead and write one.
To scrape JS-rendered pages, we need a browser with a JavaScript engine (i.e., one that supports JavaScript rendering).
Options like Mechanize and urllib2 will not work, since they do NOT support JavaScript.
So here's what you do:
Set up PhantomJS to run with Selenium. After installing the dependencies for both of them (refer to this), you can use the following code as an example to fetch the fully rendered website.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.PhantomJS()
driver.get('http://jokes.cc.com/')
# page_source holds the HTML after JavaScript rendering is complete
soupFromJokesCC = BeautifulSoup(driver.page_source, 'html.parser')
driver.save_screenshot('screen.png')  # save a screenshot to disk
driver.quit()
I have had to do this before (in .NET) and you are basically going to have to host a browser, get it to click the button, and then interrogate the DOM (document object model) of the browser to get at the generated HTML.
This is definitely one of the downsides to web apps moving towards an Ajax/Javascript approach to generating HTML client-side.
I use WebKit, which is the browser renderer behind Chrome and Safari. There are Python bindings to WebKit through Qt, and below is a full example that executes JavaScript and extracts the final HTML.
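This is a minimal sketch assuming the legacy PyQt4 WebKit bindings are installed; the URL is a placeholder:
import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage

class Render(QWebPage):
    # Load a URL in an off-screen WebKit page and keep the rendered HTML.
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._finished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # block until loadFinished fires

    def _finished(self, result):
        self.html = self.mainFrame().toHtml()  # HTML after JavaScript ran
        self.app.quit()

page = Render('http://example.com')
print(page.html)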
For Scrapy (a great Python scraping framework) there is scrapyjs: an additional downloader/middleware handler able to scrape JavaScript-generated content.
It's based on the WebKit engine via pygtk, python-webkit, and python-jswebkit, and it's quite simple.
First I use Python and Selenium to load a website in Firefox. Then I fill in a simple JavaScript-driven form. The site is poorly made, but usually if I tell Selenium to send Keys.RETURN it drops down a list of options. The problem is I don't know how to click on one of these, because they didn't load with the web page. I tried using Keys.ARROW_DOWN to go through them, but it still doesn't really work.
How can I interact with JavaScript through Selenium using Python?
Thanks.
P.S. I know almost nothing about JavaScript, so even if there is some way to do it, I would still be clueless about how to use JavaScript anyway...
You might have to tell the browser to wait a few milliseconds before the options appear; an explicit wait usually handles this (see the sketch after the links below).
Some places to look:
Clicking on a Javascript Link on Firefox with Selenium
http://seleniumhq.org/docs/04_webdriver_advanced.jsp
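For example, a minimal sketch using an explicit wait, assuming the drop-down options render as list items (the CSS selector here is hypothetical):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('http://example.com/form')  # placeholder page

# Wait up to 10 seconds for the first drop-down option to become
# clickable, then click it.
option = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, 'ul.options li')))
option.click()
driver.quit()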
I want to interact with my local HTML page through my C++ application, just like using the JavaScript console to edit a page in real time, e.g.:
document.getElementById('divlayer').style.visibility = 'hidden';
Similarly, I want to call such functions in real time through my application.
Can you give me some idea of whether there is a way to accomplish this?
I am using Google Chrome at the moment.
Do I need some plugin? If so, how can I make the plugin interact with my application?
Also, I heard about jQuery; can this be done using jQuery? Or do I have to try some server mechanism, maybe using Ajax?
I know that you can control your IE browser with COM on Windows and interact with the page through it. I didn't try it with C++; I've only used it from Python, and it works well. Maybe you'd like to check it out.
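As a rough sketch of that Python/COM approach (assuming the pywin32 package is installed; the URL is a placeholder):
import time
import win32com.client

ie = win32com.client.Dispatch('InternetExplorer.Application')
ie.Visible = True
ie.Navigate('http://example.com')
while ie.Busy:  # wait for the page to finish loading
    time.sleep(0.1)

# Run a script in the page's own context, as in the question's example.
ie.Document.parentWindow.execScript(
    "document.getElementById('divlayer').style.visibility = 'hidden';")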
Is it possible for the javascript you write for a XUL component to interact with the javascript defined in a webpage?
E.g., if a particular webpage has a dooSomethingNeat() function, can I have a button defined in a XUL overlay execute that function, or does it live in another namespace?
Put another way: if I'm looking to enhance the functionality of a website via my own code, does it make more sense to write a Firefox extension or use something like greasemonkey?
See my answer to another question here.
The webpage code does live in a 'namespace' separate from the scopes the browser code executes in.
It doesn't mean you can't access it from an extension, though.
On the other hand, running a function in a content page is not very easy to do securely at this moment.
Greasemonkey scripts (and Ubiquity scripts, which can also interact with web pages) are somewhat easier to develop than extensions, and Greasemonkey already implements the required security precautions to allow you to interact with web pages safely.
If you want others to use your script, packaging it as a standalone extension lowers the barrier to entry (on the other hand, existing GM users may prefer simpler GM scripts to a separate extension).
So if you can implement what you need with a GM script or a Ubiquity script, I'd say go with it. At least you can start with it, then convert it to an extension when you find something you can't do with GM.
If you need features not supported by Greasemonkey or if you just want to try creating an extension, it is also a viable option.
There is a Greasemonkey-to-firefox-extension "compiler" available, but it isn't up-to-date with the latest GM changes.
However, it does have the basic GM framework for page interaction and security all wrapped up as a standalone extension, ready for you to modify and extend.
Whether to use a standalone extension or a GM script depends upon who will be installing it. Will the user base be willing to install Greasemonkey, THEN the script? Or is an extension alone enough of an installation barrier?
The GM license does allow for repackaging it with pre-set scripts, I believe, but I can't find the citations for this at the moment.
I'm writing a web crawler (web spider) that crawl all links in a website.
My application is a Win32 app, written in C# with .NET Framework 3.5.
Now I'm using HttpWebRequest and HttpWebResponse to communicate with the web server.
I also built my own HTTP parser that can parse anything I want.
I find all links like "href", "src", "action"... in the parser.
But I cannot solve one problem: simulating client script in the page (like JS and VBS).
For example, a link like:
<a href="javascript:buildLink(1)">
... where buildLink(parameter) is a JavaScript function that builds a custom link based on the parameter.
Please help me solve this problem: how can I simulate JavaScript in this app? I can parse the HTML source code and pull all the JavaScript code into another file, but how can I simulate a function from it?
Thanks.
Your only real option is to automate a browser. As other answers have said, you cannot reliably simulate browser javascript without having a complete DOM.
There are fortunately ways to automate the browser, check out Selenium.
It has a C# API, so you can control the browser from C#.
Use your .NET web crawler code to crawl the site. Whenever you encounter a href="javascript:... link, handle the page containing the link in Selenium:
Use the Selenium API to tell the browser to load the page.
Use the Selenium API to find all links on the page.
This way, your spider only uses Selenium when necessary (pages without JavaScript links can be handled by the browser-less spider code you already have). And since this is an embarrassingly parallel workload, you could easily have multiple Selenium processes running at the same time (either on one computer or on several).
But remember that href="javascript is hardly the only way a page can have dynamic links. The more common case is probably that an onload or $(document).ready() script manipulates the DOM and adds links that way.
To catch that case (and others), the spider probably will have to use Selenium for all pages that have a <script> tag.
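The answer above uses Selenium's C# API, but the Selenium half of that flow looks roughly like this in Python, which the C# API mirrors (the URL is a placeholder):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://example.com/page-with-js-links')

# After the browser has executed the page's scripts, collect every href
# from the rendered DOM, including links that JavaScript generated.
links = [a.get_attribute('href')
         for a in driver.find_elements_by_tag_name('a')]
print(links)
driver.quit()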
You are basically pretending to be a browser, except that HttpWebRequest only does the networking stuff for you.
I would recommend using the IE WebBrowser control and interop'ing into that from your C# application. That will allow you to run JavaScript, set variables, post, etc.
Here are some basic links I found after a search for "ie web browser control":
http://www.c-sharpcorner.com/UploadFile/mahesh/WebBrowserInCSMDB12022005001524AM/WebBrowserInCSMDB.aspx
http://support.microsoft.com/kb/313068
This is a problem which is not easily solved. You could consider taking one of the existing JavaScript implementations and porting or interfacing with it somehow.
If I were tackling this problem, I'd probably build a small side application in Java on top of Rhino, with some sort of RPC framework layered on top of that so that I could communicate with it from my primary application.
Unfortunately, without having a complete DOM implementation on top of that, you would be limited to only very simple javascript.
You could execute the javascript by using the MS JScript engine or something similar.
MSDN Reference
Eric Lippert's blog on using Eval (part 1) (part 2) (part 3)
This isn't guaranteed to work, especially if the JavaScript tries to access the DOM or some such... But for simple scripts, it might be enough.
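As an illustration of what a DOM-less engine can and cannot do, here is the same idea sketched in Python using the js2py package (my substitution; the answer itself refers to the MS JScript engine):
import js2py

# A self-contained function evaluates fine without a browser or a DOM...
print(js2py.eval_js('function buildLink(n) { return "/page" + n; } buildLink(1)'))
# ...but any script that touches document or window would fail, as noted above.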