I have a few clients that pay me to post ads on sites such as craigslist.com and backpage.com. Currently, every hour or so a macro runs and I manually do the captchas (which I'm fine with). But now I have some free time and I want to write a proper program to prevent the stupid errors that can happen with macros (screen resizes, mis-clicks, etc.).
Part of my posting includes selecting images to upload. I know that for security reasons JavaScript doesn't let a page specify which file the user uploads; the user has to choose that themselves. I'm sure I could do it using Node.js somehow, since it would run locally on my machine, but I don't have the slightest idea how to approach this.
Any guidance or direction would be very helpful.
If you use Node.js, you have to do a lot of the work yourself, e.g.:
- fetch the HTML content and parse it
- construct the input that you want
- re-submit the form / re-post the data
The easier way is to use a browser automation tool like Selenium to handle the whole flow end to end for you, as sketched below.
more info: http://www.seleniumhq.org/
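For illustration, here is a minimal sketch of the Selenium approach (shown with the Python bindings; the Node.js selenium-webdriver API is analogous). The URL, field names, and file path are placeholders, not the real posting form:

    # Minimal sketch of driving a posting form with Selenium (Python bindings).
    # The URL, element names and file path below are placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get("https://example.com/post")  # hypothetical posting page
        driver.find_element(By.NAME, "title").send_keys("My ad title")
        driver.find_element(By.NAME, "body").send_keys("Ad text goes here")
        # File uploads: Selenium can type a local path straight into the
        # <input type="file"> element, which page JavaScript is not allowed to do.
        driver.find_element(By.CSS_SELECTOR, "input[type=file]").send_keys("/path/to/image.jpg")
        driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    finally:
        driver.quit()

You would still stop and solve the captcha by hand at the right step, exactly as you do with your macro now.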
If you are familiar with Node.js and JavaScript, then I recommend you use Protractor.
It is the current default end-to-end automation testing tool for AngularJS applications, but I'm pretty sure it will solve your problem.
Instead of using AngularJS-specific selectors (like element(by.model)) to "find" your HTML elements, you use regular CSS selectors: $$("div.top"), for example, returns all divs with a CSS class named top.
Protractor implements the Selenium WebDriver protocol, which means the scripts you write can talk to almost any automation-ready browser via ChromeDriver, FirefoxDriver, or PhantomJSDriver (a no-GUI, lower-fidelity but fast alternative).
Make sure you check the getting started section for a jump start.
Related
I'm scraping content from a website using Python. First I used BeautifulSoup and Mechanize, but I saw that the website had a button that created content via JavaScript, so I decided to use Selenium.
Given that I can find elements and get their content using Selenium with methods like driver.find_element_by_xpath, what reason is there to use BeautifulSoup when I could just use Selenium for everything?
And in this particular case, I need to use Selenium to click on the JavaScript button so is it better to use Selenium to parse as well or should I use both Selenium and Beautiful Soup?
Before answering your question directly, it's worth saying as a starting point: if all you need to do is pull content from static HTML pages, you should probably use an HTTP library (like Requests or the built-in urllib.request) with lxml or BeautifulSoup, not Selenium (although Selenium will probably be adequate too). The advantages of not using Selenium needlessly:
Bandwidth, and time to run your script. Using Selenium means fetching all the resources that would normally be fetched when you visit a page in a browser - stylesheets, scripts, images, and so on. This is probably unnecessary.
Stability and ease of error recovery. Selenium can be a little fragile, in my experience - even with PhantomJS - and creating the architecture to kill a hung Selenium instance and create a new one is a little more irritating than setting up simple retry-on-exception logic when using requests.
Potentially, CPU and memory usage - depending upon the site you're crawling, and how many spider threads you're trying to run in parallel, it's conceivable that either DOM layout logic or JavaScript execution could get pretty expensive.
Note that a site requiring cookies to function isn't a reason to break out Selenium - you can easily create a URL-opening function that magically sets and sends cookies with HTTP requests using cookielib/cookiejar.
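For instance, with the Requests library a Session object persists cookies across requests with no extra work (the standard-library equivalent builds on http.cookiejar); the URLs here are placeholders:

    # Cookies without Selenium: a requests.Session stores cookies set by the
    # server and sends them back automatically on later requests.
    import requests

    session = requests.Session()
    session.get("https://example.com/set-cookie-page")    # server sets cookies here
    resp = session.get("https://example.com/needs-cookie")
    print(resp.status_code, session.cookies.get_dict())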
Okay, so why might you consider using Selenium? Pretty much entirely to handle the case where the content you want to crawl is being added to the page via JavaScript, rather than baked into the HTML. Even then, you might be able to get the data you want without breaking out the heavy machinery. Usually one of these scenarios applies:
JavaScript served with the page has the content already baked into it. The JavaScript is just there to do the templating or other DOM manipulation that puts the content into the page. In this case, you might want to see if there's an easy way to pull the content you're interested in straight out of the JavaScript using regex.
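A rough sketch of that regex approach, assuming (hypothetically) the page embeds its data in a script variable called initialData that happens to be valid JSON:

    # Content baked into the page's JavaScript: extract it with a regex instead
    # of executing the script. "initialData" is a hypothetical variable name,
    # and this assumes the embedded object literal is valid JSON.
    import json
    import re
    import requests

    html = requests.get("https://example.com/page").text   # placeholder URL
    match = re.search(r"var\s+initialData\s*=\s*(\{.*?\});", html, re.DOTALL)
    if match:
        data = json.loads(match.group(1))
        print(data)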
The JavaScript is hitting a web API to load content. In this case, consider if you can identify the relevant API URLs and just hit them yourself; this may be much simpler and more direct than actually running the JavaScript and scraping content off the web page.
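For example (the endpoint and parameters are hypothetical - find the real ones in the browser's network tab):

    # The page's JS is just calling a web API: call the same endpoint directly
    # and skip the browser entirely.
    import requests

    resp = requests.get("https://example.com/api/items", params={"page": 1})
    resp.raise_for_status()
    for item in resp.json():
        print(item)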
If you do decide your situation merits using Selenium, use it in headless mode, which is supported by (at least) the Firefox and Chrome drivers. Web spidering doesn't ordinarily require actually graphically rendering the page, or using any browser-specific quirks or features, so a headless browser - with its lower CPU and memory cost and fewer moving parts to crash or hang - is ideal.
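A minimal sketch of running Selenium with headless Chrome from Python (the exact flag spelling can vary between Chrome versions):

    # Headless Chrome: no GUI rendering, lower CPU and memory cost.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.com")
        print(driver.title)
    finally:
        driver.quit()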
I would recommend using Selenium for things such as interacting with web pages, whether in a full-blown browser or a browser in headless mode, such as headless Chrome. I would also say that Beautiful Soup is better for observing and writing statements that depend on whether an element is found, or on what is found, and then using Selenium to execute interactive tasks with the page if the user desires.
I used Selenium for web scraping, but it was not a happy solution. In my last project I used https://github.com/chromedp/chromedp . It is a simpler solution than Selenium.
I am trying to figure out how to load test my JavaScript widget. It will be running on around 1000 websites, each with about 2k visitors a day, so I really need to find a way to test this scenario before letting my users install the widget.
Any ideas what approach i should take?
It sounds like the widget itself (an instance of your javascript code running in a browser) will never be used by more than one user (one browser) at a time. Your server-side APIs, OTOH, will. So you need to test your server by simulating the level of HTTP traffic to your server that you expect the use of your widget to generate. For this, you want web load testing tools.
Depending on how simple the APIs are, you may be fine with something like JMeter. If the API usage is more complex, you might want a more sophisticated tool.
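Just to make the idea concrete, here is a very rough sketch of generating concurrent HTTP traffic against an endpoint from Python; dedicated tools like JMeter add ramp-up control, think times, and reporting on top of this. The URL and request counts are placeholders:

    # Crude load generator: fire many concurrent GET requests at the widget's
    # server-side endpoint and tally the response codes.
    import concurrent.futures
    import requests

    URL = "https://example.com/widget/api"    # placeholder endpoint

    def hit(_):
        try:
            return requests.get(URL, timeout=10).status_code
        except requests.RequestException:
            return "error"

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        codes = list(pool.map(hit, range(1000)))

    print({code: codes.count(code) for code in set(codes)})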
I am looking into multiple web testing tools. I am trying Watir now. My main concern is dealing with JavaScript. I just want to know if anyone can give me an overview of dealing with JavaScript in Watir. What are some of the pitfalls and difficulties with it? Is it basically using JavaScript injection to tell the page what to do?
And if someone wants to suggest other web testing tools like watir I would appreciate it.
I tried selenium first and found it to be a tad unreliable.
Are there any cheap tools on the market?
Thank You!
Watir + JavaScript => generally it's possible to inject JavaScript into your tests, e.g.
#b.goto("javascript:openWin(2)")
When you say 'dealing with javascript' I assume you mean how well Watir handles client-side code in terms of rendering/execution. Since Watir drives real browsers (like Selenium does), JS will generally execute as expected.
Watir has many different drivers, e.g. watir, firewatir, safariwatir, chromewatir, operawatir and now watir-webdriver. All of them drive the browser via slightly different implementations depending on the browser and OS. FireWatir, for example, uses JSSH, which in effect controls the browser via JS. Can you explain what you mean by Selenium being 'unreliable'?
I'd recommend looking at the latest implementation, watir-webdriver. That way you get the benefit of a nice Watir API on top of a new driver implementation. WebDriver has some strong backing in terms of support (Selenium 2 uses it, Google is coding it!), so I reckon it's a safe bet. You can also control most of the major browsers with this implementation.
Alternative tools => http://wiki.openqa.org/display/WTR/Alternative+Tools+For+Web+Testing
Tim provides a pretty good answer.
The only thing I have to add to what he said is that I've found that now and then I have to use the Watir methods to fire specific JavaScript events, such as onmouseover, in order to accurately simulate the user interacting with the page. Since Watir has a method for this, the hard part is not the Watir code but reverse engineering the page (or noticing subtle page interactions based on user actions) to figure out which elements are 'wired' up to which events, and the order in which to fire those events against those specific elements.
Usually it's pretty easy to look at the HTML for an element and see what's going on. But with some custom controls it can take a bit of learning, because they manage to do a pretty good job of 'hiding' all the event wiring, and you may have to parse through various aspects of the page (styles and all) using something like Fiddler.
(After all, the normal user will never 'force' JavaScript to execute or 'inject' JavaScript. They will use the mouse and keyboard to interact with the page, and any JavaScript is going to be the result of scripts that execute when the page is loaded, or of scripts triggered via events based on specific user actions.)
If your JS does not trigger an HTML refresh then Watir will get confused. When you click an object in Watir, it waits for the page to load before continuing. You can overcome this with custom waiting commands and use of '.click!'.
If you are a reasonable Ruby coder then Watir is a solution for most things. It has the potential to be a rather stable and reliable basis for automated web testing.
You may want to look into FireWatir, Sahi and watir-webdriver, just to give you some more leads (I would suggest googling for "open source web testing" and the like if you haven't). I looked into these and many more and settled on Watir for reasons of cost, power, flexibility and prior knowledge (of Ruby and Watir). With the right gems it will speak to most databases and to Excel (or other files) to load test data.
I'm currently using Watir to test a ZK-generated interface where none of the IDs are ever static and there's a lot of AJAXiness going on. I just built a framework to deal with these, and it works just fine.
Also, some semi-true and true things that may help.
To run JavaScript from Watir, use browser.execute_script()
Example:
Watir::Wait.until { $browser.execute_script("return document.readyState") == "complete" }
I have the following situation. A customer uses JavaScript with jQuery to create a complex website. We would like to use JavaScript and jQuery on the server (IIS) for the following reasons:
Skills transfer - we would like to use JavaScript and jQuery on the server and not have to use e.g. VBScript / classic ASP. The .NET framework, Java, etc. are ruled out because of this.
Improved options for search/accessibility. We would like to be able to use jQuery as a templating system, but this isn't viable for search engines and users with js turned off - unless we can selectively run this code on the server.
There is significant investment in IIS and Windows Server, so changing that is not an option.
I know you can run JScript on IIS using Windows Script Host, but I am unsure of the scalability and the process surrounding this. I am also unsure whether this would have access to the DOM.
Here is a diagram that hopefully explains the situation. I was wondering if anyone has done anything similar?
EDIT: I am not looking for criticism of the web architecture; I simply want to know if there are any options for manipulating the DOM of a page before it is sent to the client, using JavaScript. Jaxer is one such product (though not on IIS). Thanks.
Have a look at "Bringing the Browser to the Server", at Rhino, and at using Microsoft's IIS as a Java servlet engine.
The first link is from John Resig's (jQuery's creator) blog.
Update August 2 2011
Node.js is coming to Windows.
The idea of reusing client JS on the server may sound tempting, but I am not sure that jQuery itself would be ready to run in a server environment.
You would need to define a global context for jQuery somehow by initializing window, document, self, location, etc. I am not sure it is doable.
Besides, as Cheeso has mentioned, Active Server Pages is a very outdated technology; it was replaced with ASP.NET by Microsoft at the beginning of the century. I used to maintain a legacy system using ASP 3.0 for more than a year and that was painful. The most wonderful pastime was debugging: you will hardly find any tooling for the purpose today and will have to decipher beautiful errors like this one from the IIS log:
error '800a9c68'
Application-defined or object-defined error
Nevertheless, I can confirm that I managed to reuse client and server JScript. But that was code written by me, knowing that it was going to be used on the server.
P.S. I would not recommend moving that way. There are plenty of templating frameworks which are familiar to those who write HTML and JavaScript.
JScript runs on IIS via something called ASP.
Active Server Pages.
It was first available in 1996.
Eventually ASP.NET was introduced as a successor. But ASP is still supported.
There is no DOM for the HTML page, though.
You might need to reconsider your architecture a bit.
I think the only viable solutions you're likely to find anywhere near ready to go involve putting IIS in front of Java. There are two browser-like environments I'm aware of coded for Java:
1) Env-js (see http://groups.google.com/group/envjs and http://github.com/thatcher/env-js )
I believe this one has contributions from jQuery's John Resig and was put together with jQuery testing/support in mind.
2) HTMLUnit (see http://htmlunit.sourceforge.net/ ) This one's older, and wasn't originally conceived around jQuery, but there are reports in the wild of using it to run jQuery's test suite successfully (http://daniel.gredler.net/2007/08/08/htmlunit-taming-jquery/ ).
If you want something pure-IIS/MS, I think your observation about Windows Script Host and/or something like the semi-abandoned JScript.NET is probably about as close as you're going to come, along with a port (which you'll probably have to start) of something like Env-js or HTMLUnit.
Also, I don't know if you've seen the Wikipedia list of server-side JavaScript solutions:
http://en.wikipedia.org/wiki/Server-side_JavaScript
Finally... you could probably write a serviceable jQuery-like library in any language that already has some kind of DOM library and first-class functions (or, failing that, an eval facility). See, for example, pQuery for Perl (http://metacpan.org/pod/pQuery ). This would get you the benefits of the jQuery style of manipulating documents. Skill transfer is great and JavaScript has a wonderful confluence of very nice features, but on the other hand, having developers who care enough to learn multiple languages is also great, and JS isn't the only nice language out there.
I think jQuery is mainly a browser-based script, so you are probably better off using technologies based on VB or .NET to generate HTML from templates. I'm sure such things exist, because in the Java world there are a few of them around (like Velocity). You'd then use jQuery to add client-side functionality and usability, making the website more usable than it would have been otherwise.
What exactly do you mean by "A customer uses JavaScript with jQuery to create a complex website"?
Half the point of jQuery is to make it easy for the developer to manipulate the DOM, and therefore add interactive enhancements to a web site. By running the Javascript on the server and only rendering HTML you will lose the ability to add these enhancements, without doing a round trip to the server (think WebForms postback model...ugh).
Now if what you really mean is the customer uses a site builder based on jQuery, why not have that tool output flat HTML in the first place?
Take a look at this technology. You can invoke scripts to run on the server, on the client, or both. Plus, it really implements the Firefox engine on the server. Take a look at it.
Aptana's Jaxer is the first AJAX web server so far. I have not tried it yet, but I will. It looks promising and very powerful.
I'm trying to do some browser automation, but I'm having some problems. Basically, what I'd like to do is load a set of pages, set some form options, click a button and view the results for each page that I open. Originally, I tried to do this by placing the pages I wanted to automate in iframes and then using JavaScript to drive the interactions I want in each, but that results in a permissions error, since the sites I want to automate are not on my server. Is there any way around this? The other possibility I've thought of is to use Qt's WebKit class and the evaluateJavaScript method to accomplish what I'd like to do, but this seems a bit heavyweight for something that is, conceptually, pretty simple.
The tasks I wanted to accomplish weren't really test-related, so a lot of the test frameworks don't fit the use case I had in mind (I did try to use Selenium, but ran into problems). I ended up doing what I mentioned in the original question and injecting JavaScript into pages through Qt. This ended up working pretty well, although it was a pain to debug, since the JavaScript had to be passed in as a string and the base environment provided by Qt's WebKit class doesn't reveal a whole lot.
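For anyone trying the same route, here is a rough sketch of that approach using the old PyQt4 QtWebKit bindings (the C++ API is equivalent); the URL and the injected script are just placeholders:

    # Load a page in QtWebKit, wait for loadFinished, then inject JavaScript
    # passed in as a string via evaluateJavaScript.
    import sys
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebView

    app = QApplication(sys.argv)
    view = QWebView()

    def on_load_finished(ok):
        # The whole script is one string, which is what makes debugging painful:
        # errors inside it are hard to see from the outside.
        result = view.page().mainFrame().evaluateJavaScript("document.forms.length")
        print(result)
        app.quit()

    view.loadFinished.connect(on_load_finished)
    view.load(QUrl("https://example.com/form-page"))   # placeholder URL
    app.exec_()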
Check out Selenium: http://seleniumhq.org/. It lets you automate Firefox and is probably the easiest to get started with.
Are you trying to do test automation? If so, there are plenty of frameworks for that, like Selenium, WatiN, WebAii and even the one built into Visual Studio.
Some of them (WebAii is my favorite) allow you to launch tests in a real browser like Firefox.
If the piece of software you are searching for is more like a form filler, then take a look at iMacros; they have a complete browser-side scriptable solution.
An easier way of doing this would be to use a web debugging proxy and inject JavaScript that way. This should allow you to debug the code you wrote within the browser.
I haven't personally used off-the-shelf web debugging proxies, but I wrote my own proxy and did this a while ago just for fun, and it worked great.