Automating an Add to Cart JavaScript sequence without an open browser with Python - javascript

I'm looking for a method to automate an add-to-cart process using Python WITHOUT needing to have a browser window open.
I've tried modules such as mechanize, but it does not have the functionality to directly "click" a web element.
Currently I've been able to automate this process using Selenium, but having to open the browser and load web elements, photos, etc. adds up to a lengthy process, and time is of the essence.
An example page that I would like to automate is here:
http://store.nike.com/us/en_us/pd/kd-vi-elite-basketball-shoe/pid-972328/pgid-972324?cp=usns_twit_041214_basketball_kdelitehome
Any direction is greatly appreciated.

It seems that in the web page you listed, the "Add to Cart" button is actually a form submit button. What you can do is simulate the submission of the form by doing a POST request, with all the necessary form parameters, which you can get from all the <input> elements on the page.
A possible Python implementation:
1. Download the page with urllib2. You will probably have to enable cookies.
2. Parse the page using BeautifulSoup or similar, and find all the <input> tags and their values.
3. Construct a new POST request with all these parameters (while maintaining cookies).
You can use your browser's network-sniffing capabilities to see an actual request being sent, and try to mimic it using the above tools.
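A rough sketch of those steps in Python, using requests (a cookie-aware stand-in for urllib2) and BeautifulSoup; the form id is an assumption, so copy the real one from the page source:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    PRODUCT_URL = "http://store.nike.com/us/en_us/pd/kd-vi-elite-basketball-shoe/pid-972328/pgid-972324"

    session = requests.Session()  # a Session keeps cookies across requests

    # Step 1: download the product page.
    page = session.get(PRODUCT_URL)
    soup = BeautifulSoup(page.text, "html.parser")

    # Step 2: find the add-to-cart form and collect its <input> values.
    # The form id is an assumption; inspect the page for the real one.
    form = soup.find("form", id="add-to-cart-form")
    payload = {
        inp["name"]: inp.get("value", "")
        for inp in form.find_all("input")
        if inp.get("name")
    }

    # Step 3: POST the same parameters the form would submit, cookies included.
    action = urljoin(PRODUCT_URL, form.get("action", ""))
    result = session.post(action, data=payload)
    print(result.status_code)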
Hope it helps.

Related

Building a bot that fetches data from the browser and saves it as a text file

My problem is a bit complex and hard to explain in words, so I broke it down into steps:
1. Select a single date from the date boxes and hit submit.
2. I will land on a page with a table. Copy the <tbody> element from the developer console.
3. Paste it into a text file. Save the text file with the date that was selected.
4. Repeat steps 1-3 as many times as needed, selecting a new date each time (01-15-2018, 01-14-2018, 01-13-2018, and so on...).
Is it even possible to build a bot that does this? If yes, what tools would I use?
I know a fair amount of JavaScript and Python, so I'd prefer to use those 2 if possible.
We would need to know the URL you're looking at and see the page source. If the date is supplied as part of a request, and the response contains the data you're looking for, it should be simple to farm and analyze that data from a Python script.
Walk through your clicks with the network tab of your browser's developer tools and you should see a request go out when you hit submit. Expedia just uses query parameters, and so the entire URL that you'll need pops up in the URL bar of your browser after hitting submit...
Tools:
If request-based:
Python
Requests module
If something is cached or more complicated, there are tools for automating clicks and saving the results, though I would guess that won't be necessary here.
Update:
AJAX calls are HTTP requests and responses, so you should be able to observe them in the network tab of your browser's developer tools and then mimic the request from a script rather than from your browser.
Potential impediments are the readability of the requests/responses and any measures the organization has implemented to keep applications other than a browser from getting the same response, but even those should be imitable. If your browser is making the request, there is no reason your Python script can't make the same one.
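For example, if the date turns out to be a simple query parameter, the whole loop is a few lines of Python with the Requests module. The endpoint and parameter name below are assumptions; copy the real ones from the request your browser sends:

    import requests
    from datetime import date, timedelta

    # Hypothetical endpoint; copy the real URL from the network tab.
    BASE_URL = "https://example.com/report"

    day = date(2018, 1, 15)
    for _ in range(15):
        stamp = day.strftime("%m-%d-%Y")
        # "date" as the parameter name is an assumption; mimic the
        # request your browser actually sends when you hit submit.
        resp = requests.get(BASE_URL, params={"date": stamp})
        with open(stamp + ".txt", "w", encoding="utf-8") as f:
            f.write(resp.text)  # save the response body under the chosen date
        day -= timedelta(days=1)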
The method you seem to be interested in, although it sounds more complicated to me, is possible with automation tools like Selenium, as the other poster answered. Best of luck.
It is possible:
Take a look at the Selenium library for Python (it's commonly used for automated testing). It should be able to select single dates and hit the submit button, then go through the HTML and grab the data in the tag. After that you can use plain Python to store this data in a text file with a name of your choice in a location of your choice.
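A minimal sketch of that approach with Selenium in Python; the URL and element locators are hypothetical, so substitute the real ones from the page:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/report")  # hypothetical page with the date boxes

    # Fill in the date and hit submit (the locators are assumptions).
    driver.find_element(By.ID, "date-field").send_keys("01-15-2018")
    driver.find_element(By.ID, "submit-button").click()

    # Grab the <tbody> from the results table and save it under the date.
    tbody = driver.find_element(By.TAG_NAME, "tbody")
    with open("01-15-2018.txt", "w", encoding="utf-8") as f:
        f.write(tbody.get_attribute("outerHTML"))

    driver.quit()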

Web scraping of a modal window (dialogue box) using jsoup

I am working on a project in which I have to extract data from a website. The project is in Java and the website uses JavaScript. I am using Jsoup to extract the data from the website, but there are some modal windows (dialogue boxes, pop-up windows) present in the web page. So is it possible to extract the data of modal windows using Jsoup?
If the answer is yes, how could I do it? Please provide links. If not, what are the other best ways to do it?
Thanks for your help. I really appreciate it.
I assume that the modal is generated by JavaScript.
Jsoup is just a parser. This means that it will make an HTTP request (GET or POST, whatever you tell it to do) and the server (website) will respond with the initial HTML. By initial, I mean the HTML before any JavaScript is executed.
JavaScript can generate HTML (like the modal in question), but this is not visible to Jsoup, because a parser can only read; it cannot execute code. The browser is able to generate the modal because it includes a JavaScript engine that parses and executes JavaScript.
When you visit a web page you don't know what is dynamic (generated by JavaScript) and what is static (sent by the server as is).
A little trick to check what is dynamic and what is static (static is visible to Jsoup) is to do the following:
Visit the web page you want to parse (with Chrome if possible; Firefox will work too, I think).
Press Ctrl + U. This will open a new tab.
The new tab will contain a mesh of HTML, CSS and JS. This is what the server sends to the browser, and it is also what is visible to Jsoup.
If the modal is in there, then great, it is visible to Jsoup. If not, then you have to use a library that acts as a headless browser.
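The same check can be scripted before reaching for a heavier tool. A minimal sketch in Python with requests and BeautifulSoup (Jsoup in Java behaves the same way; the URL and selector are hypothetical):

    import requests
    from bs4 import BeautifulSoup

    # Fetch the initial HTML, which is exactly what a parser like Jsoup sees.
    html = requests.get("https://example.com/page").text
    soup = BeautifulSoup(html, "html.parser")

    # If the modal's markup is in the initial HTML, a plain parser can read it;
    # the ".modal" selector is an assumption, use the real one from the page.
    modal = soup.select_one(".modal")
    print("modal is static" if modal else "modal is generated by JavaScript")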
A headless browser is essentially a browser without the graphical interface. It can parse and execute Javascript. It "sees" what a normal browser sees.
The most common library used is Selenium WebDriver. Be careful: Selenium is a testing framework that has a lot of parts; what you need is the WebDriver.
There are a lot of examples out there with ready-made code to get you started.
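For illustration, a minimal headless-browser sketch in Python (the Java WebDriver API is analogous); the URL and selector are hypothetical:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")  # a browser without the graphical interface
    driver = webdriver.Chrome(options=options)

    driver.get("https://example.com/page-with-modal")  # hypothetical URL

    # The driver has executed the page's JavaScript, so HTML generated
    # for the modal is now present in the DOM.
    modal = driver.find_element(By.CSS_SELECTOR, ".modal")  # hypothetical selector
    print(modal.get_attribute("innerHTML"))

    driver.quit()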

Auto form filling using JavaScript

My requirement is to write a script that, when run, opens the page, fills in the fields, and automatically takes me to the next page.
For example, a script for www.irctc.co.in: when we log in to IRCTC it asks for the username and password, and when we click submit it redirects to the next page.
I want to write a script such that I just run it, it internally does all these things, and I can see the next page.
I don't know where I should start.
I think you are looking for something like Greasemonkey: https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/
Greasemonkey is a Mozilla Firefox extension that allows users to install scripts that make on-the-fly changes to web page content after or before the page is loaded in the browser.
If you use a different browser, then you can refer to: http://en.wikipedia.org/wiki/Greasemonkey#Equivalents_for_other_browsers
Check out Watir - Web Application Testing in Ruby. Although it is used for automation, it might solve the purpose here. With Watir, you write scripts in Ruby, execute them, and then see the magic. More information can be found here

How do you keep content from your previous web page after clicking a link?

I'm sorry if this is a newbie question, but I don't really know what to search for either. How do you keep content from a previous page when navigating through a web site? For example, the right-side Activity/Chat bar on Facebook: it doesn't appear to refresh when going to different profiles; it's not an iframe and doesn't appear to be AJAX (I could be wrong).
Thanks,
I believe what you're seeing in Facebook is not actual "page loads", but clever use of AJAX or AHAH.
So ... imagine you've got a web page. It contains links. Each of those links has a "hook" -- a chunk of JavaScript that gets executed when the link gets clicked.
If your browser doesn't support JavaScript, the link works as it normally would on an old-fashioned page, and loads another page.
But if JavaScript is turned on, then instead of navigating to an HREF, the code run by the hook causes a request to be placed to a different URL that spits out just the HTML that should be used to replace a DIV that's already showing somewhere on the page.
There's still a real link in the HTML just in case JS doesn't work, so the HTML you're seeing looks as it should. Try disabling JavaScript in your browser and see how Facebook works.
Live updates like this are all over the place in Web 2.0 applications, from Facebook to Google Docs to Workflowy to Basecamp, etc. The "better" tools provide the underlying HTML links where possible so that users without JavaScript can still get full use of the applications. (This is called Progressive Enhancement or Graceful degradation, depending on your perspective.) Of course, nobody would expect Google Docs to work without JavaScript.
In the case of a chat like Facebook's, you must save the entire conversation on the server side (for example, in a database). Then, when the user changes the page, you can restore the state of the conversation on the server side (with PHP) or by querying your server as you do for the chat (JavaScript + AJAX).
This isn't done in JavaScript. It needs to be done by your back-end scripting language.
In PHP, for example, you use sessions. The variables set by server-side scripts can be maintained on the server and tied together (between multiple requests/hits) using a cookie.
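The answer describes PHP, but the idea is the same in any back end. Purely as an illustration, here is a minimal sketch in Python using Flask's session object (the route and keys are hypothetical):

    from flask import Flask, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # used to sign the session cookie

    @app.route("/add/<item>")
    def add(item):
        # State tied to the visitor's session cookie survives page loads.
        cart = session.setdefault("cart", [])
        cart.append(item)
        session.modified = True  # mark the in-place list mutation as changed
        return "cart: " + ", ".join(cart)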
One really helpful trick is to run HTTPFox in Firefox so you can actually monitor what's happening as you browse from one page to the next. You can check out the POST/Cookies/Response tabs and watch for which web methods are being called by the AJAX-like behaviors on the page. In doing this you can generally deduce how data is flowing to and from the pages, even though you don't have access to the server side code per se.
As for the answer to your specific question, there are too many approaches to list (cookies, server side persistence such as session or database writes, a simple form POST, VIEWSTATE in .net, etc..)
You can reopen your last closed web page by pressing Ctrl+Shift+T, and then save its content as you like. For example: if I closed a web page about document sharing and am now on a travel web page, pressing Ctrl+Shift+T automatically reopens my last web page. This works in Firefox, Internet Explorer, Opera and more. Hope this answer is helpful to you.

Web crawler/spider to fetch AJAX-based links

I want to create a web crawler/spider to iteratively fetch all the links on a webpage, including JavaScript-based (AJAX) links, catalog all of the objects on the page, and build and maintain a site hierarchy. My questions are:
Which language/technology is better suited to fetching JavaScript-based links?
Are there any open-source tools out there?
Thanks
Brajesh
You can automate the browser. For example, have a look at http://watir.com/
Fetching AJAX links is something that even the search giants haven't fully accomplished yet. That is because AJAX links are dynamic, and the request and response both vary greatly depending on the user's actions. That's probably why SEF-AJAX (Search Engine Friendly AJAX) is now being developed; it is a technique that makes a website completely indexable to search engines and, when visited by a web browser, acts as a web application. For reference, you may check this link: http://nixova.com
No offence, but I don't see any other way of tracking AJAX links. That's where my knowledge ends. :)
You can do it with PHP, simple_html_dom and Java. Let the PHP crawler copy the pages to your local machine or web server, open them with a Java application (a JPanel or something), mark all text as focused and grab it. Send it to your database or wherever you want to store it. Track all <a> tags and all tags with an onclick or mouseover attribute, and check what happens when you call the target again: if the size or MD5 hash of the returned HTML document is different, you know it's an effective link and can grab it. I hope you can understand my bad English. :D
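The answer suggests PHP and Java; purely as an illustration, here is the size/MD5-comparison idea as a rough Python sketch (the start URL is hypothetical):

    import hashlib
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    START = "https://example.com"  # hypothetical start page

    page = requests.get(START)
    baseline = hashlib.md5(page.content).hexdigest()
    soup = BeautifulSoup(page.text, "html.parser")

    # Track <a> tags and any tags with an onclick or mouseover attribute.
    candidates = soup.find_all("a") + soup.find_all(attrs={"onclick": True})

    for tag in candidates:
        href = tag.get("href")
        if not href:
            continue
        body = requests.get(urljoin(START, href)).content
        # If the returned document's hash differs from the baseline,
        # the link is "effective": it leads to different content.
        if hashlib.md5(body).hexdigest() != baseline:
            print("effective link:", urljoin(START, href))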
