Scrape a webpage with AJAX using Python [closed] - javascript

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I know the basics of scraping HTML with Python's Beautiful Soup. However, this soccer statistics page makes an AJAX call to get data on minutes played by a player. (I identified the network call using Firebug.)
My question: is it even possible to use Python to "scrape" this information? What tools would I need, and what beyond HTML should I know? (I'm currently reading up on JavaScript and AJAX.)
I apologize for this non-specific question, but I don't even know how to Google about tools that may or may not exist.
UPDATE: After a few days I came up with a solution using Selenium in Python in conjunction with PhantomJS. I basically used Selenium to go to each link, waited for the page to load, then scraped the information. PhantomJS serves as the headless webdriver in Selenium.
I understand why mods want to close this, but the advice people gave me here was extremely helpful, since it pointed me in the right direction. My question wasn't so much about which tool is best, but more about how I can do this in Python.

Using Python is unnecessary and will not work in many cases. The best way is to run a proper browser and do all the scraping in JavaScript, since it has access to the whole DOM and you can even bind to events.
There are many good headless browsers with scripting support. My favourite is PhantomJS; you can use it to load web pages and scrape them or save them as images, e.g.
var page = require('webpage').create();
page.open('http://github.com/', function () {
    page.render('github.png');
    phantom.exit();
});
There are also scraping frameworks built on top of PhantomJS, e.g. pjscrape.
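The screenshot example above can be extended to read AJAX-loaded content out of the DOM. The following is only a sketch under stated assumptions: the URL, the `table.minutes tr` selector, the two-column row layout, and the helper name `toRecords` are all hypothetical and would need to match the real page. It polls until the rows appear, since the data arrives after the initial page load.

```javascript
// Pure helper: turn [player, minutes] text cells scraped from the DOM into records.
// The two-column layout is an assumption made for illustration.
function toRecords(rows) {
  return (rows || [])
    .filter(function (cells) { return cells.length >= 2; })
    .map(function (cells) {
      return { player: cells[0].trim(), minutes: parseInt(cells[1], 10) };
    })
    .filter(function (rec) { return !isNaN(rec.minutes); });
}

// This part only runs under PhantomJS; other runtimes skip it.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = 'http://example.com/stats';  // hypothetical URL
  var selector = 'table.minutes tr';     // hypothetical selector

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    var started = Date.now();
    // Poll until the AJAX-filled rows exist, or give up after 10 seconds.
    var timer = setInterval(function () {
      var rows = page.evaluate(function (sel) {
        return Array.prototype.map.call(document.querySelectorAll(sel), function (tr) {
          return Array.prototype.map.call(tr.querySelectorAll('td'), function (td) {
            return td.textContent;
          });
        });
      }, selector) || [];
      if (rows.length > 0 || Date.now() - started > 10000) {
        clearInterval(timer);
        console.log(JSON.stringify(toRecords(rows)));
        phantom.exit(rows.length > 0 ? 0 : 1);
      }
    }, 250);
  });
}
```

Keeping the parsing in a plain function separates it from the browser automation, so it can be checked without loading any page.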

If you have to use Python to crawl the AJAX information, you could try the ghost.py project. ghost.py is a WebKit web client written in Python, using PyQt's WebKit bindings. You can retrieve the AJAX-loaded information after executing the relevant JavaScript code.
Anyway, PhantomJS is a better choice if you are familiar with JavaScript.
Hope my answer helps.

Related

Simple Way to Experiment with Javascript? [closed]

Closed 8 years ago.
I would first like to apologize in advance for this question, because it is so basic it's embarrassing.
Right now, I'm learning Javascript through Codecademy, and while I'm enjoying it, I want to have an environment where I can experiment with what I'm learning in a way where I can see results of what I'm programming, much like what I see when I'm going through the tutorials.
I'm sure I'm missing some incredibly obvious answer, but it looks to me like every system I've seen so far is for either writing the code or running it, not something that will let me quickly try something, hit 'run', and see what the results are. I've looked at Sublime Text, Aptana, and some other things, but they don't really do what I want.
I'd really just like a basic environment that's like Codecademy Labs, but in software form.
Again, I apologize, I feel really dumb asking this question, but I was hoping to get some help.
A modern web browser (e.g. Chrome) is a full-featured Javascript environment with a console, interactive debugging, and all manner of useful tools. Write your code in the editor of your choice (I do like Sublime, myself, but to each one's own) and open the file in your browser with the dev tools. You can even open the file in multiple tabs for multiple independent sessions.
If you only want an offline solution, then the console of any modern browser, such as Google Chrome, is enough.
However, you can also try the W3Schools Try It editor, which can run both HTML and JavaScript. I use it sometimes; it is quite simple and handy (although online only).
http://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic

Why is a browser extension more secure than a pure web-based application? [closed]

Closed 9 years ago.
I love the idea of crypto.cat for sharing sensitive information. Recently, I had to send my wife my social security number and I didn't want to use email/SMS/IM/etc. I wanted to use crypto.cat, but she didn't want to install an extension on her work computer, so I just ended up calling her.
I found myself wondering why an extension is even necessary. Looking back through their blog, I found that they switched from a pure web-based application to a browser extension. They claimed this improves security but they didn't explain why.
Looking through their GitHub, the code appears to be all JavaScript, so why not just skip the extension? I'm thinking about forking Crypto.cat and re-implementing a pure web-based version, but I'd like to understand why this is a bad idea before I start.
My ideas so far
Using an extension would make phishing more difficult
It helps prevent code injection attacks by men-in-the-middle. If you go to the Crypto.cat website every time you want to use the service, your browser will download the application source code to execute. A MITM could use this opportunity to inject code, which undermines the whole security of the service. Even SSL wouldn't necessarily help much unless you pay very close attention to the certificate and the entire chain of trust, since a MITM could wedge in his own certificate.
Installing a browser extension under trusted conditions once mitigates these concerns, since then the whole code is already on your machine and nobody can inject anything.

possible JS "file browser"? [closed]

Closed 9 years ago.
What I have been looking for is almost like a music player: it would display folders (artists) and then the contents of each folder (music). This would allow me to upload folders and files using FTP, and my users to play or download the files. I do not need any reading, editing, or deleting features.
Here is a quick mockup of what I have pictured in my head:
If anyone has any idea of what this is called or where to look for something please let me know because I have not been able to find anything close.
You won't be able to do it with pure HTML/Javascript. You will need some other coding framework/language to access the file system, because client-side Javascript does not have file system access.
[Update]
As some users have noted, server-side JavaScript runtimes such as Node.js can be used to gain file system access.
What you're describing is a web application. This will involve the usual webserver+web framework+clientside javascript stack. If you want to do this all in js, use something based on node.js on the server side.
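As a rough illustration of the server-side half, here is a minimal Node.js sketch that turns a folder of artist subfolders into a listing the client-side page could display. The function name `listLibrary` and the "one subfolder per artist" layout are assumptions taken from the question's description, not an existing tool.

```javascript
const fs = require('fs');
const path = require('path');

// Build { artist: [track, ...] } from a root folder whose subfolders are artists.
function listLibrary(rootDir) {
  const library = {};
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    // Only subfolders count as artists; loose files in the root are ignored.
    if (!entry.isDirectory()) continue;
    const artistDir = path.join(rootDir, entry.name);
    library[entry.name] = fs.readdirSync(artistDir, { withFileTypes: true })
      .filter((file) => file.isFile())
      .map((file) => file.name)
      .sort();
  }
  return library;
}
```

A web server could return the result of `listLibrary` as JSON, and client-side JavaScript could render it as a list of folders with play or download links.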
I built a desktop-like media player based on Chrome's webkitdirectory a few years ago.
Here's a demo of the attribute https://html5-demos.appspot.com/static/html5storage/demos/upload_directory/index.html (Only works in Google Chrome afaik). Just select some folder and there you go.
On non-Chrome browsers the same is still possible, but since there is no support for the directory attribute on the input element, selecting files will be less convenient. Also, on Firefox you would need an MP3 decoder implemented in user code.

Tutorial/Example of how to implement "Browse A to Z" for a website [closed]

Closed 9 years ago.
I have found that simply googling this does not return what I am looking for. I am trying to find something simple and easy. I don't know if this requires JavaScript or not. I know I can "View Page Source", but I was hoping to find a tutorial. Some examples of what I am talking about can be found here:
-IBM
-Auburn
-About.com
JavaScript code runs on the viewer's computer, in the browser. The pages you're linking to are generated dynamically by code that runs on the web server itself, not in the browser. More than likely, all of those sites have some sort of database behind them.
I see from your other questions that you know C#. Microsoft provides a framework that uses C# known as ASP.NET. You can write code in C# that will run whenever someone views a page on your site (provided your site is running under IIS).
The ASP.NET Community website is a great resource if you want to find out more about that.
Other such tools that perform server-side operations would be PHP, Ruby on Rails, or Django (to name a popular few).
From viewing the examples you mentioned, it does not seem like there is any javascript used to make these "browse a to z" lists. (There should be a better name for them than that. I'm just going to call it a sitemap.)
I couldn't find any tutorials online that would teach expressly this type of sitemap, but figuring it out should be pretty straight-forward. (At least for implementing sitemaps like the IBM or Auburn examples. The About.com example would be more difficult as it seems that it is backed by a database or lots and lots of individual html pages.)
The trickiest part of making a sitemap page like these is using the <a> tag, and luckily, it is very easy. Just keep checking the source of those pages and you can have your own version in no time. The most time-consuming part will be putting down all of the links, from A to Z.
http://www.w3schools.com/HTML/html_links.asp
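If typing every link by hand gets tedious, the list can be generated instead. The sketch below is one possible approach, not how the example sites do it: it groups a list of pages by first letter and emits a letter bar plus one linked list per letter. The function names and the `{ title, url }` input shape are made up for illustration.

```javascript
// Group { title, url } entries by the first letter of their title.
function groupByLetter(pages) {
  const groups = {};
  for (const page of pages) {
    const first = page.title.trim().charAt(0).toUpperCase();
    // Titles that do not start with A-Z go into a catch-all group.
    const key = /[A-Z]/.test(first) ? first : 'other';
    (groups[key] = groups[key] || []).push(page);
  }
  for (const key of Object.keys(groups)) {
    groups[key].sort((a, b) => a.title.localeCompare(b.title));
  }
  return groups;
}

// Escape text for safe inclusion in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render a bar of letter links followed by one section of links per letter.
function renderIndex(pages) {
  const groups = groupByLetter(pages);
  const letters = Object.keys(groups).sort();
  const bar = letters.map((l) => `<a href="#letter-${l}">${l}</a>`).join(' ');
  const sections = letters.map((l) => {
    const items = groups[l]
      .map((p) => `<li><a href="${escapeHtml(p.url)}">${escapeHtml(p.title)}</a></li>`)
      .join('');
    return `<h2 id="letter-${l}">${l}</h2><ul>${items}</ul>`;
  });
  return `<p>${bar}</p>` + sections.join('');
}
```

The same grouping logic would apply whether the page list comes from a database on the server or from a hand-maintained array.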

What is the best way to profile javascript execution? [closed]

Closed 8 years ago.
Is there a good profiler for JavaScript? I know that Firebug has some support for profiling code, but I want to gather stats over a longer period.
Imagine you are building a lot of JavaScript code and you want to determine where the bottlenecks actually are. First, I want to see profile stats for every JavaScript function, including execution time. Next would be including DOM functions. Combining this with the actions that slow things down, such as operations on the rendering tree, would be perfect. I think this would give a good impression of whether performance is being lost in my code, in DOM preparation, or in updates to the rendering tree/visuals.
Is there something close to what I want? Or what would be the best tool to achieve the most of what I've described? Would it be a self compiled browser plus javascript engine enhanced by profile functionality?
Firebug
Firebug provides a highly detailed profiling report. It will tell you how long each method invocation takes in a giant (detailed) table.
console.profile([title])
//also see
console.trace()
You need to call console.profileEnd() to end your profile block. See the console API here: http://getfirebug.com/wiki/index.php/Console_API
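One way to avoid forgetting the closing call is to wrap the profiled code in a helper. The sketch below is only an illustration: `withProfile` and `instrument` are hypothetical names, not part of Firebug or any console API. The second helper collects call counts and total time per function by hand, which may suit the "longer scale" stats the question asks about; it uses `Date.now()`, so very short calls will register as 0 ms.

```javascript
// Run fn inside a console.profile block and always close it with console.profileEnd().
function withProfile(title, fn) {
  const canProfile =
    typeof console.profile === 'function' && typeof console.profileEnd === 'function';
  if (canProfile) console.profile(title);
  try {
    return fn();
  } finally {
    if (canProfile) console.profileEnd();
  }
}

// Wrap fn so that each call is counted and timed under stats[name].
function instrument(name, fn, stats) {
  return function () {
    const start = Date.now();
    try {
      return fn.apply(this, arguments);
    } finally {
      const entry = stats[name] || (stats[name] = { calls: 0, totalMs: 0 });
      entry.calls += 1;
      entry.totalMs += Date.now() - start;
    }
  };
}
```

For example, `const stats = {}; const fastSort = instrument('sort', mySort, stats);` (with `mySort` standing in for your own function) lets you inspect `stats.sort` after the application has run for a while.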
Blackbird
Blackbird also has a simpler profiler.
Blackbird official site from Wayback Machine
Source from Google Code Archive
Source from Github (pockata/blackbird-js: A fork of the cool Blackbird logging utility)
Source from Github (louisje/blackbirdjs: Blackbird offers a dead-simple way to log messages)
Chrome's Developer Tools has a built-in profiler.
Although Firebug has been mentioned, one additional thing you would want to look at with Firebug is a plugin for Firebug called FireUnit; John Resig talks about it in this blog post:
JavaScript Function Call Profiling
Hope that helps.
Firebug+Firefox is a must have. And IE 8's developer toolbar also has a profiler built in (IE 8 ships with the developer toolbar).
Safari 4's web inspector also includes a profiler (although the version in the nightlies is improved wrt. recursive function calls). The Web Inspector also supports Firebug's profiler APIs.
For JavaScript, XMLHttpRequest, DOM access, rendering times and network traffic in IE6, 7 and 8, you can use the free dynaTrace AJAX Edition.
