I have a small personal project that gathers data from several websites so that it is all in one place. To do this, I use BeautifulSoup with Flask to scrape the data.
However, I'd like to do the same for the following site: https://cartographie.ville.terrebonne.qc.ca/patin-exterieur/
but it's an interactive map, and I can't extract the information from the popup that appears when clicking on each skating rink. After reading about it, I've concluded that the website must be using an API to display its data, which I could then parse. But when navigating through Google Chrome's developer tools, I can't find the API request where all the data is stored. Does anyone have any ideas?
In the Chrome DevTools "Fetch/XHR" tab, you can see new API requests as you move the map around. The API responses have the content type application/x-protobuf, i.e. binary Protocol Buffers data that you would need to decode.
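Protobuf responses like these (requests with f=pbf) are often what ArcGIS feature-layer queries look like. Assuming the map loads its rinks from such a query endpoint, and that the endpoint is publicly queryable, it can usually be asked for plain JSON instead by changing the f parameter. A minimal sketch; LAYER_QUERY_URL is a placeholder, so copy the real ".../query" URL from the Fetch/XHR tab:

```python
from __future__ import annotations

from urllib.parse import urlencode

# Placeholder: replace with the real ".../FeatureServer/<n>/query" (or
# ".../MapServer/<n>/query") URL seen in the Fetch/XHR tab.
LAYER_QUERY_URL = "https://example.com/arcgis/rest/services/Layer/FeatureServer/0/query"


def build_query_url(layer_query_url: str) -> str:
    """Ask for every feature and every field, as JSON instead of protobuf."""
    params = {
        "where": "1=1",            # no filter: return all features
        "outFields": "*",          # all attribute columns
        "returnGeometry": "true",  # keep coordinates as well
        "f": "json",               # the map itself typically requests f=pbf
    }
    return f"{layer_query_url}?{urlencode(params)}"


def extract_attributes(payload: dict) -> list[dict]:
    """Return the attribute dict of each feature in an ArcGIS JSON response."""
    return [feature.get("attributes", {}) for feature in payload.get("features", [])]


if __name__ == "__main__":
    import requests  # third-party: pip install requests

    response = requests.get(build_query_url(LAYER_QUERY_URL), timeout=30)
    response.raise_for_status()
    for attrs in extract_attributes(response.json()):
        print(attrs)
```

If the server refuses f=json or requires a token, this approach will not work, which is consistent with the authorization caveat in the next answer.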
They are using the ArcGIS API for JavaScript: https://developers.arcgis.com/javascript/latest/maps-and-views/
It is made by Esri, a large geographic information system (GIS) software company.
To be honest, you won't get far with this, as you won't get any authorization.
You can open the website, press F12, go to the Network tab, select a request, and look under Headers for "Request Headers".
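If you want to reuse those headers in a Python script, one way is to copy them in their raw "Name: value" form (the raw/view-source toggle in the Headers panel) and turn them into a dict for requests. A sketch, assuming that raw format:

```python
from __future__ import annotations


def parse_raw_headers(raw: str) -> dict[str, str]:
    """Turn header text copied from DevTools ("Name: value" per line) into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority",
        # which cannot be sent as ordinary HTTP/1.1 headers.
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        name = name.strip()
        # Header names never contain spaces; this also skips a request line
        # like "GET /path HTTP/1.1".
        if sep and " " not in name:
            headers[name] = value.strip()
    return headers
```

The result can then be passed as requests.get(url, headers=parse_raw_headers(raw_text)).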
Related
I'll try to explain everything as clearly as possible, the way I understand it, so correct me if I'm confused about something.
I was trying to scrape the users from a member list on a website. I used Python, and the first thing I did was make a POST request to the Request URL with the required headers, expecting a response containing the data I need. This didn't work, so I tried to find out why.
From what I now understand, the website uses AJAX, i.e. JavaScript making XHR requests, and those requests respond with the content (the users).
According to the request initiators in Chrome's developer tools (here is an image for reference), the JS code is stored on a static website, and it responds with the HTML that contains the users.
My idea is to write a script that runs this static JS file stored online and fetches the user data from it (image for clarification).
How do I achieve this in Python? Which libraries do I need? Any help or advice is greatly appreciated!
Based on your question, I think you're trying to scrape a website that loads its data with AJAX.
In my opinion, you should have a look at Scrapy and at headless browsers.
Check the following links for more information:
https://scrapy.org/
https://github.com/puppeteer/puppeteer
https://github.com/pyppeteer/pyppeteer
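Before reaching for a headless browser, it is often enough to replay the request the JavaScript makes (it is visible in the Network tab) and parse the HTML fragment it returns. This is a different technique from the one suggested above. A sketch using requests plus the standard-library HTML parser; the URL, form data and the member-name CSS class are hypothetical placeholders:

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class.

    Assumes the matching elements contain only text (no nested tags).
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def extract_member_names(html_fragment, css_class="member-name"):
    """Return the stripped text of each element with `css_class` (hypothetical class)."""
    parser = ClassTextParser(css_class)
    parser.feed(html_fragment)
    parser.close()
    return [text.strip() for text in parser.texts]


if __name__ == "__main__":
    import requests  # pip install requests

    # Hypothetical values: copy the real URL, method, form data and headers
    # of the XHR request from the Network tab.
    response = requests.post(
        "https://example.com/members/ajax",
        data={"page": 1},
        headers={"X-Requested-With": "XMLHttpRequest"},
        timeout=30,
    )
    response.raise_for_status()
    print(extract_member_names(response.text))
```

If the request only works with cookies or tokens that the JavaScript computes, that is the point where Scrapy with a browser backend or a headless browser becomes worth the extra setup.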
I'm fairly new to this and looking to get into pulling API results and displaying them on a page, whether from a blog resource or for content generation.
For example, I want to pull from VirusTotal's API to display the returned content. What is the best way to capture a value from an input tag and display the result in a DIV? And what if I wanted to pull from different APIs based on a drop-down selection?
An example of the API content I'd like to pull is here: https://developers.virustotal.com/reference#api-responses, under the /file/report section.
To get data from the API, you need to send a request. However, there is a problem with CORS: basically, a page opened from your local machine can't call the website, because your browser blocks the request. With a few exceptions, the web browser only allows calls to and from the same server.
There are two ways to approach this.
The simplest is to write a program that calls the API and outputs an HTML file. You can then open that HTML file to read the contents. If you want to update the info, you need to run the program again manually. You could easily do this by building on the Python code they provided.
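The first approach might look like this in Python. It is a sketch that assumes the VirusTotal public API v2 file-report endpoint (/vtapi/v2/file/report with apikey and resource parameters) and its report fields (resource, scan_date, positives, total, permalink); newer API versions use different URLs and fields, so check the documentation linked in the question. The API key and the file hash are placeholders:

```python
from __future__ import annotations

import html

# Assumed v2 endpoint; API v3 uses a different URL scheme and an x-apikey header.
REPORT_URL = "https://www.virustotal.com/vtapi/v2/file/report"


def fetch_report(api_key: str, resource: str) -> dict:
    """Fetch the report for a file hash (MD5, SHA-1 or SHA-256)."""
    import requests  # pip install requests

    response = requests.get(
        REPORT_URL, params={"apikey": api_key, "resource": resource}, timeout=30
    )
    response.raise_for_status()
    return response.json()


def render_report(report: dict) -> str:
    """Build a small standalone HTML page from a few report fields."""
    rows = "".join(
        f"<tr><td>{html.escape(key)}</td>"
        f"<td>{html.escape(str(report.get(key, '')))}</td></tr>"
        for key in ("resource", "scan_date", "positives", "total", "permalink")
    )
    return (
        "<!DOCTYPE html><html><body><h1>VirusTotal report</h1>"
        f"<table>{rows}</table></body></html>"
    )


if __name__ == "__main__":
    report = fetch_report("YOUR_API_KEY", "PUT_A_FILE_HASH_HERE")
    with open("report.html", "w", encoding="utf-8") as out:
        out.write(render_report(report))
```

Escaping every value with html.escape keeps unexpected characters in the API response from breaking the generated page.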
The other, slightly more complex way is to host a server on your PC. When you open the webpage served by that server, it sends a request to the API and then shows the latest information. There are tons of frameworks and ways to do this. For an absolute beginner, ExpressJS is a good start: make a hello world program, and once that works, figure out how to call the API whenever a page is loaded and display the results.
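The answer suggests ExpressJS; the same idea can be sketched with Python's standard-library http.server, where every page load triggers a fresh data fetch. The fetch function below is a placeholder lambda, not a real API call:

```python
from __future__ import annotations

import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def render_page(data: dict) -> bytes:
    """Render a dict of API results as a minimal HTML list."""
    items = "".join(
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in data.items()
    )
    return f"<!DOCTYPE html><html><body><ul>{items}</ul></body></html>".encode("utf-8")


def make_handler(fetch: Callable[[], dict]):
    """Build a request handler class that calls `fetch` on every page load."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render_page(fetch())  # fresh call for each request
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


if __name__ == "__main__":
    # Placeholder data source: replace the lambda with a real API call.
    server = HTTPServer(("127.0.0.1", 8000), make_handler(lambda: {"status": "ok"}))
    print("Serving on http://127.0.0.1:8000")
    server.serve_forever()
```

Because the request to the API is made by the server rather than by the browser, the CORS restriction described above does not apply to it.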
I have a news application on my desktop, and I am wondering if there is a way to pull data from this app directly using Python. I will post a picture of what the app looks like.
Let's say I want to pull the DE GDP results as they change upon a news release. Is there any way I can access these values locally, rather than pulling them via JSON or XPath? I tried JSON and got it to pull values, but it takes up to 10 seconds to update, and I want it to update as fast as possible. I figured the fastest way to get the updated values would be to read them locally on my computer somehow, but I am not very knowledgeable about this sort of thing. Thank you for your help!
You can inspect the page and, in the Network section, see the URL that is being called to render the data; you can see the response as well.
I have done it for a sample page, as you can see in the screenshot.
No. You need to use a web service. After all, the app you are showing must be pulling its data from some service. You could use a network sniffer to find out where exactly.
Looking at the Network tab, the URL that fetches the data is:
https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-330
If you open it in a new tab, you can see the JSON it returns.
So you can easily fetch that JSON data with the requests library in Python.
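Some brief code:
pip install requests
import requests
r = requests.get('https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-330', timeout=10)
r.raise_for_status()  # fail loudly on HTTP errors
print(r.json())  # json() is a method; it parses the response body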
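If the goal is to react quickly to new releases, one option is to poll that endpoint in a loop and only act when the payload changes. A minimal sketch, assuming the endpoint still returns JSON. The one-second interval is an arbitrary choice: polling faster than the server refreshes its data gains nothing, and aggressive polling may get you rate-limited or blocked.

```python
from __future__ import annotations

import hashlib
import json
import time

# Endpoint found in the Network tab in the answer above; it may have changed since.
URL = "https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-330"


def fingerprint(payload) -> str:
    """Stable hash of a JSON payload, so two snapshots can be compared cheaply."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def poll(url: str, interval: float = 1.0) -> None:
    """Re-fetch `url` every `interval` seconds and print the payload when it changes."""
    import requests  # pip install requests

    last = None
    with requests.Session() as session:  # reuses the connection between polls
        while True:
            payload = session.get(url, timeout=10).json()
            current = fingerprint(payload)
            if current != last:
                print(time.strftime("%H:%M:%S"), payload)
                last = current
            time.sleep(interval)


if __name__ == "__main__":
    poll(URL)
```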
So apparently, Facebook collects a huge amount of data from users, not just from hitting the Like button but also from the amount of time a user spends looking at someone's post (e.g. reading a friend's status update). Is there a way to see what I am actually sending to Facebook, and when (the timing is relevant)? Can I view those requests in Windows 7?
Is it possible to reverse engineer this?
This is available in all browsers, but if you have access to Chrome, go to "View", then "Developer", then "Developer Tools".
From here select the "Network" tab.
You can then see all traffic travelling between your browser and the internet. You can filter this list by domain name, e.g. facebook or any other aliases they use.
Click on any item in the list and then you can see the request and response.
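To keep a timestamped record of what was sent, you can export the captured traffic as a HAR file (right-click in the request list and choose one of the "Save all as HAR" options; the exact wording varies between Chrome versions) and filter it with a short script. This sketch relies on the standard HAR layout (log.entries[].startedDateTime, request.method, request.url, request.postData.text); the file name is a placeholder:

```python
from __future__ import annotations

import json
from urllib.parse import urlparse


def requests_to_domain(har: dict, domain_fragment: str) -> list[tuple[str, str, str, int]]:
    """List (start time, method, URL, body length in characters) for matching hosts.

    Entries are returned in the order they appear in the HAR file.
    """
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry["request"]
        host = urlparse(request["url"]).hostname or ""
        if domain_fragment in host:
            body = request.get("postData", {}).get("text", "")
            rows.append((entry["startedDateTime"], request["method"], request["url"], len(body)))
    return rows


if __name__ == "__main__":
    with open("facebook.har", encoding="utf-8") as f:  # placeholder file name
        har = json.load(f)
    for started, method, url, size in requests_to_domain(har, "facebook"):
        print(started, method, size, url)
```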
This should help you to get started.
It's not that simple.
Facebook probably uses its own format, unlike the simplest REST/JSON applications.
Facebook makes it very, very hard to read and understand their APIs, obviously.
They probably use some kind of their own binary data format, so if you look at the POST data, it's just numbers and some (maybe encrypted) token-like data stored as base64, or whatever.
Additionally, FB uses a lot of AI processing; this is no rocket science anymore. The internal APIs could also work based on that. So reverse-engineering FB makes no sense. Just write your own.
I also think that many very good IT specialists are already trying it. Companies like FB probably also run internal contests on this topic to make their APIs even more secure. Actually, if you use online banking, you will find more useful information about what data was sent than on FB.
I have a tricky one: I am trying to scrape data from http://www.vafinancials.com/v5/plugins/quick_stats.php?id=25129.
I was able to do this for another website, where I pulled the page source and parsed it for the data I was looking for. However, with this site I can't find any way of scraping the relevant data.
So I am curious: how would one scrape data from a site like this, where jQuery seems to generate the result behind closed doors?
Anyone have any ideas?
Use Firebug or your browser's integrated developer tools to see what's going on with the page's requests and responses.
In your case, there is an AJAX call fetching the data:
The response is XML:
I would use this URL to fetch the data.
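Once you have the AJAX URL from the developer tools, the XML can be fetched and parsed with the standard library. A sketch: AJAX_URL is a placeholder (the real URL was only shown in the screenshot), and the code assumes a flat XML document whose root children each hold one value; the element names in the test data are made up.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

AJAX_URL = "http://www.vafinancials.com/..."  # placeholder: copy the real URL from the Network tab


def xml_to_dict(xml_data: str | bytes) -> dict[str, str]:
    """Map each direct child of the root element to its stripped text."""
    root = ET.fromstring(xml_data)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    import requests  # pip install requests

    response = requests.get(AJAX_URL, timeout=30)
    response.raise_for_status()
    # Pass the raw bytes so ElementTree honours the document's own encoding declaration.
    print(xml_to_dict(response.content))
```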