How can I make JavaScript calls to scrape data from a website? - javascript

I'll try to explain everything the way I understand it as clearly as possible, so please correct me if I've got something wrong.
I was trying to scrape the users from a member list on a website using Python. The first thing I did was make a POST request to the Request URL with the required headers, expecting a response that contains the data I need, but this didn't work, so I tried to find out why.
From what I understand now, the website uses AJAX: JavaScript on the page makes XHR calls, and the responses contain the content (the users).
According to the request initiators in Chrome's developer tools, the JS code is stored on a static website (here is an image for reference), which responds with the HTML that contains the users.
The idea is to create a script that runs this static JS script that's stored online and fetches the data about the users from it. (Image for clarification)
How do I achieve this in Python? What libraries do I need? Any help/advice is greatly appreciated!

Based on your question, I think you're trying to scrape a website that loads its data via AJAX.
I'd suggest having a look at Scrapy and at headless browsers.
Check the following links for more information:
https://scrapy.org/
https://github.com/puppeteer/puppeteer
https://github.com/pyppeteer/pyppeteer
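
As a rough illustration of the headless-browser route, here is a minimal sketch that lets pyppeteer render the page (so the AJAX calls actually run) and then pulls the user names out of the resulting HTML with the standard library. The CSS class `username`, the selector, and the URL in the usage comment are assumptions for illustration only; they are not taken from the site in the question. pyppeteer is imported lazily so the parsing helper works without it.

```python
from html.parser import HTMLParser


class UserParser(HTMLParser):
    """Collect the text of every element whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.users = []
        self._open_tag = None  # tag name of the matching element we are inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.css_class in classes:
            self._open_tag = tag
            self.users.append("")

    def handle_data(self, data):
        if self._open_tag is not None:
            self.users[-1] += data

    def handle_endtag(self, tag):
        # Sketch-level handling: assumes matching elements do not nest
        # another element with the same tag name.
        if tag == self._open_tag:
            self.users[-1] = self.users[-1].strip()
            self._open_tag = None


def extract_users(page_html, css_class="username"):
    """Return the text of all elements carrying the (hypothetical) class."""
    parser = UserParser(css_class)
    parser.feed(page_html)
    parser.close()
    return parser.users


async def fetch_rendered_html(url, selector):
    """Load url in headless Chromium and return the HTML once selector exists."""
    from pyppeteer import launch  # third-party, imported lazily

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.waitForSelector(selector)
        return await page.content()
    finally:
        await browser.close()


# Usage (placeholder URL and selector):
#   import asyncio
#   html_text = asyncio.run(fetch_rendered_html("https://example.com/members", ".username"))
#   print(extract_users(html_text))
```

If the developer tools show one clean XHR that returns the users, calling that URL directly is usually lighter than driving a browser; the sketch above is for the case where that did not work.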

Related

Simplest way to display API results

I'm kind of new to this and looking to get better at pulling API results and displaying them on a page, whether it's from a blog resource or content generation.
For example, I want to pull from VirusTotal's API to display the returned content. What is the best way to capture that in an input tag and display it in a DIV? And what if there were an option to pull from different APIs based on a drop-down selection?
An example of the API to pull content from is here: https://developers.virustotal.com/reference#api-responses under the /file/report section.
To get the data from the API, you need to send it a request. However, there is a problem with CORS. Basically, you can't call the API from a page opened on your local machine, because your browser blocks the request. By default, the browser only lets a page read responses from its own origin, unless the other server explicitly allows it through CORS headers.
There are two ways to approach this.
The simplest one is to make a program that calls the API and outputs an HTML file. You can then open that HTML file to read the contents. If you want to update the info, you need to run the program again manually. You could easily do this by building off the Python example they provide.
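
A minimal sketch of that first approach, using only the standard library. It assumes the API answers a GET request with a flat JSON object; the function names are made up for illustration, and the endpoint URL and query parameters are whatever the API documentation specifies, so they are left to the caller.

```python
import html
import json
import urllib.parse
import urllib.request


def fetch_report(api_url, params):
    """GET api_url with the given query parameters and decode the JSON body."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def render_report(report):
    """Turn a flat dict into a small standalone HTML page."""
    rows = "\n".join(
        f"<tr><th>{html.escape(str(key))}</th><td>{html.escape(str(value))}</td></tr>"
        for key, value in report.items()
    )
    return f"<!DOCTYPE html>\n<html><body><table>\n{rows}\n</table></body></html>\n"


def write_report(report, path):
    """Write the rendered page to path so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_report(report))


# Usage (placeholders, fill in from the API documentation):
#   report = fetch_report(API_URL, {"apikey": YOUR_KEY, "resource": FILE_HASH})
#   write_report(report, "report.html")
```

Because the request is made by a normal program rather than by a page in the browser, CORS does not come into play here.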
The other, slightly more complex way is to host a server on your PC. When you open a page on that server, it sends a request to the API and then serves the latest information. There are tons of frameworks and ways to do this. For an absolute beginner on this subject, ExpressJS is a good start. You can make a hello world program, and once you've done that you can figure out how to call the API whenever a page is loaded and display the results.

Pulling data from php website that uses jQuery. No relevant source information

I have a tricky one, I am trying to scrape data from http://www.vafinancials.com/v5/plugins/quick_stats.php?id=25129.
I was able to do this for another website where I pulled the page source and parsed the source for the data I was looking for. However with this site I can't seem to find any way of scraping the relevant data.
So I am curious how one would scrape data from a site like this, where jQuery seems to be generating the result behind closed doors.
Anyone have any ideas?
Use Firebug or the integrated developer tools of your browser to see what's going on with the requests and responses of the page.
In your case there is an ajax call fetching the data:
The response is XML:
I would use this URL to fetch the data.
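
Once you have that URL from the network tab, fetching and parsing the XML takes only a few lines. This is a minimal sketch: the AJAX URL is not reproduced here, so it is left as a parameter, and the element names in the usage comment are invented for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_stats(xml_text):
    """Map each leaf element's tag to its text content.

    If a tag occurs more than once, the last occurrence wins.
    """
    root = ET.fromstring(xml_text)
    return {
        element.tag: (element.text or "").strip()
        for element in root.iter()
        if len(element) == 0  # leaf elements only
    }


def fetch_stats(url):
    """Download the XML document at url and flatten it with parse_stats."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_stats(response.read())


# Usage with made-up element names:
#   parse_stats("<stats><pilots>12</pilots><flights>340</flights></stats>")
```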

How to prevent script sharing from its users?

I am making a small payment system, basically it's just a point system, you pay say 1 USD and you get 100 points which is used later on in a game project to get bonuses. It's a script for game servers, something like a user panel.
Now, the script system is ready, but I'm afraid to give it away, since then someone will share it and it will spread all over the gaming community. What would be a solution that keeps it working only for people I have given permission to?
I thought about rewriting the whole code to make it run on my website, but I don't think people will want to put their SQL data on a website that is NOT located on their own host. Please help me out, at least with some clues. Maybe it's possible to make some widgets, or maybe some kind of license system?
I'm really lost.
You should implement the logic on the server side behind a REST API and include only an AJAX call to that API in the script. You can limit use of the API through an API key that you provide only to qualified sites.
You'd need to implement some sort of server-side authentication/API so that only verified users can use the script, much like how software checks a licence.
On script load, your JavaScript could make an AJAX call to a server, passing the user's IP, auth key, username, etc.
This can then be verified on the server, which could return a dynamically generated URL pointing to a JavaScript file that contains your business logic
(so that URLs are dynamically generated for that user's session only).
That way people can't hotlink the script, and the script you give out is solely the AJAX call
(With the business logic script injected on auth)
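
The server-side half of that idea could be sketched roughly as follows, in Python with only the standard library. Everything named here is hypothetical: the secret, the key table, the `/logic.js` path, and the query parameter names. The point is only to show a key check plus a signed URL that expires.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"change-me"  # hypothetical server-side secret, never sent to clients
API_KEYS = {"site-a": "k3y-for-site-a"}  # hypothetical keys issued to licensed sites


def is_authorized(site, api_key):
    """Check the key a site sent against the one issued to it."""
    expected = API_KEYS.get(site)
    if expected is None:
        return False
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected.encode("utf-8"), api_key.encode("utf-8"))


def sign(site, expires):
    """HMAC over the site name and expiry time, tying the URL to both."""
    message = f"{site}:{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_script_url(site, ttl=300, now=None):
    """Build a short-lived URL for the business-logic script."""
    expires = int(now if now is not None else time.time()) + ttl
    query = urlencode({"site": site, "expires": expires, "sig": sign(site, expires)})
    return "/logic.js?" + query


def verify_script_url(site, expires, sig, now=None):
    """Accept the request only if the signature matches and has not expired."""
    current = now if now is not None else time.time()
    return current <= expires and hmac.compare_digest(sign(site, expires), sig)
```

A request handler would call `is_authorized` on the AJAX call, answer with `make_script_url`, and run `verify_script_url` before serving the script itself.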

How to gather information from a URL and decode it in another application

I'm looking to create a web based tool that will allow me to pull information from a website via URL and decode this content in a more usable fashion.
Here is the situation: The company I work for writes log files any time there is an error in the code. The errors are then searchable in a back-end application, but they contain raw data, and it is fairly time-consuming to find the useful information. Those logs are then accessible via a static link.
I'm pretty sure I can figure out the decoding process; however, I am struggling with how to get the content I need to decode.
The content I am looking to decode is not on the same domain but is within my company's network.
Any help would be greatly appreciated.
If it's not on your domain, you can't request the link using JavaScript in the browser, so you need to do it in the backend with curl or another HTTP client.
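
For example, a backend written in Python could fetch the log with the standard library's HTTP client. This is a minimal sketch; the function name is made up, and authentication or proxy settings that your company network may require are not covered.

```python
import urllib.request


def fetch_text(url, timeout=30):
    """Download url server-side and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (placeholder URL):
#   raw_log = fetch_text("http://logs.internal.example/static/12345")
#   ... decode raw_log and send the readable version to the browser ...
```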

Rails pushing content dynamically after finishing request

At the moment, I'm building a website with Ruby on Rails.
Problem:
My website uses several external APIs to get data, for example the Amazon Product Advertising API. If I load e.g. 10 objects at once, it takes too much time.
Is it possible to load each object individually? (When one request finishes, push it onto the page with JavaScript, or something like that.) The user should be able to read the first objects while the rest of the content is loading in the background.
Simple example:
list.each do |object|
  result << AmazonRequest.getItem(object)
  # [And now push the changed result list to the view]
end
Is this possible? If yes, how?
Thanks :)
I don't really know RoR, but if you do not need to have all the different API results on the server before sending them out (which seems to be the case), you could just make multiple ajax requests and display content from the different APIs independently.
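
The principle behind this, handling each result as soon as it arrives instead of waiting for the whole batch, is not specific to AJAX or Rails. Here is a small Python sketch of the same idea using a thread pool; `fetch` and `on_result` are placeholder callables standing in for the API call and for whatever updates the page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_independently(items, fetch, on_result, max_workers=4):
    """Call fetch(item) concurrently and pass each result on as it finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, item): item for item in items}
        # as_completed yields futures in completion order, not submission order,
        # so fast requests are handled without waiting for slow ones.
        for future in as_completed(futures):
            on_result(futures[future], future.result())
```

In the browser, the equivalent is one AJAX request per object, each with its own callback that inserts the object into the page.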
