Scraping a webpage with python to get onclick values - javascript

First of all, please be patient with me: I am not familiar with the topic I am about to describe.
I'd like to download the intraday historical values of some equities on Frankfurt Boerse website. Let me take this equity for example: http://www.boerse-frankfurt.de/en/equities/adidas+ag+DE000A1EWWW0/price+turnover+history/tick+data#page=1
As you can see, there are two options: trades on Frankfurt and trades on Xetra. I'd like to download the latter. I tried to scrape the data, but my knowledge of Python is very limited.
How can I 'select' the desired onclick option?
Thanks in advance for your replies. Regards
PS: For your information, I noticed the following when inspecting the Xetra element: its value changes when I move on to the next page, and when I come back the value is different again. Here is an example: the first time on page 1 I got
a onclick="d39081344_fkt_set_par('6');d39081344_fkt_set_active(this);" class="brs_d39081344_li current last"
then I moved on to page 2 and got
a onclick="d51109535_fkt_set_par('6');d51109535_fkt_set_active(this);" class="brs_d51109535_li current last"
and coming back to page 1 I got
a onclick="d96086211_fkt_set_par('6');d96086211_fkt_set_active(this);" class="brs_d96086211_li current last"

The trick is to look at what calls are made when you navigate through the pages. Your browser's network analysis tool is invaluable for this. When I go from page to page, a POST is made to http://www.boerse-frankfurt.de/en/parts/boxes/history/_tickdata_full.m with data about the request.
The goal is then to replicate and loop the requests using Python. Here is code to get you started:
import requests
url = 'http://www.boerse-frankfurt.de/en/parts/boxes/history/_tickdata_full.m'
# Form fields copied from the browser's network tab; 'page' selects the page to fetch
data = {'component_id':'PREKOP97077bf9dec39f14320bf9d40b636c7c589', 'page':'3', 'page_size':'50', 'boerse_id':'6',
        'titel':'Tick-Data', 'lang':'en', 'text':'LOcbaec84ecad1b94ad2fd257897c87361', 'items_per_page':'50',
        'template':'0', 'pages_total':'50', 'use_external_secu':'1', 'item_count':'2473',
        'include_url':'/parts/boxes/history/_tickdata_full.m', 'ag':'291', 'secu':'291'}
r = requests.post(url, data=data)
print(r.text)  # here is your data of interest, it still needs to be parsed
That is the general idea. You would then put that in a loop, adding one to the page parameter each time.
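The loop described above could look roughly like the sketch below. It uses only the standard library so that it runs without extra packages. The helper names (build_payload, fetch_page, fetch_all), the one-second pause, and the assumption that pages are numbered from 1 are illustrative and not taken from the site. The form fields you pass in should be the ones you copy from your own browser session.

```python
import time
import urllib.parse
import urllib.request

URL = "http://www.boerse-frankfurt.de/en/parts/boxes/history/_tickdata_full.m"


def build_payload(base_fields, page):
    """Return a copy of the form fields with the 'page' field set."""
    payload = dict(base_fields)   # copy, so the caller's dict is not modified
    payload["page"] = str(page)   # the site sends the page number as a string
    return payload


def fetch_page(base_fields, page, timeout=10):
    """POST the form for one page and return the raw response text."""
    body = urllib.parse.urlencode(build_payload(base_fields, page)).encode("ascii")
    # Passing data= makes urlopen send a POST request
    with urllib.request.urlopen(URL, data=body, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(base_fields, pages_total, pause=1.0):
    """Fetch pages 1..pages_total (assumed 1-based), pausing between requests."""
    pages = []
    for page in range(1, pages_total + 1):
        pages.append(fetch_page(base_fields, page))
        time.sleep(pause)  # be polite to the server
    return pages
```

Calling fetch_all(data, 50) with the form fields from the answer would return a list of raw responses, one per page, each of which still needs to be parsed.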

Related

Web scraping in R by first navigating through a JavaScript module

I looked up various questions and answers but unfortunately none of the problems I found dealt with a case that is similar to mine. In a typical question, the JavaScript table builds up directly when the website is loaded. In my case, however, I first have to navigate through the JavaScript module and select several criteria before I get the sought-after result.
This is my case: I have to scrape the exchange rates for various currencies from this website www.globocambio.co. To do that, I have (1) to navigate to “I WANT COLOMBIAN PESO”, (2) select the currency (e.g., “Chilean Peso”), and (3) select the collection destination (e.g., “El Dorado International Airport”). Only then is the respective exchange rate loaded. See this screenshot for illustration. I marked the three selection steps in red. Green is the data point that I want to scrape for different currencies.
I am not very familiar with JavaScript but I tried to understand what is going on. Here is what I found out:
Using Chrome DevTools, I investigated the Network activity when loading an exchange rate. There is an XHR called “GetPrice” that requests the price using this URL: https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice and using the following Form Data
ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0&centerId=27&operationType=OperationTypesBuying
I understand that the Form Data contains the information that I initially selected manually:
operationType=OperationTypesBuying: this is the “I WANT COLOMBIAN PESO” option
ISOAOrigen=CLP: this is the “Chilean Peso”
centerId=27: this is the “El Dorado International Airport”
The server responds to my request with the following information:
{"MonedaOrigen":{"ISOA":"CLP","Nombre":null,"Margen":0.1630000000,"Tramo":0.0,"Fixing":2.9000000000},"CantidadOrigen":9000.00,"MonedaDestino":{"ISOA":"COP","Nombre":null,"Margen":0.0,"Tramo":0.0,"Fixing":0.0},"CantidadDestino":21845.70,"TipoCambio":2.42730000000000000000,"MargenOrigen":0.0,"TramoOrigen":0.0,"FixingOrigen":0.0,"MargenDestino":0.0,"TramoDestino":0.0,"FixingDestino":0.0,"IdCentro":"27","Comision":null,"ComisionTramoSuperior":null,"ComisionAplicada":{"CodigoMoneda":null,"CodigoTipoMoneda":0,"ComisionFija":0.0,"ComisionVariable":0.0,"TramoInicio":0.0,"TramoFin":null,"Orden":0}}
From this response, "TipoCambio":2.42730000000000000000 is then being written on the website using this line of HTML code: <span id="spTipoCambioCompra">2.427300</span>
This means that "TipoCambio" is the value that I am looking for.
So, I have to communicate somehow via R with the server using the Form Data as input variables. Can anyone tell me how to do this?
I mean, I understand that I have to combine the URL https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice with the Form Data “ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0&centerId=27&operationType=OperationTypesBuying” somehow, but I do not know how it works.
Any help will be appreciated!
Update:
I still have no idea how to solve the issue above. However, I am trying to approach it in small steps.
Using RSelenium, I am currently trying to find out how to click on the option “I WANT COLOMBIAN PESO”. My idea was to use the following code:
library(RSelenium)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome")
remDr$open()
remDr$navigate("https://www.globocambio.co/en/home")
webElem <- remDr$findElement("id", "tabCompra") #What is wrong here?
webElem$clickElement() # Click on "I WANT COLOMBIAN PESO"
But I get an error message after executing webElem <- remDr$findElement("id", "tabCompra"):
Selenium message:no such element: Unable to locate element: {"method":"css selector","selector":"#tabCompra"}
(Session info: chrome=81.0.4044.113)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
...
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
What am I doing wrong here?
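I solved my problem using Selenium in Python. The widget lives inside an iframe (iframeWidget), so the driver has to switch into that frame before it can find tabCompra, which is presumably why the lookup failed before:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()  # assumes geckodriver is on your PATH
driver.get("https://www.globocambio.co/en/")
driver.switch_to.frame("iframeWidget")  # the widget elements are inside this iframe
elem = driver.find_element(By.ID, 'tabCompra')
elem.click()  # click on "I WANT COLOMBIAN PESO"
elem = driver.find_element(By.ID, 'inputddlMonedaOrigenCompra')
elem.click()
elem.send_keys(Keys.CLEAR)
elem.send_keys("Chilean Peso")
elem.send_keys(Keys.ENTER)
elem.send_keys(Keys.ARROW_DOWN)
elem.send_keys(Keys.RETURN)
elem = driver.find_element(By.ID, 'info-change-compra')
print(elem.text)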
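As an alternative to driving a browser, the GetPrice request described in the question could in principle be replicated directly. The sketch below is in Python rather than R and uses only the standard library. It is built solely from the URL, form fields, and sample response quoted in the question. The helper names (build_form, extract_rate, get_rate) are made up for illustration, and it is an assumption that the endpoint accepts a plain POST without extra headers or cookies.

```python
import json
import urllib.parse
import urllib.request

GET_PRICE_URL = (
    "https://reservations.globocambio.co"
    "/DesktopModules/GlobalExchange/API/Widget/GetPrice"
)


def build_form(currency, center_id, amount=9000):
    """Form fields as observed in the browser's network tab (see the question)."""
    return {
        "ISOAOrigen": currency,        # e.g. "CLP" for Chilean Peso
        "cantidadOrigen": amount,
        "ISOADestino": "COP",
        "cantidadDestino": 0,
        "centerId": center_id,         # e.g. 27 for El Dorado International Airport
        "operationType": "OperationTypesBuying",
    }


def extract_rate(response_text):
    """Pull the exchange rate out of the JSON response."""
    return json.loads(response_text)["TipoCambio"]


def get_rate(currency, center_id, timeout=10):
    """POST the form and return TipoCambio (assumes no extra headers are needed)."""
    body = urllib.parse.urlencode(build_form(currency, center_id)).encode("ascii")
    with urllib.request.urlopen(GET_PRICE_URL, data=body, timeout=timeout) as resp:
        return extract_rate(resp.read().decode("utf-8"))
```

If the server rejects the request, comparing the request headers in the browser's network tab with what this code sends is the first thing to check.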

Using Wikia API

I am trying to access the X-men API on wikia, to try and extract the name and image of each character, to then be used on a SPA using javascript.
This is the link to the page on the wiki:
http://x-men.wikia.com/wiki/Category:Characters
I cannot for the life of me figure out how to access the API. It doesn't seem to be RESTful, and that's all I have any experience in.
Has anyone used the Wikia API successfully before? I can get some articles and such, but nothing useful.
(The documentation is shocking, been searching around for hours.)
You have probably already found a solution, but I think you should write something like this:
import requests
xmen_url = "http://x-men.wikia.com/api/v1/Articles/List?expand=1&category=Characters&limit=10000"
r = requests.get(xmen_url)
response = r.json()
# print(response)
# Number the articles starting from 1 and print title and id for each
for a, item in enumerate(response['items'], start=1):
    print("{}\t{}\t({})".format(a, item['title'], item['id']))
This will print a list of all the articles in the category Characters (I think there are also some subcategories, you should check). If you want to take a deeper look at the JSON response, you can uncomment the commented line.
Hope it helps.

Adding MySQL requests to JS Code from JSFiddle

First and foremost, I want to say how amazing this community is. I've been reading and using this place for a bit now to get answers to a plethora of questions.
I'm currently working on building a student list system (I have never really built a system before) for our company using Bootstrap 3. I've got the meat of it worked out and have found this awesome JSFiddle by user Mils (many thanks) that does what I need in terms of adjusting data dynamically, which would be ideal for what we want.
http://jsfiddle.net/NfPcH/645/
My question is: how can I alter this so that it pulls data from a MySQL database I've created, and how do I alter it so that when adding/editing a row, it writes it to the db? I have a students.php page I created that pulls in the information as such:
// Prepare SQL Query
$STM = $db->prepare("SELECT `student_firstname`, `student_lastname`, `student_class`, `year` FROM students ORDER BY student_firstname");
// For executing prepared statement
$STM->execute();
// Fetch records
$STMrecords = $STM->fetchAll();
foreach($STMrecords as $row)
{
echo"<tr>";
echo"<td><a href='#' id='student-firstname' data-type='text' data-pk='".$row['student_firstname']."'>".$row['student_firstname']."</a></td>";
echo"<td>".$row['student_lastname']."</td>";
echo"<td>".$row['student_class']."</td>";
echo"<td>".$row['year']."</td>";
echo"</tr>";
}
But this doesn't go hand-in-hand with the aforementioned JSFiddle, as it only posts the data on the page.
Thanks, everyone!
You need to redirect the GET request in the JSFiddle to your PHP script, and the PHP script needs to return something in the expected format.
At the end of the fiddle there are a couple of mocks; there you can see how the output from the PHP script should be formatted.

Increment the number of times an article has been read

I have a situation where I need to increase the number of times an article has been read.
Once someone opens an article it should be reflected in the database by incrementing number of reads by one. Simple.
Sending POST request to the server increments the number of reads by one. The article in question is supplied via URL parameter.
Doing it manually by typing the URL in a browser works as expected. So server side is not at fault.
My problems start with the javascript side of it or rather jquery. I hook the event to the article link. So every time a user clicks on the article link it increments the number of reads like so:
$('#list-articles .article-link').click(function(e){
    var oid = $(this).parent().parent().attr('data-oid').toString(); //Get the article id
    $.post( "/articles/viewed/" + oid );
});
Now this does not work! Number is not increased.
I don't prevent default action since I need the link to actually open and display the article.
Now if I put an alert right after the post like this:
$('#list-articles .article-link').click(function(e){
    var oid = $(this).parent().parent().attr('data-oid').toString(); //Get the article id
    $.post( "/articles/viewed/" + oid );
    alert(oid);
});
This variant works. After I dismiss the alert window, the number is incremented. Why is this so?? How can I fix this to actually work without the alert event present?
UPDATE
Thank you for helping to solve this. All answers are great and help one way or another. The only variant that works so far is disabling async on the ajax call. It would be great if someone could elaborate on why switching async mode off fixed it. So the POST request in the original was never executed? If I was simply checking too early and the increase was not visible upon page load, it should still be visible on the next page reload, right? Since the database wasn't updated at all, I assume the POST was not run at all. Why is this so? I want to understand the issue so I don't run into this problem again. Thanks.
Your problem could be due to $.post being asynchronous and you checking the result too soon. Try posting synchronously:
$.ajax({
    type: 'POST',
    url: "/articles/viewed/" + oid,
    async: false
});
Prevent default. Wait for the response from the server to say incrementing article reads by 1 was successful, then redirect to the article.
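If it works with the alert in place, it sounds like a race condition: the click also navigates away from the page, and the browser can abort a request that is still in flight, whereas the alert holds the page open long enough for the POST to go out.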

How to update/modify webpage content with Javascript before page load completed?

I'm trying to display a progress bar during a mass mailing process. I use classic ASP and have disabled content compression too. I simply update the width of an element that mimics a progress bar and a text element that shows the percent value.
However, during the page load it seems the JavaScript is ignored. I only see the hourglass for a long time and then the progress bar at 100%. If I put alerts between updates, Chrome and IE9 refresh the modified values as I expect.
Is there any other JavaScript command that can replace alert() and help update the displayed values? The alert() command magically lets the browser render the content immediately.
Thanks!
... Loop for ASP mail send code
If percent <> current Then
current = percent
%>
<script type="text/javascript">
//alert(<%=percent%>);
document.getElementById('remain').innerText='%<%=percent%>';
document.getElementById('progress').style.width='<%=percent%>%';
document.getElementById('success').innerText='<%=success%>';
</script>
<%
End If
... Loop end
These are the screenshots if I use alert() in the code: As you see it works but the user should click OK many times.
First step is writing the current progress into a Session variable when it changes:
Session("percent") = percent
Second step is building a simple mechanism that will output that value to browser when requested:
If Request("getpercent")="1" Then
Response.Clear()
Response.Write(Session("percent"))
Response.End()
End If
And finally you need to read the percentage with JavaScript using a timer. This is best done with jQuery, as pure JavaScript AJAX is a big headache. After you add a reference to the jQuery library, use code like this:
var timer = window.setTimeout(CheckPercentage, 100);
function CheckPercentage() {
    $.get("?getpercent=1", function(data) {
        timer = window.setTimeout(CheckPercentage, 100);
        var percentage = parseInt(data, 10);
        if (isNaN(percentage)) {
            $("#remain").text("Invalid response: " + data);
        }
        else {
            $("#remain").text(percentage + "%");
            if (percentage >= 100) {
                //done!
                window.clearTimeout(timer);
            }
        }
    });
}
Holding the response until all of your processing is done is not a viable option. Just imagine 30 people accessing the same page: you will have 30 persistent connections to the server for a long time. Especially with IIS, I am sure it's not viable; it might work well in your development environment, but when you move to production and more people start accessing the page, your server might go down.
I suggest you look into the following:
Do the processing in the background on the server and do not hold the response for a long time.
Try to write a Windows service which resides on the server and takes care of your mass mailing.
If you still insist on doing it on the web, try sending one email at a time using ajax: send one or two emails for every ajax request.
Also, in your example above, without Response.Flush the browser will not get the percentage information.
Well, you don't.
Except for simple effects like printing dots or a sequence of images it won't work safely, and even then buffering could interfere.
My approach would be to have an area which you update using an ajax request every second to a script which reads a log file or emails sent count file or such an entry in the database which is created by the mass mailing process. The mass mailing process would be initiated by ajax as well.
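The pattern described here (a background job records its progress, and a separate poller reads it at intervals) can be illustrated independently of ASP. The following is a small self-contained simulation in Python, not the ASP implementation: the ProgressTracker class, the fake send function, and the timings are all made up for the example, and the polling loop stands in for the browser's periodic ajax call.

```python
import threading
import time


class ProgressTracker:
    """Thread-safe holder for the number of emails sent so far."""

    def __init__(self, total):
        self.total = total
        self._sent = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._sent += 1

    def percent(self):
        with self._lock:
            return 100 * self._sent // self.total


def send_all(recipients, tracker, send_one):
    """Background job: send one message at a time and record progress."""
    for recipient in recipients:
        send_one(recipient)
        tracker.increment()


def poll_until_done(tracker, interval=0.01):
    """Stand-in for the periodic ajax request; returns every value it saw."""
    seen = []
    while True:
        value = tracker.percent()
        seen.append(value)
        if value >= 100:
            return seen
        time.sleep(interval)


recipients = ["user%d@example.com" % i for i in range(20)]
tracker = ProgressTracker(total=len(recipients))
worker = threading.Thread(
    target=send_all,
    args=(recipients, tracker, lambda r: time.sleep(0.002)),  # fake send
)
worker.start()                     # the mailing runs in the background
history = poll_until_done(tracker) # the "page" only ever asks for the current value
worker.join()
print(history[-1])
```

The point of the design is that the request which reports progress returns immediately, so no connection is held open while the mailing runs.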
ASP will not write anything to the page until it's fully done processing (unless you do a flush)
Response.Buffer=true
write something
response.flush
write something else
etc
(see example here: http://www.w3schools.com/asp/met_flush.asp)
A better way to do this is to use ajax.
Example here:
http://jquery-howto.blogspot.com/2009/04/display-loading-gif-image-while-loading.html
I didn't like ajax at first, but I love it now.
