I want to scrape historical results of South African LOTTO draws (especially Total Pool Size, Total Sales, etc.) from the South African National Lottery website. By default one sees links to results for the last ten draws, or one can select a date range to pull up a larger set of links to draws (which will still display only ten per page).
Hovering over a link in the browser, e.g. 'LOTTO DRAW 2012', shows javascript:void(); so it is clear that the draw results are rendered using JavaScript. Following advice on an R Web Scraping Cheat Sheet, I opened Google Chrome Developer Tools, opened the Network tab, and then clicked the link to the draw 'LOTTO DRAW 2012'. When I did so, I could see that this url is being called with an initiator
When I right-click on the initiator and select 'Copy Response', I can see the data I need inside a 'drawDetails' object in what appears to be JSON code.
{"code":200,"message":"OK","data":{"drawDetails":{"drawNumber":"2012","drawDate":"2020\/04\/11","nextDrawDate":"2020\/04\/15","ball1":"48","ball2":"6","ball3":"43","ball4":"41","ball5":"25","ball6":"45","bonusBall":"38","div1Winners":"1","div1Payout":"10546013.8","div2Winners":"0","div2Payout":"0","div3Winners":"28","div3Payout":"7676.4","div4Winners":"62","div4Payout":"2751.4","div5Winners":"1389","div5Payout":"206.3","div6Winners":"1872","div6Payout":"133","div7Winners":"28003","div7Payout":"50","div8Winners":"20651","div8Payout":"20","rolloverAmount":"0","rolloverNumber":"0","totalPrizePool":"13280236.5","totalSales":"11610950","estimatedJackpot":"2000000","guaranteedJackpot":"0","drawMachine":"RNG2","ballSet":"RNG","status":"published","winners":52006,"millionairs":1,"gpwinners":"52006","wcwinners":"0","ncwinners":"0","ecwinners":"0","mpwinners":"0","lpwinners":"0","fswinners":"0","kznwinners":"0","nwwinners":"0"},"totalWinnerRecord":{"lottoMillionairs":28716702,"lottoWinners":337285646,"ithubaMillionairs":135763,"ithubaWinners":305615802}},"videoData":[{"id":"1049","listid":"1","parentid":"1","videosource":"youtube","videoid":"chHfFxVi9QI","imageurl":"","title":"LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)","description":"","custom_imageurl":"","custom_title":"","custom_description":"","specialparams":"","lastupdate":"0000-00-00 00:00:00","allowupdates":"1","status":"0","isvideo":"1","link":"https:\/\/www.youtube.com\/watch?v=chHfFxVi9QI","ordering":"10001","publisheddate":"2020-04-11 
20:06:17","duration":"182","rating_average":"0","rating_max":"0","rating_min":"0","rating_numRaters":"0","statistics_favoriteCount":"0","statistics_viewCount":"329","keywords":"","startsecond":"0","endsecond":"0","likes":"6","dislikes":"0","commentcount":"0","channel_username":"","channel_title":"","channel_subscribers":"9880","channel_subscribed":"0","channel_location":"","channel_commentcount":"0","channel_viewcount":"0","channel_videocount":"1061","channel_description":"","channel_totaluploadviews":"0","alias":"lotto-lotto-plus-1-and-lotto-plus-2-draw-2012-11-april-2020","rawdata":"","datalink":"https:\/\/www.googleapis.com\/youtube\/v3\/videos?id=chHfFxVi9QI&part=id,snippet,contentDetails,statistics&key=AIzaSyC1Xvk2GUdb_N3UiFtjsgZ-uMviJ_8MFZI"}]}
It is a POST type request, and so I tried to follow this answer, but cannot find onclick values indicating the data submitted with the form. Moreover, the request URL for 'LOTTO DRAW 2012' is identical to that for 'LOTTO DRAW 2011', so there is no unique identifier for the particular draw being passed with the URL itself. Thus it is not clear to me how the unique request for the results of a particular draw is made.
Hence, the smaller question is, given a particular LOTTO draw number or draw date, how does one find out the unique identifier that is used to make the POST request for the data pertaining to that draw specifically?
The larger question is, if one is able to obtain such unique identifiers for all the historical draws, how can one generate the JSON drawDetails object for all the historical draws in turn, or otherwise complete the scraping operation?
You are right: the contents of the page are updated by JavaScript via an Ajax request. The server returns a JSON string in response to an HTTP POST request. With POST requests, the server's response is determined not only by the URL you request, but also by the body of the message you send to the server. In this case, the body is a simple form with 3 fields: gameName, which is always LOTTO; isAjax, which is always true; and drawNumber, which is the field you want to vary.
If you are using httr, you specify these fields as a named list in the body parameter of the POST function.
Once you have the response for each draw, you will want to parse the JSON into an R-friendly format such as a list or data frame using a library such as jsonlite. From looking at the structure of this particular JSON, it makes most sense to extract the component $data$drawDetails and make that a one-row data frame. This will allow you to bind several draws together into a single data frame.
Here is a function that does all that for you:
lotto_details <- function(draw_numbers)
{
  # Request each draw in turn and bind the one-row data frames together
  do.call("rbind", lapply(draw_numbers, function(x)
  {
    res <- httr::POST(paste0("https://www.nationallottery.co.za/index.php",
                             "?task=results.redirectPageURL&",
                             "Itemid=265&option=com_weaver&",
                             "controller=lotto-history"),
                      body = list(gameName = "LOTTO", drawNumber = x, isAjax = "true"))
    as.data.frame(jsonlite::fromJSON(httr::content(res, "text"))$data$drawDetails)
  }))
}
Which you use like this:
lotto_details(2009:2012)
#> drawNumber drawDate nextDrawDate ball1 ball2 ball3 ball4 ball5 ball6
#> 1 2009 2020/04/01 2020/04/04 51 15 7 32 42 45
#> 2 2010 2020/04/04 2020/04/08 43 4 21 24 10 3
#> 3 2011 2020/04/08 2020/04/11 42 43 8 18 2 29
#> 4 2012 2020/04/11 2020/04/15 48 6 43 41 25 45
#> bonusBall div1Winners div1Payout div2Winners div2Payout div3Winners
#> 1 1 0 0 0 0 21
#> 2 22 0 0 0 0 31
#> 3 34 0 0 0 0 21
#> 4 38 1 10546013.8 0 0 28
#> div3Payout div4Winners div4Payout div5Winners div5Payout div6Winners
#> 1 8455.3 60 2348.7 1252 189 1786
#> 2 6004.3 71 2080.6 1808 137.3 2352
#> 3 8584.5 60 2384.6 1405 171.1 2079
#> 4 7676.4 62 2751.4 1389 206.3 1872
#> div6Payout div7Winners div7Payout div8Winners div8Payout rolloverAmount
#> 1 115.2 24664 50 19711 20 3809758.17
#> 2 91.7 35790 50 25981 20 5966533.86
#> 3 100.5 27674 50 21895 20 8055430.87
#> 4 133 28003 50 20651 20 0
#> rolloverNumber totalPrizePool totalSales estimatedJackpot
#> 1 2 6198036.67 9879655 6000000
#> 2 3 9073426.56 11696905 8000000
#> 3 4 10649716.37 10406895 10000000
#> 4 0 13280236.5 11610950 2000000
#> guaranteedJackpot drawMachine ballSet status winners millionairs
#> 1 0 RNG2 RNG published 47494 0
#> 2 0 RNG2 RNG published 66033 0
#> 3 0 RNG2 RNG published 53134 0
#> 4 0 RNG2 RNG published 52006 1
#> gpwinners wcwinners ncwinners ecwinners mpwinners lpwinners fswinners
#> 1 47494 0 0 0 0 0 0
#> 2 66033 0 0 0 0 0 0
#> 3 53134 0 0 0 0 0 0
#> 4 52006 0 0 0 0 0 0
#> kznwinners nwwinners
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
Created on 2020-04-13 by the reprex package (v0.3.0)
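For readers working outside R, the shape of the same request can be sketched in JavaScript. This is only an illustration under assumptions: the helper names buildLottoRequest and extractDrawDetails are hypothetical, the URL and field names are the ones quoted above, and the sample response is a trimmed copy of the JSON shown in the question. Nothing is sent over the network here.

```javascript
// URL quoted in the answer above.
const LOTTO_URL =
  "https://www.nationallottery.co.za/index.php" +
  "?task=results.redirectPageURL&Itemid=265&option=com_weaver&controller=lotto-history";

// Hypothetical helper: build the pieces of a form-encoded POST for one draw.
function buildLottoRequest(drawNumber) {
  const body = new URLSearchParams({
    gameName: "LOTTO",
    drawNumber: String(drawNumber),
    isAjax: "true",
  });
  return {
    url: LOTTO_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: body.toString(),
    },
  };
}

// Hypothetical helper: pull the drawDetails object out of the response text.
function extractDrawDetails(responseText) {
  const parsed = JSON.parse(responseText);
  return parsed.data.drawDetails;
}

const request = buildLottoRequest(2012);
console.log(request.options.body);

// Trimmed copy of the response shown in the question.
const sampleResponse =
  '{"code":200,"message":"OK","data":{"drawDetails":' +
  '{"drawNumber":"2012","totalPrizePool":"13280236.5","totalSales":"11610950"}}}';
const details = extractDrawDetails(sampleResponse);
console.log(details.totalPrizePool, details.totalSales);
```

Passing request.url and request.options to fetch would send the request; whether the server accepts it from outside its own pages is not something this sketch verifies.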
The question already has a satisfactory answer (see above) that I've accepted. I simultaneously arrived at a nearly identical solution; I add it here only because it explicitly covers the full range of available draw numbers and will automatically detect the most recent draw number so that the code can be run 'as is' in the future, provided the National Lottery website design remains the same.
theurl <- "https://www.nationallottery.co.za/index.php?task=results.redirectPageURL&Itemid=265&option=com_weaver&controller=lotto-history"
x <- rvest::html_text(xml2::read_html(theurl))
preceding_string <- "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW "
drawnums <- as.integer(vapply(gregexpr(preceding_string, x)[[1]] + nchar(preceding_string),
                              function(k) substr(x, start = k, stop = k + 3), NA_character_))
drawnumrange <- 1506:max(drawnums)
response <- lapply(drawnumrange, function(d) httr::POST(url = theurl,
  body = list(gameName = "LOTTO", drawNumber = as.character(d), isAjax = "true"),
  encode = "form"))
jsondat <- lapply(response, function(r) jsonlite::parse_json(r)$data$drawDetails)
lottotable <- as.data.frame(do.call(rbind, jsondat))
numericcols <- c(1, 4:32, 36:37)
lottotable[numericcols] <- sapply(lottotable[numericcols], as.numeric)
xlsx::write.xlsx2(lottotable[1:37], "lottotable.xlsx", row.names = FALSE)
I've been searching and haven't found any way to do this in JavaScript or if there is a better way.
My form has a textarea field into which a specific string is entered, such as:
1/5/8 18 31.2 0 1847550953 13013135 5598945 3.00e-01
1/5/9 18 34.2 0 1748942583 6401826 5598945 3.00e-01
1/5/10 18 34.6 0 1847550953 13013135 5598945 3.00e-01
1/5/11 18 34.4 0 1847550953 13013135 5598945 3.00e-01
The data comes in this format, but the numbers may differ. What I'm trying to do is have a script that grabs the values in the 3rd column (in this example 31.2, 34.2, 34.6 and 34.4), averages them by adding them up and dividing by 4, and then displays the result in a different textarea box.
I'm also wondering whether it can be done in a single script or whether it needs two: one to parse, and another to calculate and display the average in a textarea.
Split the lines and map each line to its third column. Then you can find the average and put the result in another textarea:
const input = `1/5/8 18 31.2 0 1847550953 13013135 5598945 3.00e-01
1/5/9 18 34.2 0 1748942583 6401826 5598945 3.00e-01
1/5/10 18 34.6 0 1847550953 13013135 5598945 3.00e-01
1/5/11 18 34.4 0 1847550953 13013135 5598945 3.00e-01`;
// Take the third whitespace-separated column of every line
const thirdColumnValues = input.split('\n')
  .map(line => line.split(/ +/)[2]);
const avg = thirdColumnValues.reduce((a, str) => a + Number(str), 0) / thirdColumnValues.length;
document.querySelector('#textarea2').value = avg;
<textarea id="textarea2"></textarea>
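On the single-script question: parsing and averaging fit comfortably in one function. The sketch below is one possible shape, not the answer's own code; the name averageThirdColumn is hypothetical, and it additionally skips blank lines and lines whose third column is not numeric, which is an assumption about how pasted input might look.

```javascript
// Hypothetical helper: average the third whitespace-separated column of a text block.
function averageThirdColumn(text) {
  const values = text
    .split("\n")
    .map(line => line.trim())
    .filter(line => line.length > 0)            // ignore blank lines
    .map(line => Number(line.split(/\s+/)[2]))  // third column as a number
    .filter(value => Number.isFinite(value));   // skip lines without a numeric third column
  if (values.length === 0) return NaN;          // nothing to average
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

const sample = [
  "1/5/8 18 31.2 0 1847550953 13013135 5598945 3.00e-01",
  "1/5/9 18 34.2 0 1748942583 6401826 5598945 3.00e-01",
  "",
  "1/5/10 18 34.6 0 1847550953 13013135 5598945 3.00e-01",
  "1/5/11 18 34.4 0 1847550953 13013135 5598945 3.00e-01",
].join("\n");

const average = averageThirdColumn(sample);
console.log(average.toFixed(1)); // prints "33.6"
```

In a page, the same function could be called with the value of the input textarea and its result assigned to the value of the output textarea, for example from an input event listener.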
I was playing around with Web Audio API and maybe found a bug in the AnalyserNode. Let's say I have two sine oscillators playing at different frequencies, 200 Hz and 8000 Hz respectively. Using two different AnalyserNode(s) I extract the non-zero frequency data from the two oscillators, which are the following (from chrome console):
OSC1 (200 Hz)
Bin 0 value 1
Bin 1 value 3
Bin 2 value 9
Bin 3 value 18
Bin 4 value 30
Bin 5 value 43
Bin 6 value 36
Bin 7 value 159
Bin 8 value 236
Bin 9 value 255
Bin 10 value 255
Bin 11 value 212
Bin 12 value 86
Bin 13 value 46
Bin 14 value 36
Bin 15 value 21
Bin 16 value 8
OSC2 (8000 Hz)
Bin 364 value 6
Bin 365 value 18
Bin 366 value 32
Bin 367 value 46
Bin 368 value 52
Bin 369 value 126
Bin 370 value 224
Bin 371 value 255
Bin 372 value 255
Bin 373 value 226
Bin 374 value 132
Bin 375 value 51
Bin 376 value 47
Bin 377 value 33
Bin 378 value 19
Bin 379 value 7
Now if I change the frequency of the first oscillator to 8000 Hz (the same as the second oscillator) and extract the non-zero frequency data again, I expect to obtain non-zero values in approximately the same bins as for the second oscillator (say in the 300-400 range). Strangely, there are also non-zero values in bins in the range 0-50 (as when we extracted frequency data at 200 Hz).
OSC1 (8000 Hz)
Bin 2 value 2
Bin 3 value 11
Bin 4 value 23
Bin 5 value 36
Bin 6 value 29
Bin 7 value 152
Bin 8 value 229
Bin 9 value 255
Bin 10 value 248
Bin 11 value 205
Bin 12 value 79
Bin 13 value 38
Bin 14 value 29
Bin 15 value 14
Bin 16 value 1
Bin 364 value 7
Bin 365 value 19
Bin 366 value 33
Bin 367 value 47
Bin 368 value 50
Bin 369 value 137
Bin 370 value 228
Bin 371 value 255
Bin 372 value 255
Bin 373 value 222
Bin 374 value 121
Bin 375 value 52
Bin 376 value 45
Bin 377 value 31
Bin 378 value 18
Bin 379 value 5
Is this the expected behavior or a bug? It does not seem correct to me. I am also not sure whether this also happens when analyzing a standard audio file using, for example, a requestAnimationFrame loop.
Below is the code of the full example.
NB: extracting the frequency data requires waiting a bit until the analyser has run the Fast Fourier Transform and the frequency data is available, so I've used two setTimeout calls: one for the first extraction of frequency data from osc1 and osc2, and a second to extract the frequency data from osc1 again after the oscillator frequency has changed to 8000 Hz.
var AudioContext = window.AudioContext || window.webkitAudioContext;
var ctx = new AudioContext();
// first oscillator (200 Hz)
var osc1 = ctx.createOscillator();
osc1.frequency.value = 200;
var analyser1 = ctx.createAnalyser();
var gain1 = ctx.createGain();
gain1.gain.value = 0;
osc1.connect(analyser1);
analyser1.connect(gain1);
gain1.connect(ctx.destination);
// second oscillator (8000 Hz)
var osc2 = ctx.createOscillator();
osc2.frequency.value = 8000;
var analyser2 = ctx.createAnalyser();
var gain2 = ctx.createGain();
gain2.gain.value = 0;
osc2.connect(analyser2);
analyser2.connect(gain2);
gain2.connect(ctx.destination);
// start oscillators
osc1.start();
osc2.start();
// get frequency data
var freqData1 = new Uint8Array(analyser1.frequencyBinCount);
var freqData2 = new Uint8Array(analyser2.frequencyBinCount);
setTimeout(function() {
  analyser1.getByteFrequencyData(freqData1);
  analyser2.getByteFrequencyData(freqData2);
  console.log("OSC1 (200 Hz)");
  printNonZeroFreqData(freqData1);
  console.log("OSC2 (8000 Hz)");
  printNonZeroFreqData(freqData2);
  // change frequency of osc1 to 8000 Hz
  osc1.frequency.value = 8000;
  // wait a bit, then extract again frequency data from osc1
  setTimeout(function() {
    freqData1 = new Uint8Array(analyser1.frequencyBinCount);
    analyser1.getByteFrequencyData(freqData1);
    console.log("OSC1 (8000 Hz)");
    printNonZeroFreqData(freqData1);
  }, 500);
}, 500);
// print non zero frequency values
function printNonZeroFreqData(arr) {
  for (var i = 0; i < arr.length; ++i) {
    if (arr[i] != 0) {
      console.log("Bin " + i, "\tvalue " + arr[i]);
    }
  }
  console.log("");
}
This is expected. According to the spec, successive calls to extract the frequency data combine the data from the current call with a history of the data from previous calls. If you want to see the frequency data only from the current time, set smoothingTimeConstant to 0.
smoothingTimeConstant on Mozilla Developer Network
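To make the effect concrete, here is a small stand-alone sketch of the smoothing step the spec describes: each bin is blended as tau * previous + (1 - tau) * current, where tau is smoothingTimeConstant (0.8 by default). The function name smoothBins and the two-bin arrays are hypothetical toy values, not real analyser output, and the real node applies this to linear magnitudes before converting to decibels and bytes.

```javascript
// Blend the previous smoothed magnitudes with the current ones:
//   smoothed[k] = tau * previous[k] + (1 - tau) * current[k]
function smoothBins(previous, current, tau) {
  return current.map((value, k) => tau * previous[k] + (1 - tau) * value);
}

// Toy data: energy used to be in bin 0 (the "200 Hz" bin),
// and is now only in bin 1 (the "8000 Hz" bin).
const previous = [1.0, 0.0];
const current = [0.0, 1.0];

// With the default constant, the old bin still carries most of its history.
const withHistory = smoothBins(previous, current, 0.8);
console.log(withHistory);

// With the constant set to 0, only the current frame remains.
const withoutHistory = smoothBins(previous, current, 0);
console.log(withoutHistory); // [ 0, 1 ]

// In the question's code the equivalent change would be:
//   analyser1.smoothingTimeConstant = 0;
```

Since the analyser only updates this history when frequency data is requested, a long gap between two getByteFrequencyData calls does not by itself flush the old 200 Hz energy.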
I am building out an Angular2 Slider component and the current setup is that the value of the slider is a percentage (based off where the slider handle is from 0% - 100%). I have an array of n items and want the slider to grab the appropriate index from the array based off the percentage (where the handle is at).
Here is my current drag event (fired when user is dragging slider handle):
handleDrag(evt, ui) {
  let maxWidth = $('#slideBar').width() - 15;
  let position = $('#slideHandle').css('left');
  position = position.replace('px', '');
  let percent = (+position / +maxWidth) * 100;
  this.year = percent;
}
The percentage is working correctly, but I am wondering how I should structure the algorithm to fetch the array index by percentage. So, if I'm at 50%, I want to fetch array index 73 if the array length is 146.
Is there an easier way of doing this with JavaScript? I have done a similar component where I did a table approach but would like to figure out a way to do this without adding 'helper html elements' to the page.
Your approach sounds fine; the index can be obtained using the .length property of your array as follows:
var actualIndex = Math.floor((array.length-1) * percentage);
// Where percentage is a value between 0 and 1
This should return an index between 0 and array.length-1 depending on the percentage value.
TL;DR
This answer didn't work for me. I did some experimenting and found that Math.round is better than Math.floor in this situation, but using Math.floor with some additional logic is even better still. For best results, try:
var index = Math.min(Math.floor(array.length * percentage), (array.length-1));
Intro
I know this question seems to be answered, but I'm posting another answer because using the selected answer gave me unexpected results, and I want to save others from running into the same problem.
First of all, I want to clarify that if you only need an approximation, not a precise result, the selected answer will mostly work, especially for a larger array.
However, for smaller arrays, it frequently does not return the expected result. For example, with an array size of 2, it always returns the first element (index 0) unless the percentage is exactly 100%. Even 99% will return the 1st element (Math.floor((2-1) * 0.99) == 0). In fact, 100% is the ONLY value that will yield the last index (no matter the size of the array).
Expected Results
By "expected" results, I mean that if an array has N elements, each element is represented by 1/Nth of the values from 0.0 to 1.0. An array of 5 elements would break down like this:
Given Percentage     Expected Index
[0.00%, 20.00%)      0
[20.00%, 40.00%)     1
[40.00%, 60.00%)     2
[60.00%, 80.00%)     3
[80.00%, 100.00%]    4
Algorithms
I looked at 3 different algorithms with various array sizes:
var indexFloor = Math.floor((array.length-1) * percentage);
var indexRound = Math.round((array.length-1) * percentage);
var indexCustom = Math.min(Math.floor(array.length * percentage), (array.length-1));
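The three formulas can be compared directly with a short script. This is a sketch that takes the array length and a percentage between 0 and 1 as plain arguments (an assumption made so it runs without a real array); the function names mirror the variable names above.

```javascript
// The three candidate formulas, written as functions of length and percentage (0..1).
const indexFloor = (length, percentage) => Math.floor((length - 1) * percentage);
const indexRound = (length, percentage) => Math.round((length - 1) * percentage);
const indexCustom = (length, percentage) =>
  Math.min(Math.floor(length * percentage), length - 1);

// Print the three results side by side for an array of 5 elements.
for (const percentage of [0.15, 0.2, 0.8, 0.99, 1]) {
  console.log(
    percentage,
    indexFloor(5, percentage),
    indexRound(5, percentage),
    indexCustom(5, percentage)
  );
}
```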
Results
All 3 algorithms tend to the same results as the array gets larger, but the 3rd one always gives the results that I would expect.
Take a look at the results from arrays of sizes 2 and 5 (a result is incorrect wherever it differs from the Custom column for the same size):
Percentage  Floor(Size 2)  Round(Size 2)  Custom(Size 2)  Floor(Size 5)  Round(Size 5)  Custom(Size 5)
0.00%       0              0              0               0              0              0
5.00%       0              0              0               0              0              0
10.00%      0              0              0               0              0              0
15.00%      0              0              0               0              1              0
20.00%      0              0              0               0              1              1
25.00%      0              0              0               1              1              1
30.00%      0              0              0               1              1              1
35.00%      0              0              0               1              1              1
40.00%      0              0              0               1              2              2
45.00%      0              0              0               1              2              2
50.00%      0              1              1               2              2              2
55.00%      0              1              1               2              2              2
60.00%      0              1              1               2              2              3
65.00%      0              1              1               2              3              3
70.00%      0              1              1               2              3              3
75.00%      0              1              1               3              3              3
80.00%      0              1              1               3              3              4
85.00%      0              1              1               3              3              4
90.00%      0              1              1               3              4              4
95.00%      0              1              1               3              4              4
100.00%     1              1              1               4              4              4
For size 2, both Round and the custom algorithm produce the expected results. However, for size 5, only the custom algorithm does. Specifically, the Round method has 85% selecting the 3rd element (I would expect 80% and over to be the last element) and 15% selecting 2nd element (I would expect that to require 20%).
We see similar results with slightly larger array sizes (a result is incorrect wherever it differs from the Custom column for the same size):
Percentage  Floor(Size 8)  Round(Size 8)  Custom(Size 8)  Floor(Size 10)  Round(Size 10)  Custom(Size 10)
0.00%       0              0              0               0               0               0
5.00%       0              0              0               0               0               0
10.00%      0              1              0               0               1               1
15.00%      1              1              1               1               1               1
20.00%      1              1              1               1               2               2
25.00%      1              2              2               2               2               2
30.00%      2              2              2               2               3               3
35.00%      2              2              2               3               3               3
40.00%      2              3              3               3               4               4
45.00%      3              3              3               4               4               4
50.00%      3              4              4               4               5               5
55.00%      3              4              4               4               5               5
60.00%      4              4              4               5               5               6
65.00%      4              5              5               5               6               6
70.00%      4              5              5               6               6               7
75.00%      5              5              6               6               7               7
80.00%      5              6              6               7               7               8
85.00%      5              6              6               7               8               8
90.00%      6              6              7               8               8               9
95.00%      6              7              7               8               9               9
100.00%     7              7              7               9               9               9
The Round and Custom methods are very similar (especially for the size 10 array). They are, in fact, identical up until 60%, where the Round method yields index 5 but the custom method correctly classifies that as index 6. From then on, the Round algorithm is frequently off by one.
Conclusion
The original algorithm's use of Math.floor was correct, but the method used to ensure the result was a valid array index skewed the results because there are N elements, not N-1 elements. But when we use N, a percentage value of 100% yields a value that is outside of the valid range of 0 to N-1.
However, by definition, 100% should return the last index of the array, and this can be enforced with a simple call to Math.min. The result is this algorithm for determining an array index using a percentage:
var index = Math.min(Math.floor(array.length * percentage), (array.length-1));
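Applied to the slider from the question, which reports a value from 0 to 100 rather than 0 to 1, the formula might be wrapped as below. This is a sketch under assumptions: itemAtPercent is a hypothetical helper, the years array is made-up sample data, and clamping out-of-range percentages is an extra safeguard that the answer does not discuss.

```javascript
// Hypothetical helper: pick the array element for a slider value in the range 0..100.
function itemAtPercent(items, percent) {
  if (items.length === 0) return undefined;                    // nothing to pick from
  const fraction = Math.min(Math.max(percent, 0), 100) / 100;  // clamp, then scale to 0..1
  const index = Math.min(Math.floor(items.length * fraction), items.length - 1);
  return items[index];
}

// Made-up sample data: 146 consecutive years starting at 1900.
const years = Array.from({ length: 146 }, (_, i) => 1900 + i);

console.log(itemAtPercent(years, 50));  // 1973, i.e. index 73 as in the question
console.log(itemAtPercent(years, 100)); // 2045, the last element
```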
Backup
Complete table of all the tested array sizes and percentages
Excel document I made for testing
I do my calculations like this:
117^196
I get:
177
Now what I want to do is to get 117 back so I need to make a replace
(replace)^196 = 117
What's the opposite operation of the xor operator?
The opposite of xor is xor :). If you xor something twice (a^b)^b == a.
This is relatively easy to show. For each bit:
1 ^ 1 = 0
1 ^ 0 = 1
0 ^ 1 = 1
0 ^ 0 = 0
Doing this on any pair of numbers a,b, it's easy to see that
a^b xor'd by either a or b yields the other (xor a yields b, and vice versa)
1 2 filter result
0^0^0 = 0
0^1^0 = 1
0^1^1 = 0
1^0^0 = 1
1^0^1 = 0
1^1^1 = 1
It's just xor itself.
Just as the opposite of + is -,
the opposite of xor is xor.
Just use the result that you got: 177
117 ^ 196 = 177 | () ^ 196
117 ^ 196 ^ 196 = 177 ^ 196 | self-inverse
117 ^ 0 = 177 ^ 196 | neutral element
117 = 177 ^ 196
XOR has three important properties. It is
associative
commutative
self-inverse
This means that a value is its own inverse:
a^a = 0
Since it is also both commutative and associative, you can rearrange an xor-expression containing an even number of the same operand like this:
a^O^b^c^O^d = O^O^a^b^c^d = 0^a^b^c^d = a^b^c^d
You could say that operands that appear an even number of times "cancel each other out".
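Both points can be checked in a few lines of JavaScript, using the numbers from the question. The variable names are illustrative only, and the values used for the cancellation check are arbitrary.

```javascript
// Numbers from the question.
const original = 117;
const key = 196;

const mixed = original ^ key;   // 177
const recovered = mixed ^ key;  // xor with the same value again undoes it
console.log(mixed, recovered);  // prints 177 117

// Cancellation: an operand that appears an even number of times drops out.
const a = 5, b = 9, c = 12, d = 3, o = 42; // arbitrary values
const withPairs = a ^ o ^ b ^ c ^ o ^ d;
const withoutPairs = a ^ b ^ c ^ d;
console.log(withPairs === withoutPairs); // prints true
```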