I want to scrape historical results of South African LOTTO draws (especially Total Pool Size, Total Sales, etc.) from the South African National Lottery website. By default one sees links to results for the last ten draws, or one can select a date range to pull up a larger set of links to draws (which will still display only ten per page).
Hovering in the browser over a link e.g. 'LOTTO DRAW 2012' we see javascript:void(); so it is clear that the draw results will be rendered using Javascript. Reading advice on an R Web Scraping Cheat Sheet, I realized that I needed to open Google Chrome Developer tools, then open Network tab, and then click the link to the draw 'LOTTO DRAW 2012'. When I did so, I could see that this url is being called with an initiator
When I right-click on the initiator and select 'Copy Response', I can see the data I need inside a 'drawDetails' object in what appears to be JSON code.
{"code":200,"message":"OK","data":{"drawDetails":{"drawNumber":"2012","drawDate":"2020\/04\/11","nextDrawDate":"2020\/04\/15","ball1":"48","ball2":"6","ball3":"43","ball4":"41","ball5":"25","ball6":"45","bonusBall":"38","div1Winners":"1","div1Payout":"10546013.8","div2Winners":"0","div2Payout":"0","div3Winners":"28","div3Payout":"7676.4","div4Winners":"62","div4Payout":"2751.4","div5Winners":"1389","div5Payout":"206.3","div6Winners":"1872","div6Payout":"133","div7Winners":"28003","div7Payout":"50","div8Winners":"20651","div8Payout":"20","rolloverAmount":"0","rolloverNumber":"0","totalPrizePool":"13280236.5","totalSales":"11610950","estimatedJackpot":"2000000","guaranteedJackpot":"0","drawMachine":"RNG2","ballSet":"RNG","status":"published","winners":52006,"millionairs":1,"gpwinners":"52006","wcwinners":"0","ncwinners":"0","ecwinners":"0","mpwinners":"0","lpwinners":"0","fswinners":"0","kznwinners":"0","nwwinners":"0"},"totalWinnerRecord":{"lottoMillionairs":28716702,"lottoWinners":337285646,"ithubaMillionairs":135763,"ithubaWinners":305615802}},"videoData":[{"id":"1049","listid":"1","parentid":"1","videosource":"youtube","videoid":"chHfFxVi9QI","imageurl":"","title":"LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)","description":"","custom_imageurl":"","custom_title":"","custom_description":"","specialparams":"","lastupdate":"0000-00-00 00:00:00","allowupdates":"1","status":"0","isvideo":"1","link":"https:\/\/www.youtube.com\/watch?v=chHfFxVi9QI","ordering":"10001","publisheddate":"2020-04-11 
20:06:17","duration":"182","rating_average":"0","rating_max":"0","rating_min":"0","rating_numRaters":"0","statistics_favoriteCount":"0","statistics_viewCount":"329","keywords":"","startsecond":"0","endsecond":"0","likes":"6","dislikes":"0","commentcount":"0","channel_username":"","channel_title":"","channel_subscribers":"9880","channel_subscribed":"0","channel_location":"","channel_commentcount":"0","channel_viewcount":"0","channel_videocount":"1061","channel_description":"","channel_totaluploadviews":"0","alias":"lotto-lotto-plus-1-and-lotto-plus-2-draw-2012-11-april-2020","rawdata":"","datalink":"https:\/\/www.googleapis.com\/youtube\/v3\/videos?id=chHfFxVi9QI&part=id,snippet,contentDetails,statistics&key=AIzaSyC1Xvk2GUdb_N3UiFtjsgZ-uMviJ_8MFZI"}]}
It is a POST type request, and so I tried to follow this answer, but cannot find onclick values indicating the data submitted with the form. Moreover, the request URL for 'LOTTO DRAW 2012' is identical to that for 'LOTTO DRAW 2011', so there is no unique identifier for the particular draw being passed with the URL itself. Thus it is not clear to me how the unique request for the results of a particular draw is made.
Hence, the smaller question is, given a particular LOTTO draw number or draw date, how does one find out the unique identifier that is used to make the POST request for the data pertaining to that draw specifically?
The larger question is, if one is able to obtain such unique identifiers for all the historical draws, how can one generate the JSON drawDetails object for all the historical draws in turn, or otherwise complete the scraping operation?
You are right - the contents on the page are updated by javascript via an ajax request. The server returns a json string in response to an http POST request. With POST requests, the server's response is determined not only by the url you request, but by the body of the message you send to the server. In this case, your body is a simple form with 3 fields: gameName, which is always LOTTO, isAjax which is always true, and drawNumber, which is the field you want to vary.
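The form body is easier to picture written out. Below is a small JavaScript sketch (not part of the original answer). It builds the same three fields as an `application/x-www-form-urlencoded` string, which is the encoding requested with `encode = "form"` in the R code further down. The helper name `lottoFormBody` is hypothetical, and the field names are taken from the paragraph above.

```javascript
// Sketch: build the URL-encoded POST body for one draw.
// Field names (gameName, drawNumber, isAjax) come from the answer text;
// lottoFormBody is a hypothetical helper name.
function lottoFormBody(drawNumber) {
  const params = new URLSearchParams({
    gameName: "LOTTO",          // always LOTTO
    drawNumber: String(drawNumber), // the only field that varies
    isAjax: "true",             // always true
  });
  return params.toString();
}

console.log(lottoFormBody(2012));
```

`URLSearchParams` keeps insertion order, so draw 2012 produces `gameName=LOTTO&drawNumber=2012&isAjax=true`. A client could send that string with a `Content-Type: application/x-www-form-urlencoded` header. This sketch has not been tested against the live site.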
If you are using httr, you specify these fields as a named list in the body parameter of the POST function.
Once you have the response for each draw, you will want to parse the JSON into an R-friendly format, such as a list or data frame, using a library such as jsonlite. Given the structure of this particular JSON, the most sensible approach is to extract the component $data$drawDetails and turn it into a one-row data frame. That lets you bind several draws together into a single data frame.
Here is a function that does all that for you:
lotto_details <- function(draw_numbers)
{
do.call("rbind", lapply(draw_numbers, function(x)
{
res <- httr::POST(paste0("https://www.nationallottery.co.za/index.php",
"?task=results.redirectPageURL&",
"Itemid=265&option=com_weaver&",
"controller=lotto-history"),
body = list(gameName = "LOTTO", drawNumber = x, isAjax = "true"))
as.data.frame(jsonlite::fromJSON(httr::content(res, "text"))$data$drawDetails)
}))
}
Which you use like this:
lotto_details(2009:2012)
#> drawNumber drawDate nextDrawDate ball1 ball2 ball3 ball4 ball5 ball6
#> 1 2009 2020/04/01 2020/04/04 51 15 7 32 42 45
#> 2 2010 2020/04/04 2020/04/08 43 4 21 24 10 3
#> 3 2011 2020/04/08 2020/04/11 42 43 8 18 2 29
#> 4 2012 2020/04/11 2020/04/15 48 6 43 41 25 45
#> bonusBall div1Winners div1Payout div2Winners div2Payout div3Winners
#> 1 1 0 0 0 0 21
#> 2 22 0 0 0 0 31
#> 3 34 0 0 0 0 21
#> 4 38 1 10546013.8 0 0 28
#> div3Payout div4Winners div4Payout div5Winners div5Payout div6Winners
#> 1 8455.3 60 2348.7 1252 189 1786
#> 2 6004.3 71 2080.6 1808 137.3 2352
#> 3 8584.5 60 2384.6 1405 171.1 2079
#> 4 7676.4 62 2751.4 1389 206.3 1872
#> div6Payout div7Winners div7Payout div8Winners div8Payout rolloverAmount
#> 1 115.2 24664 50 19711 20 3809758.17
#> 2 91.7 35790 50 25981 20 5966533.86
#> 3 100.5 27674 50 21895 20 8055430.87
#> 4 133 28003 50 20651 20 0
#> rolloverNumber totalPrizePool totalSales estimatedJackpot
#> 1 2 6198036.67 9879655 6000000
#> 2 3 9073426.56 11696905 8000000
#> 3 4 10649716.37 10406895 10000000
#> 4 0 13280236.5 11610950 2000000
#> guaranteedJackpot drawMachine ballSet status winners millionairs
#> 1 0 RNG2 RNG published 47494 0
#> 2 0 RNG2 RNG published 66033 0
#> 3 0 RNG2 RNG published 53134 0
#> 4 0 RNG2 RNG published 52006 1
#> gpwinners wcwinners ncwinners ecwinners mpwinners lpwinners fswinners
#> 1 47494 0 0 0 0 0 0
#> 2 66033 0 0 0 0 0 0
#> 3 53134 0 0 0 0 0 0
#> 4 52006 0 0 0 0 0 0
#> kznwinners nwwinners
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
Created on 2020-04-13 by the reprex package (v0.3.0)
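For comparison, here is a rough JavaScript counterpart to the jsonlite step. It takes `data.drawDetails` from one response and turns it into a flat row, converting numeric strings such as `"11610950"` into numbers. The helper name `drawDetailsRow` is hypothetical. The sample is a trimmed copy of the response shown in the question.

```javascript
// Sketch: pull data.drawDetails out of one JSON response as a flat row object.
// Only non-empty strings that parse completely as numbers are converted;
// values such as "2020/04/11" or "RNG2" stay as strings.
function drawDetailsRow(jsonText) {
  const details = JSON.parse(jsonText).data.drawDetails;
  const row = {};
  for (const [key, value] of Object.entries(details)) {
    const asNumber = typeof value === "string" && value.trim() !== "" ? Number(value) : NaN;
    row[key] = Number.isFinite(asNumber) ? asNumber : value;
  }
  return row;
}

// Trimmed sample of the response from the question (note the escaped slashes).
const sample = '{"code":200,"message":"OK","data":{"drawDetails":{"drawNumber":"2012",' +
  '"drawDate":"2020\\/04\\/11","totalPrizePool":"13280236.5","totalSales":"11610950",' +
  '"drawMachine":"RNG2","winners":52006}}}';

const rows = [sample].map(drawDetailsRow); // one row per response
console.log(rows[0]);
```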
The question already has a satisfactory answer (see above) that I've accepted. I simultaneously arrived at a nearly identical solution; I add it here only because it explicitly covers the full range of available draw numbers and will automatically detect the most recent draw number so that the code can be run 'as is' in the future, provided the National Lottery website design remains the same.
theurl <- "https://www.nationallottery.co.za/index.php?task=results.redirectPageURL&Itemid=265&option=com_weaver&controller=lotto-history"
x <- rvest::html_text(xml2::read_html(theurl))
preceding_string <- "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW "
drawnums <- as.integer(vapply(gregexpr(preceding_string, x)[[1]] + nchar(preceding_string),
function(k) substr(x, start = k, stop = k + 3), NA_character_))
drawnumrange <- 1506:max(drawnums)
response <- lapply(drawnumrange, function(d) httr::POST(url = theurl,
body = list(gameName = "LOTTO", drawNumber = as.character(d), isAjax =
"true"), encode = "form"))
jsondat <- lapply(response, function(r) jsonlite::parse_json(httr::content(r, as = "text", encoding = "UTF-8"))$data$drawDetails)
lottotable <- as.data.frame(do.call(rbind, jsondat))
numericcols <- c(1, 4:32, 36:37)
lottotable[numericcols] <- sapply(lottotable[numericcols], as.numeric)
xlsx::write.xlsx2(lottotable[1:37], "lottotable.xlsx", row.names = FALSE)
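The draw-number detection above reads the four characters after the video-title prefix, so it assumes four-digit draw numbers. A regular expression that captures all the digits avoids that assumption. Here is a JavaScript sketch. The function names are hypothetical, 1506 is the first draw number used in the R code, and the page text is a made-up snippet in the same title format.

```javascript
// Sketch: find the most recent draw number in the page text, then list every draw
// from the first available one up to it.
function latestDrawNumber(pageText) {
  const pattern = /LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW (\d+)/g;
  const numbers = [...pageText.matchAll(pattern)].map(m => Number(m[1]));
  return numbers.length ? Math.max(...numbers) : null; // null if no title was found
}

function drawRange(first, last) {
  return Array.from({ length: last - first + 1 }, (_, i) => first + i);
}

const pageText =
  "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2011 (8 APRIL 2020) ... " +
  "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)";
const latest = latestDrawNumber(pageText);
console.log(latest, drawRange(1506, latest).length);
```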
I'm making a custom function in google sheets which counts the number of occurrences of all items in a given array in a given range of cells.
As I understand it, Google Sheets passes a range argument to a custom function as a two-dimensional array of the cell contents. So the range A4:B5 would be passed to the function as
[[the contents of A4, the contents of B4],
[the contents of A5, the contents of B5]]
The next input is a list of the items to look for in those cells. From what I could find online, array literals in Google Sheets are written with curly braces {}. The function I created is given below. I have never used JavaScript before, but I know other languages and I just googled how to write for loops and if statements, so I'm certain the error is due to something simple that I don't know about or missed.
function count_if_in_set(range, given_list) {
let counter = 0;
for (dim_1 of range) {
for (dim_2 of dim_1) {
for (item of given_list) {
if (item == dim_2) {
counter += 1
}
}
}
}
return counter
}
When I try to use this function in google sheets with the following input: =count_if_in_set(Z30:Z33, {1}), I receive the following error: TypeError: given_list is not iterable (line 5).
The contents of cells Z30 to Z33 are the integers 1, 2, 3, 3 which should be given to the function as the following 2-dimensional array: [[1], [2], [3], [3]]
The problem is that the list [1] is not iterable. I have 2 hypotheses as to why this is:
I coded something wrong because I'm very new to Javascript
The input {1} is not transmitted to a list when google sheets gives it to the function
To check if it was the former, I went through all the aspects of my function. I first checked if you have to declare the type of variable it was when you created the function, but according to what I saw when I googled it you don't. I then changed all my for (a of b) to for (let a of b) but that did nothing to help, and after that I was stuck.
To try and solve it in the case it was a problem with giving the code an array, I tried changing my input from =count_if_in_set(Z30:Z33, {1}) to =count_if_in_set(Z30:Z33, [1]), but that threw up a formula parse error so I knew that wasn't it, and I tried changing the input to =count_if_in_set(Z30:Z33, (1)) but that returned the same error. And after that I was stuck and had no more ideas.
You can get the same result with a plain vanilla spreadsheet formula, like this:
=arrayformula( countif(Z30:Z43, { 1, 2, 3 }) )
To get just the grand total, use this:
=arrayformula( sum( countif(Z30:Z43, { 1, 2, 3 }) ) )
To count how many cells have a text string that contains one of the search keys, use this:
=arrayformula( sum( countif( Z30:Z43, "*" & { "a", "b" } & "*" ) ) )
If you need to use a custom function for some reason, try something like this to get started:
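function count_if_in_set(values, given_list) {
let counter = 0;
// {1} arrives as a plain value and {1,2,3} as [[1,2,3]], so flatten the search keys first
const keys = [].concat(given_list).flat();
values.map(row => row.map(value =>
counter += (keys.indexOf(value) !== -1)
));
return counter;
}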
This is really an anti-pattern, because the map result is not used for anything. People would tend to use Array.reduce(), but the map-map pattern may be easier to follow, and it is the one typically employed in custom functions that most often do not aggregate the result but return exactly one value per argument value.
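Here is a sketch of the reduce() variant mentioned above, assuming the V8 Apps Script runtime. It rests on one assumption, which matches both the "not iterable" error in the question and the comment in the next answer: a single value such as {1} (or a one-cell range) arrives as a plain value, while multi-cell ranges and literals such as {1,2,3} arrive as 2-D arrays. The names `toFlatList` and `count_if_in_set_reduce` are hypothetical.

```javascript
// Sketch: reduce-based count, with both arguments normalized to flat lists.
function toFlatList(arg) {
  // 2-D array -> 1-D array; a plain value -> one-element array
  return Array.isArray(arg) ? arg.flat() : [arg];
}

function count_if_in_set_reduce(values, given_list) {
  const keys = new Set(toFlatList(given_list));
  // Set.has uses strict matching, so the number 1 does not match the text "1".
  return toFlatList(values).reduce((count, value) => count + (keys.has(value) ? 1 : 0), 0);
}
```

In a sheet where Z30:Z33 holds 1, 2, 3, 3, `=count_if_in_set_reduce(Z30:Z33, {1,3})` would count three matching cells.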
Some of the best resources for learning Google Apps Script include the Beginner's Guide, the New Apps Script Editor guide, the Fundamentals of Apps Script with Google Sheets codelab, the Extending Google Sheets page, javascript.info, Mozilla Developer Network and Apps Script at Stack Overflow.
Try this:
This counts how many times each of the numbers 1 to 9 occurs in the selected range and returns each item with its count:
function checkforitems(a, b) {
let obj = {pA:[]};
Logger.log(a);
Logger.log(b);
let arr = b[0];//b arrives as a 2d array with a single row, e.g. [[1,2,3,4,5,6,7,8,9]]
//tally the counts in obj, one property per item found (like a simple pivot table)
a.forEach(r => {
r.forEach(c => {
let index = arr.indexOf(c);
if(~index) {
if(!obj.hasOwnProperty(arr[index])) {
obj[arr[index]]=1;
obj.pA.push(arr[index]);//collect elements as an array
} else {
obj[arr[index]]+=1;
}
}
});
});
let l = '';
//obj.pA.sort((x,y) => x - y);//if searching for numbers you can use this to sort them before displaying them
obj.pA.forEach(e => {
l += `${e}-${obj[e]}\n`;
});
return l;
}
My Test Sheet:
COL1  COL2  COL3  COL4  COL5  COL6  COL7  COL8  COL9  COL10
1     17    8     10    2     7     4     19    12    11
8     13    7     1     6     14    8     19    15    1
17    15    15    6     7     3     3     17    8     12
8     2     17    9     9     7     15    16    19    11
14    11    19    0     15    4     16    11    1     11
1     3     3     19    3     1     5     4     3     16
10    8     8     2     17    18    0     1     17    6
1     0     10    18    12    16    11    4     7     13
10    18    6     12    12    5     3     11    9     5
13    2     2     8     5     4     8     12    18    2
0     18    18    18    17    4     6     14    8     8
1     11    12    1     15    17    18    3     0     6
19    5     17    11    12    9     12    1     6     15
12    5     7     1     14    9     4     4     18    12
3     1     11    8     11    9     17    6     12    5
11    12    16    5     5     5     6     12    3     5
16    0     18    14    8     4     16    0     10    0
15    13    4     17    14    10    9     9     2     4
13    12    11    15    12    18    0     8     19    19
3     1     0     3     1     16    18    6     1     2
My formula (in cell L22):
=checkforitems(A2:J21,{1,2,3,4,5,6,7,8,9})
returned result:
1-16
8-14
2-8
7-6
4-11
6-10
3-12
9-8
5-11
Test Sheet With Results:
To anyone looking for a way to make the original function work:
I did the same thing, but changed it slightly so that the second input was a range of cells containing the values I wanted to search for.
I work at an event organizing company; our main business is organizing "speed dating" meetings between buyers (retailers and distributors) and manufacturers of food and beverages. We have people who create a schedule for each event, and I would like to automate this process.
I would like help with the logic for a web app that would schedule the meetings.
There are 10 different companies on each side.
A meeting should take place only if one side chose a company from the other side.
Each meeting is 15 minutes.
The whole event should be 2 to 2.5 hours long.
Any suggestions on how to create a great schedule automatically?
P.S. I am sorry if my question is not clear, this is my first Stack Overflow question.
I don't have a JavaScript solution, but this can be solved with mathematical optimization tools.
Let's start with some data:
---- 11 SET b buyers
buyer1 , buyer2 , buyer3 , buyer4 , buyer5 , buyer6 , buyer7 , buyer8 , buyer9 , buyer10
---- 11 SET s sellers
seller1 , seller2 , seller3 , seller4 , seller5 , seller6 , seller7 , seller8 , seller9 , seller10
---- 11 SET r rounds
round1, round2, round3, round4, round5, round6, round7, round8
---- 11 PARAMETER wantMeeting a meeting has been requested
buyer1    seller1, seller9
buyer2    seller5, seller7
buyer3    seller3, seller4
buyer4    seller1, seller3
buyer5    seller2, seller4, seller6
buyer6    seller5, seller6, seller9
buyer7    seller1, seller2, seller5, seller6
buyer8    seller3, seller8, seller10
buyer9    seller2, seller5, seller6
buyer10   seller7, seller9
Introduce binary variables:
x(b,s,r) = 1 if buyer b meets seller s in round r
0 otherwise
We only consider the cases where wantMeeting=1. Implicitly, when wantMeeting=0, we assume x(b,s,r)=0.
Constraints:
Meeting is requested:
sum(r, x(b,s,r)) = 1 for all b,s with wantMeeting(b,s)=1
Buyer can only have one meeting per round
sum(s|wantMeeting(b,s)=1, x(b,s,r)) <= 1 for all b,r
Seller can only have one meeting per round
sum(b|wantMeeting(b,s)=1, x(b,s,r)) <= 1 for all s,r
Here | is the mathematical notation for "such that".
I have also added some constraints, and an objective to minimize the number of rounds needed. The resulting Mixed Integer Programming model gives as result:
---- 42 VARIABLE x.L meetings
round1 round2 round3 round4
buyer1 .seller1 1
buyer1 .seller9 1
buyer2 .seller5 1
buyer2 .seller7 1
buyer3 .seller3 1
buyer3 .seller4 1
buyer4 .seller1 1
buyer4 .seller3 1
buyer5 .seller2 1
buyer5 .seller4 1
buyer5 .seller6 1
buyer6 .seller5 1
buyer6 .seller6 1
buyer6 .seller9 1
buyer7 .seller1 1
buyer7 .seller2 1
buyer7 .seller5 1
buyer7 .seller6 1
buyer8 .seller3 1
buyer8 .seller8 1
buyer8 .seller10 1
buyer9 .seller2 1
buyer9 .seller5 1
buyer9 .seller6 1
buyer10.seller7 1
buyer10.seller9 1
---- 42 VARIABLE round.L round is used
round1 1, round2 1, round3 1, round4 1
---- 42 VARIABLE numRounds.L = 4 number of rounds needed
I did not include a capacity limit per round (say, only n tables are available). This is not very difficult to add. More details are here.
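As an illustration, with a hypothetical parameter n for the number of tables, the capacity limit could be written in the same style as the constraints above. Each scheduled meeting occupies one table in its round:

```latex
\sum_{(b,s)\,:\,\mathrm{wantMeeting}(b,s)=1} x(b,s,r) \;\le\; n \qquad \text{for all } r
```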
For some more examples of these type of models see:
Scheduling Business Dinners
Speed Dating Scheduling
I probably would solve this on a server, but if you insist on a JavaScript solution, there is a JavaScript port of the GLPK Mixed Integer Programming Solver (link). You may also look into Constraint Programming Solvers (there are a few available in JavaScript).
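If a quick heuristic in JavaScript is acceptable instead of an exact solver, here is a sketch of a swapped-in technique: a greedy round assignment, which is not the MIP above. Each requested meeting goes into the earliest round where neither its buyer nor its seller is already booked. This always satisfies the "one meeting per buyer/seller per round" constraints, but it may use more rounds than the optimum. For a graph where nobody has more than D meetings, it needs at most 2D - 1 rounds. The request list below is reconstructed from the solution listing above, and the function name is hypothetical.

```javascript
// Greedy heuristic sketch (not the MIP): earliest free round for each requested pair.
// Buyer and seller IDs must be distinct strings because they share one "busy" set per round.
function greedySchedule(requests) {
  // Count meetings per participant so pairs involving busy participants are placed first.
  const load = new Map();
  for (const [b, s] of requests) {
    load.set(b, (load.get(b) || 0) + 1);
    load.set(s, (load.get(s) || 0) + 1);
  }
  const weight = ([b, s]) => load.get(b) + load.get(s);
  const ordered = [...requests].sort((p, q) => weight(q) - weight(p));

  const busy = [];     // busy[r] = participants already booked in round r + 1
  const schedule = [];
  for (const [b, s] of ordered) {
    let r = 0;
    while (busy[r] && (busy[r].has(b) || busy[r].has(s))) r++;
    if (!busy[r]) busy[r] = new Set();
    busy[r].add(b);
    busy[r].add(s);
    schedule.push({ buyer: b, seller: s, round: r + 1 });
  }
  return { schedule, rounds: busy.length };
}

// Requested meetings (wantMeeting = 1), as [buyer, seller] pairs.
const requests = [
  ["buyer1", "seller1"], ["buyer1", "seller9"],
  ["buyer2", "seller5"], ["buyer2", "seller7"],
  ["buyer3", "seller3"], ["buyer3", "seller4"],
  ["buyer4", "seller1"], ["buyer4", "seller3"],
  ["buyer5", "seller2"], ["buyer5", "seller4"], ["buyer5", "seller6"],
  ["buyer6", "seller5"], ["buyer6", "seller6"], ["buyer6", "seller9"],
  ["buyer7", "seller1"], ["buyer7", "seller2"], ["buyer7", "seller5"], ["buyer7", "seller6"],
  ["buyer8", "seller3"], ["buyer8", "seller8"], ["buyer8", "seller10"],
  ["buyer9", "seller2"], ["buyer9", "seller5"], ["buyer9", "seller6"],
  ["buyer10", "seller7"], ["buyer10", "seller9"],
];

const result = greedySchedule(requests);
console.log(`${result.rounds} rounds = ${result.rounds * 15} minutes`);
```

With 15-minute meetings, any result of up to 10 rounds fits the 2 to 2.5 hour window from the question. The MIP remains the way to guarantee the minimum number of rounds.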
I was playing around with Web Audio API and maybe found a bug in the AnalyserNode. Let's say I have two sine oscillators playing at different frequencies, 200 Hz and 8000 Hz respectively. Using two different AnalyserNode(s) I extract the non-zero frequency data from the two oscillators, which are the following (from chrome console):
OSC1 (200 Hz)
Bin 0 value 1
Bin 1 value 3
Bin 2 value 9
Bin 3 value 18
Bin 4 value 30
Bin 5 value 43
Bin 6 value 36
Bin 7 value 159
Bin 8 value 236
Bin 9 value 255
Bin 10 value 255
Bin 11 value 212
Bin 12 value 86
Bin 13 value 46
Bin 14 value 36
Bin 15 value 21
Bin 16 value 8
OSC2 (8000 Hz)
Bin 364 value 6
Bin 365 value 18
Bin 366 value 32
Bin 367 value 46
Bin 368 value 52
Bin 369 value 126
Bin 370 value 224
Bin 371 value 255
Bin 372 value 255
Bin 373 value 226
Bin 374 value 132
Bin 375 value 51
Bin 376 value 47
Bin 377 value 33
Bin 378 value 19
Bin 379 value 7
Now if I change the frequency of the first oscillator to 8000 Hz (the same as the second oscillator) and extract the nonzero frequency data again, I expect to obtain nonzero values in approximately the same bins as the second oscillator (roughly bins 300-400). Strangely, though, there are also nonzero values in the low bins (0-50), just as when the data was extracted at 200 Hz.
OSC1 (8000 Hz)
Bin 2 value 2
Bin 3 value 11
Bin 4 value 23
Bin 5 value 36
Bin 6 value 29
Bin 7 value 152
Bin 8 value 229
Bin 9 value 255
Bin 10 value 248
Bin 11 value 205
Bin 12 value 79
Bin 13 value 38
Bin 14 value 29
Bin 15 value 14
Bin 16 value 1
Bin 364 value 7
Bin 365 value 19
Bin 366 value 33
Bin 367 value 47
Bin 368 value 50
Bin 369 value 137
Bin 370 value 228
Bin 371 value 255
Bin 372 value 255
Bin 373 value 222
Bin 374 value 121
Bin 375 value 52
Bin 376 value 45
Bin 377 value 31
Bin 378 value 18
Bin 379 value 5
Is this the expected behavior or a bug? It seems not correct to me. I am also not sure if this propagates also when analyzing a standard audio file using for example a requestAnimationFrame loop.
Below is the code of the full example.
NB: to extract the frequency data, you have to wait a bit until the analyser has run the Fast Fourier Transform and the frequency data is available. So I've used two setTimeout calls: the first extracts the frequency data from osc1 and osc2, and the second extracts the frequency data from osc1 again after its frequency has changed to 8000 Hz.
var AudioContext = window.AudioContext || window.webkitAudioContext;
var ctx = new AudioContext();
// first oscillator (200 Hz)
var osc1 = ctx.createOscillator();
osc1.frequency.value = 200;
var analyser1 = ctx.createAnalyser();
var gain1 = ctx.createGain();
gain1.gain.value = 0;
osc1.connect(analyser1);
analyser1.connect(gain1);
gain1.connect(ctx.destination);
// second oscillator (8000 Hz)
var osc2 = ctx.createOscillator();
osc2.frequency.value = 8000;
var analyser2 = ctx.createAnalyser();
var gain2 = ctx.createGain();
gain2.gain.value = 0;
osc2.connect(analyser2);
analyser2.connect(gain2);
gain2.connect(ctx.destination);
// start oscillators
osc1.start();
osc2.start();
// get frequency data
var freqData1 = new Uint8Array(analyser1.frequencyBinCount);
var freqData2 = new Uint8Array(analyser2.frequencyBinCount);
setTimeout(function() {
analyser1.getByteFrequencyData(freqData1);
analyser2.getByteFrequencyData(freqData2);
console.log("OSC1 (200 Hz)");
printNonZeroFreqData(freqData1);
console.log("OSC2 (8000 Hz)");
printNonZeroFreqData(freqData2);
// change frequency of osc1 to 8000 Hz
osc1.frequency.value = 8000;
// wait a bit, then extract again frequency data from osc1
setTimeout(function() {
freqData1 = new Uint8Array(analyser1.frequencyBinCount);
analyser1.getByteFrequencyData(freqData1);
console.log("OSC1 (8000 Hz)");
printNonZeroFreqData(freqData1);
}, 500);
}, 500);
// print non zero frequency values
function printNonZeroFreqData(arr) {
for (var i = 0; i < arr.length; ++i) {
if (arr[i] != 0) {
console.log("Bin " + i, "\tvalue " + arr[i]);
}
}
console.log("");
}
This is expected. According to the spec, successive calls that extract the frequency data combine the data from the current call with a history of the data from previous calls. The weight of that history is smoothingTimeConstant, which defaults to 0.8. If you want to see the frequency data for the current time only, set smoothingTimeConstant to 0.
smoothingTimeConstant on Mozilla Developer Network
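To see why the old 200 Hz peak survives, here is a JavaScript sketch of the per-bin smoothing step as the Web Audio spec describes it: smoothed[k] = tau * previous[k] + (1 - tau) * |current[k]|, where tau is smoothingTimeConstant. The function name `smoothMagnitudes` is hypothetical, and the magnitudes are arbitrary linear units.

```javascript
// Sketch of AnalyserNode's smoothing over time for each frequency bin.
// tau = smoothingTimeConstant (default 0.8); with tau = 0 the previous data is ignored.
function smoothMagnitudes(previous, current, tau) {
  return current.map((mag, k) => tau * previous[k] + (1 - tau) * Math.abs(mag));
}

// A low bin that held the 200 Hz tone and is now silent keeps a factor tau per call.
let lowBin = [1.0];
for (let call = 1; call <= 3; call++) {
  lowBin = smoothMagnitudes(lowBin, [0], 0.8);
  const db = 20 * Math.log10(lowBin[0]);
  console.log(`call ${call}: magnitude ${lowBin[0].toFixed(3)}, ${db.toFixed(2)} dB`);
}
```

A factor of 0.8 is only about 1.9 dB. With the default byte mapping from minDecibels -100 to maxDecibels -30 (255 steps over 70 dB), that is roughly 7 byte units. This is consistent with the question's output, where the low bins drop by about 7 after one more call (for example, bin 8 goes from 236 to 229). In the browser, `analyser1.smoothingTimeConstant = 0;` removes the history.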
I am building out an Angular2 Slider component and the current setup is that the value of the slider is a percentage (based off where the slider handle is from 0% - 100%). I have an array of n items and want the slider to grab the appropriate index from the array based off the percentage (where the handle is at).
Here is my current drag event (fired when user is dragging slider handle):
handleDrag(evt, ui) {
let maxWidth = $('#slideBar').width() - 15;
let position = $('#slideHandle').css('left');
position = position.replace('px', '');
let percent = (+position / +maxWidth) * 100;
this.year = percent;
}
The percentage is working correctly, but I am wondering how I should structure the algorithm to fetch the array index from the percentage. So, if I'm at 50%, I want to fetch array index 73 when the array length is 146.
Is there an easier way of doing this with JavaScript? I have done a similar component where I did a table approach but would like to figure out a way to do this without adding 'helper html elements' to the page.
Your approach sounds fine, so getting the index could be achieved by using the .length property of your array as follows:
var actualIndex = Math.floor((array.length-1) * percentage);
// Where percentage is a value between 0 and 1
This should return an index between 0 and array.length-1 depending on the percentage value.
TL;DR
This answer didn't work for me. I did some experimenting and found that Math.round is better than Math.floor in this situation, but using Math.floor with some additional logic is even better still. For best results, try:
var index = Math.min(Math.floor(array.length * percentage), (array.length-1));
Intro
I know this question seems to be answered, but I'm posting another answer because using the selected answer gave me unexpected results, and I want to save others from running into the same problem.
First of all, I want to clarify that if you only need an approximation, not a precise result, the selected answer will mostly work, especially for a larger array.
However, for smaller arrays, it frequently does not return the expected result. For example, with an array size of 2, it always returns the first element (index 0) unless the percentage is exactly 100%. Even 99% will return the 1st element (Math.floor((2-1) * 0.99) == 0). In fact, 100% is the ONLY value that will yield the last index (no matter the size of the array).
Expected Results
By "expected" results, I mean that if an array has N elements, each element is represented by 1/Nth of the values from 0.0 to 1.0. An array of 5 elements would break down like this:
Given Percentage    Expected Index
----------------    --------------
[0.00%, 20.00%)     0
[20.00%, 40.00%)    1
[40.00%, 60.00%)    2
[60.00%, 80.00%)    3
[80.00%, 100.00%]   4
Algorithms
I looked at 3 different algorithms with various array sizes:
var indexFloor = Math.floor((array.length-1) * percentage);
var indexRound = Math.round((array.length-1) * percentage);
var indexCustom = Math.min(Math.floor(array.length * percentage), (array.length-1));
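The comparison can be reproduced with a short JavaScript sketch. Here, "expected" follows the definition above: element k owns the percentages in [k/N, (k+1)/N), and 100% belongs to the last element. The helper names `expectedIndex` and `mismatches` are hypothetical.

```javascript
// The three candidate formulas from above, as functions of array size n and fraction p.
const indexFloor = (n, p) => Math.floor((n - 1) * p);
const indexRound = (n, p) => Math.round((n - 1) * p);
const indexCustom = (n, p) => Math.min(Math.floor(n * p), n - 1);

// Expected index from the bucket definition: element k owns [k/n, (k+1)/n).
function expectedIndex(n, p) {
  for (let k = 0; k < n; k++) {
    if (p < (k + 1) / n) return k;
  }
  return n - 1; // 100% belongs to the last element
}

// List every 5% step where a formula disagrees with the expected index.
function mismatches(n) {
  const out = [];
  for (let step = 0; step <= 20; step++) {
    const p = step / 20;
    const want = expectedIndex(n, p);
    for (const [name, f] of [["floor", indexFloor], ["round", indexRound], ["custom", indexCustom]]) {
      const got = f(n, p);
      if (got !== want) out.push(`size ${n}, ${(p * 100).toFixed(2)}%: ${name} gave ${got}, expected ${want}`);
    }
  }
  return out;
}

for (const n of [2, 5, 8, 10]) console.log(mismatches(n).join("\n") || `size ${n}: no mismatches`);
```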
Results
All 3 algorithms tend to the same results as the array gets larger, but the 3rd one always gives the results that I would expect.
Take a look at the results from arrays of sizes 2 and 5 (incorrect results marked with *):
Percentage  Floor(Size 2)   Round(Size 2)   Custom(Size 2)  Floor(Size 5)   Round(Size 5)   Custom(Size 5)
----------  -------------   -------------   --------------  -------------   -------------   --------------
0.00%       0               0               0               0               0               0
5.00%       0               0               0               0               0               0
10.00%      0               0               0               0               0               0
15.00%      0               0               0               0               1*              0
20.00%      0               0               0               0*              1               1
25.00%      0               0               0               1               1               1
30.00%      0               0               0               1               1               1
35.00%      0               0               0               1               1               1
40.00%      0               0               0               1*              2               2
45.00%      0               0               0               1*              2               2
50.00%      0*              1               1               2               2               2
55.00%      0*              1               1               2               2               2
60.00%      0*              1               1               2*              2*              3
65.00%      0*              1               1               2*              3               3
70.00%      0*              1               1               2*              3               3
75.00%      0*              1               1               3               3               3
80.00%      0*              1               1               3*              3*              4
85.00%      0*              1               1               3*              3*              4
90.00%      0*              1               1               3*              4               4
95.00%      0*              1               1               3*              4               4
100.00%     1               1               1               4               4               4
For size 2, both Round and the custom algorithm produce the expected results. However, for size 5, only the custom algorithm does. Specifically, the Round method maps 15% to index 1 (I would expect that to require 20%), 60% to index 2 instead of 3, and 80% and 85% to index 3 (I would expect 80% and over to select the last element, index 4).
We see similar results with slightly larger array sizes (incorrect results marked with *):
Percentage  Floor(Size 8)   Round(Size 8)   Custom(Size 8)  Floor(Size 10)  Round(Size 10)  Custom(Size 10)
----------  -------------   -------------   --------------  --------------  --------------  ---------------
0.00%       0               0               0               0               0               0
5.00%       0               0               0               0               0               0
10.00%      0               1*              0               0*              1               1
15.00%      1               1               1               1               1               1
20.00%      1               1               1               1*              2               2
25.00%      1*              2               2               2               2               2
30.00%      2               2               2               2*              3               3
35.00%      2               2               2               3               3               3
40.00%      2*              3               3               3*              4               4
45.00%      3               3               3               4               4               4
50.00%      3*              4               4               4*              5               5
55.00%      3*              4               4               4*              5               5
60.00%      4               4               4               5*              5*              6
65.00%      4*              5               5               5*              6               6
70.00%      4*              5               5               6*              6*              7
75.00%      5*              5*              6               6*              7               7
80.00%      5*              6               6               7*              7*              8
85.00%      5*              6               6               7*              8               8
90.00%      6*              6*              7               8*              8*              9
95.00%      6*              7               7               8*              9               9
100.00%     7               7               7               9               9               9
The Round and Custom methods are very similar (especially for the size 10 array). In fact, they are identical up to 60%, where the Round method yields index 5 but the custom method correctly classifies it as index 6. From then on, the Round algorithm is off by one at every other step (60%, 70%, 80% and 90%).
Conclusion
The original algorithm's use of Math.floor was correct, but the way it ensured the result was a valid array index skewed the results, because there are N elements, not N-1. However, when we use N, a percentage of 100% yields a value outside the valid range of 0 to N-1.
However, by definition, 100% should return the last index of the array, and this can be enforced with a simple call to Math.min. The result is this algorithm for determining an array index using a percentage:
var index = Math.min(Math.floor(array.length * percentage), (array.length-1));
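Applied to the slider from the question, the formula needs one conversion, because the drag handler computes a percentage from 0 to 100 rather than a fraction. Here is a JavaScript sketch. The helper name `indexForFraction` is hypothetical, and two guards are assumptions that go beyond this answer: the fraction is clamped to [0, 1], and an empty array yields -1.

```javascript
// Sketch: map a fraction in [0, 1] to an array index with the formula above.
function indexForFraction(length, fraction) {
  if (length === 0) return -1;                       // assumption: no valid index
  const f = Math.min(Math.max(fraction, 0), 1);      // assumption: clamp stray values
  return Math.min(Math.floor(length * f), length - 1);
}

// The question's handler produces a percentage (0-100), so divide by 100 first.
const percent = 50;
console.log(indexForFraction(146, percent / 100));
```

For the question's example (146 items at 50%), this gives index 73.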
Backup
Complete table of all the tested array sizes and percentages
Excel document I made for testing
So I am making a BIRT report in Eclipse. My goal is to output, on each page, the amount of sales of a given person for the rows on that page, not the overall amount of sales of that person.
Consider this table(consider we only have 1 person):
personID sales date
------- ----- -----
111 10 2010-02-02
111 15 2010-02-03
111 5 2010-03-03
111 7 2010-04-03
111 8 2011-01-01
111 9 2013-01-01
111 20 2014-01-01
111 25 2014-03-02
The scenario on each page:
It shows 3 results(which I want)
but below each page its showing 99 sales (which I don't want)
What I want is that:
on 1st page it shows 30 sales (for 3 rows)
on 2nd page it shows 24 sales(for 3 rows)
and last page it shows 45(for 2 remaining rows)
What I did (I don't know if it's the right approach):
DENSE_RANK() OVER (ORDER BY sales) AS Row,
convert(integer,ROW_NUMBER() over(order by sales)/4) as page
which turned my table into
personID sales date Row page
------- ----- ----- ---- ----
111 10 2010-02-02 1 0
111 15 2010-02-03 2 0
111 5 2010-03-03 3 0
111 7 2010-04-03 4 1
111 8 2011-01-01 5 1
111 9 2013-01-01 6 1
111 20 2014-01-01 7 1
111 25 2014-03-02 8 2
As you can see there's another issue which is:
the 1st 3 rows got page(0)
but row 4-5-6-7 got page 1 which is wrong it should have been row 4-5-6 page 1
and row 7-8 page 2
I am also working in Eclipse using JavaScript:
<method name="onPageEnd"><![CDATA[var sales = this.getInstancesByElementName("sales");
var tmp=0;
if( sales != null )
{
for(var i=0; i< sales.length; i++ )
{
for(var j=0;j<3;j++)
{
//Instance of DataItemInstance
var sales = sales[i];
tmp+=parseInt(sales.getValue());
}
}
}
Once your query results are correct, you should compute SUM(sales) based on the page.
This can be done either with SQL (Oracle SQL syntax)
with x as
(
your_query_here
)
select x.*,
sum(sales) over (partition by page) -- IIRC
from x
or with BIRT, if you create a GROUP (called "page" for example) in your (layout) table, and an aggregate column binding based on this group.
Now to the query itself:
I don't think that the DB actually shows the results you state with your query.
Probably "Row" is defined in your query as
DENSE_RANK() OVER (ORDER BY "date") AS "Row"
I often use something like this for "matrix reports in SQL" and the like (assuming you want 3 rows per page). This is a pure SQL solution:
with
y as (
select ...,
ROW_NUMBER() over (partition by "personID" order by "date") - 1 as "Row"
-- Note: Row is zero-based: 0,1,2,...
-- Ordering by "date" reproduces the 30 / 24 / 45 page totals from the question
from ...
),
x as (
select y.*,
MOD(y."Row", 3) + 1 as "RowOnPage",
trunc(y."Row"/3) + 1 as "Page"
from y
)
select x.*,
sum("sales") over (partition by "personID", "Page") as SumSalesPerPersonAndPage
-- IIRC
from x
This is probably not quite correct (because I don't know how you intend to handle the different persons), but you get the idea...
For creating reports, it is a great advantage to know analytic functions.
I usually test my queries outside of BIRT (eg with SQL*Developer).
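Along the same lines, the expected page totals can be checked outside the database before any SQL or BIRT grouping is written. Here is a plain JavaScript sketch using the same zero-based paging (page index = floor(rowIndex / rowsPerPage)). It is meant for checking the numbers, not for pasting into a BIRT event handler, and the function name `salesPerPage` is hypothetical.

```javascript
// Sketch: sum sales per page after sorting rows by date (ISO dates sort correctly as strings).
function salesPerPage(rows, rowsPerPage) {
  const sorted = [...rows].sort((a, b) => a.date.localeCompare(b.date));
  const totals = [];
  sorted.forEach((row, i) => {
    const page = Math.floor(i / rowsPerPage); // zero-based page index
    totals[page] = (totals[page] || 0) + row.sales;
  });
  return totals;
}

// Sample data from the question (one person, personID 111).
const sample = [
  { date: "2010-02-02", sales: 10 },
  { date: "2010-02-03", sales: 15 },
  { date: "2010-03-03", sales: 5 },
  { date: "2010-04-03", sales: 7 },
  { date: "2011-01-01", sales: 8 },
  { date: "2013-01-01", sales: 9 },
  { date: "2014-01-01", sales: 20 },
  { date: "2014-03-02", sales: 25 },
];
console.log(salesPerPage(sample, 3)); // totals 30, 24 and 45
```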