Algorithm to remove extreme outliers in array - javascript

I've got an array which I use for the x-axis in a D3 graph, and it blows up because the chart is too small for the size of the array. I had a look at the data and there are extreme outliers in it. See chart below.
The data around 0 is not exactly zero (it's 0.00972 etc.).
The data starts getting interesting around 70, then there are massive spikes at about 100. The data then continues, and the same sort of thing happens on the other side at about 200.
Can anyone help me with an algorithm that removes the extreme outliers? E.g. give me the 90% or 95% percentile range and remove the contiguous elements (i.e. not just one element from the middle, but x elements from the start of the array and from the end of the array, where x depends on working out where best to cut based on the data). In JavaScript as well, please!
Thanks!
PS: you'll need to save the image to view it properly.

Assuming the data is like
var data = [0.00972, 70, 70, /* ... */];
first sort
data.sort(function(a,b){return a-b});
then take off the bottom 2.5% and top 2.5%
var l = data.length;
var low = Math.round(l * 0.025);
var high = l - low;
var data2 = data.slice(low,high);
An alternative would be to only show data within 3 standard deviations of the mean. If your data is normally distributed, about 99.7% of it will fall in this range.
var sum=0; // stores sum of elements
var sumsq = 0; // stores sum of squares
for(var i=0;i<data.length;++i) {
sum+=data[i];
sumsq+=data[i]*data[i];
}
var mean = sum/l;
var variance = sumsq / l - mean*mean;
var sd = Math.sqrt(variance);
var data3 = new Array(); // holds the data within 3 standard deviations of the mean
for(var i=0;i<data.length;++i) {
if(data[i]> mean - 3 *sd && data[i] < mean + 3 *sd)
data3.push(data[i]);
}
Or similar using some multiple of the Inter-quartile range
var median = data[Math.round(l/2)];
var LQ = data[Math.round(l/4)];
var UQ = data[Math.round(3*l/4)];
var IQR = UQ-LQ;
var data4 = new Array();
for(var i=0;i<data.length;++i) {
if(data[i]> median - 2 * IQR && data[i] < median + 2 * IQR)
data4.push(data[i]);
}
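The approaches above sort the array, which loses the original order that an x-axis usually needs. As a rough sketch of what the question describes (dropping contiguous elements from the start and end only), the following computes percentile bounds from a sorted copy and then trims the unsorted array from both ends. The helper names quantile and trimEnds, the linear interpolation, and the 5%/95% defaults are assumptions for illustration, not something from the answer above.

```javascript
// Hypothetical helper: linearly interpolated quantile of an ascending-sorted array.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Hypothetical helper: drop contiguous out-of-range elements from both ends,
// keeping the remaining elements in their original order.
function trimEnds(data, lowQ = 0.05, highQ = 0.95) {
  const sorted = data.slice().sort((a, b) => a - b); // sort a copy only
  const min = quantile(sorted, lowQ);
  const max = quantile(sorted, highQ);
  const inRange = (v) => v >= min && v <= max;
  let start = 0;
  let end = data.length;
  while (start < end && !inRange(data[start])) start++; // trim from the front
  while (end > start && !inRange(data[end - 1])) end--; // trim from the back
  return data.slice(start, end);
}

console.log(trimEnds([1000, 5, 6, 7, 8, 9, 10, 11, 12, -500], 0.1, 0.9));
```

Outliers in the middle of the array are left alone on purpose, since only the leading and trailing runs are removed.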

Related

How to efficiently feed the output of a GPU JS function back into it?

I am attempting to use GPU JS to accelerate the performance of a dynamic programming algorithm.
Here is my current code:
let pixels = new Uint32Array(5 * 5);
for (let i = 0; i < pixels.length; i++) {
pixels[i] = i;
}
function kFunction() {
let width = this.output.x;
let row = this.constants.row;
let col = this.thread.x;
let prevRow = row - 1; // index of the previous row (multiplied by width once, below)
let base = (row * width) + col;
let prevBase = (prevRow * width) + col;
let nw = this.constants.pixels[prevBase - 1];
let n = this.constants.pixels[prevBase];
let ne = this.constants.pixels[prevBase + 1];
return this.constants.pixels[base] + Math.min(Math.min(nw, n), ne);
}
var gpuKernel = gpu.createKernel(kFunction)
.setConstants({ pixels: pixels, row: 1 })
.setOutput([5, 5]);
console.log(gpuKernel());
This works, except I would like to have it run on each row, instead of just row 1.
The issue is that in order to run on the next row, the previous row has to be computed first (for rows n > 1 the nw, n, and ne values should be computed based on the previous row's value instead of pixels)
I could easily fix this by putting createKernel in a loop and running it on every row, but I believe that constantly returning the value from the GPU and sending it back is slow. I heard that Textures might be able to solve this, to maintain some sort of state, but I cannot find any relevant information on them.
Is what I'm asking to do possible? To have a single GPU function call to compute the entire cumulative sum table without passing data back and forth for each row computed?
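For reference, here is the recurrence described above written as a plain CPU loop, so the row-to-row dependency is explicit. This is not a GPU.js solution, only a sketch of the table the kernel is meant to produce; the function name cumulativeMinTable is hypothetical, and treating neighbours outside the row as ignored (Infinity) is an assumption about the intended edge handling.

```javascript
// Hypothetical CPU reference: each cell adds the smallest of its three
// upper neighbours (nw, n, ne) taken from the already-computed previous row.
function cumulativeMinTable(pixels, width, height) {
  const table = Float64Array.from(pixels); // row 0 is copied unchanged
  for (let row = 1; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const prevBase = (row - 1) * width + col;
      // Assumption: neighbours that fall outside the row are ignored.
      const nw = col > 0 ? table[prevBase - 1] : Infinity;
      const n = table[prevBase];
      const ne = col < width - 1 ? table[prevBase + 1] : Infinity;
      table[row * width + col] += Math.min(nw, n, ne);
    }
  }
  return table;
}

const input = new Uint32Array(5 * 5);
for (let i = 0; i < input.length; i++) {
  input[i] = i;
}
console.log(cumulativeMinTable(input, 5, 5));
```

Because every row reads the results of the row before it, only the columns within a single row can be computed in parallel.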

How can I produce a distance matrix for large datasets using Google Script?

I am currently producing a script that will compare a list of around 90 addresses to each other. The result of the script should be a list that contains the time taken to travel to each address from each other.
I've run into a series of issues whilst trying to resolve this. The main issue is that the resulting distance matrix will have 8100 elements. Google Apps Script's maximum execution time is 30 minutes, and thus the script keeps timing out.
Are there any ways I can improve the script to make it run faster?
The aim of this script is to produce a list with StartID, EndID and Time. I would then be able to filter the list to find addresses within an hour of each other.
Thanks!
function maps(origin, destination) {
var driving = Maps.DirectionFinder.Mode.DRIVING
var transit = Maps.DirectionFinder.Mode.TRANSIT
var modeSet = driving
var directions = Maps.newDirectionFinder()
.setOrigin(origin)
.setDestination(destination)
.setMode(modeSet)
.setOptimizeWaypoints(true)
.getDirections()
var result = directions
return result;
}
function GoogleMaps() {
//get distance
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("ABC");
var outputSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("EFG");
var lastrow = sheet.getLastRow();
var lastcolumn = sheet.getLastColumn();
var range = sheet.getRange(2, 3, lastrow-1, 3);
//var range = sheet.getRange(2, 3, 3, 3);
//Origin is in row 2, column 3
var values = range.getValues();
var output = []
for (var i = 0; i < values.length; ++i)
{
var loop1 = values[i]
var start = values[i][1]
var startId = values[i][0]
for (var j = 0; j < values.length; j++) {
var loop2 = values[j]
var end = values[j][1]
var endId = values[j][0]
var result = maps(start, end)
var status = result.status
try{
var time = result.routes[0].legs[0].duration.value / 60;
var row = [startId, endId, time]
output.push(row)
} catch(err){
Logger.log(err);
}
}
}
var outputLength = output.length
var outputRange = outputSheet.getRange(1,1,outputLength,3);
outputRange.setValues(output);
}
EDIT: updated number of elements in list
The first thing you want to do is reduce the number of operations you execute in your for loops. So let's start with analyzing that first, but from an algorithmic perspective.
In your current implementation, you're basically calculating the Cartesian Product on a set of 90 values to produce a new set consisting of 8100 values.
However, there are a number of redundant values in that result set, such that:
The result set includes calculations where the same address is used as both the starting and ending location.
The distance between 2 addresses is calculated twice; such that address A is the start address and address B is the end address and in another iteration address A is the end address and address B is the start address.
CAVEAT: I'm making the assumption that you cover the same distance during transit between two addresses regardless of one's transit direction (i.e. A-to-B or B-to-A). That may not be the case in your scenario.
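You can eliminate those redundancies by using an area of discrete mathematics called combinatorics; more specifically, the formula for the number of combinations of r items chosen from n:
$$C(n, r) = \frac{n!}{r!\,(n - r)!}$$
If we let n = 90 and r = 2 we get the following:
$$C(90, 2) = \frac{90!}{2!\,88!} = \frac{90 \times 89}{2} = 4005$$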
That means, at our most optimal, we need an algorithm that produces no more than 4005 address pairs.
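As a quick check of that figure, the count can be computed directly. This is a small sketch; the function name countCombinations is made up for illustration, and it uses the multiplicative form of the formula to avoid computing large factorials.

```javascript
// Hypothetical helper: number of ways to choose r items from n (order ignored).
// Builds the result step by step instead of evaluating n! directly.
function countCombinations(n, r) {
  let result = 1;
  for (let i = 1; i <= r; i++) {
    result = (result * (n - r + i)) / i;
  }
  return result;
}

console.log(countCombinations(90, 2)); // 4005
console.log(countCombinations(4, 2));  // 6
```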
With that as our goal, [cracks fingers] it's time to write a more optimal algorithm! But for illustrative purposes and in the interest of brevity, let's use a smaller sample size of 4 addresses, each made up of one letter. The following array should suffice:
var addresses = ['a', 'b', 'c', 'd'];
Using the aforementioned formula we deduce there are 6 unique address pairs, which we can represent as follows:
ab bc cd
ac bd
ad
So how does one generate those pairs?
If you look at the representation above you'll notice a few things:
The number of columns is one less than the number of addresses in the array
With each successive column (from left to right) the number of address pairs per column is reduced by 1; ie. there are 3 pairs that start with 'a', 2 that start with 'b', 1 that starts with 'c'.
Also note, that as you progress from one column to the next, successive columns do not have any pairs with the starting character of the previous columns; ie. the 2nd column does not have any pairs starting with 'a' and the 3rd column does not have any pairs starting with 'a' or 'b'
Let's generalize these observations. Given an array of n addresses we can generate n - 1 columns. The length of each column shrinks by 1 such that the first column has n - 1 pairs, the 2nd column has n - 2 pairs, the 3rd column n - 3 pairs etc., where each column consist of pair combinations that omit addresses from previous columns.
Based on those rules we can set up a for loop as follows (run the script and it will generate a collection of objects whose 'start' and 'end' properties represent unique address pairs):
var addresses = ['a', 'b', 'c', 'd'];
var pairs = [];
var numColumns = addresses.length - 1;
var columnHeight;
var columnIndex;
var rowIndex;
for (columnIndex = 0; columnIndex < numColumns; columnIndex++) {
columnHeight = numColumns - columnIndex;
for (rowIndex = 0; rowIndex < columnHeight; rowIndex++) {
pairs.push({
"start":addresses[columnIndex],
"end":addresses[columnIndex + rowIndex + 1]
});
}
}
console.log(pairs);
So the above handles the algorithmic optimizations; you'll need to tweak it for use with your implementation, but it should serve as a good jumping-off point. However, while generating 4005 address pairs is relatively quick, processing those address pairs to find the travel time via the Maps API will likely be time intensive.
In the event that you still manage to exhaust the 30 minute script execution quota, you may want to consider using batch processing techniques, where you set up your application to do calculations on smaller batches of address pairs, one batch at a time over a given period. You might even be able to process multiple batches concurrently if you set up your application correctly. But that's a post for another time.
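To give a rough idea of the batching mentioned above, here is a minimal sketch in plain JavaScript. The function processBatch, its parameters, and the stand-in lookup are all hypothetical; in Apps Script the returned cursor would also have to be stored somewhere between executions (script properties are one option), which is not shown here.

```javascript
// Hypothetical helper: process up to batchSize pairs starting at cursor and
// return the position where the next run should resume.
function processBatch(pairs, cursor, batchSize, lookup, output) {
  const end = Math.min(cursor + batchSize, pairs.length);
  for (let i = cursor; i < end; i++) {
    const pair = pairs[i];
    output.push([pair.start, pair.end, lookup(pair.start, pair.end)]);
  }
  return end;
}

// Usage with a stand-in lookup; a real one would call the maps() function.
const samplePairs = [
  { start: 'a', end: 'b' },
  { start: 'a', end: 'c' },
  { start: 'b', end: 'c' }
];
const rows = [];
let position = 0;
while (position < samplePairs.length) {
  // Each iteration stands in for one separate script execution.
  position = processBatch(samplePairs, position, 2, (s, e) => s + e, rows);
}
console.log(rows);
```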
This is perhaps not any better than what you have for performance, but try breaking it down into a more modular solution like the one here; then you can decide which part to optimize, perhaps by processing a subset at a time:
function getValuesArray(values) {
let valueArray = [];
for (let i = 0; i < values.length; ++i) {
valueArray.push({
id: values[i][0],
value: values[i][1]
});
}
return valueArray;
}
function GoogleMaps() {
//get distance
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("ABC");
var outputSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("EFG");
var lastrow = sheet.getLastRow();
var lastcolumn = sheet.getLastColumn();
var range = sheet.getRange(2, 3, lastrow - 1, 3);
//var range = sheet.getRange(2, 3, 3, 3);
//Origin is in row 2, column 3
var values = range.getValues();
var output = [];
let list1 = getValuesArray(values);
// clone each entry so list2 does not share objects with list1
const clone = (items) => items.map(item => Array.isArray(item) ? clone(item) : { ...item });
// might only need list1 but using two for clarity here
const list2 = clone(list1);
const listWork = [];
// build one work item per (origin, destination) combination
for (var a = 0; a < list1.length; a++) {
for (var j = 0; j < list2.length; j++) {
listWork.push({
dest: list2[j].value,
destId: list2[j].id,
origin: list1[a].value,
originId: list1[a].id
});
}
}
let results = [];
for (let w = 0; w < listWork.length; w++) {
let work = listWork[w];
results.push({ startId: work.originId, endId: work.destId, map: maps(work.origin, work.dest) });
}
for (let r = 0; r < results.length; r++) {
let result = results[r];
// seems to not be used
//var status = result.map.status;
let route = !!result.map.routes && result.map.routes[0] ? result.map.routes[0] : null;
if (route !== null &&
route.legs &&
route.legs[0] &&
route.legs[0].duration &&
route.legs[0].duration.value) {
let time = route.legs[0].duration.value / 60;
let row = [result.startId, result.endId, time];
output.push(row);
}
}
let outputLength = output.length;
let outputRange = outputSheet.getRange(1, 1, outputLength, 3);
outputRange.setValues(output);
}

unable to create an automatic filling array without writing it manually

I'd like to create a coordinates array so that I don't have to manually write 100 pairs of x and y values. I was thinking of using a for loop inside another for loop, but I couldn't get it to work.
I'd like to start with an empty array named coordsArray and, using the push method inside the innermost for loop, build an array of objects as follows:
var coordsArray = [
{x:0,y:0},{x:0,y:20},{x:0,y:40},{x:20,y:0},{x:20,y:20},{x:20,y:40},{x:40,y:0},{x:40,y:20},{x:40,y:40}
]
The above code is for creating three rows of three circles (using D3.js) without writing the array manually, because later I'll have to create ten rows of ten circles, and writing the x and y position of each one by hand would be very cumbersome. What's the best approach to achieve this? Thanks.
The pattern I see is this:
the x-value is (array index / # of cols), rounded down to the nearest integer
the y-value is the array index modulo the # of cols
So you can start with an empty array of length (rows * cols), fill each entry with the circle size, and then map each element to an object created by multiplying the array value by those x and y calculations...
let rows = 3,
cols = 5,
size = 20;
let data = new Array(rows*cols)
.fill(size)
.map((d, i) => ({
x: d * Math.floor(i/cols),
y: d * (i%cols)
}));
console.log(data);
Granted, it may not be as human-readable as nested loops, but I like it...
You should always post the code you've tried.
Here's one way:
var coordsArray = [];
const rows = 10,
cols = 10,
size = 20;
for (var row = 0; row < rows; row++) {
for (var col = 0; col < cols; col++) {
coordsArray[row * cols + col] = [row * size, col * size];
}
}
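If you want the {x, y} objects from the question and the push approach it mentions, a variation along these lines should work. This is only a sketch; the function name makeGrid is made up for illustration.

```javascript
// Hypothetical helper: build rows * cols coordinate objects, spaced by size.
function makeGrid(rows, cols, size) {
  const coords = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      // x advances once per outer iteration, y once per inner iteration
      coords.push({ x: row * size, y: col * size });
    }
  }
  return coords;
}

console.log(makeGrid(3, 3, 20));
```

Calling makeGrid(10, 10, 20) then gives the 100 positions for ten rows of ten circles.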

Matrix conversion in a particular cluster Structure

I am trying to convert the distance matrix I got after calculating Euclidean distances into a manually defined cluster order.
In my case, suppose this is the matrix in its normal order.
Suppose this is the desired cluster order, i.e. I have to convert from 1 2 3 4 to 3 1 2 4; I want to end up with something like this. I do not want to do it manually since my matrix is 40 x 40.
I can code, but I cannot come up with the algorithm. If you could help, or if someone has done something like this before, please help me out.
I tried reading the data from a CSV file with d3.csv, which gives the result in the order 1 2 3 4, like this:
d3.csv("final.csv", function(loadeddata) {
mydata = loadeddata.map(function(d) {return [+d["1"], +d["2"], +d["3"], +d["4"]];});
});
and then changed it to
d3.csv("final.csv", function(loadeddata) {
mydata = loadeddata.map(function(d) {return [+d["3"], +d["1"], +d["2"], +d["4"]];});
});
so that I could get the matrix in the desired cluster order, but it did not work because it only moves whole columns to the desired place; it does not reorder the rows as well.
Here is another try I made:
var iMax = 4;
var jMax = 4;
var newdata = new Array();
for (i=0;i<iMax;i++) {
newdata[i]=new Array();
for (j=0;j<jMax;j++) {
newdata[i][j]=0;
}
}
var arraycomb = [3,1,2,4];
for ( i = 0; i < 4; i++) {
for ( j = 0; j < 4; j++) {
newdata[arraycomb[i]][arraycomb[j]] = mydata[i][j];
}
}
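One possible way to do the reordering is to apply the same permutation to both rows and columns, reading from the old positions rather than writing to them. Note also that the order 3 1 2 4 is 1-based while array indices are 0-based, so using the values directly as indices reaches newdata[4], which does not exist. The sketch below assumes mydata is a square array of arrays; the function name reorderMatrix is hypothetical.

```javascript
// Hypothetical helper: reorder a square matrix so that row/column k of the
// result is row/column order[k] of the input. order uses 1-based labels.
function reorderMatrix(matrix, order) {
  return order.map((rowLabel) =>
    order.map((colLabel) => matrix[rowLabel - 1][colLabel - 1])
  );
}

// Example with a small symmetric distance matrix (made-up values).
const distances = [
  [0, 1, 2, 3],
  [1, 0, 4, 5],
  [2, 4, 0, 6],
  [3, 5, 6, 0]
];
console.log(reorderMatrix(distances, [3, 1, 2, 4]));
```

The same call should work for a 40 x 40 matrix, given an order array with 40 labels.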

Get the minimum value in the chart rendered for flot.js

Is it possible to get the lowest value in the chart itself, assuming that the data is dynamic? Take a look at this example Fiddle.
$(function () {
var d1 = [];
for (var i = 0; i < 14; i += 0.5)
d1.push([i, Math.sin(i)]);
$.plot($("#placeholder"), [ d1]);
});
How can I get the lowest value in this line chart?
Update: It seems my earlier example didn't quite make sense. Please take a look at this link: https://abtw.alliancebernstein.com.tw/APAC/TW/Funds/American-Income.htm?ShareClassId=60006908 (make sure to turn off the Flash plugin so that the Flot chart will render). Looking at the area chart, I want to get the lowest value based on the chart as rendered. Is this possible?
If you save your plot object like so
var plot = $.plot($("#placeholder"), [ d1]);
you can get the minimum value from it with
var minimum = plot.getData()[0].yaxis.datamin;
The same is possible for maximum value (datamax), for the xaxis and for other data series (the index behind getData()).
http://jsfiddle.net/fenderistic/Sf5Yr/
Simply keep a lowest variable and check inside the for loop whether each value is lower; if so, replace the current lowest value with it.
$(function () {
var d1 = [];
//Assuming you're always starting at zero
var lowest = Math.sin(0);
for (var i = 0; i < 14; i += 0.5) {
d1.push([i, Math.sin(i)]);
if (Math.sin(i) < lowest) {
lowest = Math.sin(i);
}
}
alert(lowest)
$.plot($("#placeholder"), [d1]);
});
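If the data arrives from elsewhere rather than being built in your own loop, the same idea can be applied to any series after the fact. A small sketch, assuming the series is an array of [x, y] pairs as in the example; the function name minY is made up for illustration.

```javascript
// Hypothetical helper: lowest y-value in a series of [x, y] points.
// Returns Infinity for an empty series.
function minY(series) {
  return series.reduce((lowest, point) => Math.min(lowest, point[1]), Infinity);
}

var d1 = [];
for (var i = 0; i < 14; i += 0.5) {
  d1.push([i, Math.sin(i)]);
}
console.log(minY(d1));
```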
