tensorflow results are weird. How to solve it? - javascript

It has two inputs and one output.
Input: [Temperature, Humidity]
Output: [wattage]
I trained the model as follows.
Even after 5 million iterations, it does not work properly.
Did I choose the wrong option?
var input_data = [
[-2.4,2.7,9,14.2,17.1,22.8,281,25.9,22.6,15.6,8.2,0.6],
[58,56,63,54,68,73,71,74,71,70,68,62]
];
var power_data = [239,224,189,189,179,192,243,317,224,190,189,202];
var reason_data = tf.tensor2d(input_data);
var result_data = tf.tensor(power_data);
var X = tf.input({ shape: [2] });
var Y = tf.layers.dense({ units: 1 }).apply(X);
var model = tf.model({ inputs: X, outputs: Y });
var compileParam = { optimizer: tf.train.adam(), loss: tf.losses.meanSquaredError }
model.compile(compileParam);
var fitParam = {
epochs: 500000,
callbacks: {
onEpochEnd: function (epoch, logs) {
console.log('epoch', epoch, logs, "RMSE --> ", Math.sqrt(logs.loss));
}
}
}
model.fit(reason_data, result_data, fitParam).then(function (result) {
var final_result = model.predict(reason_data);
final_result.print();
model.save('file:///path/');
});
The following is the result after 5 million iterations.
It should match power_data, but it does not.
What should I fix?

While there is rarely one simple reason to point to when a model doesn't perform the way you would expect, here are some options to consider:
You don't have enough data points. Twelve is not nearly sufficient to get an accurate result.
You need to normalize the data of the input tensors. Given that your two features [temperature and humidity] have different ranges, they need to be normalized to give them equal opportunity to influence the output. The following is a normalization function you could start with:
function normalize(tensor, min, max) {
const result = tf.tidy(function() {
// Find the minimum value contained in the Tensor.
const MIN_VALUES = min || tf.min(tensor, 0);
// Find the maximum value contained in the Tensor.
const MAX_VALUES = max || tf.max(tensor, 0);
// Now subtract the MIN_VALUES from every value in the Tensor
// And store the results in a new Tensor.
const TENSOR_SUBTRACT_MIN_VALUE = tf.sub(tensor, MIN_VALUES);
// Calculate the range size of possible values.
const RANGE_SIZE = tf.sub(MAX_VALUES, MIN_VALUES);
// Calculate the adjusted values divided by the range size as a new Tensor.
const NORMALIZED_VALUES = tf.div(TENSOR_SUBTRACT_MIN_VALUE, RANGE_SIZE);
// Return the important tensors.
return {NORMALIZED_VALUES, MIN_VALUES, MAX_VALUES};
});
return result;
}
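To make the arithmetic behind the normalize function concrete, here is a minimal plain-JavaScript sketch of the same per-column min-max scaling, without TensorFlow.js. The sample rows and the name normalizeColumns are made up for illustration. It assumes one [temperature, humidity] row per observation, which is the layout a model with input shape [2] expects.

```javascript
// Hypothetical sample: a few [temperature, humidity] rows, one row per observation.
const samples = [
  [-2.4, 58],
  [2.7, 56],
  [9.0, 63],
  [14.2, 54],
];

// Min-max normalize each column independently into the [0, 1] range.
function normalizeColumns(rows) {
  const numCols = rows[0].length;
  const mins = Array(numCols).fill(Infinity);
  const maxs = Array(numCols).fill(-Infinity);
  // First pass: find the minimum and maximum of every column.
  for (const row of rows) {
    row.forEach((value, c) => {
      mins[c] = Math.min(mins[c], value);
      maxs[c] = Math.max(maxs[c], value);
    });
  }
  // Second pass: rescale every value by its column's range.
  const normalized = rows.map(row =>
    row.map((value, c) => {
      const range = maxs[c] - mins[c];
      // A constant column has zero range; map it to 0 to avoid dividing by zero.
      return range === 0 ? 0 : (value - mins[c]) / range;
    })
  );
  return { normalized, mins, maxs };
}

const { normalized, mins, maxs } = normalizeColumns(samples);
console.log(normalized, mins, maxs);
```

Keeping mins and maxs around matters: new inputs at prediction time have to be scaled with the same values that were used for training.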
You should try a different optimizer. Adam might be the best choice, but for a linear regression problem such as this, you should also consider Stochastic Gradient Descent (SGD).
Check out this sample code for an example that uses normalization and sgd. I ran your data points through the code (after making the changes to the tensors so they fit your data), and I was able to reduce the loss to less than 40. There is room for improvement, but that's where adding more data points comes in.

Related

Picking quartile value on each point

I'm plotting sentiment value of tweet over last 10 years.
The csv file has the three columns like below.
I plotted each value by date successfully.
However, when I tried to generate an area graph,
I encountered a problem which is,
each date has multiple values.
That's because each data point comes from a single tweet, so
one x point ends up having multiple y values.
So I tried to pick quartile value of each date or pick largest or least y value.
For clarity, please see the example below.
January 8 has multiple y values (textblob)
I want to draw area graph by picking the largest value or 2nd quartile value of each point.
How do I only pick the point?
I would like to feed that point into the following code as an
x/y coordinate for a line or area graph.
function* vlinedrawing(data){
for(let i=0;i<data.length;i++){
if( i%500==0) yield svg.node();
let px = margin+xscale(data[i].date)
let py = height-margin-yscale(data[i].vader)
paths.append('path')
.attr('x',px)
.attr('y',py)
}
yield svg.node()
}
The entire code is in the following link.
https://jsfiddle.net/soonk/uh5djax4/2/
Thank you in advance.
( The reason why it is a generator is that I'm going to visualize the graph in animated way)
For getting the 2nd quartile you can use d3.quantile like this:
d3.quantile(dataArray, 0.5);
Of course, since the 2nd quartile is the median, you can also just use:
d3.median(dataArray);
But d3.quantile is a bit more versatile, you can just change the p value for any quartile you want.
Using your data, without parsing the dates (so we can use a Set for unique values), here is a possible solution:
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
// d3.quantile in d3 v5 expects a sorted array, so sort the values first
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5)
}
});
This is just a quick answer to show you the way: the code is not optimised, because there are several loops within loops. You can try to optimise it.
Here is the demo:
var parser = d3.timeParse("%m/%d/%y");
d3.csv('https://raw.githubusercontent.com/jotnajoa/Javascript/master/tweetdata.csv', row).then(function(data) {
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
// d3.quantile in d3 v5 expects a sorted array, so sort the values first
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5)
}
});
console.log(aggregatedData)
});
function row(d) {
d.vader = +d.vader;
d.textblob = +d.textblob;
return d;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
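One possible way to avoid the loops within loops is to group the values by date in a single pass and then reduce each group once. The sketch below uses plain JavaScript only; the sample rows, the helper names quantile and aggregateByDate, and the extra max field are assumptions made for illustration. The quantile helper sorts a copy of the values and interpolates linearly between neighbours.

```javascript
// Hypothetical rows in the same shape as the parsed CSV: a date string and a numeric score.
const rows = [
  { date: "1/8/10", textblob: 0.5 },
  { date: "1/8/10", textblob: -0.2 },
  { date: "1/8/10", textblob: 0.1 },
  { date: "1/9/10", textblob: 0.4 },
  { date: "1/9/10", textblob: 0.8 },
];

// p-quantile with linear interpolation between neighbours; sorts a copy first.
function quantile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Group values by date in one pass, then reduce each group once.
function aggregateByDate(data, p) {
  const groups = new Map();
  for (const d of data) {
    if (!groups.has(d.date)) groups.set(d.date, []);
    groups.get(d.date).push(d.textblob);
  }
  return [...groups].map(([date, values]) => ({
    date,
    textblob: quantile(values, p), // e.g. p = 0.5 for the 2nd quartile
    max: Math.max(...values),      // largest value of the day
  }));
}

const aggregated = aggregateByDate(rows, 0.5);
console.log(aggregated);
```

The result has one object per date, so it can be fed to a line or area generator directly once the date strings are parsed.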

How to train a model in nodejs (tensorflow.js)?

I want to make a image classifier, but I don't know python.
Tensorflow.js works with javascript, which I am familiar with. Can models be trained with it and what would be the steps to do so?
Frankly I have no clue where to start.
The only thing I figured out is how to load "mobilenet", which apparently is a set of pre-trained models, and classify images with it:
const tf = require('@tensorflow/tfjs'),
mobilenet = require('@tensorflow-models/mobilenet'),
tfnode = require('@tensorflow/tfjs-node'),
fs = require('fs-extra');
const imageBuffer = await fs.readFile(......),
tfimage = tfnode.node.decodeImage(imageBuffer),
mobilenetModel = await mobilenet.load();
const results = await mobilenetModel.classify(tfimage);
which works, but it's no use to me because I want to train my own model using my images with labels that I create.
=======================
Say I have a bunch of images and labels. How do I use them to train a model?
const myData = JSON.parse(await fs.readFile('files.json'));
for(const data of myData){
const image = await fs.readFile(data.imagePath),
labels = data.labels;
// how to train, where to pass image and labels ?
}
First of all, the images need to be converted to tensors. The first approach would be to create a tensor containing all the features (and, respectively, a tensor containing all the labels). This should be the way to go only if the dataset contains few images.
const imageBuffer = await fs.readFile(feature_file);
tensorFeature = tfnode.node.decodeImage(imageBuffer) // create a tensor for the image
// create an array of all the features
// by iterating over all the images
tensorFeatures = tf.stack([tensorFeature, tensorFeature2, tensorFeature3])
The labels would be an array indicating the type of each image
labelArray = [0, 1, 2] // maybe 0 for dog, 1 for cat and 2 for birds
One now needs to create a one-hot encoding of the labels
tensorLabels = tf.oneHot(tf.tensor1d(labelArray, 'int32'), 3);
Once the tensors are ready, one needs to create the model for training. Here is a simple model.
const model = tf.sequential();
model.add(tf.layers.conv2d({
inputShape: [height, width, numberOfChannels], // numberOfChannels = 3 for colorful images and one otherwise
filters: 32,
kernelSize: 3,
activation: 'relu',
}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
Then the model can be trained
model.fit(tensorFeatures, tensorLabels)
If the dataset contains a lot of images, one would need to create a tfDataset instead. This answer discusses why.
// Read the file synchronously so that the generator below can stay a plain function*
const genFeatureTensor = imagePath => {
const imageBuffer = fs.readFileSync(imagePath);
return tfnode.node.decodeImage(imageBuffer)
}
const labelArray = indice => Array.from({length: numberOfClasses}, (_, k) => k === indice ? 1 : 0)
function* dataGenerator() {
const numElements = numberOfImages;
let index = 0;
while (index < numElements) {
// imagePath and classImageIndex are placeholders for the path and the class index of the image at position index
const feature = genFeatureTensor(imagePath);
const label = tf.tensor1d(labelArray(classImageIndex))
index++;
yield {xs: feature, ys: label};
}
}
const ds = tf.data.generator(dataGenerator).batch(1) // specify an appropriate batchsize;
And use model.fitDataset(ds) to train the model
The above is for training in nodejs. To do such processing in the browser, genFeatureTensor can be written as follows:
function loadImage(url){
return new Promise((resolve, reject) => {
const im = new Image()
im.crossOrigin = 'anonymous'
im.onload = () => {
resolve(im)
}
// reject so that a failed download does not leave the promise pending
im.onerror = reject
im.src = url
})
}
genFeatureTensor = async image => {
const img = await loadImage(image);
return tf.browser.fromPixels(img);
}
One word of caution is that doing heavy processing might block the main thread in the browser. This is where web workers come into play.
Consider the example https://codelabs.developers.google.com/codelabs/tfjs-training-classfication/#0
What they do is:
take a BIG png image (a vertical concatenation of images)
take some labels
build the dataset (data.js)
then train
The building of the dataset is as follows:
images
The big image is divided into n vertical chunks.
(n being chunkSize)
Consider a chunkSize of size 2.
Given the pixel matrix of image 1:
1 2 3
4 5 6
Given the pixel matrix of image 2 is
7 8 9
1 2 3
The resulting array would be
1 2 3 4 5 6 7 8 9 1 2 3 (the 1D concatenation somehow)
So basically at the end of the processing, you have a big buffer representing
[...Buffer(image1), ...Buffer(image2), ...Buffer(image3)]
labels
That kind of formatting is done a lot for classification problems. Instead of classifying with a number, they take a boolean array.
To predict 7 out of 10 classes we would consider
[0,0,0,0,0,0,0,1,0,0] // 1 at index 7, the array being 0-indexed
What you can do to get started
Take your image (and its associated label)
Load your image to the canvas
Extract its associated buffer
Concatenate all your image's buffer as a big buffer. That's it for xs.
Take all your associated labels, map them as a boolean array, and concatenate them.
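The steps above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: the tiny 2x2 grayscale images, the examples array, and the buildDataset helper are hypothetical, and in a real page the pixel values would come from the canvas rather than from literal arrays.

```javascript
const IMAGE_SIZE = 4;  // hypothetical 2x2 grayscale images
const NUM_CLASSES = 3; // hypothetical number of classes

// Hypothetical dataset: raw 0-255 pixel values plus a class index per image.
const examples = [
  { pixels: [0, 255, 255, 0], label: 0 },
  { pixels: [255, 255, 0, 0], label: 2 },
];

function buildDataset(items) {
  // One big buffer for all images and one for all one-hot labels.
  const xs = new Float32Array(items.length * IMAGE_SIZE);
  const ys = new Uint8Array(items.length * NUM_CLASSES);
  items.forEach((item, i) => {
    // Copy the image's pixels into its slot of the big buffer, scaled to [0, 1].
    item.pixels.forEach((value, j) => {
      xs[i * IMAGE_SIZE + j] = value / 255;
    });
    // One-hot encode the label: a single 1 at the class index.
    ys[i * NUM_CLASSES + item.label] = 1;
  });
  return { xs, ys };
}

const { xs, ys } = buildDataset(examples);
console.log(Array.from(xs)); // [0, 1, 1, 0, 1, 1, 0, 0]
console.log(Array.from(ys)); // [1, 0, 0, 0, 0, 1]
```

Image i occupies the slice from i * IMAGE_SIZE to (i + 1) * IMAGE_SIZE of xs, and its label occupies the matching slice of ys, which is the layout the slicing in the class below relies on.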
Below, I subclass MnistData and override load (the rest can be left as is, except in script.js, where you need to instantiate your own class instead).
I still generate 28x28 images, write a digit on each, and get perfect accuracy since I don't include noise or deliberately wrong labels.
import {MnistData} from './data.js'
const IMAGE_SIZE = 784;// actually 28*28...
const NUM_CLASSES = 10;
const NUM_DATASET_ELEMENTS = 5000;
const NUM_TRAIN_ELEMENTS = 4000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;
function makeImage (label, ctx) {
ctx.fillStyle = 'black'
ctx.fillRect(0, 0, 28, 28) // hardcoded, brrr
ctx.fillStyle = 'white'
ctx.fillText(label, 10, 20) // print a digit on the canvas
}
export class MyMnistData extends MnistData{
async load() {
const canvas = document.createElement('canvas')
canvas.width = 28
canvas.height = 28
let ctx = canvas.getContext('2d')
ctx.font = ctx.font.replace(/\d+px/, '18px')
let labels = new Uint8Array(NUM_DATASET_ELEMENTS*NUM_CLASSES)
// in data.js, they use a batch of images (aka chunksize)
// let's even remove it for simplification purpose
const datasetBytesBuffer = new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
for (let i = 0; i < NUM_DATASET_ELEMENTS; i++) {
const datasetBytesView = new Float32Array(
datasetBytesBuffer, i * IMAGE_SIZE * 4,
IMAGE_SIZE);
// BEGIN our handmade label + its associated image
// notice that you could loadImage( images[i], datasetBytesView )
// so you do them by bulk and synchronize after your promises after "forloop"
const label = Math.floor(Math.random()*10)
labels[i*NUM_CLASSES + label] = 1
makeImage(label, ctx)
const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
// END you should be able to load an image to canvas :)
for (let j = 0; j < imageData.data.length / 4; j++) {
// NOTE: you are storing a FLOAT of 4 bytes, in [0;1] even though you don't need it
// We could make it with a uint8Array (assuming gray scale like we are) without scaling to 1/255
// they probably did it so you can copy paste like me for color image afterwards...
datasetBytesView[j] = imageData.data[j * 4] / 255;
}
}
this.datasetImages = new Float32Array(datasetBytesBuffer);
this.datasetLabels = labels
//below is copy pasted
this.trainIndices = tf.util.createShuffledIndices(NUM_TRAIN_ELEMENTS);
this.testIndices = tf.util.createShuffledIndices(NUM_TEST_ELEMENTS);
this.trainImages = this.datasetImages.slice(0, IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
this.testImages = this.datasetImages.slice(IMAGE_SIZE * NUM_TRAIN_ELEMENTS);
this.trainLabels =
this.datasetLabels.slice(0, NUM_CLASSES * NUM_TRAIN_ELEMENTS);// notice, each element is an array of size NUM_CLASSES
this.testLabels =
this.datasetLabels.slice(NUM_CLASSES * NUM_TRAIN_ELEMENTS);
}
}
I found a tutorial [1] on how to use an existing model to train new classes. The main parts of the code are here:
index.html head:
<script src="https://unpkg.com/@tensorflow-models/knn-classifier"></script>
index.html body:
<button id="class-a">Add A</button>
<button id="class-b">Add B</button>
<button id="class-c">Add C</button>
index.js:
const classifier = knnClassifier.create();
....
// Reads an image from the webcam and associates it with a specific class
// index.
const addExample = async classId => {
// Capture an image from the web camera.
const img = await webcam.capture();
// Get the intermediate activation of MobileNet 'conv_preds' and pass that
// to the KNN classifier.
const activation = net.infer(img, 'conv_preds');
// Pass the intermediate activation to the classifier.
classifier.addExample(activation, classId);
// Dispose the tensor to release the memory.
img.dispose();
};
// When clicking a button, add an example for that class.
document.getElementById('class-a').addEventListener('click', () => addExample(0));
document.getElementById('class-b').addEventListener('click', () => addExample(1));
document.getElementById('class-c').addEventListener('click', () => addExample(2));
....
The main idea is to let the existing network make its prediction and then substitute the label it finds with your own.
The complete code is in the tutorial. Another promising but much more advanced approach is described in [2]. It requires strict preprocessing, so I only link to it here.
Sources:
[1] https://codelabs.developers.google.com/codelabs/tensorflowjs-teachablemachine-codelab/index.html#6
[2] https://towardsdatascience.com/training-custom-image-classification-model-on-the-browser-with-tensorflow-js-and-angular-f1796ed24934
TL;DR
MNIST is the Hello World of image recognition. Once you know it by heart, questions like yours are easy to answer.
Question setting:
Your main question written is
// how to train, where to pass image and labels ?
inside your code block. I found a perfect answer to that in the examples section of Tensorflow.js: the MNIST example. The links below point to the pure JavaScript and Node.js versions of it and to the Wikipedia explanation. I will go through them at the level needed to answer your main question, and I will also explain how your own images and labels relate to the MNIST image set and the examples that use it.
First things first:
Code snippets.
where to pass images (Node.js sample)
async function loadImages(filename) {
const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);
const headerBytes = IMAGE_HEADER_BYTES;
const recordBytes = IMAGE_HEIGHT * IMAGE_WIDTH;
const headerValues = loadHeaderValues(buffer, headerBytes);
assert.equal(headerValues[0], IMAGE_HEADER_MAGIC_NUM);
assert.equal(headerValues[2], IMAGE_HEIGHT);
assert.equal(headerValues[3], IMAGE_WIDTH);
const images = [];
let index = headerBytes;
while (index < buffer.byteLength) {
const array = new Float32Array(recordBytes);
for (let i = 0; i < recordBytes; i++) {
// Normalize the pixel values into the 0-1 interval, from
// the original 0-255 interval.
array[i] = buffer.readUInt8(index++) / 255;
}
images.push(array);
}
assert.equal(images.length, headerValues[1]);
return images;
}
Notes:
The MNIST dataset packs many images into one file, like tiles in a puzzle: every image has the same size and they are stored side by side. Each tile is one sample, and the entry at the corresponding position in the labels array holds its label. Starting from this example, it is not hard to switch to a one-file-per-image format, so that the while loop handles only one picture at a time.
Labels:
async function loadLabels(filename) {
const buffer = await fetchOnceAndSaveToDiskWithBuffer(filename);
const headerBytes = LABEL_HEADER_BYTES;
const recordBytes = LABEL_RECORD_BYTE;
const headerValues = loadHeaderValues(buffer, headerBytes);
assert.equal(headerValues[0], LABEL_HEADER_MAGIC_NUM);
const labels = [];
let index = headerBytes;
while (index < buffer.byteLength) {
const array = new Int32Array(recordBytes);
for (let i = 0; i < recordBytes; i++) {
array[i] = buffer.readUInt8(index++);
}
labels.push(array);
}
assert.equal(labels.length, headerValues[1]);
return labels;
}
Notes:
Here, labels are also byte data in a file. In Javascript world, and with the approach you have in your starting point, labels could also be a json array.
train the model:
await data.loadData();
const {images: trainImages, labels: trainLabels} = data.getTrainData();
model.summary();
let epochBeginTime;
let millisPerStep;
const validationSplit = 0.15;
const numTrainExamplesPerEpoch =
trainImages.shape[0] * (1 - validationSplit);
const numTrainBatchesPerEpoch =
Math.ceil(numTrainExamplesPerEpoch / batchSize);
await model.fit(trainImages, trainLabels, {
epochs,
batchSize,
validationSplit
});
Notes:
Here model.fit is the actual line of code that does the thing: trains the model.
Results of the whole thing:
const {images: testImages, labels: testLabels} = data.getTestData();
const evalOutput = model.evaluate(testImages, testLabels);
console.log(
`\nEvaluation result:\n` +
` Loss = ${evalOutput[0].dataSync()[0].toFixed(3)}; `+
`Accuracy = ${evalOutput[1].dataSync()[0].toFixed(3)}`);
Note:
In data science, here as elsewhere, the most fascinating part is finding out how well the model copes with new data it has not seen: can it label them correctly or not? The evaluation step answers that by printing the loss and the accuracy on the test set.
Loss and accuracy: [4]
The lower the loss, the better a model (unless the model has over-fitted to the training data). The loss is calculated on training and validation and its interpretation is how well the model is doing for these two sets. Unlike accuracy, loss is not a percentage. It is a summation of the errors made for each example in training or validation sets.
..
The accuracy of a model is usually determined after the model parameters are learned and fixed and no learning is taking place. Then the test samples are fed to the model and the number of mistakes (zero-one loss) the model makes are recorded, after comparison to the true targets.
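To make the two numbers less abstract, here is a small plain-JavaScript sketch that computes them by hand for a handful of predictions. The prediction values and the helper names are made up for illustration, and it assumes a categorical cross-entropy loss averaged over the examples, which is one common choice and not necessarily the exact reduction used in the example repository.

```javascript
// Hypothetical softmax outputs for three test examples over three classes.
const predictions = [
  [0.7, 0.2, 0.1],
  [0.1, 0.3, 0.6],
  [0.4, 0.5, 0.1],
];
// Hypothetical true class index of each example.
const trueClasses = [0, 2, 0];

// Index of the largest probability, i.e. the predicted class.
const argMax = values => values.indexOf(Math.max(...values));

// Fraction of examples whose predicted class equals the true class.
function accuracy(preds, labels) {
  const correct = preds.filter((p, i) => argMax(p) === labels[i]).length;
  return correct / preds.length;
}

// Mean categorical cross-entropy: minus the log of the probability
// assigned to the true class, averaged over the examples.
function crossEntropy(preds, labels) {
  const total = preds.reduce((sum, p, i) => sum - Math.log(p[labels[i]]), 0);
  return total / preds.length;
}

const acc = accuracy(predictions, trueClasses);
const loss = crossEntropy(predictions, trueClasses);
console.log(`Loss = ${loss.toFixed(3)}; Accuracy = ${acc.toFixed(3)}`);
```

The third example is misclassified, so it lowers the accuracy, while the loss also reflects how confident the model was on the two examples it got right.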
More information:
The README.md file in the GitHub repositories links to a tutorial that explains everything in the example in greater detail.
[1] https://github.com/tensorflow/tfjs-examples/tree/master/mnist
[2] https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node
[3] https://en.wikipedia.org/wiki/MNIST_database
[4] How to interpret "loss" and "accuracy" for a machine learning model

Extracting raster values to point features with reduce regions. Error: "User memory limit exceeded"

I am very new to Earth Engine and Javascript so I wouldn't be surprised if the solution to my problem is extremely simple. Anyway, I've been trying to fix this for days and I'm not making any progress.
I'm trying to get the cumulative cost distance between some areas of interest and some tide gauges in the coastal U.S. To do this, I first calculated the pixel cost based on a national elevation map. Then, I calculated the cumulative cost with the built-in function cumulativeCost. That all went pretty well. Now I'm trying to extract the cumulative cost values at the position of the tide gauges. To do this, someone suggested that I use the reduceRegions method. I've tried the following code, but without success.
I'm going to post my whole code so that is replicable. Please note that the part with which I have a problem is the second one.
Thanks so much in advance.
// IMPORTS
var imageCollection =
ee.ImageCollection("users/brazzolanicoletta/sourceRasters"),
sourceVis = {"opacity":1,"bands":["b1"],"min":1,"max":1,"gamma":1},
DEMVis = {"opacity":1,"bands":
["elevation"],"min":-73.50744474674659,"max":374.555654458347,"gamma":1},
imageVisParam = {"opacity":1,"bands":
["elevation"],"min":-0.02534169006347656,"max":3.6601884765625,"palette":
["0345ff","000000"]},
imageVisParam2 = {"opacity":1,"bands":
["elevation"],"min":-0.02534169006347656,"max":3.6601884765625,"palette":
["0345ff","000000"]},
cosVisParam = {"opacity":1,"bands":
["cumulative_cost"],"max":4170.014060708561,"palette":
["ff0303","efff05","4eff05","002bff","ff01f7","000000"]},
imageVisParam3 = {"opacity":1,"bands":
["cumulative_cost"],"max":4028.1446098656247,"gamma":1};
//get IDs for images in image collection
var getID = function(image){ return image.set('ID', image.id());};
var okID = imageCollection.map(function(image) { return image.set('ID',
image.id());});
// Set general aesthetic parameters
var dem_vis = {bands:"elevation", min:0, max:0.05,
palette:"#0345ff,#000000"};
var cost_vis = {bands:"cumulative_cost", min:0, max:10000,
palette:"ff0303,efff05,4eff05,002bff,ff01f7,000000"}
// PART 1: Cumulative Cost based on source rasters
//import elevation map
var dem = ee.Image('USGS/NED');
// pixel cost calculation
var elThreshold = ee.Number(5); //set elevation threshold
var subDEM = dem.updateMask(dem.lt(elThreshold)); //mask pixel above elevation threshold
var costDEM = (subDEM.add(30)).divide(1000); //calculate the cost of each pixel (height + width pixel (30m)) in km
// Add DEM to the map
Map.addLayer(costDEM, dem_vis, "SRTM");
// Cumulative cost
var calcCumCost = function(img) {
return costDEM.cumulativeCost({
source:img,
maxDistance:1E5});
}; //write a function that performs the cumulative cost calculation for each image given the cost of the pixel
var demCost = ee.ImageCollection(okID.map(calcCumCost)); // calculate cumulative cost for each source raster in the image collection
// PART 2 - Reduce Region: extract cumulative cost for tide gauges
var tideGauges =
ee.FeatureCollection('ft:1e1ik7ZklKbRSRVS50Ml_prHBTZ0WbNgW73fw7Ald');
//import fusion table of tide gauges
// WORK IN PROGRESS
// Empty Collection to fill
var ft = ee.FeatureCollection(ee.List([]));
// function that extract values from cumulative cost rasters and reduce it for points region
var fill = function(img) {
// gets the values for the points in the current img
var ft2 = img.reduceRegions(tideGauges, ee.Reducer.first(),30);
// set ID
var ID = ee.Feature(null, {'id':img.id()});
// writes the ID in each feature
var ft3 = ft2.map(function(f){return f.set("id", ID)});
// merges the FeatureCollections
return ft.merge(ft3);
};
// Apply the function to each image in the ImageCollection
var newft = ee.FeatureCollection(demCost.map(fill));
print(newft, 'Potentially: region-reduced cost');
The maxDistance in your cumulativeCost call (1E5 m) translates into a neighborhood radius of more than 3,000 pixels at 30 m, which means each tile needs to bring in about 44 million neighboring pixels; that is simply too much memory. You will have to lower maxDistance or increase the scale of the reduceRegions call.
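The figure of roughly 44 million pixels can be checked with a back-of-the-envelope calculation. The sketch below is plain JavaScript, not Earth Engine code; the helper name neighborhoodPixels is made up, and it assumes a square window of one pixel plus the radius on each side, which is only an approximation of what the server actually loads.

```javascript
// Estimate how many pixels fall inside a square window whose half-width
// is maxDistance, for a given pixel size (both in meters).
function neighborhoodPixels(maxDistanceMeters, pixelSizeMeters) {
  const radiusPixels = Math.ceil(maxDistanceMeters / pixelSizeMeters);
  const windowWidth = 2 * radiusPixels + 1; // center pixel plus radius on both sides
  return windowWidth * windowWidth;
}

const atFullDistance = neighborhoodPixels(1e5, 30); // settings from the question
const atTenthDistance = neighborhoodPixels(1e4, 30); // maxDistance lowered to 10 km
const atCoarserScale = neighborhoodPixels(1e5, 300); // scale raised to 300 m

console.log(atFullDistance, atTenthDistance, atCoarserScale);
```

Because the pixel count grows with the square of maxDistance divided by the scale, lowering the distance or coarsening the scale by a factor of ten shrinks the neighborhood by roughly a factor of one hundred.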

Highstock columnrange data grouping values not consistent

The highstock column range with dataGrouping enabled seems not to be computing dataAggregation correctly.
The aggregated values seem to change when changing the range.
March 2014 will show different values if scrolling more towards the right.
Code and jsfiddle:
dataGrouping: {
enabled: true,
approximation: function() {
const indices = _.range(this.dataGroupInfo.start, this.dataGroupInfo.start + this.dataGroupInfo.length);
const low = _.min(indices.map(i => this.options.data[i][1]));
const high = _.max(indices.map(i => this.options.data[i][2]));
return [low, high];
},
groupPixelWidth: 50
}
See jsfiddle
The columns change only when the navigator does not start from the beginning, and that happens because of the way you defined the approximation callback.
dataGroupInfo contains information according to the visible points (which fall into x axis range, cropped points) in the chart, not all the points - so to have proper indices for the initial data, you need to add this.cropStart - it is the index from which points are visible.
approximation: function() {
const start = this.cropStart + this.dataGroupInfo.start;
const stop = start + this.dataGroupInfo.length;
const indices = _.range(start, stop);
const low = _.min(indices.map(i => this.options.data[i][1]));
const high = _.max(indices.map(i => this.options.data[i][2]));
return [ low, high ];
},
example: https://jsfiddle.net/12o4e84v/7/
The same functionality can be implemented more easily
approximation: function(low, high) {
return [ _.min(low), _.max(high) ];
}
example: https://jsfiddle.net/12o4e84v/8/
Or even simpler:
approximation: 'range',
However, by default approximation for columns is set to range, so you do not have to do it manually.
example: https://jsfiddle.net/12o4e84v/9/
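To see why the missing offset changes the result, the following plain-JavaScript sketch reproduces the index arithmetic outside of Highstock. The data array and the function rangeOfGroup are hypothetical; cropStart and the group start and length play the same roles as this.cropStart and this.dataGroupInfo in the callback above.

```javascript
// Hypothetical columnrange data: [x, low, high] per point.
const data = [
  [0, 5, 9],
  [1, 3, 8],
  [2, 4, 12],
  [3, 6, 7],
  [4, 1, 10],
];

// Aggregate one group of points into [min low, max high].
// cropStart is the index of the first visible point in the full data;
// groupStart is the group's start index relative to the visible points.
function rangeOfGroup(points, cropStart, groupStart, groupLength) {
  const start = cropStart + groupStart;
  const group = points.slice(start, start + groupLength);
  const low = Math.min(...group.map(p => p[1]));
  const high = Math.max(...group.map(p => p[2]));
  return [low, high];
}

// Same group start and length, different crop offsets.
const ignoringOffset = rangeOfGroup(data, 0, 0, 2); // reads points 0 and 1
const withOffset = rangeOfGroup(data, 2, 0, 2);     // reads points 2 and 3
console.log(ignoringOffset, withOffset); // [3, 9] [4, 12]
```

When the visible range starts at index 2, ignoring the offset aggregates the wrong points, which is the inconsistency seen when scrolling.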

Crossfilter - Double Dimensions (second value linked to daily max)

Quite an oddly specific question here but something I've been having a lot of trouble with over the past day or so. Broadly, I'm trying to calculate the maximum of an array using crossfilter and then use this value to find a maximum.
For example, I have a series of Timestamps with an associated X Value and a Y Value. I want to aggregate the Timestamps by day and find the maximum X Value and then report the Y Value associated with this Timestamp. In essence this is a double dimension as I understand it.
I'm able to do the first stage simply to find the maximum values. But am having a lot of difficulty getting through to the second value.
Working code for the first, (using Crossfilter and Reductio). Assuming that each row has the following four values.
[(Timestamp, Date, XValue, YValue),
(2015-05-15 16:00:00, 2015-05-15, 30, 15),
(2015-05-15 16:45:00, 2015-05-15, 25, 33)
... (many thousand of rows)]
First Dimension
ndx = crossfilter(data);
dailyDimension = ndx.dimension(function(d) { return d.date; });
Get the max of the X Value using reductio
maxXValue = reductio().max(function(d) { return d.XValue;});
XValues = maxXValue(dailyDimension.group())
XValues now contains all of the maximum X Values on a Daily Basis.
I would now like to use these X Values to identify the corresponding Y Values on a date basis.
Using the same data above the appropriate value returned would be:
[(date, YValue),
('2015-05-15', 15)]
// Note, that it is 15 as it is the max X Value we find, not the max Y Value.
In Python/Pandas I would set the index of a DataFrame to X and then do an index match to find the Y Values
(Note, it can safely be assumed that the X Values are unique in this case but in reality we should really identify the Timestamp linked to this period and then match on that as they are strictly guaranteed to be unique, not loosely).
I believe this can be accomplished by modifying the reductio maximum code, which I don't fully understand. The source code is from here:
var reductio_max = {
add: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
remove: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
// Check for undefined.
if(path(p).valueList.length === 0) {
path(p).max = undefined;
return p;
}
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
initial: function (prior, path) {
return function (p) {
p = prior(p);
path(p).max = undefined;
return p;
};
}
};
Perhaps this can be modified so that there is a second valueList of Y Values which maps 1:1 with the X Values associated in the max function. In that case it would be the same index look up of both in the functions and could be assigned simply.
My apologies that I don't have any more working code.
An alternative approach would be to use some form of Filtering Function to remove entries which don't satisfy the X Criteria and then group by day (there should only be one value in this setting so a simple reduceSum for example will still return the correct value).
// Pseudo non working code
dailyDimension.filter(function(p) {return p.XValue === XValues;})
dailyDimension.group().reduceSum(function(d) {return d.YValue;})
Eventual results will be plotted in dc.js
Not sure if this will work, but maybe give it a try:
maxXValue = reductio()
.valueList(function(d) {
return ("0000000000" + d.XValue).slice(-10) + ',' + d.YValue;
})
.aliasProp({
max: function(g) {
return +(g.valueList[g.valueList.length - 1].split(',')[0]);
},
yValue: function(g) {
return +(g.valueList[g.valueList.length - 1].split(',')[1]);
}
});
XValues = maxXValue(dailyDimension.group())
This is kind of a less efficient and less safe re-implementation of the maximum calculation using the aliasProp option, which lets you do pretty much whatever you want to a group on every record addition and removal.
My untested assumption here is that the undocumented valueList function that is used internally in max/min/median will properly order. Might be easier/better to write a Crossfilter maximum aggregation and then modify it to also add the y-value to the group.
If you want to work through this with Reductio, I'm happy to do that with you here, but it will be easier if we have a working example on something like JSFiddle.
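As a rough sketch of the alternative mentioned above (a hand-written maximum aggregation that also keeps the y-value), the three reducer functions below could be passed to a Crossfilter group via reduce(add, remove, initial). The names reduceAdd, reduceRemove, reduceInitial and refresh are chosen for illustration, and the sketch assumes the same record objects are passed on removal as on addition. It keeps every row of the group and rescans on each change, so it favours clarity over speed.

```javascript
// Start of each group: no rows, no maximum yet.
function reduceInitial() {
  return { rows: [], max: undefined, yValue: undefined };
}

// Recompute the row with the largest XValue and remember its YValue.
function refresh(p) {
  let best;
  for (const row of p.rows) {
    if (best === undefined || row.XValue > best.XValue) best = row;
  }
  p.max = best ? best.XValue : undefined;
  p.yValue = best ? best.YValue : undefined;
  return p;
}

// Called when a record enters the group.
function reduceAdd(p, v) {
  p.rows.push(v);
  return refresh(p);
}

// Called when a record is filtered out of the group.
function reduceRemove(p, v) {
  const i = p.rows.indexOf(v);
  if (i !== -1) p.rows.splice(i, 1);
  return refresh(p);
}

// Exercise the reducers by hand with the two sample rows from the question.
const rowA = { date: "2015-05-15", XValue: 30, YValue: 15 };
const rowB = { date: "2015-05-15", XValue: 25, YValue: 33 };
let group = reduceInitial();
group = reduceAdd(group, rowA);
group = reduceAdd(group, rowB);
console.log(group.max, group.yValue); // 30 15
```

With Crossfilter loaded, the intended wiring would be dailyDimension.group().reduce(reduceAdd, reduceRemove, reduceInitial), after which each group's value carries both max and yValue.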
