Why does d3.max() output second largest number? - javascript

I am creating a bar chart with d3.js. My datasets produced the wrong max value, so I tested again with the following small set.
name,value
us,1000
china,800
uk,850
spain,700
italy,400
france,400
belgium,300
But when I run my script below, the output is 850, not 1000. What's happening?
csv(filepath).then(data => {
let top = max(data, d => d.value);
console.log(top)
render(data) // refer to formerly created function render()
});

You have to make sure that d.value is a number and not a string. The CSV loader returns every field as a string, so max compares the strings '850' and '1000' alphabetically, and '850' wins because '8' sorts after '1'.
One way is to parseInt each value so the values are compared as numbers and you get the correct max:
csv(filepath).then(data => {
let top = max(data, d => parseInt(d.value, 10));
console.log(top)
render(data) // refer to formerly created function render()
});
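The string-versus-number behaviour can be reproduced without d3. The following is a minimal sketch in plain JavaScript: the rows array is a hand-written stand-in for what a CSV parser would return, and naturalMax is a hypothetical helper that compares values with the > operator, not d3's own implementation.

```javascript
// Stand-in for parsed CSV rows: every field is a string.
const rows = [
  { name: "us", value: "1000" },
  { name: "china", value: "800" },
  { name: "uk", value: "850" },
  { name: "spain", value: "700" },
  { name: "italy", value: "400" },
  { name: "france", value: "400" },
  { name: "belgium", value: "300" }
];

// Hypothetical helper: keeps the largest value under JavaScript's ">" operator.
function naturalMax(items, accessor) {
  let best;
  for (const item of items) {
    const v = accessor(item);
    if (best === undefined || v > best) best = v;
  }
  return best;
}

// Strings are compared character by character, so "850" beats "1000".
const stringMax = naturalMax(rows, d => d.value);

// Unary plus coerces each field to a number before comparing.
const numberMax = naturalMax(rows, d => +d.value);

console.log(stringMax, numberMax);
```

Converting the field once, when the rows are loaded, avoids repeating the conversion in every accessor.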

Related

tensorflow results are weird. How to solve it?

It has two inputs and one output.
Input: [Temperature, Humidity]
Output: [wattage]
I trained the model as follows.
Even after 5 million iterations, it does not work properly.
Did I choose the wrong option?
var input_data = [
[-2.4,2.7,9,14.2,17.1,22.8,281,25.9,22.6,15.6,8.2,0.6],
[58,56,63,54,68,73,71,74,71,70,68,62]
];
var power_data = [239,224,189,189,179,192,243,317,224,190,189,202];
var reason_data = tf.tensor2d(input_data);
var result_data = tf.tensor(power_data);
var X = tf.input({ shape: [2] });
var Y = tf.layers.dense({ units: 1 }).apply(X);
var model = tf.model({ inputs: X, outputs: Y });
var compileParam = { optimizer: tf.train.adam(), loss: tf.losses.meanSquaredError }
model.compile(compileParam);
var fitParam = {
epochs: 500000,
callbacks: {
onEpochEnd: function (epoch, logs) {
console.log('epoch', epoch, logs, "RMSE --> ", Math.sqrt(logs.loss));
}
}
}
model.fit(reason_data, result_data, fitParam).then(function (result) {
var final_result = model.predict(reason_data);
final_result.print();
model.save('file:///path/');
});
The following is the result after 5 million iterations.
It should match power_data, but it does not.
What should I fix?
While there is rarely one simple reason to point to when a model doesn't perform the way you would expect, here are some options to consider:
You don't have enough data points. Twelve is not nearly sufficient to get an accurate result.
You need to normalize the data of the input tensors. Given that your two features [temperature and humidity] have different ranges, they need to be normalized to give them equal opportunity to influence the output. The following is a normalization function you could start with:
function normalize(tensor, min, max) {
const result = tf.tidy(function() {
// Find the minimum value contained in the Tensor.
const MIN_VALUES = min || tf.min(tensor, 0);
// Find the maximum value contained in the Tensor.
const MAX_VALUES = max || tf.max(tensor, 0);
// Now subtract the MIN_VALUES from every value in the Tensor
// And store the results in a new Tensor.
const TENSOR_SUBTRACT_MIN_VALUE = tf.sub(tensor, MIN_VALUES);
// Calculate the range size of possible values.
const RANGE_SIZE = tf.sub(MAX_VALUES, MIN_VALUES);
// Calculate the adjusted values divided by the range size as a new Tensor.
const NORMALIZED_VALUES = tf.div(TENSOR_SUBTRACT_MIN_VALUE, RANGE_SIZE);
// Return the important tensors.
return {NORMALIZED_VALUES, MIN_VALUES, MAX_VALUES};
});
return result;
}
You should try a different optimizer. Adam might be the best choice, but for a linear regression problem such as this, you should also consider Stochastic Gradient Descent (SGD).
Check out this sample code for an example that uses normalization and sgd. I ran your data points through the code (after making the changes to the tensors so they fit your data), and I was able to reduce the loss to less than 40. There is room for improvement, but that's where adding more data points comes in.
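As a rough illustration of the min-max scaling that the normalize function performs, here is a sketch that uses plain JavaScript arrays instead of tensors. It also reshapes the question's data into one [temperature, humidity] row per sample, on the assumption that this is the layout an input declared with shape: [2] expects. The names toRows and minMaxNormalize are hypothetical.

```javascript
// Data exactly as posted in the question: one array per feature.
const inputData = [
  [-2.4, 2.7, 9, 14.2, 17.1, 22.8, 281, 25.9, 22.6, 15.6, 8.2, 0.6], // temperature
  [58, 56, 63, 54, 68, 73, 71, 74, 71, 70, 68, 62]                    // humidity
];

// Turn [feature][sample] into [sample][feature].
function toRows(columns) {
  return columns[0].map((_, i) => columns.map(col => col[i]));
}

// Scale each feature column to the range [0, 1] independently.
function minMaxNormalize(rows) {
  const featureCount = rows[0].length;
  const mins = [];
  const maxs = [];
  for (let j = 0; j < featureCount; j++) {
    const column = rows.map(r => r[j]);
    mins.push(Math.min(...column));
    maxs.push(Math.max(...column));
  }
  const normalized = rows.map(r =>
    r.map((v, j) => (v - mins[j]) / (maxs[j] - mins[j]))
  );
  // Return the bounds too, so new inputs can be scaled the same way later.
  return { normalized, mins, maxs };
}

const samples = toRows(inputData);
const { normalized, mins, maxs } = minMaxNormalize(samples);

console.log(samples.length, samples[0].length);
console.log(mins, maxs);
```

The temperature 281 is kept exactly as posted. If it turns out to be a typo, it would stretch the temperature range and squeeze every other scaled temperature toward 0.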

Picking quartile value on each point

I'm plotting the sentiment value of tweets over the last 10 years.
The csv file has three columns, as shown below.
I plotted each value by date successfully.
However, when I tried to generate an area graph, I ran into a problem: each date has multiple values.
Each data point comes from a single tweet, so one x point ends up with multiple y values.
So I tried to pick a quartile value for each date, or the largest or smallest y value.
For clarity, please see the example below.
January 8 has multiple y values (textblob)
I want to draw an area graph by picking the largest value or the 2nd quartile value at each point.
How do I pick only that point?
I would like to feed the point into the following code as an x/y coordinate for a line or area graph.
function* vlinedrawing(data){
for(let i=0;i<data.length;i++){
if( i%500==0) yield svg.node();
let px = margin+xscale(data[i].date)
let py = height-margin-yscale(data[i].vader)
paths.append('path')
.attr('x',px)
.attr('y',py)
}
yield svg.node()
}
The entire code is in the following link.
https://jsfiddle.net/soonk/uh5djax4/2/
Thank you in advance.
(The reason it is a generator is that I'm going to animate the graph.)
For getting the 2nd quartile you can use d3.quantile like this:
d3.quantile(dataArray, 0.5);
Of course, since the 2nd quartile is the median, you can also just use:
d3.median(dataArray);
But d3.quantile is a bit more versatile: you can just change the p value to get any quartile you want.
Using your data, without parsing the dates (so we can use a Set for unique values), here is a possible solution:
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5) // d3 v5's quantile expects a sorted array
}
});
This is just a quick answer to show you the way. The code is not optimised, because there are several loops within loops; you can try to optimise it.
Here is the demo:
var parser = d3.timeParse("%m/%d/%y");
d3.csv('https://raw.githubusercontent.com/jotnajoa/Javascript/master/tweetdata.csv', row).then(function(data) {
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5) // d3 v5's quantile expects a sorted array
}
});
console.log(aggregatedData)
});
function row(d) {
d.vader = +d.vader;
d.textblob = +d.textblob;
return d;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
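One way to avoid the loops within loops mentioned above is to bucket the values by date in a single pass and then take one quantile per bucket. The sketch below is plain JavaScript with made-up rows; quantile is a hypothetical stand-in written for this example (linear interpolation on a sorted copy), not d3's function, and aggregateByDate is likewise a hypothetical name.

```javascript
// Made-up rows in the same shape as the CSV: a date string and a numeric textblob.
const tweets = [
  { date: "1/8/10", textblob: 0.5 },
  { date: "1/8/10", textblob: -0.25 },
  { date: "1/8/10", textblob: 0.125 },
  { date: "1/9/10", textblob: 0.75 },
  { date: "1/9/10", textblob: 0.25 }
];

// p-quantile with linear interpolation between neighbours.
// Sorts a copy first, so the caller's array can be in any order.
function quantile(values, p) {
  const sorted = values.slice().sort((a, b) => a - b);
  const n = sorted.length;
  if (n === 0) return undefined;
  if (p <= 0 || n < 2) return sorted[0];
  if (p >= 1) return sorted[n - 1];
  const h = (n - 1) * p;
  const lo = Math.floor(h);
  return sorted[lo] + (sorted[lo + 1] - sorted[lo]) * (h - lo);
}

// One pass to bucket values by date, then one quantile call per bucket.
function aggregateByDate(rows, p) {
  const buckets = new Map();
  for (const row of rows) {
    if (!buckets.has(row.date)) buckets.set(row.date, []);
    buckets.get(row.date).push(row.textblob);
  }
  return Array.from(buckets, ([date, values]) => ({
    date,
    textblob: quantile(values, p)
  }));
}

const aggregated = aggregateByDate(tweets, 0.5);
console.log(aggregated);
```

The date keys are left as strings here; the parser from the answer can be applied to each key afterwards. Passing 1 instead of 0.5 gives the largest value per date.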

Crossfilter - Double Dimensions (second value linked to daily max)

This is an oddly specific question, but it is something I've been having a lot of trouble with over the past day or so. Broadly, I'm trying to calculate the maximum of one value per group using crossfilter and then use that maximum to look up an associated second value.
For example, I have a series of Timestamps with an associated X Value and a Y Value. I want to aggregate the Timestamps by day and find the maximum X Value and then report the Y Value associated with this Timestamp. In essence this is a double dimension as I understand it.
I'm able to do the first stage and find the maximum values, but I am having a lot of difficulty getting through to the second value.
Here is working code for the first stage (using Crossfilter and Reductio), assuming that each row has the following four values.
[(Timestamp, Date, XValue, YValue),
(2015-05-15 16:00:00, 2015-05-15, 30, 15),
(2015-05-15 16:45:00, 2015-05-15, 25, 33)
... (many thousand of rows)]
First Dimension
ndx = crossfilter(data);
dailyDimension = ndx.dimension(function(d) { return d.date; });
Get the max of the X Value using reductio
maxXValue = reductio().max(function(d) { return d.XValue;});
XValues = maxXValue(dailyDimension.group())
XValues now contains all of the maximum X Values on a Daily Basis.
I would now like to use these X Values to identify the corresponding Y Values on a date basis.
Using the same data above the appropriate value returned would be:
[(date, YValue),
('2015-05-15', 15)]
// Note, that it is 15 as it is the max X Value we find, not the max Y Value.
In Python/Pandas I would set the index of a DataFrame to X and then do an index match to find the Y Values
(Note, it can safely be assumed that the X Values are unique in this case but in reality we should really identify the Timestamp linked to this period and then match on that as they are strictly guaranteed to be unique, not loosely).
I believe this can be accomplished by modifying the reductio maximum code, which I don't fully understand. The source code is from here:
var reductio_max = {
add: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
remove: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
// Check for undefined.
if(path(p).valueList.length === 0) {
path(p).max = undefined;
return p;
}
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
initial: function (prior, path) {
return function (p) {
p = prior(p);
path(p).max = undefined;
return p;
};
}
};
Perhaps this can be modified so that there is a second valueList of Y Values which maps 1:1 with the X Values associated in the max function. In that case it would be the same index look up of both in the functions and could be assigned simply.
My apologies that I don't have any more working code.
An alternative approach would be to use some form of Filtering Function to remove entries which don't satisfy the X Criteria and then group by day (there should only be one value in this setting so a simple reduceSum for example will still return the correct value).
// Pseudo non working code
dailyDimension.filter(function(p) {return p.XValue === XValues;})
dailyDimension.group().reduceSum(function(d) {return d.YValue;})
Eventual results will be plotted in dc.js
Not sure if this will work, but maybe give it a try:
maxXValue = reductio()
.valueList(function(d) {
return ("0000000000" + d.XValue).slice(-10) + ',' + d.YValue;
})
.aliasProp({
max: function(g) {
return +(g.valueList[g.valueList.length - 1].split(',')[0]);
},
yValue: function(g) {
return +(g.valueList[g.valueList.length - 1].split(',')[1]);
}
});
XValues = maxXValue(dailyDimension.group())
This is kind of a less efficient and less safe re-implementation of the maximum calculation using the aliasProp option, which lets you do pretty much whatever you want to a group on every record addition and removal.
My untested assumption here is that the undocumented valueList function that is used internally in max/min/median will keep the values properly ordered. It might be easier/better to write a Crossfilter maximum aggregation and then modify it to also add the y-value to the group.
If you want to work through this with Reductio, I'm happy to do that with you here, but it will be easier if we have a working example on something like JSFiddle.
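The second suggestion above, a hand-written maximum aggregation that also carries the y-value, could look roughly like the sketch below. The three functions follow the add/remove/initial shape that Crossfilter's group.reduce() takes, but they are exercised here by hand on the two sample rows rather than through Crossfilter. The names reduceAdd, reduceRemove, reduceInitial, refresh, maxX and yAtMaxX are hypothetical, and the removal step assumes the same record object is passed to add and remove.

```javascript
// Starting state for each group (one group per day).
function reduceInitial() {
  return { rows: [], maxX: undefined, yAtMaxX: undefined };
}

// Recompute the reported values from the row with the largest XValue.
function refresh(p) {
  const top = p.rows[p.rows.length - 1];
  p.maxX = top ? top.XValue : undefined;
  p.yAtMaxX = top ? top.YValue : undefined;
  return p;
}

// Called when a record enters the group: keep rows ordered by XValue.
function reduceAdd(p, v) {
  p.rows.push(v);
  p.rows.sort((a, b) => a.XValue - b.XValue);
  return refresh(p);
}

// Called when a record is filtered out of the group.
function reduceRemove(p, v) {
  const i = p.rows.indexOf(v);
  if (i !== -1) p.rows.splice(i, 1);
  return refresh(p);
}

// Simulate a single day's group with the sample rows from the question.
const day = [
  { Timestamp: "2015-05-15 16:00:00", date: "2015-05-15", XValue: 30, YValue: 15 },
  { Timestamp: "2015-05-15 16:45:00", date: "2015-05-15", XValue: 25, YValue: 33 }
];

const group = day.reduce((p, v) => reduceAdd(p, v), reduceInitial());
const beforeRemoval = { maxX: group.maxX, yAtMaxX: group.yAtMaxX };

// Removing the row with the largest X promotes the next one.
reduceRemove(group, day[0]);
const afterRemoval = { maxX: group.maxX, yAtMaxX: group.yAtMaxX };

console.log(beforeRemoval, afterRemoval);

// With Crossfilter loaded, the wiring would presumably be:
// dailyDimension.group().reduce(reduceAdd, reduceRemove, reduceInitial);
```

Sorting on every add keeps the sketch short; for many thousands of rows per day, inserting at the right position would be cheaper.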

Initial Range selection in DC.js chart

I would like to make an initial range selection in some dc.js charts (bar and line).
So I add this for example:
.filter([7,10])
The range appears correctly on the chart, but apparently 0 observations are selected.
I expected a few thousand observations to be selected, as happens when I select the range [7,10] manually with the brush.
Any hint on what I'm missing here?
Part of my code:
var chart_globalscore = dc.barChart('#chart_globalscore');
(...)
var ndx = crossfilter(data_movies)
,all = ndx.groupAll()
(...)
,GlobalScoreDimension = ndx.dimension(function(d) { if ( !isNaN(d.GlobalScore) ) {return Math.round(d.GlobalScore*10)/10 ;} else {return -1;} })
,GlobalScoreGroup = GlobalScoreDimension.group()
(...)
;
(...)
chart_globalscore
.width(width001)
.height(height001)
.margins(margins)
.dimension(GlobalScoreDimension)
.group(GlobalScoreGroup)
.round(function(val){return Math.round(val*10)/10;})
.x(d3.scale.linear().domain([0, 10.1]))
.filter([7,10])
.centerBar(false)
.transitionDuration(transitionDuration)
.elasticY(true)
.gap(1)
.xUnits(function(){return 100;})
.renderHorizontalGridLines(true)
.yAxis().ticks(2)
;
The filter code is a bit tricky in dc.js. If you specify an array of values, it will not interpret the array as a range. (It will either interpret the array as a single value, or if the array contains another array, it will filter on the values inside that array.)
Try specifying a ranged filter object instead:
.filter(dc.filters.RangedFilter(7, 10))
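To make the difference concrete, here is a toy model in plain JavaScript of the two interpretations described above. It does not call dc.js: matchesExact and makeRangedFilter are hypothetical stand-ins, and the half-open interval (lower bound included, upper bound excluded) is an assumption about how a ranged filter treats its bounds.

```javascript
// A few made-up scores, including the -1 placeholder used for missing values.
const scores = [6.5, 7, 8.2, 9.9, 10, -1];

// "Single value" interpretation: the whole array is one value to match exactly.
// No numeric score is ever identical to an array, so nothing is selected.
function matchesExact(filterValue, score) {
  return score === filterValue;
}

// Hypothetical ranged filter: selects low <= score < high.
function makeRangedFilter(low, high) {
  return score => score >= low && score < high;
}

const exactCount = scores.filter(s => matchesExact([7, 10], s)).length;
const inRange = scores.filter(makeRangedFilter(7, 10));

console.log(exactCount); // how many scores the plain array selects
console.log(inRange);    // which scores the range selects
```

Under that half-open assumption, a score of exactly 10 falls outside the range, so the upper bound would need to sit slightly above 10 to include it.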

updating a line graph in d3 is not working

I am trying to update a line graph. It is not throwing any error, but it is also not updating the graph.
I am deleting a point and adding a new one with an incremented rate and a created_at date incremented by a second (trying to follow http://bl.ocks.org/benjchristensen/1148374)
function redrawWithoutAnimation() {
for (var i in chart_data) {
linedata = chart_data[i];
//delete first element of array
linedata.points.reverse().shift();
//create a new point
rate = linedata.points[0].rate + 1;
created_at = linedata.points[0].created_at + 6000;
new_point = {};
new_point.rate = rate;
new_point.created_at = created_at;
linedata.points.push(new_point);
console.log(linedata);
}
// static update without animation
svg.selectAll("path")
.data([linedata.points]); // set the new data
line(linedata.points); // apply the new data values
}
redrawWithoutAnimation();
setInterval(function () {
redrawWithoutAnimation();
}, 8000);
here is my code
http://jsfiddle.net/yr2Nw/8/
Working fiddle: http://jsfiddle.net/reblace/GsaGb/1
There are a few issues here.
First, you were updating all the chart_data in the for loop, but outside the loop, you were only trying to update the line still stored in the linedata variable after loop execution. You should try to avoid having variables with greater scope than they need. It can lead to bugs like this one:
svg.selectAll("path").data([linedata.points]);
line(linedata.points);
You should instead use D3's data joining to rejoin the new data to all the paths at once declaratively like so:
linesGroup.selectAll("path")
.data(chart_data)
.attr("d", function(d){ return line(d.points); });
That code selects the paths, joins each of them to the corresponding chart_data element, and then sets each path's "d" attribute to the output of the line generator for that element's points.
Then, you need to update your x axis and y axis otherwise the plot will just shoot off the drawn area. This code is updating the domains and then rebinding the axes to the dom elements so they redraw:
xAxis.scale().domain([
d3.min(chart_data, function (c) { return d3.min(c.points, function (v) { return v.created_at; }); }),
d3.max(chart_data, function (c) { return d3.max(c.points, function (v) { return v.created_at; }); })
]);
yAxis.scale().domain([
0,
d3.max(chart_data, function (c) { return d3.max(c.points, function (v) { return v.rate; }); })
]);
svg.select(".x.axis").call(xAxis);
svg.select(".y.axis").call(yAxis);
There were a few other bugs that I fixed in the Fiddle. For example, you need to calculate the time for the new point based on the last element in the array, not the first; otherwise the line can't interpolate properly, since it's no longer a continuous function. This is also a more concise way to do your line updates:
for (var i=0; i<chart_data.length; i++) {
linedata = chart_data[i];
//delete first element of array
var removedPoint = linedata.points.shift();
//create a new point
var lastpoint = linedata.points[linedata.points.length-1];
var new_point = {
rate: removedPoint.rate,
created_at: lastpoint.created_at + 6000
};
linedata.points.push(new_point);
}
Also note that you shouldn't use the for(var in) loop for Arrays, that's for iterating over the properties in an object.
There are still some issues, but I think this should help get you over the hurdle you were stuck on. Anyways, it looks cool in action!
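The data side of the update (slide each line's window forward, then recompute the axis domains across all lines) can be tried without a browser. This is a minimal sketch in plain JavaScript with made-up points; slideWindow, extentOf, chartData and STEP_MS are hypothetical names, and no d3 calls are made.

```javascript
// Made-up chart data in the shape used above: one object per line, each with points.
const chartData = [
  { points: [{ rate: 1, created_at: 0 }, { rate: 2, created_at: 6000 }, { rate: 3, created_at: 12000 }] },
  { points: [{ rate: 5, created_at: 0 }, { rate: 4, created_at: 6000 }, { rate: 6, created_at: 12000 }] }
];
const STEP_MS = 6000;

// Drop the oldest point of every line and append a new one after the newest.
function slideWindow(lines, stepMs) {
  for (const line of lines) {
    const removed = line.points.shift();
    const last = line.points[line.points.length - 1];
    // Time continues from the last point, so x stays strictly increasing.
    line.points.push({ rate: removed.rate, created_at: last.created_at + stepMs });
  }
}

// Smallest and largest accessor value across every point of every line.
function extentOf(lines, accessor) {
  let lo = Infinity;
  let hi = -Infinity;
  for (const line of lines) {
    for (const point of line.points) {
      const v = accessor(point);
      if (v < lo) lo = v;
      if (v > hi) hi = v;
    }
  }
  return [lo, hi];
}

slideWindow(chartData, STEP_MS);

// Domains to hand to the x and y scales before redrawing the axes.
const xDomain = extentOf(chartData, p => p.created_at);
const yDomain = [0, extentOf(chartData, p => p.rate)[1]];

console.log(xDomain, yDomain);
```

Because the loop visits every line, no single line's data is left in an outer variable, which was the source of the original bug.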
Fine, fenac. You are facing so many problems because your data is not in a good format for your requirements.
As per http://bl.ocks.org/benjchristensen/1148374, the x-axis data must be a plain array (data[]).
Your data is something like this:
[object, object, object], where each object holds one x-axis value, so the pushing and shifting is not possible.
Try changing the format of the data (linedata.points) to an array (data[]) and try it out; it should work.
You just need to put all the values in linedata.points into an array data[] and use this data[] to animate your line.
Since yours is a multiline chart, you need to create a 2D array and pass its rows accordingly.
Cheers.
I updated your jsfiddle
setInterval(function () {
console.log(linedata.points);
var v = linedata.points.shift(); // remove the first element of the array
linedata.points.push(v); // add a new element to the array (we're just taking the number we just shifted off the front and appending to the end)
redrawWithoutAnimation();
}, 3000);
http://jsfiddle.net/yr2Nw/9/
But it still won't work until you make that change.
Personal suggestion: first try with a single-line graph, then move on to looping for the multiline case.
