d3js v5 + Topojson v3 Optimization about joining csv & json - javascript

To build maps, I need to merge some values from a CSV file into a JSON (TopoJSON) file directly in the code.
I load the JSON and CSV files asynchronously with a Promise, then use two nested loops and a common key to add the new properties to the JSON features.
for (var i=0; i< fr[1].length;i++){
  var csvId = fr[1][i].codgeo;
  var csvValue1 = parseFloat(fr[1][i].value1);
  var csvValue0 = parseFloat(fr[1][i].value0);
  for (var j=0; j<topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features.length;j++){
    var jsonId = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.codgeo;
    if (csvId === jsonId) {
      topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value1 = csvValue1;
      topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value0 = csvValue0;
      break;
    }
  }
}
Everything works, but the map takes a long time to show up on the web page.
Is there a way to reduce the loading time of the map?
Here is a sample of my code : https://plnkr.co/edit/ccwIQzlefAbd53qnjCX9?p=preview

I took your plunkr and added some timing points to it, ran it a bunch of times and got some data on where your script takes its time:
Here's a block with the logging.
I am pretty sure the bandwidth where I live is below average and highly variable: the file load time ranged from 500 milliseconds up to 1800 milliseconds for me, while everything else was consistent.
Let's take a closer look at the data manipulation stage, which you include in your question:
for (var i=0; i< fr[1].length;i++){
  var csvId = fr[1][i].codgeo;
  var csvValue1 = parseFloat(fr[1][i].value1);
  var csvValue0 = parseFloat(fr[1][i].value0);
  for (var j=0; j<topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features.length;j++){
    var jsonId = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.codgeo;
    if (csvId === jsonId) {
      topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value1 = csvValue1;
      topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value0 = csvValue0;
      break;
    }
  }
}
The nested for statement runs approximately 5,151 times by my count. The parent for statement runs 101. These shouldn't change as your data is fixed. Why do these cycles take so long? Because you are calling topojson.feature() every for iteration:
If I isolate this line:
topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8)
We can see that this actually takes a few milliseconds alone.
Topojson.feature
Returns the GeoJSON Feature or FeatureCollection for the specified
object in the given topology. If the specified object is a
GeometryCollection, a FeatureCollection is returned, and each geometry
in the collection is mapped to a Feature. Otherwise, a Feature is
returned. The returned feature is a shallow copy of the source object:
they may share identifiers, bounding boxes, properties and
coordinates. (from the docs).
So, every time we use topojson.feature we are essentially converting the topojson to geojson. We don't need to do this inside the for loop. Let's do that once:
var featureCollection = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8);
//Merge csv & json
//Add properties from csv to json
for (var i=0; i< fr[1].length;i++){
  var csvId = fr[1][i].codgeo;
  var csvValue1 = parseFloat(fr[1][i].value1);
  var csvValue0 = parseFloat(fr[1][i].value0);
  for (var j=0; j<featureCollection.features.length;j++){
    var jsonId = featureCollection.features[j].properties.codgeo;
    if (csvId === jsonId) {
      featureCollection.features[j].properties.value1 = csvValue1;
      featureCollection.features[j].properties.value0 = csvValue0;
      break;
    }
  }
}
Of course, we also have to update the rendering code to use the featureCollection variable rather than converting the topojson again.
Let's take a look at timing now:
Here's an updated bl.ock based on the one above, also with timing points.
No, I didn't forget to include a time for manipulation, it just averaged 1.5 milliseconds for me. Yes, the variability in my bandwidth shows - but the time spent on other manipulation should be clearly less regardless of external factors
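If the join itself ever becomes a bottleneck (for example with many more rows than the 101 here), the nested loop can be replaced by a keyed lookup. The following is only a sketch under assumptions: the sample rows and features are made up, and the function name joinCsvToFeatures is hypothetical. It indexes the CSV rows by codgeo in a Map once, then visits each feature a single time.

```javascript
// Made-up stand-ins for the parsed CSV rows (fr[1]) and the converted GeoJSON features
const csvRows = [
  { codgeo: "01", value0: "1.5", value1: "2.5" },
  { codgeo: "63", value0: "3", value1: "4" }
];
const sampleCollection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { codgeo: "63" }, geometry: null },
    { type: "Feature", properties: { codgeo: "01" }, geometry: null },
    { type: "Feature", properties: { codgeo: "99" }, geometry: null }
  ]
};

// Hypothetical helper: copies value0/value1 from the CSV rows onto matching features
function joinCsvToFeatures(rows, features) {
  // Build the lookup once: codgeo -> CSV row
  const rowsById = new Map(rows.map(row => [row.codgeo, row]));
  for (const feature of features) {
    const row = rowsById.get(feature.properties.codgeo);
    if (!row) continue; // no CSV row for this feature, leave it untouched
    feature.properties.value0 = parseFloat(row.value0);
    feature.properties.value1 = parseFloat(row.value1);
  }
  return features;
}

joinCsvToFeatures(csvRows, sampleCollection.features);
```

With the real data, the call would presumably be joinCsvToFeatures(fr[1], featureCollection.features). For roughly a hundred rows the gain over the nested loop is negligible; the single topojson.feature() call is what matters most.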
Further Enhancements
Preprojection of geometry, see this question/answer.
Simplification of geometry, see mapshaper.org (though I believe you have already done this).
Removal of non-necessary attributes from csv or topojson - are you really using the population field in the topojson, do you need both libgeo and libgeo_m in the topojson (eg: "libgeo":"Puy-de-Dôme","libgeo_m":"PUY-DE-DÔME")?

Related

Get visible points for a series in LightningChartJs

Is there a function in LightningChartJs to get all visible points from a line or point series in a chart?
When I zoom the chart, I want to show something if no points are visible. In some cases I have breaks in my data.
For now I have to check the range and filter all points within it, but that does not seem very performant. I guess LC is aware of all the visible points and can give them to me.
I would very much welcome any thoughts on the subject or other solutions. Thanks.
LightningChart JS doesn't track the data points that are visible at any time. So the method that you have used to solve the issue is the best way currently.
Something like this seems to be reasonably performant.
function getDataInRange(data, rangeStart, rangeEnd){
const inRangeData = []
const dataLength = data.length
let curPoint
for(let i = 0; i < dataLength; i += 1){
curPoint = data[i]
if(curPoint.x >= rangeStart && curPoint.x <= rangeEnd){
inRangeData.push(curPoint)
}
}
return inRangeData
}
On my personal machine it can process 1 million points in ~10ms ± 2ms. If you only want to know that a point is visible in the range then you could just break the loop as soon as a single point is in the visible range.
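The early-exit variant mentioned above could look roughly like this. It is a sketch that assumes the same { x, y } point shape as getDataInRange; the name hasPointInRange is made up for illustration.

```javascript
// Hypothetical helper: returns true as soon as one point falls inside [rangeStart, rangeEnd]
function hasPointInRange(data, rangeStart, rangeEnd) {
  for (let i = 0; i < data.length; i += 1) {
    const x = data[i].x;
    if (x >= rangeStart && x <= rangeEnd) {
      return true; // stop scanning at the first visible point
    }
  }
  return false; // scanned everything, nothing is visible
}

// Made-up data with a gap between x = 2 and x = 10
const points = [{ x: 0, y: 1 }, { x: 1, y: 3 }, { x: 2, y: 2 }, { x: 10, y: 5 }];
const gapIsEmpty = !hasPointInRange(points, 3, 9);
```

In the worst case (no visible point) this still scans the whole array, so it only helps when a visible point is usually found early.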
Late to the game but for anybody googling:
If you already have a chart defined and it happens to be named 'chart' (otherwise change chart to your chart's object name), you can track the visible start and end data points like this:
axisX = chart.getDefaultAxisX()
window.axisXScaleChangeToken = axisX.onScaleChange((s, e) => {
window.axisXVisibleDataRangeStart = s
window.axisXVisibleDataRangeEnd = e
})
let visiblePoints = [];
for(let i of cur.data){
if(i[0] > window.axisXVisibleDataRangeStart && i[0] < window.axisXVisibleDataRangeEnd) visiblePoints.push(i)
}
Every time the X axis is scaled, zoomed or moved, axisXVisibleDataRangeStart and axisXVisibleDataRangeEnd change. You then iterate over wherever your data points are stored (cur.data in my case and in the example) and compare: if the timestamp is within the range, push the point to visiblePoints.
(I am using OHLC data where data[0] is the timestamp. Your comparison might be against an array of objects where {x:} is the value you're looking to compare. You get the idea.)
To remove the listener and stop the logging:
axisX.offScaleChange(window.axisXScaleChangeToken)

Algorithm run from within Node HTTP request takes much longer to run

I have a node app which plots data on an x,y dot plot graph. Currently, I make a GET request from the front end, and my back end node server accepts the request, loops through an array of data points, draws a canvas using Node Canvas and streams it back to the front end, where it's displayed as a PNG image.
Complicating things, there can be polygons, so my algorithm calculates whether a point is inside a polygon, using the point in polygon package, and colors that data point differently if it is.
This works fine when there are fewer than 50,000 data points. However, when there are 800,000, the request takes approximately 23 seconds. I have profiled the code, and most of that time is spent looping through all the data points and figuring out where to plot each one on the canvas and in what color (depending on whether it's in one or more polygons). Here's a plunker I made. Basically I do something like this:
for (var i = 0; i < data.length; i++) {
// get raw points
x = data[i][0];
y = data[i][1];
// convert to a point on canvas
pointX = getPointOnCanvas(x);
pointY = getPointOnCanvas(y, 'y');
color = getColorOfCell(pointX, pointY);
color = color;
plotColor.push({
color: color,
pointX: pointX,
pointY : pointY
});
}
// draw the dots down here
The algorithm itself is not the problem. The issue I have is that when the algorithm is run within a HTTP request, it takes a long time to calculate what color a point is: about 16 seconds. But if I do it in Chrome on the front end, it takes just over a second (see the plunker). When I run the algorithm on the command line with Node, it takes less than a second. So the fact that my app runs the algorithm within a HTTP request is slowing it down massively. So, a couple of questions:
Why would this be? Why does running an algorithm from within a HTTP request take so much longer?
What can I do to fix this, if anything? Would it somehow be possible to make a request to start the task, and then notify frontend when finished and retrieve the PNG?
EDIT
I fully tested running the algorithm and creating a PNG through the command line. It's much quicker: less than half a second to work out what color each of the 800k data points should be. I'm thinking of using a socket to make a request to the server and start the task, then have it return the image. I'm baffled, though, as to why the code should take so long when run within a HTTP request...
EDIT
The problem is Mongo and Mongoose. I store the coordinates of each polygon in Mongo. I fetch these coordinates only once, but I then compare them against each x, y point. Somehow, this is what's massively delaying the algorithm. If I close the Mongo document, the algorithm goes from 16 seconds to 1.5 seconds.
Edit
@DevDig pointed out the main problem in the comments section: a Mongoose object comes with lots of getters and setters, which slow it down. Using lean() in the query reduces the algorithm's run time from 16 seconds to 1.5 seconds.
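As a rough sketch of the lean() fix, assuming the polygon coordinates live in a Mongoose model (PolygonModel and fetchPolygonsLean below are hypothetical names, not taken from the original app):

```javascript
// Hypothetical helper: PolygonModel stands in for whatever Mongoose model stores the polygons.
// .lean() makes the query resolve to plain JavaScript objects instead of full Mongoose
// documents, so the hot loop reads ordinary arrays without getter/setter overhead.
function fetchPolygonsLean(PolygonModel) {
  return PolygonModel.find({}).lean().exec(); // returns a Promise of plain objects
}
```

The point-in-polygon loop would then run over the resolved plain objects, for example fetchPolygonsLean(Polygon).then(polygons => { /* plot here */ }).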
Just finished running a version of your code as a nodeJS service. The code is taken from your plunker. Execution time was 171mSec for 100,000 rows of data (the first 10K rows replicated 10 times). Here's what I did:
First, your data.json and gates.json files aren't really JSON files, they are javascript files. I removed the var data/gates = statements from the front and removed the ending semicolon. The issue you're encountering may have to do with how you're reading in your data sets in your app. Since you don't modify gates or data, I read them in as part of the set-up on the server, which is exactly how you are processing them in the browser. If you need to read the files in each time you access the server, then that, of course, will change the timing. That change took the execution time from 171mSec to 515mSec, still nothing near what you're seeing. This is being executed on a MacBook Pro. If needed, I can update timings from a network accessed cloud server.
getting the files:
var fs = require("fs");
var path = require("path");
var data = [];
var allGatesChain;
var events = [];
var x, y, pointX, pointY;
var filename = __dirname + "/data.txt";
data = JSON.parse(fs.readFileSync(filename, "utf-8"));
filename = __dirname + "/gates.json";
var gates = JSON.parse(fs.readFileSync(filename, "utf-8"));
I moved your routines to create allGatesChain and events into the exported function:
allGatesChain = getAllGatesChain();
generateData();
console.log("events is "+events.length+" elements long. events[0] is: "+events[0]);
console.log("data is "+data.length+" elements long. data[0] is "+data[0]);
and then ran your code:
var start, end;
var plotColor = [];
start = new Date().getTime();
for (var i = 0; i < data.length; i++) {
// get raw points
x = data[i][0];
y = data[i][1];
// convert to a point on canvas
pointX = getPointOnCanvas(x);
pointY = getPointOnCanvas(y, 'y');
color = getColorOfCell({
gateChain: allGatesChain,
events: events,
i: i
});
color = color;
plotColor.push({
color: color,
pointX: pointX,
pointY : pointY
});
}
end = new Date().getTime();
var _str = "loop execution took: "+(end-start)+" milliseconds.";
console.log(_str);
res.send(_str);
result was 171mSec.

Is it possible to initialize a bidimensional array in javascript like in java?

I want to represent a board game in javascript and for that I need a bidimensional array.
I have different board sizes, so I need to initialize the array with the board size at the beginning. I am a java programmer, so I know that in java when you want a bidimensional array you just do:
int board_size = 6;
String[][] board = new String[board_size][board_size];
How can I achieve this with javascript? I have searched and found some ways of doing this, but all of them are much less intuitive than this one.
It is not required like in Java or C#. The Javascript arrays grow dynamically, and they are optimized to work that way, so you don't have to set the size of your matrix dimensions upfront.
However, if you insist, you could do something like the following to pre-set the dimensions:
var board = [];
var board_size = 6;
for (var i = 0; i < board_size; i++) {
board[i] = new Array(board_size);
}
So to summarize you just have three options:
Initialization with a literal (like in @Geohut's answer)
Initialization with a loop (like in my example)
Do not initialize upfront, but on-demand, closer to the code that access the dimensions.
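For a more compact form of the loop option, Array.from can build the rows in one expression. This is just a sketch; createBoard and the empty-string fill value are made up for illustration.

```javascript
// Hypothetical helper: builds a size x size board where every cell starts as initialValue
function createBoard(size, initialValue = null) {
  // The callback runs once per row, so each row is its own array.
  // (new Array(size).fill(new Array(size)) would reuse ONE row array for every row.)
  return Array.from({ length: size }, () => new Array(size).fill(initialValue));
}

const sampleBoard = createBoard(6, "");
sampleBoard[2][3] = "X"; // only row 2 changes
```

Filling the cells with a primitive (a string, number or null) is safe; filling them with an object would make every cell share that one object.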
JavaScript is not a static language like Java; it is dynamic. That means you can create the array without presetting its size, but if you want, you can pre-create an array of the size you need.
var items = [[1,2],[3,4],[5,6]];
alert(items[0][0]); // 1
If you need to add to it just do
items.push([7,8]);
That will add the next element.
Code taken from old stack overflow post: How can I create a two dimensional array in JavaScript?
Edited to properly make I in items same as variable declaration.

Presimplify topojson from command line

As far as I understand, topojson.presimplify(JSON) in D3 adds a Z coordinate to each point in the input topojson shape based on its significance, which then allows it to be used for dynamic simplification like in http://bl.ocks.org/mbostock/6245977
The topojson.presimplify() method takes quite a long time to execute on complicated maps, especially in Firefox, where it makes the browser unresponsive for a few seconds.
Can it be baked directly into the topojson file via the command line as it is done with projections:
topojson --projection 'd3.geo.mercator().translate([0,0]).scale(1)' -o cartesian.topo.json spherical.topo.json
I found a workaround for this which is not completely as simple as I wanted but still achieves the same result.
After the topojson.presimplify(data) is called, data already holds the pre simplified geometry with added Z axis values.
Then I convert it to the JSON string and manually copy it to a new file with JSON.stringify(data)
Nevertheless, this conversion to a JSON string has a problem with Infinity values, which often occur for Z and which JSON.stringify converts to null. Also, when there is a value for the Z coordinate it is usually too precise, and writing all the decimal places takes too much space.
For that reason before converting data to a JSON string I trim the numbers:
// Simplifying the map
topojson.presimplify(data);
// Changing Infinity values to 0, limiting decimal points
var arcs = data.arcs;
for(var i1 = arcs.length; i1--;) {
var arc = arcs[i1];
for(var i2 = arc.length; i2--;) {
var v = arc[i2][2];
if(v === Infinity) arc[i2][2] = 0;
else {
arc[i2][2] = Math.round(v * 1e9)/1e9;
}
}
}
This makes Infinity values appear as exactly 0, and other values are rounded to 9 decimal places, which is enough for dynamic simplification to work properly.
Since such a string is too long to print easily for copying into the new json file, it is much easier to store it in the localStorage of the browser:
localStorage.setItem(<object name>, JSON.stringify(data))
Then in Safari or Chrome open the developer console and in the tab Resources -> Local Storage -> <Website URL> the stored object can be found, copied and then pasted into a text editor.
Usually it is pasted as a <key> <value> pair, so one needs to remove the key from the beginning of the pasted string so that it starts with {.
Since Infinity values have been converted to 0, in the dynamic simplification function it should be taken into account so that points with Z = 0 are treated as Z = Infinity and are always plotted with any simplification area:
point: function(x, y, z) {
if (z===0 || z >= simplificationArea) {
this.stream.point(x, y);
}
}
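To see why the z === 0 check is needed, here is a small self-contained round trip. It is a sketch on made-up sample arcs; sanitizeZ and keepPoint are hypothetical names that mirror the clean-up loop and the point filter above.

```javascript
// Made-up sample: one arc whose points carry a presimplified z value
const sampleArcs = [[[0, 0, Infinity], [1, 1, 0.1234567891234], [2, 0, Infinity]]];

// Without any clean-up, JSON.stringify writes Infinity as null
const naiveRoundTrip = JSON.parse(JSON.stringify(sampleArcs));

// Same idea as the clean-up loop: Infinity becomes 0, other values keep 9 decimals
function sanitizeZ(arcs) {
  for (const arc of arcs) {
    for (const point of arc) {
      const z = point[2];
      point[2] = z === Infinity ? 0 : Math.round(z * 1e9) / 1e9;
    }
  }
  return arcs; // modified in place, returned for convenience
}

const cleanRoundTrip = JSON.parse(JSON.stringify(sanitizeZ(sampleArcs)));

// Mirrors the filter: z === 0 marks a point that must always be kept
function keepPoint(z, simplificationArea) {
  return z === 0 || z >= simplificationArea;
}
```

After the naive round trip the end points of the arc have z = null, which the comparison z >= simplificationArea would reject; after the sanitized round trip they have z = 0 and are always kept.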

understanding javascript variable handling and referencing especially canvas ImageArray and array assigning

consider this code:
var deSaturated = deSaturate(greyscaleCtx.getImageData(0, 0, canvasWidth, canvasHeight));
imageData comes from getImageData canvas function.
function deSaturate (imageData) {
var theData = imageData.data;
var dataLength = theData.length;
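var i = dataLength - 4; // index of the red channel of the last pixel
var lightLevel = 0; // must start at 0, otherwise the sum below is NaN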
// Iterate through each pixel, desaturating it
while ( i >= 0) {
// To find the desaturated value, average the brightness of the red, green, and blue values
theData[i] = theData[i+1] = theData[i+2] = (theData[i] + theData[i + 1] + theData[i + 2]) / 3;
// Fully opaque
theData[i+3] = 255;
// returning an average intensity of all pixels. Used for calibrating sensitivity based on room light level.
lightLevel += theData[i]; //combining the light level in the samefunction
i -= 4;
}
imageData.data = theData; //bring back theData into imageData.data - do I really need this?
var r = [lightLevel/dataLength,imageData]
return r;
}
While writing and optimizing this code, I found out that I don't really understand how JS treats, for example, the "theData" variable. Is working with it just a short way to reference imageData.data, in which case I don't need the following line at the end:
imageData.data = theData
But then do I pay for it in degraded performance (a lot of DOM I/O)?
Or does theData = imageData.data actually copy the original array (represented as a Uint8ClampedArray), so that I have to reassign the modified data to imageData.data?
I guess this is basic javascript, but I found contradictory code examples in MDN and other developer resources, and I would really like to understand this properly.
Thanks for the help!
Just ran a quick test:
var idata = ctx.getImageData(0,0,300,300);
var data = idata.data;
for(var i=0;i<data.length;i++){
data[i]=0;
}
ctx.putImageData(idata,0,0);
And that properly blanks out part of the screen as expected. However without putImageData nothing will happen. So changing the data object, whether stored in a different variable or not, will be reflected in that imageData object. However this will not affect the canvas until putImageData has been called.
So, yes, you can remove that final assignment and it will work as desired.
However I will warn that it is not a valid assumption that it is a Uint8ClampedArray. Yes, that is how Chrome handles it (last I checked), and it is indeed what the official specification uses. However some browsers have no notion of Uint8ClampedArray, while still supporting canvas through the now deprecated CanvasPixelArray.
So all you are guaranteed to get is something with an array-like interface. I had to learn this the hard way when I tried to cache interesting features of image data by creating a new Uint8ClampedArray, which failed in some browsers.
See: https://developer.mozilla.org/en-US/docs/DOM/CanvasPixelArray
In javascript, assigning either an array or an object just assigns a reference to that array or object - it does not make a copy of the data. A copy is only made if you physically create a new array and copy the data over or call some function that is designed to do that for you.
So, if imageData.data is an array, then assigning it to theData just makes a shortcut for referring to the same data. It does not make a new copy of the data. Thus, after modifying the data pointed to by theData, you don't have to assign it back to imageData.data because there is only one copy of the data and both theData and imageData.data already point to that same copy.
So, in direct answer to your question, this line is unnecessary:
imageData.data = theData;
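The difference between a reference and a copy can be checked without a canvas. This sketch uses a plain object with a Uint8ClampedArray as a stand-in for a real ImageData (imageDataLike is a made-up name):

```javascript
// Stand-in for an ImageData object: one pixel, RGBA
const imageDataLike = { data: new Uint8ClampedArray([10, 20, 30, 255]) };

// Assignment copies the reference, not the bytes
const theData = imageDataLike.data;
theData[0] = 99; // visible through imageDataLike.data as well

// An explicit copy has to be requested, e.g. by constructing a new typed array
const copy = new Uint8ClampedArray(imageDataLike.data);
copy[1] = 0; // does not touch imageDataLike.data

const sameObject = theData === imageDataLike.data; // both names refer to one array
```

So writing through theData costs nothing extra and needs no assignment back, while a real copy only appears when one is created explicitly.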
