Big data amounts with Highcharts / Highstock (async loading) - javascript

Since my data volume grows every day (currently > 200k MySQL rows per week), the chart has become very slow to load. I guess async loading is the right way to go here (http://www.highcharts.com/stock/demo/lazy-loading).
I tried to implement it, but it doesn't work. So far I can serve my data from Python via URL parameters, e.g. http://www.url.de/data?start=1482848100&end=1483107000, but there are several things I don't understand in the example code:
1) If a period of "All" data is chosen in the Navigator, then all the data is served by my server and loaded by the chart. That is exactly what I do right now without lazy loading, so what is the difference?
2) Why is there a second getJSON() call without any URL parameters in the example code? It requests the following URL, which appears empty to me. What do I need it for?
https://www.highcharts.com/samples/data/from-sql.php?callback=?
3) And which method of loading the data is better? This one:
chart.series[0].setData(data);
or the code below, which I use so far?
var ohlc = [],
    volume = [],            // volume points would be collected here as well
    dataLength = data.length,
    i;

for (i = 0; i < dataLength; i += 1) {
    ohlc.push([
        data[i]['0'],   // date
        data[i]['1_x'], // open
        data[i]['2_x'], // high
        data[i]['3'],   // low
        data[i]['4']    // close
    ]);
}

The idea behind the lazy loading demo is that you fetch only as many points as necessary; even if the full dataset contains 1.7 million points, you never load that many into the chart.
Based on the Highcharts demo: instead of loading too many points, you request points that are already grouped. You have 1.7 million daily points; when you set the navigator to "All" (time range 1998-2011) you don't need daily data, so the response contains monthly points instead. The gains are: a much smaller payload (12 * 14 = 168 points instead of 1.7 million) and no heavy client-side work (processing, grouping, etc.), which means lower memory and CPU usage on the client and faster chart loading.
The request is made in JSONP format (more information about its advantages here). So the URL actually has 3 parameters: the mandatory callback=? plus the optional start=?&end=?, which indicate the requested time range and thereby the point density. The first request has no start/end parameters because the server falls back to default values. After the navigator is moved, more detailed points are requested and loaded into the chart. This is the downside of lazy loading: every navigator move triggers a new data request, so you get frequent requests and possible interruptions from network failures.
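To make the flow concrete, here is a minimal sketch of the pattern the demo uses, adapted to the URL from the question. The jQuery usage and the assumption that your Python endpoint takes Unix seconds (as your sample URL suggests) and returns JSONP when callback=? is present are mine, not confirmed by your setup:

function afterSetExtremes(e) {
    var chart = Highcharts.charts[0];
    chart.showLoading('Loading data from server...');
    // e.min / e.max are in milliseconds on a datetime axis;
    // convert to seconds to match the endpoint from the question.
    $.getJSON('http://www.url.de/data?callback=?' +
              '&start=' + Math.round(e.min / 1000) +
              '&end=' + Math.round(e.max / 1000), function (data) {
        chart.series[0].setData(data); // server returns pre-grouped points
        chart.hideLoading();
    });
}

// In the chart config, re-request data whenever the visible range changes:
// xAxis: { events: { afterSetExtremes: afterSetExtremes } },
// navigator: { adaptToUpdatedData: false, series: { data: overviewData } }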
The answer to your last question depends on whether your data is already in the proper format. If it is, you can skip looping over it on the client side and load it into the chart directly. If the format is not correct, you have to preprocess the data so the chart can visualize it correctly. Ideally the data is already in the right format when you receive it, so if you can, do the formatting on the server side.
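For example, if the server already emits rows in the shape Highstock expects, no client-side loop is needed (the sample values below are illustrative):

// Server returns e.g. [[1482848100000, 225.0, 226.1, 224.3, 225.8], ...]
// i.e. [timestamp_ms, open, high, low, close] per row: load it directly.
chart.series[0].setData(data);

// Otherwise, reshape on the client first (as in the loop above),
// or better, change the Python side to emit this format once.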


change the domain of my line chart at runtime

In my code I am loading a JSON file with more than 900 records. The records represent readings emitted by some machines; I'm drawing a line chart, and the keys in the JSON are the machine names.
This is the structure of my JSON:
{"AF3":3605.1496928113393,"AF4":-6000.4375230516,"F3":1700.3827875419374,"F4":4822.544985821321,"F7":4903.330735023786,"F8":824.4048714773611,"FC5":3259.4071092472655,"FC6":4248.067359141752,"O1":3714.5106599153364,"O2":697.2904723891061,"P7":522.7300768483767,"P8":4050.79490288753,"T7":2939.896657485737,"T8":9.551935316881588}
Each line represents one machine (I put a space between them to see each machine separately). I am currently reading the data with the help of a counter called cont. All the values in the JSON lie between 0 and 5000, but I have modified some objects of the JSON to force a domain change; the new domain should then apply to all the lines.
For example, on line 106 of the JSON I set "AF3":7000 (in this case the domain should become [0, 7000] for all the lines).
On line 300 I set "AF4":-1000 (in this case the domain should become [-1000, 7000] for all the lines).
I modified this data on purpose to trigger the change. I would like all lines to be updated to this new domain, if possible with an animation.
How can I do it?
this is my code:
http://plnkr.co/edit/KVVyOYZ4CVjxeei7pd9H?p=preview
To update the domain across all line charts, we need to recalculate the domain before new data gets pushed in.
Plunker: http://plnkr.co/edit/AHWVM3HT7TDAiINFRlN9?p=preview
// Extend, never shrink: merge the extent of the incoming values
// with the current domain so the axis only ever grows.
var newDomain = d3.extent(ids.map(function(d) {
    return aData[cont][d];
}));
var oldDomain = y.domain();
newDomain[0] = Math.min(newDomain[0], oldDomain[0]);
newDomain[1] = Math.max(newDomain[1], oldDomain[1]);
y.domain(newDomain);
domain.text(y.domain()); // display the current domain on the page
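Since the question asks for an animated update, the rescale can be wrapped in a transition. This is a sketch assuming the usual d3 v3 setup (an axis generator yAxis, an axis group classed .y.axis, and a line generator line; those names come from convention, not from the plunker):

// Hypothetical: animate the axis and the existing paths into the
// new domain after y.domain(newDomain) has been set.
svg.select(".y.axis")
    .transition().duration(500)
    .call(yAxis);

svg.selectAll(".line")
    .transition().duration(500)
    .attr("d", line); // re-run the line generator with the rescaled y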
With respect to the graph getting trimmed: all the work (in your case, 14 arrays, a push and shift operation on each, plus the D3 transition) has to finish within 1 ms, which may not be enough. Unfortunately I don't have any resource to back this up. In case anyone can edit this answer to provide proof, please feel free.

Cesium large number of entity updates

I am working on a project dealing with sensor data. In my backend everything is stored in a database, which is polled by a controller and converted into KML for display on the Cesium globe. This poll happens every 5-10 seconds and returns around 4000-8000 objects (we store up to 1 minute worth of data, so we are looking at somewhere around 20k-50k points). On top of this I have an update function, run every 5 seconds, which slowly fades the markers out.
To load the kml on the map I use the following function:
var dataSource = new Cesium.KmlDataSource();
dataSource.load('link').then(function(value) {
    viewer.dataSources.add(dataSource);
});
In the update-color function I iterate over all of the objects within the data source's entity collection and update them like so (this is very inefficient):
var colorUpdate = Cesium.Color.fromAlpha(newColor, .4);
dataSource.entities.values[i].billboard.color = colorUpdate;
When I do an add or a color update I see a large amount of lag, generally a freeze of a few seconds; is there anything you would suggest to fix this? After 60 seconds of the data being on the map it gets removed like so (just a different if-case within the color-update loop):
dataSource.entities.remove(dataSource.entities.values[i]);
Is there potentially a way to set a property on an entire entity collection, so that when the collection becomes 30 seconds old its color updates to a new one? It seems I just need a way to set a property for the entire collection instead of individual entities. Does anyone know how to do that, or have a suggestion for something better?
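One thing worth trying (a sketch, not a confirmed fix): Cesium's EntityCollection can batch changes with suspendEvents()/resumeEvents(), so listeners are notified about one bulk change instead of thousands of individual ones. The age-check helper below is hypothetical:

// Batch the per-entity writes so change events fire once, not 8000 times.
var entities = dataSource.entities;
var faded = Cesium.Color.fromAlpha(newColor, 0.4); // reuse one Color object

entities.suspendEvents();
for (var i = 0; i < entities.values.length; i++) {
    var entity = entities.values[i];
    if (isOlderThan(entity, 60)) {          // hypothetical age check
        entities.remove(entity);
        i--;                                // values shrinks on remove
    } else if (isOlderThan(entity, 30)) {
        entity.billboard.color = faded;
    }
}
entities.resumeEvents();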

Maximum size of a Google Apps Script Array

One of my scripts is a leave approval system.
It reads a spreadsheet of all leave requests ever submitted, loading all data into an array.
This array is then processed and displayed in a dynamic grid.
The way this is designed, all leave requests need to be in a single sheet. Even once requests are approved, employees can view their current and past requests through this script.
Over time this will grow into thousands of lines. Each line is ~140 bytes.
I can't find any reference to a maximum array size in Apps Script.
I suppose I may hit execution time limits before I exceed the size of the structure anyway!
Does anyone know if there is a limit, and what it is?
Tony
var DataSource = SpreadsheetApp.openById("0AgHhFhurd2nCdFV4dmdRS3...."); // ID truncated
var DataSheet = DataSource.setActiveSheet(DataSource.getSheets()[0]);
var numRows = DataSheet.getLastRow() - 1; // -1 to omit header row
var LeaveData = DataSheet.getRange(2, 1, numRows, 16).getValues(); // 16 columns per request
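For what it's worth, if the single getValues() call ever becomes a bottleneck, the read can be done in slices. This is a sketch using the same sheet objects as above, with an arbitrary batch size:

// Hypothetical batched read: pull 1000 rows at a time so no single
// getValues() call has to materialize the whole sheet at once.
var BATCH = 1000;
var lastRow = DataSheet.getLastRow();
var LeaveData = [];
for (var row = 2; row <= lastRow; row += BATCH) {
  var n = Math.min(BATCH, lastRow - row + 1);
  LeaveData = LeaveData.concat(DataSheet.getRange(row, 1, n, 16).getValues());
}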

d3 — Progressively draw a large dataset

I'm using d3.js to plot the contents of an 80,000 row .tsv onto a chart.
The problem I'm having is that with so much data, the page becomes unresponsive for approximately 5 seconds while the entire dataset is churned through at once.
Is there an easy way to process the data progressively, spreading the work over a longer period of time?
Ideally the page would remain responsive, and the data would be plotted as it became available, instead of in one big hit at the end.
I think you'll have to chunk your data and display it in groups using setInterval or setTimeout. This gives the UI some breathing room to do its work in between.
The basic approach is:
1) chunk the data set
2) render each chunk separately
3) keep track of each rendered group
Here's an example:
var dataPool = chunkArray(data, 100); // split the data into chunks of 100 points
var poolPosition = 0;

function updateVisualization() {
    // each chunk gets its own group and its own data join
    canvas.append("g").selectAll("circle")
        .data(dataPool[poolPosition])
        .enter()
        .append("circle")
        /* ... presentation stuff .... */;

    poolPosition += 1;
    if (poolPosition >= dataPool.length) {
        clearInterval(iterator); // stop once every chunk is rendered
    }
}

var iterator = setInterval(updateVisualization, 100);
You can see a demo fiddle of this -- done before I had coffee -- here:
http://jsfiddle.net/thudfactor/R42uQ/
Note that I'm making a new group, with its own data join, for each array chunk. If you keep adding to the same data join over time (data(oldData.concat(nextChunk))), the entire data set still gets processed and compared even if you're only using the enter() selection, so it doesn't take long for things to start crawling.
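chunkArray() is not a d3 built-in; the fiddle defines its own. A minimal version might look like this:

// Split an array into chunks of (at most) `size` elements.
function chunkArray(array, size) {
    var chunks = [];
    for (var i = 0; i < array.length; i += size) {
        chunks.push(array.slice(i, i + size));
    }
    return chunks;
}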

Charting thousands of points with dojo

I need to plot thousands of points, perhaps close to 50,000 with the dojo charting library. It works, but it's definitely very slow and lags the browser. Is there any way I can get better performance?
EDIT:
I solved it by applying a render filter to the data. Essentially, each item has a new parameter called "render", which my JSON source sets to false if the point is expected to overlap others. My DataSeries then queries for all points where render:true. This way all of the data is still there for non-visual consumers that want every point, while my charts now run smoothly.
Pseudocode:
def is_overlapped(x, y, x_round, y_round)
  rounded_x = round(x, x_round)
  rounded_y = round(y, y_round)
  hash = hash_xy(rounded_x, rounded_y)
  if @overlap_filter[hash].nil?
    @overlap_filter[hash] = true
    return false # first point in this cell: render it
  end
  return true # cell already occupied: skip this point
end
x_round and y_round can be determined from the x and y ranges, for example range / 100.
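Translated to the browser side, the same filter could be written in JavaScript along these lines (round-to-grid with a plain object as the hash; the names are illustrative):

// Keep a point only if its grid cell (x, y rounded to a step) is empty.
var overlapFilter = {};

function isOverlapped(x, y, xStep, yStep) {
    var key = Math.round(x / xStep) + ':' + Math.round(y / yStep);
    if (!overlapFilter[key]) {
        overlapFilter[key] = true; // first point in this cell: render it
        return false;
    }
    return true; // cell already occupied: skip
}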
I know this probably isn't exactly the answer you're looking for, but have you considered simply reducing the number of points you are plotting? I don't know the specific function of the graph(s), but I'd imagine most graphs with that many points don't need them all; no observer is going to take in that level of detail.
Your solution could lie with graphing technique rather than JavaScript. E.g. you could most likely vastly reduce the number of points and use a line graph instead of a scatter plot while still communicating a similar amount of information to your intended audience.
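As a crude example of that reduction, keeping every Nth point already cuts a 50,000-point series down to something a browser can draw comfortably (the target of ~500 points is arbitrary):

// Naive downsampling: keep every Nth point so roughly 500 remain.
var step = Math.ceil(points.length / 500);
var reduced = points.filter(function (p, i) {
    return i % step === 0;
});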
