How many objects is too many to process in javascript?
I've got a mapping application (Google Maps API) that has 100,000 lat/long markers (with names). It's really simple data, but it bogs down the page when loading.
Are there ways to deal with this? I'm not sure if the problem is that I'm loading too many objects, or if I just need to store/access the data in a less intensive way.
I see every day that 100,000 polygons, each with a significant number of coordinates (30 to 500 lat/lng pairs), cause a general slowdown of 3 to 5 seconds on a machine of decent performance. You can make the application more responsive by subdividing the population of the map into a series of AJAX calls, each returning a sorted portion of the data. This is often not easy from the application's point of view, but where it is possible it gives a clear performance improvement, thanks also to the asynchronous handling of the data population as it is rendered on the map.
Best way to know what's going wrong is to profile the CPU and the memory. It could just be too much data, given it's 100,000 objects - even with only a few properties on each, the memory adds up.
It's also possible that it just has trouble rendering that many points on the map. Depending on the business logic of your application, you can add something like a search or default filters to reduce the number of markers that need to be shown.
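For illustration, here is a minimal sketch of the filter-then-create idea, assuming `map` is your google.maps.Map, `rows` is the loaded array of { name, lat, lng } records, and `searchTerm` comes from whatever search box you add:

```javascript
// Only create markers for rows that pass the filter, instead of all 100,000.
function showFilteredMarkers(rows, predicate) {
  const markers = [];
  for (const row of rows) {
    if (!predicate(row)) continue;              // skip rows the filter rejects
    markers.push(new google.maps.Marker({
      position: { lat: row.lat, lng: row.lng },
      title: row.name,
      map: map,                                 // only matching rows hit the map
    }));
  }
  return markers;                               // keep references for later cleanup
}

// e.g. only markers whose name matches the search box
const visibleMarkers = showFilteredMarkers(rows, r => r.name.includes(searchTerm));
```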
Related
I am visualizing some government data using the Google Maps JS API. Currently, every time the user changes a filter value it grabs the entire JSON data again, filters it, and creates a marker for each row that passes the filter validation. This is slow because it re-downloads the JSON every time you change anything in the form the filters are located in.
There are two ways to approach caching and displaying the data dynamically: storing the received JSON once and destroying/recreating markers based on the filter, or creating all the markers at once and only displaying those that match the filters.
Is there a performance difference between these two options? Both make sense to me, I'm just not sure how to tell whether one is better than the other. How can I assess how 'heavy' google maps markers are for the user?
The two suggested approaches are definitely going to be faster than the original strategy, where the JSON data is re-fetched on each filter change.
I guess there are advantages and disadvantages to each method.
If you are not going to retrieve the JSON data on each filter change, then the data could essentially be out of date; but if the use case is that the JSON data rarely gets updated, this consideration can be dropped.
Having the JSON data cached and creating all of the markers upfront will make the map take a bit longer than usual to load at the start, since you need to create every marker first, whereas with the other approach you only create a subset of the markers, which is quicker.
I guess it all comes down to how many markers there are and what the typical usage pattern of the map is.
If there are a million markers and the typical filter change would cause 100,000 markers to be regenerated, then you are better off generating the markers upfront and just tweaking their visibility accordingly.
Similarly, if you have a million markers and the typical filter would only cause 1 or 2 of them to appear, then destroying and recreating would probably be faster.
Anyway, as a user I would rather have the map take a bit longer to load at the start, sacrificing maybe 1-2 seconds, and then have the markers change instantaneously when I'm playing with the filters. Hope this helps.
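For reference, a minimal sketch of both strategies, assuming `rows` is the cached JSON array of { name, lat, lng } records, `allMarkers` is an array of google.maps.Marker objects built from it in the same order, and `matches(row)` is the active filter:

```javascript
// Strategy A: create every marker once, then only toggle visibility on filter change.
function applyFilterByVisibility(allMarkers, rows, matches) {
  allMarkers.forEach((marker, i) => marker.setVisible(matches(rows[i])));
}

// Strategy B: destroy the current markers and recreate only those that pass the filter.
let currentMarkers = [];
function applyFilterByRecreation(map, rows, matches) {
  currentMarkers.forEach(m => m.setMap(null));   // detach old markers so they can be GC'd
  currentMarkers = rows
    .filter(matches)
    .map(row => new google.maps.Marker({
      position: { lat: row.lat, lng: row.lng },
      title: row.name,
      map,
    }));
}
```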
I'm building a visualization for some industrial devices that produce large amounts of time-based data like temperatures, currents or voltages. All data is constantly written to a SQL Server database (can't control that part).
The HTML5 frontend consists of an interactive zoomable chart I made with d3.js. Data series can be added (loaded) to the chart on demand, in which case the frontend sends an AJAX request; ASP.NET MVC with EF6 fetches the values from the DB and returns them as JSON.
Each data element simply consists of a DateTime and a value. Please note that the values do not get written at regular intervals (like every 2 seconds or so), but at irregular intervals. This is because the device doesn't get polled regularly but sends data on specific events, like a rise/drop in temperature by a given change of 0.1 °C, for example.
So far everything works really well and smoothly, but the large amount of data becomes a problem. For example, when I want to show a line chart for a selected period of, let's say, 3 months, each data series already consists of approximately 500,000 values, so the JSON response from the server gets bigger and bigger and the request takes longer as the time period grows.
So I am looking for a way to reduce the amount of data without losing relevant information, such as peaks in temperature curves etc., but at the same time I want to smoothen out the noise in the signal.
Here's an example; please keep in mind that this is just a chosen period of some hours or days, and usually the user would want to see data for several months or even years as well:
The green lines are temperatures, the red bars are representations of digital states (in this case a heater that makes one of the temp curves go up).
You can clearly see the noise in the signals, this is what I want to get rid of. At the same time, I want to keep characteristic features like the ones after the heater turns on and the temperature strongly rises and falls.
I already tried chopping the raw data into blocks of a given length and then aggregating the data in them, so that I have a min, max and average for each interval. This works, but by doing so the characteristic features of the curve get lost and everything gets kind of flattened or averaged. Here's a picture of the same period as above, zoomed out a bit so that the aggregation kicks in:
The average of the upper series is shown as the green line, the extent (min/max) of each chop is represented by the green area around the average line.
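For reference, here is a minimal sketch of the block-wise aggregation I tried (plain JavaScript just to illustrate the idea - the real aggregation runs server-side), assuming `points` is an array of { time, value } objects sorted by time:

```javascript
// Chop the series into fixed-size blocks and keep min, max and average per block.
function aggregateBlocks(points, blockSize) {
  const blocks = [];
  for (let i = 0; i < points.length; i += blockSize) {
    const chunk = points.slice(i, i + blockSize);
    const values = chunk.map(p => p.value);
    blocks.push({
      time: chunk[0].time,                                   // representative timestamp
      min: Math.min(...values),
      max: Math.max(...values),
      avg: values.reduce((sum, v) => sum + v, 0) / values.length,
    });
  }
  return blocks;
}
```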
Is there some kind of fancy algorithm that I can use to filter/smoothen/reduce my data right when it comes out of the DB and before it gets sent to the frontend? What are the buzzwords here that I need to dig after? Any specific libraries, frameworks or techniques are highly appreciated, as are general comments on this topic. I'm primarily interested in server-side solutions, but please feel free to mention client-side JavaScript solutions too, as they may well be of interest to other people facing the same problem.
"Is there some kind of fancy algorithm that I can use to filter/smoothen/reduce my data right when it comes out of the DB and before it gets send to the frontend? What are the buzzwords here that I need to dig after?"
I've asked a friend at the University where I work and she says Fourier Transforms can probably be used... but that looks like Dutch to me :)
Edit: looking at it a bit more myself, and because your data is time sampled, I'm guessing you'll be interested in Discrete Time Fourier Transforms
Further searching around this topic led me here - and that, to my (admittedly inexpert) eyes, looks like something that's useful...
Further Edit:
So, that link makes me think that you should be able to remove (for example) every second sample on the server side; then on the client side you can use the interpolation technique described in that link (using the inverse Fourier transform) to effectively "restore" the missing points. You've transferred half the points, yet the resulting graph should be essentially the same because you've interpolated the missing samples on the client... or is that way off base? :)
Relatively new to databases (and DBA work) here.
I've recently been looking into Riot Games' APIs; however, now realising that you're limited to 10 calls per 10 seconds, I need to change my front-end code, which originally just loaded all the information with lots and lots of API calls, into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered ranking. I was thinking, as mentioned in their Rate Limiting page, of "caching" data when GET-ing it, and then, when I need that information again, checking whether it is still relevant - if so, use it; if not, re-GET it.
My idea is to add a time 30 minutes in the future (the rough length of a game) to a column in the table, and when making a call, check whether the server time is past the saved time. Is this the right approach/idea for caching? If not, what is the best practice?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's best practice to get around Rate-Limiting?
Generally yes - most of the large applications simply use guesstimated rate limits, or a manual cache (check the DB for a recent call, then go to the API if it's an old call).
When you use large sites like op.gg or lolKing for summoner look-ups, they all give you a "must wait X minutes before doing another DB check/call" message, and I do this too. So yes, giving an estimated number (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot Developer community. Some people do go all out and implement actual caching with proper caching layers/frameworks, but you don't need to do that for smaller applications.
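For illustration, a minimal sketch of that manual cache check in Node-style JavaScript - readFromDb, fetchFromRiotApi and saveToDb are hypothetical helpers standing in for your own code:

```javascript
const CACHE_TTL_MS = 30 * 60 * 1000;   // roughly one game length

async function getRankedData(summonerId) {
  const cached = await readFromDb(summonerId);
  if (cached && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
    return cached.data;                               // still fresh, skip the API call
  }
  const fresh = await fetchFromRiotApi(summonerId);   // counts against the rate limit
  await saveToDb(summonerId, { data: fresh, fetchedAt: Date.now() });
  return fresh;
}
```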
I recommend building up your app's main functionality first, submit it, and get it approved for a higher rate limit as well. :)
Also, you mentioned adjusting your front-end code for the calls - make sure your API calls are made in server-side code for security reasons, so your API key is never exposed to the client.
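A rough sketch of what that could look like with a small Express proxy (Express, the endpoint path and the RIOT_API_KEY environment variable are illustrative assumptions - check the current Riot API docs for the exact URL):

```javascript
const express = require('express');
const app = express();

// The browser calls your own endpoint; the server makes the rate-limited Riot call,
// so the API key never reaches the client.
app.get('/api/summoner/:name', async (req, res) => {
  const url = 'https://euw1.api.riotgames.com/lol/summoner/v4/summoners/by-name/' +
              encodeURIComponent(req.params.name);           // illustrative endpoint
  const riotRes = await fetch(url, {                          // global fetch (Node 18+)
    headers: { 'X-Riot-Token': process.env.RIOT_API_KEY },
  });
  res.status(riotRes.status).json(await riotRes.json());
});

app.listen(3000);
```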
I was wondering if anyone can recommend app.config settings for map and reduce Javascript VM pools?
My current setup consists of two (2) Amazon EC2 m1.medium instances in the cluster. Each server has a single CPU with ~4 GB of RAM. My ring size is set to 64 partitions, with 8 JS VMs for map phases, 16 JS VMs for reduce phases, and 2 for hooks. I am planning to add another instance to the cluster, to make it 3, but I'm trying to stretch as much as possible until then.
I recently encountered high wait times for queries on a set of a few thousand records (the query was to fetch the most recent 25 news feeds from a bucket of articles), resulting in timeouts. As a workaround, I passed "reduce_phase_only_1" as an argument. My query was structured as follows:
1) 2i index search
2) map phase to filter out deleted articles
3) reduce phase to sort on creation time (this is where I added the reduce_phase_only_1 arg)
4) reduce phase to slice the top of results
Anyone know how to alleviate the bottleneck?
Cheers,
-Victor
Your Map phase functions are going to execute in parallel close to the data while the reduce phase generally runs iteratively on a single node using a single VM. You should therefore increase the number of VMs in the pool for map phases and reduce the pool size for Reduce phases. This has been described in greater detail here.
I would also recommend not using the reduce_phase_only_1 flag, as dropping it allows pre-reducing if volumes grow, although this will result in a number of reduce phase functions running in parallel, which will require a larger pool size. You could also merge your two reduce phases into a single function that sorts and then trims the excess results in each pass.
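As a rough sketch, assuming each object flowing through the phase already carries a numeric created_at field and the page size is passed as the phase argument, a combined reduce function could look something like this:

```javascript
// Single reduce phase: sort by creation time and trim to the requested number
// of articles in one pass. Riak may call this repeatedly on partial lists, and
// keeping the top `limit` items each time still yields the overall top `limit`.
function sortAndSlice(values, arg) {
  var limit = (arg && arg.limit) || 25;
  values.sort(function (a, b) {
    return b.created_at - a.created_at;   // newest first
  });
  return values.slice(0, limit);
}
```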
MapReduce is a flexible way to query your data, but also quite expensive, especially compared to direct key access. It is therefore best suited for batch type jobs where you can control the level of concurrency and the amount of load you put on the system through MapReduce. It is generally not recommended to use it to serve user driven queries as it can overload the cluster if there is a spike in traffic.
Instead of generating the appropriate data for every request, it is very common to de-normalise and pre-compute data when using Riak. In your case you might be able to keep lists of news in separate summary objects and update these as news items are inserted, deleted or updated. This adds a bit more work on insert, but makes reads much more efficient and scalable, as they can be served through a single GET request rather than a MapReduce job. If you have a read-heavy application this is often a very good design.
If inserts and updates are too frequent, thereby making it difficult to update these summary objects efficiently, it may be possible to have a batch job do this at specific time intervals instead if it is acceptable that the view may not be 100% up to date.
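As a very rough sketch of the summary-object idea (the `client` API here is hypothetical, standing in for whichever Riak client you use, and error/sibling handling is omitted):

```javascript
const MAX_ITEMS = 25;

// On every insert, update a pre-computed "latest news" object so reads become
// a single GET instead of a MapReduce job.
async function insertArticle(client, article) {
  await client.put('articles', article.id, article);

  const summary = (await client.get('summaries', 'latest_news')) || { items: [] };
  summary.items.push({ id: article.id, created_at: article.created_at });
  summary.items.sort((a, b) => b.created_at - a.created_at);
  summary.items = summary.items.slice(0, MAX_ITEMS);   // keep only the newest N

  await client.put('summaries', 'latest_news', summary);
}
```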
First of all, I am not sure whether this question belongs on Stack Overflow or another Stack Exchange site, so if it's in the wrong place please move it.
Is it better to calculate the distance between two lat/lng points in the DB or in JS?
Google Maps provides computeDistanceBetween(), which is really easy to use, but I am wondering what happens if I have 10,000 rows and only 5 of them are within the distance and need to be displayed on the map.
Any idea?
If your question is only about where it should be calculated, then I prefer the client side. It is simple, fast, more flexible and dynamic, and it takes load off your server.
Without knowing much about your problem, I'd probably go with doing calculation-intensive tasks on the client side (or "in JS"). This will never put too much load on your server-side application, by distributing it among clients.
Of course there are many variables you have to take into account to choose the best approach.
Other things you may consider:
doing it server-side and caching the results,
using google-api and also caching it on your server,
many many more...
It depends on the frequency of the calculations.
If your points are permanent, then you can calculate all required distances once, before inserting the row into the table, and save the result to serve to clients later.
But if the calculations are repeated and the coordinates of the points will change over time, then it's better to use client-side calculations.
It's really not that hard a task for a browser to execute some JavaScript that calculates distances, even for a lot of points.
Also, if you really have tons of input data, you can consider doing some background pre-calculation on the client side and caching the results in localStorage* so your users never have to wait.
* store.js could help you to ensure cross-browser compatibility.
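For reference, a plain-JavaScript haversine distance in kilometres that can be used for such client-side calculations:

```javascript
// Great-circle (haversine) distance between two lat/lng points, in kilometres.
function distanceKm(lat1, lng1, lat2, lng2) {
  const toRad = deg => deg * Math.PI / 180;
  const R = 6371;                               // mean Earth radius in km
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```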
Where it is better to calculate the distance depends completely on your application. If the client already has the information, then it is better in JS.
In this scenario you appear to be trying to calculate the nearest points stored in your database to a given point. This is most certainly better handled by the database using geospatial indexes.
The typical algorithm narrows down the result set with an unsophisticated first pass - for example, everything within x and y +/- 10 km - and then does the full distance calculation on that smaller result set.
I would say if your information is in the database then look into using built in geospatial tools for your DBMS.
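If that prefilter-then-exact-check step has to run in JS rather than in the DB, a minimal sketch could look like this (assuming a distanceKm(lat1, lng1, lat2, lng2) helper such as the haversine shown earlier in this thread):

```javascript
// Cheap bounding-box test first, exact great-circle distance only on the survivors.
function nearbyPoints(points, center, radiusKm) {
  const latDelta = radiusKm / 111;                                    // ~111 km per degree of latitude
  const lngDelta = radiusKm / (111 * Math.cos(center.lat * Math.PI / 180));

  return points
    .filter(p => Math.abs(p.lat - center.lat) <= latDelta &&          // rough box test
                 Math.abs(p.lng - center.lng) <= lngDelta)
    .filter(p => distanceKm(center.lat, center.lng, p.lat, p.lng) <= radiusKm);
}
```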