Efficient method for structuring JSON data for querying - javascript

I'm trying to come up with an efficient method of storing static data in JSON so that it can be used in queries client-side.
Currently, this data consists of about 60 CSV files, each with approx. 2000-2200 entries. I parse this data server-side and have a web service that handles queries coming from the client. As mentioned, I'd like to move this to the client side so that the web application could potentially work offline using the application cache.
A small sample of the data is below:
Battle Axe,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1
150,19EK,21EK,23EK,25EK,33ES,33ES,36ES,36ES,34ES,36ES,40ES,40ES,34ES,34ES,39ES,42ES,38ES,41ES,44ES,46ES
149,19ES,21ES,23ES,25ES,33ES,33ES,36ES,36ES,34ES,36ES,40ES,40ES,34ES,34ES,39ES,42ES,38ES,41ES,44ES,46ES
148,19EK,21EK,23EK,25EK,33ES,33ES,36ES,36ES,34ES,36ES,39ES,40ES,34ES,34ES,39ES,42ES,37ES,40ES,44ES,45ES
147,19ES,21ES,23ES,25ES,33ES,32ES,35ES,35ES,33ES,35ES,39ES,39ES,33ES,33ES,38ES,41ES,37ES,40ES,43ES,45ES
My original attempt at converting to JSON was based on the following:
Each file was one JSON object (let's call this object 'weapon')
Each row in the file was another JSON object stored in an array under the corresponding weapon object
Each entry for a row was stored in a fixed length array under the row object
All of the 'weapon' objects were stored in an array.
This meant I had one array that consisted of approx. 60 objects, which in turn had on average 100 objects stored within them. Each of these 100 objects had an array of 20 objects, one per entry, each consisting of the actual value and some additional metadata. As you can see, I am already at 120K objects... the resulting minified JSON string was 3 MB. Small sample below:
var weapon =
{
    Name: 'Broadsword',
    HitEntries: [
        {
            High: 150,
            Low: 150,
            Unmodified: false,
            Hits: [ { Hits: '12', Critical: 'CK', Fail: false },...,{ Hits: '1', Critical: '', Fail: false } ]
        },
        ...
        {
            High: 50,
            Low: 47,
            Unmodified: false,
            Hits: [ { Hits: '3', Critical: '', Fail: false } ]
        }
    ]
}
An example of a query that will be run is below. It is based on the sample CSV data shown above:
Battle Axe weapon is selected
A value of 147 is selected for the roll (row)
A value of 9 is selected for the armour type (column heading)
The result of the above should be 39ES (cross reference between row and heading)
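To make the lookup concrete, here is a rough sketch of what that cross reference could look like client-side (the flat structure below is just an illustration, not a format I've settled on):

// Sketch only: assumes each weapon is keyed by name, with its column
// headings and a map of roll -> row values, e.g. built from the CSV above.
var weapons = {
    'Battle Axe': {
        columns: [20, 19, 18, /* ... */ 2, 1],
        rows: {
            '147': ['19ES', '21ES', '23ES', /* ... */ '43ES', '45ES']
            // ...one entry per roll value
        }
    }
    // ...one entry per weapon
};

function lookup(weaponName, roll, armourType) {
    var weapon = weapons[weaponName];
    var columnIndex = weapon.columns.indexOf(armourType);
    return weapon.rows[roll][columnIndex]; // e.g. lookup('Battle Axe', 147, 9) === '39ES'
}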
At this point I decided it was probably a good idea to get some advice before heading down this path. Any input is appreciated =)

You can make a few optimizations here:
Use WebSockets to stream data if possible
Convert the data to TypedArrays (blobs) - you'll end up dealing with something like a 10K file (see the sketch below).
Use IndexedDB to query if needed
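A minimal sketch of the typed-array idea: it assumes each cell splits into a numeric hit value plus a short critical code, which is my reading of the sample CSV, and the criticalCodes table below is an assumption:

// Sketch only: pack each row's 20 numeric hit values into a Uint8Array and
// keep the critical codes ('EK', 'ES', ...) in a parallel lookup table.
var criticalCodes = ['', 'EK', 'ES'];           // assumed code table
var columnCount = 20;

function packRow(csvCells) {                    // csvCells: ['19EK', '21EK', ...]
    var hits = new Uint8Array(columnCount);
    var crits = new Uint8Array(columnCount);
    csvCells.forEach(function (cell, i) {
        var match = /^(\d+)([A-Z]*)$/.exec(cell);
        hits[i] = parseInt(match[1], 10);
        crits[i] = criticalCodes.indexOf(match[2]);
    });
    return { hits: hits, crits: crits };
}

var row = packRow(['19EK', '21EK', '23EK', '25EK', '33ES', '33ES', '36ES', '36ES',
                   '34ES', '36ES', '40ES', '40ES', '34ES', '34ES', '39ES', '42ES',
                   '38ES', '41ES', '44ES', '46ES']);
// row.hits[0] === 19, criticalCodes[row.crits[0]] === 'EK'

Packed like this, each row is 40 bytes of binary data instead of 20 small objects, and the resulting buffers (or Blobs built from them) can be persisted in IndexedDB for offline querying.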

Related

How to optimize performance of searching in two array of object

There are two arrays of objects, one from the database and one from a CSV. I need to compare the objects in both arrays by their phone and email properties and find the duplicates among them. Due to the odd database object structure, I have to do the comparison in JavaScript. I want to know: what is the best algorithm and the best way to compare and find duplicates?
Let me explain with a simple calculation.
There are 5000 contacts in my database and a user may upload another 3000 contacts from a CSV. Every time, we need to find the duplicate contacts in the database; if a duplicate is found it should be overwritten, and the rest should be inserted. If I compare contacts row by row, that loops over 5000 database contacts x 3000 CSV contacts = 15,000,000 traversals.
This is my present scenario; because of it the system gets stuck. I need an efficient solution for this issue.
I am developing this in Node.js with RethinkDB.
The database object structure is exactly as shown below, and the same emails and phones may appear as duplicate entries in other contacts as well.
[{
    id: 2349287349082734,
    name: "ABC",
    phones: [
        {
            id: 2234234,
            flag: true,
            value: 982389679823
        },
        {
            id: 65234234,
            flag: false,
            value: 2979023423
        }
    ],
    emails: [
        {
            id: 22346234,
            flag: true,
            value: "test#domain.com"
        },
        {
            id: 609834234,
            flag: false,
            value: "test2#domain.com"
        }
    ]
}]
Please review fiddle code, if you want: https://jsfiddle.net/dipakchavda2912/eua1truj/
I have already done indexing. The problem looks very easy and familiar at first sight, but when we talk about concurrency it is really very critical and CPU intensive.
If I understand the question correctly, you can use the lodash method differenceWith:
let csvContacts = []; // fill it with your values
let databaseContacts = ...; // from your database
let diffArray = []; // the non-duplicated objects
const l = require("lodash");
diffArray = l.differenceWith(csvContacts,
    databaseContacts,
    (firstValue, secValue) => firstValue.email == secValue.email);
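Note that the contacts in the question keep emails and phones as nested arrays rather than a single email property, so the comparator has to dig into those arrays. Purely as a sketch of the same differenceWith approach (not tested against your data):

// Sketch: treat two contacts as duplicates when they share any email or phone value
const overlaps = (a, b, key) =>
    (a[key] || []).some(x => (b[key] || []).some(y => x.value === y.value));

diffArray = l.differenceWith(csvContacts, databaseContacts,
    (csvContact, dbContact) =>
        overlaps(csvContact, dbContact, "emails") || overlaps(csvContact, dbContact, "phones"));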

What are the security considerations for the size of an array that can be passed over HTTP to a JavaScript server?

I'm dealing with the library qs in Node.js, which lets you stringify and parse query strings.
For example, if I want to send a query with an array of items, I would do qs.stringify({ items: [1,2,3] }), which would send this as my query string:
http://example.com/route?items[0]=1&items[1]=2&items[2]=3
(Encoded URI would be items%5B0%5D%3D1%26items%5B1%5D%3D2%26items%5B2%5D%3D3)
When I do qs.parse(url) on the server, I'd get the original object back:
let query = qs.parse(url) // => { items: [1,2,3] }
However, the default size of the array for qs is limited to 20, according to the docs:
qs will also limit specifying indices in an array to a maximum index of 20. Any array members with an index of greater than 20 will instead be converted to an object with the index as the key
This means that if I have more than 20 items in the array, qs.parse will give me an object like this (instead of the array that I expected):
{ items: { '0': 1, '1': 2 ...plus 19 more items } }
I can override this behavior by setting a param, like this: qs.parse(url, { arrayLimit: 1000 }), and this would allow a max array size of 1,000 for example. This would, thus, turn an array of 1,001 items into a plain old JavaScript object.
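To illustrate the difference, here is a small sketch based on the behaviour described above (not re-verified against every qs version):

const qs = require("qs");

// 25 items: indices above the default limit of 20 make qs fall back to an object
const query = qs.stringify({ items: Array.from({ length: 25 }, (_, i) => i) });

console.log(Array.isArray(qs.parse(query).items));                        // false (plain object)
console.log(Array.isArray(qs.parse(query, { arrayLimit: 1000 }).items));  // true  (real array)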
According to this github issue, the limit might be for "security considerations" (same in this other github issue).
My questions:
If the default limit of 20 is meant to help mitigate a DoS attack, how is turning an array of over 20 items into a plain old JavaScript object supposed to help anything? (Does the object take less memory or something?)
If the above is true, even if there is an array limit of, say, 20, couldn't the attacker just send more requests and still get the same DoS effect? (The number of requests necessary to be sent would decrease linearly with the size limit of the array, I suppose... so I guess the "impact" or load of a single request would be lower)

Extracting data fields from a JSON object

I've been reading all the stuff I can find about this, but none of the solutions seems to provide the answer I need.
Specifically, I read this Access / process (nested) objects, arrays or JSON carefully, as well as dozens of other posts.
Here's what I'm trying to accomplish:
I have a large object, called allData, that I got as a JSON object from MongoDB, and it contains an array of all the data from each reading.
{pitch: -7.97, roll: -4.3, temp: 98, yaw: -129.83, time: "01/22/2016 17:28:47", …}
{pitch: -8.04, roll: -4.41, temp: 97, yaw: -130.81, time: "01/22/2016 17:28:58", …}
...
What I'd LIKE to do is be able to extract all the pitch readings with something along the lines of allData.pitch, but obviously that doesn't work, since each data reading is stored in an element of the allData array. So I could go through in a loop and do allData[x].pitch, but I was hoping for a cleaner, faster way to do this -- since I'll probably want to extract each type of data.
Unfortunately at this point I don't have the ability to simply request the pitch data from the db, so I get this whole set back.
One last wrinkle is that ONE of the elements of the array above is already a data object.
You can utilise Array.prototype.map() for this
var pitches = allData.map(function(d) {
    return {
        "pitch": d.pitch,
        "time": d.time
    };
});
If you can't control the data returned from the server (i.e., to only retrieve the pitch values you want), you are going to have to loop through to get them all.
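If you only need the raw pitch values rather than pitch/time pairs, the same idea collapses to a one-liner (assuming allData is the array shown in the question):

// Just the pitch readings, in order
var pitchValues = allData.map(function (d) { return d.pitch; });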

How to maintain the file reading pointer in CodeIgniter PHP back and forth between the view?

I've got a CodeIgniter controller which reads an uploaded record file (Excel, CSV, TSV, XML, etc.) and writes each line as a record into the database. The record ID, which is the first column in the file, should be unique. If it is already present in the database, then while reading the file I should throw an alert and, depending on the user's response, override or skip the record and continue with the other records.
But once I throw the alert and reach the view, I lose control of the controller, and I don't know how to keep the file pointer ticking so that I can continue reading after receiving the response from the user.
Is there a workaround for this, or is there an alternative method?
Consider I have a file like this:
Roll, Name, Age, Subject
1, Praveen, 25, Cloud Computing
2, Sri Ram, 25, Cloud Computing
3, Dhiwa, 23, Computer Science
4, Vennila, 25, Cloud Computing
5, Arun, 22, Computer Science
Since it is a CSV file, I would read it using PHP's built-in function str_getcsv, this way:
$csv = array_map('str_getcsv', file('data.csv'));
I would take the collection of the records' Roll values, which should be unique entries, by taking the first value of each row, so that I would get something like:
Array (
1,
2,
3,
4,
5
)
And finally I would assume the database would already have a similar structure like:
+--------+------+-----+---------+
| UserId | Name | Age | Subject |
+--------+------+-----+---------+
So now I would use SQL's GROUP_CONCAT function to group all the UserIds together and get them this way:
$UserIdsFromDB = get_first_row_value("SELECT GROUP_CONCAT(`UserId`) FROM `Users`");
I would parse it as an array and have the result in $UserIdsFromDB this way:
Array (
5,
17,
22,
23,
55,
56,
57
)
I could easily use PHP's built-in function array_intersect to find the values that are repeated, which gives me:
Array (
5
)
And using a simple AJAX call, I would ask whether to replace the user with ID 5 (or all the matching users), this way:
get_user_name(5) // Gets the name of the user with ID 5.
count($Array_Intersect_Result) // Gets the number of matching users.
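On the client side, that confirmation could look something like the sketch below (the /replace-duplicates endpoint and its payload are placeholders I made up, not part of the original code):

// Sketch only: ask the user, then tell the server what to do with the duplicates
var duplicateIds = [5]; // e.g. the array_intersect result from above
var overwrite = window.confirm(duplicateIds.length + " record(s) already exist. Overwrite them?");

fetch("/replace-duplicates", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ids: duplicateIds, overwrite: overwrite })
});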
I hope this would take you to a level of understanding. Let me know how it goes.

Extracting data from a JSON call to a Postgres table for use in Highcharts (without PHP)

I am using JSON to retrieve data from a Postgres table for use in a Highcharts column chart. I have five time-series in the database, each with five elements, and only want to use two key:value pairs from each time-series in a chart.
The output needs to be in this format:
An array of objects with named values. In this case the objects are point configuration objects as seen below.
Range series values are given by low and high.
Example:
data: [{
    name: 'Point 1',
    color: '#00FF00',
    y: 0
}, {
    name: 'Point 2',
    color: '#FF00FF',
    y: 5
}]
Note that line series and derived types like spline and area, require data to be sorted by X because it interpolates mouse coordinates for the tooltip. Column and scatter series, where each point has its own mouse event, does not require sorting.
That's taken from the Highcharts series / data api: http://api.highcharts.com/highcharts#series.data
Here's my db schema:
enable_extension "plpgsql"
create_table "series", force: true do |t|
t.string "acronym"
t.string "name"
t.string "current"
t.string "previous"
t.string "pc1"
t.datetime "created_at"
t.datetime "updated_at"
end
end
Here's the JSON call:
var url = "http://localhost:3000/series";
$.getJSON(url, function (series) {
    console.log(series);
    options.series[0].data = series;
    var chart = new Highcharts.Chart(options);
});
console.log shows an array of objects being returned:
[Object, Object, Object, Object, Object]
0: Object
acronym: "1"
current: "3.4"
id: 1
name: "a"
pc1: "25"
previous: "2.4"
url: "http://localhost:3000/series/1.json"
__proto__: Object
1: Object
2: Object
3: Object
4: Object
length: 5
__proto__: Array[0]
I "opened" the first object in order to show how the data is being returned. Notice that the key:value pairs are not followed by a comma (which Highcharts asks for).
In the current chart, a column chart, I only want to display the series "name", e.g., "a", and the value of the corresponding key / value pair "pc1". (They're actually going to be economic time series, e.g., CPI, and its year-over-year percent change.) I plan on creating other charts with, for example, the current and previous values, for a particular time series.
I'm new to programming and do not know how to "extract", if you will, the key:value pairs into the Highcharts format.
I tried pushing the data into a new array with no luck:
var data = [];
data.push(series[0].name + "," series[0].pc1);
I tried a for loop with no luck:
for (var i = 0, l = series.length; i < l; i++) {
var key = series[i].name;
var value = series[i].pc1;
var data = [ [key + ":" + value], };
In most attempts, I could build an array, but not a nested one of objects and / or one with the comma between x and y values.
I also tried splice(); however, I could only remove entire objects.
So, how do I get output that looks like:
data: [{
    name: 'a',
    pc1: '25',
}, {
    name: 'b',
    pc1: '15',
}]
I see some very complex code in the answers relating to PHP and MySQL; however, there's nothing with JSON alone (that I could find).
A related question: Am I approaching the problem correctly. Or should I, perhaps, create a new database table for each chart so that only the data I want to display is saved and retrieved.
Thank you in advance.
I would definitely not recommend you create a new database table for each chart. That is not necessary. You can tailor your SQL queries from that table to suit your chart data needs.
One issue is that your y-value is currently a string, which is not what you want in a graphing application. That should be cast to an integer in your SQL query.
The data form is an array of hashes. To initialize the array and add one such element, you could do something like this:
data = [];
data.push({'name': 'a', 'pc1': 25});
To see the data in pretty-printed form, you can verify it in the console with this code:
console.log(JSON.stringify(data));
So, basically you can build up the array of hashes in the for loop (or via a mechanism such as jQuery's $.each).
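Putting that together, a minimal sketch (assuming the response looks like the console output above, that options is your existing chart configuration, and that pc1 is the value you want as the numeric y Highcharts expects) could be:

$.getJSON("http://localhost:3000/series", function (series) {
    // one Highcharts point per row: name as the label, pc1 cast to a number for y
    var data = series.map(function (s) {
        return { name: s.name, y: parseInt(s.pc1, 10) };
    });

    options.series[0].data = data;
    var chart = new Highcharts.Chart(options);
});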
That method is essentially hand-rolling your own JSON, though, which is kind of tedious and can be error-prone in some cases. An easier method would probably be to have your backend convert the result of the Postgres query to JSON directly.
Ruby (which I assume is what you're using in the backend based on your schema definition code) has a JSON module (http://www.ruby-doc.org/stdlib-2.0.0/libdoc/json/rdoc/JSON.html) that can do this.
Note that for this to work, your Postgres library will need to return the data as an array of hashes, so it has the column names as well (just an array of arrays will not work). This is doable through libraries such as psycopg2 (Python) and DBI (Perl), and is likely similarly available in the Ruby library you're using also.
