Lodash 2d array comparison for large data sets - javascript

I have a 2d array representing rows in a database. I'm using officeJS to load and manipulate the data in Excel. I update, insert, and delete rows. The challenge I'm facing is that I need to figure out the changed rows (inserted, deleted or updated) so that I can update only those rows in the database. I'm sending one query for the updated and inserted rows and one query for the deleted rows. I'm able to do this using lodash for data with 5000 rows and 10 columns. I'd like to scale this to a much larger data set and I'm wondering if there are any alternatives to what I'm currently doing. Below is the code I'm using to find the difference.
insertedOrUpdatedRows = _.differenceWith(modifiedData, originalData, _.isEqual);
deletedRows = _.differenceWith(originalData, modifiedData, compareFunction);
function compareFunction(a, b) {
  // Rows match when their primary keys (first element) are equal.
  return a[0] == b[0];
}
Sample data array
[ [1,data,data,data],
[2,data,data,data] ]
The first element is the primary key.

Because you have mentioned that your Javascript engine is crashing (which it should not, at 50,000 rows - so I would revisit the logic), I would recommend chunking out the data using Lodash's _.chunk function:
_.chunk(modifiedData, Math.ceil(modifiedData.length / 500)).map(function (chunk) {
...
...
});

OK, I'm using the following logic. I'm not sure why it's crashing at 50K rows. originalData and modifiedData are in the format of the sample 2D array mentioned above.
var originalDataStrings = [];
var modifiedDataStrings = [];
var insertedOrUpdatedRows;
originalData.forEach(function(row){
originalDataStrings.push(JSON.stringify(row));
});
modifiedData.forEach(function(row){
modifiedDataStrings.push(JSON.stringify(row));
})
insertedOrUpdatedRows = _.differenceWith(modifiedDataStrings, originalDataStrings, _.isEqual);
console.log(insertedOrUpdatedRows);
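One alternative that may scale better: _.differenceWith with a comparator compares rows pairwise, so the work grows roughly with the product of the two array lengths. The following is a minimal sketch, not a drop-in replacement. It assumes the primary key is the first element of each row and that rows survive JSON.stringify unchanged. The function name diffRows is hypothetical.

```javascript
// Sketch: diff two 2D arrays in roughly linear time by indexing on the primary key.
// Assumption: row[0] is a unique primary key and rows are JSON-serializable.
function diffRows(originalData, modifiedData) {
  // Map each original primary key to the serialized form of its row.
  const originalByKey = new Map();
  for (const row of originalData) {
    originalByKey.set(row[0], JSON.stringify(row));
  }

  const insertedOrUpdatedRows = [];
  const modifiedKeys = new Set();
  for (const row of modifiedData) {
    modifiedKeys.add(row[0]);
    // A missing key (insert) or a different serialization (update) both count as changed.
    if (originalByKey.get(row[0]) !== JSON.stringify(row)) {
      insertedOrUpdatedRows.push(row);
    }
  }

  // A row is deleted when its key no longer appears in the modified data.
  const deletedRows = originalData.filter(row => !modifiedKeys.has(row[0]));
  return { insertedOrUpdatedRows, deletedRows };
}
```

Each array is walked a constant number of times, so 50K rows should need on the order of 100K lookups rather than billions of comparisons.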

Related

How to make a summary list of one column in a large CSV file?

I have 20,000 rows in a CSV which I have loaded using d3. Within this CSV there are roughly 4,000 unique category names (each being repeated across various numbers of rows).
I would like to make a list (an array or objects) of all the ~4,000 category names from my CSV, to be able to filter out categories that I do not want to work with.
See code and data sample below; the category column is called feature_id.
var rowConverter = function(d){
return{
event_date: parseTime(d.event_date),
claim_number: d.claim_number,
cause: d.cause,
detail_cause: d.detail_cause,
paid_total: parseFloat(d.paid_total),
feature_id: d.feature_id,
id: parseFloat(d.id)
};
}
d3.csv('claims_cwy.csv', rowConverter, function(dataset) {
console.log(dataset);
});
You can create an empty array, iterate over the dataset, and for each item check whether its category is already in the array. If not, add it. Something like:
const categories = []
dataset.forEach( item => {
// indexOf returns -1 when the value is not in the array yet
if ( categories.indexOf(item.category) === -1)
categories.push(item.category)
})
PS: I don't know which of the attributes in the row represents the category; it's not clear.
There are various ways to achieve what you want. If you want to keep it D3-ish you could make use of d3.set() which not only guarantees uniqueness of its values, but also allows you to provide an accessor to extract the categories' values, i.e. the field feature_id, from your data.
const categories = d3.set(dataset, d => d.feature_id);
Note, however, that this requires an additional loop through your data. As you claim to have a large set of data, you might want to do it step by step by adding to the set in the row converter function.
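const categories = d3.set();
const rowConverter = function(d) {
categories.add(d.feature_id);
// ...convert and return the row as before; d3.csv skips rows for which the converter returns nothing
};
Whatever approach you prefer, the unique category values are available by calling categories.values().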
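If you would rather not depend on D3 for this step, a native Set does the same job. This is a minimal sketch, assuming the rows are plain objects with a feature_id field as in the question. The function name uniqueCategories is hypothetical.

```javascript
// Sketch: collect the unique feature_id values with a native Set.
// Assumption: rows is an array of objects that each carry a feature_id field.
function uniqueCategories(rows) {
  // A Set keeps only the first occurrence of each value, in insertion order.
  return Array.from(new Set(rows.map(row => row.feature_id)));
}
```

The resulting array can then be used to filter the dataset, for example with dataset.filter(d => wanted.includes(d.feature_id)), where wanted is the list of categories you want to keep.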

Trying to dynamically organize JSON object elements into different arrays based on values

This is the JSON I'm working with:
https://data.cityofnewyork.us/resource/xx67-kt59.json?$where=camis%20=%2230112340%22
I'd be dynamically making the queries using different data, so it'll possibly change.
What I'm essentially trying to do is to somehow organize the elements within this array into different arrays based on inspection_date.
So for each unique inspection_date value, those respective inspections would be put into its own collection.
If I knew the dates beforehand, I could easily iterate through each element and just push into an array.
Is there a way to dynamically create the arrays?
My end goal is to be able to display each group of inspections (based on inspection date) using Angular 5 on a webpage. I already have the site up and working and all of the requests being made.
So, I'm trying to eventually get to something like this. But of course, using whatever dates in the response from the request.
2016-10-03T00:00:00
List the inspections
2016-04-30T00:00:00
List the inspections
2016-04-12T00:00:00
List the inspections
Just for reference, here's the code I'm using:
ngOnInit() {
this.route.params.subscribe(params => {
this.title = +params['camis']; // (+) converts string 'id' to a number
this.q.getInpectionsPerCamis(this.title).subscribe((res) => {
this.inspectionList = res;
console.log(res);
});
// In a real app: dispatch action to load the details here.
});
}
I wish I could give you more info, but at this point, I'm just trying to get started.
I wrote this in jQuery just because it was faster for me, but it should translate fairly well to Angular (I just don't want to fiddle with an angular app right now)
Let me know if you have any questions.
$(function() {
let byDateObj = {};
$.ajax({
url: 'https://data.cityofnewyork.us/resource/xx67-kt59.json?$where=camis%20=%2230112340%22'
}).then(function(data) {
// In real code, first check that data is an array; that check is skipped here.
byDateObj = data.reduce(function(cum, cur) {
// If the accumulator has no entry for this inspection_date yet, start one as an empty array.
if (!cum.hasOwnProperty(cur.inspection_date)) cum[cur.inspection_date] = [];
// Push the inspection onto the array for its date.
cum[cur.inspection_date].push(cur);
// Return the accumulator object.
return cum;
// Start from the (empty) byDateObj object.
}, byDateObj);
console.log(byDateObj);
}, console.error);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
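For the Angular side, the same grouping can be done without jQuery on the array returned by the service. This is a minimal sketch, assuming each element has an inspection_date field as in the linked JSON. The function name groupByInspectionDate and the { date, items } shape are hypothetical, chosen so a template can loop over groups and then over items.

```javascript
// Sketch: group inspections by inspection_date, keeping first-seen date order.
// Assumption: every inspection object has an inspection_date string.
function groupByInspectionDate(inspections) {
  const groups = new Map();
  for (const inspection of inspections) {
    const key = inspection.inspection_date;
    // Create the bucket the first time a date is seen.
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(inspection);
  }
  // Convert to an array of { date, items } so it is easy to iterate in a template.
  return Array.from(groups, ([date, items]) => ({ date, items }));
}
```

In the component from the question, this could be applied where res arrives, for example this.inspectionGroups = groupByInspectionDate(res), with inspectionGroups being a new, hypothetical property.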

Display posts in descending posted order

I'm trying to test out Firebase to allow users to post comments using push. I want to display the data I retrieve with the following;
fbl.child('sell').limit(20).on("value", function(fbdata) {
// handle data display here
});
The problem is the data is returned in order of oldest to newest - I want it in reversed order. Can Firebase do this?
Since this answer was written, Firebase has added a feature that allows ordering by any child or by value. So there are now four ways to order data: by key, by value, by priority, or by the value of any named child. See this blog post that introduces the new ordering capabilities.
The basic approaches remain the same though:
1. Add a child property with the inverted timestamp and then order on that.
2. Read the children in ascending order and then invert them on the client.
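The second approach can be sketched as follows. It assumes only that the snapshot exposes forEach() and that each child exposes val(), as the snapshots used elsewhere on this page do. The function name childrenNewestFirst is hypothetical.

```javascript
// Sketch: read children in ascending order, then invert them on the client.
// Assumption: snapshot.forEach visits children oldest-first and child.val() returns the data.
function childrenNewestFirst(snapshot) {
  const items = [];
  snapshot.forEach(function (child) {
    items.push(child.val());
  });
  // items is a fresh local array, so reversing it in place is safe.
  return items.reverse();
}
```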
Firebase supports retrieving child nodes of a collection in two ways:
by name
by priority
What you're getting now is by name, which happens to be chronological. That's no coincidence btw: when you push an item into a collection, the name is generated to ensure the children are ordered in this way. To quote the Firebase documentation for push:
The unique name generated by push() is prefixed with a client-generated timestamp so that the resulting list will be chronologically-sorted.
The Firebase guide on ordered data has this to say on the topic:
How Data is Ordered
By default, children at a Firebase node are sorted lexicographically by name. Using push() can generate child names that naturally sort chronologically, but many applications require their data to be sorted in other ways. Firebase lets developers specify the ordering of items in a list by specifying a custom priority for each item.
The simplest way to get the behavior you want is to also specify an always-decreasing priority when you add the item:
var ref = new Firebase('https://your.firebaseio.com/sell');
var item = ref.push();
item.setWithPriority(yourObject, 0 - Date.now());
Update
You'll also have to retrieve the children differently:
fbl.child('sell').startAt().limitToLast(20).on('child_added', function(fbdata) {
console.log(fbdata.exportVal());
})
In my test, using on('child_added') ensures that the last few children added are returned in reverse chronological order. Using on('value'), on the other hand, returns them in the order of their names.
Be sure to read the section "Reading ordered data", which explains the usage of the child_* events to retrieve (ordered) children.
A bin to demonstrate this: http://jsbin.com/nonawe/3/watch?js,console
Since Firebase 2.0.x you can use limitToLast() to achieve that:
fbl.child('sell').orderByValue().limitToLast(20).on("value", function(fbdataSnapshot) {
// fbdataSnapshot is returned in ascending order;
// you will still need to reverse these 20 items
// to get them in descending order
});
Here's a link to the announcement: More querying capabilities in Firebase
To augment Frank's answer, it's also possible to grab the most recent records--even if you haven't bothered to order them using priorities--by simply using endAt().limit(x) like this demo:
var fb = new Firebase(URL);
// listen for all changes and update
fb.endAt().limit(100).on('value', update);
// print the output of our array
function update(snap) {
var list = [];
snap.forEach(function(ss) {
var data = ss.val();
data['.priority'] = ss.getPriority();
data['.name'] = ss.name();
list.unshift(data);
});
// print/process the results...
}
Note that this is quite performant even up to perhaps a thousand records (assuming the payloads are small). For more robust usages, Frank's answer is authoritative and much more scalable.
This brute force can also be optimized to work with bigger data or more records by doing things like monitoring child_added/child_removed/child_moved events in lieu of value, and using a debounce to apply DOM updates in bulk instead of individually.
DOM updates, naturally, are a stinker regardless of the approach, once you get into the hundreds of elements, so the debounce approach (or a React.js solution, which is essentially an uber debounce) is a great tool to have.
There is really no way to do this in the query itself, but since we have the RecyclerView we can reverse the layout instead:
query=mCommentsReference.orderByChild("date_added");
query.keepSynced(true);
// Initialize Views
mRecyclerView = (RecyclerView) view.findViewById(R.id.recyclerView);
mManager = new LinearLayoutManager(getContext());
// mManager.setReverseLayout(false);
mManager.setReverseLayout(true);
mManager.setStackFromEnd(true);
mRecyclerView.setHasFixedSize(true);
mRecyclerView.setLayoutManager(mManager);
I have a date variable (long) and wanted to keep the newest items on top of the list. So what I did was:
Add a new long field 'dateInverse'
Add a new method called 'getDateInverse', which just returns: Long.MAX_VALUE - date;
Create my query with: .orderByChild("dateInverse")
Presto! :p
You are looking for limitToLast(int x). This gives you the last x elements of the query, that is, the x highest values, still in ascending order.
If your database contains {10,300,150,240,2,24,220}
this method:
myFirebaseRef.orderByChild("highScore").limitToLast(4)
will retrieve: {150,220,240,300}
In Android there is a way to actually reverse the data in an Arraylist of objects through the Adapter. In my case I could not use the LayoutManager to reverse the results in descending order since I was using a horizontal Recyclerview to display the data. Setting the following parameters to the recyclerview messed up my UI experience:
llManager.setReverseLayout(true);
llManager.setStackFromEnd(true);
The only working way I found around this was through the BindViewHolder method of the RecyclerView adapter:
@Override
public void onBindViewHolder(final RecyclerView.ViewHolder holder, int position) {
final SuperPost superPost = superList.get(getItemCount() - position - 1);
}
Hope this answer will help all the devs out there who are struggling with this issue in Firebase.
Firebase: How to display a thread of items in reverse order with a limit for each request and an indicator for a "load more" button.
This will get the last 10 items of the list
FBRef.child("childName")
.limitToLast(loadMoreLimit) // loadMoreLimit = 10 for example
This will get the last 10 items. Grab the id of the last record in the list and save it for the load more functionality. Next, convert the collection of objects into an array and do a list.reverse().
LOAD MORE Functionality: The next call will do two things, it will get the next sequence of list items based on the reference id from the first request and give you an indicator if you need to display the "load more" button.
this.FBRef
.child("childName")
.endAt(null, lastThreadId) // Get this from the previous step
.limitToLast(loadMoreLimit+2)
You will need to strip the first and last item of this object collection. The first item is the reference to get this list. The last item is an indicator for the show more button.
I have a bunch of other logic that will keep everything clean. You will need to add this code only for the load more functionality.
list = snapObjectAsArray; // The list is an array from snapObject
lastItemId = key; // get the first key of the list
if (list.length < loadMoreLimit+1) {
lastItemId = false;
}
if (list.length > loadMoreLimit+1) {
list.pop();
}
if (list.length > loadMoreLimit) {
list.shift();
}
// Return the list.reverse() and lastItemId
// If lastItemId is an ID, it will be used for the next reference and a flag to show the "load more" button.
}
I'm using ReactFire for easy Firebase integration.
Basically, it helps me store the data in the component state as an array. Then all I have to do is use the reverse() function.
Here is how I achieve this :
import React, { Component, PropTypes } from 'react';
import ReactMixin from 'react-mixin';
import ReactFireMixin from 'reactfire';
import Firebase from '../../../utils/firebaseUtils'; // Firebase.initializeApp(config);
@ReactMixin.decorate(ReactFireMixin)
export default class Add extends Component {
constructor(args) {
super(args);
this.state = {
articles: []
};
}
componentWillMount() {
let ref = Firebase.database().ref('articles').orderByChild('insertDate').limitToLast(10);
this.bindAsArray(ref, 'articles'); // bind retrieved data to this.state.articles
}
render() {
return (
<div>
{
this.state.articles.slice().reverse().map(function(article) {
return <div>{article.title}</div>
})
}
</div>
);
}
}
There is a better way. You should order by negative server timestamp. How do you get a negative server timestamp even when offline? There is a hidden field that helps. Related snippet from the documentation:
var offsetRef = new Firebase("https://<YOUR-FIREBASE-APP>.firebaseio.com/.info/serverTimeOffset");
offsetRef.on("value", function(snap) {
var offset = snap.val();
var estimatedServerTimeMs = new Date().getTime() + offset;
});
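The documentation snippet stops at the estimated server time. A minimal sketch of turning that estimate into a descending sort key follows. It assumes offsetMs is the value read from .info/serverTimeOffset. The function name negativeServerTimestamp and the field name sortKey are hypothetical.

```javascript
// Sketch: derive a negative timestamp from the local clock plus the server offset.
// Assumption: offsetMs is the number read from .info/serverTimeOffset.
function negativeServerTimestamp(offsetMs, nowMs = Date.now()) {
  const estimatedServerTimeMs = nowMs + offsetMs;
  // Negating the time makes an ascending order-by return the newest items first.
  return -estimatedServerTimeMs;
}

// Example: attach the key to an item before saving it.
function withSortKey(item, offsetMs, nowMs = Date.now()) {
  return Object.assign({}, item, { sortKey: negativeServerTimestamp(offsetMs, nowMs) });
}
```

Ordering by the sortKey child in ascending order then lists the most recent items first, with no client-side reversal.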
To add to Dave Vávra's answer, I use a negative timestamp as my sort_key like so
Setting
const timestamp = new Date().getTime();
const data = {
name: 'John Doe',
city: 'New York',
sort_key: timestamp * -1 // Gets the negative value of the timestamp
}
Getting
const ref = firebase.database().ref('business-images').child(id);
const query = ref.orderByChild('sort_key');
return $firebaseArray(query); // AngularFire function
This fetches all objects from newest to oldest. You can also add an ".indexOn" rule for sort_key to make it run even faster.
I had this problem too, and I found a very simple solution that doesn't involve manipulating the data in any way. If you are rendering the result to the DOM in a list of some sort, you can use flexbox and set up a class to reverse the elements in their container.
.reverse {
display: flex;
flex-direction: column-reverse;
}
myarray.reverse(); or this.myitems = items.map(item => item).reverse();
I did this by prepending.
query.orderByChild('sell').limitToLast(4).on("value", function(snapshot){
snapshot.forEach(function (childSnapshot) {
// PREPEND
});
});
Someone has pointed out that there are 2 ways to do this:
Manipulate the data client-side
Make a query that will order the data
The easiest way that I have found to do this is to use option 1, but through a LinkedList. I just append each of the objects to the front of the stack. It is flexible enough to still allow the list to be used in a ListView or RecyclerView. This way even though they come in order oldest to newest, you can still view, or retrieve, newest to oldest.
You can add a column named orderColumn where you save time as
Long refrenceTime = "large future time";
Long currentTime = "currentTime";
Long order = refrenceTime - currentTime;
now save Long order in column named orderColumn and when you retrieve data
as orderBy(orderColumn) you will get what you need.
Just use reverse() on the array. For example, if you are storing the values in an array items[], call this.items.reverse():
ref.subscribe(snapshots => {
this.loading.dismiss();
this.items = [];
snapshots.forEach(snapshot => {
this.items.push(snapshot);
});
this.items.reverse();
},
For me it was limitToLast that worked. I also found out that limitLast is NOT a function:)
const query = messagesRef.orderBy('createdAt', 'asc').limitToLast(25);
The above is what worked for me.
PRINT in reverse order
Let's think outside the box... If your information will be printed directly into user's screen (without any content that needs to be modified in a consecutive order, like a sum or something), simply print from bottom to top.
So, instead of inserting each new block of content to the end of the print space (A += B), add that block to the beginning (A = B+A).
If you include the elements in a consecutively numbered list, the DOM can number them for you if you insert each element as a list item (<li>) inside an ordered list (<ol>).
This way you save space in your database by avoiding unnecessary reversed data.

How to get the row count from an azure database?

I am working on a Windows Store JavaScript application. The application uses data from Azure Mobile Services.
Consider the below code:
var itemTable = mobileService.getTable('item');
//item is the table name stored in the azure database
The code fetches the entire table item and saves it to a variable itemTable.
What code will return the number of rows present in itemTable?
What you're looking for is the includeTotalCount method on the table/query object (unfortunately it's missing from the documentation, I'll file a bug to the product team to have it fixed).
When you call read on the query object, it will return by default 50 (IIRC, the number may be different) elements from it, to prevent a naïve call from returning all elements in a very large table (thus either incurring the outbound bandwidth cost for reserved services, or hitting the quota for free ones). So getting all the elements in the table, and getting the length of the results may not be accurate.
If all you want is the number of elements in the table, you can use the code below: returning zero elements, and the total count.
var table = client.getTable('tableName');
table.take(0).includeTotalCount().read().then(function (results) {
var count = results.totalCount;
new Windows.UI.Popups.MessageDialog('Total count: ' + count).showAsync();
});
If you want to query some elements, and also include the total count (i.e., for paging), just add the appropriate take() and skip() calls, and also the includeTotalCount as well.
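The paging case can be sketched like this. It is a rough illustration that uses only the skip(), take(), includeTotalCount() and read() calls mentioned in this answer. The function name readPage and the shape of the returned object are hypothetical.

```javascript
// Sketch: read one page of rows plus the total row count in a single request.
// Assumption: table is the object returned by client.getTable('tableName').
function readPage(table, pageIndex, pageSize) {
  return table
    .skip(pageIndex * pageSize)   // rows to skip before this page
    .take(pageSize)               // rows to return for this page
    .includeTotalCount()          // ask the service to report the overall count
    .read()
    .then(function (results) {
      return {
        rows: Array.from(results),
        totalCount: results.totalCount,
        pageCount: Math.ceil(results.totalCount / pageSize)
      };
    });
}
```

A caller would use it as readPage(table, 0, 20).then(function (page) { ... }), reading page.rows for display and page.pageCount to build the pager.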
If anybody comes here interested in how to get only the totalCount in C# (like me), this is how you do it:
var table = MobileService.GetTable<T> ();
var query = table.Take(0).IncludeTotalCount();
IList<T> results = await query.ToListAsync ();
long count = ((ITotalCountProvider)results).TotalCount;
Credit goes to this blog post here
You need to execute read() on the table query and then get the length of the results.
var items, numItems;
itemTable.read().then(function(results) { items = results; numItems = items.length; });
If you are only showing a record count and not the entire results - you should just select the ID column to reduce the amount of data transmitted. I don't see a count() method available yet in the JS Query API to fill this need.
var itemTable = mobileService.getTable('item').select('itemID');

d3 seems to assume I know the column names of a csv?

I have csv files that are generated, and I am trying to load them into d3 to graph them. The column names are based on the data, so I essentially can't know them in advance. With testing, I am able to load this data and graph it all well and nice if I know the names of the columns...but I don't in my use case.
How can I handle this in d3? I can't seem to find anything to help/reference this online or in the documentation. I can see when I log to the console data[0] from d3.csv that there are two columns and the values read for them, but I don't know how to refer arbitrarily to column 1 or 2 of the data without knowing the name of the column ahead of time. I'd like to avoid that in general, knowing my timestamps are in column 1 and my data is in column 2, if that makes sense.
Edit, my answer uses d3.entries to help learn the name of the unknown column, and then continues to access all objects with that index:
d3.csv("export.csv", function(error, data) {
var mappedArray = d3.entries(data[0]);
var valueKey = mappedArray[1].key;
data.forEach(function(d) {
...
d.value = d[valueKey];
});
});
You can use the d3.entries() function to transform an associative array into another array that contains an associative array with key and value keys for each key-value pair.
I'm glad you figured it out, @cdietschrun.
Version 4 of D3 allows you to do this a little more simply. It introduces a columns property, which creates an array of column headers (i.e. the dataset's 'keys').
So, instead of using your code:
var mappedArray = d3.entries(data[0]),
valueKey = mappedArray[1].key;
... you can use:
var valueKey = data.columns[1]; // data.columns is the array of header names; index 1 is the second column
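To refer to columns purely by position, as the question asks, the header array can be indexed directly. This is a minimal sketch, assuming data is the array produced by D3 v4 or later, which carries a columns property listing the header names in file order. The function name pickByColumnIndex and the output field names time and value are hypothetical.

```javascript
// Sketch: read two columns by position without knowing their names in advance.
// Assumption: data.columns lists the CSV header names in file order (D3 v4+).
function pickByColumnIndex(data, timeIndex, valueIndex) {
  const timeKey = data.columns[timeIndex];
  const valueKey = data.columns[valueIndex];
  // Rows are objects keyed by header name, so look each value up via its key.
  return data.map(row => ({ time: row[timeKey], value: row[valueKey] }));
}
```

Calling pickByColumnIndex(data, 0, 1) then yields objects with fixed field names regardless of what the generated CSV called its columns.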
You can get keys (column names) using D3 v3 values() method like this:
const [dataValues] = d3.values(data)
const keys = Object.keys(dataValues)
console.log(keys);
I used d3.entries() as below:
for(i=0;i<data.length;i++)
{
var temp = d3.entries(data[i]);
for(j=0;j<temp.length;j++)
if(temp[j].key == selectedx)
myarray.push(temp[j].value);
}
Hope this helps :)
