How to manage large JavaScript data files?

I'm developing a Cordova project. To store my data, I'm using JavaScript files like this:
var groups = [
  {
    id: 1,
    parent_id: 0,
    name: "Group 1"
  },
  {
    id: 2,
    parent_id: 0,
    name: "Group 2"
  }
];
My first problem is that I don't know whether this is a good approach or whether there are better ones.
To use this data, I simply loop through the variable, but that becomes a problem with large data volumes, for example thousands of records. It's hard to handle this amount of data in a .js file. What should I do?

A possible solution is to use a database such as IndexedDB (if your app is completely offline) or Firebase (if your app uses the internet), so that you can query for just the data you require.
Even DOM Storage (localStorage) is an option, but you would still have to loop over an array, and it typically cannot store more than about 5 MB of data.
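If the records stay in a .js file for the time being, one way to avoid looping over the whole array on every lookup is to build an index once. This is only a minimal sketch, assuming the `groups` shape from the question; `buildIndexes`, `byId`, `byParent`, and the third sample record are hypothetical and not part of the original post.

```javascript
// Sample data in the shape used in the question.
const groups = [
  { id: 1, parent_id: 0, name: "Group 1" },
  { id: 2, parent_id: 0, name: "Group 2" },
  { id: 3, parent_id: 1, name: "Group 1.1" } // hypothetical child record
];

// Build both indexes in a single pass over the array.
function buildIndexes(records) {
  const byId = new Map();     // id -> record
  const byParent = new Map(); // parent_id -> array of child records
  for (const record of records) {
    byId.set(record.id, record);
    if (!byParent.has(record.parent_id)) {
      byParent.set(record.parent_id, []);
    }
    byParent.get(record.parent_id).push(record);
  }
  return { byId, byParent };
}

const { byId, byParent } = buildIndexes(groups);

// Lookups no longer scan the whole array.
const groupTwo = byId.get(2);
const rootGroups = byParent.get(0) || [];
```

The index costs extra memory and has to be rebuilt (or updated) when records change, so for data that grows well beyond a few thousand records a real database is probably still the better fit.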


How to navigate nested objects of unknown depth?

I'm making a note-taking app and I've decided to store all the notes and their structure in a JSON file. In JavaScript, I get the JSON with AJAX, parse it, and output it on the website.
My note structure is an array of objects that can be nested, like this (a note has a "content" attribute; a folder has a "children" array of objects, which can be empty if the folder should be empty):
const data = {
  entries: [
    { name: "Some note", content: "This is a test note" },
    {
      name: "folder",
      children: [
        { name: "Bread recipe", content: "Mix flour with water..." },
        {
          name: "Soups",
          children: [
            { name: "Pork soup", content: "Add meat, onion..." },
            { name: "Chicken soup", content: "....." }
          ]
        }
      ]
    }
  ]
};
Listing the root directory is simple: I just loop through the array, which only outputs the top-level records:
for (const entry of data.entries) {
  const li = document.createElement("li");
  li.textContent = entry.name;
  if (entry.children) {
    li.className = "folder";
  } else {
    li.className = "file";
  }
}
But what about the folders? How should I proceed in listing the folders if the depth of nesting is unknown? And how do I target a specific folder? Should I add unique IDs to every object so I can filter the array with them? Or should I store some kind of depth information in a variable all the time?
You're making this more difficult for yourself by saving data to a JSON file. That is not a good approach. What you need to do is design a database schema appropriate for your data and create an API that outputs a predictable pattern of data that your client can work with.
I would suggest having a Folder resource and a Note resource linked through a one-to-many relationship. Each Folder resource can have many associated Note entries, but each Note has only one Folder that it is linked to. I suggest using an ORM, because most make it easy to eager load related data. For instance, if you choose Laravel you can use Eloquent, and then getting all notes for a folder is as easy as:
$folderWithNotes = Folder::with('notes')->where('name', 'school-notes')->get();
Knowing PHP is beside the point. You should still be able to see the logic of that.
If you create a database and build a server-side API to handle your data, you will end up with JSON on your client side that has a predictable format and is easy to work with.
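Whichever storage is used, a tree of unknown depth can be walked with a recursive function. The following is only a sketch under the assumption that folders carry a `children` array and notes a `content` string, as in the question; `walkEntries` and `findByPath` are hypothetical helper names, and the sample tree is shortened.

```javascript
// Note tree in the shape described in the question.
const data = {
  entries: [
    { name: "Some note", content: "This is a test note" },
    {
      name: "folder",
      children: [
        { name: "Bread recipe", content: "Mix flour with water..." },
        {
          name: "Soups",
          children: [{ name: "Pork soup", content: "Add meat, onion..." }]
        }
      ]
    }
  ]
};

// Recursively visit every entry, recording its depth and its path from the root.
function walkEntries(entries, depth = 0, parentPath = []) {
  const rows = [];
  for (const entry of entries) {
    const path = [...parentPath, entry.name];
    const isFolder = Array.isArray(entry.children);
    rows.push({
      name: entry.name,
      type: isFolder ? "folder" : "file",
      depth,
      path: path.join("/")
    });
    if (isFolder) {
      // Descend into the folder; the recursion ends at empty folders and notes.
      rows.push(...walkEntries(entry.children, depth + 1, path));
    }
  }
  return rows;
}

// Follow a list of names such as ["folder", "Soups"] to target one folder.
function findByPath(entries, names) {
  let current = { children: entries };
  for (const name of names) {
    if (!current || !Array.isArray(current.children)) return undefined;
    current = current.children.find((entry) => entry.name === name);
  }
  return current;
}

const rows = walkEntries(data.entries);
const soups = findByPath(data.entries, ["folder", "Soups"]);
```

Each row could then be turned into an `li` element with its `type` as the class name. Addressing folders by path works without extra IDs, but it assumes names are unique within a folder; unique IDs would be the more robust choice if entries can be renamed.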

Synchronize Data across multiple occasionally-connected-clients using EventSourcing (NodeJS, MongoDB, JSON)

I'm facing a problem implementing data-synchronization between a server and multiple clients.
I read about Event Sourcing and I would like to use it to accomplish the syncing-part.
I know that this is not a technical question, more of a conceptual one.
I would just send all events live to the server, but the clients are designed to be used offline from time to time.
This is the basic concept:
The server stores all events that every client should know about. It does not replay those events to serve the data, because its main purpose is to sync the events between the clients, enabling them to replay all events locally.
Each client has its own JSON store, which also keeps all events and rebuilds the different collections from the stored/synced events.
As clients can modify data offline, it is not that important to have consistent syncing cycles. With this in mind, the server should handle conflicts when merging the different events and ask the specific user in the case of a conflict.
So, the main problem for me is to determine the diffs between the client and the server, to avoid sending all events to the server. I'm also having trouble with the order of the synchronization process: push changes first, or pull changes first?
What I've currently built is a default MongoDB implementation on the server side, which isolates all documents of a specific user group in all my queries (currently only handling authentication and server-side database work).
On the client, I've built a wrapper around a NeDB store, enabling me to intercept all query operations to create and manage events per-query, while keeping the default query behaviour intact. I've also compensated for the different ID systems of neDB and MongoDB by implementing custom ids that are generated by the clients and are part of the document data, so that recreating a database won't mess up the IDs (When syncing, these IDs should be consistent across all clients).
The event format will look something like this:
{
  type: 'create', // one of 'create', 'update', 'remove'
  collection: 'CollectionIdentifier',
  target: 'ID', // the global custom ID of the updated document
  data: {}, // the inserted/updated data
  timestamp: '',
  creator: '' // some way to identify the author of the change
}
To save some memory on the clients, I will create snapshots after a certain number of events, so that fully replaying all events will be more efficient.
So, to narrow down the problem: I can replay events on the client side, and I can create and maintain events on both the client and the server. Merging the events on the server should not be a problem either. Replicating a whole database with existing tools is not an option, as I'm only syncing certain parts of the database (not even entire collections, since documents are assigned to different groups in which they should sync).
But what I am having trouble with is:
The process of determining what events to send from the client when syncing (Avoid sending duplicate events, or even all events)
Determining what events to send back to the client (Avoid sending duplicate events, or even all events)
The right order of syncing the events (Push/Pull changes)
Another question I would like to ask is whether storing the updates directly on the documents, in a revision-like style, would be more efficient.
If my question is unclear, a duplicate (I found some questions, but they didn't help in my scenario), or missing something, please leave a comment. I will maintain it as best I can to keep it simple; I've just written down everything that could help you understand the concept.
Thanks in advance!
This is a very complex subject, but I'll attempt some form of answer.
My first reflex upon seeing your diagram is to think of how distributed databases replicate data between themselves and recover in the event that one node goes down. This is most often accomplished via gossiping.
Gossip rounds make sure that data stays in sync. Time-stamped revisions are kept on both ends and merged on demand, say when a node reconnects, or simply at a given interval (publishing bulk updates via a socket or the like).
Database engines like Cassandra or Scylla use 3 messages per merge round.
Data in Node A
{ id: 1, timestamp: 10, data: { foo: '84' } }
{ id: 2, timestamp: 12, data: { foo: '23' } }
{ id: 3, timestamp: 12, data: { foo: '22' } }
Data in Node B
{ id: 1, timestamp: 11, data: { foo: '50' } }
{ id: 2, timestamp: 11, data: { foo: '31' } }
{ id: 3, timestamp: 8, data: { foo: '32' } }
Step 1: SYN
Node A lists the ids and last upsert timestamps of all its documents (feel free to change the structure of these data packets; here I'm using verbose JSON to better illustrate the process).
Node A -> Node B
[ { id: 1, timestamp: 10 }, { id: 2, timestamp: 12 }, { id: 3, timestamp: 12 } ]
Step 2: ACK
Upon receiving this packet, Node B compares the received timestamps with its own. For each document, if Node B's copy is older, it places only the id and timestamp in the ACK payload (a request for the newer data); if Node B's copy is newer, it places the document along with its data. If the timestamps are the same, it does nothing.
Node B -> Node A
[ { id: 1, timestamp: 11, data: { foo: '50' } }, { id: 2, timestamp: 11 }, { id: 3, timestamp: 8 } ]
Step 3: ACK2
Node A updates its documents where ACK data is provided, then sends back the latest data to Node B for those where no ACK data was provided.
Node A -> Node B
[ { id: 2, timestamp: 12, data: { foo: '23' } }, { id: 3, timestamp: 12, data: { foo: '22' } } ]
That way, both nodes now have the latest data merged both ways (in case the client did offline work), without having to send all your documents.
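The three messages can be sketched in a few functions. This is an illustration of the exchange described above, not how Cassandra or Scylla implement it; all function names are hypothetical, each node is modelled as a `Map`, and the sketch assumes Node A's SYN covers every document of interest (documents that exist only on Node B would need a second round started by B).

```javascript
// Each node is a Map from document id to { id, timestamp, data }.
function makeNode(docs) {
  return new Map(docs.map((doc) => [doc.id, doc]));
}

// Step 1 (SYN): the initiator lists ids and timestamps only.
function buildSyn(node) {
  return [...node.values()].map(({ id, timestamp }) => ({ id, timestamp }));
}

// Step 2 (ACK): the receiver answers with data where its copy is newer,
// and with a bare digest where its copy is older (a request for data).
function buildAck(node, syn) {
  const ack = [];
  for (const digest of syn) {
    const local = node.get(digest.id);
    if (!local) {
      // Assumption: an unknown document is requested with timestamp 0.
      ack.push({ id: digest.id, timestamp: 0 });
    } else if (local.timestamp > digest.timestamp) {
      ack.push({ ...local }); // shallow copy including data
    } else if (local.timestamp < digest.timestamp) {
      ack.push({ id: local.id, timestamp: local.timestamp });
    }
    // Equal timestamps: nothing to exchange.
  }
  return ack;
}

// Step 3 (ACK2): the initiator applies received data and returns the
// documents that the receiver asked for.
function applyAckAndBuildAck2(node, ack) {
  const ack2 = [];
  for (const entry of ack) {
    if (entry.data !== undefined) {
      node.set(entry.id, entry);
    } else {
      ack2.push({ ...node.get(entry.id) });
    }
  }
  return ack2;
}

// The receiver applies the ACK2 payload.
function applyAck2(node, ack2) {
  for (const doc of ack2) node.set(doc.id, doc);
}

const nodeA = makeNode([
  { id: 1, timestamp: 10, data: { foo: '84' } },
  { id: 2, timestamp: 12, data: { foo: '23' } },
  { id: 3, timestamp: 12, data: { foo: '22' } }
]);
const nodeB = makeNode([
  { id: 1, timestamp: 11, data: { foo: '50' } },
  { id: 2, timestamp: 11, data: { foo: '31' } },
  { id: 3, timestamp: 8, data: { foo: '32' } }
]);

const syn = buildSyn(nodeA);                    // Node A -> Node B
const ack = buildAck(nodeB, syn);               // Node B -> Node A
const ack2 = applyAckAndBuildAck2(nodeA, ack);  // Node A -> Node B
applyAck2(nodeB, ack2);
```

With the sample data above, the three payloads match the ones listed in the steps, and both maps end up holding the newer revision of each document. Note that "newest timestamp wins" silently discards the older edit, so real conflict handling would have to hook in where the timestamps are compared.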
In your case, your source of truth is your server, but you could easily implement peer-to-peer gossiping between your clients with WebRTC, for example.
Hope this helps in some way.
I think that the best solution to avoid all the event order and duplication issues is to use the pull method. In this way, every client maintains its last imported event state (with a tracker, for example) and asks the server for the events generated after that last one.
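A rough sketch of that pull approach follows. It assumes the server stamps every stored event with an increasing sequence number and that the client's tracker is simply the last sequence number it imported; `ServerLog`, `Client`, `eventsAfter`, and `lastSeq` are hypothetical names, and the server is an in-memory stand-in rather than a MongoDB collection.

```javascript
// Hypothetical in-memory stand-in for the server's event store.
class ServerLog {
  constructor() {
    this.events = [];
  }
  // The server assigns a strictly increasing sequence number on append.
  append(event) {
    const stored = { ...event, seq: this.events.length + 1 };
    this.events.push(stored);
    return stored;
  }
  // Return only the events the caller has not imported yet.
  eventsAfter(lastSeq) {
    return this.events.filter((event) => event.seq > lastSeq);
  }
}

class Client {
  constructor(name) {
    this.name = name;
    this.lastSeq = 0;  // tracker: sequence number of the last imported event
    this.applied = []; // events replayed locally so far
  }
  // Ask the server for everything after the tracker and advance it.
  pull(server) {
    const fresh = server.eventsAfter(this.lastSeq);
    for (const event of fresh) {
      this.applied.push(event);
      this.lastSeq = event.seq;
    }
    return fresh.length;
  }
}

const server = new ServerLog();
server.append({ type: 'create', target: 'doc-1', creator: 'alice' });
server.append({ type: 'update', target: 'doc-1', creator: 'bob' });

const client = new Client('carol');
const firstPull = client.pull(server);  // imports both events
server.append({ type: 'remove', target: 'doc-1', creator: 'alice' });
const secondPull = client.pull(server); // imports only the new event
```

Using a server-assigned sequence instead of client timestamps is a design assumption made here so that the tracker does not depend on the clocks of offline clients.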
An interesting problem will be detecting broken business invariants. For that, you could also store the log of applied commands on the client and, in case of a conflict (events were generated by other clients), retry the execution of commands from the command log. You need to do that because some commands will not succeed after re-execution; for example, a client saves a document after another user has deleted that document at the same time.

Where can I store a large amount of "business" data/object in my JS project?

I work with AngularJS. I have a factory providing services about buildings.
I have a lot of buildings (around 50-60), each with a lot of properties and sub-properties (around 15-20, more or less complex).
Business requirements force me to store all the data about buildings in JS format.
So I created an object like that:
var Buildings = {
  myFirstBuilding: {
    id: 1,
    name: 'Building 1',
    quantity: 0,
    costs: {
      myFirstCost: {
        name: 'Cost 1',
        value: 10,
        isAvailable: 0
      },
      mySecondCost: {
        name: 'Cost 2',
        value: 20,
        isAvailable: 0
      }
    },
    workersAvailable: 0,
    type: 'Type 1'
    // other properties...
  },
  mySecondBuilding: {
    id: 1,
    name: 'Building 2',
    quantity: 3
    // and so on...
  }
};
I can update the structure of this object if needed. The only requirement is to be able to store it in the user's browser and to convert it to JSON when needed.
Where should I store this object in my project? Can I reference it from an external file in my factory? Should I include it directly in my factory?
Do you see any inconsistencies in this object as it's actually made in my example above?
You've got 3 options as far as I see it:
Use an Angular value or constant to store the data, if you just need to render it, don't require database search/filter capabilities, and it's small enough.
If it's large, but not huge, and you don't need database search/filter, store it in local storage and read it using an Angular service.
If it's very large and/or you need db search/filter, use IndexedDB in the browser. As some implementations of IndexedDB are pretty broken, you should use a wrapper library such as
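For the local storage option, a minimal sketch of saving and restoring the object as JSON could look like this. It assumes a storage object with the Web Storage methods `getItem` and `setItem` (in a browser, `window.localStorage`); the key name, the helper names, and the in-memory stand-in are hypothetical, and in an AngularJS app the two functions would typically live inside a service.

```javascript
const STORAGE_KEY = "buildings"; // hypothetical key name

// Serialize the whole object to JSON and store it under one key.
function saveBuildings(storage, buildings) {
  storage.setItem(STORAGE_KEY, JSON.stringify(buildings));
}

// Read the JSON back; Web Storage returns null for a missing key.
function loadBuildings(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw === null ? null : JSON.parse(raw);
}

// Minimal in-memory stand-in so the sketch also runs outside a browser.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => {
      items.set(key, String(value));
    }
  };
}

const storage = createMemoryStorage(); // in a browser: window.localStorage
saveBuildings(storage, {
  myFirstBuilding: { id: 1, name: "Building 1", quantity: 0 }
});
const restored = loadBuildings(storage);
```

Because the object is plain data (no functions or circular references), it survives the JSON round trip unchanged.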

Get count of unique values of properties from JSON API response

I have a JSON API served by a Ruby on Rails backend. One of the endpoints returns an array of objects structured like this:
[
  {
    "title_slug": "16-gaijin-games-bittrip-beat-linux-tar-gz",
    "platform": "Linux",
    "format": ".tar.gz",
    "title": "BIT.TRIP BEAT",
    "bundle": "Humble Bundle for Android 3",
    "unique_games": 9
  },
  {
    "title_slug": "17-gaijin-games-bittrip-beat-linux-deb",
    "platform": "Linux",
    "format": ".deb",
    "title": "BIT.TRIP BEAT",
    "bundle": "Humble Bundle for Android 3",
    "unique_games": 9
  }
]
Because there are different types of downloads for a single title, the "title" is not unique across objects. I would like a count of only the unique titles.
I was thinking of doing it in Ruby on Rails in the model and just sending it in the JSON response, but that does not work because it needs the whole array to count them. I am using Angular on the front end, so I am thinking it needs to be done in the controller. I also filter the response in a table and want the displayed number of unique titles to update accordingly.
Here's a screenshot of the page this is going on to get better perspective.
Thank you very much,
Thomas Le
BTW, this is a site I am developing that is not going to be a public website. It is a database site that holds all the data on the bundles I have bought from IndieGala and HumbleBundle. I am not going to make these links available to the public. I am making it more functional than the bare minimum because it is an open source project that I have on GitHub that people can use themselves locally.
Just in case people were wondering why I have Humble Bundle stuff listed on the image.
Aggregate your data in an object keyed by the unique key. Then you get access to information on duplicates and the count.
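var i, title,
    uniqueResults = {};
for (i = 0; i < results.length; i++) {
  title = results[i].title;
  if (!uniqueResults[title]) {
    uniqueResults[title] = [];
  }
  // collect every entry that shares this title
  uniqueResults[title].push(results[i]);
}
var uniqueCount = Object.keys(uniqueResults).length;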
Maybe it would be better to restructure your data at the same time, so you can also get those items easily later as well as a quick lookup for the number of titles, e.g. in JavaScript
// assuming arrayOfObjects
var objectOfTitles = {},
    i;
for (i = 0; i < arrayOfObjects.length; ++i) {
  if (!objectOfTitles.hasOwnProperty(arrayOfObjects[i].title)) {
    objectOfTitles[arrayOfObjects[i].title] = [];
  }
  objectOfTitles[arrayOfObjects[i].title].push(arrayOfObjects[i]);
}
var numberOfTitles = Object.keys(objectOfTitles).length;
// then say you choose a title you want, and you can do
// objectOfTitles[chosenTitle] to get entries with just that title
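If only the count is needed, a `Set` gives a shorter variant in engines that support ES2015. This is a sketch rather than part of either answer; `countUniqueTitles` is a hypothetical helper, the rows are trimmed to a few fields, and the third row is invented to make the result visible. Running it on the filtered array gives the number that matches what the table currently shows.

```javascript
// Trimmed sample rows in the shape returned by the API.
const downloads = [
  { title: "BIT.TRIP BEAT", platform: "Linux", format: ".tar.gz" },
  { title: "BIT.TRIP BEAT", platform: "Linux", format: ".deb" },
  { title: "Example Game", platform: "Windows", format: ".exe" } // hypothetical row
];

// A Set keeps each title once, so its size is the number of unique titles.
function countUniqueTitles(rows) {
  return new Set(rows.map((row) => row.title)).size;
}

// Count over everything, then over a filtered view of the same data.
const total = countUniqueTitles(downloads);
const linuxOnly = countUniqueTitles(
  downloads.filter((row) => row.platform === "Linux")
);
```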

Best way to store json object from rest api in an offline cordova app

I'm building an AngularJS application wrapped with Cordova for Android and iOS.
All my data comes from my REST API.
I have URLs like my.api/groups/1/items to get the list of items
[
  {
    id: 5,
    type: "foo",
    title: "Item 5",
    group: 1,
    body: "<p>Some content</p> ",
    img: ""
  },
  {
    id: 6,
    // ...
  }
]
and my.api/items/5 to get a specific item
{
  id: 5,
  type: "foo",
  title: "Item 5",
  group: 1,
  body: "<p>Some content</p> ",
  img: ""
}
I retrieve my data with Restangular and it works perfectly:
Restangular.one('group', id).getList('items').then(function(data){
  $scope.items = data;
});
Now I want this data to be available offline and refreshed from time to time.
localStorage is not possible because I'll have more than 5 MB of data and images.
I see a lot of posts about file API but does it mean I need to have a file for each item?
eg: item1.json, item2.json and a file for the list items.json
I think there is a better solution than having 500+ different files...
How to handle images? Do I need to parse my api, download images and replace with local links?
Why not use WebSQL? As a web standard it is dead, but it works well on mobile and works great in PhoneGap/Cordova. As for images, I'd probably store them as binary files on the file system.
