Dealing with a JSON object too big to fit into memory


I have a dump of a Firebase database representing our Users table stored in JSON. I want to run some data analysis on it but the issue is that it's too big to load into memory completely and manipulate with pure JavaScript (or _ and similar libraries).
Up until now I've been using the JSONStream package to deal with my data in bite-sized chunks (it calls a callback once for each user in the JSON dump).
I've now hit a roadblock though because I want to filter my user ids based on their value. The "questions" I'm trying to answer are of the form "Which users x" whereas previously I was just asking "How many users x" and didn't need to know who they were.
The data format is like this:
{
    users: {
        123: {
            foo: 4
        },
        567: {
            foo: 8
        }
    }
}
What I want to do is essentially get the user ID (123 or 567 in the above) based on the value of foo. Now, if this were a small list it would be trivial to use something like _.each to iterate over the keys and values and extract the keys I want.
Unfortunately, since it doesn't fit into memory that doesn't work. With JSONStream I can iterate over it by using var parser = JSONStream.parse('users.*'); and piping it into a function that deals with it like this:
var stream = fs.createReadStream('my.json');
stream.pipe(parser);
parser.on('data', function(user) {
    // user is equal to { foo: bar } here
    // so it is trivial to do my filter
    // but I don't know which user ID owns the data
});
But the problem is that I don't have access to the key representing the star wildcard that I passed into JSONStream.parse. In other words, I don't know if { foo: bar} represents user 123 or user 567.
The question is twofold:
How can I get the current path from within my callback?
Is there a better way to be dealing with this JSON data that is too big to fit into memory?

I went ahead and edited JSONStream to add this functionality.
If anyone runs across this and wants to patch it similarly, you can replace line 83 which was previously
stream.queue(this.value[this.key])
with this:
var ret = {};
ret[this.key] = this.value[this.key];
stream.queue(ret);
In the code sample from the original question, rather than user being equal to { foo: bar } in the callback it will now be { uid: { foo: bar } }
Since this is a breaking change I didn't submit a pull request back to the original project but I did leave it in the issues in case they want to add a flag or option for this in the future.
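With the patched parser, each data event carries a single-key object, so the "which users" question reduces to reading that key. A minimal sketch, assuming the { uid: { foo: ... } } chunk shape described above; makeIdCollector and the foo > 5 predicate are hypothetical names chosen for illustration, and the chunks are fed in by hand rather than from a real stream:

```javascript
// Builds a handler for the patched JSONStream output, where each chunk
// looks like { "<uid>": { foo: ... } }. makeIdCollector is a hypothetical helper.
function makeIdCollector(predicate) {
  var ids = [];
  return {
    ids: ids,
    onData: function (chunk) {
      // Each chunk is expected to have exactly one key: the user ID.
      Object.keys(chunk).forEach(function (uid) {
        if (predicate(chunk[uid])) {
          ids.push(uid);
        }
      });
    }
  };
}

// Simulated chunks; in real use this would be parser.on('data', collector.onData).
var collector = makeIdCollector(function (user) { return user.foo > 5; });
collector.onData({ '123': { foo: 4 } });
collector.onData({ '567': { foo: 8 } });
console.log(collector.ids); // [ '567' ]
```

Only the matching IDs are retained, so memory use grows with the number of matches rather than with the size of the dump.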

Related

What is the best way to transfer certain data to another json document?

Explanation:
I am building an Electron app (plain JavaScript, not jQuery) and would like to add a function that converts one config into the "format" of another.
I currently read in the big file, from which I want to take the information, via "dialog.showOpenDialog", and I can also access the resulting JSON object.
Now to the problem:
The file I get via the dialog is 8000 lines long and contains individual settings that I want to pack into a smaller document of about 3000 lines.
Important: Individual settings have different names in the two documents, e.g. I want "ABCD: 23" from document 1 to appear in the other as "EFG: 23".
Now my two questions:
How can I best provide the smaller file for editing?
How can I convert the individual settings without going through each line separately?
bigconfig.json:
{
"EXAMPLE_CATEGORY": {
"setting1": 0,
"setting2": 1,
"setting3": 115,
"setting4": 0,
},
Smallerconfig.json
{
"EXAMPLE_CATEGORY": {
"setting7": 115,
"setting8": 0,
},
Edit: What I want to achieve is that I can create (and save) a modified file with the information I packed from the big file into the small one.
The smaller file should contain all 3000 fields.
Would really appreciate help... yesterday I did a lot of research and used the search engine for several hours.
Thanks in advance

The only way your smallerConfig object will know which new keys to use is if you define them beforehand. To do this, you must create an object that links the old key names to the new key names. These links would be best defined in one place. The code below holds these links in the conversionTable.
To build the smallerConfig object, you loop (using for...in) through the bigConfig object one key at a time. For each key, you check whether it also appears in the conversionTable (using the in operator). If a matching key is found, the key's value in the conversionTable becomes the new key in the smallerConfig object, and the value from bigConfig is copied over unchanged.
let bigConfig = {
    'EXAMPLE_CATEGORY': {
        'setting1': 0,
        'setting2': 1,
        'setting3': 115,
        'setting4': 0
    }
};
let smallerConfig = {
    'EXAMPLE_CATEGORY': {}
};
let conversionTable = {
    'setting3': 'setting7',
    'setting4': 'setting8'
};
// Iterate through the bigConfig object
for (let bigKey in bigConfig.EXAMPLE_CATEGORY) {
    // Check for a matching key in the conversionTable
    if (bigKey in conversionTable) {
        smallerConfig.EXAMPLE_CATEGORY[conversionTable[bigKey]] = bigConfig.EXAMPLE_CATEGORY[bigKey];
    }
}
console.log(smallerConfig);
Output will be:
{
    'EXAMPLE_CATEGORY': {
        'setting7': 115,
        'setting8': 0
    }
}
Finally:
Use JSON.parse() to convert the file contents from a string to a Javascript object.
Use JSON.stringify() to convert the Javascript object back to a string for writing to the new file.
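The two steps above can be combined with the conversion loop into one round trip from text to text. This is a sketch under assumptions: convertConfig is a hypothetical helper name, every top-level category is assumed to hold a flat object of settings, and a string literal stands in for the file contents that would come from the dialog:

```javascript
// convertConfig is a hypothetical helper; the table maps old key names to new ones.
// Assumes every top-level category holds a flat object of settings.
function convertConfig(bigConfig, conversionTable) {
  const smallerConfig = {};
  for (const category of Object.keys(bigConfig)) {
    smallerConfig[category] = {};
    for (const [oldKey, value] of Object.entries(bigConfig[category])) {
      // Copy only the settings that have an entry in the table, under the new name.
      if (Object.prototype.hasOwnProperty.call(conversionTable, oldKey)) {
        smallerConfig[category][conversionTable[oldKey]] = value;
      }
    }
  }
  return smallerConfig;
}

// Stand-in for the text of the big file (assumed to be valid JSON).
const bigText = '{"EXAMPLE_CATEGORY":{"setting1":0,"setting2":1,"setting3":115,"setting4":0}}';
const table = { setting3: 'setting7', setting4: 'setting8' };

// Parse, convert, and serialize again with two-space indentation.
const smallText = JSON.stringify(convertConfig(JSON.parse(bigText), table), null, 2);
console.log(smallText);
```

In the Electron app, bigText would be the string read from the selected file (for example with fs.readFileSync(path, 'utf8')) and smallText the string written to the new file (for example with fs.writeFileSync).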

How to pass an array of objects through a query string according to REST?

What is the best practice for passing an array of objects through a query string in REST style?
For example, the array:
examples[] = [
{
name: "foo",
value: "1"
},
{
name: "bar",
value: "2"
}
]
I thought about it:
/items?examples[0][name]=foo&examples[0][value]=1&examples[1][name]=bar&examples[1][value]=2
Are there other ways to do this?
Upd:
I need readable URL to show it to the user in the address field. It should display state of some filters in the table, I'm not sending it to the backend.

Since you're parsing this manually in JS, you could keep the structure you have and just write a parsing function:
var items = {};
location.search.split("?")[1].split("&").forEach((q) => {
    var [token, value] = q.split("="),
        // capture the numeric index and the property name, e.g. examples[0][name]
        [idx, key] = /\[([0-9]+)\]\[(\w+)\]/.exec(decodeURIComponent(token)).slice(1, 3);
    if (!items[idx]) {
        items[idx] = {};
    }
    items[idx][key] = decodeURIComponent(value);
});
This will yield you something with a structure like
{
    "0": {
        "key1": "data",
        "key2": "data"
    },
    "1": {
        "key1": "data",
        "key2": "data"
    }
}
If you need it to end up an array, it would be pretty easy to convert, but keeping it as an object with numeric strings for keys will prevent an error if it's not sequential.
Also, note there's no error checking or anything here, so if you're going to have query string params that aren't in that format, you'll want to test for that and handle them differently.
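The conversion to an array mentioned above could look like the following. It is a minimal sketch: the items literal is stand-in data in the shape the parsing function produces, with one index deliberately missing to show that gaps are closed up rather than left as holes:

```javascript
// items has the shape produced by the parsing function: numeric-string keys.
var items = {
  '0': { name: 'foo', value: '1' },
  '2': { name: 'bar', value: '2' } // index 1 is deliberately missing
};

// Sort the keys numerically, then map them to a dense array.
var list = Object.keys(items)
  .sort(function (a, b) { return Number(a) - Number(b); })
  .map(function (idx) { return items[idx]; });

console.log(list.length); // 2
```

Sorting by Number(a) - Number(b) matters once there are ten or more entries, because a plain string sort would place "10" before "2".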

You don't need to worry about how the data is passed to the backend; Angular does that for you.
As for your example, you probably want to update or save several items. In that case you pass the data in the request body rather than in the URL:
this.httpService.post(yourUrl, examples, yourHttpOptions).subscribe((response) => {
    // you manage your response data
});

REST does not care how you encode information into your identifiers. You can use any scheme you want, so long as it is consistent with the production rules defined by RFC 3986.
REST cares a little bit about how you share information about creating URIs, in the sense that that information should be shared in some readily standardizable form, like an HTML form or a URI Template.
We don't, to my knowledge, have a "readily standardizable form" that describes how to transform a JSON array into a query string.
But... REST does allow code on demand; embedding, for example, a bunch of JavaScript into a resource, where that JavaScript knows how to encode the JSON into the URI... that is in bounds, so long as the code on demand itself is referenced in a readily standardizable way (like we have with HTML and script tags).
In practice? URL-encode the JSON representation and put it onto the query string directly. That will get you through until you start to discover the real requirements that your URI design needs to support (requirements like: operators needing to be able to understand the access logs).
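A minimal sketch of that last suggestion, using the array from the question. The parameter name examples is taken from the question; everything else uses the standard encodeURIComponent, URLSearchParams, and JSON functions:

```javascript
var examples = [
  { name: 'foo', value: '1' },
  { name: 'bar', value: '2' }
];

// Encode: serialize to JSON, then percent-encode for use in a query string.
var query = '?examples=' + encodeURIComponent(JSON.stringify(examples));

// Decode: URLSearchParams percent-decodes the value; JSON.parse restores the array.
var decoded = JSON.parse(new URLSearchParams(query).get('examples'));
console.log(decoded[1].name); // bar
```

The trade-off is readability: the percent-encoded JSON is harder for a person to read than the bracket notation from the question.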

Implementing Keras Model into website with Keras.js

I have been trying to implement a basic Keras model generated in Python into a website using the Keras.js library. I have the model trained and exported into the model.json, model_weights.buf, and model_metadata.json files. I essentially copied and pasted test code from the GitHub page to see if the model would load in the browser, but unfortunately I am getting errors. Here is the test code. (EDIT: I fixed some errors, see below for remaining ones.)
var model = new KerasJS.Model({
    filepaths: {
        model: 'dist/model.json',
        weights: 'dist/model_weights.buf',
        metadata: 'dist/model_metadata.json'
    },
    gpu: true
});
model.ready()
    .then(function() {
        console.log("1");
        // input data object keyed by names of the input layers
        // or `input` for Sequential models
        // values are the flattened Float32Array data
        // (input tensor shapes are specified in the model config)
        var inputData = {
            'input_1': new Float32Array(data)
        };
        console.log("2 " + inputData);
        // make predictions
        return model.predict(inputData);
    })
    .then(function(outputData) {
        // outputData is an object keyed by names of the output layers
        // or `output` for Sequential models
        // e.g.,
        // outputData['fc1000']
        console.log("3 " + outputData);
    })
    .catch(function(err) {
        console.log(err);
        // handle error
    });
EDIT: So I changed my program around a little to be compatible with JS 5 (that was a stupid mistake on my part), and now I have encountered a different error. This error is caught and then is logged. The error I get is: Error: predict() must take an object where the keys are the named inputs of the model: input. I believe this problem arises because my data variable is not in the correct format. I thought that if my model took in a 28x28 array of numbers, then data should also be a 28x28 array so that it could correctly "predict" the right output. However, I believe I am missing something and that is why the error is being thrown. This question is very similar to mine, however it is in python and not JS. Again, any help would be appreciated.

Ok, so I figured out why this was happening. There were two problems. First, the data array needs to be flattened, so I wrote a quick function to take the 2D input and "flatten" it into a 1D array of length 784. Second, because I used a Sequential model, the key name of the data should not have been 'input_1', but rather just 'input'. This got rid of all the errors.
Now, to get the output information, we can simply store it in an array like this: var out = outputData['output']. Because I used the MNIST data set, out was a 1D array of length 10 that contained the probabilities of each digit being the user-written digit. From there, you can simply find the digit with the highest probability and use that as the model's prediction.
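The flattening and the final "highest probability" step might look like the sketch below. The helper names flatten2D and argmax are hypothetical, row-by-row ordering is an assumption (the answer does not say which order it used), and the image and probability vector are made-up stand-ins rather than real model data:

```javascript
// Flatten a 2D array (rows x cols) into a 1D Float32Array, row by row.
// Row-major order is an assumption here.
function flatten2D(rows) {
  var height = rows.length;
  var width = rows[0].length;
  var flat = new Float32Array(height * width);
  for (var r = 0; r < height; r++) {
    for (var c = 0; c < width; c++) {
      flat[r * width + c] = rows[r][c];
    }
  }
  return flat;
}

// Index of the largest value, i.e. the most probable digit.
function argmax(values) {
  var best = 0;
  for (var i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Stand-in data: a blank 28x28 image and a made-up probability vector.
var image = [];
for (var row = 0; row < 28; row++) {
  image.push(new Array(28).fill(0));
}
var inputData = { input: flatten2D(image) }; // 'input' key for a Sequential model
var fakeOutput = [0.01, 0.02, 0.05, 0.7, 0.02, 0.05, 0.05, 0.05, 0.03, 0.02];
console.log(inputData.input.length, argmax(fakeOutput)); // 784 3
```

In the real page, inputData would be passed to model.predict, and argmax would be applied to outputData['output'] instead of fakeOutput.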

JSON/jQuery - moving between child and parent objects

I'm currently building a tool for the card game Hearthstone. If you're familiar with the game it's basically a tool that allows you to add your in game deck to a list so that you can monitor what cards you have left at all times along with the chance of drawing X card etc. Nothing too fancy but since I am a huge novice to the world of web development I'm using it as an exercise to help me learn more.
Anyway on to my problem!
At the moment I have a JSON database that has every card in Hearthstone along with all of the different parameters associated with each card such as name, cost, playerClass etc.
I have figured out how to retrieve objects from the database but only via the name, since that's what players will use to search for the card they want to add to their deck. The problem I have at the moment is that the name of the card is a child of the card object, which is itself a child of the card set object (basic, classic, naxx, gvg etc).
I would like to get ALL of the card data back when I search for it by name but, try as I might, I can't figure out how to talk to a parent object via its child.
To start with, here is the search function that handles the user's input:
$.getJSON("json/AllSets.json", function(hearthStoneData) {
    $('.submit').click(function() {
        var searchValue = $('#name').val();
        var returnValue = getObjects(hearthStoneData, "name", searchValue);
        console.log(hearthStoneData);
        console.log(returnValue);
    });
});
and here is the request from the database:
function getObjects(obj, key, val) {
    var objects = [];
    for (var i in obj) {
        if (!obj.hasOwnProperty(i)) continue;
        if (typeof obj[i] == 'object') {
            objects = objects.concat(getObjects(obj[i], key, val));
        } else if (i == key && obj[key].toLowerCase() == val.toLowerCase()) {
            objects.push(obj[i]);
        }
    }
    return objects;
}
And finally here is an example of one of the JSON cards I am trying to talk to.
{
    "id": "EX1_066",
    "name": "Acidic Swamp Ooze",
    "type": "Minion",
    "faction": "Alliance",
    "rarity": "Common",
    "cost": 2,
    "attack": 3,
    "health": 2,
    "text": "<b>Battlecry:</b> Destroy your opponent's weapon.",
    "flavor": "Oozes love Flamenco. Don't ask.",
    "artist": "Chris Rahn",
    "collectible": true,
    "howToGetGold": "Unlocked at Rogue Level 57.",
    "mechanics": ["Battlecry"]
}
So the output I'm getting when I console log is something like this:
Object {Basic: Array[210], Classic: Array[387], Credits: Array[17], Curse of Naxxramas: Array[160], Debug: Array[58]…}
Basic: Array[210]
[0 … 99]
6: Object
artist: "Howard Lyon"
collectible: true
cost: 2
faction: "Neutral"
flavor: "This spell is much better than Arcane Implosion."
howToGet: "Unlocked at Level 1."
howToGetGold: "Unlocked at Level 28."
id: "CS2_025"
name: "Arcane Explosion"
playerClass: "Mage"
rarity: "Free"
text: "Deal $1 damage to all enemy minions."
type: "Spell"
As you can see, there are several nested arrays before you actually get to the card. I can sort of visualise in my head what I think needs to be done but I definitely don't feel certain. Also, a lot of the syntax has been copy/pasted and modified to suit my needs; I'm still a total beginner with this stuff and really have no idea how I would write a fix to this problem myself.
Any help is hugely appreciated.
Thanks

I think there's a problem with how the data is stored:
Every object needs to have a unique id
Every child object needs to hold a reference to its parent's id (parentId). This needs to be stored on insert or creation of the child object.
There needs to be a way to look up any object by id
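One way to read these three points against the AllSets.json layout from the question is to build lookup tables once, storing the set name with every card. This is only a sketch: buildCardIndex is a hypothetical helper, the set name is used as the parentId, the top-level values are assumed to be arrays of cards (as the console output suggests), and the set membership in the sample data is illustrative:

```javascript
// Builds lookup tables over data shaped like { setName: [card, ...] }.
// buildCardIndex is a hypothetical helper; parentId here is the set name.
function buildCardIndex(allSets) {
  var byId = Object.create(null);
  var byName = Object.create(null);
  Object.keys(allSets).forEach(function (setName) {
    allSets[setName].forEach(function (card) {
      var entry = { parentId: setName, card: card };
      byId[card.id] = entry;
      if (typeof card.name === 'string') {
        // Lower-cased so that lookups are case-insensitive.
        byName[card.name.toLowerCase()] = entry;
      }
    });
  });
  return { byId: byId, byName: byName };
}

// Stand-in data; which set each card belongs to is illustrative only.
var sample = {
  Basic: [{ id: 'CS2_025', name: 'Arcane Explosion', cost: 2 }],
  Classic: [{ id: 'EX1_066', name: 'Acidic Swamp Ooze', cost: 2 }]
};
var index = buildCardIndex(sample);
var hit = index.byName['acidic swamp ooze'];
console.log(hit.parentId, hit.card.id); // Classic EX1_066
```

A search by name then returns the whole card object together with its parent set. If two cards share a name, the later one overwrites the earlier one in byName, so storing an array per name would be needed in that case.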

Does anyone know the original JSON.stringify() function in JavaScript?

I need it because I recently made an app that saves an object containing all the user-generated data to localStorage, and encodes/decodes it with JSON.
The bizarre thing is that for some reason, Internet Explorer has poor, if not zero, support for JSON ("JSON is not defined"), and I'm not up to trying to re-create the entire function.
stringify:function(x){
    y='{'
    for(i in x){
        reg=RegExp('\'','g')
        y+=',\''+i.replace(reg,'\\\'')+'\':\''+x[i].replace(reg,'\\\'')+'\''
    }
    y=y.replace(',','')
    y+='}'
    return y
}
This was my first attempt, but I had forgotten that the object has other objects inside it, which themselves contain objects, and kept getting an error which basically stemmed from trying to call the method String.prototype.replace() of an Object.
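The error described above arises because the attempt treats every value as a string; nested objects need the function to call itself. The following is a minimal recursive sketch, not the source of the native JSON.stringify: simpleStringify and quote are hypothetical names, only strings, finite numbers, booleans, null, arrays and plain objects are handled, and there is no support for toJSON, Date objects, or cycle detection. The sample data is a made-up stand-in loosely shaped like the structure documented below:

```javascript
// Minimal recursive serializer: a sketch, not the native JSON.stringify.
// Handles strings, finite numbers, booleans, null, arrays and plain objects.
function simpleStringify(value) {
  if (value === null) return 'null';
  var type = typeof value;
  if (type === 'string') return quote(value);
  if (type === 'number') return isFinite(value) ? String(value) : 'null';
  if (type === 'boolean') return String(value);
  if (type !== 'object') return undefined; // functions and undefined are skipped
  var parts = [];
  var i, key, item;
  if (Object.prototype.toString.call(value) === '[object Array]') {
    for (i = 0; i < value.length; i++) {
      item = simpleStringify(value[i]);
      parts.push(item === undefined ? 'null' : item);
    }
    return '[' + parts.join(',') + ']';
  }
  for (key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      item = simpleStringify(value[key]); // recursion handles nested objects
      if (item !== undefined) parts.push(quote(key) + ':' + item);
    }
  }
  return '{' + parts.join(',') + '}';
}

// Escapes backslashes, double quotes and control characters.
function quote(str) {
  return '"' + str.replace(/[\\"\u0000-\u001f]/g, function (ch) {
    if (ch === '"') return '\\"';
    if (ch === '\\') return '\\\\';
    if (ch === '\n') return '\\n';
    if (ch === '\r') return '\\r';
    if (ch === '\t') return '\\t';
    return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
  }) + '"';
}

// Made-up stand-in data with nested objects.
var data = {
  LSPAR: { key: 'Example', list: { Example: 'Example' } },
  'Example.0': { link: 'http://a/"b"', title: 'T', removed: false, starred: true }
};
console.log(simpleStringify(data));
```

A cyclic structure would make this sketch recurse until the call stack overflows, which is one reason a maintained library is the safer choice for production use.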
Since I was kinda OCD with my code at the time, I actually do have the structure of the object saved in the source code:
/*
Link Engine.data: Object: {
X: Object: { [Each is a Paradigm, contains links]
link.0:{
link:[link],
title:[title],
removed:[true/false],
starred:[true/false]
},
...
},
LSPAR: [Reserved] Object: { [Paradigm list and pointer contained here]
key:[key], (this controls X)
list:{
[listitem]:[listitem],
...
}
},
#CONFIG: [Reserved] Object: { [contains miscellaneous Config data]
property:boolean/number/string,
...
}
*/
That's the basic data structure, ... represents a repeating pattern.
Edit 2019
This whole question is an abomination, but I want to at least attempt to fix the bothersome documentation I wrote for my poorly-designed data structure so that it's more coherent:
Link {
string link
string title
boolean removed
boolean starred
}
Config {
...
/* Just has a bunch of arbitrary fields; not important */
}
WArray {
string... [paradigm-name]
/* Wasteful Array; an object of the form
* { "a":"a", "b":"b", ... }
*/
}
Paradigm { /* analogous to above "X: Object: {..." nonsense */
Link... [paradigm-name].[id]
/* each key is of the form [paradigm-name].[id] and stores a Link
* e.g. the first link in the "Example" paradigm would
* be identified by the key "Example.0"
*/
}
ParadigmList {
string key /* name of selected paradigm */
WArray list /* list of paradigm names */
}
LinkEngineData {
Paradigm... [paradigm-name]
ParadigmList LSPAR
Config #CONFIG /* actual field name */
}
Hopefully now you can sort of parse what's going on. This syntax:
type... format
is meant to convey that objects of type type appear many times, like an array, except it isn't an array. As such, the fields don't have a name that is set-in-stone, hence
format: [descriptor1]text[descriptor2]text...
a format is used in place of an actual field name. This is what happens when you try to create a data structure without knowing what a data structure is. I did use the words "data" and "structure" adjacently in the original question, but it was pure coincidence. I didn't mean it like "this is the data structure I used"; I meant it like "this is the structure of my data".
Anyways, here's how I would design it today:
Link {
string url
string title
boolean starred
}
LinkGroup {
string name
Link[] links
}
Config {
... /* has whatever it needs to have */
}
Data {
int selGroup
LinkGroup[] groups
Config config
}
That is all.
If someone has the source code of the actual JSON.stringify function, or knows a way to replicate it, then please post an answer.
EDIT (2013, probably)
I ended up dropping IE support and completely redesigning the app from the ground up; the new version is hosted here. And it works with IE9 out of the box!

I think this is the best replacement: http://bestiejs.github.com/json3/
It claims to be better than Crockford's JSON 2 for the following reasons (from their site):
JSON 3...
Correctly serializes primitive wrapper objects (Issue #28).
Throws a TypeError when serializing cyclic structures (JSON 2 recurses until the call stack overflows).
Utilizes feature tests to detect broken or incomplete native JSON implementations (JSON 2 only checks for the presence of the native functions). The tests are only executed once at runtime, so there is no additional performance cost when parsing or serializing values.
In contrast to JSON 2, JSON 3 does not...
Add toJSON() methods to the Boolean, Number, and String prototypes. These are not part of any standard, and are made redundant by the design of the stringify() implementation.
Add toJSON() or toISOString() methods to Date.prototype. See the note about date serialization below.

Try https://github.com/douglascrockford/JSON-js

I think you should use the json2.js library:
https://github.com/douglascrockford/JSON-js
