Implementing Keras Model into website with Keras.js - javascript

I have been trying to implement a basic Keras model, generated in Python, in a website using the Keras.js library. I have the model trained and exported into the model.json, model_weights.buf, and model_metadata.json files. I essentially copied and pasted test code from the GitHub page to see if the model would load in the browser, but unfortunately I am getting errors. Here is the test code. (EDIT: I fixed some errors; see below for the remaining ones.)
var model = new KerasJS.Model({
  filepaths: {
    model: 'dist/model.json',
    weights: 'dist/model_weights.buf',
    metadata: 'dist/model_metadata.json'
  },
  gpu: true
});

model.ready()
  .then(function() {
    console.log("1");
    // input data object keyed by names of the input layers
    // or `input` for Sequential models
    // values are the flattened Float32Array data
    // (input tensor shapes are specified in the model config)
    var inputData = {
      'input_1': new Float32Array(data)
    };
    console.log("2 " + inputData);
    // make predictions
    return model.predict(inputData);
  })
  .then(function(outputData) {
    // outputData is an object keyed by names of the output layers
    // or `output` for Sequential models
    // e.g.,
    // outputData['fc1000']
    console.log("3 " + outputData);
  })
  .catch(function(err) {
    console.log(err);
    // handle error
  });
EDIT: So I changed my program around a little to be compatible with ES5 (that was a careless mistake on my part), and now I have encountered a different error, which is caught and then logged. The error I get is: Error: predict() must take an object where the keys are the named inputs of the model: input. I believe this problem arises because my data variable is not in the correct format. I thought that if my model took in a 28x28 array of numbers, then data should also be a 28x28 array so that it could correctly "predict" the right output. However, I believe I am missing something, and that is why the error is being thrown. This question is very similar to mine, but it is in Python rather than JS. Again, any help would be appreciated.

Ok, so I figured out why this was happening. There were two problems. First, the data array needs to be flattened, so I wrote a quick function to take the 2D input and "flatten" it into a 1D array of length 784. Second, because I used a Sequential model, the key name of the data should not have been 'input_1' but rather just 'input'. This got rid of all the errors.
Now, to get the output information, we can simply store it in an array like this: var out = outputData['output']. Because I used the MNIST data set, out was a 1D array of length 10 containing the probability of each digit being the user-written digit. From there, you can simply find the digit with the highest probability and use that as the prediction of the model.
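For reference, here is a minimal sketch of those two fixes put together. The 28x28 MNIST shape comes from my setup above; the flatten helper and variable names are just illustrative, and model and data are the same variables as in the test code.

// flatten a 28x28 2D array into a 1D array of length 784
function flatten(grid) {
  var flat = [];
  for (var i = 0; i < grid.length; i++) {
    for (var j = 0; j < grid[i].length; j++) {
      flat.push(grid[i][j]);
    }
  }
  return flat;
}

// for a Sequential model the key must be 'input', not 'input_1'
var inputData = {
  'input': new Float32Array(flatten(data))
};

model.predict(inputData).then(function(outputData) {
  var out = outputData['output']; // Float32Array of 10 probabilities for MNIST
  var best = 0;
  for (var k = 1; k < out.length; k++) {
    if (out[k] > out[best]) best = k;
  }
  console.log(best); // predicted digit
});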

Related

How can you store and modify large datasets in node.js?

Basics
So basically I have written a program which generates test data for MongoDB in Node.
The problem
For that, the program reads a schema file and generates a specified amount of test data from it. The problem is that this data can eventually become quite big (think about creating 1M users, with all the properties they need, and 20M chat messages, with userFrom and userTo), and the program has to keep all of it in RAM to modify/transform/map it and only afterwards save it to a file.
How it works
The program works like that:
Read schema file
Create test data from the schema and store it in a structure (look down below for the structure)
Run through this structure and link every object's referenceTo to a random object with a matching referenceKey.
Transform the object structure in a string[] of MongoDB insert statements
Store that string[] in a file.
This is the structure of the generated test data:
export interface IGeneratedCollection {
  dbName: string,                   // Name of the database
  collectionName: string,           // Name of the collection
  documents: IGeneratedDocument[]   // One collection has many documents
}

export interface IGeneratedDocument {
  documentFields: IGeneratedField[] // One document has many fields (which are recursive, because of nested documents)
}

export interface IGeneratedField {
  fieldName: string,                // Name of the property
  fieldValue: any,                  // Value of the property (can also be IGeneratedField, IGeneratedField[], ...)
  fieldNeedsQuotations?: boolean,   // If the value needs to be saved with " ... "
  fieldIsObject?: boolean,          // If the value is an object (stored as IGeneratedField[]) (to handle it differently when transforming to MongoDB inserts)
  fieldIsJsonObject?: boolean,      // If the value is a plain JSON object
  fieldIsArray?: boolean,           // If the value is an array of objects (stored as an array of IGeneratedField[])
  referenceKey?: number,            // Field flagged to be a key
  referenceTo?: number              // Value gets set to a random object with a matching referenceKey
}
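For illustration (the values here are hypothetical), a userFrom field on a message document, still waiting to be linked to a user with a matching referenceKey, would look something like this:

const userFromField = {
  fieldName: "userFrom",
  fieldValue: null,           // filled in later with the id of a matching user
  fieldNeedsQuotations: true, // the resolved value gets saved as a quoted string
  referenceTo: 1              // linked to a random object whose referenceKey is 1
};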
Actual data
So in the example with 1M Users and 20M messages it would look like this:
1x IGeneratedCollection (collectionName = "users")
  1Mx IGeneratedDocument
    10x IGeneratedField (for example, each user has 10 fields)
1x IGeneratedCollection (collectionName = "messages")
  20Mx IGeneratedDocument
    3x IGeneratedField (message, userFrom, userTo)
Which would result in 70M instances of IGeneratedField (1M x 10 + 20M x 3 = 70M).
Conclusion
This is obviously a lot for the RAM to handle, as it needs to store all of that at the same time.
Temporary Solution
It now works like that:
Generate 500 documents (rows in SQL) at a time
JSON.stringify those 500 documents and put them in a SQLite table with the schema (dbName STRING, collectionName STRING, value JSON) (see the sketch after this list)
Remove those 500 documents from JS and let the garbage collector do its thing
Repeat until all data is generated and stored in the SQLite table
Take one row (each containing 500 documents) at a time, apply JSON.parse and search for keys in it
Repeat until all data is queried and all keys retrieved
Take one row at a time, apply JSON.parse and search for key references in it
Apply JSON.stringify and update the row if necessary (if key references were found and resolved)
Repeat until all data is queried and all keys are resolved
Take one row at a time, apply JSON.parse and transform the documents into valid SQL/MongoDB inserts
Add each insert (string) to a SQLite table with the schema (singleInsert STRING)
Remove the old, now unused row from the SQLite table
Write all inserts to a file (if run from the command line) or return a dataHandle to query the data in the SQLite table (if run from another node app)
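Roughly, the batching step (steps 1-3 above) looks like this. This is only a sketch, written here against the better-sqlite3 package as one possible choice of embedded store; the helper and table names are illustrative.

const Database = require('better-sqlite3');
const db = new Database('testdata.db');

db.prepare(
  'CREATE TABLE IF NOT EXISTS batches (dbName TEXT, collectionName TEXT, value TEXT)'
).run();
const insertBatch = db.prepare(
  'INSERT INTO batches (dbName, collectionName, value) VALUES (?, ?, ?)'
);

// Stringify up to 500 generated documents at a time, store them as one row,
// then drop the JS references so the garbage collector can reclaim the memory.
function storeBatch(dbName, collectionName, documents) {
  insertBatch.run(dbName, collectionName, JSON.stringify(documents));
}

// e.g. storeBatch('mydb', 'users', generatedDocuments.splice(0, 500));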
This solution does handle the RAM problem, because SQLite automatically swaps to the hard drive when RAM is full.
BUT
As you can see, there are a lot of JSON.parse and JSON.stringify calls involved, which slows down the whole process drastically.
What I have thought:
Maybe I should modify IGeneratedField to use shortened names for its properties (fieldName -> fn, fieldValue -> fv, fieldIsObject -> fio, fieldIsArray -> fia, ...)
This would make the required storage in the SQLite table smaller, BUT it would also make the code harder to read.
Use a document-oriented database (but I have not really found a suitable one) to handle the JSON data better.
The Question
Is there any better solution to handle big objects like this in node?
Is my temporary solution OK? What is bad about it? Can it be changed to perform better?
Conceptually, generate items in a stream.
You don't need all 1M users in the db at once. You could add 10k at a time.
For the messages, randomly sample 2n users from the db and have them send messages to each other. Repeat until satisfied.
Example:
// Assume Users and Messages are both db.collections
// Assume functions generateUser() and generateMessage(u1, u2) exist.
const _ = require('lodash');

const desiredUsers = 10000;
const desiredMessages = 5000000;
const blockSize = 1000;

(async () => {
  for (const i of _.range(desiredUsers / blockSize)) {
    const users = _.range(blockSize).map(generateUser);
    await Users.insertMany(users);
  }
  for (const i of _.range(desiredMessages / blockSize)) {
    const users = await Users.aggregate([{ $sample: { size: 2 * blockSize } }]).toArray();
    const messages = _.chunk(users, 2).map((usr) => generateMessage(usr[0], usr[1]));
    await Messages.insertMany(messages);
  }
})();
Depending on how you tweak the stream, you get a different distribution. This gives a uniform distribution. You can get a more long-tailed distribution by interleaving the users and messages. For example, you might want to do this for message boards.
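A rough sketch of that interleaving idea, reusing the same assumed collections and helpers as the example above: alternating between inserting a block of new users and a block of messages sampled only from the users inserted so far means earlier users accumulate more messages, giving a longer tail.

(async () => {
  for (const i of _.range(desiredUsers / blockSize)) {
    // insert a block of new users
    await Users.insertMany(_.range(blockSize).map(generateUser));
    // then sample message participants only from the users that exist so far
    const users = await Users.aggregate([{ $sample: { size: 2 * blockSize } }]).toArray();
    const messages = _.chunk(users, 2).map((pair) => generateMessage(pair[0], pair[1]));
    await Messages.insertMany(messages);
  }
})();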
Memory usage went to 200MB after I switched the blockSize to 1000.

Returning a single child's value on Firebase query using orderByChild and equalTo

I am trying to pull a URL for an image in storage that is currently logged in the Firebase Realtime Database.
This is for a game of snap: there will be two cards on the screen (a left image and a right image), and when the two match the user clicks snap.
All of my image URLs are stored in the following way:
Each one has a unique child called "index". I also have another tree that is just a running count of the image records. So currently I run a function that checks the current total of the count, generates a random number, and then performs a database query on the images tree using orderByChild and an equalTo containing the random index number.
If I log the datasnap of this, I can see a full node for one record (so index, score, url, user and their values); however, if I try to pull just the URL, I get a value of null. I can, rather annoyingly, return the term "url" seemingly at my leisure, but I can't get the underlying value. I've wondered if this is because it is a string and not a numeric value, but I can't find anything to suggest that is a problem.
Please bear in mind I've only been learning JavaScript for about a week at most, so if I'm making obvious rookie errors, that's probably why!
Below is a code snippet to show you what I mean:
var indRef = firebase.database().ref('index');
var imgRef = firebase.database().ref('images');
var leftImg = document.getElementById('leftImg');
var rightImg = document.getElementById('rightImg');

document.addEventListener('DOMContentLoaded', function(){
  indRef.once('value')
    .then(function(snapShot){
      var indMax = snapShot.val();
      return indMax;
    })
    .then(function(indMax){
      var leftInd = Math.floor(Math.random() * indMax + 1);
      imgRef.orderByChild('index').equalTo(leftInd).once('value', function(imageSnap){
        var image = imageSnap.child('url').val();
        leftImg.src = image;
      });
    });
});
When you execute a query against the Firebase Database, there can potentially be multiple results, so the snapshot contains a list of those results. Even if there is only a single result, the snapshot will contain a list of one result.
Your code needs to cater for that list by looping over the results with Snapshot.forEach():
imgRef.orderByChild('index').equalTo(leftInd).once('value', function(imageSnap){
  imageSnap.forEach(function(child) {
    var image = child.child('url').val();
    leftImg.src = image;
  });
});
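If it helps, here is a small sketch of how the same lookup could be wrapped in a helper and reused for both cards. It assumes the imgRef, leftImg and random-index logic from your code; since once('value') also returns a promise, you can chain it instead of passing a callback.

function getImageUrl(index) {
  return imgRef.orderByChild('index').equalTo(index).once('value')
    .then(function(imageSnap) {
      var url = null;
      // even for a single match, loop over the result list
      imageSnap.forEach(function(child) {
        url = child.child('url').val();
      });
      return url;
    });
}

getImageUrl(leftInd).then(function(url) {
  leftImg.src = url;
});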

Trying to dynamically organize JSON object elements into different arrays based on values

This is the JSON I'm working with:
https://data.cityofnewyork.us/resource/xx67-kt59.json?$where=camis%20=%2230112340%22
I'd be dynamically making the queries using different data, so it'll possibly change.
What I'm essentially trying to do is to somehow organize the elements within this array into different arrays based on inspection_date.
So for each unique inspection_date value, those respective inspections would be put into its own collection.
If I knew the dates beforehand, I could easily iterate through each element and just push into an array.
Is there a way to dynamically create the arrays?
My end goal is to be able to display each group of inspections (based on inspection date) using Angular 5 on a webpage. I already have the site up and working and all of the requests being made.
So I'm trying to eventually get to something like this, but of course using whatever dates are in the response from the request.
2016-10-03T00:00:00
List the inspections
2016-04-30T00:00:00
List the inspections
2016-04-12T00:00:00
List the inspections
Just for reference, here's the code I'm using:
ngOnInit() {
  this.route.params.subscribe(params => {
    this.title = +params['camis']; // (+) converts string 'id' to a number
    this.q.getInpectionsPerCamis(this.title).subscribe((res) => {
      this.inspectionList = res;
      console.log(res);
    });
    // In a real app: dispatch action to load the details here.
  });
}
I wish I could give you more info, but at this point, I'm just trying to get started.
I wrote this in jQuery just because it was faster for me, but it should translate fairly well to Angular (I just don't want to fiddle with an Angular app right now).
Let me know if you have any questions.
$(function() {
  let byDateObj = {};
  $.ajax({
    url: 'https://data.cityofnewyork.us/resource/xx67-kt59.json?$where=camis%20=%2230112340%22'
  }).then(function(data) {
    //probably do a check to make sure the data is an array, im gonna skip that
    byDateObj = data.reduce(function(cum, cur) {
      if (!cum.hasOwnProperty(cur.inspection_date)) cum[cur.inspection_date] = [];
      //if the cumulative object doesn't have the inspection_date property already, add it as an empty array
      cum[cur.inspection_date].push(cur);
      //push to the inspection_date array
      return cum;
      //return the cumulative object
    }, byDateObj);
    //start with an empty object by default
    console.log(byDateObj);
  }, console.error);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
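Translated back into the Angular component from the question, the same reduce would look roughly like this inside the existing subscribe; inspectionsByDate is a hypothetical component property, and res is assumed to be the array returned by the request.

this.q.getInpectionsPerCamis(this.title).subscribe((res) => {
  // group the inspections by their inspection_date value
  this.inspectionsByDate = res.reduce((groups, inspection) => {
    if (!groups.hasOwnProperty(inspection.inspection_date)) {
      groups[inspection.inspection_date] = [];
    }
    groups[inspection.inspection_date].push(inspection);
    return groups;
  }, {});
  console.log(this.inspectionsByDate);
});

In the template you could then loop over Object.keys(this.inspectionsByDate) to render one group of inspections per date.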

Removing attributes from nested array in D3

I am trying to filter out some attributes from an array in D3. The array contains the values of a csv file.
This all worked well for a small csv file doing it like this:
d3.csv("foods.csv", function(data) {
data.forEach(function(v){ delete v.name });
data.forEach(function(v){ delete v.created_at });
});
This is what the first array looks like:
But when I try to do it for a bigger csv file I get an error saying: "DevTools was disconnected from the page. Once page is reloaded, DevTools will automatically reconnect".
This is what the 2nd array looks like.
Why is this not working for the 2nd array? Is the array too big, or should I try to address the values recursively? I already tried doing it like this:
function deleteCitation(v) {
  if (Object.prototype.toString.call(v) === '[object Array]') {
    v.forEach(deleteCitation);
  }
  else {
    delete v.citation;
  }
}

d3.csv("compounds_foods.csv", function(data) {
  data.forEach(deleteCitation);
  print(data);
});
I have never loaded a CSV with 740 thousand rows. However, I believe you have some alternatives:
Use a row conversion function, or an accessor function:
d3.csv("foods.csv", deleteCitation, function(data) {
//the rest of the code
And then declare the conversion function:
function deleteCitation(d){
  delete d.name;
  delete d.created_at;
  return d;
}
I didn't benchmark it; maybe the conversion function takes the same time as your forEach (they do pretty much the same thing), but I believe it's worth checking whether this is quicker than calling your deleteCitation function for each object inside the data array.
The second alternative is simpler: don't remove those two properties, just leave them there and don't use them!
When you load a CSV into your data array, you don't have to use every property of each object in your visualisation. You can simply ignore them. It's possible that you waste more processing time manipulating that huge array than you would by simply leaving those two extra properties in place.
The third alternative is the logical one: as there is absolutely no way you're going to use 740k objects in a dataviz, consider filtering/reducing/cropping this CSV before sending it to the client side.

Dealing with a JSON object too big to fit into memory

I have a dump of a Firebase database representing our Users table, stored as JSON. I want to run some data analysis on it, but the issue is that it's too big to load into memory completely and manipulate with pure JavaScript (or _ and similar libraries).
Up until now I've been using the JSONStream package to deal with my data in bite-sized chunks (it calls a callback once for each user in the JSON dump).
I've now hit a roadblock though because I want to filter my user ids based on their value. The "questions" I'm trying to answer are of the form "Which users x" whereas previously I was just asking "How many users x" and didn't need to know who they were.
The data format is like this:
{
  users: {
    123: {
      foo: 4
    },
    567: {
      foo: 8
    }
  }
}
What I want to do is essentially get the user ID (123 or 567 in the above) based on the value of foo. Now, if this were a small list it would be trivial to use something like _.each to iterate over the keys and values and extract the keys I want.
Unfortunately, since it doesn't fit into memory that doesn't work. With JSONStream I can iterate over it by using var parser = JSONStream.parse('users.*'); and piping it into a function that deals with it like this:
var stream = fs.createReadStream('my.json');
stream.pipe(parser);

parser.on('data', function(user) {
  // user is equal to { foo: bar } here
  // so it is trivial to do my filter
  // but I don't know which user ID owns the data
});
But the problem is that I don't have access to the key representing the star wildcard that I passed into JSONStream.parse. In other words, I don't know if { foo: bar} represents user 123 or user 567.
The question is twofold:
How can I get the current path from within my callback?
Is there a better way to be dealing with this JSON data that is too big to fit into memory?
I went ahead and edited JSONStream to add this functionality.
If anyone runs across this and wants to patch it similarly, you can replace line 83 which was previously
stream.queue(this.value[this.key])
with this:
var ret = {};
ret[this.key] = this.value[this.key];
stream.queue(ret);
In the code sample from the original question, rather than user being equal to { foo: bar } in the callback, it will now be { uid: { foo: bar } }.
Since this is a breaking change I didn't submit a pull request back to the original project but I did leave it in the issues in case they want to add a flag or option for this in the future.
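With that patch in place, a minimal sketch of the "which users" filtering described in the question might look like this; the foo threshold is only an example, and the field names are the ones from the data format above.

var fs = require('fs');
var JSONStream = require('JSONStream');

var parser = JSONStream.parse('users.*');
var matchingIds = [];

fs.createReadStream('my.json').pipe(parser);

parser.on('data', function(obj) {
  // with the patch, obj is { uid: { foo: ... } }, so the key is the user id
  var uid = Object.keys(obj)[0];
  if (obj[uid].foo > 5) {
    matchingIds.push(uid);
  }
});

parser.on('end', function() {
  console.log(matchingIds);
});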
