How to download models and weights from tensorflow.js - javascript

I'm trying to download a pretrained tensorflow.js model, including its weights, to use offline in Python with the standard version of TensorFlow, as part of a project that is by no means in an early stage, so switching to tensorflow.js is not an option.
But I can't figure out how to download those models, or whether some conversion of the model is necessary.
I'm aware that in JavaScript I can access the models and use them by including them like this:
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@0.13.3"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/posenet@0.2.3"></script>
but how do I actually get the .ckpt files, or the frozen model if that's the case?
My final objective is to get the frozen model files and get the outputs the way it's done in the normal version of TensorFlow.
Also, this will be used in an offline environment, so any online reference would not be useful.
Thanks for your replies

It is possible to save the model topology and its weights by calling the model's save method.
const model = tf.sequential();
model.add(tf.layers.dense(
{units: 1, inputShape: [10], activation: 'sigmoid'}));
const saveResult = await model.save('downloads://mymodel');
// This will trigger downloading of two files:
// 'mymodel.json' and 'mymodel.weights.bin'.
console.log(saveResult);
There are different scheme strings depending on where you want to save the model and its weights (Local Storage, IndexedDB, ...); see the tf.Model.save documentation.
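For reference, a few of those schemes look roughly like this (a sketch based on the tfjs docs; the 'my-model' names are placeholders):
// Browser Local Storage
await model.save('localstorage://my-model');
// Browser IndexedDB
await model.save('indexeddb://my-model');
// Trigger browser file downloads, as in the example above
await model.save('downloads://my-model');
// On disk, when running in Node.js with @tensorflow/tfjs-node
await model.save('file:///tmp/my-model');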

I went to https://storage.googleapis.com/tfjs-models/ and found the directory listing all the files. I found the relevant files (I wanted all the mobilenet float, as opposed to quantized mobileNet), and populated this file_uris list.
base_uri = "https://storage.googleapis.com/tfjs-models/"
file_uris = [
"savedmodel/posenet/mobilenet/float/050/group1-shard1of1.bin",
"savedmodel/posenet/mobilenet/float/050/model-stride16.json",
"savedmodel/posenet/mobilenet/float/050/model-stride8.json",
"savedmodel/posenet/mobilenet/float/075/group1-shard1of2.bin",
"savedmodel/posenet/mobilenet/float/075/group1-shard2of2.bin",
"savedmodel/posenet/mobilenet/float/075/model-stride16.json",
"savedmodel/posenet/mobilenet/float/075/model-stride8.json",
"savedmodel/posenet/mobilenet/float/100/group1-shard1of4.bin",
"savedmodel/posenet/mobilenet/float/100/group1-shard2of4.bin",
"savedmodel/posenet/mobilenet/float/100/group1-shard3of4.bin",
"savedmodel/posenet/mobilenet/float/100/model-stride16.json",
"savedmodel/posenet/mobilenet/float/100/model-stride8.json"
]
Then I used Python to iteratively download the files into the same folder structure.
from pathlib import Path
from urllib.request import urlretrieve

for file_uri in file_uris:
    uri = base_uri + file_uri
    save_path = "/".join(file_uri.split("/")[:-1])
    Path(save_path).mkdir(parents=True, exist_ok=True)
    urlretrieve(uri, file_uri)
    print(save_path, file_uri)
I enjoyed Jupyter Lab (Jupyter Notebook is also good) when experimenting with this code.
With this, you'll get a folder with the bin files (the weights) and the json files (the graph models). Unfortunately, these are graph models, so they cannot be converted into SavedModels, and so they are absolutely useless to you. Let me know if someone finds a way of running these tfjs graph model files in regular TensorFlow (preferably 2.0+).
You can also download zip files with the 'entire' model from TFHub; for example, a 2-byte-quantized ResNet PoseNet is available here.
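If running the downloaded graph model in Node.js (rather than Python) is an acceptable fallback, a minimal sketch with a recent @tensorflow/tfjs-node and its file:// handler might look like this; the path is one of the files downloaded above, and this is an assumption, not something the answer above covers:
const tf = require('@tensorflow/tfjs-node');

async function main() {
  // Load the downloaded tfjs graph model straight from disk.
  const model = await tf.loadGraphModel(
    'file://./savedmodel/posenet/mobilenet/float/050/model-stride16.json');
  console.log('Loaded inputs:', model.inputs.map((i) => i.name));
}

main();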

Related

Why is the loadGraphModel function from tensorflow.js not working?

I am working on deploying an ML model that I trained using TensorFlow (in Python). The model is saved as an .h5 file, and I converted it using the tensorflowjs_converter --input_format=keras ./model/myFile.h5 /JS_model/ command.
I imported the tensorflow library using the following:
<script src="https://cdn.jsdelivr.net/npm/#tensorflow/tfjs/dist/tf.min.js"> </script>
After this, I ws able to load the model using the loadLayersModel() function. However, when using the loadGraphModel, it does not work. It outputs this error on the browser:
''
I also tried using the tf.models.save_model.save() function in python which it outputs the variables and assets folders, as well as the .pb file. However, an error still occurs. Using the code above, changing only the path to 'THE_classifier' (which is the name of the folder where asset, variables and the .pb is located), the output is:
I want to work with the loadGraphModel() function because according to various sources, it provides a faster inference time.
Layers models and graph models have different internal layouts; they are not compatible or interchangeable. If it's a layers model, it must be loaded with tf.loadLayersModel, and if it's a graph model, it must be loaded with tf.loadGraphModel.
Graph models are frozen models, so if you want to convert a Keras model to a graph model, you need to freeze it first; otherwise it can only be converted to a layers model.
(And that's where the difference in inference time comes from: it's faster to evaluate a frozen model than one that is still using variables.)
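As a minimal sketch of the distinction (the paths are placeholders; the converter flags mentioned are the standard tensorflowjs_converter options):
// Layers model, e.g. the output of
// tensorflowjs_converter --input_format=keras myFile.h5 JS_model/
const layersModel = await tf.loadLayersModel('JS_model/model.json');

// Graph model, e.g. the output of a conversion from a frozen model / SavedModel with
// --input_format=tf_saved_model --output_format=tfjs_graph_model
const graphModel = await tf.loadGraphModel('graph_model/model.json');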

What is the most efficient way to display images from a ZIP archive on an offline desktop app?

I have an offline Electron + React desktop app which uses no server to store data. Here is the situation:
A user can create some text data and add any number of images to it. When they save the data, a ZIP with the following structure is created:
myFile.zip
├ main.json // Text data
└ images
├ 00.jpg
├ 01.jpg
├ 02.jpg
└ ...
If they want to edit the data, they can open the saved ZIP with the app.
I use ADM-ZIP to read the ZIP content. Once open, I send the JSON data to Redux, since I need it for the interface display. The images, however, are not displayed in the component that reads the archive and can appear on multiple "pages" of the app (so in different components). I display them with an Image component that takes a name prop and returns an img tag. My problem here is how to get their src from the archive.
So here is my question: what would be the most efficient way to get the data of the images contained in the ZIP, given the above situation?
What I am currently doing:
When the user opens their file, I extract the ZIP into a temp folder and get the image paths from there. The temp folder is only deleted when the user closes the file, not when they close the app (I don't want to change this behaviour).
// When opening the archive
const archive = new AdmZip('myFiles.zip');
archive.extractAllTo(tempFolderPath, true);
// Image component
const imageSrc = `${tempFolderPath}/images/${this.props.name}`;
Problem: if the ZIP contains a lot of images, the user must have enough disk space for them to be extracted properly. Since the temp folder is not necessarily deleted when the app is closed, that disk space stays used until they close the file.
What I tried:
Storing the ZIP data in memory
I tried to save the result of the opened ZIP in a prop and then get my image data from it when I need it:
// When opening the archive
const archive = new AdmZip('myFile.zip');
this.props.saveArchive(archive); // Save in redux
// Image component
const imageData = this.props.archive.readFile(this.props.name);
const imageSrc = URL.createObjectURL(new Blob([imageData], {type: "image/jpg"}));
With this process, the images loading time is acceptable.
Problem: with a large archive, storing the archive data in memory might be bad for performance (I think?), and I guess there is a limit on how much I can store there.
Opening the ZIP again with ADM-ZIP every time I have to display an image
// Image component
const archive = new AdmZip('myFile.zip');
const imageData = archive.readFile(this.props.name);
const imageSrc = URL.createObjectURL(new Blob([imageData], {type: "image/jpg"}));
Problem: bad performance, extremely slow when I have multiple images on the same page.
Storing the images Buffer/Blob in IndexedDB
Problem: it's still stored on the disk and the size is much bigger than the extracted files.
Using a ZIP that doesn't compress the data
Problem: I didn't see any difference compared to a compressed ZIP.
What I considered:
Instead of a ZIP, I tried to find a file type that could act as a non-compressed archive and be read like a directory, but I couldn't find anything like that. I also tried to create a custom file format with vanilla Node.js, but I'm afraid I don't know the File System API well enough to do what I want.
I'm out of ideas, so any suggestions are welcome. Maybe I missed the most obvious way to do it... or maybe what I'm currently doing is not that bad and I am overthinking the problem.
What you mean by "most efficient" is not entirely clear, so I'll make some assumptions and answer according to them.
Cache Solution
Basically what you have already done. It's pretty efficient to load everything at once (extracting to a temporary folder) because whenever you need to reuse something, the most expensive task doesn't have to be done again. It's common practice for heavy applications to load assets/modules/etc. at startup.
Note: since you consider the lack of disk space a problem, if you want to stick with this approach it's desirable to handle that case programmatically and alert the user, because having little free storage space is critical.
TL;DR - Expensive at first, but faster later thanks to asset reuse.
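A rough sketch of such a check, assuming a Node version recent enough to expose fs.statfs (18.15+/19.6+; otherwise a module like check-disk-space can do the same); checkFreeSpace and estimatedExtractedSize are made-up names:
const fs = require('fs');

// Compare the free space on the target volume against the estimated extracted size.
function checkFreeSpace(targetDir, requiredBytes, onResult) {
  fs.statfs(targetDir, (err, stats) => {
    if (err) return onResult(err);
    const freeBytes = stats.bavail * stats.bsize; // available blocks * block size
    onResult(null, freeBytes >= requiredBytes);
  });
}

// Usage: warn the user before extracting the archive into the temp folder.
checkFreeSpace(tempFolderPath, estimatedExtractedSize, (err, enough) => {
  if (err || !enough) {
    // e.g. show an Electron dialog telling the user there is not enough disk space
  }
});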
Lazy Loading
Another very common concept, which consists basically in "load as you need, only what you need". This is efficient because it ensures your application only loads the minimum needed to run, and then loads things as the user demands them.
Note: this is pretty much what you did in your second attempt.
TL;DR - Faster at startup, slower during runtime.
"Smart" Loading
This is not a real name, but it describes what I mean well enough. Basically, focus on understanding the goal of your project and mix both of the previous solutions according to your context, so you can achieve the best overall performance in your application while reducing the trade-offs of each approach (a rough sketch follows the examples below).
Ex:
Lazy-load images per view/page and keep an in-memory cache with a limited size
Load images in the background while the user navigates
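A rough sketch of that mix, reusing the ADM-ZIP archive object from the question (getImageSrc and MAX_CACHED are made-up names, and the eviction is deliberately naive):
const AdmZip = require('adm-zip');

const archive = new AdmZip('myFile.zip');
const MAX_CACHED = 50;        // arbitrary limit; tune to your image sizes
const cache = new Map();      // entry name -> object URL (Map keeps insertion order)

function getImageSrc(name) {
  if (cache.has(name)) return cache.get(name);
  // Read only this entry from the ZIP (name must match the path inside the archive).
  const data = archive.readFile(name);
  const src = URL.createObjectURL(new Blob([data], { type: 'image/jpeg' }));
  cache.set(name, src);
  if (cache.size > MAX_CACHED) {
    // Evict the oldest entry and release its Blob.
    const oldest = cache.keys().next().value;
    URL.revokeObjectURL(cache.get(oldest));
    cache.delete(oldest);
  }
  return src;
}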
Now, regardless of your final decision, the following considerations should not be ignored:
Memory will always be faster to write/read than disk
Partially unzipping (targeting specific files) is possible in many packages (including ADM-ZIP) and is always faster than unzipping everything, especially if the ZIP file is huge.
Using IndexedDB or a file-based database like SQLite gives good overall results for a huge number of files and a "smart" approach, since querying and organizing data is easier through those.
Always keep in mind the reason for every application design choice; the better you understand why you are doing something, the better your final result will be.
With point 4 in mind, in my opinion you did overthink this a little in this case, but that's not a bad thing, and it's admirable to care about doing things in the best possible way. I do this often even when it's not necessary, and it's good for self-improvement.
Well, I wrote a lot :P
I would be very pleased if any of this helps you somehow.
Good luck!
TL;DR - There is no single answer to your question; it depends a lot on the context of your application. Some of your attempts are already pretty good, but putting some effort into understanding the usage context so you can handle image loading "smartly" will surely pay off.
The issue of using ZIP
Technically, streaming the content would be the most efficient way, but streaming content from inside a ZIP is hard and inefficient by design, since the data is fragmented and the whole file needs to be parsed to find the central directory.
A much better format for your specific use case is 7zip: it allows you to stream out a single file and does not require reading the whole archive to get the information about that single file.
The cost of using 7zip
You will have to ship a 7zip binary with your program; luckily it's very small (~2.5 MB).
How to use 7zip
const { spawn } = require('child_process')

// Get all images from the archive, or adjust the filter to only the needed ones.
// 'x' means extract while keeping the full paths
// ('e' instead of 'x' means extract flat).
// Any additional argument after the archive path acts as a filter.
const extraction = spawn('path/to/7zipExecutable', ['x', 'path/to/7zArchive', '*.jpg'])
Additional consideration
Either you use the Buffer data and display the images in the HTML via Base64, or you extract them into a cache folder that gets cleared as part of the process, depending on how granular you want to be.
Keep in mind that you should use an appropriate folder for that: under Windows this would be %localappdata%/yourappname/cache; under Linux and macOS I would place it in ~/.yourappname/cache.
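A sketch of the Base64 variant, assuming the 7z -so switch (write extracted data to stdout) and that entryPath matches the file's path inside the archive (loadImage is a made-up helper name):
const { spawn } = require('child_process');

// Stream a single image out of the archive and return it as a data URL.
function loadImage(archivePath, entryPath, callback) {
  const child = spawn('path/to/7zipExecutable', ['e', archivePath, '-so', entryPath]);
  const chunks = [];
  child.stdout.on('data', (chunk) => chunks.push(chunk));
  child.on('close', (code) => {
    if (code !== 0) return callback(new Error('7z exited with code ' + code));
    const buffer = Buffer.concat(chunks);
    callback(null, 'data:image/jpeg;base64,' + buffer.toString('base64'));
  });
}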
Conclusion
Streaming data is the most efficient approach (lowest memory/data footprint)
By using a native executable you can compress/decompress much faster than via pure JS
The compression/decompression happens in another process, which keeps your application from being blocked during execution and rendering
ASAR as Alternative
Since you mentioned considering another, non-compressed format, you should give the asar file format a try: it is a built-in format in Electron that can be accessed with regular file paths via fs, with no extraction step.
For example, if you have a ~/.myapp/save0001.asar containing an img folder with multiple images in it, you would access one simply via ~/.myapp/save0001.asar/img/x.png.
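A minimal sketch of such a read, assuming Electron's patched fs (the paths are placeholders):
const fs = require('fs');

// Inside Electron, fs treats a .asar archive in the path like a directory,
// so the image can be read without extracting anything.
const data = fs.readFileSync('/home/user/.myapp/save0001.asar/img/x.png');
const src = 'data:image/png;base64,' + data.toString('base64');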

Convert tfjs model for use in TensorFlow in C++

I'm new to TensorFlow and I have one question. My project has two major parts. The first is written in Node.js; it trains my model from a dataset and saves the model to local storage, so I have two files:
model.json
weights.bin
The second part is written in C++. After a couple of days I managed to build TensorFlow with Bazel and add it to my OpenCV project, so here is my question:
I want to train my model in the Node.js part and use it in the C++ part. Is this possible?
I also saw the tfjs-converter, but it converts models for use in Node.js, not vice versa.
Update :
After searching a lot I figured out that I should convert my model to a protobuf (.pb) file, but tfjs-converter does not support this type of conversion. Another point is that I want to use my model with the OpenCV library.
Update 2
Finally I managed to convert my model to a .pb file: first I used tfjs_converter to convert it to a Keras model (.h5 file), and after that used this Python script to convert it to a .pb file; OpenCV can successfully load the model. But I get this error when using the model:
libc++abi.dylib: terminating with uncaught exception of type
cv::Exception: OpenCV(4.1.0)
/tmp/opencv-20190505-12101-14vk1fh/opencv-4.1.0/modules/dnn/src/dnn.cpp:524:
error: (-2:Unspecified error) Can't create layer
"flatten_Flatten1/Shape" of type "Shape" in function
'getLayerInstance'
Any help ?
thanks
Finally I solved my own problem.
Here are the steps I took:
Converted my tfjs model to a Keras model using tfjs-converter
Used this Python script to convert the Keras (.h5) model to a frozen model (.pb)
Used this tutorial to optimize my .pb model
Finally everything works great!

Three.js loading large models

When I try to load a very large file using the appropriate loaders provided with the library, the tab my website runs in crashes. I have tried implementing the Worker class, but it doesn't seem to work. Here's what happens:
In the main javascript file I have:
var worker = new Worker('loader.js');
When the user selects one of the available models, I check the extension and pass the file URL/path to the worker (in this instance a PCD file):
worker.postMessage({fileType: "pcd", file: file});
Now the loader.js has the appropriate includes that are necessary to make it work:
importScripts('js/libs/three.js/three.js');
importScripts('js/libs/three.js/PCDLoader.js');
and in its onmessage method it uses the appropriate loader depending on the file extension.
var loader = new THREE.PCDLoader();
loader.load(file, function (mesh) {
postMessage({points: mesh.geometry.attributes.position.array, colors: mesh.geometry.attributes.color.array});
});
The data is passed back successfully to the main JavaScript, which adds it to the scene. At least for small files; large ones, like I said, take too long and the browser decides there was an error. Now, I thought the Worker class was supposed to work asynchronously, so what's the deal here?
Currently Three.js's loaders rely on strings and arrays of strings to parse data from a file. They don't split files into pieces, which leads to excessive memory usage that browsers immediately interrupt. Loading a 64 MB file spikes to over 1 GB of memory used during load (which then results in an error).
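One side note the answer above does not cover: when the worker posts its result back, the typed arrays can be passed as transferables so their buffers are moved instead of copied. This does not fix the parse-time memory spike, but it avoids one extra copy (a sketch of the same loader.js callback):
// In loader.js
loader.load(file, function (mesh) {
  const points = mesh.geometry.attributes.position.array;
  const colors = mesh.geometry.attributes.color.array;
  // The second argument is the transfer list: the ArrayBuffers are moved, not cloned.
  postMessage({ points: points, colors: colors }, [points.buffer, colors.buffer]);
});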

How to read and write files with Appengine in Python?

I'm new to App Engine and I need your help.
The JavaScript I want to display generates a graph.
import webapp2

MAIN_PAGE_HTML1 = """\
<html>
  <body>
    <script>
      // My script comes here
      var graph = new Graph();
      graph.addNodes('a', 'b');
      graph.addEdges(['a', 'b']);
      // ...
    </script>
  </body>
</html>
"""

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.write(MAIN_PAGE_HTML1)

app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
My idea was to store the main HTML in a file, read it when the RequestHandler is called, and modify it when I post new graph elements from a client. I can't do this, because App Engine doesn't allow standard file operations.
What is the easiest way to make it work?
App Engine lets you read files just fine, just not write to them. Among the many alternatives to plain files that you can use if you need read/write functionality, the best two are usually: (A) the App Engine data store, for "files" of modest to moderate size; (B) Google Cloud Storage, for "files" that can potentially be quite large.
Your use case seems to call for the former -- i.e, the datastore -- so I'll focus on that possibility.
Define a model class for entities representing the HTML you want to send in your response. Usually such "models" are best kept in a separate model.py file, to be imported by your other Python files, but that's a matter of code organization, not of functionality. Whichever file you place it in, your code will be somewhat like:
from google.appengine.ext import ndb
class Page(ndb.Model):
    name = ndb.StringProperty()
    html = ndb.TextProperty()
When you need to get a page by a certain name, your code will be something like:
page = Page.query(Page.name == the_name).get()
if page is None:
    page = Page(name=the_name, html=MAIN_PAGE_HTML1)
    page.put()
and to set new, modified HTML content on an existing page previously fetched, just:
page.html = new_html_content
page.put()
The put calls return a key which you may want to save (for example in memcache) if you want "strong consistency" (since key.get() is guaranteed to fetch the latest updated content, while getting from a query, without other precautions, might get a previously saved version of the data -- it only exhibits eventual consistency, not "immediate" updates).
But it's hard to be more specific in offering advice about how you should best use the datastore without knowing much, much more about your exact requirements -- how do you determine exactly what page to display and/or update (that would be given by the name property in my example code, while it would have been given by the filename if you could have ordinary read/write files as you wished), what are your consistency (immediacy of updates) requirements, and so forth.
(For most use cases one could infer from your incomplete specs, I'd probably use the name, that here I have modeled as a property, instead as an id, part of a key -- but, I'm trying to keep things simple to match what little you have expressed about your specs).
Note that in this approach the whole html content is re-written each time you want to change it -- the same goes for the main alternative (suggested for potentially much larger files), Google Cloud Storage: no actual "incremental updates", just complete re-writes to affect any change to a "file"'s contents.
That's the main difference between GCS and a common filesystem (while the datastore also offers much more functionality on top, such as queries and ordering of entities -- we're just not using any of that extra functionality here because you're asking just for filesystem-like behavior).
