I know I can get the approximate size of a JSON object in bytes with JSON.stringify(data).length.
(UTF-8 multi-byte characters can be ignored for now, but serializing huge data takes time, which I want to avoid.)
Is there any way to get its size in MB without transforming it to a string?
We have the following options:
Recursive calculation, as the object-sizeof library does.
Converting the object to a string/buffer:
JSON.stringify({a:1}).length
v8.serialize({a:1}).length
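For illustration, a minimal Node.js sketch of the string/buffer options (the sample object is arbitrary); note that .length of the JSON string counts characters, while Buffer.byteLength and the length of the Buffer returned by v8.serialize count bytes:
const v8 = require('v8');

const obj = { a: 1, b: 'héllo' };                     // arbitrary sample object

console.log(JSON.stringify(obj).length);              // characters in the JSON text
console.log(Buffer.byteLength(JSON.stringify(obj)));  // UTF-8 bytes of the JSON text
console.log(v8.serialize(obj).length);                // bytes of the V8 serialization (a Buffer)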
For security reasons, JavaScript is not allowed to access or mutate information about the device; therefore, determining exactly how many bytes an object occupies should be impossible.
That being said, the following JavaScript property DOES exist (in Chrome only):
window.performance.memory
It returns an object with the maximum number of bytes the window may use, the number of bytes currently allocated (including free space), and the number of bytes actually used. You could, in theory, read the used amount before and after an object is created and take the difference. The memory-stats project, for instance, uses this property.
However, the values in this object never change unless Chrome was launched with the --enable-precise-memory-info flag. You therefore cannot use it in a (production) application (the MDN docs indicate the same). You can only approximate the amount of memory an object occupies by counting all its strings and numbers and multiplying by the number of bytes each usually occupies (which is what the object-sizeof library does).
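For completeness, a rough sketch of the before/after measurement mentioned above (Chrome only, and only meaningful when the --enable-precise-memory-info flag is set; the factory callback is purely illustrative):
function approxObjectBytes(factory) {
  const before = performance.memory.usedJSHeapSize;
  const obj = factory();                              // create the object to be measured
  const after = performance.memory.usedJSHeapSize;
  return { obj, approxBytes: after - before };        // the difference is only a rough estimate
}

console.log(approxObjectBytes(() => new Array(1e6).fill(0)).approxBytes);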
If you are just interested in the size of the object and do not need to use that information in a production app, you can simply make a timeline recording in the Chrome DevTools.
There is no native way to calculate the size of an object in JavaScript, but there is a Node module that gives you the size of an object in bytes:
object-sizeof
Here is an example of what you need:
var sizeof = require('object-sizeof');
// 2 bytes per character, 6 characters in total => 12 bytes
console.log(`${sizeof({abc: 'def'})} bytes`);
console.log(`${sizeof({abc: 'def'}) / (1024 * 1024)} MB`);
I'm setting up a virtual file system based on IndexedDB in the browser client. My assumption is that it will be efficient to store typed arrays as values; in particular, I am saving 8 kB chunks of data via Int8Array. What means do I have to verify how effective this is, in terms of actual space used and actual data representation?
For example, in Chromium I can see that the Int8Array values seem to be properly preserved. But the 'Clear Storage' page shows a suspiciously high storage size of 91 kB, although the first test file I wrote is only 40 kB (40096 bytes spread across 5 keys, plus 24 bytes for a metadata key). The keys are quite small arrays containing a short string path and a number. So the storage appears to take around twice as much space as predicted.
In contrast, I cannot find any information in Firefox about the amount of storage used, but its storage browser shows only the type 'Array' for the values, and they are represented very inefficiently as JSON objects, although that may be a display issue only.
A related question: is there a difference between storing the ArrayBuffer objects as opposed to an Int8Array view on them? I tried both, and there is a minimal difference in size (Chromium uses 90.8 kB instead of 90.9 kB if I pass an ArrayBuffer instead of an Int8Array as the value to IDB).
First: of course, writing only a small file can distort the picture, since the database's own overhead is not accounted for. So I instead wrote a 40 MB file (now using ArrayBuffer directly), and Chromium reports just under 41 MB of storage usage, confirming that the data is stored relatively compactly. Indeed, I could see that the live-updated storage usage was temporarily higher before going back to 41 MB, indicating that a compaction/cleanup algorithm runs as well.
Since Firefox cannot show data usage for file://, I ran the test through a web server, and there too 41 MB of space is used.
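As an aside, a tiny sketch of querying the usage programmatically instead of via the DevTools panel, using the StorageManager API (available in both Chromium and Firefox in secure contexts):
navigator.storage.estimate().then(({ usage, quota }) => {
  // usage covers IndexedDB and other site storage, reported in bytes
  console.log(`~${(usage / 1048576).toFixed(1)} MB used of ~${(quota / 1048576).toFixed(0)} MB quota`);
});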
Another surprising result was that storing an Int8Array which is a subarray actually seems to store the entire backing ArrayBuffer content as well. So, while the values are reported as Int8Array(8192) in Chromium, the storage was much larger due to the size of the underlying ArrayBuffer (64 kB in my case). In this sense, it is better to store the ArrayBuffer instance directly to avoid surprises.
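A small sketch of that pitfall; store, bigArray, offset and key are illustrative names, not from the original setup:
// Store one 8 kB chunk of a larger Int8Array without dragging along
// its whole 64 kB backing ArrayBuffer.
function putChunk(store, bigArray, offset, key) {
  const view = bigArray.subarray(offset, offset + 8192);  // view still shares the big buffer
  const copy = view.slice();                              // copy with its own 8 kB buffer
  store.put(copy.buffer, key);                            // store the standalone ArrayBuffer
}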
By the way, Firefox was around 3 times faster than Chromium at this task. Still, both are abysmally slow (3 and 10 seconds respectively to perform the asynchronous I/O of storing 41 MB in 8 kB chunks).
I am passing data from my client to my server and vice versa. I want to know whether there is any size limit for a protocol buffer message.
Citing the official source:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
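A minimal sketch (Node.js; the Buffer values stand in for already-encoded protobuf messages) of that 'set of byte strings' idea: length-prefix each encoded message so a large collection can be written out and split apart again.
function frame(encodedMessages) {
  const parts = [];
  for (const msg of encodedMessages) {
    const prefix = Buffer.alloc(4);
    prefix.writeUInt32LE(msg.length, 0);   // 4-byte length prefix per message
    parts.push(prefix, msg);
  }
  return Buffer.concat(parts);
}

function* unframe(buf) {
  let offset = 0;
  while (offset < buf.length) {
    const len = buf.readUInt32LE(offset);
    yield buf.slice(offset + 4, offset + 4 + len);   // one encoded message
    offset += 4 + len;
  }
}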
As far as I understand the protobuf encoding, the following applies:
varints above 64 bits are not specified, but given how their encoding works, the varint bit length is not limited by the wire format (a varint consisting of several 1xxxxxxx groups terminated by a single 0xxxxxxx group is perfectly valid -- I suppose no actual implementation supports varints larger than 64 bits, though); see the sketch after this list
given the above varint encoding property, it should be possible to encode any message length (varints are used internally to encode the length of length-delimited fields, and the other field types are either varints or have a fixed length)
you can construct arbitrarily long valid protobuf messages just by repeating a single repeated field ad absurdum -- a parser should be perfectly happy as long as it has enough memory to store the values (there are even parsers which provide callbacks for field values, thus relaxing memory consumption, e.g. nanopb)
(Please do validate my thoughts.)
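To make the first point concrete, here is a minimal sketch of base-128 varint encoding (using BigInt so that values beyond 64 bits can be tried); it shows that the wire format itself places no cap on the encodable value:
function encodeVarint(n) {
  const bytes = [];
  let v = BigInt(n);
  do {
    let byte = Number(v & 0x7fn);       // take the low 7 bits
    v >>= 7n;
    if (v > 0n) byte |= 0x80;           // set the continuation bit if more groups follow
    bytes.push(byte);
  } while (v > 0n);
  return Buffer.from(bytes);
}

console.log(encodeVarint(300));                 // <Buffer ac 02>
console.log(encodeVarint(2n ** 100n).length);   // 15 bytes -- nothing in the format stops >64-bit values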
I have a huge JSON object (let's call it {...}) which is about 80 MB. All keys/values are strings or objects. I expect that if I load it into JavaScript by writing var myJson = {...}, the amount of memory it takes up should not be too much bigger than 80 MB. However, using Node.js's process.memoryUsage(), I see that as soon as I load this object, the memory used increases by about 600 MB. Why does this happen, and what are some workarounds?
Edit: I have changed it from var myJson = {...} to const myJson = require('./database.json'), with the contents of the JSON in a .json file rather than a .js file. Strangely, this seems to have reduced the amount of memory used by about 50%, which is still about 4x the size of the .json file.
There is a gap between the JSON text file size and the JSON object's size in memory.
"the amount of memory it takes up should not be too much bigger than 80 MB" -- that's not true, especially when the object is very big.
According to the ECMAScript Language Specification, each string character occupies 16 bits and each number value occupies 64 bits. This alone creates a large gap between JSON text file size and JSON object memory size.
Take the following simple object as an example: {name:'John',age:16}. Saving this object as a text file needs only 20 ASCII characters, i.e. 20 bytes. Storing the same object in memory, however, needs at least 30 bytes: "name".length + "John".length + "age".length = 11 characters, at 2 bytes per character that is 22 bytes, plus 8 bytes for the number 16, giving 30 bytes in total.
The internal object structure ("John" is mapped to "name", and 16 is mapped to "age") also occupies memory. Even leaving that aside, the gap between 20 bytes and 30 bytes is already large for such a simple object.
The gap becomes very large when the JSON object is huge (like the one in the question) or when its structure is very complex.
Of course, the JavaScript engine will do some optimization and reduce memory usage. But since the ECMAScript specification states the bit widths of strings, numbers, etc., the gap will always exist, and it is not trivial.
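A rough sketch of measuring that gap yourself in Node.js (./database.json stands in for the asker's file; running with --expose-gc makes the numbers more stable):
const fs = require('fs');

const file = './database.json';
const diskBytes = fs.statSync(file).size;

global.gc && global.gc();                          // only available with --expose-gc
const before = process.memoryUsage().heapUsed;
const data = JSON.parse(fs.readFileSync(file, 'utf8'));
const after = process.memoryUsage().heapUsed;

console.log(`${Object.keys(data).length} top-level keys`);
console.log(`file: ${(diskBytes / 1048576).toFixed(1)} MB, heap growth: ~${((after - before) / 1048576).toFixed(1)} MB`);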
P.S. There is an npm module called object-sizeof which can be used to estimate an object's memory allocation.
I am using shared memory to make data available to a PHP and a Node.js process (on Debian 9). I open the memory block in PHP with shmop_open(). That function requires the size of the memory block in bytes. Since the block is created once and then repeatedly filled with data of different sizes, I chose a block size with a little buffer space. That means the block size cannot be exactly the data size, since the data size changes often. The data written with shmop_write() is of type string, by the way.
In Node.js I use the module shm-typed-array to access the shared memory. I call shm.get(key, 'Buffer') and then convert the result into a string using toString('utf8').
The problem: shm.get() reads the entire shared memory block, no matter how many bytes are actually used. So I receive a hexadecimal dump that is followed by a lot of 00 pairs. If I convert that into a string, I get my previously saved data with a lot of (spaces?) behind it. How am I supposed to fix this? I cannot trim() the resulting string, which makes me guess those characters behind my data are not real spaces.
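For illustration, a minimal sketch of the read path described above (the key value is a placeholder); the unused tail of the fixed-size block consists of NUL bytes (0x00) rather than spaces, which is why trim() has no visible effect:
const shm = require('shm-typed-array');

const key = 1234;                      // placeholder for the real shared-memory key
const buf = shm.get(key, 'Buffer');    // returns the whole block, including the unused tail
console.log(buf);                      // data bytes followed by many 00 bytes
console.log(buf.toString('utf8'));     // data followed by '\u0000' characters, not spaces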
Thanks in advance
Is there any limit on how much data can be stored using GM_setValue?
Greasemonkey stores the values in Firefox preferences. Open about:config and look for them.
According to http://diveintogreasemonkey.org/api/gm_getvalue.html, you can find them in the greasemonkey.scriptvals branch.
This SQLite page on its limits shows some default limits for strings and blobs, but they may be changed by Firefox.
More information is in the Greasespot Wiki:
The Firefox preference store is not designed for storing large amounts of data. There are no hard limits, but very large amounts of data may cause Firefox to consume more memory and/or run more slowly.
The link refers to a discussion on the Greasemonkey mailing list, where Anthony Lieuallen answers the same question you posted:
I've just tested this. Running up to a 32 meg string seems to work without major issues, but 64 or 128 starts to thrash the disk for virtual memory a fair deal.
According to the site you provided, "The value argument can be a string, boolean, or integer."
Obviously, a string can hold far more information than an integer or boolean.
Since Greasemonkey scripts are JavaScript, the maximum length of a GM_setValue value is the maximum length of a JavaScript string, which is determined by the (browser-specific) JavaScript engine.
I do not know the specifics, but you could write a script to determine the maximum length, as sketched below.
Keep doubling the length until you get an error; then try a value halfway between maxGoodLen and minBadLen until maxGoodLen = minBadLen - 1.
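A small sketch of that probing approach (doubling, then binary search); String.prototype.repeat throws a RangeError once the engine's string length limit is exceeded:
function findMaxStringLength() {
  let good = 1;       // largest length known to work
  let bad = null;     // smallest length known to fail

  // Phase 1: keep doubling until creating the string throws.
  while (bad === null) {
    try {
      'x'.repeat(good * 2);
      good *= 2;
    } catch (e) {
      bad = good * 2;
    }
  }

  // Phase 2: binary search until good and bad are adjacent.
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    try {
      'x'.repeat(mid);
      good = mid;
    } catch (e) {
      bad = mid;
    }
  }
  return good;
}

console.log(findMaxStringLength());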