Shared memory | Node.JS + PHP | blocksize | remove trailing "00" - javascript

I am using shared memory to make data available to both a PHP and a Node.js process (on Debian 9). I open the memory block in PHP with shmop_open(). That function requires the size of the memory block in bytes. Since the block is created once and then repeatedly filled with data of varying size, I pick a block size with a little buffer space. That means the block size can't exactly match the data size, since the data size changes often. The data written with shmop_write() is of type string, by the way.
In Node.js I use the module shm-typed-array to access the shared memory. I call shm.get(key, 'Buffer') and then convert the result into a string using toString('utf8').
The problem: shm.get() reads the entire shared memory block, no matter how many bytes are actually used. So I receive a hexadecimal buffer followed by a lot of 00 pairs. If I convert that buffer into a string, I get my previously saved data with a lot of (spaces?) behind it. How am I supposed to fix this? trim() does not remove them, which makes me guess those characters behind my data are not real spaces.
Thanks in advance
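As a minimal sketch of one way to strip that padding on the Node.js side, assuming the data itself never contains literal NUL bytes (key stands for whatever key was used when the segment was created):

const shm = require('shm-typed-array');

// Read the whole segment, then cut it off at the first 0x00 byte,
// since the unused remainder of the block is zero-filled.
const buf = shm.get(key, 'Buffer');
const end = buf.indexOf(0);                 // -1 if the block is completely full
const data = buf.toString('utf8', 0, end === -1 ? buf.length : end);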

Related

Node read only first N bytes/chars from file

I'm coding an app which has to detect a limited set of file types. These are all text based, and the marker I'm looking for should be within the first N bytes, let's say 512. How I determine the type afterwards is not the subject of this question.
I've seen fs.readFile()/fs.readFileSync(), but they read the whole file, which is unnecessary in my case. Is it possible to read only a few bytes from the beginning to improve performance?
I'm using TypeScript, but I don't think that matters in this case.
Also, I'm looking for something that doesn't involve executing external commands from the OS (e.g. cat or echo with some special parameters).
Just read the documentation for fs.open/fs.read/fs.close.
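As a minimal sketch of that approach (synchronous for brevity; the 512-byte default is the figure assumed in the question):

const fs = require('fs');

// Read only the first n bytes of a file instead of the whole thing.
function readFirstBytes(path, n = 512) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(n);
    const bytesRead = fs.readSync(fd, buf, 0, n, 0); // read n bytes from position 0
    return buf.slice(0, bytesRead);                  // shorter than n for small files
  } finally {
    fs.closeSync(fd);
  }
}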

Find JSON object size without parsing it to string

I know I can get the size of a JSON object in bytes by using JSON.stringify(data).length.
UTF-8 and similar encoding details can be ignored for now, but stringifying takes time for huge data, which I want to avoid.
Is there any way to get its size in MB without transforming it to a string?
We have the following options:
Recursive calculation, as done by the object-sizeof library
Converting the object to a string/buffer:
JSON.stringify({a:1}).length
v8.serialize({a:1}).length
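For the second option, a runnable sketch (v8.serialize returns a Buffer, so its length is a byte count; note this measures the serialized form, not the exact in-memory footprint):

const v8 = require('v8');

const obj = { a: 1 };
const bytes = v8.serialize(obj).length;            // size of the serialized buffer in bytes
console.log(`${(bytes / (1024 * 1024)).toFixed(6)} MB`);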
For security reasons, JavaScript is not allowed to access or mutate information about the device, therefore determining exactly how many bytes an object occupies should be impossible.
That being said, the following JavaScript command DOES exist (within Chrome only):
window.performance.memory
This returns an object with the number of bytes the window can use at maximum, the number of bytes used including free space, and the number of bytes actually used. You could, theoretically, use that to determine the number of bytes used before an object was created and after, and calculate the difference. The memory-stats project, for instance, utilizes that command.
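A rough sketch of that before/after idea (Chrome only, non-standard, and only meaningful when Chrome is started with the --enable-precise-memory-info flag mentioned below; buildBigObject() is a hypothetical placeholder):

const before = window.performance.memory.usedJSHeapSize;
const obj = buildBigObject();                       // hypothetical object construction
const after = window.performance.memory.usedJSHeapSize;
console.log(`approx. ${after - before} bytes used by obj`);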
However, the values in this object never change unless Chrome was launched with the "--enable-precise-memory-info" flag. You therefore cannot use this command in a (production) application (the MDN docs indicate the same). You can only approximate the amount of memory an object occupies by counting all the strings and numbers and multiplying by how many bytes each usually occupies (which is what the object-sizeof library does).
If you are just interested in the size of the object and do not wish to use that information in a production app, you can simply make a timeline recording in the Chrome DevTools.
There is no native way to calculate the size of an object in JavaScript, but there is a Node module that gives you the size of an object in bytes.
object-sizeof
This would be an example of what you need:
var sizeof = require('object-sizeof');
// 2 bytes per character, 6 characters in total => 12 bytes
console.log(`${sizeof({abc: 'def'}) / (1024 * 1024)} MB`);

Is there any size limit of the protocol buffer?

I am passing data from my client to the server and vice versa. I want to know whether there is any size limit for a protocol buffer message.
Citing the official source:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
As far as I understand the protobuf encoding, the following applies:
varints above 64 bits are not specified, but given how their encoding works, varint bit-length is not limited by the wire format (a varint consisting of several 1xxxxxxx groups terminated by a single 0xxxxxxx group is perfectly valid; I suppose there is no actual implementation supporting varints larger than 64 bits, though) -- see the varint sketch after this list
given the above varint encoding property, it should be possible to encode any message length (varints are used internally to encode the length of length-delimited fields, and other field types are either varints or have a fixed length)
you can construct arbitrarily long valid protobuf messages just by repeating a single repeated field ad absurdum; a parser should be perfectly happy as long as it has enough memory to store the values (there are even parsers which provide callbacks for field values, thus relaxing memory consumption, e.g. nanopb)
(Please do validate my thoughts.)
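To illustrate the varint point above, a minimal sketch of base-128 varint encoding (unsigned only, for illustration; real protobuf libraries handle this internally):

// Each byte carries 7 payload bits; the high bit marks "more bytes follow".
function encodeVarint(n) {
  const bytes = [];
  let value = BigInt(n);
  do {
    let byte = Number(value & 0x7Fn);
    value >>= 7n;
    if (value > 0n) byte |= 0x80;   // continuation bit
    bytes.push(byte);
  } while (value > 0n);
  return Buffer.from(bytes);
}

console.log(encodeVarint(300));     // <Buffer ac 02>, the well-known 300 example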

Building large strings in JavaScript; is the join method most efficient?

In writing a database to disk as a text file of JSON strings, I've been experimenting with how to most efficiently build the string of text that is ultimately converted to a blob for download to disk.
There are a number of questions that state not to concatenate a string with the + operator in a loop, but instead to write the component strings to an array and then use the join method to build one large string.
The best explanation I came across explaining why can be found here, by Joel Mueller:
In JavaScript (and C# for that matter) strings are immutable. They can never be changed, only replaced with other strings. You're probably aware that combined + "hello " doesn't directly modify the combined variable - the operation creates a new string that is the result of concatenating the two strings together, but you must then assign that new string to the combined variable if you want it to be changed.
So what this loop is doing is creating a million different string objects, and throwing away 999,999 of them. Creating that many strings that are continually growing in size is not fast, and now the garbage collector has a lot of work to do to clean up after this.
The thread here was also helpful.
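A minimal illustration of the two patterns being compared (records stands for whatever collection is being serialized):

// Repeated concatenation: each += creates a new intermediate string.
let text = '';
for (const record of records) {
  text += JSON.stringify(record) + '\n';
}

// Array + join: collect the pieces, then build the final string once.
const parts = [];
for (const record of records) {
  parts.push(JSON.stringify(record) + '\n');
}
const text2 = parts.join('');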
However, using the join method didn't allow me to build the string I was aiming for without getting the error:
allocation size overflow
I was trying to write 50,000 JSON strings from a database into one text file, which may simply have been too large no matter what. I think it was reaching over 350 MB. I was just testing the limit of my application and picked something far larger than a user of the application will likely ever create. So, this test case was likely unreasonable.
Nonetheless, this leaves me with three questions about working with large strings.
For the same amount of data overall, does altering the number of array elements joined in a single join operation affect the efficiency in terms of not hitting an allocation size overflow?
For example, I tried writing the JSON strings to a pseudo 3-D array of 100 (and then 50) elements per dimension, and then looped through the outer two dimensions joining them together. 100^3 = 1,000,000 and 50^3 = 125,000 both provide more than enough entries to hold the 50,000 JSON strings. I know I'm not including the 0 index here.
So, the 50,000 strings were held in an array from a[1][1][1] to a[5][100][100] in the first attempt and from a[1][1][1] to a[20][50][50] in the second attempt. If the dimensions are i, j, k from outer to inner, I joined all the k elements in each a[i][j]; then joined all of those i x j joins; and lastly joined all of these i joins into the final text string.
All attempts still hit the allocation size overflow before completing.
So, is there any difference between joining 50,000 smaller strings in one join versus 50 larger strings, if the total data is the same?
Is there a better, more efficient way to build large strings than the join method?
Does the same principle described by Joel Mueller regarding string concatenation apply to reducing a string through substring, such as string = string.substring(position)?
The context of this third question is that when I read a text file in as a string and break it down into its component JSON strings before writing to the database, I use an array that is a map of the file layout; so I know the length of each JSON string in advance and repeat three statements inside a loop:
l = map[i].l;               // known length of the next JSON string
str = text.substring(0, l); // extract the next JSON string
text = text.substring(l);   // drop the part just extracted
It would appear that, since strings are immutable, this sort of reverse-concatenation step is as inefficient as using the + operator to concatenate.
Would it be more efficient not to delete str from text on each iteration, and instead just keep track of the increasing start and end positions for the substrings as I step through the loop reading the entire text string?
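For the third question, a sketch of the index-tracking alternative being asked about, reusing the names from the loop above (map, text, str):

// Keep one reference to the full text and advance an offset,
// instead of reassigning a progressively shorter copy of text each iteration.
let pos = 0;
for (let i = 0; i < map.length; i++) {
  const l = map[i].l;                       // known length of the next JSON string
  const str = text.substring(pos, pos + l);
  pos += l;
  // ...handle str (e.g. write it to the database)...
}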
Response to message about duplicate question
I got a message, I guess from the stackoverflow system itself, asking me to edit my question explaining why it is different from the proposed duplicate.
Reasons are:
The proposed duplicate asks specifically and exclusively about the maximum size of a single string. None of the three bolded questions, here, asks about the maximum size of a single string, although that is useful to know.
This question asks about the most efficient way of building large strings, and that isn't addressed in the answers found in the proposed duplicate, apart from an efficient way of building a large test string. They don't address how to build a realistic string comprised of actual application data.
This question provides a couple links to some information concerning the efficiency of building large strings that may be helpful to those interested in more than the maximum size alone.
This question also has a specific context of why the large string was being built, which led to some suggestions about how to handle that situation in a more efficient manner. Although, in the strictest sense, they don't specifically address the question by title, they do address the broader context of the question as presented, which is how to deal with the large strings, even if that means ways to work around them. Someone searching on this same topic might find real help in these suggestions that is not provided in the proposed duplicate.
So, although the proposed duplicate is somewhat helpful, it doesn't appear to be anywhere near a genuine duplicate of this question in its full context.
Additional Information
This doesn't answer the question concerning the most efficient way to build a large string, but it refers to the comments about how to get around the string size limit.
Converting each component string to a blob and holding them in an array, and then converting the array of blobs into a single blob, accomplished this. I don't know what the size limit of a single blob is, but I did see 800 MB mentioned in another question.
A process (or starting point) for creating the blob to write the database to disk and then to read it back in again can be found here.
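A sketch of that blob-of-blobs approach (browser-only APIs; jsonStrings stands for the array of component JSON strings):

// Wrap each component string in its own Blob, then combine them into one Blob.
const parts = jsonStrings.map(s => new Blob([s + '\n'], { type: 'application/json' }));
const file = new Blob(parts, { type: 'application/json' });
const url = URL.createObjectURL(file);      // can be used as the href of a download link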
Regarding the idea of writing the blobs or strings to disk as they are generated on the client, as opposed to generating one giant string or blob for download: although that is the most logical and efficient method, it may not be possible in the scenario presented here of an offline application.
According to this question, web extensions no longer have access to the privileged javascript code necessary to accomplish this through the File API.
I asked this question related to the Streams API write stream method and something called StreamSaver.
In writing a database to disk as a text file of JSON strings.
I see no reason to store the data in a string or array of strings in this case. Instead you can write the data directly to the file.
In the simplest case you can write each string to the file separately.
To get better performance, you could first write some data to a smaller buffer, and then write that buffer to disk when it's full.
For best performance you could create a file of a certain size and create a memory mapping over that file. Then write/copy the data directly to the mapped memory (which is your file). The trick would be to know or guess the size up front, or you could resize the file when needed and then remap the file.
Joining or growing strings will trigger a lot of memory (re)allocations, which is unnecessary overhead in this case.
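In a Node.js environment, writing each string out as it is produced could look like the following sketch (records is a hypothetical collection; the browser-only scenario from the question would need a different file-writing mechanism):

const fs = require('fs');

// Append each JSON string to the file as it is produced,
// instead of first building one giant string in memory.
const out = fs.createWriteStream('database.json');
for (const record of records) {
  out.write(JSON.stringify(record) + '\n'); // the stream buffers and flushes internally
}
out.end();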
I don't want the user to have to download more than one file
If the goal is to let the user download that generated file, you could do even better by streaming those strings directly to the user without even creating a file. This also has the advantage that the user starts receiving data immediately instead of first having to wait until the whole file is generated.
Because the file size is not known up front, you could use chunked transfer encoding.
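Assuming a server-side Node.js process is generating the data, a sketch of streaming it to the user as a download (Node uses chunked transfer encoding automatically when no Content-Length is set; generateRecords() is hypothetical):

const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Content-Disposition': 'attachment; filename="database.json"',
  });
  for (const record of generateRecords()) {   // hypothetical data source
    res.write(JSON.stringify(record) + '\n'); // sent to the client without waiting for the rest
  }
  res.end();
}).listen(3000);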

Why is a javascript object literal in memory so much bigger than its text file size in JSON form?

I have a huge JSON object (let's call it {...}) which is about 80 MB. All keys/values are strings or objects. I expect that if I load it into JavaScript by saying var myJson = {...}, the amount of memory it takes up should not be much bigger than 80 MB. However, using Node.js's process.memoryUsage(), I see that as soon as I load this object my memory usage increases by about 600 MB. Why does this happen, and what are some workarounds?
Edit: I have changed it from var myJson = {...} to const myJson = require('./database.json'), with the contents of the JSON in a .json file rather than a .js file. Strangely, this seemed to reduce the amount of memory used by 50%, so it is still about 4x as large as the .json file.
There is a gap between JSON text file size and JSON object memory size.
"the amount of memory it takes up should be not too much bigger than 80 MB" -- That's not true, especially when the object is very big.
According to ECMAScript Language Specification, each string character will occupy 16 bits, and each number value will occupy 64 bits. This means there is a huge gap between JSON text file size and JSON object memory size.
Take the following simple object for example: {name:'John',age:16}. Saving this object as a text file needs only 20 ASCII characters, which means 20 bytes. However, storing this JSON object in memory needs at least 30 bytes: "name".length + "John".length + "age".length = 11 characters, at 2 bytes each that is 22 bytes, plus the 8 bytes used for the number 16, making 30 bytes in all.
The internal object structure ("John" is mapped to "name", and 16 is mapped to "age") also occupies memory. Even without taking that into account, the gap between 20 bytes and 30 bytes is already big for such a simple object.
The gap becomes very big if the JSON object is huge (like the one in the question), or if the JSON object structure is very complex.
Of course, the JavaScript engine will do some optimization work and reduce memory usage. But since the ECMAScript specification already states the bit usage of strings, numbers, etc., the gap will always exist, and it is not trivial.
P.S. There is an npm module called object-sizeof, which can be used to estimate an object's memory allocation.
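As a small check of the numbers above, a sketch comparing the JSON text size with the object-sizeof estimate (the estimate is rough, but it illustrates the gap):

const sizeof = require('object-sizeof');

const obj = { name: 'John', age: 16 };
const textBytes = Buffer.byteLength(JSON.stringify(obj), 'utf8'); // size as a JSON text file
const memBytes = sizeof(obj);                                     // rough in-memory estimate
console.log(textBytes, memBytes);                                 // the estimate is typically larger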
