How does JavaScript represent strings and Date objects in memory?
Context and Intent
I'm mostly just curious, but here's what led to the question: We have code in our React front end that accepts JSON payloads from our API, and those payloads include dates -- in ISO-8601 UTC string form (e.g. 2023-02-01T17:01:08Z), of course.
As we pass the resulting model object around a variety of hooks, components, functions, etc., we keep it as a string, and we only parse it into a Date object if we are going to display it or use it to make decisions. In some cases, this means that we'll re-parse the same string into a Date multiple times in the course of rendering the UI. In other cases, we ignore the Date completely as it's not relevant for the page.
I'm trying to reason about whether it would be worthwhile to modify our system such that we always parse Date strings into actual Date objects upon receipt from our API. Our UI is written in TypeScript -- AFAIK this makes no difference with regard to my actual question -- and my biggest motivator in wanting to make this change is the improved type safety and clarity.
To be clear, time performance is the bigger concern, but I can do my own benchmarking. For the purposes of this question, I'm asking about memory performance, as much for my own understanding and education as for any technical decisions that might result, but I always try to understand the full scope of any tradeoff.
I imagine that this could be implementation dependent; if so, I'm most interested in the facts as they apply to modern versions of Google Chrome (with default configuration, if that matters), but happy to learn about other implementations as well.
Questions
If I take a 20-character ISO-8601 UTC string and parse it into a Date, how much memory would the resulting Date use? Does JavaScript work like Java, using an 8-byte "long" integer to store dates as milliseconds since the epoch? I found disappointingly little information about this in my searching; most of the results I found were actually about Java.
Also, how much memory does the string use? What's JavaScript's internal string encoding? A quick Google search indicates that it uses UTF-16 (and therefore 40 bytes for the 20-character date string) -- is that right?
For both Date and String, are there any differences in applicable overhead? Are there optimizations that might apply to either Strings or Dates and affect the result (e.g. string interning -- which, if I understand correctly, JS does, but which would not apply to my use case since the value came from an API response)?
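One rough way to check this empirically in Chrome, if ballpark numbers are enough: allocate a large number of each value, keep them referenced, and compare retained sizes in a DevTools heap snapshot. This is only a sketch; the count is arbitrary, and whatever sizes the snapshot reports are V8 implementation details rather than spec guarantees.

// Rough sketch, not a rigorous benchmark: compare heap usage of ISO strings
// vs. Date objects via DevTools > Memory > Heap snapshot.
const payload = JSON.stringify(
  Array.from({ length: 1_000_000 }, () => "2023-02-01T17:01:08Z")
);

const asStrings = JSON.parse(payload);              // strings as they'd arrive from an API
const asDates = asStrings.map((s) => new Date(s));  // one Date per string

// Keep both arrays referenced, take a heap snapshot, and compare the retained
// sizes of `asStrings` and `asDates`. Whether identical string values get
// deduplicated internally is itself an implementation detail.
console.log(asStrings.length, asDates.length);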
Related
I am trying to understand the data fields in the payload of my remote procedure calls, and the Date and Timestamp objects confuse me the most.
The full request payload looks like:
7|0|8|https://myapp.com/myapp/client/|72119BCB4CE5FB8D147EA76E8006F76E|com.myapp.service.MyService|updateTimepoint|java.lang.String/2004016611|java.util.Date/3385151746|554455|java.sql.Timestamp/3040052672|1|2|3|4|2|5|6|7|8|VhGcuow|0|
The interface of this service as it is defined in the code is:
public void updateTimepoint(String myId, Date timepoint,
                            AsyncCallback<Void> async);
From that array of values, I can tell that the parts shown below refer to the sent java.util.Date object, and that "554455" in the middle is the myId (I know that from the use case). I have no explanation for why the myId variable was put in the middle:
java.util.Date/3385151746|554455|java.sql.Timestamp/3040052672
I'm currently debugging obfuscated code, so looking at the Sources tab in the browser doesn't seem like an option. It wouldn't help much anyway, as you only see cryptic JS date references there, which I'm not sure how to read either.
So, how would I turn the Date and Timestamp payload values back into something readable?
Thank you!
P.S. Or - is VhGcuow a Date? As per GWT java.util.Date serialization
As @RobG said, those numbers are not the values but details about the Date and Timestamp types. The payload is |-delimited, and the /s are part of the class name strings. See Serializing RPC-GWT (an answer I wrote earlier this year) for more details on the order of the strings and other content in the payload.
VhGcuow is likely a base64-encoded long. Dates (and possibly Timestamps, though I haven't checked) are serialized as a long field, so that value, read as a long, would represent the number of milliseconds since Jan 1, 1970. See RPC-GWT Serialization/java.util.Date Encoding for more discussion on how that can be understood and decoded, without simply trusting that RPC works.
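If you want to check the value yourself, here is a rough sketch of decoding it in the browser console. It assumes GWT's Base64Utils alphabet (A-Z, a-z, 0-9, $, _) and that the token really is a long holding epoch milliseconds -- verify both against your GWT version before relying on it.

// Hedged sketch: decode a GWT RPC base64 long token, assuming the
// Base64Utils alphabet (A-Z, a-z, 0-9, $, _).
var GWT_B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$_";

function gwtLongFromBase64(token) {
  var value = 0;
  for (var i = 0; i < token.length; i++) {
    var digit = GWT_B64.indexOf(token.charAt(i));
    if (digit < 0) throw new Error("Unexpected character: " + token.charAt(i));
    value = value * 64 + digit; // each character carries 6 bits
  }
  return value;
}

// If the assumptions hold, this prints the Date the payload carried.
console.log(new Date(gwtLongFromBase64("VhGcuow")));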
Do note, though, that RPC hasn't changed in years and is in use by tens or hundreds of thousands of GWT developers who haven't had issues with Date serializing correctly. More likely something else is afoot (like a timezone issue) -- asking another question with all of the details of your problem and a "working" test case might get you to your answer more quickly.
TLDR below
I was reading through the Standard Built-In Objects portion of JavaScript on MDN and noticed that there are these methods that utilize 'Locale' and that, from what I can gather, are used specifically to format a method's return text in a locally defined format if one exists. Apparently this causes an issue with Turkish (I don't know if there are other affected locales).
As far as I can tell from what I've looked into, these were all implemented in ES 5.1 circa 2011. In fact, one of the SO links in the references below actively points out that a reason Angular 1.x might use toString instead of toLocaleString is backwards compatibility with browsers that hadn't yet completely adopted ES 5.1. A simple aside: I don't know if that's exactly the case, but it seems reasonable.
So I looked up the ES6 spec to check out the methods:
On Object:
15.2.4.3 Object.prototype.toLocaleString ( )
This function returns the result of calling toString().
NOTE This function is provided to give all Objects a generic toLocaleString interface, even though not all may use it. Currently, Array, Number, and Date provide their own locale-sensitive toLocaleString methods.
NOTE The first parameter to this function is likely to be used in a future version of this standard; it is recommended that implementations do not use this parameter position for anything else.
On Array:
15.4.4.3 Array.prototype.toLocaleString ( )
The elements of the array are converted to strings using their toLocaleString methods, and these strings are then concatenated, separated by occurrences of a separator string that has been derived in an implementation-defined locale-specific way. The result of calling this function is intended to be analogous to the result of toString, except that the result of this function is intended to be locale-specific.
On String:
15.5.4.17 String.prototype.toLocaleLowerCase ( )
This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
NOTE The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
NOTE The first parameter to this function is likely to be used in a future version of this standard; it is recommended that implementations do not use this parameter position for anything else.
The original toLowerCase, for clarity:
15.5.4.16 String.prototype.toLowerCase ( )
If this object is not already a string, it is converted to a string. The characters in that string are converted one by one to lower case. The result is a string value, not a String object.
The characters are converted one by one. The result of each conversion is the original character, unless that character has a Unicode lowercase equivalent, in which case the lowercase equivalent is used instead.
NOTE The result should be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later).
NOTE The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
(toLocaleUpperCase/toUpperCase read exactly the same)
Given all that, and with ES6 released and largely adopted, I am confused. I feel that toLowerCase and toUpperCase are used pretty commonly for validation purposes (though less so with ES6), and changing them to utilize the locale seems wrong because you would be checking against unknown formatting. So, okay, not really useful for validation. What about outputting to the DOM with toLocaleString? It seems plausible, but again, you're dealing with unknowns. Say your locale's formatting isn't what you expect and you wanted the integer 1000 to be displayed as '1,000' (I've read that this happens with en-GB). The method takes that out of your hands, and you may never even know that it's not displaying as you wanted it to.
TLDR:
Is there a practical use case for methods like toLocaleString, toLocaleLowerCase, toLocaleUpperCase, etc.? Or should they be largely ignored?
Note: I realize this is close to being opinion-based, but I don't think it is. I'm looking for rational cases in which these may be applicable, if they exist, e.g. like asking 'what would you use .call for' as opposed to 'do you think .call is better than .apply'.
References
MDN String Prototype: toLocaleLowerCase
SO: Difference Between toLocaleLowerCase and toLowerCase?
SO: In which exactly js engines are toLowerCase toUpperCase locale sensitive?
SO: JavaScript difference between toString and toLocaleString methods of date?
It seems plausible, but again, you're dealing with unknowns.
Yes. You need to get to know them.
It will leave it out of your hands and you may never even know that it's not displaying as you wanted it to.
Indeed. If you want or need full control over your output, you need to implement the formatting yourself. If all you are saying is "hey, it's a number, please format it however you think is best for a British locale", then you can use it.
Is there a practical use case for methods like toLocaleString etc.?
Yes! You will want to use them in an environment that supports the ECMA-402 Standard.
From the API Overview:
"The ECMAScript 2016 Internationalization API Specification provides several key pieces of language-sensitive functionality that are required in most applications: String comparison (collation), number formatting, date and time formatting, and case conversion. While the ECMAScript 2016 Language Specification provides functions for this basic functionality (on Array.prototype: toLocaleString; on String.prototype: localeCompare, toLocaleLowerCase, toLocaleUpperCase; on Number.prototype: toLocaleString; on Date.prototype: toLocaleString, toLocaleDateString, and toLocaleTimeString), it leaves the actual behaviour of these functions largely up to implementations to define. The ECMAScript 2016 Internationalization API Specification provides additional functionality, control over the language and over details of the behaviour to be used, and a more complete specification of required functionality."
Should they be largely ignored?
In an unknown environment, probably. But not when you know what they do (because you control the environment, or you expect it to conform to ECMA-402); in those cases they can be very useful and take a great deal of work off your hands.
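To make that concrete, here is a small sketch using the ECMA-402 form of these methods, where you pass the locale explicitly instead of inheriting whatever the host defaults to. The printed values in the comments are illustrative; exact output can still vary with the implementation's locale data.

var n = 1000;
console.log(n.toLocaleString("en-GB")); // "1,000"
console.log(n.toLocaleString("de-DE")); // "1.000"

var d = new Date(Date.UTC(2016, 1, 14, 15, 25, 50));
console.log(d.toLocaleString("en-GB", { timeZone: "UTC" }));

// The Turkish dotted/dotless i case mentioned in the question:
console.log("I".toLocaleLowerCase("tr-TR")); // "ı"
console.log("I".toLocaleLowerCase("en-US")); // "i"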
I haven't done a lot of coding in dynamically typed languages such as JavaScript until recently, and now that I'm beginning to understand what's possible, I'm starting to wonder what's a good idea and what's not. Specifically, I am unsure as to whether changing the type of a variable as a function progresses through a sequence of operations is considered good practice.
For example, I have a bunch of files containing dates as strings. I'm using front-matter to extract date attributes and storing them within an object representing the original file. The strings themselves aren't very consistent, so I'm then using Moment.js to parse them, and storing the result back in the same attribute in the same object article.date. This feels more or less right to me, as article.date is only a String for one operation before being parsed and stored as a Date/'Moment' type.
The part I'm a little unsure of comes next. This is part of an ExpressJS app, and so an array of these objects is getting passed in as the data in a render() call, where it goes to a Jade template for rendering. But what if I want to use a display method in Moment.js to control the way the date looks as a String? Would it be reasonable to change the type of the date attribute back to a String again before passing it in?
Example:
articles[i] = processArticle(content);
// creates an article object from YAML, object has a property article.attributes.date
articles[i].attributes.date = moment(articles[i].attributes.date);
// attribute is now a Date/Moment
articles[i].attributes.date = articles[i].attributes.date.format("dddd, MMMM Do YYYY, h:mm:ss a");
// attribute is now "Sunday, February 14th 2010, 3:25:50 pm"
The reason we even have type safety is to find errors in code early on: preventing invalid memory access, illegal operations, and so on. In object-oriented languages it also allows decoupling and code re-use (polymorphism). It's not there just to make your life as a programmer more difficult, but to make sure your program will run.
Because JavaScript does not provide type safety, it's up to the programmer to enforce it. You have to make sure that operations on a variable are valid and do not lead to an exception that stops your program from running. You have to make sure that a method call can be invoked on whatever object the variable currently holds.
So to answer your question: no, it's not a good practice to change the type of a variable as a function progresses.
To use Moment.js in your Jade template, see: How do I display todays date in Node.js Jade?
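For what it's worth, a middle ground is to keep the attribute a Moment and only turn it into a display string at the template boundary. A minimal sketch under those assumptions; the route name, the sample data, and the formatDate helper are invented for illustration, not taken from the question:

var express = require("express");
var moment = require("moment");

var app = express();
app.set("view engine", "jade");

// Dates stay Moment objects on the model; formatting happens at render time.
var articles = [{ attributes: { date: moment("2010-02-14T15:25:50") } }];

app.get("/articles", function (req, res) {
  res.render("articles", {
    articles: articles,
    // The Jade template calls formatDate(article.attributes.date) to display it.
    formatDate: function (m) {
      return m.format("dddd, MMMM Do YYYY, h:mm:ss a");
    }
  });
});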
I need a way to serialize and deserialize dates that are potentially far in the past, for instance the year -10000.
I first looked at ISO 8601, but it does not seem to support years with more than four digits. (Or at least, the Python libraries I tried don't.)
The different solutions I can think of:
change the year before serializing/deserializing, give it to the parsing/formatting library, and fix it back (sounds hacky)
define my own format, like year:month:day:hour:minute:second (that is reinventing the wheel, since I have to handle timezones, etc.)
Use a UNIX timestamp without bounds, or something equivalent (may overflow in some programming languages, and the timezone issue remains)
Store dates before -9999 (or 0) differently from those after, since there were no timezone/leap-year issues at that time (two different formats in the same place)
Do you see any other way that would be better than these, or would you recommend one of them?
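One more data point on the ISO 8601 option: in JavaScript at least, the ECMAScript date-time string format allows an expanded, signed six-digit year, so a year like -10000 round-trips without a custom format. A quick sketch (the commented output is what I would expect from current engines):

// Expanded years use a sign and six digits: ±YYYYYY.
const d = new Date("-010000-01-01T00:00:00Z");
console.log(d.getUTCFullYear()); // -10000
console.log(d.toISOString());    // "-010000-01-01T00:00:00.000Z"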
You could take a page from the astronomers. In sky maps they account for the long-period precession of Earth's spin by establishing epochs. (The sky looks different if you're observing now vs. 10,000 BC.)
Create a new class that has an "epoch" number and acts as a facade over your current date class. The new class contains two private fields, epoch and internal-date. Your constructor sets epoch to (year div 10000) and instantiates the internal date with (year modulo 10000). I hope the rest of the facade pattern is as obvious as I think.
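A rough sketch of that idea in JavaScript, wrapping the built-in Date. The class name, the 10000-year block size, and the UTC-only handling are illustrative choices, not part of the answer above:

// Facade holding an epoch number plus an internal Date kept within a
// "normal" year range.
class FarDate {
  constructor(year, month = 0, day = 1) {
    this.epoch = Math.floor(year / 10000);        // how many 10000-year blocks
    const innerYear = year - this.epoch * 10000;  // remainder goes into the Date
    this.inner = new Date(Date.UTC(innerYear, month, day));
    this.inner.setUTCFullYear(innerYear);         // undo the 0-99 -> 19xx mapping
  }
  get year() {
    return this.epoch * 10000 + this.inner.getUTCFullYear();
  }
}

console.log(new FarDate(-10000, 0, 1).year); // -10000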
ISO 8601 does support dates with more than 4 digits if, and only if, they are signed. The only PHP function I know of that supports this functionality is
DateTime::setISODate($Year, $WeekOffset, $DayofWeekOffset)
Obviously it's a pain to use because it requires calculating the offsets from perfectly good day/month pairs. That said, you should be able to create BC dates by signing the year with a '-' (minus sign).
Then you'd output the date with
DateTime::format("c")
In production this would look something like:
$date = new DateTime();
// $WeekOffset and $DoWOs are the ISO week number and day-of-week offsets computed elsewhere
$date->setISODate(-100000, $WeekOffset, $DoWOs);
echo $date->format("c");
Take a look at the FlexiDate class; it might be useful for your purposes.
It is not standards-compliant in any way, but it might do the trick for you.
I was reading this book, Professional JavaScript for Web Developers, where the author mentions that string concatenation is an expensive operation compared to using an array to store the strings and then using the join method to create the final string. Curious, I ran a couple of tests to see how much time it would save, and this is what I got:
http://jsbin.com/ivako
Somehow, Firefox usually produces similar times for both approaches, but in IE, string concatenation is much, much faster. So, can this idea now be considered outdated (browsers have probably improved since)?
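For reference, a minimal version of that kind of comparison looks roughly like this; the iteration count and the appended string are arbitrary, and results will vary by browser:

var N = 100000;
var piece = "some string ";

console.time("concatenation");
var s = "";
for (var i = 0; i < N; i++) s += piece;
console.timeEnd("concatenation");

console.time("array join");
var parts = [];
for (var j = 0; j < N; j++) parts.push(piece);
var joined = parts.join("");
console.timeEnd("array join");

console.log(s.length === joined.length); // sanity check: both build the same string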
Even if it were true and join() were faster than concatenation, it wouldn't matter. We are talking about tiny amounts of milliseconds here, which are completely negligible.
I would always prefer well structured and easy to read code over microscopic performance boost and I think that using concatenation looks better and is easier to read.
Just my two cents.
On my system (IE 8 on Windows 7) the times for the StringBuilder in that test vary over roughly a 70-100% range -- that is, they are not stable -- although the mean is about 95% of that of normal appending.
While it's easy now to just say "premature optimization" (and I suspect that in almost every case it is), there are things worth considering:
The problem with repeated string concatenation comes from repeated memory allocations and repeated data copies (advanced string data types can reduce or eliminate much of this, but let's keep assuming a simplistic model for now; a rough cost sketch of that model appears a little further down). From this, let's raise some questions:
What memory allocator is used? In the naive case each str += x requires str.length + x.length bytes of new memory to be allocated. The standard C malloc, for instance, is a rather poor memory allocator. JS implementations have undergone changes over the years, including, among other things, better memory subsystems. Of course these changes don't stop there; they touch virtually all aspects of modern JS code. The fact that ancient implementations may have been incredibly slow at certain tasks does not necessarily imply that the same issues still exist, or to the same extent.
As with the above, the implementation of Array.join is very important. If it does NOT pre-allocate memory for the final string before building it, then it only saves on data-copy costs -- and how many GB/s is main memory these days? 10,000 x 50 is hardly pushing a limit. A smart Array.join operation with a POOR MEMORY ALLOCATOR would be expected to perform a good bit better simply because the amount of re-allocation is reduced. This difference would be expected to shrink as allocation cost decreases.
The micro-benchmark code may be flawed depending on whether the JS engine creates a new object for each UNIQUE string literal or not. (This would bias it towards the Array.join method, but it needs to be considered in general.)
The benchmark is indeed a micro benchmark :)
Increasing the size being built should have an impact on performance based on any or all (and then some) of the above conditions. It is generally easy to show extreme cases favoring one method or another -- the expected use case is generally of more importance.
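To make the allocation point concrete, here is a back-of-the-envelope sketch of the naive cost model (every str += x copies the whole intermediate string); the 10,000 x 50 figures echo the ones used above:

// Total characters copied if each append re-copies the whole string so far.
function naiveCopyCost(pieceLength, count) {
  var total = 0;
  var currentLength = 0;
  for (var i = 0; i < count; i++) {
    currentLength += pieceLength;
    total += currentLength; // the whole new string is copied each time
  }
  return total;
}

// ~2.5 billion characters copied in the naive model, versus ~500,000 for a
// join that pre-allocates the final string.
console.log(naiveCopyCost(50, 10000));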
Although, quite honestly, for any form of sane string building, I would just use normal string concatenation until such time as it was determined to be a bottleneck, if ever.
I would re-read the statement from the book and see if there are perhaps other implicit qualifications the author meant to invoke, such as "for very large strings", "for insane amounts of string operations", or "in JScript/IE6". If not, then such a statement is about as useful as "insertion sort is O(n*n)" -- the realized costs depend on the state of the data and the size of n, of course.
And the disclaimer: the speed of the code depends upon the browser, operating system, the underlying hardware, moon gravitational forces and, of course, how your computer feels about you.
In principle the book is right. Joining an array should be much faster than repeatedly concatenating to the same string. As a simple algorithm on immutable strings it is demonstrably faster.
The trick is: JavaScript authors, being largely non-expert dabblers, have written a load of code out in the wild that uses concatenation, and relatively little 'good' code that uses methods like array join. The upshot is that browser authors can get a better improvement in speed on the average web page by catering for and optimising the 'bad', more common option of concatenation.
So that's what happened. The newer browser versions have some fairly hairy optimisation stuff that detects when you're doing a load of concatenations, and hacks it about so that internally it is working more like an array-join, at more or less the same speed.
I actually have some experience in this area, since my primary product is a big, IE-only webapp that does a LOT of string concatenation in order to build up XML docs to send to the server. For example, in the worst case a page might have 5-10 iframes, each with a few hundred text boxes that each have 5-10 expando properties.
For something like our save function, we iterate through every tab (iframe) and every entity on that tab, pull out all the expando properties on each entity and stuff them all into a giant XML document.
When profiling and improving our save method, we found that using string concatenation in IE7 was a lot slower than using the array-of-strings method. Some other points of interest: accessing DOM object expando properties is really slow, so we put them all into JavaScript arrays instead. Finally, generating the JavaScript arrays themselves is actually best done on the server; you then write them onto the page as a literal to be executed when the page loads.
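A stripped-down sketch of that array-of-strings pattern; the element names and sample data are invented for illustration, and the real code obviously reads the expando properties off the page instead:

var entities = [
  { id: "a1", value: "first" },
  { id: "a2", value: "second" }
];

var parts = ["<entities>"];
for (var i = 0; i < entities.length; i++) {
  // accumulate fragments instead of doing xml += ... on every field
  parts.push('<entity id="', entities[i].id, '">', entities[i].value, "</entity>");
}
parts.push("</entities>");

var xml = parts.join("");
console.log(xml);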
As we know, not all browsers are created equal. Because of this, performance in different areas is guaranteed to differ from browser to browser.
That aside, I noticed the same results as you did; however, after removing the unnecessary buffer class and just using an array directly with a 10,000-character string, the results were even tighter and more consistent (in FF 3.0.12): http://jsbin.com/ehalu/
Unless you're doing a great deal of string concatenation, I would say that this type of optimization is a micro-optimization. Your time might be better spent limiting DOM reflows and queries (generally the use of document.getElementById/getElementsByTagName), implementing caching of AJAX results (where applicable), and exploiting event bubbling (there's a link somewhere, I just can't find it now).
Okay, regarding this here is a related module:
http://www.openjsan.org/doc/s/sh/shogo4405/String/Buffer/0.0.1/lib/String/Buffer.html
This is an effective means of creating String buffers, by using
var buffer = new String.Buffer();
buffer.append("foo", "bar");
This is the fastest sort of string-buffer implementation I know of. First of all, if you are implementing string buffers, don't use push: it is a built-in method and it is slow -- for one thing, push iterates over the entire arguments array rather than just adding one element.
It all really depends upon the implementation of the join method; some implementations of join are really slow and some are relatively fast.
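If you do roll your own buffer and want to follow the advice above about avoiding push for single elements, the usual alternative is assigning by index; a tiny sketch (the chunk count is arbitrary):

var buffer = [];
for (var i = 0; i < 10000; i++) {
  buffer[buffer.length] = "chunk " + i; // append without a push() call
}
var result = buffer.join("");
console.log(result.length);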