In my Node.js app I have a database layer that calls toString on input parameters that need to be passed to the database as a string (for example, numbers). The places where I pass parameters to the library, should I call toString there too? On the one hand, I like being explicit. On the other, I'm calling toString on something that is already a string. If that is unnecessary, I'd rather save the CPU cycles.
How costly is it to call toString on a string?
How costly is it to call toString on a string?
In any decent engine (and V8 is a decent engine), it should be nearly free. But it still has to do the property lookup through the prototype chain, to make sure someone hasn't overridden it. So it won't be free, just cheap.
...should I call toString there too?
This one's subjective. I'm with @dystroy: there's no need. Further, it's probably best to give the API as much information to work with as possible, in case they enhance it in future releases. Unless the API requires that you give it strings, I'd preserve information by not converting to a string before handing the parameters over.
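A minimal sketch of both points, assuming a hypothetical helper named toDbParam (it is not part of any real database library): calling toString on a string returns the same value, but the call is still a real method lookup, which is why an override on the prototype is observed.

```javascript
// Hypothetical helper: convert only when the value is not already a string.
function toDbParam(value) {
  return typeof value === "string" ? value : String(value);
}

const s = "42";
// For a primitive string, toString returns an equal string.
const sameValue = s.toString() === s;

// The call still goes through the prototype chain, so an override is honored.
const originalToString = String.prototype.toString;
let overrideCalls = 0;
String.prototype.toString = function () {
  overrideCalls++;
  return originalToString.call(this);
};
s.toString();
String.prototype.toString = originalToString; // restore the built-in

const converted = toDbParam(42);   // number is converted
const untouched = toDbParam("42"); // string is passed through as-is
```

Whether skipping the redundant call is measurable in practice would need a benchmark on the actual workload.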
Related
I was wondering how strings are accessed internally. The String object exposes methods such as char(Code)At for public access, but doesn't use them itself as part of its other methods (for example indexOf); instead it accesses the string through an internal structure. However, when using a RegExp, I assumed it can't use String's private properties, so it would have to call the public API like any other class. I wrote a short test to check that, but that doesn't seem to be what happens: https://jsfiddle.net/n5pxe94L/
String.prototype[Symbol.iterator] = function* () {
  console.log("My custom iterator!");
  yield "t";
};
const originalFunc = String.prototype.charCodeAt;
String.prototype.charCodeAt = function (index) {
  console.log("Used!");
  return originalFunc.call(this, index);
};
new RegExp("a").test("Lalalala");
When I run this test, it doesn't print anything, which indicates the functions I overrode aren't used! So how does RegExp access the string it analyzes?
It's all done in the JavaScript engine however that engine sees fit to do it. Each JavaScript implementation may take a different approach, but they all have direct, very low-level access to the string's content. For performance reasons I'd expect these access methods to be highly optimized and direct, not allowing for overrides like you've attempted here as that would slow things down in the 99.9999% of cases where no such override exists.
JavaScript is not stuck using functions exposed in the API, it may have many others that are hidden from developers and for internal use only.
If you're writing JavaScript code you're stuck using the features available to you. If you're writing a JavaScript runtime you can do whatever you want.
Originally I wanted to simply answer along the lines of "RegExp accesses the underlying string structure in C", but I got curious and started digging into the source code of various JavaScript engines.
All the JavaScript engines commonly in use (V8 for Chrome and Node.js, Chakra for Edge and JavaScriptCore for Safari) implement regexp processing in C++. I couldn't find the exact regexp processing functions in Chakra and JavaScriptCore, but I'm sure I would find them if I spent a few days reading the code.
For Chakra you will see that most of the regexp methods in C++ accept an argument called input, which is defined as Char*. I'm not sure if input is the string or the regexp pattern, but I'm guessing it's the string. Note that it is Char, not char, so it is probably a custom type in the project. You can start by looking at the Chakra RegexRuntime implementation here: https://github.com/microsoft/ChakraCore/blob/master/lib/Parser/RegexRuntime.cpp
For JavaScriptCore, as best I can tell, they pass the C++ JSString object to the regexp function. This is as close as it gets to using the actual String prototype in JS, but it probably has methods not available to JS: https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/RegExpObject.cpp
I got the furthest with V8 in the last 15 minutes. V8 passes a parameter called subject as the string to regexp functions. This is a String object which is then converted to String::FlatContent. Now what is FlatContent? It is an array-like object (see the Vector class) of either 8 bit or 16 bit values that you can access using the [] operator in C++. This allows them to loop through the string using regular array indexing. You can start by checking out regexp.cc: https://github.com/v8/v8/blob/master/src/regexp/regexp.cc
As you can see, none of them access strings via JavaScript's String object as is. The one that comes closest to behaving how you expected is Apple's JavaScriptCore in Safari, but even then it's not using only the APIs exposed to JavaScript.
Background
I'm the maintainer of a low level library for fast object traversal in Node.js. The focus of the library is speed and it is heavily optimised. However there is one big slowdown: Callback Parameters
The Problem
Callbacks are provided by the library consumer and can be invoked many, many times per scan. For every invocation all parameters are computed and passed to the callback. In most cases only a fraction of the parameters are actually used by the callback.
The Goal
The goal is to eliminate the unnecessary computation of these parameters.
Solution Ideas
Ideally Node.js would expose the callback parameters as defined by the callback. However, obtaining them doesn't seem to be possible without a lot of black magic (string parsing). It would also not solve the situation where parameters are only required conditionally.
Instead of trying to obtain the parameters from the callback, we could require the callback to expose the required parameters. That sounds very inconvenient and error-prone, and it would also not solve conditional requirements.
We could introduce a different callback for every parameter combination. This sounds like a bad idea.
Instead of passing in the parameters directly, we could pass in a function for each parameter that computes and returns the parameter value. Inside the callback the parameter would then be invoked as required. It's ugly but might be the best approach?
Questions
How do other libraries solve this?
What are other ways this can be solved?
This is a very fundamental design decision and I'm trying to get this right.
Thank you very much for your time! As always appreciated!
You could pass to the callback an object that has various methods on it that the client using the callback could call to fetch whatever parameters they actually need. That way, you'd have a clean object interface and you'd only compute the necessary information that was actually requested.
This general design pattern is sometimes called "lazy computation" where you only do the computation as required. You can use either accessor functions or getters, depending upon the type of interface you want to expose.
For performance reasons, you can perhaps reuse the same object for each time you call the callback rather than building a new one (depends upon details of your implementation).
Note that you don't even have to put all the information needed for the computation into the object itself as the methods on the object can, in some cases, refer to your own local context and locally scoped variables when doing their computation.
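A minimal sketch of this idea, under the assumption of a simple recursive object traversal. The names scan, getPath and getDepth are hypothetical and not taken from the library in question. One context object is reused for every invocation, and its methods read the traversal's local state, so the path string is only built when a callback asks for it.

```javascript
// Hypothetical traversal; returns how many times the path was actually computed.
function scan(root, callback) {
  const stack = []; // keys from the root down to the current node
  let pathComputations = 0;

  // Single context object, reused for every callback invocation.
  const ctx = {
    key: undefined,
    value: undefined,
    getPath() {
      pathComputations++;
      return stack.join("."); // computed on demand from local state
    },
    getDepth() {
      return stack.length;
    },
  };

  function visit(node) {
    for (const key of Object.keys(node)) {
      stack.push(key);
      ctx.key = key;
      ctx.value = node[key];
      callback(ctx);
      if (node[key] !== null && typeof node[key] === "object") {
        visit(node[key]);
      }
      stack.pop();
    }
  }

  visit(root);
  return pathComputations;
}

// Usage: only the one matching node pays for building its path.
const found = [];
const computed = scan({ a: { b: 1, c: 2 }, d: 3 }, (ctx) => {
  if (ctx.value === 2) found.push(ctx.getPath());
});
```

Because the context object is reused, a callback must not keep a reference to it after returning; that trade-off would need to be documented.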
However there is one big slowdown: Callback Parameters
Did you actually benchmark this? I doubt constructing the argument values is that costly. Notice that if this is a really heavily used call, V8 might be able to inline it and then optimise away unused argument values.
Ideally NodeJs would expose the callback parameters as defined by the callback.
Actually, it does. If you do want to rely on this property though, you should properly document that you do, otherwise this magic could lead to obscure bugs.
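The property meant here is presumably the function's length (the original link is not preserved, so this is an assumption). A rough sketch of dispatching on it, with hypothetical names invoke and buildContext:

```javascript
// Hypothetical dispatcher: only build the expensive context when the callback
// declares a third parameter.
function invoke(callback, key, value, buildContext) {
  // Function.prototype.length counts declared parameters up to, but not
  // including, the first default-valued or rest parameter.
  if (callback.length >= 3) {
    return callback(key, value, buildContext());
  }
  return callback(key, value);
}

let contextBuilds = 0;
const buildContext = () => {
  contextBuilds++;
  return { depth: 2 };
};

const cheapResult = invoke((key, value) => value > 1, "a", 5, buildContext);
const detailedResult = invoke((key, value, context) => context.depth, "a", 5, buildContext);

// Caveat: a rest-parameter callback reports length 0 and gets no context here.
const restLength = ((...args) => args.length).length;
```

The caveat in the last line is the kind of obscure behaviour that makes documenting this reliance important.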
We could introduce a different callback for every parameter combination. This sounds like a bad idea.
It doesn't seem to be that much of a problem to provide two options, filter(key, value) and filterDetailed(key, value, context). If the optimisation is really worth it, and as you say this is a low-level library, just go for it.
Instead of passing in the parameters directly, we could pass in a function for each parameter that computes and returns the parameter value. Inside the callback the parameter would then be invoked as required. It's ugly but might be the best approach?
Constructing a closure object to pass instead of a parameter does have some overhead as well, so you will need to benchmark this properly. It might not be worth it.
However, I see that you are actually passing a single context object as the argument on which the computed values are accessed as properties. In that case, you can simply make these properties getters that will compute the value when they are accessed, not when the object is constructed.
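A small sketch of the getter variant, assuming a context that carries path segments. The class name Context and its fields are illustrative only, not the library's actual interface.

```javascript
// Hypothetical context object: `path` is computed when read, not when constructed.
class Context {
  constructor(segments) {
    this.segments = segments;
    this.pathReads = 0; // bookkeeping for this example only
  }

  get path() {
    this.pathReads++;
    return this.segments.join(".");
  }
}

const ctx = new Context(["a", "b", "c"]);
const readsBeforeAccess = ctx.pathReads; // nothing computed yet
const path = ctx.path;                   // computed here, on access
const readsAfterAccess = ctx.pathReads;
```

A getter keeps property syntax for the consumer, at the cost of hiding that a read may be expensive; whether it beats eager computation still needs a benchmark.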
Objects in JavaScript can be used as a hashtable
(the key must be a String)
Do they perform well as a hashtable, the data structure?
I mean, are they implemented as a hashtable behind the scenes?
Update: (1) I changed HashMap to hashtable (2) I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?
Update 2: I understand. I just wonder how V8 and the Firefox JS VM implement the Object.properties getters/setters?
V8 doesn't implement object property access as a hashtable; it implements it in a better way (performance-wise).
So how does it work? "V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes", which makes property access almost as fast as accessing properties of C++ objects.
Why? Because in a fixed class each property can be found at a specific fixed offset.
So in general, accessing a property of an object in V8 is faster than a hashtable lookup.
I'm not sure how it works in other VMs.
More info can be found here: https://v8.dev/blog/fast-properties
You can also read more regarding hashtables in JS here (my blog): http://simplenotions.wordpress.com/2011/07/05/javascript-hashtable/
"I guess most of the browser implement it the same, if not why not? is there any requirement how to implement it in the ECMAScript specs?"
I am no expert, but I can't think of any reason why a language spec would detail exactly how its features must be implemented internally. Such a constraint would have absolutely no purpose, since it does not impact the functioning of the language in any way other than performance.
In fact, this is absolutely correct, and the implementation-independence of the ECMA-262 spec is specifically described in section 8.6.2 of the spec:
"The descriptions in these tables indicate their behaviour for native ECMAScript objects, unless stated otherwise in this document for particular kinds of native ECMAScript objects. Host objects may support these internal properties with any implementation-dependent behaviour as long as it is consistent with the specific host object restrictions stated in this document"
"Host objects may implement these internal methods in any manner unless specified otherwise;"
The word "hash" appears nowhere in the entire ECMA-262 specification.
(original, continued)
The implementations of JavaScript in, say, Internet Explorer 6.0 and Google Chrome's V8 have almost nothing in common, but (more or less) both conform to the same spec.
If you want to know how a specific JavaScript interpreter does something, you should research that engine specifically.
Hashtables are an efficient way to create cross references. They are not the only way. Some engines may optimize the storage for small sets (for which the overhead of a hashtable may be less efficient) for example.
At the end of the day, all you need to know is, they work. There may be faster ways to create lookup tables of large sets, using ajax, or even in memory. For example, see the interesting discussion on this post from John Resig's blog about using a trie data structure.
But that's neither here nor there. Your choice of whether to use this, or native JS objects, should not be driven by information about how JS implements objects. It should be driven only by performance comparison: how does each method scale. This is information you will get by doing performance tests, not by just knowing something about the JS engine implementation.
Most modern JS engines use a pretty similar technique to speed up object property access. The technique is based on so-called hidden classes, or shapes. It's important to understand how this optimization works to write efficient JS code.
A JS object looks like a dictionary, so why not use one to store the properties? A hash table has O(1) access complexity, so it looks like a good solution. In fact, the first JS engines implemented objects this way. But in statically typed languages like C++ or Java, class instance property access is lightning fast. In such languages a class instance is just a segment of memory, and every property has its own constant offset, so to get the property value we just need to take the instance pointer and add the offset to it. In other words, at compile time an expression like point.x is simply replaced by its address in memory.
Maybe we can implement a similar technique in JS? But how? Let's look at a simple JS function:
function getX(point) {
  return point.x;
}
How do we get the point.x value? The first problem here is that we don't have a class (or shape) which describes the point. But we can calculate one, and that is what modern JS engines do. Most JS objects at runtime have a shape bound to them. The shape describes the properties of the object and where the values of these properties are stored. It's very similar to how a class definition describes the class in C++ or Java. How the shape of an object is calculated is a pretty big question, and I won't describe it here. I recommend this article, which contains a great explanation of shapes in general, and this post, which explains how things are implemented in V8. The most important thing you should know about shapes is that all objects with the same properties, added in the same order, will have the same shape. There are a few exceptions: for example, if an object has a lot of properties which are frequently changed, or if you delete some of the object's properties using the delete operator, the object will be switched into dictionary mode and won't have a shape.
Now, let's imagine that the point object has an array of property values, and we have a shape attached to it, which describes where the x value is stored in this property array. But there is another problem: we can pass any object to the function, and the object doesn't even have to have the x property. This problem is solved by a technique called inline caching. It's pretty simple: when getX() is executed the first time, it remembers the shape of the point and the result of the x lookup. When the function is called a second time, it compares the shape of the point with the previous one. If the shapes match, no lookup is required; we can reuse the previous lookup result.
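The mechanism described above can be modelled in plain JavaScript. This is a toy model only: real engines implement shapes and inline caches natively, and every name here (Shape, shapeFor, makeObject, makeCachedGetter) is made up for illustration.

```javascript
// A shape maps property names to slot offsets, in insertion order.
class Shape {
  constructor(keys) {
    this.offsets = new Map(keys.map((key, index) => [key, index]));
  }
}

// One Shape per distinct ordered key list, so equal layouts share a shape.
const shapeTable = new Map();
function shapeFor(keys) {
  const id = keys.join(",");
  if (!shapeTable.has(id)) shapeTable.set(id, new Shape(keys));
  return shapeTable.get(id);
}

// A toy object is a shape plus a flat array of property values.
function makeObject(props) {
  const keys = Object.keys(props);
  return { shape: shapeFor(keys), slots: keys.map((key) => props[key]) };
}

// Property getter with a one-entry inline cache keyed on the last seen shape.
function makeCachedGetter(name) {
  let cachedShape = null;
  let cachedOffset = -1;
  let misses = 0;

  function get(obj) {
    if (obj.shape !== cachedShape) {
      // Cache miss: do the slow lookup and remember its result.
      misses++;
      cachedShape = obj.shape;
      cachedOffset = obj.shape.offsets.has(name) ? obj.shape.offsets.get(name) : -1;
    }
    return cachedOffset === -1 ? undefined : obj.slots[cachedOffset];
  }
  get.misses = () => misses;
  return get;
}

const getX = makeCachedGetter("x");
const a = makeObject({ x: 1, y: 2 });
const b = makeObject({ x: 3, y: 4 }); // same keys, same order: same shape as a
const c = makeObject({ y: 5, x: 6 }); // different order: different shape
```

Calling getX(a) and then getX(b) costs one slow lookup in total, while getX(c) triggers a second one because its shape differs.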
The primary takeaway is that all objects which describe the same thing should have the same shape, i.e. they should have the same set of properties, added in the same order. It also explains why it's better to always initialize object properties, even if they are undefined by default; here is a great explanation of the problem.
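In practice this advice might look like the sketch below. Shapes cannot be observed from JavaScript itself, so the comparison of key order is only a stand-in for "same layout", and the Point class and its label field are made up for the example.

```javascript
class Point {
  constructor(x, y) {
    // Every instance gets the same properties in the same order.
    this.x = x;
    this.y = y;
    this.label = undefined; // initialized up front, even though it is often unused
  }
}

function getX(point) {
  return point.x;
}

const points = [new Point(1, 2), new Point(3, 4)];
points[1].label = "b"; // assigning an existing property adds no new property

// Both instances still list the same keys in the same order.
const sameLayout = Object.keys(points[0]).join() === Object.keys(points[1]).join();
```

Had label been added only to points[1] after construction, the two objects would end up with different property sets, which is the situation the takeaway warns about.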
Related resources:
JavaScript engine fundamentals: Shapes and Inline Caches and a YouTube video
A tour of V8: object representation
Fast properties in V8
JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)
Should I put default values of attributes on the prototype to save space?
This article explains how they are implemented in V8, the engine used by Node.js and most versions of Google Chrome:
https://v8.dev/blog/fast-properties
Apparently the "tactic" can change over time, depending on the number of properties, going from an array of named values to a dictionary.
V8 also takes the type into account: a number or string will not be treated in the same way as an object (or a function, which is a type of object).
If I understand this correctly, a property that is accessed frequently, for example in a loop, will be cached.
V8 optimises code on the fly by observing what it's actually doing, and how often.
V8 will identify objects with the same set of named properties, added in the same order (like a class constructor would do, or a repetitive bit of JSON), and handle them in the same way.
See the article for more details, then apply at Google for a job :)
I am aware of being able to use typeof; however, I would like to know if using
String(anyVariable) === anyVariable
in order to figure out if anyVariable is a string:
Is a generally valid approach?
Works consistently among browsers?
Has any pitfalls?
I would say do not do that, and use typeof instead, because String is meant for converting and manipulating text, not for comparing types. It is best to use language features for their intended purpose, both for stability and as a matter of best practice. Also, the purpose of String is to extend the type with methods. So you are basically causing more work and processing, instead of just doing a type comparison. Hopefully that answers it, though this is a question that merely has an "opinion" as an answer. You wouldn't create a new object and assign your current object to it just to check whether it is a certain type of object, would you? No, you would just use typeof.
I can think of no reason to use your method vs. the much simpler typeof. Yours is likely to perform worse (15x slower by Matti's jsperf) and be more complex.
Your method is going to require multiple memory manipulations (converting the value to a string, then comparing the result) and may leave garbage for the collector to clean up afterwards, whereas typeof just looks at a property of the internal JavaScript object.
When in doubt, choose the simplest method that solves your problem.
When in doubt, choose the method that is specified in the language definition for solving your problem.
When in doubt, choose the method that requires less memory manipulation.
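On the "pitfalls" part of the question, here is a short comparison of the two checks. The helper names are made up for the example. The two checks agree on ordinary values, but String(v) runs the value's own conversion logic, which can throw.

```javascript
const isStringByTypeof = (v) => typeof v === "string";
const isStringByConversion = (v) => String(v) === v;

// The two checks give the same answer for these ordinary values.
const samples = ["abc", "", 42, null, undefined, {}, [], new String("abc")];
const agree = samples.every((v) => isStringByTypeof(v) === isStringByConversion(v));

// Pitfall: an object without toString or valueOf cannot be converted,
// so String(v) throws a TypeError, while typeof never throws here.
const bare = Object.create(null);
const typeofResult = isStringByTypeof(bare);
let conversionThrew = false;
try {
  isStringByConversion(bare);
} catch (error) {
  conversionThrew = error instanceof TypeError;
}
```

An object whose toString has side effects or is slow would run that code on every check as well.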
I've heard a lot of people say that accessing the arguments object is expensive. (Example: Why was the arguments.callee.caller property deprecated in JavaScript?)
By the way, what exactly does that statement mean? Isn't accessing the arguments object just a simple property lookup? What exactly is the big deal?
The big deal is at least twofold:
1) Accessing the arguments object has to create an arguments object. In particular, modern JS engines don't actually create a new object for the arguments every time you call a function. They pass the arguments on the stack, or even in machine registers. As soon as you touch arguments, though, they have to create an actual object. This is not necessarily cheap.
2) Once you touch the arguments object, various optimizations that JS engines can otherwise perform (e.g. detecting cases in which you never assign to an argument and optimizing that common case) go out the window. Every access to the function arguments, not just ones through arguments, becomes much slower because the engine has to deal with the fact that you might have messed with the arguments via arguments.
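One concrete way to "mess with the arguments via arguments": in a non-strict function with simple parameters, the arguments object is linked to the named parameters, so a write to one shows up in the other. The sketch below uses illustrative function names. The non-strict function is built with the Function constructor so that it stays non-strict even if the surrounding file is loaded in strict mode.

```javascript
// Non-strict function: arguments[0] and the parameter `a` are linked.
const sloppyAlias = new Function("a", "arguments[0] = 'changed'; return a;");

// Strict function: arguments is an unlinked snapshot of the passed values.
function strictNoAlias(a) {
  "use strict";
  arguments[0] = "changed";
  return a;
}

const sloppyResult = sloppyAlias("original");   // parameter sees the write
const strictResult = strictNoAlias("original"); // parameter is unaffected
```

This linkage is part of what the engine has to preserve once a non-strict function touches arguments.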
I have also never heard a serious explanation for why accessing the arguments object is expensive. However, this site: http://www.playmycode.com/blog/2011/03/simple-yet-effective-javascript-optimisations/ notes that arguments is not really an array and is less efficient than accessing an array. The above linked site even suggests converting arguments to an array as an optimization.
Going to check with those who know JS interpreters more intimately...