Background
I'm the maintainer of a low-level library for fast object traversal in Node.js. The focus of the library is speed, and it is heavily optimised. However, there is one big slowdown: callback parameters.
The Problem
Callbacks are provided by the library consumer and can be invoked many, many times per scan. For every invocation all parameters are computed and passed to the callback. In most cases only a fraction of the parameters are actually used by the callback.
The Goal
The goal is to eliminate the unnecessary computation of these parameters.
Solution Ideas
Ideally, Node.js would expose the parameters declared by the callback. However, obtaining them doesn't seem to be possible without a lot of black magic (string parsing). It would also not solve the situation where parameters are only required conditionally.
Instead of trying to obtain the parameters from the callback, we could require the callback to expose the parameters it needs. It sounds very inconvenient and error-prone, and it would also not solve conditional requirements.
We could introduce a different callback for every parameter combination. This sounds like a bad idea.
Instead of passing in the parameters directly, we could pass in a function for each parameter that computes and returns the parameter value. Inside the callback the parameter would then be invoked as required. It's ugly but might be the best approach?
Questions
How do other libraries solve this?
What are other ways this can be solved?
This is a very fundamental design decision and I'm trying to get this right.
Thank you very much for your time! As always appreciated!
You could pass to the callback an object that has various methods on it that the client using the callback could call to fetch whatever parameters they actually need. That way, you'd have a clean object interface and you'd only compute the necessary information that was actually requested.
This general design pattern is sometimes called "lazy computation" where you only do the computation as required. You can use either accessor functions or getters, depending upon the type of interface you want to expose.
For performance reasons, you can perhaps reuse the same object for each time you call the callback rather than building a new one (depends upon details of your implementation).
Note that you don't even have to put all the information needed for the computation into the object itself as the methods on the object can, in some cases, refer to your own local context and locally scoped variables when doing their computation.
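As a minimal sketch of that idea (the node shape, makeInfo, and the path computation below are all made up for illustration, not part of any real library):

function makeInfo() {
  let node = null;
  return {
    reset(n) { node = n; },        // the library points this at the current node
    key() { return node.key; },    // cheap accessor
    path() {                       // expensive accessor, runs only on demand
      console.log('computing path');
      return node.parents.concat(node.key).join('.');
    },
  };
}

const nodes = [
  { key: 'a', parents: [] },
  { key: 'b', parents: ['a'] },
];

const info = makeInfo();           // one reusable object, no per-call allocation
const callback = (ctx) => {
  if (ctx.key() === 'b') console.log(ctx.path());
};

for (const n of nodes) {
  info.reset(n);
  callback(info);                  // path() is computed only when the callback asks
}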
However, there is one big slowdown: callback parameters.
Did you actually benchmark this? I doubt constructing the argument values is that costly. Notice that if this is a really heavily used call, V8 might be able to inline it and then optimise away unused argument values.
Ideally, Node.js would expose the parameters declared by the callback.
Actually, it does. If you do want to rely on this property though, you should properly document that you do, otherwise this magic could lead to obscure bugs.
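Presumably this refers to Function.prototype.length, which reports the number of parameters a function declares; a quick illustration:

// A function's .length is the number of declared parameters
// (those before the first default or rest parameter).
const cb1 = (key) => key;
const cb2 = (key, value, context) => value;
const cb3 = (...args) => args;

console.log(cb1.length); // 1
console.log(cb2.length); // 3
console.log(cb3.length); // 0 - rest parameters and defaults are not counted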
We could introduce a different callback for every parameter combination. This sounds like a bad idea.
It doesn't seem to be that much of a problem to provide two options, filter(key, value) and filterDetailed(key, value, context). If the optimisation is really worth it, and as you say this is a low-level library, just go for it.
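As a rough sketch of how the traversal might dispatch on which option the consumer supplied (traverse, filter, and filterDetailed are invented names following this suggestion, not an actual API):

// The expensive context object is built only when the consumer
// opted into the detailed callback.
function traverse(obj, { filter, filterDetailed } = {}) {
  for (const [key, value] of Object.entries(obj)) {
    let keep = true;
    if (filterDetailed) {
      const context = { parent: obj, depth: 0 };  // pretend this is costly to build
      keep = filterDetailed(key, value, context);
    } else if (filter) {
      keep = filter(key, value);                  // cheap path, no context built
    }
    if (keep) console.log('visiting', key);
  }
}

traverse({ a: 1, b: 2 }, { filter: (key) => key !== 'b' }); // visiting a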
Instead of passing in the parameters directly, we could pass in a function for each parameter that computes and returns the parameter value. Inside the callback the parameter would then be invoked as required. It's ugly but might be the best approach?
Constructing a closure object to pass instead of a parameter does have some overhead as well, so you will need to benchmark this properly. It might not be worth it.
However, I see that you are actually passing a single context object as the argument on which the computed values are accessed as properties. In that case, you can simply make these properties getters that will compute the value when they are accessed, not when the object is constructed.
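A minimal sketch of that getter approach, with made-up property names (key and path are illustrative, not the library's actual fields):

// Getter-based lazy properties: nothing runs until a property is read.
function makeContext(node) {
  return {
    key: node.key,                     // cheap, precomputed
    get path() {                       // expensive, computed on first access
      const value = node.parents.concat(node.key).join('.');
      Object.defineProperty(this, 'path', { value }); // cache for repeated reads
      return value;
    },
  };
}

const ctx = makeContext({ key: 'x', parents: ['root'] });
console.log(ctx.key);   // no path computation has happened yet
console.log(ctx.path);  // "root.x", computed now and cached on the object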
Related
Note: I now believe this question was based on assumptions about the JavaScript specification that are actually implementation-specific.
I am attempting to build a runtime debugging hook system for a complex dynamic JavaScript application. A series of choices has led me to use the JavaScript Proxy and Reflect metaprogramming constructs to interpose function calls in the application I am debugging, wrapping all incoming function arguments in Proxy/Reflect constructs.
The approach involves replacing high-level application functions with Proxies and using traps and handlers to provide debugging functionality, ultimately passing arguments through to the application in a transparent way. All property get/set operations and function executions act as normal. However, wrapping all objects and functions in Proxies allows tracing of the runtime.
I am installing this hook system into Chrome.
(Note: Please do NOT provide an answer suggesting a different methodology for debugging hooks - options have been evaluated extensively.)
The issue is that some JavaScript methods in the application invoke closures and pass "this" parameters. When a "this" parameter is wrapped in a Proxy, the runtime fails to execute the closure and instead throws an "Illegal Invocation" exception.
I have tried reengineering the debugging hook system to not wrap arguments for some methods, or to wrap them selectively. I have not been able to find a way to tell whether an argument is intended to be used as a context, so code that tries this approach ends up hardcoded against many possible methods and calling conventions. Ultimately this is too fragile in the face of calling-convention edge cases and requires too many case statements.
I have also removed the logic for wrapping arguments before passing them through. This removes the benefit from the debug hooking system, and so I have always reverted the logic to wrap all incoming arguments.
alert.apply(this, [1]);          // works: "this" here is the real global object

const p = new Proxy(this, {});
try {
  alert.apply(p, [1]);           // fails: alert won't accept a Proxy as its context
} catch (e) {
  console.log(e);
}
This throws an "Illegal Invocation" Exception.
typeof this === 'object'
true
But it seems that contexts are objects just like everything else.
I would expect that passing a Proxy() through as the context should succeed in an invocation. Barring this, I would expect the type of a context to be specific enough to determine whether it should be wrapped in a Proxy() or not.
I have two questions.
(1) What are the semantics of context binding closures in JavaScript that would cause binding to a Proxy(context) to fail with an illegal invocation?
(2) What type of object are contexts, and how can a JavaScript method tell one apart from other JavaScript objects by inspecting its properties at runtime?
What type of object are contexts, and how can a JavaScript method tell one apart from other JavaScript objects by inspecting its properties at runtime?
There is no special type. Every object can become a context by calling a method upon it. Most objects that will become a context of a method call do have that very method as an (inherited) property, but there's no guarantee.
You cannot tell them apart.
What are the semantics of context binding in JavaScript that would cause binding to a Proxy(context) to fail with an illegal invocation?
When the method is a native one. In functions written in user code, the this context being a proxy doesn't make a difference; when you access it, it will just behave as a proxy.
The problem is native methods that expect their this argument to be a native object of the respective type. Sure, those objects are still JavaScript objects, but they may contain private data in internal properties as well. A proxy's target and handler references, for example, are implemented through such internal properties; you can sometimes inspect them in the debugger. The native methods don't know to unwrap a proxy and use its target instead; they just look at the object and notice that it doesn't have the internal properties required for the method to do its job. You could have passed a plain {} as well.
Examples of such methods can be found among the builtins of the ECMAScript runtime:
Map.prototype.has/get/set/…
Set.prototype.has/add/delete/…
TypedArray.prototype.slice/copyWithin/map/forEach/…
Number/String/Boolean prototype methods
But also (and even more of them) as host objects supplied by the environment:
window.alert/prompt
EventTarget.prototype.addEventListener/removeEventListener
document.createElement
Element.prototype.appendChild/remove/…
really just anything that's browser-specific
but also in other environments, like the Node.js os module
I have tried unwrapping Proxies in the right places by coding in edge cases and by blanket/heuristic policies.
I think the only reasonable approach would be to check whether the called function is a native one, and unwrap all arguments (including the this argument) for them.
Only a few native functions could be whitelisted, such as most of those on the Array.prototype which are explicitly specified in the language standard to work on arbitrary objects.
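A hedged sketch of that approach; the WeakMap registry and the "[native code]" string check below are assumptions about how such a hook system might be wired, not a complete or foolproof solution:

// Bookkeeping so proxied values can be unwrapped before reaching native code.
const proxyTargets = new WeakMap();

function wrap(target) {
  const proxy = new Proxy(target, {});   // real debugging traps would go here
  proxyTargets.set(proxy, target);
  return proxy;
}

function unwrap(value) {
  // WeakMap.has() just returns false for primitives, so any value is safe here.
  return proxyTargets.has(value) ? proxyTargets.get(value) : value;
}

// Heuristic: native functions stringify to something containing "[native code]".
// (Caveat: bound functions and proxied functions stringify the same way.)
function isNative(fn) {
  return typeof fn === 'function' &&
    Function.prototype.toString.call(fn).includes('[native code]');
}

function callThrough(fn, thisArg, args) {
  if (isNative(fn)) {
    // Native code cannot see through proxies, so hand it the real objects.
    return Reflect.apply(fn, unwrap(thisArg), args.map(unwrap));
  }
  return Reflect.apply(fn, thisArg, args);
}

// In a browser, callThrough(alert, wrap(window), ['hi']) would no longer throw.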
Is there a general good-practice convention regarding whether to send parameters as individual variables or to send an array of parameters to a function/method?
E.g.
param1, param2, param3 vs. an array of data
How do you determine which of the two to use, or a combination of both?
My rule of thumb is that as soon as you have more than two parameters, you should switch to passing an aggregate of some sort (Array, Hash, Object, Record, whatever) instead. If it's a case of one or two primary parameters and several options, then put just the options into an aggregate and keep the primaries in their own parameters.
When you ask about an array of arguments, I'm assuming you're talking about arguments that are all of the same type (or similar type). It really depends upon the situation and it's a bit of a compromise between convenience for the caller and convenience for the function implementation. That means a lot of it depends upon what you most want to optimize for. There are no hard and fast rules, but you can use this sort of thinking as guidance:
Use separate arguments if:
The number of arguments is relatively small and usually fixed
The code in the receiving function is much cleaner by having named arguments
The typical way the function call is made is not by building argument lists programmatically
Use an array if:
The number of arguments is relatively large or usually variable (caller probably wants to build an array and pass that)
The receiving function is cleaner by processing a variable list of arguments in a loop (this can be done with the arguments object too, but is sometimes simpler with an actual array)
A common way that the function call is made is from a list of arguments that is built programmatically (more convenient for the caller to just be able to pass the array).
The called function wants to be able to easily pass the list of arguments to some other function call. While this can be done without the array by processing the arguments object, it takes more code to do if the args aren't passed in an array to start with.
The caller can generally work around either issue by using .apply() if the function isn't built to take an array, but the caller has arguments in an array.
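For instance (sum and the argument values are made up):

function sum(a, b, c) { return a + b + c; }

const args = [1, 2, 3];
console.log(sum.apply(null, args)); // 6: the array is spread over the named parameters
console.log(sum(...args));          // 6: the modern spread-syntax equivalent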
FYI (though I don't think this was the main subject of your question), another option is to pass an object with a variable number of properties. The options object is particularly useful when there are a number of different arguments and most or all are optional. The called function can contain default values for all options and the caller can just pass the arguments they want to override the default for. The options object generally isn't the best solution for a variable number of the same type of argument that is better represented in an array or as a list of arguments.
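A small sketch of that options-object pattern (the function and option names are invented for illustration):

// The caller overrides only what it cares about; everything else keeps its default.
function connect(host, userOptions) {
  const options = Object.assign(
    { port: 80, timeout: 5000, secure: false },  // defaults
    userOptions                                  // caller's overrides, if any
  );
  console.log('connecting to ' + host + ':' + options.port, options);
}

connect('example.com');                   // all defaults
connect('example.com', { secure: true }); // override a single option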
In my Node.js app I have a database layer that calls toString on input parameters that need to be passed to the database as a string (for example, numbers). The places where I pass parameters to the library, should I call toString there too? On the one hand, I like being explicit. On the other, I'm calling toString on something that is already a string. If that is unnecessary, I'd rather save the CPU cycles.
How costly is it to call toString on a string?
How costly is it to call toString on a string?
In any decent engine (and V8 is a decent engine), it should be nearly free. But it still has to do the property lookup through the prototype chain, to make sure someone hasn't overridden it. So it won't be free, just cheap.
...should I call toString there too?
This one's subjective. I'm with #dystroy: There's no need. Further, it's probably best to give the API as much information to work with as possible, in case they enhance it in future releases. Unless the API requires that you give it strings, I'd preserve information by not converting to string before handing params over.
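If you do want to skip the redundant call at your call sites, a simple guard is enough (purely illustrative):

// Convert only when the value isn't already a string.
// String(value) is used rather than value.toString() so null/undefined don't throw.
function asString(value) {
  return typeof value === 'string' ? value : String(value);
}

console.log(asString(42));      // "42"
console.log(asString('hello')); // "hello", no conversion performed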
I've started to wrap my functions inside of Objects, e.g.:
var Search = {
  carSearch: function(color) {
  },
  peopleSearch: function(name) {
  },
  ...
}
This helps a lot with readability, but I continue to have issues with reusability. To be more specific, the difficulty is in two areas:
Receiving parameters. A lot of the time I will have a search screen with multiple input fields and a button that calls the JavaScript search function. I either have to put a bunch of code in the button's onclick to retrieve and then marshal the values from the input fields into the function call, or I have to hardcode the HTML input field names/IDs so that I can subsequently retrieve them with JavaScript. The solution I've settled on is to pass the field names/IDs into the function, which it then uses to retrieve the values from the input fields. This is simple but really seems improper.
Returning values. The effect of most JavaScript calls tends to be one in which some visual on the screen changes, either directly or as a result of another action performed in the call. Reusability is toast when I put these screen-altering effects at the end of a function. For example, after a search is completed I need to display the results on the screen.
How do others handle these issues? Putting my thinking cap on leads me to believe that I need a page-specific layer of JavaScript between each use in my application and the generic, application-wide methods I create. Using the previous example, I would have a search button whose onclick calls a myPageSpecificSearchFunction, in which the search field IDs/names are hardcoded; it marshals the parameters and calls the generic search function. The generic function would return data/objects/variables only, and would not directly read from or make any changes to the DOM. The page-specific search function would then receive this data back and alter the DOM appropriately.
Am I on the right path or is there a better pattern to handle the reuse of Javascript objects/methods?
Basic Pattern
In terms of your basic pattern, can I suggest modifying your structure to use the module pattern and named functions:
var Search = (function() {
  var pubs = {};

  pubs.carSearch = carSearch;
  function carSearch(color) {
  }

  pubs.peopleSearch = peopleSearch;
  function peopleSearch(name) {
  }

  return pubs;
})();
Yes, that looks more complicated, but that's partially because there's no helper function involved. Note that now, every function has a name (your previous functions were anonymous; the properties they were bound to had names, but the functions didn't, which has implications in terms of the display of the call stack in debuggers and such). Using the module pattern also gives you the ability to have completely private functions that only the functions within your Search object can access. (Just declare the functions within the big anonymous function and don't add them to pubs.) More on my rationale for that (with advantages and disadvantages, and why you can't combine the function declaration and property assignment) here.
Retrieving Parameters
One of the functions I really, really like from Prototype is the Form#serialize function, which walks through the form elements and builds a plain object with a property for each field based on the field's name. (Prototype's current – 1.6.1 – implementation has an issue where it doesn't preserve the order of the fields, but it's surprising how rarely that's a problem.) It sounds like you would be well-served by such a thing and they're not hard to build; then your business logic is dealing with objects with properties named according to what they're related to, and has no knowledge of the actual form itself.
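A minimal sketch of such a helper in plain DOM JavaScript (this is not Prototype's actual implementation, and it ignores multi-selects and duplicate field names):

// Walk a form's fields and build { fieldName: value } for the business logic.
function serializeForm(form) {
  const data = {};
  for (const field of form.elements) {
    if (!field.name || field.disabled) continue;
    if ((field.type === 'checkbox' || field.type === 'radio') && !field.checked) continue;
    data[field.name] = field.value;
  }
  return data;
}

// e.g. serializeForm(document.getElementById('searchForm'))
// might return { color: 'red', name: 'Bob' }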
Returning Values / Mixing UI and Logic
I tend to think of applications as objects and the connections and interactions between them. So I tend to create:
Objects representing the business model and such, irrespective of interface (although, of course, the business model is almost certainly partially driven by the interface). Those objects are defined in one place, but used both client- and server-side (yes, I use JavaScript server-side), and designed with serialization (via JSON, in my case) in mind so I can send them back and forth easily.
Objects server-side that know how to use those to update the underlying store (since I tend to work on projects with an underlying store), and
Objects client-side that know how to use that information to render to the UI.
(I know, hardly original!) I try to keep the store and rendering objects generic so they mostly work by looking at the public properties of the business objects (which is pretty much all of the properties; I don't use the patterns like Crockford's that let you really hide data, I find them too expensive). Pragmatism means sometimes the store or rendering objects just have to know what they're dealing with, specifically, but I do try to keep things generic where I can.
I started out using the module pattern, but then started doing everything in jQuery plugins. Plugins allow you to pass page-specific options.
Using jQuery would also let you rethink the way you identify your search terms and find their values. You might consider adding a class to every input, and use that class to avoid specifically naming each input.
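As a rough sketch of that idea (the plugin name, the .search-field class, and the onSearch option are all invented for illustration):

// Hypothetical plugin: collects values from inputs marked with a class
// and hands them to a page-specific callback supplied via the options.
(function ($) {
  $.fn.searchForm = function (options) {
    var settings = $.extend({ onSearch: $.noop }, options);
    return this.each(function () {
      var $form = $(this);
      $form.on('submit', function (event) {
        event.preventDefault();
        var criteria = {};
        $form.find('.search-field').each(function () {
          criteria[this.name] = $(this).val();
        });
        settings.onSearch(criteria);
      });
    });
  };
})(jQuery);

// usage: $('#peopleSearchForm').searchForm({ onSearch: Search.peopleSearch });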
JavaScript is ridiculously flexible, which means that your design is especially important, as you can do things in many different ways. This is probably what makes JavaScript feel like it doesn't lend itself to reusability.
There are a few different notations for declaring your objects (functions/classes) and then namespacing them. It's important to understand these differences. As mentioned in a comment here, 'namespacing is a breeze', and it is a good place to start.
I wouldn't be able to go far enough in this reply and would only be paraphrasing, so I recommend buying these books:
Pro JavaScript Design Patterns
Pro JavaScript Techniques
This is perhaps a dumb question, but I am new to JavaScript and desperate for help.
If the JavaScript engine will look outside of a function for global variables, then what is the point of passing parameters to it? What do you gain?
I understand that global variables are generally frowned upon, but I still don't understand the purpose of passing variables. Does it have something to do with data encapsulation?
There are a few magic words that programmers use to describe different kinds of functions. Here are a few:
Re-entrant
ThreadSafe
Referentially Transparent
Idempotent
Pure
Side-Effects
You can look some of them up if you want a headache. The point is that progress in computer science and engineering has always been about reducing complexity, and we have spent quite a lot of time thinking about the best way to write a function to achieve that goal. Hopefully, you can stuff tiny bits of your program into your head at a time and understand those bits, without having to simultaneously understand the overall functioning of the entire program or the detailed implementation of the insides of all the other functions. A function that uses global variables can't do that very well, because:
You can't guarantee that the global variables exist
You can't guarantee that the global variables are what you think they are
You can't guarantee that other parts of the program haven't modified those variables in a way you didn't expect.
You can't easily generalise to use the function multiple times on multiple sets of variables.
You can't easily verify that the function works as advertised without first setting up the function's external environment and its dependencies.
If the global variables have changed in a way you didn't expect, it's really hard to track down which part of the program is the culprit. It could be any of 500 different functions that write to that variable!
On the other hand, if you explicitly pass in all the data a function needs to operate, and explicitly return all the results:
If something goes wrong with any of those variables, it's easy to find the source of the problem
It's easier to add code to verify the "domain" of your inputs. Is it really a string? Is it over a certain length, is it under a certain length? Is it a positive number? is it whole, or fractional? All these assumptions that your code needs to operate correctly can be explicit at the start of the function, instead of just crossing your fingers and hoping nothing goes wrong.
It's easier to guess what a particular function will actually do, if its output depends only on its input.
A function's parameters are not dependent on the naming of any external variables.
And other advantages.
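A small illustration of the contrast (the names and numbers are made up):

// Relies on a global: hard to reuse, and any code anywhere can break it.
var taxRate = 0.2;
function priceWithGlobalTax(price) {
  return price * (1 + taxRate);        // depends on whatever taxRate happens to be
}

// Takes everything it needs as parameters: output depends only on input.
function priceWithTax(price, rate) {
  return price * (1 + rate);
}

console.log(priceWithGlobalTax(100));  // 120, but only if nothing has touched taxRate
console.log(priceWithTax(100, 0.2));   // 120, always
console.log(priceWithTax(100, 0.05));  // 105, trivially reused with a different rate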
If you were only going to use global variables that the functions worked on, then you'd always have to know the inner workings of the functions and what your global variable names had to be for them to work.
Also, something like Math.abs(n) would be hard to call twice in one line if it used global variables.
Functions are reusable components of your code that execute a particular snippet on the provided variables, exhibiting varying behavior.
Encapsulation comes from being Object Oriented. Functions are more for giving structure to your program.
Also, you shouldn't underestimate the difference in execution time of a method when the variables it accesses exist in its own context rather than being global.
If you don't need to pass parameters to functions, then you really don't need functions.
Functions are usually (and should be) used to provide code re-use -- use the same function on different variables. If a function accesses global variables, then every time I use it it will perform the same action. If I pass parameters, I can make it perform a different action (based on those different parameters) every time I use it.
One of the main benefits is that it keeps all the information the function needs, nearby. It becomes possible to look at just the function itself and understand what its input is, what it does, and what its output will be. If you are using global variables instead of passing arguments to the function, you’ll have to look all over your code to locate the data on which the function operates.
That’s just one benefit, of many, but an easy one to understand.
Global variables (in any language) can sometimes become stale. If there is a chance of this it is good to declare, initialise and use them locally. You have to be able to trust what you are using.
Similarly, if something or someone can update your global variables, then you have to be able to trust the outcome of what will happen when you use them.
Global variables are not always needed by everything so why keep them hanging around?
That said, variables global to a namespace can be useful, especially if you are using something like jQuery selectors and you want to cache them for performance's sake.
Is this really a JavaScript question? I have yet to encounter a language that doesn't have some form of global variables.