I'm building an interpreter and I'm now at the point where I need to make it handle closures. I understand the concept pretty well, but I have a question about why they're designed the way they are.
In terms of how a closure is designed/interpreted, there need to be 3 things:
variable
body of logic that variable is bound to
environment that is saved during the closure's instantiation, so that free variables within the body can be bound when the closure variable is evaluated.
I understand what all of these things are for. I'm just wondering why the 3rd item is needed at all when substitution at the moment of the closure's creation does the same thing. Is there anything I'm not accounting for?
Essentially, what I'm asking is: why not just substitute the free variables with their respective environment values at closure creation instead of passing the entire environment?
I guess it's a little late, but oh well...
That depends on your computation model (evaluation strategy for example).
If all the data structures [which can get bound to a variable, and in effect enclosed] are immutable in your language, your method should work.
It works with pure lexical Lisp dialects (e.g. the functional subset of Scheme), nice and smooth.
It might not work if:
You pass arguments by reference, as was already mentioned in the comments. Call by value is fine. References to immutable objects are also fine.
Your environment binding and/or lookup causes some side effect. That would be rather exotic, but who knows?
Also, mind that you don't have to enclose the entire environment, just the free variables of your function's body (easy!).
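As a rough sketch of capturing only the free variables, here is a tiny evaluator. The AST shape (node kinds `num`, `var`, `add`, `lam`, `app`) and the names `freeVars` and `evaluate` are invented for illustration and are not taken from the question. It copies values at closure creation, so it assumes call by value with immutable values, as discussed above.

```javascript
// Hypothetical AST nodes:
//   {kind: "num", value}, {kind: "var", name}, {kind: "add", left, right},
//   {kind: "lam", param, body}, {kind: "app", fn, arg}

// Collect the names that occur free in an expression.
function freeVars(expr, bound = new Set()) {
  switch (expr.kind) {
    case "num":
      return new Set();
    case "var":
      return bound.has(expr.name) ? new Set() : new Set([expr.name]);
    case "add":
      return new Set([...freeVars(expr.left, bound), ...freeVars(expr.right, bound)]);
    case "lam":
      // The parameter is bound inside the body.
      return freeVars(expr.body, new Set([...bound, expr.param]));
    case "app":
      return new Set([...freeVars(expr.fn, bound), ...freeVars(expr.arg, bound)]);
    default:
      throw new Error("unknown node kind: " + expr.kind);
  }
}

function evaluate(expr, env) {
  switch (expr.kind) {
    case "num":
      return expr.value;
    case "var":
      if (!env.has(expr.name)) throw new Error("unbound variable: " + expr.name);
      return env.get(expr.name);
    case "add":
      return evaluate(expr.left, env) + evaluate(expr.right, env);
    case "lam": {
      // Capture only the bindings the body needs, not the whole environment.
      const captured = new Map();
      for (const name of freeVars(expr)) {
        if (env.has(name)) captured.set(name, env.get(name));
      }
      return { kind: "closure", param: expr.param, body: expr.body, env: captured };
    }
    case "app": {
      const fn = evaluate(expr.fn, env);
      const arg = evaluate(expr.arg, env);
      // Run the body in the captured environment extended with the argument.
      const callEnv = new Map(fn.env);
      callEnv.set(fn.param, arg);
      return evaluate(fn.body, callEnv);
    }
    default:
      throw new Error("unknown node kind: " + expr.kind);
  }
}

// (lambda x. x + y), evaluated where y = 41 and an unrelated z = 99 are in scope.
const lam = {
  kind: "lam",
  param: "x",
  body: { kind: "add", left: { kind: "var", name: "x" }, right: { kind: "var", name: "y" } },
};
const outerEnv = new Map([["y", 41], ["z", 99]]);
const closure = evaluate(lam, outerEnv);
const result = evaluate({ kind: "app", fn: lam, arg: { kind: "num", value: 1 } }, outerEnv);
```

Here `closure.env` holds only `y`; the unrelated `z` is not retained.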
The only reasons [I am aware of] for implementing closures as body+environment are:
you might want to pass references to mutable objects. This happens, among other places, with dictionaries in JS and Python; it is a bit scary to have a closure changing over time, but oh well.
you don't need to write a substitution function. Mind that it has to keep the scoping correct, so it would have to resemble your evaluation function [if your computational model is substitutional] -- so why repeat yourself? Also, there is the delicate nature of values: in the case of applicative order ("eager evaluation"), when you substitute a value into the body, it has to be lifted to an expression whose value is that thing (if by any chance you are implementing a LISP variant, think about symbols: you don't substitute the value HI!, but the expression (quote HI!)). This does not apply when all your data structures evaluate to themselves, like numbers or truth values in most LISPs. These are not problems in general, but they introduce some complexity to your interpreter, and simple is good.
the bound value might be something memory-consuming, and the variable you enclose may occur [as a free variable] more than once -- your body will then be significantly larger (e.g. your value is a bitmap or a sound sample or some enormous matrix or... you get the picture). It is a similar problem to computation duplication with lazy evaluation, but with respect to memory, not time. This is also usually not a problem, as computer memories are big.
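To make the first reason concrete, here is a small JavaScript sketch of a closure over a mutable variable, next to a hand-written approximation of what substitution at creation time would give. The names `makeCounter` and `makeSubstitutedCounter` are made up for the example.

```javascript
// A closure that reads and writes a captured variable.
function makeCounter() {
  let count = 0; // free variable of both arrow functions below
  return {
    increment: () => {
      count += 1;
      return count;
    },
    read: () => count,
  };
}

const counter = makeCounter();
counter.increment();
counter.increment();
const seen = counter.read(); // both functions share the same binding

// Roughly what substitution would produce: every occurrence of `count`
// replaced by its value (0) at the moment the closures were created.
function makeSubstitutedCounter() {
  return {
    increment: () => 0 + 1, // the update has nowhere to be stored
    read: () => 0,
  };
}

const frozen = makeSubstitutedCounter();
frozen.increment();
const frozenSeen = frozen.read(); // the increment is never observed
```

With the environment, `seen` is 2. With substitution, `frozenSeen` stays 0, which is why substitution is only safe when the enclosed bindings cannot change.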
I've run out of ideas about what else might break, but if you're not into checking your computation model "on paper" (by equational reasoning), you should implement it and try the trickiest cases [if any of these apply]: side effects, lazy evaluation, references, mutable objects, and combinations of the above. They are definitely not obstacles, just places worth checking.
Hope that helps, good luck with your interpreter!
PS If you like to think about this kind of stuff, check out defunctionalization ("Defunctionalization at Work" by Danvy and Nielsen is a pretty accessible read, and you should be fine with first part to get some inspiration)
Related
Do we need to unset variables in JavaScript, and why?
For example, in PHP it's recommended to unset all temporary variables in all kinds of loops to keep memory free and avoid duplicate variables.
But according to the question "php - what's the benefit of unsetting variables?":
Is it true that unsetting variables doesn't actually decrease the memory consumption during runtime?
-Yep.
So should we delete window.some_var;?
Or should we use some_var = null?
Or should we not unset variables at all, to avoid additional CPU cycles?
UPD:
Related questions:
How to unset a Javascript variable?
How to remove a property from a javascript object
No. Besides, in some cases you can't delete variables in the global scope: variables that are properly declared using the var keyword cannot be deleted. Implied global variables, on the other hand, can be deleted. (Always use the var keyword to declare variables.)
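The difference comes down to whether the property is configurable. The sketch below emulates it on a plain object (`fakeGlobal` is a stand-in invented for the example, so the code behaves the same in scripts, modules and Node). It assumes the usual behavior that a top-level `var` in a classic script creates a non-configurable property on the global object, while an implied global is an ordinary configurable property.

```javascript
// Stand-in for the global object.
const fakeGlobal = {};

// An implied global (assignment without declaration) is an ordinary,
// configurable property.
fakeGlobal.implied = 1;

// Emulate what a top-level `var declared = 2;` creates in a classic script:
// a property that is not configurable.
Object.defineProperty(fakeGlobal, "declared", {
  value: 2,
  writable: true,
  enumerable: true,
  configurable: false,
});

// Reflect.deleteProperty reports success as a boolean in both strict and
// sloppy mode, unlike the `delete` operator, which throws in strict mode.
const impliedDeleted = Reflect.deleteProperty(fakeGlobal, "implied");
const declaredDeleted = Reflect.deleteProperty(fakeGlobal, "declared");
```

`impliedDeleted` is `true` and the property is gone; `declaredDeleted` is `false` and `fakeGlobal.declared` is still 2.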
Also, JavaScript engines have this thing called a garbage collector that automatically looks for values that are no longer used or referenced anywhere while your code is running. Once it finds one, it'll shove it off into the Abyss (it frees the memory the value occupied), though not necessarily right away.
No, it is not necessary to delete variables when you’re done with them. If you have so many enormous, global variables that this is actually a problem, you need to rethink your design.
And yes, it’s also silly in PHP, and no, it has nothing to do with avoiding “additional CPU cycles”.
Specifically:
“Unsetting temporary variables in loops”
They’re temporary, and probably either a) integers or b) references that take up about as much space as an integer. Don’t bother.
“Avoiding duplicate variables”
Again – references. Setting things that are still referenced elsewhere to null is pointless.
If you feel like it’s clutter, that is a good sign that more things should be going out of scope (i.e. use more functions).
In most other cases that you haven’t mentioned, the engine is also smart enough to manage your stuff properly. Do not optimize prematurely.
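A small sketch of the point about references (the array `samples` just stands in for some large value):

```javascript
const samples = new Array(1000).fill(0); // stands in for a large value

let temp = samples; // copies the reference, not the array
const sameObject = temp === samples; // both names point at one array

temp = null; // drops one reference; `samples` still holds the array
const stillThere = samples.length;
```

Setting `temp` to `null` frees nothing here, because `samples` still refers to the same array. The array only becomes collectable once no reference to it remains.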
David Flanagan answers this quite nicely in his book, JavaScript: The Definitive Guide:
The JavaScript interpreter performs automatic garbage collection for memory management. This means that a program can create objects as needed, and the programmer never needs to worry about destruction or deallocation of those objects. When an object is no longer reachable – when a program no longer has any way to refer to it – the interpreter knows it can never be used again and automatically reclaims the memory it was occupying.
This doesn't mean that memory leaks can't happen in JavaScript. Far from it. Circular (or cyclic) references are a frequent problem in browsers which implement reference counting, as documented here by IBM:
A circular reference is formed when two objects reference each other, giving each object a reference count of 1. In a purely garbage collected system, a circular reference is not a problem: If neither of the objects involved is referenced by any other object, then both are garbage collected. In a reference counting system, however, neither of the objects can be destroyed, because the reference count never reaches zero. In a hybrid system, where both garbage collection and reference counting are being used, leaks occur because the system fails to identify a circular reference. In this case, neither the DOM object nor the JavaScript object is destroyed. Listing 1 shows a circular reference between a JavaScript object and a DOM object.
If you're worried that your website contains a JavaScript memory leak, Google has a tool, aptly named "Leak Finder for JavaScript", which can help you find the cause.
Further reading: What is JavaScript garbage collection?
I'm creating a compiler from a language to JavaScript. That language has referentially transparent functions by definition. For JavaScript, this means a lot of overhead for numerical operations such as matrix/vector sum, because you have to create new arrays every operation. I'm trying to figure out a workaround, but it is proving trickier than I thought. Is there any way to solve this problem?
You might be able to get some clues in either https://github.com/kripken/emscripten/wiki or https://code.google.com/p/v8/
I know they've both done a lot of work in that area.
What is a common way of dealing with referentially transparent functions?
My take on it is that you'll want to cache the output of functions, but how do you deal with scoping and garbage collection?
What if, instead of actually implementing referentially transparent functions, you make a best effort? Expressions within the same scope will benefit from the cache, but otherwise the value will have to be recalculated. This resolves the issues with garbage collection.
Alternatively, you could set up some global caching scheme that doesn't even try to do garbage collection but just uses some generic limited-space caching scheme with pruning, such as least-recently-used, most-frequently-used or random replacement. This relaxes the garbage collection constraint.
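A minimal sketch of the limited-space variant, assuming a pure single-argument function and least-recently-used pruning. The name `memoizeLRU` and the `maxEntries` parameter are made up for the example; this is one possible shape, not a recommendation of a specific library.

```javascript
// Memoize a pure single-argument function with a bounded LRU cache.
function memoizeLRU(fn, maxEntries = 100) {
  const cache = new Map(); // a Map iterates its keys in insertion order

  return function (arg) {
    if (cache.has(arg)) {
      // Re-insert the entry to mark it as most recently used.
      const value = cache.get(arg);
      cache.delete(arg);
      cache.set(arg, value);
      return value;
    }
    const value = fn(arg);
    cache.set(arg, value);
    if (cache.size > maxEntries) {
      // The first key in iteration order is the least recently used one.
      cache.delete(cache.keys().next().value);
    }
    return value;
  };
}

// Count how often the underlying function actually runs.
let calls = 0;
const square = memoizeLRU((n) => {
  calls += 1;
  return n * n;
}, 2);

square(2); // computed
square(2); // served from the cache
square(3); // computed
square(4); // computed; evicts the entry for 2
square(2); // computed again, because it was evicted
```

After these five calls the underlying function has run four times. Because the cache is bounded, nothing needs to be tracked for garbage collection, at the price of occasionally recomputing a value.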
The way it is:
I have recently joined a webapp project which maintains, as a matter of standard, one single globally available object (i.e. itself a property of window) which contains, as properties or recursive sub-properties, all the functions and variables necessary to run the application: stateful indicators for all the widgets, initialization aliases, generic DOM manipulation methods, everything. I would try to illustrate with simplified pseudo-code, but that defeats the point of my concern, which is that this thing is cyclopean. Basically nothing is encapsulated: every single granular component can be read or modified from anywhere.
The way I'd like it:
In my recent work I've been the senior or only JavaScript developer, so I've been able to write in a style that uses functions, often immediately invoked / self-executing, for scoping discrete code blocks and keeping granular primitives as variables within those scopes. Using this pattern, everything is locked to its execution scope by default, and occasionally a few judiciously chosen getter/setter functions are returned in cases where an API needs to be exposed.
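A minimal sketch of that style, with a hypothetical `widgetState` module invented for illustration:

```javascript
// Immediately invoked function: `visible` is private to this scope.
const widgetState = (function () {
  let visible = false;

  // Only the judiciously chosen getter/setter pair is exposed.
  return {
    isVisible: function () {
      return visible;
    },
    setVisible: function (value) {
      visible = Boolean(value);
    },
  };
})();

widgetState.setVisible(true);
const nowVisible = widgetState.isVisible();
const directAccess = widgetState.visible; // not reachable from outside
```

`nowVisible` is `true`, while `directAccess` is `undefined`: the state can only be changed through the exposed functions.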
…Is B more performant than A on a generic level?
Refactoring the code to functional parity from style A to style B is a gargantuan task, so I can't make any meaningful practical test for my assertion, but is the monolithic God object anti-pattern a known performance monster compared to the scoped functional style? I would argue for B for the sake of legibility, code safety, and separation of concerns... But I imagine keeping everything in memory all the time, crawling through lookup chains, etc. would make it either an inherently performance-intensive exercise to access anything, or at least make garbage collection a very difficult task.
There are a few things to consider. A more memory-intensive program isn't necessarily slower.
Also, even if you use a self-executing function and only expose a few functions, the rest of its scope is still kept in memory because the exposed functions may need it. Memory leaks caused by closures are a big topic on the web right now.
Now, looking at the way V8 works: JavaScript code is not translated to C++; the engine (itself written in C++) compiles it to machine code. Functions which are used a lot become hot code, get optimized, and the same cached version of the function is used over and over again. Something similar is true for objects that keep the same set of properties.
But if you change an object's structure later (for example by adding or deleting properties), the optimized code may have to be thrown away and there is a performance hit.
Historically, functions containing try...catch blocks could not be compiled efficiently by V8. Moving the code inside the try...catch block into its own function helped. This matters much less in newer engine versions.
So really, more than anything, the most important things you can do for performance are to write anything mostly static as a function and to avoid changing the structure of already defined objects multiple times.
Wrapping your code in a self-executing function probably won't help much, as it's still kept in memory. The additional function definition might be compiled differently, but because it's just a wrapper function, there should be almost no difference at all.
Is it OK to reuse variables for different data types in terms of performance and memory usage?
What happens to the old data? Is it garbage collected immediately after type casting?
It's OK to reuse variables, although unless you're doing some crazy things (say so in the question) with the number of variables you're using, you probably should not reuse them too liberally in this way. It's generally considered good coding practice to have a variable declared to point to a specific thing, and to use a different variable when you want to refer to something else.
"Variables" in JavaScript are just references. They're not inherently expensive: they don't take up more space than their text in the code and a few bytes in memory pointing to somewhere else. If you reuse a variable name by setting the reference to something else (or null/undefined), and nothing else still refers to the original value, then the GC will know that the original value is detached and can be collected.
The GC in whatever browser or environment you're using will choose when to actually run the collector based on lots of factors.
Full disclosure: I have no knowledge of the internals of any particular JavaScript engines. I'm going from general principles of VMs and interpreters.
Usually, variable names just refer to other memory locations. So, whether you remove an old variable (which happens when it goes out of scope) and introduce a new one, or replace the current contents with a new object, doesn't matter much in terms of memory allocation.
Garbage collection might be different in each implementation. Immediate garbage collection is difficult; the only way I can think of doing it involves reference counters, and it's tough to make that work even for cyclic data structures. So, most garbage collectors in the wild do non-immediate collection cycles where, each time, a whole bunch of data gets removed. The cycles might, for example, be run automatically when memory use goes above a certain threshold within the engine (but it'll usually be more refined than that).
JavaScript is a loosely typed language, and can store any datatype in any variable (even reused ones).
If you are combining types, though, you should check them periodically using the typeof operator to ensure they are the type you think they are (for instance, trying to perform a mathematical operation on a string will concatenate, convert or produce NaN, depending on the situation).
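A few lines showing what can happen when a string ends up in arithmetic, plus a typeof guard. The function `double` is a made-up example:

```javascript
const concatenated = "5" + 1; // + with a string operand concatenates
const multiplied = "5" * 2; // * converts the string to a number first
const broken = "five" * 2; // not a numeric string, so the result is NaN

// Guard against unexpected types before doing arithmetic.
function double(value) {
  if (typeof value !== "number") {
    throw new TypeError("expected a number, got " + typeof value);
  }
  return value * 2;
}

const doubled = double(4);
```

`concatenated` is the string `"51"`, `multiplied` is the number 10, `broken` is `NaN`, and `double("4")` would throw a `TypeError` instead of silently converting.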
Furthermore, JavaScript variables stick around as long as they are within scope. Once a scope is left, the variables within it are destroyed (eventually; it's automatic and transparent), unless a closure still refers to them. As for garbage collection of reassigned variables, the old value becomes eligible for collection as soon as the new value is assigned, provided nothing else references it; when it is actually freed is up to the engine.
This is perhaps a dumb question, but I am new to Javascript and desperate for help.
If the Javascript engine will look for global variables outside of a function, then what is the point of passing parameters to it? What do you gain?
I understand that global variables are generally frowned upon, but I still don't understand the purpose of passing variables. Does it have something to do with data encapsulation?
There's a few magic words that are used by programmers to describe different kinds of functions. Here's a few:
Re-entrant
ThreadSafe
Referentially Transparent
Idempotent
Pure
Side-Effects
You can look some of them up if you want a headache. The point is that Computer science and engineering progress has always been about reducing complexity. We have spent quite a lot of time thinking about the best way to write a function to achieve that goal. Hopefully, you can stuff tiny bits of your program into your head at a time, and understand those bits, without having to also understand the overall functioning of the entire program simultaneously, or the detailed implementation of the insides of all the other functions. A function that uses global variables can't do that very well because:
You can't guarantee that the global variables exist
You can't guarantee that the global variables are what you think they are
You can't guarantee that other parts of the program haven't modified those variables in a way you didn't expect.
You can't easily generalise to use the function multiple times on multiple sets of variables.
You can't easily verify that the function works as advertised without first setting up the function's external environment and its dependencies.
If the global variables have changed in a way you didn't expect, it's really hard to track down which part of the program is the culprit. It could be any of 500 different functions that write to that variable!
On the other hand, if you explicitly pass in all the data a function needs to operate, and explicitly return all the results:
If something goes wrong with any of those variables, it's easy to find the source of the problem
It's easier to add code to verify the "domain" of your inputs. Is it really a string? Is it over a certain length, is it under a certain length? Is it a positive number? is it whole, or fractional? All these assumptions that your code needs to operate correctly can be explicit at the start of the function, instead of just crossing your fingers and hoping nothing goes wrong.
It's easier to guess what a particular function will actually do, if its output depends only on its input.
A function's parameters are not dependent on the naming of any external variables.
And other advantages.
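As an illustration of making the "domain" of the inputs explicit, here is a sketch of a function that checks its parameters up front. The function `truncate` and its rules are invented for the example:

```javascript
// All inputs arrive as parameters, so their assumptions can be checked
// at the top of the function.
function truncate(text, maxLength) {
  if (typeof text !== "string") {
    throw new TypeError("text must be a string");
  }
  if (!Number.isInteger(maxLength) || maxLength < 0) {
    throw new RangeError("maxLength must be a non-negative integer");
  }
  // The output depends only on the two inputs.
  return text.length <= maxLength ? text : text.slice(0, maxLength);
}

const shortened = truncate("closure", 4);
const untouched = truncate("abc", 10);
```

`shortened` is `"clos"` and `untouched` is `"abc"`. A bad call such as `truncate(42, 4)` fails immediately at the call site rather than somewhere downstream.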
If you were only going to use global variables that the functions worked on, then you'd always have to know the inner workings of the functions and what your global variable names had to be for them to work.
Also, something like Math.abs(n) would be hard to call twice in one line if using global variables.
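To see why, compare a hypothetical absolute-value function that communicates only through globals (`input`, `output` and `absGlobal` are made up for the example) with the parameterized `Math.abs`:

```javascript
// Global-variable style: the function reads `input` and writes `output`.
let input = 0;
let output = 0;
function absGlobal() {
  output = input < 0 ? -input : input;
}

// Each use needs its own staging and its own copy of the result.
input = -3;
absGlobal();
const first = output;
input = 4;
absGlobal();
const second = output;
const viaGlobals = first + second;

// Parameter style: two calls fit in one expression.
const viaParameters = Math.abs(-3) + Math.abs(4);
```

Both results are 7, but the global version takes seven statements and requires the caller to know the names `input` and `output`.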
Functions are reusable components of your code that execute a particular snippet on the provided variables, exhibiting varying behavior depending on what is passed in.
Encapsulation comes from being Object Oriented. Functions are more for giving structure to your program.
Also, you shouldn't underestimate the effect on the execution time of a method when the variable it accesses exists in the local context rather than being global.
If you don't need parameters to be passed to functions, then, you really don't need functions.
Functions are usually (and should be) used to provide code re-use -- use the same function on different variables. If a function accesses global variables, then every time I use it it will perform the same action. If I pass parameters, I can make it perform a different action (based on those different parameters) every time I use it.
One of the main benefits is that it keeps all the information the function needs, nearby. It becomes possible to look at just the function itself and understand what its input is, what it does, and what its output will be. If you are using global variables instead of passing arguments to the function, you’ll have to look all over your code to locate the data on which the function operates.
That’s just one benefit, of many, but an easy one to understand.
Global variables (in any language) can sometimes become stale. If there is a chance of this it is good to declare, initialise and use them locally. You have to be able to trust what you are using.
Similarly, if something/someone can update your global variables then you have to be able to trust the outcome of what will happen when you use them.
Global variables are not always needed by everything so why keep them hanging around?
That said, variables global to a namespace can be useful, especially if you are using something like jQuery selectors and you want to cache them for performance's sake.
Is this really a JavaScript question? I haven't encountered a language that doesn't have some form of global variables yet.