Possible to enumerate or access module-level function declarations in NodeJs? - javascript

In Node.js, if I load a module which contains code in module-scope like:
this["foo"] = function() { console.log("foo"); }
...then I appear to get a globally available function that I can call just by saying foo() from any code using the module. It can be seen as one of the printed items with Object.getOwnPropertyNames(this).
However, if I put the following in module scope instead:
function foo() { console.log("foo"); }
...then it produces a function which can similarly be called within that module as foo(), but is invisible outside of it (e.g. does not show up as one of the items with Object.getOwnPropertyNames(this)).
I gather this is a change in runtime behavior from what's done in browsers. A browser seems to poke everything into global scope by default (and for years people have had to consciously avoid this by wrapping things up in anonymous functions/etc.)
My question is whether NodeJs has some secret way of interacting with these declarations outside of the module in which they are declared BESIDES using exports.(...) = (...). Can they be enumerated somehow, or are they garbage collected as soon as they are declared if they're not called by a module export? If I knew what the name of such a function was going to be in advance of loading a module...could I tell Node.js to "capture it" when it was defined?
I'm not expecting any such capabilities to be well-documented...but perhaps there's a debugger feature or other system call. One of the best pointers would be to the specific code in the Node.js project where this kind of declaration is handled, to see if there are any loopholes.
Note: In researching a little into V8 I saw that a "function definition" doesn't get added to the context. It's put into an "activation object" of the "execution context", and cannot be programmatically accessed. If you want some "light reading" I found:
http://coachwei.sys-con.com/node/676031/mobile
http://perfectionkills.com/understanding-delete/

if you fill in exports.foo = foo; at the end of your file it will be available in other files in node, assuming that you do var myFile = require('myFile.js') with the file name and you call the function via myFile.foo(); You can even rename the function for outside use in exports and set whatever you want to call the package when you use require.
BTW you can enumerate these functions just like you do on any JSON object (ie for k in ...)

This is impossible in node without more advanced reflection tools like the debugger.
The only way to do this would be to use __parent__ which was removed due to security issues and other things (hard to optimize, not standard to begin with) . When you run the script those variables become closed under the module. You can not access them elsewhere without explicitly exporting them.
This is not a bug, it's by design. It's just how node works. see this closely related question.
If this sort of reflection was available without tools like the debugger it would have been extremely hard to optimize the source code (at least the way v8 works) , meaning the answer to your question is no. Sorry.

Related

Is there a way to access/modify the "Execution Context" of a Node module/function

I understand the scoping rules of Node/JavaScript, and from, for example, Understanding Execution Context and Execution Stack in Javascript, I think I understand the principle of how Execution Contexts work: the question is can you actually get access to them?
From an answer to the 2015 question How can I get the execution context of a javascript function inside V8 engine (involving (ab)using (new Error()).stack), and the answer to the 2018 question How can we get the execution context of a function?, I suspect the answer is "no" but just in case things have changed: is it possible to access/modify the Execution Context of a Node module/function?
(I'm very aware this screams either XY Problem or a desire to abuse Node/JavaScript: hopefully the Background section below will provide more context, and – if there's a different way of achieving what I want – that will work as well.)
In a nutshell, I want to achieve something like:
sharedVar = 1 ;
anotherModule.someFunction() ;
console.log( sharedVar ) ; // Where 'sharedVar' has changed value
Having a function in a different module being able to change variables in its caller's scope at will seems the definition of "A Dangerous Thing™", so – if it's possible at all – I expect it would need to be more like:
sharedVar = 1 ;
anotherModule.hereIsMyExecutionContext( SOMETHING ) ;
anotherModule.someFunction() ;
console.log( sharedVar ) ; // Where 'sharedVar' has changed value
and anotherModule would be something like:
let otherExecutionContext ;
function hereIsMyExecutionContext( anExecutionContext ) {
otherExecutionContext = anExecutionContext ;
}
function someFunction() {
//do something else
otherExecutionContext.sharedVar = 42 ;
}
and the question becomes what (if anything) can I replace SOMETHING with?
Notes / Things That Don't Work (for me)
You shouldn't be trying to do this! I realize what I'm trying to achieve isn't "clean" code. The use-case is very specific, where brevity (particularly in the code whose value I want changing) is paramount. It is not "production" code where the danger of forgotten, unexpected side-effects matters.
Returning a new value from the function. In my real use-case, there would be several variables that I would like someFunction() to be able to alter. Having to do { var1, var2, ... varN } = anotherModule.someFunction() would be both inconvenient and distracting (while the function might change some of these variables' values, it is not the main purpose of the function).
Have the variables members of anotherModule. While using anotherModule.sharedVar would be the clean, encapsulated way of doing things, I'd really prefer not to have to use the module name every time: not only is it more typing, but it distracts from what the code that would be using these variables is really doing.
Use the global scope. If I wasn't using "use strict";, it would be possible to have sharedVar on the global object and freely accessible by both bits of code. As I am using strict-mode (and don't want to change that), I'd need to use global.sharedVar which has the same "cumbersomeness" as attaching it to anotherModule.
Use import. It looks like using import { sharedVar } from anotherModule allows "live" sharing of the variables I want between the two modules. However, by using import, the module using the shared variable has to be an ES Module (not a CommonJS Module), and cannot be loaded dynamically using require() (as I also need it to be). Loading it dynamically the ESM way (using import() as a function returning a promise) appears to work, but repeated loadings come from a cache. According to this answer to How to reimport module with ES6 import, there isn't a "clean" way of clearing that cache (cf. delete require.cache[] that works with CommonJS modules). If there is a clean way of invalidating a cached ESM loaded through import(), please feel free to add an answer to that question (and a comment here, please), although the current Node JS docs on ESMs don't mention one, so I'm not hopeful :-(
Background
In early December, a random question on SO alerted me to the Advent of Code website, where a different programming problem is revealed everyday, and I decided to have a go using Node. The first couple of problems I tackled using standalone JS files, at which point I realized that there was a lot of common code being copy-pasted between each file. In parallel with solving the puzzles, I decided to create a "framework" program to coordinate them, and to provide as much of the common code as possible. One goal in creating the framework was that the individual "solution" files should be as "lean" as possible: they should contain the absolute minimum code over that needed to solve the problem.
One of the features of the framework relevant to this question is that it reloads (currently using require()) the selected solution file each time, so that I can work on the solution without re-running the framework... this is why switching to import and ES Modules has drawbacks, as I cannot (cleanly) invalidate the cached solution module.
The other relevant feature is that the framework provides aoc.print(...) and aoc.trace(...) functions. These format and print their arguments: the first all the time; the second conditionally, depending on whether the solution is being run normally or in "trace" mode. The latter is little more than:
function trace( ... ) {
if( traceMode ) {
print( ... )
}
}
Each problem has two sets of inputs: an "example" input, for which expected answers are provided, and the "real" input (which tends to be larger and more involved). The framework would typically run the example input with tracing enabled (so I could check the "inner workings") and run the "real" input with tracing disabled (because it would produce too much output). For most problems, this was fine: the time "wasted" preparing the parameters for the call, and then making the call to aoc.trace() only to find there was nothing to do, was negligible. However, for one solution in particular (involving 10 million iterations), the difference was significant: nearly 30s when making the ignored trace; under a second if they calls were commented-out, or I "short-circuited" the trace-mode decision by using the following construct:
TRACE && aoc.print( ... )
where TRACE is set to true/false as appropriate. My "problem" is that TRACE doesn't track the trace mode in the framework: I have to set it manually. I could, of course, use aoc.traceMode && aoc.print( ... ), but as discussed above, this is more to type than I'd like and makes the solution's code more "cluttered" than I'd ideally want (I freely admit these are somewhat trivial reasons...)
One of the features of the framework relevant to this question is that it reloads the selected solution file each time
This is key. You should load not leave the loading, parsing and execution to require, where you cannot control it. Instead, load the file as text, do nefarious things to the code, then evaluate it.
The evaluation can be done with eval, new Function, the vm module, or by messing with the module system.
The nefarious things I was referring to would most easily be prefixing the code by some "implicit imports", whether you do that by require, import or just const TRACE = true;. But you can also do any other kind of preprocessing, such as macro replacements, where you might simply remove lines that contain trace(…);.
Looks like you're heading the wrong way. The Execution Context isn't meant to be accessible by the user code.
An option is to make a class and pass its instances around different modules for your purpose. JS objects exist in the heap and can be accessed anywhere as long as you have its reference, so you can control it at will.

Altering Global Scope for dynamically loaded module in Node.js

Loading a module from a source dynamically:
var src="HERE GOES MY SOURCE"
var Module = module.constructor;
var m = new Module();
m._compile(src, 'a-path-that-does-not-exist');
Need to achieve following:
Pass some variables/functions so that they can be used inside the src script globally. Can set them in "m.foo", but want the script to use "foo" without using "module.foo". "global.foo" works, but see the point 2.
How to restrict the src script from accessing global scope?
How to restrict the src from loading other modules using require or other means.
How to restrict the src from running async operations?
All, I can think of is to wrap the script in its own function, kind of like nodejs already does for commonJS modules. This is the regular wrapper.
(function(exports, require, module, __filename, __dirname) {
// Module code actually lives in here
});
If you wrap that user code with your own wrapper and then when you call it to execute it, you can define your own values for require, module and any other semi-global symbols.
If you also put 'use strict'; as the very first line of your wrapper function (before any of the user code), then that will eliminate default assignment to the global object with just something like x = 4 because that will be an error without explicitly defining x first. If you then also make your own global object and pass it as an argument, that can keep anyone from assigning to the real global object. I don't think you can prevent implicit read access to pre-existing globals.
So, your wrapper could look like this:
(function(exports, require, module, __filename, __dirname, global) {
'use strict';
// insert user code here before evaluating it with eval()
// and getting the function which you can then call and pass the desired arguments
});
Then, when you call this function, you pass it the values you want to for all the arguments (something other than the real ones).
Note, it's hard to tell how leak-proof this type of scheme really is. Any real security should likely be run in a resource restricted VM.
Another idea, you could run in a Worker Thread which has it's own virgin set of globals. So, you do all of the above and run it in a Worker Thread.
Addressing your questions in comments:
Does the 'use strict'; need to go inside the wrapper function or outside?
It needs to be the first line of code inside the wrapper function, right before where you insert the user code. The idea is to force that function scope (where the user code lives) inside that wrapper to be in strict mode to limit some of the things it can do.
Could you explain the "I don't think you can prevent implicit read access to pre-existing globals."? If i provide my own object as global, how can the inner script access preexisting globals?
Any code, even strict mode code can access pre-existing globals without the global prefix. While you can prevent the code from creating new globals by shadowing it with your own global in the wrapper function arguments and by forcing it into strict mode, you can't prevent strict mode code from reading existing globals because they can do so without the global prefix. So, if there's a pre-existing global called "foo", then existing code can reference that like:
console.log(foo);
or
foo = 12;
If there is no foo in a closer scope, the interpreter will find the foo on the global object and use that.
Note that strict mode prevents the automatic creation of a new global with something like:
greeting = "happy birthday"
Could you elaborate more no "resource restricted VM"?
I was talking about real hardware/OS level VMs that allow you to fully control what resources a process may use (disk access, sockets, memory, hardware, etc...). It's essentially a virtual computer environment separate from any other VMs on the same system. This is a more rigorous level of control.
WorkerThread is a very interesting concept. will take a look! My understanding was that WorkerThreads provide memory isolation and the only way to share data is by sending messages (effectively creating copies)?
Yes, Worker Threads provide pretty good isolation as they start up a whole new JS engine and have their own globals. They can shared ArrayBuffers in certain ways (if you choose to do that), but normal JS variables are not accessible across thread boundaries. They would normally communicate via messaging (which is automatically synchronized through the event queue), but you could also communicate via sockets if you wanted.

Nodejs uses variable assignment to load modules

Most languages use 'import' directives to load other module code, like
java -
import a.b.c
elisp -
(load a)
python -
from a import b
But, why does nodejs use a variable expression to load other module functions like
var a = require('a')
i see, most IDEs for javascript like tern.js-emacs, nodeclipse are not able to do source code lookup (for loaded modules) properly because the IDE has to run the code (or) do eval to find out, what properties a loaded module object contains.
You could say JS belongs to a category of languages where the idea that everything is an object on equal footing is part of the "philosophy" that has guided its development. Node's require is a function (an object) supplied by the environment, as is the module object. This pattern is called the Common JS format.
You actually don't have to assign the result of the require function to a variable. It's rare in practice, but the node module you're calling on could just be invoked to cause an action to take place, for example one might require sugar.js which alters some of the native objects but has no methods of its own to offer, so there would be no point in assigning the return value (which is the module.exports object that was supplied during that module's execution).
A more common example of not assigning a module to a variable is when one uses require just to grab some property off the module -- e.g. var x = require('module').methodOfInterest. Similarly, some modules return a constructor, so you may sometimes see var instance = new (require('ConstructorModule'))(options) (which is ugly in my opinion; requires should generally be grouped at the top of a file and acted on only afterwards).
Note: There's really no concrete answer to your question so odds are high that it will get closed as SO-inappropriate.

Javascript Eval and Function constructor usage

I'm currently experimenting on self replicating code. Out of love for the language I'd like to write it in javascript.
I'm working on a program that writes a function's code which in turn writes its function own code and so on. Basically, the desired process is this:
I manually create A function which returns code (which includes some randomness) and a numeric value (proposed solution to a problem).
I call this function a number of times, evaluate the results of each of those returned functions, and continue the process until I have code that is sufficiently good for what I'm trying to do.
Now, I have always been told how eval is evil, how never to use it and so on. However for my specific use case it seems like the Function constructor or eval are exactly what I'm looking for.
So, in short the question is:
Are eval/Function constructor indeed the best tools to use in my case? If so, I figured I'd use the Function constructor to scope the code executed, but is there a way to truly limit it from accessing the global scope? Also, what are some good practices for eval usage in my case?
I think I just figured out something I could use:
If I run my javascript code using node.js I can use the vm module which allows me to execute javascript code safely in a new context, and without letting the executed code have any access to the local or global scopes.
vm.runInNewContext compiles code, then runs it in sandbox and returns the result.
Running code does not have access to local scope. The object sandbox will be used as
the global object for code. sandbox and filename are optional, filename is only used in
stack traces.
You can see a full example here: vm.runInNewContext
This will allow me to eval code safely, and seems to be the safest way (I found) currently available. I think this is a much better solution than eval or calling the Function constructor.
Thank you everyone who helped.
Unfortunately I believe there is no way to prevent it from accessing the global scope. Suppose for example that in a web browser i evaled some code like this :
(function(window) {
eval(script);
)(null));
Any time the script tries to access window - it will get an error, since window is null. However someone who knew what they were doing could always do this :
var global = (function() {
return this;
}());
Since when you invoke a function in what Crockford calls the "function invocation style" then the this is always bound to the global variable.

JSLint fails file due to undefined variable

I have just started putting JSLint into my build pipeline and it has been great. Although it has pointed out something in most of my files that is not an error as such, although it will see it as one. I have changed my constructor to now take an instance of this object so the tests pass, however I am not sure if I really should, as in all other major languages I would not need to do this.
I will have to add some more context to this, for it to make any sense so here goes.
I have the closest thing I can get to an Enum in javascript which is basically a globally scoped JSON style variable with a load of constants, it is used to describe event types so rather than every class that wants to raise/listen to events having to put hard coded strings, it can just use a constant from this enum variable. As I just mentioned I have these classes that make use of this static enum, but they just make use of the global version of this variable, not a local instance passed through the constructor, and this is where my problems begin, as in the actual app I know for a fact that the enum file will included which will make it globally accessible. However JSLint has no context of this, so it only sees an individual file without worrying about external dependencies, as it deems these to be bad, which in any other language would be true, but in JS you cannot achieve the same thing without having global variables (to my knowledge).
As I said originally, I have now added this enum to the constructor to let JSLint pass the files, however it just feels a bit wrong passing it in, but maybe this is because I am thinking about it as a regular developer and not a javascript developer...
Now should I stick to this, and pass it through the constructor, and just mock it in my tests, or should I take the approach that it should always be there?
I am sure this will be down to peoples personal opinions, but it would be nice to know if I am being an idiot and should just keep each file as its own silo, or if there is a way for me to have my cake and eat it.
I found that JSLint supports a comment to tell it of global variables:
/*globals myGlobal*/
I have decided to just use my enum as a global and get on with the more important things.
JavascriptLint greatly improves on JSLint in this respect, as it allows you to define inter-file dependencies:
If a script references a variable, function, or object from another script, you will need to add a /*jsl:import PathToOtherScript*/ comment in your script. This tells JavaScript Lint to check for items declared in the other script. Relative paths are resolved based on the path of the current script.
See: http://javascriptlint.com/docs/index.htm

Categories