JavaScript object code caching: which of these assertions are wrong?

Because I have been around engineers for so many years, I know that if I don't provide context, I'm just going to get a hundred answers of the form "What are you trying to accomplish?" I am going to give the background which motivates my question. But don't confuse the background context for the question I am asking, which is specifically about the JavaScript semantics that make object code uncacheable between page requests. I am not going to give marks for advice on how to make my webapp faster. It's completely tangential to my question, which will probably only be answerable by someone who has worked on a JavaScript compiler, or at least on a compiler for a dynamic language.
Background:
I am trying to improve the performance of a web application. Among its many resources, it contains one enormous JavaScript file of 40k lines and 1.3 million characters pre-minification. Post-minification it's still large, and it still adds about 100ms to the window.onload event when synchronously loaded, even when the source is cached client-side. (I have conclusively ruled out the possibility that the resource is not being cached by watching the request logs and observing that it is not being requested.)
After confirming that it's still slow after being cached, I started doing some research on JavaScript caching in the major browsers, and have learned that none of them cache object code.
My question is in the form of some hypothetical assertions based on this research. Please object to these assertions if they are wrong.
1. JavaScript object code is not cached in any modern browser. ("Object code" here can mean anything from a byte code representing a simple linearized parse tree all the way to native machine code.)
2. JavaScript object code in a web browser is difficult to cache. In other words, even if you're including a cached JS source file via an external script tag, there is a linear cost to the inclusion of that script on a page, even if the script contains only function definitions, because all of that source needs to be compiled into object code.
3. JavaScript object code is difficult to cache because JS source must be evaluated in order to be compiled. Statements can affect the compilation of downstream statements in a dynamic way that is difficult to analyze statically.
3a. (3) is true mostly because of eval().
4. Evaluation can have side effects on the DOM.
5. Therefore, JavaScript source needs to be compiled on every page request.
Bonus question: do any modern browsers cache a parse tree for cached JS source files? If not, why not?
Edit: If all of these assertions are correct, then I will award the answer to anyone who can expound on why they are correct, for example, by providing a sample of JS code that couldn't be cached as object code and then explaining why not.
I appreciate the suggestions on how to proceed from here to make my app faster, and I mostly agree with them. But the knowledge gap that I'm trying to fill is related to JS object code caching.
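For the bonus, here is a small illustrative sketch (hypothetical, not from the app in question) of code whose object code seemingly cannot be compiled, or cached, ahead of time:

    // The argument to eval is only known at run time, so an engine
    // cannot compile and cache this script's object code in advance.
    var name = location.hash.slice(1);     // run-time input
    eval("var " + name + " = 42;");        // introduces a binding the
                                           // compiler never saw statically
    console.log(window[name]);             // later code depends on it

    // Evaluation can also rewrite the page while it is being parsed,
    // changing what source even exists to compile (assertion 4):
    document.write("<script>var fromWrite = 1;<\/script>");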

You're right in that it's dynamically compiled and evaluated.
You're right that it must be.
Your recourse isn't in trying to make that compile-time smaller.
It needs to be about loading less to begin with: do the bare minimum to get the user experience visible, then the bare minimum to add core functionality in a modular fashion, then lazily (either on a timer, or as requested by the end user) load in additional features, functionality and flourishes.
If your program is 10,000 lines of procedural code, then you've got a problem.
I'm hoping it's not all procedural.
So break it up. It means a slower 1st-page load. But on subsequent requests, it might mean much faster response-times as far as what the user perceives as "running", even though it will take longer to get to 100% functional.
It's all about the user's perception of "speed" and "responsiveness", not about the shortest line to 100% functional.
JavaScript, being single-threaded, can't both race straight to 100% functional and stay responsive.
So be responsive first.
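Here's a minimal sketch of that kind of lazy loading (the URL and the Editor.init entry point are placeholders):

    // Minimal on-demand loader: inject a script tag only when a
    // feature is first requested.
    function loadScript(url, callback) {
        var s = document.createElement("script");
        s.src = url;
        s.onload = callback;               // fires after the script has run
        document.getElementsByTagName("head")[0].appendChild(s);
    }

    // Load the heavy editor module only when the user asks for it.
    document.getElementById("edit-btn").onclick = function () {
        loadScript("/js/editor-module.js", function () {
            Editor.init();                 // assumed entry point
        });
    };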
PS: Add a bootstrap. An intelligent bootstrap.
It should be able to discern which features are needed.
RequireJS is for loading dependencies.
Not for figuring out what your dependencies are.
An added benefit -- you can set a short-term cache on the bootstrap, which will point to versioned modules.
How is this a benefit? Well, if you need to update a module, it's a simple process to update the version in the bootstrap. When the bootstrap's cache expires, it points at the new module, which can have an infinite cache lifetime (because it's got a different name -- versioned or timestamped).
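A sketch of what such a bootstrap might look like (module names and versions made up):

    // bootstrap.js -- served with a short max-age (minutes, say).
    // Everything it points at is immutable and cacheable forever,
    // because the version is baked into the file name.
    var MODULES = {
        core:   "/js/core.v42.js",
        editor: "/js/editor.v17.js"
    };

    function loadModule(name, callback) {
        var s = document.createElement("script");
        s.src = MODULES[name];             // versioned => infinite cache lifetime
        s.onload = callback;
        document.getElementsByTagName("head")[0].appendChild(s);
    }

    // Shipping a fix to "editor" is now a one-line change to this file.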

Related

Can I effectively use transpilation, code tokenization/regeneration, a VM or a similar approach to guaranteeably sandbox/control code within a browser?

It looks like I'm asking about a tricky problem that's been explored a lot over the past decades without a clear solution. I've seen Is It Possible to Sandbox JavaScript Running In the Browser? along with a few smaller questions, but all of them seem to miss the mark - they focus on sandboxing cookies and DOM access, not JavaScript itself, which is what I'm trying to control; iframes and web workers don't sound like exactly what I'm looking for.
Architecturally, I'm exploring the pathological extreme: not only do I want full control of what functions get executed, so I can disallow access to arbitrary functions, DOM elements, the network, and so forth, I also really want to have control over execution scheduling so I can prevent evil or poorly-written scripts from consuming 100% CPU.
Here are two approaches I've come up with as I've thought about this. I realize I'm only going to perfectly nail two out of the three of fast, introspectable and safe, but I want to get as close to all three as I can.
Idea 1: Put everything inside a VM
While it wouldn't present a JS "front", perhaps the simplest and most architecturally elegant solution to my problem could be a tiny, lightweight virtual machine. Actual performance wouldn't be great, but I'd have full introspection into what's being executed, and I'd be able to run eval inside the VM and not at the JS level, preventing potentially malicious code from ever encountering the browser.
Idea 2: Transpilation
First of all, I've had a look at Google Caja, but I'm looking for a solution itself written in JS so that the compilation/processing stage can happen in the browser without me needing to download/run anything else.
I'm very curious about the various transpilers (TypeScript, CoffeeScript, this gigantic list, etc) - if these languages perform full tokenization->AST->code generation that would make them excellent "code firewalls" that could be used to filter function/DOM/etc accesses at compile time, meaning I get my performance back!
My main concern with transpilation is whether there are any attacks that could be used to generate the kind of code I'm trying to block. These languages' parsers weren't written with security in mind, after all. This was my motivation behind using a VM.
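To sketch what such a compile-time "code firewall" could look like, here's a hedged example assuming a JS-in-JS parser such as Esprima (its esprima.parse call returns an AST). Note the caveat in the comments: a real sandbox must whitelist rather than blacklist.

    // Compile-time "code firewall" sketch, assuming the Esprima parser.
    // NOTE: identifier blacklisting is for illustration only; a real
    // sandbox must whitelist, since window["ev" + "al"] and friends
    // sneak past any blacklist.
    var BANNED = { eval: true, Function: true, document: true,
                   window: true, XMLHttpRequest: true };

    function walk(node, visit) {
        visit(node);
        for (var key in node) {
            var child = node[key];
            if (child && typeof child === "object") {
                walk(child, visit);        // recurse into arrays and nodes
            }
        }
    }

    function checkSource(code) {
        var ast = esprima.parse(code);     // throws on syntax errors
        walk(ast, function (node) {
            if (node.type === "Identifier" && BANNED[node.name]) {
                throw new Error("forbidden identifier: " + node.name);
            }
        });
        return code;                       // only now hand it to eval()
    }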
This technique would also mean I lose execution introspection. Perhaps I could run the code inside one or more web workers and "ping" the workers every second or so, killing off workers that [have presumably gotten stuck in an infinite loop and] don't respond. That could work.
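A sketch of that watchdog idea (sandbox-worker.js is a hypothetical worker assumed to reply {type: "pong"} to every {type: "ping"} message):

    // Watchdog: run the untrusted code in a Worker and kill it if it
    // stops answering pings.
    var worker = new Worker("sandbox-worker.js");
    var alive = true;

    worker.onmessage = function (e) {
        if (e.data.type === "pong") { alive = true; }
    };

    var timer = setInterval(function () {
        if (!alive) {                      // no pong since last check:
            worker.terminate();            // presumed infinite loop
            clearInterval(timer);
            return;
        }
        alive = false;                     // must pong before next tick
        worker.postMessage({ type: "ping" });
    }, 1000);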

Do comments in NodeJs affect performance?

I am in the process of having to refactor an entire NodeJs project which is quite large. One of the biggest problems I'm facing is that my predecessor included literally no documentation.
I'm used to clientside js, where comments can be stripped via uglify (or similar) before deploying to a production environment.
Is there something similar for node, or how do people handle this? Is the performance impact of comments negligible?
Comments do not affect code performance in any significant way, neither on the client nor on the server.
What happens on the client is that if you're including JavaScript with comments, those lines are still being downloaded by the browser, with no extra benefit for the user.
In client-side code, comments add to the file size that needs to be sent to the browser, which is why tools are used to remove them. Comments in server-side code, on the other hand, don't make much of a difference.
Comments do not affect performance in any significant way. As I understand it, the JavaScript program is loaded into memory, and in that process the comments are ignored and not loaded into memory. That means only during the loading of your application could you see an extremely minor increase in load time from having a lot of comments, and even that is negligible.
Using uglify would not be necessary, since users cannot read your NodeJS code, and it would make the newly refactored code less readable for you (which would be counterproductive).
As Alberto and Konst point out, uglify can be used to reduce the file size that the client has to download.
Note: I do not know if I am exactly correct; please do correct me if I am wrong.
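For completeness, a sketch of the comment-stripping build step using the uglify-js npm package (uglify-js 3.x API), which only matters for code that is shipped to a browser:

    // Build step: minify (and thereby strip comments from) app.js.
    var UglifyJS = require("uglify-js");
    var fs = require("fs");

    var source = fs.readFileSync("app.js", "utf8");
    var result = UglifyJS.minify(source);  // comments dropped by default

    if (result.error) throw result.error;
    fs.writeFileSync("app.min.js", result.code);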

Trace the execution of ALL Javascript in a web app

Here is the situation: A complex web app is not working, and it is possible to produce undesired behavior consistently. The cause of the problem is not known.
Proposal: Trace the execution paths of all JavaScript code. Essentially, produce two monstrous logs which can then be fed into a diff algorithm to determine where the behavior related to the bug begins to diverge. The cause is not apparent from application behavior, and both comprehending and obtaining a copy of the actual JS code being run is difficult, because of the many pages that must be switched to and copied out of the web inspector. Making it harder is the fact that all pages are dynamically spliced together with Perl code, where significant portions of the JS exist only as (dynamic...) Perl strings.
The Web Inspector in Chrome does not have an option that I know about for logging an execution trace. Basically what I would like is a log of every line of JS that is executed, in the order that they are executed. I don't see this as being a difficult thing to obtain given that the JS VM is single-threaded. The problem is simply that the existing user-facing tools are not designed for quite this much hardcore debugging. If we look at the Profiler in the Dev Tools, it's clearly capable of the kind of instrumentation that I need, but it is fundamentally designed to do profiling instead of tracing.
How can I get started with this? Is there some way I can build Chrome from source, where I can
switch off the JIT in V8, and
log every single JavaScript expression evaluated by V8 to a file?
I have zero experience with the development side of Chrome. So e.g. links to dev-builds/branches/versions/distros of Chrome/Chromium/Canary (what's the difference?) are welcome.
At this point it appears that instrumenting the browser with powerful js tracing is still likely to be easier than redesigning the buggy app. The architecture of the page is a disaster, but the functionality is complex, and it almost fully works. I just have to find the one missing piece.
Alternatively, if tools of this sort already exist, what are some other keywords I can search for them with? "Code Tracing" is pretty much the only thing I can come up with.
I tested dynaTrace, which was a happy coincidence as our app supports IE (indeed Chrome support just came out of beta), but this does not produce a text dump, it basically produces a massive Win32 UI expando-tree, which is impossible to diff. This makes me really sad because I know how much more difficult it was to make the representation of the trace show up that way, and yet it turns out being almost utterly useless. Who's going to scroll up and down that tree view and see anything really useful in it, in anything other than a toy example of a web app?
If you are developing a big web app, it is always good to follow a test driven strategy for the coding part of it. Using just a few tips allows you to make a simple unit testing script (using QUnit) to test pretty much all aspects of your app. Here are some potential errors and some ways of solving them.
Make yourself handlers to register long-lived objects, and a handler to close them safely. If the safe close does not succeed, then the management of the object itself is failing. One example would be Backbone zombie views: either the view has bad code in its close section, the parent's close is not hooked up, or an infinite loop happened. Testing all view events is also good, although tedious.
By putting all the code for data fetching inside a certain module (I often use a bunch of Backbone.Model objects for each table/document in my DB) and handlers for each using a reqres pattern, you can test them one by one to see if they all fetch and save correctly.
If complex computation is needed, abstract it in a function or module so that it can be easily tested with known data (see the sketch after these tips).
If your app uses data binding, a good policy is to have a JSON schema for all data to be tested against the views containing your bindings. Check all required data against the schema. This applies to your Backbone.Model objects as well.
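A minimal QUnit sketch for the computation tip above (totalPrice is a made-up example function, tested with known data; classic QUnit 1.x globals assumed):

    // A computation abstracted into a plain function, so it can be
    // tested directly with known data.
    function totalPrice(items) {
        var sum = 0;
        for (var i = 0; i < items.length; i++) {
            sum += items[i].price * items[i].qty;
        }
        return sum;
    }

    test("totalPrice computes known data", function () {
        equal(totalPrice([]), 0, "empty cart costs nothing");
        equal(totalPrice([{ price: 2, qty: 3 }]), 6, "single line item");
        equal(totalPrice([{ price: 2, qty: 3 },
                          { price: 5, qty: 1 }]), 11, "two line items");
    });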
Using a good IDE also helps. PyCharm (if you use Python for backend) or WebStorm are extremely good for testing and developing JavaScript/CoffeeScript. You can breakpoint and study your code at specific locations, inside your browser! Also it runs your code for auto-completion and you can see some of the errors that way.
I just cannot encourage enough the use of modules in your code. Although there is no JavaScript official way of doing it (next ECMAScript draft has it), you can still use good libraries for it. Good ones are: RequireJS, CommonJS or Marionette.Module (if you use Marionette as your framework). I think Ember/AngularJS also offers this kind of functionality but I did not work with them personally so I am not sure.
This might not give you an immediate solution to your problem and I don't think (IMO) there is an easy one either. My focus was to show you ways to develop so that errors can be easily spotted and countered, and all of it (depending on your Unit Testing) during development phase. Errors will always happen, as much as our programmer ego wants us to believe the contrary. Hope I helped :)
I would suggest a divide-and-conquer strategy, first via logging, and second via code. Wrap suspect methods with console logging of their entry and exit events; when the bug occurs, hopefully it occurs between or after some logged event. If event logging does not shed light, bring parts of the code into a new page/app. Divide and conquer will find where the error starts occurring.
So it seems you're in the realm of weird already, so I'm thinking of a weird solution. I know nothing about the backend of chrome myself so we're in the same boat, but if you're feeling bold here's an idea. Maybe you could find/replace every newline in your javascript files with a piece of code that logs either to a global string or to the console a) what file you're in, b) the contents of "this" or something useful to you, and maybe even c) the time. This would at least get you started. Make sure it's wrapped in something distinct so you can just as easily remove it.
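A sketch of that find/replace idea -- a tiny global tracer, plus what an instrumented line would look like (file names and line numbers are placeholders):

    // Global tracer: accumulate "file:line:timestamp" entries.
    window.__TRACE = [];
    window.__trace = function (file, line) {
        __TRACE.push(file + ":" + line + ":" + Date.now());
    };

    // A line such as
    //     doStuff(x);
    // becomes, after the newline find/replace:
    //     __trace("app.js", 42); doStuff(x);

    // After reproducing the bug, dump one run's log for diffing
    // against the other (copy() is a Chrome console helper):
    //     copy(__TRACE.join("\n"));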

view all javascript variables of a GWT app

This question is mainly for security purposes. I need to know if it is possible to view by any means (plugins, programmatically or whatever) a list of all variables and their values in a gwt application compiled to javascript.
Let's say I have a variable x created by gwt in its normal deployment mode.... let's just ignore how did the value get there... Can the user somehow get to know that there is a var called x and its value...
Please note that I am not looking for software engineering best practices, the question is over simplified so that we get to the point. I know that I should not have anything sensitive on the client on the first place... but please let's just skip that since the case is a much bigger story...
Thanks a lot..
Short answer... yes.
GWT compiles to JavaScript and obfuscates everything; that said, all information is available from the compiled source if one knows what to look for. If someone succeeds in injecting a simple script tag into your application, they can simply retrieve all scripts through XMLHttpRequest and parse them as text. No matter how obfuscated, it's theoretically possible to get what you want from any JavaScript source. If you can see it in the raw script file, it's attainable; it doesn't really matter if it's locked away in anonymous closures or whatnot, since any JS security mechanism can be circumvented.
Main condition is to get control of the page (script injection).
To quote yourself: " I know that I should not have anything sensitive on the client on the first place..."
If it's worth hacking, people will try it.
GWT code is compiled to JavaScript, so ultimately the user can use JavaScript introspection to discover all objects and their properties.
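For illustration, a console sketch of that introspection:

    // Run from the browser console: enumerate every global the
    // compiled GWT script left on window, with its current value.
    var names = Object.getOwnPropertyNames(window);
    for (var i = 0; i < names.length; i++) {
        var v = window[names[i]];
        if (typeof v !== "function") {
            console.log(names[i], "=", v);
        }
    }
    // Obfuscation means your x may now be called something like "Qd",
    // but its value is still sitting there to be read.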
Short answer - No, not unless you know what you are looking for.
The GWT compiler does something called cross-compiling: it transforms Java code into JavaScript/ECMAScript. The mapping between a variable in Java and its counterpart in the generated script is not straightforward. The language semantics are not the same; the compiler tries to optimize and generates obfuscated JS (to reduce the size). You can tweak this to a certain extent by passing arguments at compile time (by setting PRETTY). This still does not guarantee a one-to-one mapping.
On a different note, even decompiled Java code does not look like the original source (that's the complexity of the problem).

Why doesn't javascript have a better way to include files?

I've seen a few ways that you can make a javascript file include other javascript files, but they all seem pretty hacky - mostly they involve tacking the javascript file onto the end of the current document and then loading it in some way.
Why doesn't javascript just include a simple "load this file and execute the script in it" include directive? It's not like this is a new concept. I know that everyone is excited about doing everything in HTML5 with javascript etc, but isn't it going to be hard if you have to hack around omission of basic functionality like this?
I can't see how it would be a security concern, since a web page can include as many javascript files as it likes, and they all get executed anyway.
The main problems with the current inclusion system (ie, add additional script tags) involve latency. Since a script tag can insert code at the point of inclusion, as soon as a script tag is encountered, further parsing has to more-or-less stop until the JS downloads and is executed (although the browser can continue to fetch resources in parallel). If the JS decides to run an inclusion, you've just added more latency on top of this - now you can't even fetch your scripts in parallel.
Basically, it's trying to solve a problem that doesn't exist (since JS can already tack on additional script tags to do an inclusion), while making the latency problem worse. There are javascript minifiers out there that can merge JS files; you should look into using those instead, as they will help improve latency issues as well.
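For reference, the "tack on additional script tags" hack alluded to above looks like this:

    // The usual DIY include: inject a script element and let the
    // browser fetch and execute it.
    function include(url) {
        var s = document.createElement("script");
        s.src = url;
        document.getElementsByTagName("head")[0].appendChild(s);
    }

    include("other-file.js");   // fetched and executed asynchronously;
                                // dependents must wait for its onload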
Actually, YUI 3 solves this problem beautifully. Feel free to check out the documentation: http://developer.yahoo.com/yui/3/yui/#use (that's the specific Use function which does this magic). Basically it works like this:
You define modules
When you create the core YUI object with YUI(), you specify which modules your code needs
Behind the scenes, YUI checks if those modules are loaded. If not, it asynchronously loads them on the page.
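For example (module names from the YUI 3 docs):

    // YUI 3 fetches the named modules asynchronously, then invokes
    // the callback with a sandboxed Y instance.
    YUI().use("node", "event", function (Y) {
        Y.one("#demo").on("click", function () {
            Y.log("these modules were loaded on demand");
        });
    });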
I've also read that the jQuery team's working on something similar (someone back me up here).
As to the philosophical argument that it'd be nice if this was built in, I think that may be a good feature. On the other hand, the simplicity of javascript is nice too. It allows a much lower point of entry for beginning programmers to do their thing. And for those of us that need it, great libraries like YUI are getting better every day.
The RequireJS project attempts to solve this problem; see for example
http://requirejs.org/docs/why.html
(I don't use it yet, though)
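For readers who haven't seen it, the RequireJS style looks roughly like this (the paths and the editor module's API are made up):

    // Declare dependencies, let them load asynchronously, then run.
    require(["jquery", "app/editor"], function ($, editor) {
        editor.init($("#content"));    // both dependencies are loaded by now
    });

    // A module declares itself (and its own dependencies) with define():
    define(["app/utils"], function (utils) {
        return { init: function (el) { /* ... */ } };
    });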
