I was wondering if it would be possible to create a JavaScript function to check a condition, and if that is True then I deny access to the code? Right now I am checking the user agent, and if it doesn't meet given criteria then I delete the HTML tag. However, if they go to the network tab then they can still see the GET requests and responses for my code.
This is a website running on localhost because it's an Electron app by the way.
I thought maybe I could issue a 403 error but I'm not sure if that's possible via JS.
Thanks
In an Electron app, your code is still JavaScript source code. You can obfuscate it and/or put it in a ASAR archive to make it harder to read and fine, but the code is still there and accessible to anyone who wants to go to the effort. (For instance, if you use VS Code, you can see the source in resources/app/out directory and its subdirectories.)
You can make it even harder to find and understand the source if you're willing to put in more work. V8, the JavaScript engine Node.js uses, has a feature called startup snapshots: You run V8 and have it load your script (your obfuscated code), take a heap snapshot (after GC), and write it to a file. Then you specify the heap snapshot when loading V8, and it just hauls in the snapshot instead of reading and parsing code. The Atom team have done this on top of Electron. In their case the motivation was startup performance, not hiding the source code, but it has the effect of making your code even harder to find.
Note I said harder, not impossible. At the end of the day, if you want the code to execute on the end-user's computer, that code is going to be accessible to the end user. This is true even of compiled applications, although of course in that case a lot of organizational information that helps a human understand the code is lost in compilation, which happens less with obfuscated code. (But a good obfuscator makes code extremely opaque to human beings.)
Related
Good afternoon!
We're looking to get a javascript variable from a webpage, that we are usually able to retrieve typing app in the Chrome DevTools.
However, we're looking to realize this headlessly as it has to be performed on numerous apps.
Our ideas :
Using a Puppeteer instance to go on the page, type the command and return the variable, which works, but it's very ressource consuming.
Using a GET/POST request to the page trying to inject the JS command, but we didn't succeed.
We're then wondering if there will be an easier solution, as a special API that could extract the variable?
The goal would be to automate this process with no human interaction.
Thanks for your help!
Your question is not so much about a JS API (since the webpage is not yours to edit, you can only request it) as it is about webcrawling / browser automation.
You have to add details to get a definitive answer, but I see two scenarios:
the website actively checks for evidence of human browsing (for example, it sits behind CloudFlare and has requested this option); or the scripts depend heavily on there being a browser execution environment available. In this case, the simplest option is to automate a browser, because a headless option has to get many things right to fool the server or the scripts. I would use karate, which is easier than, say, selenium and can execute in-browser scripts. It is written in Java, but you can execute it externally and just read its reports.
the website does not check for such evidence and the scripts do not really require a browser execution environment. Then you can simply download everything requires locally and attempt to jury-rig the JS into executing in any JS environment. According to your post, this fails; but it is impossible to help unless you can describe how it fails. This option can be headless.
You can embed Chrome into your application and instrument it. It will be headless.
We've used this approach in the past to copy content from PowerPoint Online.
We were using .NET to do this and therefore used CEFSharp.
Here is the situation: A complex web app is not working, and it is possible to produce undesired behavior consistently. The cause of the problem is not known.
Proposal: Trace the execution paths of all javascript code. Essentially, produce two monstrous logs which can then be fed into a diff algorithm to determine where the behavior related to the bug begins to diverge (as the cause is not apparent from application behavior, and both comprehending and obtaining a copy of the actual JS code being run is difficult, due to the many pages that must be switched to and copied out from the web inspector. Making it difficult is the fact that all pages are dynamically spliced together with Perl code, where significant portions of JS code exist only as (dynamic...) Perl strings).
The Web Inspector in Chrome does not have an option that I know about for logging an execution trace. Basically what I would like is a log of every line of JS that is executed, in the order that they are executed. I don't see this as being a difficult thing to obtain given that the JS VM is single-threaded. The problem is simply that the existing user-facing tools are not designed for quite this much hardcore debugging. If we look at the Profiler in the Dev Tools, it's clearly capable of the kind of instrumentation that I need, but it is fundamentally designed to do profiling instead of tracing.
How can I get started with this? Is there some way I can build Chrome from source, where I can
switch off JIT in V8?
log every single javascript expression evaluated by V8 to a file
I have zero experience with the development side of Chrome. So e.g. links to dev-builds/branches/versions/distros of Chrome/Chromium/Canary (what's the difference?) are welcome.
At this point it appears that instrumenting the browser with powerful js tracing is still likely to be easier than redesigning the buggy app. The architecture of the page is a disaster, but the functionality is complex, and it almost fully works. I just have to find the one missing piece.
Alternatively, if tools of this sort already exist, what are some other keywords I can search for them with? "Code Tracing" is pretty much the only thing I can come up with.
I tested dynaTrace, which was a happy coincidence as our app supports IE (indeed Chrome support just came out of beta), but this does not produce a text dump, it basically produces a massive Win32 UI expando-tree, which is impossible to diff. This makes me really sad because I know how much more difficult it was to make the representation of the trace show up that way, and yet it turns out being almost utterly useless. Who's going to scroll up and down that tree view and see anything really useful in it, in anything other than a toy example of a web app?
If you are developing a big web app, it is always good to follow a test driven strategy for the coding part of it. Using just a few tips allows you to make a simple unit testing script (using QUnit) to test pretty much all aspects of your app. Here are some potential errors and some ways of solving them.
Make yourself handlers to register long living Objects and a handler to close them the safe way. If the safe way does not succeed then it is the management of the Object itself failing. One example would be Backbone zombie views. Either the view has bad code in the close section, the parent close is not hooked or an infinite loop happened. Testing all view events is also good, although tedious.
By putting all the code for data fetching inside a certain module (I often use a bunch of Backbone.Model objects for each table/document in my DB) and handlers for each using a reqres pattern, you can test them 1 by 1 to see if they all fetch and save correctly.
If complex computation is needed, abstract it in a function or module so that it can be easily tested with known data.
If your app uses data binding, a good policy is to have a JSON schema for all data to be tested against your views containing your bindings. Check against the schema all the data required. This is applied to your Backbone.Model also.
Using a good IDE also helps. PyCharm (if you use Python for backend) or WebStorm are extremely good for testing and developing JavaScript/CoffeeScript. You can breakpoint and study your code at specific locations, inside your browser! Also it runs your code for auto-completion and you can see some of the errors that way.
I just cannot encourage enough the use of modules in your code. Although there is no JavaScript official way of doing it (next ECMAScript draft has it), you can still use good libraries for it. Good ones are: RequireJS, CommonJS or Marionette.Module (if you use Marionette as your framework). I think Ember/AngularJS also offers this kind of functionality but I did not work with them personally so I am not sure.
This might not give you an immediate solution to your problem and I don't think (IMO) there is an easy one either. My focus was to show you ways to develop so that errors can be easily spotted and countered, and all of it (depending on your Unit Testing) during development phase. Errors will always happen, as much as our programmer ego wants us to believe the contrary. Hope I helped :)
I would suggest a divide and conquer strategy, first via logging, and second via code. Wrap suspect methods of the code with console logging in and out events and when the bug occurs hopefully it is occurring between or after some event. If event logging will not shed light, bring parts of the code into a new page/app. Divide and conquer will find when the error starts occurring.
So it seems you're in the realm of weird already, so I'm thinking of a weird solution. I know nothing about the backend of chrome myself so we're in the same boat, but if you're feeling bold here's an idea. Maybe you could find/replace every newline in your javascript files with a piece of code that logs either to a global string or to the console a) what file you're in, b) the contents of "this" or something useful to you, and maybe even c) the time. This would at least get you started. Make sure it's wrapped in something distinct so you can just as easily remove it.
Because I have been around engineers for so many years, I know that if I don't provide context, I'm just going to get a hundred answers of the form "What are you trying to accomplish?" I am going to give the background which motivates my question. But don't confuse the background context for the question I am asking, which is specifically related to the JavaScript semantics that made object code uncacheable between padge requests. I am not going to give marks for advice on how to make my webapp faster. It's completely tangential to my question, which will probably only be answerable by someone who has worked on a JavaScript compiler or at least the compiler for a dynamic language.
Background:
I am trying to improve the performance of a web application. Among its many resources, it contains one enormous JavaScript file with 40k lines and 1.3million characters pre-minification. Post-minification it's still large, and it still adds about 100ms to the window.onload event when synchronously loaded, even when the source is cached client-side. (I have conclusively ruled out the possibility that the resource is not being cached by watching the request logs and observing that it is not being requested.)
After confirming that it's still slow after being cached, I started doing some research on JavaScript caching in the major browsers, and have learned that none of them cache object code.
My question is in the form of some hypothetical assertions based on this research. Please object to these assertions if they are wrong.
JavaScript object code is not cached in any modern browser.
"Object code" can mean anything from a byte code representing a simple linearized parse tree all the way to native machine code.
JavaScript object code in a web browser is difficult to cache.
In other words, even if you're including a cached JS source file in an external tag, there is a linear cost to the inclusion of that script on a page, even if the script contains only function definitions, because all of that source needs to be compiled into an object code.
JavaScript object code is difficult to cache because JS source must evaluated in order to be compiled.
Statements have the ability to affect the compilation of downstream statements in a dynamic way that is difficult to statically analyze.
3a. (3) is true mostly because of eval().
Evaluation can have side effects on the DOM.
Therefore, JavaScript source needs to be compiled on every page request.
Bonus question: do any modern browsers cache a parse tree for cached JS source files? If not, why not?
Edit: If all of these assertions are correct, then I will give the answer to anyone who can expound on why they are correct, for example, by providing a sample of JS code that couldn't be cached as object code and then explaining why not.
I appreciate the suggestions on how to proceed from here to make my app faster, and I mostly agree with them. But the knowledge gap that I'm trying to fill is related to JS object code caching.
You're right in that it's dynamically compiled and evaluated.
You're right that it must be.
Your recourse isn't in trying to make that compile-time smaller.
It needs to be about loading less to begin with, doing the bare-minimum to get the user-experience visible, then doing the bare minimum to add core functionality in a modular fashion, then lazily (either on a timer, or as-requested by the end-user) loading in additional features, functionality and flourishes.
If your program is 10,000 lines of procedural code, then you've got a problem.
I'm hoping it's not all procedural.
So break it up. It means a slower 1st-page load. But on subsequent requests, it might mean much faster response-times as far as what the user perceives as "running", even though it will take longer to get to 100% functional.
It's all about the user's perception of "speed" and "responsiveness", and not about the shortest line to 100% functional.
JavaScript, in a single-threaded format, can't both do that and be responsive.
So be responsive first.
PS: Add a bootstrap. An intelligent bootstrap.
It should be able to discern which features are needed.
RequireJS is for loading dependencies.
Not for figuring out what your dependencies are.
An added benefit -- you can set a short-term cache on the bootstrap, which will point to versioned modules.
How is this a benefit? Well, if you need to update a module, it's a simple process to update the version in the bootstrap. When the bootstrap's cache expires, it points at the new module, which can have an infinite lifetime (because it's got a different name -- versioned or timestamped);
Currently, our developers unobfuscate the javascript code so they can better QA our code in staging before we release to production.
However, sometimes, they forget to obfuscate the code before releasing to production.
I was wondering if there is a way to expire unobfuscated javascript so that, even if the QA developers forget to obfuscate the js it will auto obfuscate the js after a certain time frame (say 12 hours)?
No, what you are asking for does not make sense.
The problem is with your process: why is the obfuscation happening before your QA developers get the product? Why is it not between them and the final customer release (even if it is necessary)?
Try redesigning the path the code takes between your core developers and the customers' hands instead of looking for a technical solution for a business problem.
Javascript does not obfuscate itself so this question doesn't make a whole lot of sense.
Obfuscation is done by a separate tool in the build/release process. What you need is to improve/automate part of the release process to remove the chances of human error in that process. This can either be done with a more rigorous manual process or with more automatation.
In general, the QA team should be testing the EXACT same code that gets deployed on the final site so if that's obfuscated, that's what QA should test. So, first of all, I'd review what QA is doing and why. They should be testing the obfuscated code.
If QA needs to review the unobfuscated code for any reasons (I can't think of any likely reasons myself), then they should make their own copy of the code on their own systems that is unobfuscated and should not be putting the unobfuscated code anywhere in the release process.
Lastly, it sounds like you would benefit from building an automated release process that does the obfuscation and deploys to both the QA test environment and the same process deploys to your production servers. That guarantees that obfuscation is in place and that QA is testing the exact same bits that would go to production when it's released.
Your process is wrong. If the devs need unobfuscated javascript for debugging, make it such that the server itself has a debug flag to return unobfuscated code. Default to obfuscated code. Require both be present. Work the obfuscation into a release script in the release process.
You don't want QA testing something other than the build you're deploying, so obfuscating after the fact is unideal (imagine you hit a bug in the obfuscator... as likely as it seems, would you rather catch it in QA, or not until it hits prod?
By defaulting to obfuscated code, but allowing plain code via say session state or cookie, you get both correctly tested releases, and easily debuggable ones.
I wanted to hide some business logic and make the variables inaccessible. Maybe I am missing something but if somebody can read the javascript they can also add their own and read my variables. Is there a way to hide this stuff?
Any code which executes on a client machine is available to the client. Some forms of code are harder to access, but if someone really wants to know what's going on, there's no way you have to stop them.
If you don't want someone to find out what code is being run, do it on a server. Period.
That's one of the downsides of using a scripting language - if you don't distribute the source, nobody can run your scripts!
You can run your JS through an obfuscator first, but if anyone really wants to figure out exactly what your code is doing, it won't be that much work to reverse-engineer, especially since the effects of the code are directly observable in the first place.
Javascript cannot be compiled, that is, it is still Javascript.
But, there's this: http://dean.edwards.name/packer/
Generally, this is used to reduce the code footprint of the Javascript, if say your script is being downloaded thousands of times per minute. There are other methods to accomplish this, but as for hiding the code this sort of works.
Granted, the code can be unpacked. This will keep out a novice but anyone who is determined to read your source code will find a way.
It is even this way with compiled languages, even when they have been obfuscated. It's impossible to hide your code 100% of the time -- if it executes on your machine, it can be read by a determined hacker.
You could encrypt it so no one can read it.
For example
http://daven.se/usefulstuff/javascript-obfuscator.html
You must always validate the data you send back. I've had a rather entertaining time playing pranks on a forum I'm a mod of by manipulating the pages with the Web Developer Toolbar. Whether or not you obfuscate it, always assume that data coming to the server has been intentionally manipulated. Only after you prove it hasn't (or verify the user has permission to act) do you handle the request.