Fetching tokens present in javascript file - javascript

I'm writing some client side javascript code that we expect to be reverse engineered at some point. I want a way that we can say with a reasonable degree of certainty that we are not exposing any debug information. We are running the code through uglify, so it obfuscates the variable names.
My thought was to fetch all the string literals (and tokens?) from the file and match them using a jest snapshot, but I can't find a tool that will pull the information from the JS file. Does anyone have any experience doing this?
Update:
I think it would help if I gave an example
Suppose someone were to write this code:
function processSensitiveData () {
console.log('processing sensitive data')
doMoreThings();
}
processSensitiveData();
There is nothing wrong with that because we strip console.log statements in production, and uglify transforms the function name, so no sensitive information is present. Suppose someone were to modify it to this code:
function processData (dataType) {
console.log('processing ' + dataType + ' data');
doMoreThings();
}
processData('sensitive');
Now, while the console.log statement gets stripped out, 'sensitive' will still be in the final output. It's this kind of thing that we want to avoid. While code reviews are the first line of defense, this is very likely to be missed in a code review, especially if combined with other changes. I'd like to have the computer do it, as that is something that a computer would do much better than a human. Ideally, it would be a linting rule, but that would be a very complicated rule to write.

Code reviews. If you review all code checked into source control, you will notice any debugging information that is trying to sneak its way into your repo. Besides that, you can run a keyword search before deployment for 'alert', 'debug', and other debugging-related JavaScript keywords.
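To make the "have the computer do it" idea from the question concrete, here is a rough sketch of pulling the string literals out of the built file so they can be compared against a stored list. The function name extractStringLiterals is hypothetical, and the scanner is deliberately simplified: it skips comments and reads single- and double-quoted strings, but it does not understand template literals or regex literals. A real parser would be more robust; this only shows the shape of the check.

```javascript
// Hypothetical helper, not an existing tool: collect the distinct string
// literals found in a chunk of JavaScript source.
// Assumptions: template literals and regex literals are NOT handled.
function extractStringLiterals(source) {
  const literals = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: skip to the end of the line.
      const end = source.indexOf('\n', i);
      i = end === -1 ? source.length : end + 1;
    } else if (ch === '/' && next === '*') {
      // Block comment: skip past the closing marker.
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (ch === '"' || ch === "'") {
      // String literal: read up to the matching, unescaped quote.
      let j = i + 1;
      let value = '';
      while (j < source.length && source[j] !== ch) {
        if (source[j] === '\\' && j + 1 < source.length) {
          value += source[j + 1]; // keep the escaped character as-is
          j += 2;
        } else {
          value += source[j];
          j += 1;
        }
      }
      literals.push(value);
      i = j + 1;
    } else {
      i += 1;
    }
  }
  // De-duplicate and sort so the result is stable between builds.
  return Array.from(new Set(literals)).sort();
}

// Stand-in for the minified output of the second example in the question.
const minified = "function a(b){c()}a('sensitive');var d=\"ok\";// 'ignored'";
console.log(extractStringLiterals(minified));
```

In a Jest test, the returned array could be passed to expect(...).toMatchSnapshot(), so any new literal such as 'sensitive' shows up as a snapshot diff that someone has to approve.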

Related

How to convert wasm back to C++ if I also have the original code

I'm not the best at explaining this stuff but here I go.
I have a program that uses "tesseract.js" to read an image every second or so.
10% of images have an "Empty page!!" error message, but I don't need or want this error message flooding my otherwise useful error log. I want to remove it from the source code, however, it isn't fired from the easily readable js code...
I assume it is fired from the wasmBinaryFile section, which (if I understand correctly) is a wasm binary compiled version of the original C++ (Tesseract 4.1.1)
In C++ Tesseract, the error message is fired from \src\textord\colfind.cpp line 366. If I knew where the equivalent section of the binary code was, I assume that I could remove it.
I know that decompiling wasm to C++ won't necessarily be understandable, but if I did it, would I be able to compare it to the source code for Tesseract and either find the section I need to remove or be able to recompile it for use again?
If so, would someone be able to point me towards a good software to do this?
You don't need to reverse engineer that code. Tesseract is open source and has a GitHub page, so you can look at the source code here:
https://github.com/tesseract-ocr/tesseract/blob/main/src/textord/colfind.cpp.
It also means you can use git to get a local copy, modify it, and compile it. You can probably even find a way to change tprintf.

How to display all JavaScript global variables with static code analysis?

I know that I can type into Chrome or FF the following command:
Object.keys(window);
but this displays DHTMLX stuff and also function names, which I'm not interested in. It also doesn't display variables in functions that have not been executed yet. We have a JavaScript codebase of more than 20,000 lines, so I would prefer static code analysis. I found JavaScript Lint. It is a great tool, but I don't know how to use it to display global variables.
So I'm searching for memory leaks, forgotten var keywords, and so on...
To do [only] what you're asking, I think you're looking for this:
// for...in yields property names ("for each...in" was a non-standard, Firefox-only form)
for (var key in window) {
    if (window.hasOwnProperty(key)) {
        console.log(key);
    }
}
I haven't linted that code, which is unlike me, but you get the idea. Try setting something first (var spam = "spam";) and you'll see it reported on your console, and not the cruft you asked about avoiding.
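A variation on the same runtime idea is to take a snapshot of the global property names before your scripts run and diff against it afterwards, so only the names your own code added are reported. This is a sketch, not part of the original answer: snapshotGlobals and newGlobalsSince are hypothetical names, and globalThis is used so the same code runs in a browser (where it is window) or in Node.

```javascript
// Hypothetical helpers: record global property names, then report additions.
function snapshotGlobals(globalObject) {
  return new Set(Object.getOwnPropertyNames(globalObject));
}

function newGlobalsSince(snapshot, globalObject) {
  return Object.getOwnPropertyNames(globalObject)
    .filter(function (name) { return !snapshot.has(name); })
    .sort();
}

// Take the snapshot first, before the code under inspection runs.
const before = snapshotGlobals(globalThis);

// Stand-in for a leaked variable (for example a forgotten var keyword).
globalThis.spam = 'spam';

// Logs the names that were added after the snapshot was taken.
console.log(newGlobalsSince(before, globalThis));
```

Like the loop above, this only sees globals created by code that has actually executed, so it complements static analysis rather than replacing it.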
That said, JLRishe is right; JSLint executes JavaScript in your browser without "phoning home", so feel free to run it. There are also many offline tools for JSLinting your code. I use a plugin for Sublime Text, for instance.
If you'd like some simplest-case html/JavaScript code to "wrap" JSLint, I've got an example here. Just download the latest jslint.js file from Crockford's repository into the same directory, and poof, you're linting with a local copy of JSLint.js. Edit: Added code in a new answer here.
Though understand that you're linting locally with that wrapper or when you visit JSLint.com. Honestly, I can say with some confidence, Crockford would rather not see our code. ;^) You download JSLint.js (actually webjslint, a minified compilation of a few files) from JSLint.com, but you execute in the browser.
(Admittedly, you're technically right -- you never know when that site could be compromised, and to be completely on the up and up, you sh/c/ould vet jslint.js each time you grab a fresh copy. It is safer to run locally, but as of this writing, you appear safe to use JSLint.com. Just eyeball your favorite browser's Net tab while running some test, non-proprietary code, and see if any phoning happens. Or unplug your box's network cable!)
Rick's answer to use "use strict"; is another great suggestion.
A great way to catch undeclared variables is to add 'use strict' to your code.
The errors will appear in the console, or you could display them in a try ... catch block:
'use strict';
try {
    var i = 15;
    u = 25; // undeclared: throws a ReferenceError in strict mode
} catch (ee) {
    alert(ee.message);
}
I found a very good solution to list all the global variables with the jsl command line tool:
Here is the documentation
I just have to put /*jsl:option explicit*/ into each file that I want to check. Then it is enough to run ./jsl -process <someFile> | grep 'undeclared identifier'
It is also possible to use referenceFile that contains some intentional global variables /*jsl:import <referenceFile>*/ so these variables will not be listed.

is it possible to obfuscate while using soma.js dependency injection?

While looking at how to make JavaScript source code more secure, I came upon a lot of 'solutions', but most people said the same things: "It's not possible to make your source code 100% secure", "try obfuscation", "run your code server side", etc. After reading a lot of posts here on Stack Overflow and on other sites, I came to the conclusion that a combination of minifying and obfuscating would do the job (for me).
But here is the problem: we are currently using soma.js with dependency injection, and the way we set it up it does not work well with obfuscation. It's basically this:
var session = function(id, sessionModel){
this._sessionmodel = sessionModel;
}
mapping:
injector.mapClass("sessionModel", project.SessionModel, true);
Obfuscation will then rename the sessionModel parameter in the function to, for example, 'A', but the mapping registered with the injector is still 'sessionModel' and not 'A', which basically breaks the code.
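To illustrate why the rename breaks things, here is a toy injector that resolves constructor arguments by parameter name. It is an assumption-laden sketch and not soma.js's actual implementation: createInjector, mapValue, instantiate and parameterNames are all made-up names for the demonstration.

```javascript
// Toy injector (NOT soma.js): looks up each constructor parameter by name.
function parameterNames(fn) {
  // Read the parameter list out of the function's source text.
  const match = fn.toString().match(/^[^(]*\(([^)]*)\)/);
  return match[1]
    .split(',')
    .map(function (name) { return name.trim(); })
    .filter(Boolean);
}

function createInjector() {
  const mappings = {};
  return {
    mapValue: function (name, value) { mappings[name] = value; },
    instantiate: function (Ctor) {
      const args = parameterNames(Ctor).map(function (name) {
        return mappings[name]; // undefined when the name is not mapped
      });
      return new Ctor(...args);
    }
  };
}

const injector = createInjector();
injector.mapValue('sessionModel', { user: 'demo' });

// Original source: the parameter name matches the mapping key.
function Session(sessionModel) { this._sessionmodel = sessionModel; }

// What an obfuscator might produce: the parameter is renamed to A.
function ObfuscatedSession(A) { this._sessionmodel = A; }

console.log(injector.instantiate(Session)._sessionmodel);
console.log(injector.instantiate(ObfuscatedSession)._sessionmodel);
```

The first call finds the mapped object; the second gets undefined, because the string key 'sessionModel' survived obfuscation while the parameter name did not.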
I've read this post which is about the same subject Dependency Injection and Code Obfuscation, but it does not provide a real answer to my problem so I decided to write my own question.
Any tips/hint/suggestions are welcome.
Thanks in advance.
EDIT
It seems you can tell Yuicompressor to exclude certain identifiers by putting in 'hints' into the files like this: "identifier:nomunge, identifier2:nomunge".
var session = function(id, sessionModel){
"sessionModel:nomunge";
this._sessionmodel = sessionModel;
}
I tested this and it works but that means you'll have to put it in yourself which is a lot of work if you have to do that for every script, especially if you have a very big project..
Gonna look into it further, and update this post if anything new pops up
EDIT 2
It's been a while, I only work 1 day a week on this =S.
As said before you can get it working by telling it which identifiers to exclude.
For that I looked into regular expression to get the "mapped classes" programmatically, since doing it by hand is just insane.
What I basically did was, instead of putting every hint in by hand, I made an identifier, for example "#nomunge";, and used a simple replaceregexp task to find it and replace it with a string containing all the identifiers. This string is built by loading the script and going through it with a tokenfilter.
<target name="build hints">
    <loadfile property="hints" srcFile="${temp.loc}/all.js">
        <filterchain>
            <tokenfilter delimoutput=":nomunge,">
                <ignoreblank/>
                <containsregex pattern="${regexp}"/>
            </tokenfilter>
        </filterchain>
    </loadfile>
    <echo message="${hints}"/>
</target>
<replaceregexp file="${temp.loc}/all.js"
    match="#nomunge"
    flags="g"
    replace="target:nomunge, dispatcher:nomunge, injector:nomunge,${hints}"
/>
This seems to do the job, for now...
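For readers who don't use Ant, the same two steps (collect the mapped names, then substitute them for the placeholder) can be sketched in plain JavaScript. This is only an illustration: buildNomungeHints and applyHints are hypothetical names, and the regular expression below is my own guess based on the mapClass call shown earlier; it is not the value of ${regexp} from the build file, which isn't given.

```javascript
// Hypothetical sketch of the hint-building step, outside of Ant.
// Assumption: mapped names appear as the first string argument of .mapClass(...).
function buildNomungeHints(source, alwaysKeep) {
  const pattern = /\.mapClass\(\s*["']([^"']+)["']/g;
  const names = new Set(alwaysKeep);
  let match;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names)
    .map(function (name) { return name + ':nomunge'; })
    .join(', ');
}

// Replace every "#nomunge" placeholder with the generated hint string.
function applyHints(source, hints) {
  return source.split('#nomunge').join(hints);
}

const script = [
  'var session = function(id, sessionModel){',
  '"#nomunge";',
  'this._sessionmodel = sessionModel;',
  '}',
  'injector.mapClass("sessionModel", project.SessionModel, true);'
].join('\n');

const hints = buildNomungeHints(script, ['target', 'dispatcher', 'injector']);
console.log(applyHints(script, hints));
```

The output is the same script with the placeholder string swapped for the full hint list, which is then what gets fed to the compressor.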
I'm behind the soma.js framework, feel free to ask me questions on the google group, happy to help.
This might help a bit more:
https://groups.google.com/forum/#!topic/somajs/noOX2R4K58g
Romu

How would I solve a coding puzzle with Javascript?

There is a website called Gild.com that has different coding puzzles/challenges for users to do. They can be completed in a wide array of languages, including JavaScript. I am interested in solving these puzzles in JavaScript, but I am unsure of the following:
How am I supposed to access the input file which is supposed to be passed as an argument?
How am I supposed to output the result?
My understanding of Javascript is that it is run from within an HTML page and that output really is only in the form of placing values in the HTML, modifying the DOM, etc. For that reason it is not clear to me how Javascript can be used for solving these types of problems. Can someone who has used Gild before or has some insights into my question suggest how to proceed?
An example of a problem would be: the given input file contains a positive integer, find the sum of all prime numbers smaller than that integer and output it.
EDIT: Some of the solutions below involve using external resources, but on Gild, I am supposed to put my solution in their editor and then submit it that way, like the following picture shows:
In other words, I don't think my solution can have access to Node.js or other external resources.
Edit: Here are some interesting articles that I have found that I think are the answer to my question:
http://www.phpied.com/installing-rhino-on-mac/
http://www.phpied.com/javascript-shell-scripting/
I haven't spent much time on Gild, but I do a lot of similar types of problems on Project Euler. I think the best way to go is probably Node.js.
If you're not familiar, Node is basically server-side JavaScript that runs in Google's V8 engine. Installing it on your own Mac/Windows machine takes about 2 minutes. It's also really fast (considering it's JavaScript).
And you'd use it like this:
var fs = require('fs'); // the filesystem module
var contents = fs.readFileSync('theFile.txt', 'utf-8');
// Do stuff with the file contents...
Everything after those first two lines can be done with the same JS you'd write in the browser, right down to calling console.log() to spit out the answer.
So, if you wrote your script in a file on your desktop called getprimes.js, you'd open up your terminal and enter node ~/Desktop/getprimes.js (assuming you're on a Mac)
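Putting that together for the example problem in the question (sum of all primes smaller than the integer in the input file), a getprimes.js could look roughly like this. It is a sketch under a few assumptions: the input file path is passed as the first command-line argument, the file contains just the integer, and sumPrimesBelow is a name I made up.

```javascript
const fs = require('fs'); // the filesystem module

// Sum all primes strictly smaller than limit, using a simple sieve.
function sumPrimesBelow(limit) {
  if (limit < 3) return 0;
  const isComposite = new Uint8Array(limit);
  let sum = 0;
  for (let n = 2; n < limit; n++) {
    if (isComposite[n]) continue;
    sum += n;
    // Mark multiples of n, starting at n * n, as composite.
    for (let m = n * n; m < limit; m += n) {
      isComposite[m] = 1;
    }
  }
  return sum;
}

// Assumption: run as "node getprimes.js input.txt".
const inputPath = process.argv[2];
if (inputPath && fs.existsSync(inputPath)) {
  const limit = parseInt(fs.readFileSync(inputPath, 'utf-8').trim(), 10);
  if (Number.isInteger(limit)) {
    console.log(sumPrimesBelow(limit)); // the answer goes to standard output
  }
}
```

For an input file containing 10, the primes below it are 2, 3, 5 and 7, so the script prints 17.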
If you're:
on a Mac,
planning to do a lot of these puzzles, and
willing to pay $10, then
I highly recommend CodeRunner. It encapsulates runtimes for a variety of languages — from C to JavaScript — and lets you quickly build and run any sort of one-off code. Just hack together your code, ⌘R, and the results are printed right there in the same window.
I haven't used any file-based JavaScript in CodeRunner, but I imagine kennis's suggestions would apply. To output your results:
console.log(...)
Easy as pie!

Are there annotations in JavaScript? If not, how to switch between debug/production modes in a declarative way?

This is but a curious question. I cannot find any useful links from Google so it might be better to ask the gurus here.
The point is: is there a way to make "annotation" in javascript source code so that all code snippets for testing purpose can be 'filtered out' when project is deployed from test field into the real environment?
I know in Java, C# or some other languages, you can assign an annotation just above the function name, such as :
// it is good to remove the annoying warning messages
@SuppressWarnings("unchecked")
public class Tester extends TestingPackage
{
...
}
Basically I've got a lot of testing code that prints out something into FireBug console.
I don't wanna manually "comment out" them because the guy that is going to maintain the code might not be aware of all the testing functions, so he/she might just miss one function and the whole thing can be brought down to its knees.
One other thing, we might use a minimizer to "shrink" the source code into "human unreadable" code and boost up performance (just like jQuery.min), so trying to match testing section out of the mess is not possible for plain human eyes in the future.
Any suggestion is much appreciated.
You can overwrite the Firebug console functions so they do nothing:
console.log = function() { };
You could perhaps include this into your code in your build process.
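A slightly more general sketch of the same idea, with the switch driven by a single flag, might look like this. DEBUG and silenceConsole are hypothetical names, and the example runs against a stand-in object (demoConsole) so it can be tried without muting your real console; in a page you would pass console itself.

```javascript
// Hypothetical helper: replace the named methods with no-ops and
// return a function that puts the originals back.
function silenceConsole(target, methodNames) {
  const originals = {};
  methodNames.forEach(function (name) {
    originals[name] = target[name];
    target[name] = function () {};
  });
  return function restore() {
    methodNames.forEach(function (name) {
      target[name] = originals[name];
    });
  };
}

// Assumption: the build process sets this flag (false for production).
const DEBUG = false;

// Stand-in for the real console so the demo has no side effects.
const demoConsole = {
  log: function (message) { return 'printed: ' + message; }
};

if (!DEBUG) {
  silenceConsole(demoConsole, ['log']);
}

// With DEBUG false this call does nothing and returns undefined.
console.log(demoConsole.log('test output'));
```

Keeping the flag in one place means the person maintaining the code only has to flip DEBUG, rather than hunt down every testing call.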
