Why are JavaScript programs delivered in plain text?

Why was it decided to ship JavaScript programs in plain text? Was it for performance, or did the authors never imagine that JavaScript would be used in much more complex applications, where developers might want to protect the source code?

I think part of the reason was that since HTML itself was delivered to the browser in plain text, so should JavaScript be. It was the way of the Web.

That's because JavaScript was never intended for large web applications; rather, it was a way to do something "cool" in the browser. JavaScript wasn't a very popular language and was despised until the advent of AJAX, which is why little thought was ever given to how JavaScript should be distributed. After all, the simplest way to send JavaScript to a browser was as regular text files; why would anyone have bothered with anything else back in 1995?

Take a look at the YUI Compressor; it compresses JavaScript files by replacing the names of local variables, functions, etc. with short names like a, b, c, ...
It also removes all unnecessary whitespace, newlines, etc. So it both obfuscates the JavaScript code and reduces its size.
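To make the idea concrete, here is a hand-written before/after pair showing the kind of rewrite such a minifier performs. The "minified" version was written by hand for illustration, not produced by the YUI Compressor, and totalPrice is a made-up function. Local names shrink and whitespace disappears, while property names such as price and quantity stay, since minifiers generally do not rename properties by default.

```javascript
// Readable source, as a developer would write it.
function totalPrice(items, taxRate) {
  let runningTotal = 0;
  for (const item of items) {
    runningTotal += item.price * item.quantity;
  }
  return runningTotal * (1 + taxRate);
}

// Hand-minified equivalent: whitespace removed and local names shortened.
// (A minifier would keep the name totalPrice; it is renamed here only so
// both versions can coexist in one file.)
function totalPriceMin(a,b){let c=0;for(const d of a)c+=d.price*d.quantity;return c*(1+b)}

const cart = [{ price: 10, quantity: 2 }, { price: 5, quantity: 1 }];
console.log(totalPrice(cart, 0.5), totalPriceMin(cart, 0.5)); // 37.5 37.5
```

Both functions behave identically; only the bytes on the wire differ.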

Text is the one data form which transfers between any pair of computers without concern about computer architecture: endianness, word size, floating point binary representation, negative encoding, etc. Even EBCDIC computers readily handle ASCII.
Any binary representation scheme can overcome these stumbling blocks (TCP/IP internals do), but code that does so is not pleasant to work with, or even to read. Experience shows that even the most seasoned engineers occasionally forget a needed host-to-network or network-to-host conversion macro.
Indeed, many protocols which primarily transmit numeric information convert values to ASCII notation for transmission, largely because of this generality. (PCL and ANSI X3.64 come to mind.) Text-based transmission is supported by a wide universe of native numeric formatters and parsers, all of which tend to be well optimized. Also, virtually every programming language handily supports text-encoded numeric data, whereas support for binary-formatted data ranges from adequate to painful.
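A small JavaScript illustration of the byte-order pitfall, using the standard DataView API (the value 0x01020304 is arbitrary). Binary needs both sides to agree on byte order; the text form does not.

```javascript
// Write the 32-bit value 0x01020304 into 4 bytes in big-endian ("network") order.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
view.setUint32(0, 0x01020304, false); // false = big-endian

// A reader that uses the agreed byte order gets the value back...
const readCorrectly = view.getUint32(0, false); // 0x01020304 = 16909060
// ...but one that forgets the conversion and assumes little-endian gets garbage.
const readWrongly = view.getUint32(0, true); // 0x04030201 = 67305985

// The same value sent as text needs no byte-order agreement at all.
const asText = String(16909060); // "16909060"
const parsedBack = Number(asText);

console.log(readCorrectly, readWrongly, parsedBack); // 16909060 67305985 16909060
```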

It's easier to keep plain text secure than binary (against buffer overflows, etc.). It has a lower cost of entry. Minification and gzipping make it efficient. Web development is easier. Need I go on?

Related

How to defeat deobfuscation of obfuscated javascript code?

This is a generic question.
I've seen obfuscated JavaScript on some websites.
When you try to deobfuscate that code using standard deobfuscators (deobfuscatejavascript.com, jsnice.org and jsbeautifier.org), it is not easily deobfuscated.
I know it's practically impossible to prevent deobfuscation, but I want to make it really tough for an attacker.
Please suggest some ways I can achieve this.
Should I write my own obfuscator and then obfuscate its output with another online obfuscator? Will this beat the deobfuscators?
Thanks in advance.
P.S.: I tried Google Closure Compiler, UglifyJS, js-obfuscator and a bunch of other tools. None of them (used individually or in combination) is able to beat the deobfuscators.
Obfuscation can be accomplished at several levels of sophistication.
Most available obfuscators scramble (shrink?) identifiers and remove whitespace. Pretty-printing the code can restore nice indentation, and with enough sweat and guesswork, sensible identifier names can be restored too. So people say this is weak obfuscation. They're right, though sometimes it is enough.
[Encryption is not obfuscation; since the decryption key has to ship along with the code, it is trivially reversed.]
But one can obfuscate code in more complex ways. In particular, one can take advantage of the Turing Tarpit and the fact that reasoning about the obfuscated program can be hard or impossible in practice. One can do this by scrambling the control flow and injecting opaque predicates that are Turing-hard to reason about; you can construct such predicates in a variety of ways. For example, you can include tests based on artificial pointer-aliasing problems (or array-subscripting problems, which are equivalent) of the form "*p==*q", where p and q are pointers computed from messy, complicated graph data structures.
Such obfuscated programs are much harder to reverse engineer because they build on problems that are Turing hard to solve.
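As a toy illustration of the opaque-predicate idea (far weaker than the aliasing-based predicates described above, because an algebraic simplifier can see through it), here is an arithmetic predicate: x * (x + 1) is a product of two consecutive integers and so is always even. The function names and the decoy branch are made up for this sketch, and the inputs are assumed to be small integers (e.g. prices in cents) so the multiplication stays exact.

```javascript
// Opaque predicate: x * (x + 1) is a product of two consecutive integers,
// so it is always even. We know the answer, but a tool that cannot prove
// this fact has to assume either branch may run.
function alwaysTrue(x) {
  return (x * (x + 1)) % 2 === 0;
}

// Hypothetical example: sum integer prices behind the predicate.
function computeTotal(prices, seed) {
  let total = 0;
  for (const price of prices) {
    if (alwaysTrue(seed + price)) {
      total += price; // real path: always taken
    } else {
      total -= price * 3; // decoy path: never runs, but looks plausible
    }
  }
  return total;
}

console.log(computeTotal([100, 250, 5], 7)); // 355
```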
Here's an example paper that talks about scrambling control flow. Here's a survey on control flow scrambling, including opaque predicates.
What OP wants is an obfuscator that operates at this more complex level. These are available for Java and C#, I believe, because building program analyzers to determine (and harness) control flow is relatively easy once you have a byte code representation of the program rather than just its text. They are not so available for other languages. Probably just a matter of time.
(Full disclosure: my company builds the simpler kind of obfuscators. We think about the fancier ones occasionally but get distracted by shiny objects a lot).
The public de-obfuscators you listed use little more than a simple eval() followed by a beautifier to de-obfuscate the code, sometimes over several runs. This works because most obfuscators do their thing and then append a function that de-obfuscates the code just enough for the engine to run it. In most cases that is a simple character replacement (a kind of Caesar cipher), so an eval() is enough to get the code back, and a beautifier then makes it more or less readable.
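A minimal sketch of that pattern, assuming a simple character-shift "cipher"; shiftChars and unpack are hypothetical names, not part of any real obfuscator or de-obfuscator. The packed script decodes itself and hands the result to eval(); the de-obfuscator simply swaps eval for something that captures the decoded text.

```javascript
// Caesar-style substitution: shift every UTF-16 code unit by `shift`.
// Assumes ASCII input so the shifted codes stay in a sane range.
function shiftChars(text, shift) {
  return text
    .split("")
    .map((ch) => String.fromCharCode(ch.charCodeAt(0) + shift))
    .join("");
}

// What a typical packer ships: an encoded payload plus a decoder fed to eval().
const source = "console.log('hello')";
const packed = `eval(shiftChars(${JSON.stringify(shiftChars(source, 3))}, -3))`;

// What a public de-obfuscator effectively does: run the packed code, but with
// eval replaced by a function that captures the decoded source instead of running it.
function unpack(packedCode) {
  let captured = null;
  const captureEval = (code) => {
    captured = code;
  };
  // new Function creates a non-strict function, so a parameter may be named "eval".
  new Function("eval", "shiftChars", packedCode)(captureEval, shiftChars);
  return captured;
}

console.log(unpack(packed)); // console.log('hello')
```

Everything needed to decode the payload ships inside the script, which is exactly why this kind of obfuscation falls to copy-and-paste.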
To answer your question: you can make it tougher ("tougher" in the sense that just copying and pasting it into a de-obfuscator no longer works) by using some kind of "encryption" whose password the code fetches from the server after the first round of de-obfuscation, requesting it via a relative path that the browser completes instead of a full URL. That would require manual intervention. Build that path in a complicated and non-obvious way and you have a deterrent for the average script kiddie.
In general: you need something to de-obfuscate the script that is not in the script itself.
But beware: this only answers your question, that is, it makes the script impossible to de-obfuscate by simply pasting it into one of those public de-obfuscators, and nothing more. See Ira's answer for the more complex stuff.
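A minimal sketch of the "key lives outside the script" idea, assuming a simple repeating-key XOR as the "encryption". This is deliberately weak; its only job is to make the shipped script useless without something fetched separately. loadAndRun, the relative path "k" and the sample key are all hypothetical.

```javascript
// XOR each byte of `data` with the repeating `key`. XOR is its own inverse,
// so the same function scrambles and unscrambles. Obfuscation, not real encryption.
function xorBytes(data, key) {
  const out = new Uint8Array(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ key[i % key.length];
  }
  return out;
}

// Hypothetical browser loader: the key is NOT in the script. It is fetched from a
// relative path ("k" is a made-up name) that the browser resolves against the page URL.
async function loadAndRun(scrambledBytes) {
  const response = await fetch("k");
  const key = new Uint8Array(await response.arrayBuffer());
  const source = new TextDecoder().decode(xorBytes(scrambledBytes, key));
  return new Function(source)();
}

// Build step (done before deployment, never shipped): scramble the script with the key.
const key = new TextEncoder().encode("not-in-the-script");
const scrambled = xorBytes(new TextEncoder().encode("return 6 * 7;"), key);
```

Someone pasting the shipped script into a de-obfuscator gets only the scrambled bytes and the loader; the missing piece has to be tracked down by hand.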
Please be aware of the reasons to obfuscate code:
hide malicious intent/content
hide stolen code
hide bad code
a pointy haired boss/investor
other (I know what that is, but I am too polite to say)
Now, what will people think when they see your obfuscated code? That your investor insisted on it before giving you money to write that little browser game everyone loves so much?
JavaScript is interpreted from clear text by your browser. If a browser can do it, so can you. It's the nature of the beast. There are plenty of other programming languages out there that allow you to compile/black box before distribution. If you are hell-bent on protecting your intellectual property, compile the server side data providers that your JavaScript uses.
No JavaScript obfuscation or protection tool can claim to make it impossible to reverse a piece of code. That being said, some tools offer very simple obfuscation that is easy to reverse, while others turn your JavaScript into something that is extremely hard, even infeasible, to reverse. The most advanced product I know of that actually protects your code is Jscrambler. They have the strongest obfuscation techniques, and they add code locks and anti-debugging features that turn retrieving your code into complete hell. I've used it to protect my apps and it works; it's worth checking out, IMO.

Feasibility of transporting JavaScript and CSS as binary from server to client?

I am a newbie in the web development field.
I see that minification of JavaScript and CSS is widely used to reduce web-page load times. But, undoubtedly, text format data will be longer than binary format, so why do we still use textual JavaScript and CSS?
Is it possible in the future to use binary format for servers to deliver presentational and behavioral definitions?
I think if there is a common standard to deliver these as binary data, then server-side programs will be created to convert text format JS/CSS produced by web designers to binary format, and network traffic will be greatly reduced.
Can anybody give me some ideas about this?
Gzip is pretty widely deployed http://betterexplained.com/articles/how-to-optimize-your-site-with-gzip-compression/
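To see why gzip closes most of the gap with a binary format, here is a quick Node.js sketch using the built-in zlib module. The stylesheet is made up and deliberately repetitive, so real-world ratios will be less dramatic.

```javascript
const zlib = require("zlib");

// A made-up, deliberately repetitive stylesheet; real CSS compresses less well.
const rule = ".button { color: #333; padding: 4px 8px; border-radius: 4px; }\n";
const css = rule.repeat(200);

const rawBytes = Buffer.byteLength(css, "utf8");
const gzipped = zlib.gzipSync(css); // a string input is encoded as UTF-8
const restored = zlib.gunzipSync(gzipped).toString("utf8");

console.log(`raw: ${rawBytes} bytes, gzipped: ${gzipped.length} bytes`);
```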
The feasibility is nil. It would require the existence of a universal standard for binary JavaScript and CSS, understood by all browsers, and by a lot of technology that is peripherally concerned with both.
There isn't one.
It's interesting that you didn't mention a binary version of HTML in your question.
A year ago, the W3C published EXI, a specification for binary XML. You can use XML to represent HTML documents, so it is already possible to represent HTML in binary in a standards-compliant way (however, browsers have yet to support this).
CSS is a very regular format, so creating a binary format for it wouldn't be hard. (You might be interested in this.) Standardizing that format, on the other hand, would be.
Maybe in the future people will write all their code in abstraction languages like Slim and Sass, which will then be compiled to binary XML, allowing browsers to use one very fast and efficient interface to parse both markup and style.
As others have pointed out, little effort is being spent on developing web standards for more efficient data transfer. The consensus at the moment is that binary formats will complicate things (it will no longer be possible to edit the data directly), won't reduce the size much more than gzip*, and that further reduction in size is not necessary, especially since the introduction of fibre-optic connections.
* gzip is a general-purpose compression program, much more widely used than any domain-specific binary format, and so it is much more thoroughly tested and supported.

What is the best way of hiding or encrypting information in comments in Javascript, CSS or HTML code?

While digging through Facebook's CSS and HTML code, I found some comments which seem to be encrypted in order to hide information. This could be some kind of debugging information that might be useful to keep for later use. The comments look like this example:
/*[XnbHYrH~LGxMu]p`KYO^fXoOK]wFpBtjKdzjYssGm~[xISvmX0J]xhEMxwV_NjvnWm]jAyo`#}VtxqZ{QC`M|yxHMBLE[ZsaeCgU[aG}|K|`Icu`hxiAzM|j~RRkiO|AF`_KuuEnfd_I[P}BDo`ykXBjUjt_nza#^hh?CEQp~KHR|z`llKuTxM_lJp*/
A quick analysis of the encrypted text with this Python snippet, ''.join(sorted(set(comment))), shows that only 64 different characters are used:
'#0?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~'
In terms of performance, size and browser-compatibility one cheap approach would be a base64 encoding of the raw text with a custom char mapping.
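A minimal sketch of that approach, using the standard btoa/atob functions plus a character substitution. The CUSTOM alphabet is a hypothetical choice built from the 64 characters seen in the comment, only to show that output of this shape is easy to produce; it is not a claim about Facebook's actual scheme. In practice you would shuffle the alphabet and treat its ordering as the "secret". This particular alphabet contains neither * nor /, so the output can never accidentally close a CSS comment.

```javascript
const STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Hypothetical secret alphabet: any arrangement of 64 distinct characters works,
// as long as it excludes "=" (used for Base64 padding).
const CUSTOM = "0?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~";

// Standard Base64 of the UTF-8 bytes of `text`.
function toBase64(text) {
  let binary = "";
  for (const byte of new TextEncoder().encode(text)) binary += String.fromCharCode(byte);
  return btoa(binary);
}

function fromBase64(b64) {
  const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Map each character from one alphabet to the other; "=" padding passes through.
function translate(text, from, to) {
  let out = "";
  for (const ch of text) {
    const index = from.indexOf(ch);
    out += index === -1 ? ch : to[index];
  }
  return out;
}

const hide = (text) => translate(toBase64(text), STANDARD, CUSTOM);
const reveal = (hidden) => fromBase64(translate(hidden, CUSTOM, STANDARD));

const hidden = hide("build 4711, debug=on");
console.log(hidden, reveal(hidden));
```

Encoding and decoding are a single linear pass each, and the output is 4/3 the size of the input, which matches the "cheap" part of the constraints.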
Update: The constraints I would set for the best solution are fast encoding with low computation time and a small output size to reduce bandwidth. On the other hand, it should be easy to retrieve the original information with a script and some kind of secret if needed. The aim is to hide non-sensitive data, so there is no need for strong encryption; it should just be economically unattractive for someone to spend time on it.
I use a Huffman code and base64 to encode some data on my website. I think it's very hard to bypass, and I get some compression too. That was more of an accident. But it would be nice if you could explain how you define "best" in this context. Do you have constraints?
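The answer above doesn't include its code, so what follows is only a generic sketch of "Huffman code, then base64", with hypothetical function names. The decoder needs the code table and the bit count, and that table is effectively the "secret".

```javascript
// Build a prefix-free Huffman code table (character -> bit string) for `text`.
function buildHuffmanCodes(text) {
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);
  const nodes = [...freq].map(([ch, weight]) => ({ ch, weight }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].ch, "0"]]);
  // Repeatedly merge the two lightest nodes (a sorted array stands in for a priority queue).
  while (nodes.length > 1) {
    nodes.sort((x, y) => x.weight - y.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }
  const codes = new Map();
  (function walk(node, prefix) {
    if (node.ch !== undefined) {
      codes.set(node.ch, prefix);
      return;
    }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  })(nodes[0], "");
  return codes;
}

const huffmanEncode = (text, codes) => [...text].map((ch) => codes.get(ch)).join("");

// Prefix-free codes mean we can emit a character as soon as the buffer matches one.
function huffmanDecode(bits, codes) {
  const reverse = new Map([...codes].map(([ch, code]) => [code, ch]));
  let out = "";
  let current = "";
  for (const bit of bits) {
    current += bit;
    if (reverse.has(current)) {
      out += reverse.get(current);
      current = "";
    }
  }
  return out;
}

// Pack the bit string into bytes (most significant bit first), then Base64.
function bitsToBase64(bits) {
  const bytes = new Uint8Array(Math.ceil(bits.length / 8));
  for (let i = 0; i < bits.length; i++) {
    if (bits[i] === "1") bytes[i >> 3] |= 0x80 >> (i & 7);
  }
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function base64ToBits(b64, bitLength) {
  const binary = atob(b64);
  let bits = "";
  for (let i = 0; i < bitLength; i++) {
    bits += (binary.charCodeAt(i >> 3) >> (7 - (i & 7))) & 1;
  }
  return bits;
}

const text = "abracadabra abracadabra";
const codes = buildHuffmanCodes(text);
const bits = huffmanEncode(text, codes);
const encoded = bitsToBase64(bits);
console.log(encoded, huffmanDecode(base64ToBits(encoded, bits.length), codes));
```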
I don't know what they're doing here, but I'd say you should never intentionally send sensitive data or anything you want to hide to a client, regardless of whether it's encrypted or not. Not only is it dangerous (if by some chance your encryption is broken), but it also wastes bandwidth.
If you absolutely need to keep stuff in your source code for some reason, then you should have a pre-release job that strips it out so it never gets published.
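A minimal pre-release step along those lines, assuming the internal notes live in CSS comments; stripCssComments is a hypothetical helper, and a real build would normally rely on an existing CSS minifier instead.

```javascript
// Remove /* ... */ comments from CSS before publishing, but keep /*! ... */ comments,
// a common convention for license headers that must survive minification.
// Caveat: this regex does not understand strings or url(), so a literal "/*" inside
// a quoted value would confuse it; use a real CSS parser or minifier for production.
function stripCssComments(css) {
  return css.replace(/\/\*(?!!)[\s\S]*?\*\//g, "");
}

const input = "/*! MIT License */\n/* debug: ticket 123 */\n.a{color:red}";
console.log(stripCssComments(input));
```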

Compressing plaintext in JavaScript?

I have a simple Notepad-like web application I'm making for fun. When you save a document, the contents of a <textarea> are sent to the server via Ajax and persisted in a database.
Let's just say for shits and giggles that we need to compress the contents of the <textarea> before sending it because we're on a 2800 baud modem.
Are there JavaScript libraries to do this? How well does plain text compress in the first place?
Simple 7-bit compression might work if you're only using the 7-bit ASCII character set. A Google search yielded this: http://www.iamcal.com/png-store/
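A minimal sketch of the 7-bit idea (this is not how the linked png-store page works; pack7 and unpack7 are hypothetical helpers): each ASCII character needs only 7 bits, so 8 characters fit in 7 bytes, a fixed 12.5% saving. The decoder needs the original character count, because the last byte may contain padding bits. Note that if the packed bytes then have to be Base64-encoded to travel as text, the 33% Base64 overhead more than cancels the saving.

```javascript
// Pack 7-bit ASCII text into bytes, most significant bit first.
function pack7(text) {
  const out = new Uint8Array(Math.ceil((text.length * 7) / 8));
  let bitPos = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 127) throw new RangeError(`non-ASCII character at index ${i}`);
    for (let b = 6; b >= 0; b--, bitPos++) {
      if ((code >> b) & 1) out[bitPos >> 3] |= 0x80 >> (bitPos & 7);
    }
  }
  return out;
}

// Read `length` 7-bit characters back out of the packed bytes.
function unpack7(bytes, length) {
  let text = "";
  let bitPos = 0;
  for (let i = 0; i < length; i++) {
    let code = 0;
    for (let b = 0; b < 7; b++, bitPos++) {
      code = (code << 1) | ((bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1);
    }
    text += String.fromCharCode(code);
  }
  return text;
}

const note = "Hello, world!";
const packed7 = pack7(note);
console.log(note.length, packed7.length, unpack7(packed7, note.length)); // 13 12 Hello, world!
```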
Or you could use LZW
http://rosettacode.org/wiki/LZW_compression#JavaScript
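For reference, a compact LZW sketch in JavaScript (written for this note, not copied from the Rosetta Code page). It assumes every character code is below 256 and returns an array of integer codes.

```javascript
// LZW compression: returns an array of integer codes.
// Assumes every character code is below 256 (e.g. ASCII / Latin-1 text).
function lzwCompress(text) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let nextCode = 256;
  let w = "";
  const output = [];
  for (const c of text) {
    if (c.charCodeAt(0) > 255) throw new RangeError(`unsupported character: ${c}`);
    const wc = w + c;
    if (dict.has(wc)) {
      w = wc; // keep extending the current match
    } else {
      output.push(dict.get(w)); // emit the longest known prefix
      dict.set(wc, nextCode++); // and learn the new, longer string
      w = c;
    }
  }
  if (w !== "") output.push(dict.get(w));
  return output;
}

function lzwDecompress(codes) {
  if (codes.length === 0) return "";
  const dict = [];
  for (let i = 0; i < 256; i++) dict.push(String.fromCharCode(i));
  let w = dict[codes[0]];
  let result = w;
  for (let i = 1; i < codes.length; i++) {
    const k = codes[i];
    // k === dict.length happens when the encoder used a code it had just created.
    const entry = k < dict.length ? dict[k] : w + w[0];
    result += entry;
    dict.push(w + entry[0]);
    w = entry;
  }
  return result;
}

console.log(lzwCompress("aaaaaaaa")); // [ 97, 256, 257, 256 ]
```

Note that the output is a list of codes, many of them above 255, so fewer codes does not automatically mean fewer bytes; how the codes are serialized matters.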
As far as compression ratio goes, according to Dr. Dobb's:
It is somewhat difficult to characterize the results of any data compression technique. The level of compression achieved varies quite a bit, depending on several factors. LZW compression excels when confronted with data streams that have any type of repeated strings. Because of this, it does extremely well when compressing English text. Compression levels of 50 percent or better can be expected.
Well, you couldn't use gzip compression. See here: Why can't browser send gzip request?
I suppose you could strip whitespace, but that would prove unsustainable. I'm not sure if this is an itch that needs scratching.
I did find this with a Google search: http://rumkin.com/tools/compression/compress_huff.php It yields smaller output once the text is large enough; it actually inflates the text if the text is short.
I also found this: http://www.sean.co.uk/a/webdesign/javascript_string_compression.shtm
First, run the LZW compression; this yields compressed data in binary format.
Next, do base-64 encoding on the compressed binary data. This yields a text version of the compressed data that you can store in your database.
To restore the contents, do the base-64 decode. Then the LZW decompression.
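One simple way to do the base-64 step in JavaScript, sketched under the assumption that the LZW output is an array of integer codes below 65536 (which is what typical JavaScript LZW routines return). Each code is written as two big-endian bytes before Base64 encoding. The function names are hypothetical, and the fixed 16-bit width gives back part of the savings; real LZW implementations usually emit variable-width codes.

```javascript
// Serialize integer codes (each assumed to be in 0..65535) as 2 big-endian bytes,
// then Base64-encode the bytes so they can be stored in a text column.
function codesToBase64(codes) {
  let binary = "";
  for (const code of codes) {
    if (!Number.isInteger(code) || code < 0 || code > 0xffff) {
      throw new RangeError(`code out of range: ${code}`);
    }
    binary += String.fromCharCode(code >> 8, code & 0xff);
  }
  return btoa(binary);
}

// Reverse the steps: Base64-decode, then read the bytes back two at a time.
function base64ToCodes(b64) {
  const binary = atob(b64);
  const codes = [];
  for (let i = 0; i + 1 < binary.length; i += 2) {
    codes.push((binary.charCodeAt(i) << 8) | binary.charCodeAt(i + 1));
  }
  return codes;
}

const stored = codesToBase64([97, 256, 257, 256]); // sample codes, e.g. from LZW
console.log(stored.length, base64ToCodes(stored)); // 12 [ 97, 256, 257, 256 ]
```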
There are Java libraries to do both. Just search on "LZW compression Java" and on "base-64 encode Java".
It varies heavily depending on the algorithm and the text.
I'm making my own compression algorithm here. As of writing it's not done, but it already works extremely well for English plaintext compression: ~50% compression for both small and large messages. It wouldn't be useful to share a code snippet because I'm using experimental dictionary compression, but here's my project: https://github.com/j-stodd/SMOL
I also tried the LZW compression shared by Suirtimed, but it doesn't seem to perform that well: it decreases the length, but the byte count stays mostly the same. Compressing "aaaaaaaa" with LZW will save you only one byte. My algorithm would save you 5 bytes.

Should We Use Long-Name Or Short-Name in JavaScript Coding?

There is a discussion about JavaScript coding in my work group. Some people argue that we should use long names for better readability; others believe that short names should be favored to save bits on the wire.
Generally, it is about coding conventions. One side believes that an identifier such as "fAutoAdjustWidth" is OK, while others prefer "fAtAjtW".
So, what's the better way? Should we sacrifice readability for performance or not?
Make it readable, and if you feel that the resulting JS file is too big, use one of the many JS compactors before deploying the production version, while maintaining a development version with long names.
BTW, if you're really worried about bandwidth, use mod_deflate.
If you're worried about bits on the wire, you could always run a minifier on your code. Then you could develop with long names and release a much smaller file that has equivalent functionality. The Yahoo YUI Compressor looks like it does whitespace compression and token compression.
Do these same people advocate not writing comments in their code? Be entirely clear and descriptive with your variable names.
while others prefer "fAtAjtW"
Even if "bits-on-wire" was an issue (which it's not), a naming convention like this will make the code completely unmaintainable after the first week of working on the project.
Reading the code will get to be near impossible, and when writing the code people will constantly have to think about things like "was 'fAutoAdjustWidth' abbreviated 'fAtAjtW' or was it 'fAutAtW'?". It's a huge mental tax to pay while writing code that will result in far lower productivity.
On top of that, the problem is exacerbated by the fact that in non-strict JavaScript, assigning to a mistyped name silently creates a new global variable!
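A quick demonstration of that hazard, and of how strict mode catches it. The variable names are invented for illustration, and new Function is used only so each snippet gets its own strictness.

```javascript
// Sloppy (non-strict) mode: assigning to a misspelled name silently creates a global.
const sloppy = new Function(`
  var autoAdjustWidth = 0;
  autoAdjustWdith = 120; // typo: quietly becomes globalThis.autoAdjustWdith
  return autoAdjustWidth; // still 0, so the bug goes unnoticed
`);

// Strict mode: the same kind of typo throws a ReferenceError instead.
// (A different misspelling is used so the global created above can't mask it.)
const strict = new Function(`
  "use strict";
  var autoAdjustWidth = 0;
  autoAdjustWidht = 120;
  return autoAdjustWidth;
`);

console.log(sloppy()); // 0
try {
  strict();
} catch (err) {
  console.log(err instanceof ReferenceError); // true
}
```

Cryptic abbreviations like fAtAjtW make exactly this kind of typo more likely, and sloppy mode will not tell you about it.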
Maybe the worry isn't about bits on the wire but the overhead of reading and re/over-viewing the code.
I tend to favor short names inside functions and make function names as long as necessary, but as short as possible without losing useful meaning.
No doubt it is a trade-off. It depends on whether you want your code to resemble natural language or be more implicit and compact.
Some prefix variable names to inject context information into them. I say, if that is necessary, the IDE should provide such injection capabilities as a visual overlay on the code via context symbolics.
The next version of Visual Studio will make such annotation gymnastics much easier via a fine-grained extensibility mechanism extended deep into the editor itself. I have not used Visual Studio for editing Javascript though.
I see now that your concern is indeed the space trade-off. This should never, ever, ever be an issue. Always always always favor readability over bits on the wire, esp. since compression exists, as noted by the other commentators.
The only thing I would add is the above, which is that sometimes comprehension is made easier with compact names over excessively long names. But it is harder to get short names right. Long names are much easier and faster to make right in my experience.
The reason for short names should never be data compression, only cognitive efficiency. What works is individual.
Use big variable names because they help the programmer.
To save bits over the wire, minify your Javascript before deploying it to the production server. Dean Edwards' packer has an option to compress variable names, which looks like the best of both worlds for you.
Use long names that are just descriptive enough for your variables and functions.
One reason you may need short names is to make the file size smaller, but tools can do that for you when you upload the code.
I would strongly advise against using short identifiers. Just reading your example shows how much more documentation is suddenly needed when names like fAtAjtW are used. At some point it will become pretty much unmaintainable, and all just to save a few bytes in transfer.
If the only reason for considering "short" names is to make the resulting script smaller and thus save some bandwidth, I would instead recommend using gzip compression, which will save you way more than a few bytes for an identifier.
One side believes that an identifier such as "fAutoAdjustWidth" is OK, while others prefer "fAtAjtW".
‘fAtAjtW’ is an unreadable, untypeable horror. Seriously, anyone prefers that? Hilarious and impossible to remember — is it ‘AtAjt’, or ‘AutAdj’...?
‘autoAdjustWidth’ would be a suitable full attribute name. (I'm not convinced about the ‘f’ prefix notation at all, but that's another issue.) Sometimes you want a very short name for a short-lived variable (eg. a temporary in a small loop), in which case I'd personally go straight for ‘var aaw’ rather than the above nightmare.
As for performance, there will be no difference. JavaScript doesn't care how long you make variable names, and assuming you're deflating your scripts on the way to the browser the compression will remove any transfer advantage of the shorter names.
Whoever thinks "fAtAjtW" is preferable is using some sort of pharmacological method in their programming. fAutoAdjustWidth is very fine and very prudent. JavaScript libraries do not use names like fAtAjtW for a reason. If you are worried about the size, your worries are probably misplaced; that said, I recommend using some sort of minifier. However, don't go ridiculously long either; anything over 25-30 characters is probably going a bit far.
Use smaller names where it doesn't affect the readability of your code. Bigger names are fine, but try to use them only where they really make something easier for yourself and others to follow. Lastly (and as stated in other answers), minify your code and/or turn on some kind of server compression mechanism such as Apache's mod_gzip or mod_deflate to reduce the number of bits flowing through the wires.
With that said, I would prioritise readability over compactness of variable names.
Long and descriptive names.
And try to make method names as unique as possible. This helps navigability: if you want to find all usages of a particular method, there is less likelihood that it will clash with another method that has the same name.
Modern Javascript IDEs can also do method refactoring (see : http://blue-walrus.com/2013/08/review-javascript-ides/ ). This is very hard if methods are called the same.
