I'm working on building a development tool that is written in JavaScript.
This will not be an open source project and will be sold (hopefully) as a commercial product.
I'm looking for the best way to protect my investment. Is using an obfuscator (code mangler) enough to reasonably secure the code?
Are there other alternatives that I am not aware of?
(I'm not sure if obfuscator is the right word, it's one of the apps that takes your code and makes it very unreadable.)
I'm going to tell you a secret. Once you understand it, you'll feel a lot better about the fact that Javascript obfuscation is only really useful for saving bandwidth when sending scripts over the wire.
Your source-code is not worth stealing.
I know this comes as a shock to the ego, but I can say this confidently without ever having seen a line of code you've written because outside the very few realms of development where serious magic happens, it's true of all source-code.
Say, tomorrow, someone dumped a pile of DVDs on your doorstep containing the source code for Windows Vista. What would you be able to do with it? Sure, you could compile it and give away copies, but that's just one step more effort than copying the retail version. You could painstakingly find and remove the license-checking code, but that's something some bright kid has already done to the binaries. Replace the logo and graphics, pretend you wrote it yourself and market it as "Vicrosoft Mista"? You'll get caught.
You could spend an enormous amount of time reading the code, trying to understand it and truly "stealing the intellectual property" that Microsoft invested in developing the product. But you'd be disappointed. You'd find the code was a long series of mundane decisions, made one after the other. Some would be smarter than you could think of. Some would leave you shaking your head wondering what kind of monkeys they're hiring over there. Most would just make you shrug and say "yeah, that's how you do that."
In the process you'll learn a lot about writing operating systems, but that's not going to hurt Microsoft.
Replace "Vista" with "Leopard" and the above paragraphs don't change one bit. It's not Microsoft, it's software. Half the people on this site could probably develop a Stack Overflow clone, with or without looking at the source of this site. They just haven't. The source-code of Firefox and WebKit are out there for anyone to read. Now go write your own browser from scratch. See you in a few years.
Software development is an investment of time. It's utter hubris to imagine that what you're doing is so special that nobody could clone it without looking at your source, or even that it would make their job that much easier without an actionable (and easily detectable) amount of cut and paste.
I deeply disagree with most answers above.
It's true that every software can be stolen despite of obfuscation but, at least, it makes harder to extract and reuse individual parts of the software and that is the point.
Maybe it's cheaper and less risky to use an obfuscation than leaving the code open and fighting at court after somebody stole the best parts of our software and made dangerous concurrency.
Unobfuscated code whispers:
Come on, analyze me, reuse me. Maybe you could make a better software using me.
Obfuscated code says:
Go away dude. It's cheaper to use your own ideas than trying to crack me.
You are going to be fighting a losing battle if you try to obfuscate your code in the hopes of someone not stealing it. You may stop the casual browser from getting at it, but someone dedicated would almost certainly be able to overcome any measure you use.
In the past I have seen people do several things:
Paste a lot of whitespace at the top of the page with a message telling people that the code is unavailable, when in actuality you just need to scroll down a few pages to get at it.
Running it through an encoder of some kind, this is so so useful as it can just be run through the decoder.
Another method is to reduce variable names to one character and remove whitespace (this is also an efficiency thing).
There are many other methods.
In the end, your efforts are only likely to stop the casual browser from seeing your stuff. If someone dedicated comes along then there is not much you will be able to do. You will have to live with this.
My advice would be to make a really awesome product that attracts the most people and beat off any competition by having the best product/service/community and not the most obfuscated code.
You're always faced with the fact that any user that comes to your webpage will download some working version of your Javascript source. They will have the source code. Obfuscating it may make it very difficult to be reused by someone with the intent to steal your hard work. However, in many cases someone can even reuse the obfuscated source! Or in the worst case they can unravel it by hand and eventually comprehend it.
An example of a situation like yours might be Google Maps. The Javascript source is clearly obfuscated. However, for really private/sensitive logic they push the data to the server and have the server process that information using XMLHttpRequests (AJAX). With this design you have the important parts on the server side, much more tightly controlled.
That's probably about the best you can do. Just be aware that anybody with enough dedication, can probably de-obfuscate your program. Just make sure you're comfortable with that before embarking on your project. I think the biggest problem with this would be to control who's using it on their site. If somebody goes to a site with your code on it, and likes what it does, it doesn't matter that they don't understand what the code does, or can't read it, when they can just copy the code, and use it on their own site.
A obfuscator won't help you at all if someone wants to figure out the code. The code still exists on the client machine and they can grab a copy of it and study it at their leisure.
There is simply no way to hide code written in Javascript since the source code has to be handed to the browser for execution.
If you want to hide your code, you have the following options:
1) Use an environment where compiled code (not source) is downloaded to the client, e.g. Flash or Silverlight. I'm not even sure that's foolproof, but it's certainly much better than Javascript.
2) Have a back end on the server side that does the work and a thin client that just makes requests to the server.
I'd say yes, it's enough if you also make sure than you compress the code as well using a tool like Dean Edward's Packer or similar. If you think about what is possible with tools like .NET Reflector in terms of reverse engineering compiled code / IL in .NET, you realize that there's nothing you can do to completely protect your investment.
On the other hand, remember that folks who release their source code also seem to make do quite nicely anyway - it's their experience that people want more than their intellectual property.
code obfuscator is enough for something that needs minimal protection, but I think it will definitely not enough to really protect you. if you are patient you can realy de-mangle the whole thing.. and i'm sure there are programs to do it for you.
That being said, you can't stop anyone from pirating your stuff because they'll eventually will break any kind of protection you create anyway. and it is espcially easy in scripted language where the code is not compiled.
If you are using some other language, maybe java or .NET, You can try doing things like "calling home" to verify that a license number matches a given url. Which works if you your app is some sort of online app that is going to be connected online all the time. But having access to the source, people can easily bypass that part.
In short, javascript is a poor choice for what you are doing.
A step up from what you are doing is maybe using a webservice backend to get your data. Let the webservice handle the authentication/verification process. Requires a bit of work to make sure it is bulletproof, but it might work
If this is for a website, which by its very nature puts viewing of its code one menu click away, is there really any reason to hide anything? If someone wants to steal your code they will most likely go through the effort of making even the most mangled code human readable. Look at commercial websites, they don't obfuscate their code, and no one goes out and steals code from the google apps. If you are really worried about code theft, I would argue for writing it in some other compiled language. (which does of course destroy the whole webapp thing...) Even then, you aren't totally safe, there are many de-compilers out there.
So really, there is no way to do what you want in the face of anyone with sufficient motivation.
Related
This is a generic question
I've seen javascript on some websites which is obfuscated
When you try to deobfuscate the code using standard deobfuscators (deobfuscatejavascript.com, jsnice.org and jsbeautifier.org)
, the code is not easily deobfuscated
I know it's practically impossible to avoid deobfuscation. I want to make it really tough for an attacker to deobfuscate it
Please suggest some ways I can acheive this
Should I write my own obfuscator, then obfuscate the output with another online obfuscator. Will this beat it?
Thanks in advance
P.S: I tried google closure compiler, uglifyjs, js-obfuscator and a bunch of other tools. None of them (used individually or in combination) are able to beat the deobfuscators
Obfuscation can be accomplished at several levels of sophistication.
Most available obfuscators scramble (shrink?) identifiers and remove whitespace. Prettyprinting the code can restore nice indentation; sweat and lots of guesses can restore sensible identifier names with enough effort. So people say this is weak obfuscation. They're right; sometimes it is enough.
[Encryption is not obfuscation; it is trivially reversed].
But one can obfuscate code in more complex ways. In particular, one can take advantage of the Turing Tarpit and the fact that reasoning about the obfuscated program can be hard/impossible in practice. One can do this by scrambling the control flow and injecting opaque control-flow control predicates that are Turing-hard to reason about; you can construct such predicates in a variety of ways. For example, including tests based on constructing artificial pointer-aliasing (or array subscripting, which is equivalent) problems of the form of "*p==*q" for p and q being pointers computed from messy complicated graph data structures.
Such obfuscated programs are much harder to reverse engineer because they build on problems that are Turing hard to solve.
Here's an example paper that talks about scrambling control flow. Here's a survey on control flow scrambling, including opaque predicates.
What OP wants is an obfuscator that operates at this more complex level. These are available for Java and C#, I believe, because building program analyzers to determine (and harness) control flow is relatively easy once you have a byte code representation of the program rather than just its text. They are not so available for other languages. Probably just a matter of time.
(Full disclosure: my company builds the simpler kind of obfuscators. We think about the fancier ones occasionally but get distracted by shiny objects a lot).
The public de-obfuscators listed by you use not much more than a simple eval() followed by a beautifier to de-obfuscate the code. This might need several runs. It works because the majority of obfuscators do their thing and add a function at the end to de-obfuscate it enough to allow the engine to run it. It is a simple character replacement (a kind of a Cesar cipher) in most cases and an eval() is enough to get some code, made more or less readable by a beautifier after that.
To answer your question: you can make it tougher ("tougher" in the sense that just c&p'ing it into a de-obfuscator doesn't work anymore) by using some kind of "encryption" that uses a password the the code gets from the server after the first round of de-obfuscation and uses a relative path that the browser completes instead of a full path. That would need manual intervention. Build that path in a complicated and non-obvious way and you have a deterrent for the average script-kiddie.
In general: you need something to de-obfuscate the script that is not in the script itself.
But beware: it does only answer your question, that is, it makes it impossible to de-obfuscate by simple c&p into one of those public de-obfuscators and not more. See Ira's answer for the more complex stuff.
Please be aware of the reasons to obfuscate code:
hide malicious intent/content
hide stolen code
hide bad code
a pointy haired boss/investor
other (I know what that is, but I am too polite to say)
Now, what do the people think, if they see your obfuscated code? That your investor insisted on it to give you money to write that little browser game everyone loves so much?
JavaScript is interpreted from clear text by your browser. If a browser can do it, so can you. It's the nature of the beast. There are plenty of other programming languages out there that allow you to compile/black box before distribution. If you are hell-bent on protecting your intellectual property, compile the server side data providers that your JavaScript uses.
No JavaScript obfuscation or protection can say it makes it impossible to reverse a piece of code. That being said there are tools that offer a very simple obfuscation that is easy to reverse and others that actually turn your JavaScript into something that is extremely hard and unfeasible to reverse. The most advanced product I know that actually protects your code is Jscrambler. They have the strongest obfuscation techniques and they add code locks and anti-debugging features that turn the process of retrieving your code into complete hell. I've used it to protect my apps and it works, it's worth checking out IMO
I'm working on a Software as a Service site which we will use backbone primarily, but what I'm noticing is most of the logic for the application is lying on backbone. While we use ruby mostly as just a session controller and a bridge to the database it seems. So our site is very susceptible to being copied. (just a matter of copying the js files...)
I know this may be a dumb question but, is it anyway I can avoid this or would have a client side heavy application like this be bad for this type of application?
I'm not sure on how I can secure this site structure at this point.
Sure it can be copied, that is a risk you take with JavaScript. You have the same problem with your markup and your CSS as well, but I'd say you rarely see someone stealing it anyway. There is probably more to your service than just your code (your design, your copy, your business model, your customer support). Even if they did copy your code, you will probably be able to deliver a better service than them anyway, since your are devoted to your product, which they clearly are not.
Another way of looking at the whole thing is to see it as the beauty of web development. You are free to open up the code of any web page and learn from it.
If you still want to "protect" your code, your best shot is probably to use something like UglifyJS or similar, to minimize and obfuscate your code. Sure the "thief" could then use a prettyfier to get indentation etc. back, but the code will still be obscure and practically impossible to maintain. So it would probably not be worth the job of stealing it in the long run.
Protecting your javascript libraries is hard because you let your clients download them. The best thing you can do to protect them is to run a obfuscation and minification tool on them before you deploy them into production.
I'm considering to try out an idea, mostly for fun, and my question is if this is reasonable and if there are any libraries or frameworks that could make this experiment a little easier.
So, the idea: Basically it is to write a new UI for a website I've developed, but doing it with client-side code only. I can read/write data using ajax, since my existing website has an API that allows me to perform all kinds of queries. This allows me to use JavaScript for the whole thing and theoretically put all of the code in a single file.
Obviously there are limitations to circumvent; bookmarking, page refreshing, the back-button etc. But these limitations are what makes it interesting, right? :) I'm not so worried about search engine indexing, since one has to be logged in to use the site anyway.
The site itself is not overly complex, but it is not simple either. There are four different levels of users, multiple languages and quite a lot of data to be presented.
Is this a bad idea? If so, why would you advice against it? And do you know of any JavaScript frameworks or libraries that could make this easier? (And no, I'm not looking for an abstraction like Google Web Toolkit; I would like something purely JavaScript)
One of my coworkers did this. A nice feature of this concept was that you don't have a ton of POSTS whenever the user 'changes pages', since they are actually not ever changing a page until they submit their data for the final time. He did this for product registration software, which was nice. Our servers were only taking a hit when the user initially requested the page, and then when they submitted it.
The major, MAJOR downside to this concept is that most web developers are not expecting this. My coworker (and you) have a cool idea - but unless it is well implemented, with comments, 100% valid HTML, and a host of other good design principles in place, it can be confusing since most web developers have basically never seen this done before. His site was a nightmare to work with, as he did not actually know what engineering web software meant, and it was all slapped together. My organization never pursued this (potentially useful) idea because his implementation was so poor.
So, when I looked at this idea here were the trade-off I came up with:
1.) You cannot require any server-side interaction during intermediate pages.
2.) The initial page loading is longer, but there are no intermediate page requests (better optimization).
3.) This is vastly different than anything anyone usually does, which means you need to be especially careful with documentation.
4.) This design concept facilitates totally stand-alone web software to be easily deployed without the web.
5.) You might be increasing complexity for avoiding page loads, but maybe not. I'm not sure.
All together, I think it just depends on what you want to accomplish. My coworker really just wanted to see if he could do it, which he could. However, how he did it was really pretty bad, to the point where everyone else connected his poor implementation with a poor idea it was fairly sad.
Mostly, I think if you follow good web design practices this wouldn't be too bad of a thing to pursue. What are your goals though?
I'm sorry I couldn't directly answer all of your questions. I hope my experiences are still helpful in answering if I think it is a bad idea or not.
-Brian J. Stinar-
SproutCore is one of the best out there when it comes to really being built from the ground up for single-page, potentially complex applications. Unlike some others like GWT or Cappucino, SproutCore is really about using JavaScript directly. They aren't the only one though. You might also want to look at JavaScriptMVC and qooxdoo.
Personally, I've built an extremely large and complex single-page application using JavaScript. It is currently around 100,000 lines (including comments/whitespace). To give a sense of scale, jQuery is around 6,000. To reach that size, I built my own framework, build tools etc. and it is extremely maintainable. You're asking if it can work, and I can tell you that it does, but you do need a little infrastructure if you're looking at doing something big. (BTW, there's a lot of lazy loading - its not 100,000 lines all at once!)
I also wouldn't really recommend this method for anything other than a web application. As pointed out, its still difficult for SEO, and may be strange for some users.
You're looking for ExtJS
I did my site like that in jquery, backed by spring mvc 3. For me it works fine and you have a lightweight, flashy feling while on it.
If you have time to dig in javascript seriously... it is really good exercise.
The only thing I really need to add is SEO, since spiders see only intial html. If it is important for you, i think backend static files and sitemap would be good enough solution.
I found something very light-weight that provides the basic needed functionally without forcing you into a whole framework. It's called Sammy and is inspired by Ruby's Sinatra: http://code.quirkey.com/sammy/
Very recommended!
Alternative: code the same in Java. Take a look to ItsNat
Think in JavaScript but code the same in Java in server with the same DOM APIs, in server is way easier to manage your application without custom client/bridges because UI and data are together.
Regarding SEO, bookmarking etc in single page there are solutions, take a look to the Single Page Interface Manifesto
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I need to write a GUI related javascript library. It will give my website a bit of an edge (in terms of functionality I can offer) - up until my competitors play with it long enough to figure out how to write it by themselves (or finally hack the downloaded script). I can accept the fact that it will be emulated over time - thats par for the course (its part of business). I just want to have a few months breathing space where people go "Wow - how the f*** did they do that?" - which gives me a few months of free publicity and some momentum to move onto other things.
To be clear, I am not even concerned about hard core hackers who will still hack the source - thats a losing battle not worth fighting (and in any case I accept that my code is not "so precious"). However, what I cannot bear, is the idea of effectively, simply handing over all the hard work that would have gone into the library to my competitors, by using plain javascript that anyone can download and use. If someone is going to use what I have worked on, then I sure as hell don't want to simply hand it over to them - I want them to work hard at decoding it. If they can decode it, they deserve to have the code (they'll most likely find out they could have written better code themselves - they just didn't have the business sense to put all the [plain vanilla] components in that particular order) - So, I'm not claiming that no one could have written this (which would be a preposterous claim in any case) - but rather, what I am saying is that no one (up to now) has made the functionality I am talking about, available to this particular industry - and I (thinking as an entrepreneur rather than a geek/coder), want to milk it for all its worth, while it lasts i.e until it (inevitably) gets hacked.
It is an established fact that not one website in the industry I am "attacking" has this functionality, so the value of such a library is undeniable and is not up for discussion (i.e. thats not what I'm asking here).
What I am seeking to find out are the pros and cons of obfuscating a javascript library, so that I can come to a final decision.
Two of my biggest concerns are debugging, and subtle errors that may be introduced by the obfuscator.
I would like to know:
How can I manage those risks (being able to debug faulty code, ensuring/minimizing against obfuscation errors)
Are there any good quality industry standard obfuscators you can recommend (preferably something you use yourself).
What are your experiences of using obfuscated code in a production environment?
If they can decode it, they deserve to have the code (they'll most likely find out they could have written better code themselves - they just didn't have the business sense to put all the [plain vanilla] components in that particular order).
So really, you're trying to solve a business issue with technical measures.
Anybody worth his salt as a Javascript programmer should be able to recreate whatever you do pretty easily by just looking at the product itself, no code needed. It's not like you're inventing some new magical thing never seen before, you're just putting pieces together in a new way, as you admit yourself. It's just Javascript.
Even if you obfuscate the script, it'll still run as-is, competitors could just take it and run with it. A few customizations shouldn't be too hard even with obfuscated code.
In your niche business, you'll probably notice pretty quickly if somebody "stole" your script. If that happens, it's a legal issue. If your competitors want to be in the clear legally, they'll have to rewrite the script from scratch anyway, which will automatically buy you some time.
If your competitors are not technically able to copy your product without outright stealing the code, it won't make a difference whether the code is in the clear or obfuscated.
While you can go down the long, perilous road of obfuscators, you generally don't see them used on real, production applications for the simple reason that they don't really do much. You'll notice that Google apps, which is really a whole heap of proprietary and very valuable JavaScript when you get down to it, is only really minimized and not obfuscated, though the way minimizers work now, they are as good as obfuscated. You really need to know what you're doing to extract the meaning from them, but the determined ones will succeed.
The other problem is that obfuscated code must work, and if it works, people can just rip it wholesale, not understanding much of it, and use it as they see fit in that form. Sure, they can't modify it directly, but it isn't hard to layer on some patches that re-implement parts they don't like without having to get in too deep. That is simply the nature of JavaScript.
The reason Google and the like aren't suffering from a rash of cut-and-paste competitors is because the JavaScript is only part of the package. In order to have any degree of control over how and where these things are used, a large component needs to be server-based. The good news is you can leverage things like Node.js to make it fairly easy to split client and server code without having to re-implement parts in a completely different language.
What you might want to investigate is not so much obfuscating, but splitting up your application into parts that can be loaded on-demand from some kind of service, and as these parts can be highly inter-dependent and mostly non-functional without this core server, you can have a larger degree of control over when and where this library is used.
You can see elements of this in how Google is moving to a meta-library which simply serves as a loader for their other libraries. This is a step towards unifying the load calls for Google Apps, Google AdSense, Google Maps, Google Adwords and so forth.
If you wanted to be a little clever, you can be like Google Maps and add a poison pill your JavaScript libraries as they are served dynamically so that they only operate in a particular subdomain. This requires generating them on an as-needed basis, and while it can always be removed with sufficient expertise, it prevents wholesale copy-paste usage of your JavaScript files. To insert a clever call that validates document.href is not hard, and to find all these instances in an aggressively minimized file would be especially infuriating and probably not worth the effort.
Javascript obfuscation facts:
No one can offer a 100% crack free javascript obfuscation. This means that with time and knowledge every obfuscation can be "undone".
Minify != obfuscation: When you minify your objective is: reduce code size. Minified code looks completly different and its much more complex to read (hint:jsbeautifier.com). Obfucation has a completly different objective: to protect the code. The transformations used try to protect Obfuscated code from debugging and eavesdropping. Obfuscation can even produce a even bigger version of the original code which is completely contrary to the objectives of minification.
Obfuscation != encryption - This one is obvious but its common mistake people make.
Obfuscation should make debugging much much harder, its one of it objectives. So if it is done correctly you can expect to loose a lot of time trying to debug obfuscated code.That said, if it is done correctly the introduction of new errors is a rare issue and you can easily find if it is an obfuscation error by temporarily replacing the code with non obfuscated code.
Obfuscation is NOT a waste of time - Its a tool. If used correctly you can make others waste lots of time ;)
Javascript obfuscation fiction: ( I will skip this section ;) )
Answer to Q2 - Sugested obfuscation tools:
For an extensive list of javascript obfuscator: malwareguru.org. My personal choice is jscrambler.com.
Answer to Q3 - experiences of using obfuscated code
To date no new bugs introduced by obfuscation
Much better client retention. They must come to the source to get the source;)
Occasional false positives reported by some anti-virus tools. Can be tested before deploying any new code using a tool like Virustotal.com
Standard answer to obfuscation questions: Is using an obfuscator enough to secure my JavaScript code?
IMO, it's a waste of time. If the competitors can understand your code in the clear (assuming it's anything over a few thousand lines...), they should have no trouble deobfuscating it.
How can I manage those risks (being
able to debug faulty code,
ensuring/minimizing against
obfuscation errors)
Obfuscation will cause more bugs, you can manage them by spending the time to debug them. It's up to the person who wrote the obfuscation (be it you or someone else), ultimately it will just waste lots of time.
What are your experiences of using
obfuscated code in a production
environment?
Being completely bypassed by side channel attacks, replay attacks, etc.
Bugs.
Google's Closure Complier obfuscates your code after you finish writing it. That is, write your code, run it through the compiler, and publish the optimized (and obfuscated) js.
You do need to be careful if your using external js that interfaces with the lib though because it changes the names of your objects so you can't tell what is what.
Automatic full-code obfuscation is so far only available in the Closure Compiler's Advanced mode.
Code compiled with Closure Advanced mode is almost impossible to reverse-engineer, even passing through a beautifier, as the entire code base (includinhg the library) is obfuscated. It is also 25% small on average.
JavaScript code that is merely minified (YUI Compressor, Uglify etc.) is easy to reverse-engineer after passing through a beautifier.
If you use a JavaScript library, consider Dojo Toolkit which is compatible (after minor modifications) with the Closure Compiler's Advanced mode compilation.
http://dojo-toolkit.33424.n3.nabble.com/file/n2636749/Using_the_Dojo_Toolkit_with_the_Closure_Compiler.pdf?by-user=t
You could adopt an open-source business model and license your scripts with the GPL or Creative Commons BY-NC-ND or similar
While obfuscation in general is a bad thing, IMHO, with Javascript, the story is a little different. The idea is not to obfuscate the Javascript itself but to produce shorter code length (bandwidth is expensive, and that first-time users may just be pissed off waiting for your Javascript to load the first time). Initially called minification (with programs such as minify), it has evolved quite a bit and now a full compiler is available, such as YUI compiler and Google Closure Compiler. Such compiler performs static checking (which is a good thing, but only if you follow the rules of the compiler), minification (replace that long variable name with 'ab' for example), and many other optimization techniques. At the end, what you got is the best of both worlds, coding in non-compiled code, and deploying compiled (, minified, and obfuscated) code. Unfortunately, you would of course need to test it more extensively as well.
The truth is obfuscator or not, any programmer worth his salt could reproduce whatever it is you did in about as much time as it took you. If they stole what you did you could sue them. So bottom line from the business point of view is that you have, from the moment you publish, roughly the same amount of time it took you to implement your design until a competitor catches up. Period. That's all the head start you get. The rest is you innovating faster than your competitors and marketing it at least as well as they do.
Write your web site in flash, or better yet in Silverlight. This will give your company unmatched GUI, that your competitors will be salivating about. But compiled nature of flash/dotnet will not allow them easily pick into your code. It's a win/win situation for you ;)
I was wondering how I can detect code plagiarism with Javascript. I want to test assignment submissions for homework I'm going to hand out.
I've looked at using MOSS, but—from what I've heard—it's pretty poor for anything other than C. Unfortunately, I can't test it yet because I don't have submissions.
How can I go about detecting code plagiarism with JavaScript?
They claim that MOSS works on Javascript. Why don't you just try it. Write a Javascript file, then modify it, like a cheater would modify somebody elses code and feed it to MOSS to see what it says?
I build Clone detection tools, that find similar blocks of code across files.
See CloneDR overview
and example reports. CloneDR works for a wide variety of languages, and uses
the langauge structure to makethe clone detection efficient and effective.
As per yar's comment pasting chunks of javascript into Google will work pretty well - but is stopping them cheating realistic?
Could you split the task into two parts, the first part allowing them to 'cheat' if they want to but tell them that there will be a second part of the task in class. Then have the class do exactly the same task in supervised class time.
If everyone has 'cheated' first time that's one thing. But if anyone is unable to redo their homework in class then they a) cheated (which is bad enough) and b) learnt nothing (which is worse).
Using the internet to 'research' is always going to happen - but its the ones who forget their 'research' that are cheating both you and themselves.
I wouldn't go out of my way to try and run through a plagiarism checker.
Code is code and bad code is bad code. People who can't code (those who are more likely to copy/paste code**) generally don't have good code. Difficulties (and questionable approaches around them) will be easily detectable if you even take a few seconds to check the source. Something just won't match up and it should smack you in the face.
**I would argue that adapted code isn't plagiarized unless it violates the authors distribution intent (e.g. violates copyright or license) and would encourage the students to simply document which existing resources, if any, they used as a base and/or incorporated as well as to encourage them to understand and adapt the code to fit their needs (and to make it better, so much code out there is soup). I do this all the time for "real programming work". Of course, it's not my curriculum :-)