Why does gmail use eval?


This question suggests that using eval is a bad practice and many other questions suggest that it is 'evil'.
An answer to the question suggests that using eval() could be helpful in one of these cases:
Evaluate code received from a remote server. (Say you want to make a site that can be remotely controlled by sending JavaScript code to it?)
Evaluate user-written code. Without eval, you can't program, for
example, an online editor/REPL.
Creating functions of arbitrary length dynamically (function.length
is readonly, so the only way is using eval).
Loading a script and returning its value. If your script is, for
example, a self-calling function, and you want to evaluate it and get
its result (eg: my_result = get_script_result("foo.js")), the only
way of programming the function get_script_result is by using eval
inside it.
Re-creating a function in a different closure.
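As a rough illustration of the "functions of arbitrary length" case in the list above, here is a minimal sketch. The helper name makeFunctionWithLength and the generated parameter names are made up for this example, and it uses the Function constructor, which compiles source text at runtime much like eval does.

```javascript
// Hypothetical helper: wrap `impl` in a function that reports `n` parameters.
function makeFunctionWithLength(n, impl) {
  // Generate parameter names a0, a1, ... (the names themselves are arbitrary).
  const params = Array.from({ length: n }, (_, i) => "a" + i).join(", ");
  // The Function constructor compiles a string at runtime, like eval.
  const factory = new Function(
    "impl",
    "return function (" + params + ") { return impl.apply(this, arguments); };"
  );
  return factory(impl);
}

const sum3 = makeFunctionWithLength(3, function () {
  let total = 0;
  for (const x of arguments) total += x;
  return total;
});

console.log(sum3.length); // 3
console.log(sum3(1, 2, 3)); // 6
```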
While looking at the Google Accounts page Source code I've found this:
(function(){eval('var f,g=this,k=void 0,p=Date.now||function(){return+new Date},q=function(a,b,c,d,e){c=a.split("."),d=g,c[0]in d||!d.execScript||d.execScript("var "+c[0]);for(;c.length&&(e=c.shift());) [a lot of code...] q("botguard.bg.prototype.invoke",K.prototype.ha);')})()
I just can't see how this is helpful, as it does not match any of the above cases. A comment there says:
/* Anti-spam. Want to say hello? Contact (base64)Ym90Z3VhcmQtY29udGFjdEBnb29nbGUuY29tCg== */
I can't see how eval would be used as anti-spam. Can somebody tell me why it is used in this specific case?

Mike Hearn (plan99.net) created an anti-bot JavaScript system, and what you see here is part of its anti-reverse-engineering machinery (randomly generated, encrypted code). He mentions it in this mailing-list post: https://moderncrypto.org/mail-archive/messaging/2014/000780.html
[messaging] Modern anti-spam and E2E crypto
Mike Hearn
Fri Sep 5 08:07:30 PDT 2014
There's a significant amount of magic involved in preventing bulk signups.
As an example, I created a system that randomly generates encrypted
JavaScripts that are designed to resist reverse engineering attempts. These
programs know how to detect automated signup scripts and entirely wiped
them out
http://webcache.googleusercontent.com/search?q=cache:v6Iza2JzJCwJ:www.hackforums.net/archive/index.php/thread-2198360.html+&cd=8&hl=en&ct=clnk&gl=ch
You can search for information about the system by its base64 contact code, "Ym90Z3VhcmQtY29udGFjdEBnb29nbGUuY29tCg", or by "botguard-contact".
The post http://webcache.googleusercontent.com/search?q=cache:v6Iza2JzJCwJ:www.hackforums.net/archive/index.php/thread-2198360.html+&cd=8&hl=en&ct=clnk&gl=ch says:
The reason for this is being the new protection google introduced a couple of weeks/months ago.
Let me show you a part of the new Botguard ( as google calls it )
Code:
/* Anti-spam. Want to say hello? Contact (base64) Ym90Z3VhcmQtY29udGFjdEBnb29nbGUuY29tCg== */
You will have to crack the algorithm of this javascript, to be able to create VALID tokens that allow you to register a new account.
Google still allows you to create accounts without these tokens, and you wanna know why?
Its because they wait a couple of weeks, follow up the trace you and your stupid bot leave behind and than make a banwave.
ALL accounts you've sold, all accounts your customers created will be banned.
Your software might be able to be able to still create accounts after the banwave, but whats the use?
So BotGuard is an optional security measure. Its token can be computed correctly in a real browser, but not in some (or most) of the JavaScript engines used by bots. You can bypass it by not submitting a correct token, but the created account will be marked as a bot account and disabled soon afterwards (and linked accounts will be terminated too).
There are also several epic threads on GitHub:
https://github.com/assaf/zombie/issues/336
Why does Zombie produce an improper output compared to the more basic contextify version in the following example?
Output varies depending on when document.bg is initialized to new botguard.bg(), because the botguard script mixes in a timestamp salt when encoding.
mikehearn commented on May 21, 2012
Hi there,
I work for Google on signup and login security.
Please do not attempt to automate the Google signup form. This is not a good idea and you are analyzing a system that is specifically designed to stop you.
There are no legitimate use cases for automating this form. If you do so and we detect you, the accounts you create with it will be immediately terminated. Accounts associated with the IPs you use (ie, your personal accounts) may also be terminated.
If you believe you have a legitimate use case, you may be best off exploring other alternatives.
In the https://github.com/jonatkins/ingress-intel-total-conversion/issues/864 thread there are some details:
a contains heavily obfuscated code that starts with this comment:
The code contains a lot of generic stuff: useragent sniffing (yay, Internet Explorer), object type detection, code for listening to mouse/kb events... So it's looks like some generic library. After that there's a lot of cryptic stuff that makes absolutely no sense. The interesting bit is that it calls something labeled as "botguard.bg.prototype.invoke".
Evidently this must be google's botguard. From what I know, It collects data about user behavior on the page and its browser and avaluates it against other know data, this way it can detect anomaly usage and detect bots (kinda like clienBlob in ingress client). My guess would be it's detecting what kind of actions it takes the user to send requests (clicks, map events would be the most sensible)
So Google uses "evil" eval to fight evil users, who are unable to emulate the evaluated code fast enough or correctly enough.

eval() is dangerous when used on untrusted input. When used on a hardcoded string, that's not generally the case.
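A minimal sketch of that distinction, with made-up values: the hardcoded string is fully controlled by the page author, while the "untrusted" strings stand in for anything that arrives from outside and is therefore parsed as data rather than executed.

```javascript
// A hardcoded string: the author controls every character, so eval runs
// exactly the code that was shipped with the page.
const fixedSource = "6 * 7";
const fixedResult = eval(fixedSource);

// Untrusted input (stands in for a request body): parse it as data instead.
const untrusted = '{"score": 10}';
const parsed = JSON.parse(untrusted);

// JSON.parse refuses anything that is not JSON, including executable code.
let rejected = false;
try {
  JSON.parse("globalThis.hacked = true");
} catch (err) {
  rejected = err instanceof SyntaxError;
}

console.log(fixedResult, parsed.score, rejected); // 42 10 true
```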

Related

PHP - Filtering user query to prevent all attacks [closed]

A user submits a search query to my site.
I then take this query and use it in other places, as well as echo'ing it back out to the page.
Right now I'm using htmlspecialchars(); to filter it.
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
<?php
$query = $_GET["query"];
$query = htmlspecialchars($query);
?>

Right now I'm using htmlspecialchars(); to filter it.
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
Covering all your bases depends on a lot of things. The most straightforward (but unsatisfying) answer is probably: do not accept user input.
Even though this may sound easy, it often is not, because it is easily forgotten that any input from a different context has to be considered user input: not only a parameter from the HTTP request or a file upload, but also, for example, a file opened from the file system, records read from a database, or data from some other system or service.
Thinking this through, in context of PHP, this normally also includes the PHP code itself which is often read from disk. Not SQL, just PHP code injection.
So if you really think about the question in such a generally broad way ("etc"), the first thing you need to ensure is that you have a defined process to deploy the application, with checks in place so that the deployed files can't be tampered with (e.g. a read-only file system). And from the operational side: you can create and restore the known state of the program within seconds with little or no side effects.
Only after that should you start to worry about other kinds of user input. For those, to complete the answer, you should only accept what is acceptable.
A user submits a search query to my site.
Accepting a search query is the higher art of user input. It involves free-form text, which tends to become more complex over time and may also include logical operators and instructions. Those may require parsing, which involves even more components that can break and can be exploited by various kinds of attacks (SQL injection is only one of those, albeit still a pretty popular one). So plan ahead for it.
As a first-level mitigation, you can question whether the search is really a feature that is needed. If you have decided that it is, you should outline which problems it generally creates and check whether those problems are common. That question is important because common questions may already have answers, even common answers. So if a problem is common, it is likely that the problem is already solved. Leaning towards an existing solution then only leaves the problem of integrating that solution (that is, understanding the problem, which you always need to do and learn soon enough, one or two decades is normally fine, and then understanding the specific solution as you need to integrate it).
For example:
$query = $_GET["query"];
$query = htmlspecialchars($query);
is making use of variable re-use. This is commonly known to be error prone. Using different variable names that mark the context of its value(s) can help:
$getQuery = $_GET["query"];
$htmlQuery = htmlspecialchars($getQuery);
It is then more visible that $htmlQuery can be used in HTML output to show the search query (or at least was intended for that). Likewise, it becomes obvious that $getQuery, just like $_GET["query"], is not appropriate for HTML output and its string concatenation operations.
In the original example, this would not be equally visible for $query.
It would then perhaps also be made visible that $htmlQuery is not appropriate in contexts other than HTML output either. As your question suggests, you already suspect that neither $getQuery nor $htmlQuery is appropriate for dealing with the risk of an SQL injection, for example.
The example is intentionally verbose on the naming, real-life naming schemes are normally different and wouldn't emphasize the type on the variable name that much but would have a concrete type:
try {
...
$query = new Query($_GET["query"]);
...
<?= htmlspecialchars($query) ?>
If you have read up to this point, it may have become clearer that there can hardly be any one-size-fits-all function that magically prevents all attacks (apart from muting any kind of user input, which is sometimes equal to deleting the software in the first place, which is known to be safe, perhaps most of all for your software's users). If you allow me the joke, maybe this is it:
unset($_GET["query"]); $safeQuery = null; // unset() is a statement and returns nothing, so null is assigned explicitly
which technically works in PHP, but I hope you get the idea, it's not really meant as an answer to your question.
So now as it is hopefully clear that each input needs to be treated in context of input and output to work, it should give some pointers how and where to look for the data-handling that is of need.
Context is a big word here. One guidance is to take a look if you're dealing with user data (user input) in the input phase of a system or in the output phase.
In the input phase what you normally want to do is to sanitize, that is, to verify the data. E.g. is it correctly encoded? Can the actual value or values the data represents (or is intended to represent) be safely decoded? Can any actual value be obtained from that data? If the encoding is already broken, ensure no further processing of that data is done. This is basically error handling and commonly means refusing the input. In the context of a web application this can mean closing the connection on the TCP transport layer (or not sending anything back on UDP); responding with an HTTP status code that denotes an error (with or without further, sparse details in the response body); responding with a more user-friendly hypertext message in the response body; for an HTML form, showing dedicated error messages for the part of the input that was not accepted; or, for an API, reporting the errors in the request input data in a format that the client can consume under the API protocol (the deeper you go, the more complicated).
In the output phase it is a bit different. If, for example, you identified the user input as a search query, passed the query (as a value) to a search service or system, and then get back the results (the reflected user input, which still is user input), all this data needs to be correctly encoded to transport all result values back to the user. So if you output the search query along with the search results, all this data needs to be passed in the expected format. In the context of a web application, the user normally tells you with each request what the preferred encoding of the response should be. Let's say this is normally hypertext encoded as HTML. Then all values need to be output in a form in which they are properly represented in HTML (and are not mistakenly interpreted as HTML; e.g. a search for <marquee> must not cause the whole output to move all over the page, you get the idea).
htmlspecialchars() may do the job here, so might by chance htmlentities(), but which function to use with which parameters highly depends on underlying encoding like HTTP, HTML or character encoding and to which part something belongs in the response (e.g. using htmlspecialchars() on a value that is communicated back with a cookie response header would certainly not lead to intended results).
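To make the output-phase encoding concrete, here is a minimal JavaScript sketch of roughly what htmlspecialchars() does for HTML element content and quoted attributes. The function name escapeHtml and the sample query are assumptions made for this illustration, and the sketch deliberately ignores the other output contexts (headers, cookies, URLs) mentioned above.

```javascript
// Minimal HTML escaper for text placed in element content or quoted attributes.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// The variable names carry the context, like $getQuery / $htmlQuery above.
const rawQuery = "<marquee>cats & dogs</marquee>";
const htmlQuery = escapeHtml(rawQuery);

console.log(htmlQuery);
// &lt;marquee&gt;cats &amp; dogs&lt;/marquee&gt;
```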
In the input phase you assert the input is matching your expectations so that you can safely let pass it along into the application or refuse further processing. Only you can know in detail what these requirements are.
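A minimal sketch of such an input-phase check, written in JavaScript for illustration. The function name validateSearchQuery, the length limit and the rule about control characters are all assumptions; as said above, only you can know what your real requirements are. Note that it refuses input that does not match and never tries to repair it.

```javascript
// Hypothetical limit; only the application owner knows the real one.
const MAX_QUERY_LENGTH = 200;

function validateSearchQuery(input) {
  // Anything other than a string (e.g. an array from ?query[]=x) is refused.
  if (typeof input !== "string") {
    return { ok: false, reason: "query must be a string" };
  }
  if (input.length === 0 || input.length > MAX_QUERY_LENGTH) {
    return { ok: false, reason: "query length out of range" };
  }
  // Refuse control characters instead of stripping them.
  if (/[\u0000-\u001f\u007f]/.test(input)) {
    return { ok: false, reason: "query contains control characters" };
  }
  // The value is passed along unchanged; encoding happens in the output phase.
  return { ok: true, value: input };
}

console.log(validateSearchQuery("cats & dogs").ok); // true
console.log(validateSearchQuery(["cats"]).ok); // false
```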
In the output phase your job is to ensure that all data is properly encoded and formatted for the overall output to work and the user can safely consume it.
In the input phase you should not try to "fix" issues with the incoming data yourself. Instead assume the best and communicate back either that there will be no communication, or what the problem was (note: do not fool yourself: if this involves output of user input, mind what is important for the output phase; there is less risk in just dropping user input and not processing it further, e.g. by not reflecting it back).
This is a bit different for the non-error handling output phase (given the input was acceptable), you err here on the safe side and encode it properly, you may even be fine with filtering the user-data so that it is safe in the output (not as the output which belongs to your overall process, and mind filtering is harder than it looks on first sight).
In short, don't filter input, only let it pass along if it is acceptable (sanitize). Filter input only in/for output if you do not have any other option (it is a fall-back, often gone wrong). Mind that filtering is often much harder and much more error prone incl. opening up to attacks than just refusing the data overall (so there is some truth in the initial joke).
Next to input or output context for the data, there is also the context in use of the values. In your example the search query. How could anyone here on Stackoverflow or any other internet site answer that as it remains completely undefined in your question: A search query. A search query for what? Isn't your question itself in a search for an answer? Taking it as an example, Stackoverflow can take it:
Verify the input is in the form of a question title and its text message that can safely enter their database - it passed that check, which can be verified as your question was published.
With your attempt to enter that query on Stackoverflow, some input validation steps were done prior sending it to the database - while already querying it: Similar questions, is your user valid etc.
As this short example shows, many of the questions for a concrete application (your application, your code) need not only the basic foundation to work (and therefore error handling on the protocol level, standard input and output so to say), but also everything built on top of it to work technically correctly (a database search for existing questions must not be prone to SQL injection, neither on the title nor on the question text, nor must the display of error messages or hints introduce other forms of injection).
To come back to your own example, $htmlQuery is not appropriate if you need to encode it as a Javascript string in a response. To encode a value within Javascript as a string you would certainly use a different function, maybe json_encode($string) instead of htmlspecialchars($string).
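The same idea can be sketched in JavaScript, where JSON.stringify plays the role that json_encode() plays in PHP. The helper name toScriptLiteral is made up for this example, and the extra replacements are an assumption about what an inline script block needs: JSON encoding alone does not stop a "</script>" inside the value from ending the script element.

```javascript
// Encode a value as a JavaScript string literal for an inline <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(String(value))
    // Keep "</script>" and "<!--" from ending or confusing the script element.
    .replace(/</g, "\\u003c")
    // Valid in JSON, but treated as line terminators by older JavaScript engines.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const getQuery = '</script><script>alert("x")</script>';
const jsQuery = toScriptLiteral(getQuery);

// The literal contains no "<" and decodes back to the original value.
console.log("var query = " + jsQuery + ";");
console.log(JSON.parse(jsQuery) === getQuery); // true
```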
And for passing the search query to a search service, it may well be encoded differently, e.g. as XML, JSON or SQL (for which most database drivers offer a nice feature called parameterized queries, or more formally prepared statements, which are of great help in handling input and output context more easily: common problems, common solutions).
prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
You may by now already spot the "error" in this "search query". It's not the part about things you or anyone else can't even think of. Regardless of how much knowledge you have, there will always be known and unknown unknowns, next to the sheer number of mistakes we encode into software every other day. The one "wrong" is perhaps in thinking that there would be a one-size-fits-all solution (even with good intent, as things must have been solved already, and truly most have been, but one still needs to learn about them first, so it's good that you ask). The other, perhaps more important, is to assume that others are solving your problems: your technical problems perhaps, but your own problems you can only solve yourself. And if that sentence sounds hard, take the good side of it: you can solve them. And I write this even though I can only give a lengthy answer to your question.
So take any security advice - including the text-wall I just placed here - on Stackoverflow or elsewhere with a grain of salt. Only your own sharp eyes can decide if they are appropriate to cover your bases.
Older PHP Security Poster (via my blog)

Is it safe to save user created javascript in database?

I'm working on a code playground type of application where a user(web developer/designer) can input HTML, CSS and Javascript and view the result on an iframe. The inputted code will be saved in the database (MySQL) and rendered back again in an iframe on a show_results view/action.
Now the question: Is it safe to save javascripts directly in the database? If not, then where/how should I save it?

The database is not going to be your problem here. It's fairly trivial to use prepared statements to allow all kinds of characters to be stored safely in the database. Using anything other than prepared statements to store user input is insufficient, and essentially never recommended.
But you're talking about allowing arbitrary javascript to be executed, which is always going to be a security problem. As a commenter above implies, you're going to be replicating the complexities of jsfiddle.net without the security experience, the development know-how, or the express wish to keep on patching the vulnerabilities that will keep on cropping up.
Certainly you should be aware that what you're doing will completely compromise any domain that you set it up on, so that essentially that javascript should be only written on a throw-away domain or subdomain that you don't use for any other purpose. Of course, it's going to be trivial in such an environment to simply framebreak and pull a viewer off of the site that hosts the frame as well.
I'm sure this just scratches the surface of the potential abuses that arbitrary javascript execution (aka intentional self cross-site-scripting) will bring with it.
Since you're essentially re-inventing a very dangerous wheel with this concept, why not simply use one of the embedding services that already exist out there? codepen.io, for example, allows you to embed its snippets.

Yes it is safe as an architectural decision iff you are executing the javascript on the client side.
On any website you can use tools such as chrome's "inspect element" to manipulate the html, javascript etc on the client. Your system cannot assume that items on the client are not manipulated. This is why server side validation is still so important.
I completely disagree with kzqai.
If this was the case then fiddler would be in serious trouble.
There are potential problems that can be exposed more easily with what you are doing, but those problems already exist and are just obscure.
IFF you are executing javascript on the server side, this is a very complex decision. I would personally avoid it if possible because the game you are playing is that you are able to catch every possible scenario for trouble vs a bad guy being able to catch the 1 you did not.

It is safe as long as you correctly escape certain characters when inserting the value in the SQL statement. For example, if your Javascript code is:
var foo = 'hello world';
Then you will have to escape the single quotes when building the SQL statement:
INSERT INTO snippets (code) VALUES ('var foo = ''hello world'';')
In the statement above, two single quotes ('') are the way to represent just a single quote in a string enclosed by single quotes.
See the link below for further information on escaping characters:
http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
EDIT
As Stephen P correctly points out, if you use prepared statements on the server side code then the framework under the hood will replace those characters for you.
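A minimal sketch of the prepared-statement shape, in JavaScript for illustration. It only builds the statement and its values and does not talk to a database; the helper name buildInsertSnippet is made up, and the driver call shown in the closing comment is an assumption about a driver that accepts ? placeholders with a separate values array.

```javascript
// Build the SQL text and the values separately. The SQL text never contains
// the user's code, so no manual quote-doubling is needed.
function buildInsertSnippet(code) {
  return {
    sql: "INSERT INTO snippets (code) VALUES (?)",
    values: [code],
  };
}

const statement = buildInsertSnippet("var foo = 'hello world';");
console.log(statement.sql); // INSERT INTO snippets (code) VALUES (?)
console.log(statement.values[0]); // var foo = 'hello world';

// With a driver (assumption, not executed here) this would be passed on as:
//   connection.execute(statement.sql, statement.values);
```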

Equivalent of SPContext.Current.ListItem in Client Object Model (ECMAScript)

I'm integrating an external application to SharePoint 2010 by developing custom ribbon tabs, groups, controls and commands that are made available to editors of a SharePoint 2010 site. The ribbon commands use the dialog framework to open dialogs with custom application pages.
In order to pass a number of query string parameters to the custom applications pages, I'm therefore looking for the equivalent of SPContext.Current.ListItem in the Client Object Model (ECMAScript).
Regarding available tokens (i.e. {ListItemId} or {SelectedItemId}) that can be used in the declarative XML, I am already emitting all tokens, but unfortunately the desired tokens are either not parsed or simply null in the context of a Publishing Page (i.e. http://domain/pages/page.aspx). Thus, none of the tokens that do render are of use in establishing the context of the calling SPListItem in the application page.
Looking at the SP.ClientContext.get_current() provides a lot of information about the current SPSite, SPWeb etc. but nothing about the current SPListItem I'm currently positioned at (again, having the page rendered in the context of a Publishing Page).
What I've come up with so far is the idea of passing in the url of the current page (i.e. document.location.href) and parse that in the application page - however, it feels like I'm going in the wrong direction, and SharePoint surely should be able to provide this information.

I'm not sure this is a great answer, or even fully on-topic, but is basically something I originally intended to blog about - anyway:
It is indeed a pain that the Client OM does not seem to provide a method/property with details of the current SPListItem. However, I'd venture to say that this is a simple concept, but actually has quite wide-ranging implications in SharePoint which aren't apparent until you stop to think about it.
Consider:
Although a redirect exists, a discussion post can be surfaced on 2 or 3 different URLs (e.g. Threaded.aspx/Flat.aspx)
Similarly, a blog post can exist on a couple (Post.aspx/EditPost.aspx, maybe one other)
A list item obviously has DispForm.aspx/EditForm.aspx and (sort of) NewForm.aspx
Also, even for items with an associated SPFile (e.g. document, publishing page), consider that these URLs represent the same item:
http://mydomain/sites/someSite/someLib/Forms/DispForm.aspx?ID=x, http://mydomain/sites/someSite/someLib/Filename.aspx
Also, there could be other content types outside of this set which have a similar deal
In our case, we wanted to 'hang' data off internal and external items (e.g. likes, comments). We thought "well everything in SharePoint has a URL, so that could be a sensible way to identify an item". Big mistake, and I'm still kicking myself for falling into it. It's almost like we need some kind of 'normalizeUrl' method in the API if we wanted to use URLs in this way.
Did you ever notice the PageUrlNormalization class in Microsoft.SharePoint.Utilities? Sounds promising doesn't it? Unfortunately that appears to do something which isn't what I describe above - it doesn't work across the variations of content types etc (but does deal with extended web apps, HTTP/HTTPS etc).
To cut a long story short, we decided the best approach was to make the server emit details which allowed us to identify the current SPListItem when passed back to the server (e.g. in an AJAX request). We hide the 'canonical' list item ID in a JavaScript variable or hidden input field (whatever really), and these are evaluated when back at the server to re-obtain the list item. Not as efficient as obtaining everything from context, but for us it's OK because we only need to resolve when the user clicks something, not on every page load. By canonical, I mean:
SiteID|WebID|ListID|ListItemID
IIRC, one of the key objects has a CanonicalId property (or maybe it's internal), which may help you build such a string.
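As a rough client-side sketch of handling such a string: the two helper names and the placeholder ID values below are made up for illustration, and only the SiteID|WebID|ListID|ListItemID field order comes from the format above. The server would still need to re-validate whatever it gets back.

```javascript
// Field order follows the SiteID|WebID|ListID|ListItemID format above.
const FIELDS = ["siteId", "webId", "listId", "listItemId"];

// Join the four IDs; refuse missing values or values containing the separator.
function buildCanonicalId(ids) {
  return FIELDS.map((name) => {
    const raw = ids[name];
    if (raw === undefined || raw === null) throw new Error("missing " + name);
    const value = String(raw);
    if (value.length === 0 || value.includes("|")) throw new Error("invalid " + name);
    return value;
  }).join("|");
}

// Split the string back into named parts; return null if it is malformed.
function parseCanonicalId(text) {
  const parts = String(text).split("|");
  if (parts.length !== FIELDS.length || parts.some((p) => p.length === 0)) return null;
  const result = {};
  FIELDS.forEach((name, i) => { result[name] = parts[i]; });
  return result;
}

// Placeholder values, not real SharePoint IDs.
const canonical = buildCanonicalId({ siteId: "site-1", webId: "web-1", listId: "list-1", listItemId: 42 });
console.log(canonical); // site-1|web-1|list-1|42
console.log(parseCanonicalId(canonical).listItemId); // 42 (as a string)
```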
So in terms of using the window.location.href, I'd avoid that if you're in vaguely the same situation as us. Suggest considering an approach similar to the one we used, but do remember that there are some locations (e.g. certain forms) where even on the server SPContext.Current.ListItem is null, despite the fact that SPContext.Current.Web (and possibly SPContext.Current.List) are populated.
In summary - IDs are your friend, URLs are not.

Prevent Javascript games tweaking/hacking

Thanks to recent browser enhancements, developing games with canvas and JavaScript has become a good option, but now that the code is easily accessible, just writing
javascript:score=99999
or
javascript:lives=99
will spoil the game's objectives.
I know that with some server-side checking something can be done, but I would prefer to access the server just to store player stats at the end, or even have it client only in most cases.
I wonder if there are at least some best practices to start with.
(using not so obvious variables names is a start, but not enough)
-Added-
Thanks for the replies, I was looking to improve the client-side code, enough to stop "casual hackers", but still leaving the code as clean as possible.
Anyone that really wants to hack it will succeed anyway, even with server-side checks, as I've seen it in many flash games.

I'll repeat what I said in my comment: put all of your source code inside (function(){ }());. Then the variables and functions can't be accessed from outside.
Example:
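(function(){
  var a = 'Foo';
  var b = 42;
  function helloWorld(a, b){
    // declare i with var so the loop counter does not leak into the global scope
    for (var i = 0; i < b; i++) console.log(a);
  }
  helloWorld(a, b);
}()); // the trailing () calls the function immediately; without it nothing runs
//Can't access a, b, or helloWorld using javascript: or the default console of Google Chrome,
//but people can still see them by looking at the source code, and they may be modified by other tools
//(see comments of Tom & user120242)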
I 'learned' this technique when I dug into the Les Paul Google Doodle.
To be more secure (not completely secure, but it'll annoy some hackers), compress and obfuscate your script with a tool such as YUI Compressor or Packer.

One way is to send a record of every move to the server as well, then to verify that those moves would have got that score.
That's easy for games like solitaire or chess, but not really for more complex games.
A simpler version of that is to work out the max points that could be obtained per second, or per move, then to verify that the score isn't higher than your theoretical maximum.
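A minimal sketch of that theoretical-maximum check, as it might run on the server. The constant MAX_POINTS_PER_SECOND and the function name are hypothetical, and the elapsed time is assumed to be measured by the server rather than reported by the client.

```javascript
// Hypothetical game constant: the most points a perfect player could earn per second.
const MAX_POINTS_PER_SECOND = 50;

// Returns true only if the submitted score is possible in the elapsed time.
function isPlausibleScore(score, elapsedSeconds) {
  if (!Number.isFinite(score) || !Number.isFinite(elapsedSeconds)) return false;
  if (score < 0 || elapsedSeconds <= 0) return false;
  return score <= elapsedSeconds * MAX_POINTS_PER_SECOND;
}

console.log(isPlausibleScore(1200, 60)); // true: up to 3000 is possible in 60 s
console.log(isPlausibleScore(99999, 60)); // false
```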
Another way is for each move to be recorded on the server, and to total up the score there. That means there is no send at the end of the game, and that those variables are only for display, not the real score.
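A rough sketch of totalling the score on the server from recorded moves. The move names and the scoring table POINTS_BY_MOVE are made up for illustration; a real game would also check that each move is legal in the current game state.

```javascript
// Hypothetical scoring table; a real game would define its own moves and rules.
const POINTS_BY_MOVE = { coin: 10, enemy: 50, bonus: 200 };

// Recompute the score from the move log instead of trusting a client-side total.
function scoreFromMoves(moves) {
  let total = 0;
  for (const move of moves) {
    // Own-property check so names like "toString" are not accepted as moves.
    if (!Object.prototype.hasOwnProperty.call(POINTS_BY_MOVE, move)) {
      return { valid: false, score: 0 }; // unknown move: refuse the whole log
    }
    total += POINTS_BY_MOVE[move];
  }
  return { valid: true, score: total };
}

console.log(scoreFromMoves(["coin", "enemy", "coin"])); // { valid: true, score: 70 }
console.log(scoreFromMoves(["coin", "score=99999"])); // { valid: false, score: 0 }
```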
Offline games could be starred on the highscore table or something to show they aren't verified.
It's worth pointing out that with any javascript debugger, such as the Inspector in Webkit, Firebug for Firefox or Dragonfly on Opera it's trivial to change the value of variables on the client side, even if your code is in a closure. Any form of obfuscation is pointless, as again it's easy to watch which variable corresponds to the score as the game is played, and any encoding or whatever can simply be read out of the code.

In order of preference:
Send player moves or statistics to server. Prevent strange behavior.
eg: Score too high, invalid actions, actions that cannot be replayed, etc
Prevent strange behavior on client-side. Same as above but not on server. eg: sudden lives changed, moving too fast, etc
Create obfuscated JS output (which you should be doing to reduce JS size anyways) eg: GWT (Java to JS compiler), Google Closure Compiler (ADVANCED_OPTIMIZATIONS will obfuscate more, --output-wrapper (function(){%output%})() to wrap in closure), Yahoo Compressor
Obfuscate variable values eg: Encode strings (xor, substitution, BASE64), don't use normal variable increments
Use a closure to encapsulate variable names: (function(){code here})()
EDIT: I want to make clear that the best solution is still to move calculations to the server, as Rich Bradshaw had said. These things can only do so much, even after you obfuscate the code.
Here's a link that also applies to your Javascript game, and I think is probably the best possible answer to your question: What is the best way to stop people hacking the PHP-based highscore table of a Flash game
The most important idea to get from that link is:
The objective isn't to stop this
attack; it's to make the attack more
expensive than just getting really
good at the game and beating it.

Threat model document

I am in the process of writing a threat model document for one of the applications I am hosting. It's an Apache website which allows users to log in and create their widgets by adding some best-selling products, etc. Can someone give me pointers on how to start with this?
The frontend uses JavaScript + Perl; the backend is C++. We do accept sensitive information from the user, like name, SSN, etc., and we do store a session ID.
What are some of the ways I can identify security holes in my application ? How should I start with this ?
What are some of the areas which should be part of the document ?
what are some of the threats like DoS etc. which I should read about ?

Ask as many people as you can find to think about ways to break the system. It's very likely that they'll think of things you won't. Thinking outside the box is crucial.
A proper threat tree analysis starts with a series of bad outcomes ("sensitive data leaked", "servers hijacked to host malware/send spam/be part of botnet/whatever", "company defrauded by use of stolen credit card details", and you can hopefully think of more) and works backwards: what would be necessary for that to happen? Often you'll find that each bad outcome will have several required enabling events - a causal chain - and by comparing them you can identify weak spots and plan your defence in depth.
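One way to write such a tree down is as nested AND/OR nodes. The sketch below is a made-up example, not an analysis of any real system: the event names are hypothetical, and isReachable simply checks whether a given set of enabling events is enough to reach the bad outcome.

```javascript
// A node is either a leaf event or an AND/OR gate over child nodes.
const tree = {
  outcome: "sensitive data leaked",
  gate: "OR",
  children: [
    {
      gate: "AND",
      children: [
        { event: "SQL injection in search" },
        { event: "database stores SSN unencrypted" },
      ],
    },
    {
      gate: "AND",
      children: [{ event: "session ID stolen" }, { event: "no session expiry" }],
    },
  ],
};

// True if the events that are currently possible are enough to reach the node.
function isReachable(node, possibleEvents) {
  if (node.event !== undefined) return possibleEvents.has(node.event);
  const results = node.children.map((child) => isReachable(child, possibleEvents));
  return node.gate === "AND" ? results.every(Boolean) : results.some(Boolean);
}

// Removing one event from a causal chain breaks that chain.
console.log(isReachable(tree, new Set(["SQL injection in search"]))); // false
console.log(isReachable(tree, new Set(["session ID stolen", "no session expiry"]))); // true
```

Comparing which single events appear in many chains is one way to spot the weak points mentioned above.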

This might not help in building the threat model document, but the OWASP howto might help you in validating the design of the application against the industry best-practice.

I'm no expert in security, but here are my two cents.
1) You can safely regard javascript as completely insecure, as you don't really control its execution.
2) So, the perl part is the first line of defence. Read perldoc perlsec for starter.
Perl code containing eval, backticks, system, and open should be inspected (always use three-argument open, just to be sure).
Also code that lacks strict/warnings should be reviewed and, ideally, rewritten.
Any input that is not checked thoroughly for validity is suspicious. Of course, no unprocessed input (except for user's files that are only stored by the system) should ever reach your back-end.
3) From my recent experience:
we had JSON deserialization based on feeding the input to a regexp and then eval'ing it. I've managed to pass perl code through. FAIL.
we had a chunk of code that was obscure, strictless, lacked any comments, and relied on certain behaviour of external programs that forced us to use outdated ssh version. FAIL. (I admit to failing to rewrite it).
we had open (FD, "$file");. While leading /'s and ..'s were removed from $file, apparently it wasn't checked for the pipe symbol (|). A carefully crafted command could be supplied instead of a file name. FAIL. Always use three-argument open. Same goes for system/exec: only the list (@array) variant is OK, don't rely on stupid ls '$file'.
I would appreciate additions and/or corrections from other people.

For your methodology of Threat Modeling, check out MyAppSecurity's ThreatModeler. Pretty easy to visualize your application from a high level architecture diagram and identify potential threats as well as find remediating controls in terms of secure code and security controls.
Cheers

Disclaimer:
I am neither a security expert, nor a compliance expert, nor a lawyer. Do not take this advice at face value. You should seek expert advice when dealing with confidential information.
Compliance and regulations.
I really cannot sum it up for you, please have a read:
http://en.wikipedia.org/wiki/Information_privacy_law
United States : FISMA and FIPS
( Including but not limited to... )
There are standards and laws
http://en.wikipedia.org/wiki/Federal_Information_Security_Management_Act_of_2002
http://en.wikipedia.org/wiki/Federal_Information_Processing_Standards
FIPS 199: http://csrc.nist.gov/publications/fips/fips199/FIPS-PUB-199-final.pdf
FIPS 200: http://csrc.nist.gov/publications/fips/fips200/FIPS-200-final-march.pdf
Back to the question...
We do accept sensitive information from the user like his name, SSN etc
FIPS 199 and 200 will give you good starting points for evaluating what needs to be done.
What are some of the ways I can identify security holes in my application?
Pay experts for reviewing your strategy.
Pay experts to do pen-testing
Pay hackers for responsible disclosure.
Look at the Common Vulnerabilities and Exposures (CVE) database: https://cve.mitre.org/
Look at the exploit database: http://www.exploit-db.com/
e.g. for perl... https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=perl
How should I start with this ?
Start with this definition of Information Governance (IG):
http://searchcompliance.techtarget.com/definition/information-governance
Assess how the info is used and where.
Write penetration tests for your own software using relevant info from the CVE / exploit database.
What are some of the areas which should be part of the document ?
I find that using a system architecture diagram is helpful in identifying what parts to test independently and isolate; and which boundaries to secure.
If you have looked at the previous section, you should have a good idea about what you could put in the document.
what are some of the threats like DoS etc. which I should read about ?
These are listed in the CVE / Exploit databases.
