Escape exactly what in JavaScript?

As a JavaScript newbie, I've run into a situation where I need more information on escaping characters in a string.
Basically, I know that to escape " I need to replace it with \", but what I don't know is which characters in a particular string need escaping. Is there a list of these "characters to escape", or is it any character that is not a-zA-Z0-9?
In my situation, I don't have control over the content that is displayed on my page. Users enter some text and save it. I then use a web service to extract it from the database, build a JSON array of objects, and iterate over the array when I need to display the entries. Naturally, I have no idea what text the user has entered, and therefore which characters I need to escape. I also use jQuery for this project (in case it has a function I am not aware of that does what I need).
Examples would be appreciated, but I also want to learn the theory and logic behind it.
I hope someone can help.

There's no need to escape everything that's not a-zA-Z0-9; take a look at this example:
http://www.c-point.com/javascript_tutorial/special_characters.htm
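As a rule of thumb, what needs escaping depends on where the string ends up. Inside a double-quoted JavaScript string literal, only a handful of characters need a backslash: the backslash itself, the double quote, and line terminators. The sketch below assumes a double-quoted literal as the target; `quoteForJs` is a hypothetical helper, not part of JavaScript or jQuery, and `JSON.stringify(value)` is a ready-made alternative that also adds the surrounding quotes.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted
// JavaScript string literal ("...").
function quoteForJs(value) {
  return String(value)
    .replace(/\\/g, "\\\\")        // backslash first, so the escapes added below aren't doubled
    .replace(/"/g, '\\"')          // the quote that delimits the literal
    .replace(/\n/g, "\\n")         // line terminators may not appear raw in a literal
    .replace(/\r/g, "\\r")
    .replace(/\u2028/g, "\\u2028") // also treated as line terminators before ES2019
    .replace(/\u2029/g, "\\u2029");
}

// Everything else, including spaces, ', <, & and ü, can stay as it is.
const source = '"' + quoteForJs('He said "hi" \\ bye') + '"';
console.log(source); // "He said \"hi\" \\ bye"
```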
You may also want to check out this site, which has information about escaping strings, especially for URLs:
http://www.the-art-of-web.com/javascript/escape/
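For the situation described in the question, the user text never goes into a JavaScript literal at all: `JSON.parse` (or jQuery's `$.getJSON`) already removes the JSON string escaping, and escaping matters again only when the text is written into the page as HTML. There, the characters to escape are `&`, `<`, `>` and, inside attribute values, the quotes. A minimal sketch with a hypothetical `escapeHtml` helper and made-up sample data:

```javascript
// Hypothetical helper: escape text for insertion into HTML element content
// or a quoted attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")  // & first, otherwise the & in &lt; etc. would be escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Made-up example: user-entered text that arrived in a JSON array.
const rows = JSON.parse('[{"comment":"<b>hi</b> & \\"bye\\""}]');
const html = rows.map(r => "<li>" + escapeHtml(r.comment) + "</li>").join("");

// A value placed into a URL needs URL encoding instead of HTML escaping.
const query = "?q=" + encodeURIComponent("a b&c");
```

With jQuery, `$(element).text(comment)` sidesteps the problem, because it inserts the value as a text node instead of parsing it as markup.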

Related

Breeze JS: Wildcard in Where

Is it possible to use wildcard characters (* or ?) in Breeze queries?
For example, I have a search input field where I want people to be able to enter these characters so they can search for German names that contain umlauts: M*ller for Müller, Mueller or Muller.
I already tried %, since I hoped that the where predicate (contains) would be translated into an SQL LIKE statement.
If nothing helps, my next step would be to split the string and create separate where predicates combined with and/or, but I'm still hoping for a better solution.
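The `*`/`?` semantics the question asks for can be pinned down with a small client-side converter. This sketch does not use any Breeze API; `wildcardToRegExp` is a hypothetical helper that escapes every other regex metacharacter so the rest of the user input is matched literally. It could, for example, filter results that are already loaded on the client.

```javascript
// Hypothetical helper: turn a user pattern with * and ? wildcards into a RegExp.
function wildcardToRegExp(pattern) {
  const body = pattern
    .split("")
    .map(ch => {
      if (ch === "*") return ".*"; // any run of characters, including none
      if (ch === "?") return ".";  // exactly one character
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // everything else is literal
    })
    .join("");
  // Anchored and case-insensitive: the whole name must fit the pattern.
  return new RegExp("^" + body + "$", "i");
}

const names = ["Müller", "Mueller", "Muller", "Meier"];
const hits = names.filter(n => wildcardToRegExp("M*ller").test(n));
```

`*` covers all three spellings, while `?` stands for exactly one character, so "M?ller" matches Müller and Muller but not Mueller. The split from the question's last paragraph, `"M*ller".split("*")`, yields the fragments `["M", "ller"]` that separate where predicates would have to cover.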

Should I worry that using GET in a form element doesn't automatically URL-encode angle brackets?

So I decided to use GET in my form element, point it to my .cshtml page, and found (as expected) that it automatically URL-encodes any passed form values.
I then decided to test whether it encodes angle brackets, and was surprised to find that it did not when the WebMatrix validator threw a server error warning me about a potentially dangerous value being passed.
I said to myself, "Okay, then I guess I'll use Request.Unvalidated["searchText"] instead of Request.QueryString["searchText"]." Then, as any smart developer who uses Request.Unvalidated does, I tried to make sure I was being extra careful, but I honestly don't know much about inserting JavaScript into URLs, so I am not sure whether I should worry about this. I have noticed that it encodes apostrophes, quotation marks, parentheses, and many other JavaScript special characters. (Actually, I'm not even sure whether an angle bracket has special meaning in JavaScript or URLs, but it probably does in at least one of them. I know angle brackets are used for generic types such as List<T> in C#, but in any event you can write script tags with them if you could find a way to get them onto the HTML page, so I guess that's why WebMatrix's validator screams at me when it sees them.)
Should I find another way to submit this form, where I can intercept and encode the user data myself, or is it okay to use Request.Unvalidated in this instance without worry?
Please note, as you have probably already noticed, that my question comes from a WebMatrix C#.NET environment.
Bonus question (if you feel like saving me some time and already know the answer off the top of your head): if I use Request.Unvalidated, will I have to URL-decode the value, or does it do that automatically like Request.QueryString does?
---------------------------UPDATE----------------------------
Since I know I want neither a YSOD nor a custom error page to appear simply because a user included angle brackets in their "searchText", I know I have to use Request.Unvalidated either way, and I know I can encode whatever I want once the value reaches the cshtml page.
So I guess the question really becomes: Should I worry about possible XSS attacks (or any other threat for that matter) inside the URL based on angle brackets alone?
Also, in case this is relevant:
Actually, the value I am using (i.e. "searchText") goes straight to a .cshtml page, where it is run through a (rather complex) SQL query that queries many tables in a database (using JOINs and UNIONs, as well as aliases and function-based calculations) to determine the number of matches found against "searchText" in each applicable field. I then remember the page locations of all of these matches, determine a search results order based on relevance (determined by the type and number of matches found), and finally use C# to write the search results (as links, of course) to a page.
I guess it is also important to note that the database values could easily contain angle brackets. I know it's safe so far (thanks to HTML encoding), but I suppose it may not be necessary to actually "search" against them. I am confused about how to proceed to get maximum security while still meeting functional expectations, and if I choose one way or the other, I may not know I made the wrong decision until it is much too late...
URL and special characters
The URL http://test.com/?param="><script>alert('xss')</script> is "benign" until it is read and...
printed in a template: Hello #param. (potential reflected/persisted XSS)
or used in JavaScript: divContent.innerHTML = '<a href="' + window.location.href + ... (potential DOM XSS)
Otherwise, the browser doesn't evaluate the query string as HTML/script.
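The point that the query string is only data until something prints it can be seen with the payload above. A sketch in JavaScript (it runs in Node or a browser); the map-based `escapeHtml` is a hypothetical helper standing in for whatever output encoding the page uses. In Razor pages, `@value` already HTML-encodes by default.

```javascript
// Hypothetical helper: map-based HTML escaping in a single pass.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
const escapeHtml = s => String(s).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch]);

// The payload from the answer, URL-encoded the way a browser sends it.
const url = new URL("http://test.com/?param=%22%3E%3Cscript%3Ealert('xss')%3C/script%3E");
// searchParams.get() percent-decodes, so the angle brackets come back as-is.
const param = url.searchParams.get("param");

// Printing the decoded value raw into markup is what creates the XSS ...
const unsafe = "Hello " + param;
// ... while encoding it at output time turns it back into inert text.
const safe = "Hello " + escapeHtml(param);
```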
Request.Unvalidated/Request.QueryString
You should use Request.Unvalidated["searchText"] if you are expecting to receive special characters.
For example: <b>User content</b><p>Some text...</p>
If your application works as expected with QueryString["searchText"], you should keep it, since it validates against potential XSS.
Ref: http://msdn.microsoft.com/en-us/library/system.web.httprequest.unvalidated.aspx

How can I remove escaping from a RegExp pattern?

I'm trying to simplify input for a particular regex for my users. A simple example of the regex might be:
\b(C|C\+\+|Java)\b
I'm now giving the user the option of appending another branch to the end of the regex by entering the raw string into an <input type="text"> field. The branch will be interpreted literally, so I need to escape it. I've used https://stackoverflow.com/a/2593661/785663 to get RegExp.quote to do this. I then store the complete regex in a database.
Now, when I retrieve the regex from the database and split it back up and display the branches to the user, I need to remove all the escape characters again. Is there some pre-made function for this or do I need to roll my own?
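If the stored pattern has to stay as it is, unescaping is simple as long as every backslash in a branch was added by the escaper: each `\x` pair then just becomes `x`. A sketch assuming an escaper along the lines of the linked `RegExp.quote`, reproduced here as a hypothetical local `quote` rather than a method on `RegExp`; `branchesOf` is also hypothetical and assumes the stored pattern has the exact shape `\b(...|...)\b`.

```javascript
// Escape regex metacharacters (similar in spirit to the linked RegExp.quote).
function quote(str) {
  return String(str).replace(/[.?*+^$[\]\\(){}|-]/g, "\\$&");
}

// Reverse quote(): every "\x" pair becomes "x". Only valid when all
// backslashes were introduced by quote().
function unquote(str) {
  return String(str).replace(/\\(.)/g, "$1");
}

// Split \b(a|b|c)\b into its branches, treating "\|" as a literal pipe
// rather than a separator. Empty branches are dropped.
function branchesOf(stored) {
  const inner = stored.replace(/^\\b\(/, "").replace(/\)\\b$/, "");
  return inner.match(/(?:\\.|[^|\\])+/g) || [];
}

const display = branchesOf("\\b(C|C\\+\\+|Java)\\b").map(unquote);
```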
Yes, I know I ought to replace this with a list of strings to search for. But this is only a part of a larger (regex-based) picture.
The optimal solution is to change your design: store the unescaped regex, then only escape it when you actually use it. That way you don't have to worry about this messy business of converting it back and forth all the time.
If you use this regex a lot and are worried about the overhead of having to escape it all the time, then store both the unescaped and escaped versions. Update both whenever the user makes a change.
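The design suggested above, sketched with made-up data: the raw branches are what gets stored, and the regex is rebuilt from them whenever it is needed, so nothing ever has to be unescaped. `escapeRegExp` and `buildPattern` are hypothetical helpers.

```javascript
// The user's literal strings, stored as-is (this is what goes into the database).
const branches = ["C", "C++", "Java"];

// Escape regex metacharacters so each branch is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the regex only at the point of use.
function buildPattern(list) {
  return new RegExp("\\b(" + list.map(escapeRegExp).join("|") + ")\\b");
}

// Displaying the branches to the user needs no unescaping at all.
const shown = branches.join(", ");
```

If rebuilding on every use turns out to be too slow, the result of `buildPattern` can be cached and refreshed whenever the list changes, which matches the "store both versions" advice.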
P.S. Allowing user-entered regexes may make your site vulnerable to attack. (Update: in this case it is less likely to be a problem, since you are only allowing literal strings.)

Highlighting HTML parts matching a perl regex - do it in server side perl or client side javascript?

A Perl CGI application provides a search function. The application writes matching snippets to the HTML page. Now I would like to highlight the matches inside the snippets. I could use something like:
s/($searchregex)/<span class="highlight">$1<\/span>/gi
to highlight the matches. This works fine for text-only cases, but sometimes breaks with snippets that themselves contain HTML tags, e.g. links or images with references. In the failing cases, the above replacement destroys the HTML links by inserting the span tag inside the href value.
At the moment I am seeing three possible solutions:
1. Write a regex that does not replace matches inside HTML tags, i.e. inside <>. I don't know how to write a replacement regex for this case. Is there a Perl regex that allows this replacement, and what does it look like?
2. Write a regex that undoes all the wrong replacements made by the above replacement. This would fix the wrong span tags inside the href.
3. Use JavaScript to highlight the matches inside the resulting DOM tree. Possible ways using jQuery are outlined in highlight html with matching text. Even plain JavaScript may be enough: JavaScript's Regular Expression Flavor. There are also special jQuery plugins for highlighting, e.g. highlight regular expressions. I am new to JavaScript, so more advice is appreciated, too.
What is the preferable solution? The best way would be to do it as in 1., but that seems impossible. So the remaining question is: do the work in an ugly way on the server side, or introduce JavaScript to solve the problem in a cleaner way on the client side?
In Perl, with a lookahead after the pattern:
s/($searchregex)(?=[^>]*<)/<span class="highlight">$1<\/span>/gi
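or, shorter (the (?:...) group keeps the lookahead applying to the whole pattern when $searchregex contains alternation with |):
s/(?:$searchregex)(?=[^>]*<)/<span class="highlight">$&<\/span>/gi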
However, you may need to read the whole file into one string, or change the input record separator ($/) to '<'. The regex only matches the pattern when it is followed by a run of characters other than '>' and then a '<', so with the default $/ = "\n" it will not match when the next '<' is on a following line, because that '<' is then in a different record.
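Option 3 in the question proposes doing this in JavaScript instead. A string-based equivalent of the lookahead idea splits the snippet into tags and text and only highlights the text parts. This is a sketch under the assumption that snippets are simple markup (no comments, scripts, or `>` inside attribute values); `highlightOutsideTags` is a hypothetical helper, and a real HTML parser remains the more robust tool.

```javascript
// Hypothetical helper: wrap regex matches in <span class="highlight">,
// but never inside tags. `regex` should have the g flag to catch every match.
function highlightOutsideTags(html, regex) {
  // A capturing split keeps the tags in the array, always at odd indexes.
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(regex, '<span class="highlight">$&</span>'))
    .join("");
}

const snippet = 'see <a href="/john">John</a> now';
const marked = highlightOutsideTags(snippet, /john/gi); // href stays untouched
```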
You could use an HTML parser on the server side, which is the correct tool for the job you are doing.
Or you could do it with JavaScript, as you say, which I prefer myself since it is more versatile and could lead to more interactivity, although you would probably face a similar issue to the one you have now (you would just have moved it to the client side).
It is actually a more complex question than it first appears. Without more information, it is hard to come up with a better solution.
One good solution would be to traverse the DOM tree and match against each text node, but then you would not match text that spans several text nodes. For example, "John the Con Johnson" would not match a search for "John the Con" if those words were in separate nodes. This may or may not be a problem for you, depending on your use case.

Extracting data from JavaScript (Python Scraper)

I'm currently using a combination of urllib2, pyquery, and json to scrape a site, and now I find that I need to extract some data from JavaScript. One option would be to use a JavaScript engine (like V8), but that seems like overkill for what I need. I would use regular expressions, but the expression for this seems way too complex.
JavaScript:
(function(){DOM.appendContent(this, HTML("<html>"));;})
I need to extract the <html>, but I'm not entirely sure how to do so. The <html> itself can contain basically every character under the sun, so [^"] won't work.
Any thoughts?
Why a regex? Can't you just use two substrings, since you know how many characters to trim off the beginning and end?
string[42:-7]
As well as being quicker than a regex, it then doesn't matter if quotes inside <html> are escaped or not.
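The 42 and -7 are simply the lengths of the fixed text around the argument. A sketch (in JavaScript, where `slice` plays the role of Python slicing) that derives them from the wrapper and checks that the wrapper is still there, so a layout change on the site yields `null` instead of a wrong cut; `PREFIX`, `SUFFIX` and `sliceHtmlArg` are hypothetical names.

```javascript
// The fixed wrapper around the HTML argument, as it appears in the question.
const PREFIX = '(function(){DOM.appendContent(this, HTML("';
const SUFFIX = '"));;})';

function sliceHtmlArg(script) {
  if (script.length < PREFIX.length + SUFFIX.length) return null;
  if (!script.startsWith(PREFIX) || !script.endsWith(SUFFIX)) return null; // layout changed
  // Same as Python's string[42:-7]: PREFIX.length is 42, SUFFIX.length is 7.
  return script.slice(PREFIX.length, script.length - SUFFIX.length);
}
```

The result is still the literal's source text, so any `\"` escapes inside it remain and may need decoding afterwards.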
If every occurrence of " inside the HTML code is escaped as \" (it is a JavaScript string after all), you could use
HTML\("((?:\\"|.)*?)"\)
to get the parameter to HTML into the first capturing group.
Note that this regex is not yet escaped for use inside a string literal itself.
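A slightly more robust variant of that pattern consumes escape pairs explicitly, so the first unescaped quote ends the captured text without relying on lazy matching and backtracking. A JavaScript sketch; the pattern itself should carry over to Python's `re` unchanged when written as a raw string. `extractHtmlArg` is a hypothetical name, and decoding the capture with `JSON.parse` is an assumption that only holds when the literal uses JSON-compatible escapes.

```javascript
// (?:\\.|[^"\\])* consumes either an escape pair such as \" or any character
// that is neither a quote nor a backslash, so the first bare " ends the string.
const HTML_ARG = /HTML\("((?:\\.|[^"\\])*)"\)/;

function extractHtmlArg(script) {
  const m = HTML_ARG.exec(script);
  if (!m) return null;
  // Assumption: only JSON-compatible escapes (\" \\ \n \uXXXX ...) and no raw
  // control characters; JavaScript-only escapes such as \' or \x41 would
  // make JSON.parse throw.
  return JSON.parse('"' + m[1] + '"');
}

const script = '(function(){DOM.appendContent(this, HTML("<div class=\\"x\\">a)b</div>"));;})';
const html = extractHtmlArg(script);
```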
