URL encoding ampersand in href attribute - javascript

I'm having some difficulty narrowing down a problem and am wondering if anyone can help.
We are storing a URI in an element's href attribute. This URI is dynamically created. One of the URI path variables maps to the id of an entity in our database. Recently, we have started to allow ampersands in the id on special occasions. The URI looks something like "/{entityType}/{entityId}/moderation".
When the URL is built, we use the JavaScript function encodeURIComponent, which turns the ampersand into '%26'. Examining the href attribute in a web development tool, the value looks like it is stored correctly. However, when I hover over the link, the URL shown contains a plain ampersand rather than '%26'.
Other ids work fine, including things like single apostrophes, but with an ampersand, it looks like it is getting a page redirection error (I'm guessing the ampersand is making the url invalid). I've tried escaping the percent sign, making the ampersand turn into '%2526', thinking that the %25 portion will be decoded to just the percent sign (making the final result %26).
So far none of my tinkering has worked. Anyone have a suggestion for next steps or what may be going on? Any help appreciated!
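To make the encoding steps described above concrete, here is a minimal sketch. The helper name buildModerationPath and the sample values 'article' and 'salt&pepper' are assumptions for illustration only. It shows what single and double encoding produce; it does not diagnose the redirect itself, which may well happen on the server side.

```javascript
// Hypothetical helper: builds "/{entityType}/{entityId}/moderation",
// encoding each path variable separately.
function buildModerationPath(entityType, entityId) {
  return '/' + encodeURIComponent(entityType) +
         '/' + encodeURIComponent(entityId) + '/moderation';
}

const path = buildModerationPath('article', 'salt&pepper');
console.log(path); // /article/salt%26pepper/moderation

// Decoding the id segment once recovers the original id.
const idSegment = path.split('/')[2];
console.log(decodeURIComponent(idSegment)); // salt&pepper

// Encoding twice yields %2526, which needs two decode passes to get back to "&".
const twice = encodeURIComponent(encodeURIComponent('salt&pepper'));
console.log(twice); // salt%2526pepper
```

If the server decodes the path once and a later step (a router or a redirect) decodes or re-parses it again, comparing the two outputs above against the server logs may help narrow down where the ampersand reappears.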

Related

Extract URL from HTML/Text but if URL only shows partial like "/secondpage.html"?

I'm trying to extract a URL from an HTML snippet in string format.
I've been using regex to retrieve the part between href=" and ". However, I noticed that in some cases href links to pages within the website without containing the root URL. For example, a snippet can be like:
<div class="textcontent" id="desc">
<br>
<a rel="nofollow" href="/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/" class="ajaxLink">link</a><br>
Instead of the more usual:
Google
Where I can just use this regex to narrow down my results:
/href\n*=\n*".*?"/
I looked around StackOverflow, and saw a few posts about this (extracting URLs from html/text), and saw a mention of using an external library like JSoup. This is for a Chrome Extension, so I'm hoping to keep it lightweight (if that might be an issue). (JSoup is a Java library not JS).
Are there any good solutions for this "partial URL" problem? Would it be best to just check and append to the URL if root is missing, or would using external library like JSoup be more advised?
Sticking with your regex approach, you could parse each extracted URL and detect which of the following three forms it takes:
Protocol://FQDN/Document
/DOCUMENT/
DOCUMENT/
The first form is a full absolute URL. The second is an absolute path that omits the protocol and the FQDN, and the third is a path relative to the current document.
For the second and third forms you need the omitted information in order to build a complete URL. Assuming you know the URL the HTML snippet came from, the remaining task is to detect which form each href takes. If you don't know the original URL, you lack the information needed to complete the href.
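In a Chrome extension, the three cases above do not have to be told apart by hand: the built-in URL constructor resolves an href against a base URL. The sketch below is one possible way to do it. The function name resolveHrefs and the page address https://example.com/posts/view.html are assumptions, and the regex uses \s* rather than the \n* from the question so that spaces around the equals sign also match.

```javascript
// pageUrl is an assumption: the address the HTML snippet was fetched from.
function resolveHrefs(html, pageUrl) {
  const hrefPattern = /href\s*=\s*"(.*?)"/g;
  const resolved = [];
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      // Absolute URLs pass through; "/path" and "path" are completed from pageUrl.
      resolved.push(new URL(match[1], pageUrl).href);
    } catch (e) {
      // Skip values that cannot be parsed as a URL at all.
    }
  }
  return resolved;
}

const snippet =
  '<a rel="nofollow" href="/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/" class="ajaxLink">link</a>' +
  '<a href="secondpage.html">next</a>';

console.log(resolveHrefs(snippet, 'https://example.com/posts/view.html'));
// [ 'https://example.com/confirm/url/aHR0cHLy9yYZy50bw%3D%3D/',
//   'https://example.com/posts/secondpage.html' ]
```

This keeps the extension free of external libraries, although a regex will still miss hrefs written with single quotes or without quotes.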

Relative path to documents, passed from href to Javascript function, getting back slashes and period stripped out

I have an aspx page with an asp:GridView control that lists documents that have been uploaded pertaining to a legal case. The first column in the GridView is a series of href tags that, if clicked on, will open the document in the default program associated with that document type. The relative path to the document is stored in SQL Server. I use the DataBinder.Eval method to populate the path, which is then sent to a Javascript function that opens the document. By example, I have one document record whose path is DATA\AR000001\AttachFiles\Case11\2.txt. The markup for the anchor in the grid is as follows:
<ItemTemplate>
Open
</ItemTemplate>
The Javascript function is:
function ViewAttachFile(sFile) {
docs = window.open(sFile);
}
If I view the page source, the relative path looks exactly as I would expect:
Open
Setting a breakpoint at ViewAttachFile, however, I find that the following is being passed to the function. Note that all backslashes and the document name have been stripped:
DATAAR000001AttachFilesCase11.txt (this is from the text visualizer)
I have tried embedding the escape and replace functions around the DataBinder.Eval call, but with no success, as Javascript is not my strong suit, not that I'm sure either of these would be the right solution anyway. I have also tried bypassing the ViewAttachFile function altogether, and coding the link as:
Open
but that yields essentially the same result: the backslashes are stripped out of the URL, the period is converted to %02, and the window that opens returns a 400 error for Bad Request - Invalid URL. I suppose I could double up on the slashes in the path column in the database, but that seems like a bad idea, and I'm hoping for a cleaner resolution.
Any insight is greatly appreciated!
Here is a way to do it. Quick, dirty, and incomplete, but with the basics to give you the gist of the idea.
In the code behind:
protected string changeSlashes(string filepath){
    // Assumed intent of "do work": swap backslashes for forward slashes
    return filepath.Replace('\\', '/');
}
In front-end code:
<%# changeSlashes(DataBinder.Eval(Container.DataItem, "FullAttchPath").ToString()) %>
Since this is new functionality, and I have the luxury to do so, I decided to change the back-slashes to slashes in the database. Seems to work fine. Not sure why I used back-slashes in the first place. Thanks for the feedback!
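For anyone wondering why the backslashes disappeared in the first place: the original markup is not shown, but presumably the path was written straight into a JavaScript string literal, where a backslash starts an escape sequence. The sketch below illustrates that under this assumption. The helper name toForwardSlashes is made up for the example, and String.raw only stands in for the value as stored in the database.

```javascript
// In a JavaScript string literal a backslash starts an escape sequence,
// so "\A" and "\C" collapse to plain "A" and "C".
const fromLiteral = 'DATA\AR000001\AttachFiles\Case11';
console.log(fromLiteral); // DATAAR000001AttachFilesCase11

// String.raw keeps the backslashes, standing in for the value as stored.
const stored = String.raw`DATA\AR000001\AttachFiles\Case11\2.txt`;

// Hypothetical helper: switch to forward slashes before the path reaches markup.
function toForwardSlashes(filePath) {
  return filePath.replace(/\\/g, '/');
}

console.log(toForwardSlashes(stored)); // DATA/AR000001/AttachFiles/Case11/2.txt
```

Forward slashes have no special meaning inside a string literal, which is consistent with the fix of storing the paths with slashes instead.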

Sending HTML from servlet to js application corrupts data in Firefox

I'm sending some HTML back from a Java servlet to an iframe in a JavaScript application. I'm actually just parsing some JSON out of the HTML by wrapping it in a single <div> and reading it with jQuery, but the string that comes back sometimes has added text.
If the text that gets added has a word with enclosing angle brackets, Firefox will automatically close the brackets for me, which I don't want.
For example, if I send this:
<div>{"location":[],"columns":["<case expression>","headers"]}</div>
Firefox (and ONLY Firefox so far, not IE or chrome) will receive it as this:
<div>{"location":[],"columns":["<case expression>","headers"]}</case></div>
which screws up my parsing. I'm sending the text with the Content-Type of text/html, which I think might be causing the issue. I've tried Content-Type of application/json, but it won't write html to the iframe unless I'm using the text/html.
Can someone help me with a solution? I'm willing to try a different method of sending the data if it's not too extensive.
In order to keep the browser from interpreting HTML meta-characters as such, so that your "<" and ">" characters end up as part of the text, you can "escape" them as HTML entities. The "<" character is &lt; and the ">" is &gt;. People generally also quote the ampersand ("&") as &amp; but I think browsers are generally a little smarter about that.
Edit by OP for code solution:
I used StringEscapeUtils.escapeHTML(), which worked perfectly. Thanks!
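The OP escaped on the server in Java. For illustration, the same transformation can be sketched in JavaScript. The function name escapeHtml is an assumption, and it only covers the three characters mentioned in the answer.

```javascript
// Replace HTML meta-characters with entities. The ampersand goes first,
// otherwise the "&" in "&lt;" and "&gt;" would be escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const payload = '{"location":[],"columns":["<case expression>","headers"]}';
console.log('<div>' + escapeHtml(payload) + '</div>');
// <div>{"location":[],"columns":["&lt;case expression&gt;","headers"]}</div>
```

With the payload escaped, the browser no longer sees a <case> element to close, and reading the div's text (for example with jQuery's .text()) should give back the original JSON string.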

how to implement if in javascript?

The javascript below extracts www.google.com from http://mysite.com?url=www.google.com
and writes it as an <a> href link
<script>
var urll = (window.location.search.match(/[?&;]url=([^&;]+)/) || [])[1];
document.write('<a href="' + urll + '">url</a>');
</script>
The problem is that once the url is extracted, the <a> href value becomes http://mysite.com/www.google.com. So the if should say: if the value after ?url= in http://mysite.com?url=www.google.com doesn't start with http://, then add http:// in front of it before it is used as the href value.
In a comment for a previous question someone gave me this
if (link.substr(0, 7) !== 'http://') { link = 'http://' + link; }
but I really don't have a clue on how to implement it because I have never used an if in javascript.
Apart from anything else, you're making yourself susceptible to XSS attacks:
Assume for a moment that the url parameter (which an external site can easily spoof by providing a link to your site) contains the string "><b>BOLD!</b><div class=". Suddenly your page would display some bold text, even though you never used a <b> tag in your site. And that's the most harmless example possible, because the attacker can equally well introduce arbitrary JavaScript into your page (including JS that steals the user's cookie!).
Moral of the story: never blindly trust user input, and don't simply convert it to HTML.
To avoid these kinds of attacks (SQL injection is a very similar attack against server-side code that builds SQL statements), do these two things:
Validate the input to ensure that it's exactly what you expect, and don't accept it if it isn't. In your case that would mean making sure that the url parameter actually represents a valid URL.
Use user data only in "safe" ways that don't introduce the possibility of "re-interpretation" of the input. In your case it means that you must not build your HTML using string concatenation like this. Instead, use document.createElement() to create your a element, set its href attribute to the desired value (sanitized as stated above), and then add the newly created a element to your DOM at the appropriate position.
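The two points above, together with the if from the earlier comment, could be combined roughly as follows. This is only a sketch: the names toSafeHttpUrl and appendLink are made up, validation here just means "parses as an http or https URL", and appending to document.body is an assumption about where the link should go.

```javascript
// Hypothetical helper: returns a normalized http(s) URL string, or null if
// the input cannot be turned into one.
function toSafeHttpUrl(raw) {
  if (typeof raw !== 'string' || raw === '') return null;
  let link = raw;
  // The "if" from the earlier comment: add a scheme when none is present.
  if (!/^https?:\/\//i.test(link)) {
    link = 'http://' + link;
  }
  try {
    const parsed = new URL(link); // throws on anything that is not a valid URL
    const ok = parsed.protocol === 'http:' || parsed.protocol === 'https:';
    return ok ? parsed.href : null;
  } catch (e) {
    return null;
  }
}

// Build the link through the DOM instead of concatenating HTML strings.
function appendLink(doc, parent, rawUrl) {
  const href = toSafeHttpUrl(rawUrl);
  if (href === null) return null;
  const a = doc.createElement('a');
  a.href = href;
  a.textContent = 'url';
  parent.appendChild(a);
  return a;
}

// Browser-only part: read ?url=... and add the link to the page.
if (typeof document !== 'undefined') {
  const raw = new URLSearchParams(window.location.search).get('url');
  appendLink(document, document.body, raw);
}

console.log(toSafeHttpUrl('www.google.com')); // http://www.google.com/
```

Because the value is only ever assigned to the href property, markup inside the parameter is never parsed as HTML.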
It looks like you need https://developer.mozilla.org/en/JavaScript. "if" is the most basic element of any programming language; if you don't understand it, you'll really need to run through a bunch of basics.

Filepaths containing ampersand (&) character

I have an ASP.NET MVC web app which includes the facility for clients to upload/download documents from a folder on the server.
I'm having a problem with people uploading file names containing an ampersand character (possibly other characters too, this is the only one I've discovered so far).
The result is I'm getting javascript redirects looking something like:
window.location.href = 'MyController/DownloadDocument?filename=Dog & Cat.pdf';
which obviously doesn't work.
What's the easiest work around for something like this? Is there any way to escape the ampersand in the query string?
Use encodeURIComponent (which will also fix the problem of the spaces, which aren't allowed in URIs).
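A minimal sketch of that, using the file name from the question. The helper name buildDownloadUrl is an assumption; only the file name is encoded, not the whole URL.

```javascript
// Hypothetical helper: encode only the query-string value.
function buildDownloadUrl(filename) {
  return 'MyController/DownloadDocument?filename=' + encodeURIComponent(filename);
}

const target = buildDownloadUrl('Dog & Cat.pdf');
console.log(target); // MyController/DownloadDocument?filename=Dog%20%26%20Cat.pdf

// In the browser the redirect would then be:
// window.location.href = target;
```

The ampersand becomes %26 and each space becomes %20, so the server sees a single filename parameter instead of a truncated one.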
