Javascript mailto string loses 'encodeURI' encoding

Javascript mailto string loses 'encodeURI' encoding - javascript

Trying to create a simple 'mailto' function using javascript. I just need to be able to send some links (like: See this article bla bla).
Some of the links I need to send include spaces, danish chars. So I've been using the
encodeURI() function.
The problem arises when I try to mail the link (sample code below)
var _encodedPath = encodeURI(path);
var _tempString = "mailto:someemail#somewhere.dk?subject=Shared%20from%20some%20page&body=" + _encodedPath;
If I output the _tempString to the console I get the correct encoded string. However when using the same string in 'mailto' the string loses it's encoding and returns to the way it was before.
Any clue as to why this is?
Thanks in advance :)

The link is decoded when you click it - that's normal. Since you have an http link within a mailto link, it should be encoded twice.
Email clients do their best to make things that look like links clickable. They typically decide where the link ends in a somewhat arbitrary and unpredictable manner.
In email, the best way to keep a link contiguous is to enclose it in angle-brackets like this:
<http://www.example.com/url with spaces>
But this isn't foolproof. Email is fragile and you can't control the content well enough with a mailto link. It might be better to try to reduce the complexity of the url - perhaps by providing or utilizing a url-shortener service. Any url longer than 74 or so characters is likely to be mangled by some email clients.

You should use encodeURIComponent instead of encodeURI.
More information here.

this site helped me solving any troubles with mailto links:
http://www.1ngo.de/web/formular.html
may be it's not the nicest way, but it always works with every browser i know. And it also has very cool algorithm implemented to format the content so that everything should be alright. Just try it and play around a little with code by quoting out parts of the code and you will understand very fast what exactly happens there and how to modify it for your wishes. Althoug it's a little late I hope this one helps anybody checking this question.
althoug it's in german, you just need to copy the code shown there and run it and experiment with it.

Related

XSS vulnerability despite encoding

I had a security audit on a website on which I've been working. The audit has shown that one of my parameter, called backurl, wasn't protected enough in my jsp file. This url is put inside the href of a button, button that allows the user to get back to the previous page.
So what I did was to protect it using the owasp library, with the function "forHTMLAttribute". It gives something like this:
<a class="float_left button" href="${e:forHtmlAttribute(param.backUrl)}">Retour</a>
However, a second audit showed that by replacing the value of the parameter by:
javascript:eval(document%5b%27location%27%5d%5b%27hash%27%5d.substring(1))#alert(1234)
The javascript code would be executed and the alert would show, when clicking on the button only.
They said that something that I could do was to hardcode the hostname value in front of the url, but I don't really get how this would help solve the problem. I feel like no matter what I do, solving a XSS vulnerability will just create a new one.
Could someone help me on this? To understand what's happening and where to look at least.
Thanks a lot.

As #Pointy said, the problem is more fundamental here. Accepting untrusted input and rendering that as a link verbatim (or as text), is a security issue, even if you escape the heck out of it. For example, if you allow login?msg=Password+incorrect and that's how you deal with relaying messages - you have a problem. I can make a site with: Click for cute kittens! and you see why this is a problem.
The real solution is to not accept potentially tainted information, period (nevermind escaping!), if that information ends up being rendered without the surrounding context that it is tainted. For example, take twitter. The username and the tweet are potentially tainted. This is no big deal because users of twitter get clued in due to the very design of the website that you're looking at what some rando wrote. If someone tweets 'If you transfer 5 bucks to Jack Dorsey's account at 12345678, he'll give you a twitter blue logo!', there's a reasonable expectation that it's on the users of the site to not be morons and trust it.
Your website's "click here to go to the previous page" is not like that. You can't reasonably expect the users of your site to hang over that button, check their browser's status bar, and figure it out.
Hence, the entire principle is wrong. You simply can't do it this way, period.
Your alternatives are threefold:
Instead of letting the 'previous link' property be a URL param, it needs to be in the session. Websites work with sessions, generally. You can store whatever you want in them, serverside (the HTTP handler code either manually takes e.g. the cookie and uses that to look up on-server info for that user by doing a lookup, or runs on a framework that just provides an HttpSession-style object, which works in that exact fashion).
If you really want to bend over backwards, you can include it as a signed blob. This is creative but I really wouldn't go there.
A quick hack: What if you just include Click to go back as a static link in your web page?

Unicode characters cannot be decoded

I use browserless.js (headless Chrome) to fetch the html code of a website, and then use a regular expression to find certain image URLs.
One example is the following:
https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde
There are unicode characters such as \u003d, which should be decoded (in this case to =). The reason is that I want to include these images in a site, and without decoding some of them cannot be displayed (like that one above, just paste the URL; it gives broken-image.webp).
I have tried lots of things, but nothing works.
JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent
Curiously, the regular expression for "\u003d" (i.e. "\\u003d" in js) does not match that string above, but "u003d" does.
This is all very weird, and my current guess is that browserless is responsible for some weird formatting behind the scenes. Namely, when I console log the URL and copy paste it somewhere else, every method mentioned above works for decoding.
I hope that someone can help me on this.

Just to mark this one as answered. Thomas replied:
JSON.parse(`"${url}"`)

Facebook-Like URLS

I have been searching for a method of encoding url just like facebook's. All I have been able to find is these methods:
escape
encodeURI
encodeURIComponent
The goal is to encode a string in latin characters, for example:
¿Cómo estás?
Facebook results in the next url
When I use the 3 functions I talked abour earlier I get nothing similar
escape("¿Cómo estás?"); //"%BFC%F3mo%20est%E1s%3F"
encodeURI("¿Cómo estás?");//"%C2%BFC%C3%B3mo%20est%C3%A1s?"
encodeURIComponent("¿Cómo estás?"); //"%C2%BFC%C3%B3mo%20est%C3%A1s%3F"
I need you to guide me to the solution, this is something im doing more than anything for SEO purposes. Do I have to code a function myself?
Thanks for your time.

So my first guess is you are encoding in UTF-8 where Facebook may be encoding in ISO.
https://en.wikipedia.org/wiki/ISO/IEC_8859-1 vs https://en.wikipedia.org/wiki/UTF-8
As far as SEO goes - I do not see the relation, the example is just a query string - so wouldn't be crawled by a search engine. I may be misunderstanding - hopefully that points you in the right direction though.

After I decided to make my search just encoding my string using encodeURIComponent I discovered that when you window.location.href your encoded string your URL actually looks like facebooks. What a surprise, It looks very different in the console.

Browser strips encoded dot character from url

I have a web app, which allows searching. So when I go to somedomain.com/search/<QUERY> it searches for entities according to <QUERY>. The problem is, when I try to search for . or .. it doesn't work as expected (which is pretty obvious). What surprised me though, is that if I manually enter the url of somedomain.com/search/%2E, the browser (tested Chrome and IE11) converts it somedomain.com/search/ and issues a request without necessary payload.
So far I haven't found anything that would say it's not possible to make this work, so I came here. Right now I have only one option: replacing . and .. to something like __dotPlaceholder__, but this feels like a dirty hack to me.
Any solution (js or non-js) will be welcomed. Any information on why do browsers strip url-encoded dots is also a nice-to-have.

Unfortunately part of RFC3986 defines the URI dot segments to be normalised and stripped out in that case, ie http://example.com/a/./ to become http://example.com/a
see https://www.rfc-editor.org/rfc/rfc3986#page-33 for more information

regex to find url in a text

I have to find the first url in the text with a regular expression:
for example:
I love this website:http://www.youtube.com/music it's fantastic
or
[ es. http://www.youtube.com/music] text

I looked into this issue last year and developed a solution that you may want to look at - See: URL Linkification (HTTP/FTP) This link is a test page for the Javascript solution with many examples of difficult-to-linkify URLs.
My regex solution, written for both PHP and Javascript - is not simple (but neither is the problem as it turns out.) For more information I would recommend also reading:
The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber
The comments following Jeff's blog post are a must read if you want to do this right...
Note that this question gets asked a lot. Maybe do a search next time :)

You can't do this perfectly with a regular expression. You may be interested in this blog post. There is a bit more information on Regex Guru, but even those look very fragile. You will need to have additional checks outside of your regular expression to catch the edge cases.

Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript: https://github.com/ljosa/urlize.js
It actually would not pick up the URL in your example because there is a colon right before the URL. But if we modify the example a little:
urlize("I love this website: http://www.youtube.com/music it's fantastic", true, true)
=> 'I love this website: http://www.youtube.com/music it's fantastic"'
Note the second argument which, if true, inserts rel="nofollow" and the third argument which, if true, quotes characters that have special meaning in HTML.

This might work->
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
Found it somewhere
Will find links ->
http://foo.com/blah_blah/
(Something like http://foo.com/blah_blah)
http://foo.com/blah_blah_(wikipedia)
Hope this works....

i am using this regex : :) ( its translated ABNF )
[a-zA-Z]([a-zA-Z]|[0-9]|\+|\-|\.)*:\/\/((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:)*#)?(\[((([0-9A-Fa-f]{1,4}:){6}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|::([0-9A-Fa-f]{1,4}:){5}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|([0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){4}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){3}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){2}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(([0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|v[0-9A-Fa-f]\.(([a-zA-Z]|[0-9]|-|\.|_|~)|[!$&'\(\)\*\+,;=]|:))\]|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=])*)(:[0-9]*)?(((\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)*)*|\/((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)*)*)?|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)*)*|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|#){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)*)*))?\/?(\?((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?(\#((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?

You can use the following regex expression for extracting any type of url coming in message.
String regex = "(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&/=]*)";

Typescript/Angular
This works for me:
const regExpressionUrl = new RegExp(/(https?:\/\/[^\s]+)/g); //detect URL
Ref: https://www.regextester.com/96249%7CRegular

We Keep Coding

JavaScript is the programming language of the Web.