Regex for valid, and invalid email addresses - javascript

I'm looking for some regex to match valid emails (doesn't need to be some whopping RFC-compatible job) and people trying to trick the system with invalid email addresses.
Examples of things I want to catch:
blah#blah.com
blah#blah.org
blah#blah.weirdtld
blat [ AT ] blah.com
blah[at]blah.com
blah#blah[ DOT]com
blah#blah[ dot ].com
etc.
I'm sure someone out there has published a tried-and-tested expression of all known permutations, but if they have, I can't find it, and would love to see it.
I don't care if it catches domains by accident, as they are being stripped anyway.
A real-world example of what this could be used for is eBay. Seller wants to put in their description "Contact me on: bob#example.com for a cheaper price" as they would not have to pay listing fees. I want to catch that address, regardless of how it is written.
I appreciate it's impossible to check everything, and this is not a replacement for human intervention (which is also a part of the validation process already, I'm just trying to make their lives easier).
I have already searched StackOverflow and Google, but unfortunately it's one of those problems which can be difficult to search for. If anyone has a link to a solution I would be very grateful.
Edit: Just to clarify even more. This is NOT to be used to check if an email address is valid or not. This is to be used to stop people entering valid email addresses AND email addresses with common substitutions into a textarea ([at] for #, [dot] for ., (d0t) for ., and so on and so forth).

I guess if even heavy spammers haven't found an easy way to overcome this problem, you won't have much luck here, either.
There are several reasons why it's a suicidal task to design an algorithm for this, but the main one is human creativity vs. machine stupidity.
There are literally infinite ways to camouflage an email address, for example test # domain.com (remove spaces) or test[d0t]again atsign domain[.com] (it took me 2 seconds to think of them, and you can surely decode them without any issue).
Even if you could list every possible alternative (which is an inhuman task anyway), somebody else will design a different scheme to hide their email contact (for example, placing the email address inside an inline image).
Just by comparison, here is the best regex out there to simply detect valid email addresses that covers every RFC822 case.

See: How to Find or Validate an Email Address.
Excerpt:
...there's often a trade-off between what's exact, and what's
practical.
The virtue of my regular expression above is that it matches 99% of
the email addresses in use today. All the email address it matches can
be handled by 99% of all email software out there. If you're looking
for a quick solution, you only need to read the next paragraph. If you
want to know all the trade-offs and get plenty of alternatives to
choose from, read on.
To catch expressions that are likely aliases for an email address, just do a second test for [AT], [ at ], [DOT], etc. For example, here is a RegEx that does just that (the i flag tells the regex engine to ignore case):
/\[\s*(AT|DOT)\s*\]/i
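As a rough sketch of how that second test could be wired up in JavaScript: the helper name `containsObfuscationToken` is hypothetical, and the extra alternatives (round brackets, `d0t`) are assumptions taken from the substitutions mentioned in the question, not an exhaustive list.

```javascript
// Hypothetical helper: flags bracketed "at"/"dot" substitutions in free text.
// Accepts [ ] or ( ) as brackets and allows optional whitespace inside them.
// No "g" flag, so repeated calls to test() keep no state between inputs.
const OBFUSCATION_TOKEN = /[\[(]\s*(at|dot|d0t)\s*[\])]/i;

function containsObfuscationToken(text) {
  return OBFUSCATION_TOKEN.test(text);
}

console.log(containsObfuscationToken("blat [ AT ] blah.com")); // true
console.log(containsObfuscationToken("meet me at noon"));      // false
```

A hit only means the text deserves a closer look; as the first answer points out, anyone can invent a substitution this pattern does not know about.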

Related

Regex: How to validate that the domain part of an email is not all numeric

I am struggling with an issue where I need to verify that the domain part is not all numeric.
For example:
abc#123.com -> Invalid
abc#1abc.com -> valid
Regex:
^(?=(.{1,64}#.{1,255}))((?!.*?[._]{2})[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9}]{1,64}(\.[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9]{0,}(?<!\.)){0,})#((\[(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\])|((?!-)(?=.*[a-zA-Z])[a-zA-Z0-9-]{1,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-)){1,}))$
The regex above needs to be modified rather than replaced, because the other validations it performs work fine. The only thing pending is validating that the domain part is not all numeric.
Updated:
After some research on the above regex,
I am able to segregate emails into different groups. Now group 10 needs a validation that checks whether all characters in the group 10 string are alphanumeric.
Regex:
^(?=(.{1,64}#.{1,255}))((?!.*?[._]{2})[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9}]{1,64}(\.[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9]{0,}(?<!\.)){0,})#((\[(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\])|((?!-)(?=.*[a-zA-Z])((?:.*[a-zA-Z0-9]))[a-zA-Z0-9-]{0,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-)){1,}))$
Explore regex on : https://regex101.com/
TIA
There's no point in doing this - the fact that an email fulfills the requirements as set forth in RFC5322 does not mean it's a valid email address: The only way to know that, is to send an email to it, and have the user reply to it, follow a link inside it, or copy a code/token inside it.
Given that you have to do that anyway, that will also pick up any issues with invalid email addresses. Thus, the correct validation for email is:
Pattern.compile("^.+#.+\\..+$")
(Assuming you don't want single
and this does what you want, which is, filter out obvious incorrect entries, and that's all you need.
If you insist on continuing your mistake, there's always emailregex.com, which has the regex and explains how it works.
NB: Note that you're just wrong. 12345#678.cde can easily be valid - com may not allow you to register a domain that consists solely of digits, but it's not an inherent limitation of the DNS system: Domain parts can be all numbers. The top level domain cannot be, at least, for now, but any other part of it can be. Thus, rejecting foo#123.com is only possible if you program in, on a per-TLD basis, the exact rules. Which also means you need to sign up to the mailing list of every TLD operator to check for any changes they make. You'll be updating that regex every other week. Told you it's a silly thing to want to do!
You can use this to detect the invalid ones.
^\w+([-+.']\w+)*+#\d+\.com$
Just change the .com to whichever suffix you like.
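If the all-numeric rule is kept despite the objections above, one alternative to growing the regex is to test the domain part on its own after splitting it off. The sketch below is in JavaScript for illustration; the function name `isDomainAllNumeric` is hypothetical, and it assumes "all numeric" means every label before the top-level domain consists only of digits, which is just one reading of the question.

```javascript
// Hypothetical helper: true when every label before the TLD is digits only.
// Expects the domain part alone (for example "123.com"), not a full address.
function isDomainAllNumeric(domain) {
  const labels = domain.split(".");
  const beforeTld = labels.slice(0, -1); // drop the final label (the TLD)
  return beforeTld.length > 0 && beforeTld.every((label) => /^\d+$/.test(label));
}

console.log(isDomainAllNumeric("123.com"));  // true  (would be rejected)
console.log(isDomainAllNumeric("1abc.com")); // false (would be accepted)
```

Keeping this rule outside the main pattern also makes it easy to drop later if it turns out to reject real addresses.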

Combining RegEx's

I have two regex's that I am trying to combine. One is email specific and the other checks certain special characters. I have arrived at this solution following much toying:
"^([-0-9a-zA-Z.+_]+#[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}|[\\w\\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð _]){0,80}$"
It does seem to check what I need it to, but for instance the following is still returned valid: abc#foo it does not force a full email address.
Am I using the correct approach or is there a simpler way to structure this RegEx? I'm on a learning curve with regex so all advice appreciated.
Move the multiplier {0,80} inside the parenthesis:
"^([-0-9a-zA-Z.+_]+#[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}|[\\w\\-ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüŸçÇŒœßØøÅåÆæÞþÐð _]{0,80})$"
// here __^^^^^^^
Also [a-zA-Z]{2,4} is really poor to validate TLDs, have a look at IANA.
And me#localhost is a valid email address.

Javascript/image email obfuscation tactics

I'm trying to obfuscate the contact email address on my website. I'm wondering what the best way is to do that.
some javascript way (not sure what is the best one ... http://hivelogic.com/enkoder/ this one looks easy, but not sure if it's strong or not).
having an image called like "90210.png" and it is an image of the email address.
If javascript, what are some good scripts to do this?
Thanks!
Ringo
Write a proper contact form system, so that you never give out your email address unless you choose to reply to a contact.
Alternatively, you can write it backwards, then use JavaScript to flip it around:
var email = "moc.elpmaxe#ydobemos";
document.write(email.split("").reverse().join(""));
Somebody did a study for 1.5 years to test which various methods of email obfuscation worked the most effectively -- Perishable Press created a writeup on it.
It seems like one of the best methods was to ROT-13 an email address then decrypt it using Javascript (of course, not everybody has Javascript enabled, so this isn't a perfect solution).
I'd recommend using a contact form if possible though -- that way, your website still remains accessible to people with Javascript disabled.
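For reference, the ROT-13 approach mentioned above can be sketched in a few lines. The function name `rot13` and the sample address are illustrative only, and the sample follows this thread's way of writing addresses. ROT-13 is its own inverse, so the same function both encodes and decodes.

```javascript
// Rotates each ASCII letter by 13 places and leaves every other character alone.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, (ch) => {
    const base = ch <= "Z" ? 65 : 97; // char code of "A" or "a"
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// The page source would carry only the encoded form.
const encoded = "fbzrobql#rknzcyr.pbz";
console.log(rot13(encoded)); // somebody#example.com
```

In a browser the decoded string would then be written into the page, for example by assigning it to an element's `textContent`, so visitors without JavaScript see only the encoded text.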
The safest approach would be to not publish an email address, and instead provide a contact form.
Next safest would be an image, as you said, or any presentation method that is not plain text.
If you're determined to present text, you just have to make sure it doesn't match a regular expression looking for email addresses. So you could break it up with spaces, replace "#" with "(at)" and/or "." with "(dot)", etc. Of course, those methods will not stop someone who wants to spam you specifically, but neither will any javascript trick.

Suggest a good pattern for validating email with javaScript? [duplicate]

Greetings all,
I want to validate an email with JavaScript,
and I need to use the best pattern for the matching.
Please suggest a good pattern.
In order for you to avoid reinventing the wheel I recommend this quality article on regular-expressions.info.
This ^([0-9a-zA-Z]([-\.\+\_\w]*[0-9a-zA-Z])*#([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$ should match most cases. I might get massively down-voted for not including every single edge case (email addresses can have all kinds of crazy combinations) but I don't think I'm exaggerating in saying this will match 99.99% of email addresses.
This is the standard validation regular expression: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html for RFC 822 ;)
You probably want something more simple (taken from http://www.regular-expressions.info/email.html): /^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$/i for JavaScript
What you might want to be wary of, if I remember correctly, is the fact that some email servers don't conform to RFC822, so being very strict on the validation might exclude some 'valid' email addresses. Depending on the level of validation that you need, it may be possible to just check that the email address has the correct basic format - something like one or more words separated by periods, followed by an # symbol, followed by two or more words separated by periods.
Having said this, you may also want to consider why you are validating the email address in the first place.
If you simply want to make sure that the user didn't type it incorrectly, then ask for the email address and a confirmation of the email address, then compare the two to decide whether the address is valid or not. (This is the strategy used by quite a lot of websites)
If you want to know whether the email address is real or not, as part of a registration process, then the registration could be made into a two-step process, with a confirmation email being sent to the address that the user supplies in the first step, and that email containing a link to the second step of the process.
I may be making wild assumptions about your needs, but I may just trigger the appropriate thought processes.

Best practices for email address validation (including the + in gmail addresses)

I know there are a lot of questions on here about email validation and specific RegEx's. I'd like to know what the best practices are for validating emails with respect to having the username+anythingelse#gmail.com trick (details here). My current RegExp for JavaScript validation is as follows, but it doesn't support the extra + in the handle:
/^([a-zA-Z0-9_.-])+#(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/
Are there any other services that support the extra +? Should I allow a + in the address or should I alter the RegEx to only allow it for an email with gmail.com or googlemail.com as the domain? If so, what would be the altered RegEx?
UPDATE:
Thanks to everyone for pointing out that + is valid per the spec. I didn't know that and now do for the future. For those of you saying that it's bad to even use a RegEx to validate it, my reason is completely based on a creative design I'm building to. Our client's design places a green check or a red X next to the email address input on blur. That icon indicates whether or not it's a valid email address, so I must use some JS to validate it.
+ is a valid character in an email address. It doesn't matter if the domain isn't gmail.com or googlemail.com
Regexes aren't actually a very good way of validating emails, but if you just want to modify your regex to handle the plus, change it to the following:
/^([a-zA-Z0-9_.+-])+#(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/
As an example of how this regex doesn't validate against the spec: The email ..#-.com is valid according to it.
If you need to validate emails via regexp, then read the standard or at least this article.
The standard suggests to use this regexp:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
If that doesn't scare you, it should :)
I would tend to go with something along the lines of /.+#.+\..+/ to check for simple mistakes. Then I would send an email to the address to verify that it actually exists, since most typos will still result in syntactically valid email addresses.
The specs allow for some really crazy ugly email addresses. I'm often very annoyed by websites even complaining about perfectly normal, valid email addresses, so please, try not to reject valid email addresses. It's better to accept some illegal addresses than to reject legal ones.
Like others have suggested, I'd go with using a simple regexp like /.+#.+/ and then sending a verification email. If it's important enough to validate, it's important enough to verify, because a legal email address can still belong to someone other than your visitor. Or contain an unintended but fatal typo.
*Edit: removed the dot from the domain part of the regex, because a#to is still a valid email address. So even my super simplified validation rejected valid addresses. Is there any downside at all to just accepting everything that contains an # with something in front and behind it?
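A minimal sketch of that permissive check, for the green-check / red-X use case from the question. It assumes the `#` in the patterns above stands for the usual at-sign, and the name `looksLikeEmail` is hypothetical. It only catches obvious slips; a verification email is still what confirms the address.

```javascript
// Permissive format check: something, an at-sign, then something.
// Deliberately accepts addresses such as "a@to" that stricter patterns reject.
const PERMISSIVE_EMAIL = /.+@.+/;

function looksLikeEmail(input) {
  return PERMISSIVE_EMAIL.test(input.trim());
}

console.log(looksLikeEmail("a@to"));     // true
console.log(looksLikeEmail("no-at.com")); // false
```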
A very good article about this subject: I Knew How To Validate An Email Address Until I Read The RFC
