I want my bot to delete the URLs posted by the members.
I'm not sure how to detect that a URL has been posted, since it could start with https, www, or something else altogether.
Any insight is appreciated!
EDIT:
Found similar question to mine.
Regular expression for URL validation (in JavaScript)
I solved this with an npm package called Linkify.
Consider this closed.
I'd recommend looking into regular expressions (regex) for URL Validation.
Here's another SO question that you can use as a starting point: Regular expression for URL validation (in JavaScript)
One recommended approach is using a regex string like this (from an answer in that question):
/^(ftp|http|https):\/\/[^ "]+$/.test(url)
This expression matches a full string, so you'll have to either edit this regex to match any place inside a string, or do some message content manipulation before using that regex.
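As a rough sketch of the first option, the pattern can be left unanchored (no ^ or $) so it matches anywhere in the message. The `www.` alternative and the helper name `containsUrl` are my own additions for illustration, not part of the linked answer:

```javascript
// Unanchored variant of the pattern above: without ^ and $ it can match
// a URL anywhere inside a longer message. The "www." branch is an added
// assumption to catch links posted without a scheme.
const URL_PATTERN = /(?:(?:ftp|https?):\/\/|www\.)[^\s"]+/i;

// Hypothetical helper: true when the text contains something URL-like.
function containsUrl(text) {
  return URL_PATTERN.test(text);
}

console.log(containsUrl("check https://example.com/page out")); // true
console.log(containsUrl("see www.example.com"));                // true
console.log(containsUrl("no links here"));                      // false
```

The pattern has no g flag on purpose, so repeated calls to test() do not carry state between messages.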
I want to correct my JavaScript regex pattern so that it validates the email address scenarios below:
msekhar#yahoo.com
msekhar#cs.aau.edu
ms.sekhar#yahoo.com
ms_sekhar#yahoo.com
msekhar#cs2.aau.edu
msekhar#autobots.ai
msekhar#interior.homeland1.myanmar.mm
msekhar1922#yahoo.com
msekhar#21#autobots.com
\u001\u002#autobots.com
I have tried the following regex pattern but it's not validating all the above scenarios
/^[_a-z0-9]+(\.[_a-z0-9]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/
Could anyone please help me see where I am going wrong?
The following regex should do:
^(([^<>()\[\]\.,;:\s#\"]+(\.[^<>()\[\]\.,;:\s#\"]+)*)|(\".+\"))#(([^<>()[\]\.,;:\s#\"]+\.)+[^<>()[\]\.,;:\s#\"]{2,})$
Test it: https://regex101.com/r/7gH0BR/2
EDIT: I have added all your test cases
I have always used this one but note it doesn't trigger on escaped unicode:
^([\w\d._\-#])+#([\w\d._\-#]+[.][\w\d._\-#]+)+$
You can see how it works here: https://regex101.com/r/caa7b2/4
First off, [_a-z0-9]+ is going to match the username fields for the majority of those test cases. Any further testing of username field content will result in a mismatch. If you write a pattern that expects two .-delimited fields, it'll match when you provide two .-delimited fields and only then, not anything else. Make a mental note of that. I think you probably meant to put the . in the first character set, and omit this part here: (.[_a-z0-9]+)...
As for the domain part of the email address, similar story there... if you're trying to match domains containing two labels (yahoo and com) against a pattern that expects three... it's going to fail because there's one less label, right? There are domain names that only contain one label which you might want to recognise as email addresses, too, like localhost...
You know, there is a point to where you can dig yourself down a very deep rabbit hole trying to parse email addresses, much to the effect of this question and answer sequence. If you're making this complex using regular expressions... I think maybe a better tool is a proper parser generator... otherwise, write the following:
A pattern that matches anything up until an # character
A pattern that matches the # character (this will help you learn how to avoid your .-related error)
A pattern that matches everything (this will help you understand your .-related error)
Combine the three above in the order presented.
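A minimal sketch of those steps in JavaScript might look like this. It assumes the separator written as # above stands for the usual @ sign. The variable names and the ^/$ anchors are my own additions:

```javascript
// Assumption: the separator is the usual "@" sign.
const SEPARATOR = "@";

// Step 1: anything up until the separator (at least one character).
const beforeSeparator = "[^" + SEPARATOR + "]+";
// Step 2: the separator itself.
const separatorPart = SEPARATOR;
// Step 3: everything after it (at least one character).
const afterSeparator = ".+";

// Step 4: combine the three in order. The anchors are an addition so the
// whole string has to match, not just a piece of it.
const looseEmail = new RegExp("^" + beforeSeparator + separatorPart + afterSeparator + "$");

console.log(looseEmail.test("ms.sekhar@cs.aau.edu")); // true
console.log(looseEmail.test("msekhar@localhost"));    // true
console.log(looseEmail.test("msekhar"));              // false
```

Because neither side cares about dots, single-label domains and dotted usernames pass without any special handling.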
I don't work with regex much, but I found this one online: /.+#.+/. It returns true for both joe#joe and joe#joe.com. I want to require a domain extension, so that the match fails without one. I presume this is quite simple, but I just can't figure it out. I've tried /.+#.+.\S/ but that didn't work. Any help would be great, thanks!
This will be used in both PHP and JavaScript. The current one works in both, and the new one will need to as well.
Here is an expression:
/\w+#\w+\.\w{2,10}/
to allow more characters:
/[\w\-\._]+#[\w\-\._]+\.\w{2,10}/
The regex here works for me (from http://www.regextester.com/19 )
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+#[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/i
As does this example of regex inside JavaScript from plnkr here: http://embed.plnkr.co/ZlbA1I2TsDBUmDb9o0gj/
I don't know what the rest of your code looks like, and you may really need this for both PHP and JavaScript, so I will suggest a different approach. I don't agree with the solution in the accepted answer, because it will match email addresses like -#-.aa, .#.-.aa, .#_.aa, etc.
I'd suggest you use PHP's filter_var with the FILTER_VALIDATE_EMAIL filter which
validates e-mail addresses against the syntax in RFC 822, with the exceptions that comments and whitespace folding and dotless domain names are not supported.
and probably an additional AJAX call from JavaScript.
Regular expressions that match all valid email addresses are not trivial at all, and I'm not even sure you can rely on them to match ALL valid email addresses.
For more information please take a look at Validate email address in JavaScript? and Using a regular expression to validate an email address
.+\#.+\.{1}.+
This is a simple regex and will match the required criteria
For a simple regex that will accept any domain extension of one character or longer, try: /.+#.+\..+/
For a 2 character domain extension or longer, try: /.+#.+\..{2,}/
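Here is a quick way to compare the two patterns in JavaScript. This is only a sketch and assumes the separator shown as # above is the usual @ sign. The variable names are made up for the example:

```javascript
// Assumption: "@" is the separator. Both patterns are unanchored, so they
// only require that the text contains something of this shape.
const anyExtension = /.+@.+\..+/;        // extension of 1+ characters
const twoCharExtension = /.+@.+\..{2,}/; // extension of 2+ characters

console.log(anyExtension.test("joe@joe"));       // false: no dot after the separator
console.log(anyExtension.test("joe@joe.com"));   // true
console.log(anyExtension.test("joe@joe.c"));     // true
console.log(twoCharExtension.test("joe@joe.c")); // false: extension too short
```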
I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name up to the part where it says J0000011: (the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}?
Make .* non-greedy by adding '?' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
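To see the difference side by side, here is a small JavaScript sketch. It assumes the log is one long string with the entries separated by single spaces. The variable names are only for illustration:

```javascript
// Assumption: the whole log is a single line, entries separated by spaces.
const jobLog =
  "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM " +
  "J0000010: Project name: E:\\foo.pf " +
  "J0000011: Job name: MBiek Direct Mail Test " +
  "J0000020: Document 1 - Completed successfully";

const greedyPattern = /Project name:\s+(.*)\s+J[0-9]{7}:/;
const lazyPattern = /Project name:\s+(.*?)\s+J[0-9]{7}:/;

// Greedy: runs to the end, then backtracks to the LAST "J0000020:".
console.log(jobLog.match(greedyPattern)[1]);
// E:\foo.pf J0000011: Job name: MBiek Direct Mail Test

// Lazy: stops at the FIRST place where "J" + 7 digits + ":" can follow.
console.log(jobLog.match(lazyPattern)[1]);
// E:\foo.pf
```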
Using non-greedy quantifiers here is probably the best solution, also because it is more efficient than the greedy alternative: Greedy matches generally go as far as they can (here, until the end of the text!) and then trace back character after character to try and match the part coming afterwards.
However, consider using a negated character class instead:
Project name:\s+(\S*)\s+J[0-9]{7}:
\S means “everything except whitespace”, and this is exactly what you want.
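One trade-off worth checking before picking this version: \S* cannot cross a space, so a project path containing spaces will not match at all. A rough JavaScript sketch, with shortened sample strings that I made up:

```javascript
const nonSpacePattern = /Project name:\s+(\S*)\s+J[0-9]{7}:/;
const lazyAnyPattern = /Project name:\s+(.*?)\s+J[0-9]{7}:/;

// Hypothetical sample inputs, trimmed down from the log in the question.
const plainPath = "J0000010: Project name: E:\\foo.pf J0000011: Job name: Test";
const spacedPath = "J0000010: Project name: E:\\my projects\\foo.pf J0000011: Job name: Test";

console.log(plainPath.match(nonSpacePattern)[1]); // E:\foo.pf
console.log(spacedPath.match(nonSpacePattern));   // null: \S* stops at the space
console.log(spacedPath.match(lazyAnyPattern)[1]); // E:\my projects\foo.pf
```

If the project name can never contain whitespace, the \S* form is the stricter and simpler choice.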
Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, at every step where the regex engine matches text into the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", then it matches nothing.
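That last point is easy to check with a tiny JavaScript example (the sample string and names are made up):

```javascript
const sample = "abc";

// Greedy: .* takes everything it can.
const greedyResult = sample.match(/a.*/)[0];
// Lazy with nothing after it: .*? is satisfied by zero characters.
const lazyResult = sample.match(/a.*?/)[0];
// Lazy with something after it: expands only until "c" can match.
const lazyUntilC = sample.match(/a.*?c/)[0];

console.log(greedyResult); // abc
console.log(lazyResult);   // a
console.log(lazyUntilC);   // abc
```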
Here's what I used. s contains your original string. This code is .NET specific, but most flavors of regex will have something similar.
string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
I would also recommend experimenting with regular expressions using "Expresso", a great (and free) utility for regex editing and testing.
One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.
For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.
Available for download at their site:
http://www.ultrapico.com/Expresso.htm
Express download:
http://www.ultrapico.com/ExpressoDownload.htm
(Project name:\s+[A-Z]:(?:\\w+)+.[a-zA-Z]+\s+J[0-9]{7})(?=:)
This will work for you.
Adding (?:\\w+)+.[a-zA-Z]+ will be more restrictive instead of .*
I am trying to find out whether my client-side Javascript regex
/^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
for email validation (I'm using it just to make sure that the email is formatted properly, not as a primary validation method) will work on the server side with PHP.
I am not sure whether I can use the same one even though both languages use Perl-based regex syntax. Thank you for your help.
You should be able to use the same syntax.
You should use
preg_match(String $pattern, String $email[, array $matches])
with your pattern. If $matches is given, it is filled with the first match and its captured groups. It returns 1 if the pattern matches, 0 if it does not, and false on error. For emails in particular, it's usually a better idea to use an existing function written by others, because, for example, "$#us" is a valid email address.
This regex will work nearly identically in both JavaScript and PHP. There are some minuscule differences, for example \s matches the "next line" control character U+0085 in PHP, but not in JavaScript, but they are unlikely to matter in this context (it's unusual anyway to allow newlines and tabs in email addresses - why not use a simple space instead of the generic whitespace shorthand \s).
If you have to do these kinds of comparisons/conversions regularly, I heartily recommend taking a look at RegexBuddy, which can convert regexes between flavors with a single click.
I have a JavaScript function that receives a string.
That string may have URLs in it, with or without http:// or https://.
Example string:
This is an example string with www.cnn.com and http://microsoft.com
My goal is to take that string and inject anchor tags for the URLs so that they are clickable when the result is inserted as HTML into a document.
Is there anything I can use that does this? Some jQuery function?
There is no native jQuery function for this, but a regular expression can do it.
Let me point you to this blog post that will get you started. It contains a regular expression (it may look complicated, but it isn't really) that can detect links and emails in strings, so you'll be able to find them and replace them with links that point to the extracted URLs.
Maybe an even better expression (one I know I have used in the past) has been published by Stack Overflow's own Jeff Atwood. He also describes the whole URL situation in great detail, so you can decide whether, and to what extent, it applies to your problem.
If a regex still is wanted:
/((www|http:\/\/)\S*)/ig
This matches www.cnn.com and http://www.microsoft.com from the following example:
This is an example string with www.cnn.com and http://www.microsoft.com
If you use $1 to replace you'll get:
www.cnn.com
http://www.microsoft.com
Remember that this regex doesn't validate the URLs. It just searches for www or http:// and then takes everything after that up to the next whitespace.
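A minimal sketch of how that regex could be used to inject anchor tags in plain JavaScript follows. The function name `linkify` and the choice to prepend http:// to bare www links are my own assumptions. It does not escape HTML, so it should not be used on untrusted input as-is:

```javascript
// Hypothetical helper built on the regex above. Assumption: bare "www..."
// matches get "http://" prepended so the href is an absolute URL.
function linkify(text) {
  return text.replace(/((www|http:\/\/)\S*)/ig, (match) => {
    const href = /^http:\/\//i.test(match) ? match : "http://" + match;
    return '<a href="' + href + '">' + match + "</a>";
  });
}

const linked = linkify(
  "This is an example string with www.cnn.com and http://www.microsoft.com"
);
console.log(linked);
// This is an example string with <a href="http://www.cnn.com">www.cnn.com</a> and <a href="http://www.microsoft.com">http://www.microsoft.com</a>
```

Note that, like the regex itself, this only handles www and http:// prefixes. An https:// link would need the pattern extended.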