JavaScript regex to match links with uppercase - javascript

I have a regex that I want to match any link preceded by -
This is my regex:
/-(\s+)?[-a-zA-Z0-9#:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?/
It already matches these links:
- www.demo.com
- http://foo.co.uk/
But it doesn't match these:
- WWW.TELEGRAM.COM
- WWW.c.COM
- t.mE/rrbot
You can check it at this link: http://regexr.com/3gnb1

There are two possible ways to go about it. Your regex currently excludes capital letters in the TLD part, so you'd have to either swap \.[a-z]{2,4} for \.[a-zA-Z]{2,4} or make the whole regex case insensitive. In the latter case, you can also remove A-Z from the character classes, resulting in:
/-(\s+)?[-a-z0-9#:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-z0-9#:%_\+.~#?&//=]*)?/i
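A quick way to check the case-insensitive version is to run it over the sample lines from the question. This is only a sketch: the names linkPattern and samples are made up for the example, and the slashes inside the character classes are escaped (and the duplicate # dropped), which should not change what the pattern matches.

```javascript
// Case-insensitive variant of the pattern from the answer above.
const linkPattern = /-(\s+)?[-a-z0-9#:%_+.~?&\/=]{1,256}\.[a-z]{2,4}\b(\/[-a-z0-9#:%_+.~?&\/=]*)?/i;

// Sample lines taken from the question.
const samples = [
  "- www.demo.com",
  "- http://foo.co.uk/",
  "- WWW.TELEGRAM.COM",
  "- WWW.c.COM",
  "- t.mE/rrbot",
];

// No g flag is used, so test() carries no state between calls.
const results = samples.map((line) => linkPattern.test(line));
console.log(results); // every entry is true
```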

Why are you limiting the TLD to 4 characters? There are many valid TLDs longer than that, such as .finance, .movie, .academy, etc.
You can use my answer from a previous post and make some minor adjustments.
(?(DEFINE)
(?<scheme>[a-z][a-z0-9+.-]*)
(?<userpass>([^:#\/](:[^:#\/])?#))
(?<domain>[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+)
(?<ip>(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))
(?<host>((?&domain)|(?&ip)))
(?<port>(:[\d]{1,5}))
(?<path>([^?;\#\s]*))
(?<query>(\?[^\#;\s]*))
(?<anchor>(\#\S*))
)
(?:^)?-\ +((?:(?&scheme):\/\/)?(?&userpass)?(?&host)(?&port)?\/?(?&path)?(?&query)?(?&anchor)?)(?:$|\s+)
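You can see this regex in use here. This should catch all valid URLs (the scheme appears to be optional in your case, so I've made it optional in the regex). Note that (?(DEFINE)...) and (?&name) are PCRE syntax, which JavaScript's RegExp does not support, so this pattern needs a PCRE-compatible engine.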
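For a JavaScript-only setup, the TLD point can be illustrated on a trimmed-down version of the asker's own pattern. This is a rough sketch, not a replacement for the full grammar above: the names capped and uncapped and the host example.finance are made up for the example, and the only difference between the two patterns is {2,4} versus {2,}.

```javascript
// Trimmed-down link pattern with the original 2-4 character TLD limit.
const capped = /-\s*[-a-z0-9#:%_+.~?&\/=]{1,256}\.[a-z]{2,4}\b/i;
// Same pattern with no upper bound on the TLD length.
const uncapped = /-\s*[-a-z0-9#:%_+.~?&\/=]{1,256}\.[a-z]{2,}\b/i;

// Hypothetical link using a TLD longer than four characters.
const longTld = "- example.finance";

console.log(capped.test(longTld));   // false
console.log(uncapped.test(longTld)); // true
```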

Related

Regex for simple FEN validation

I'm looking to validate a chess FEN string and I'm working on the Regex for it. I'm looking to implement only very simple validation. Here are the rules I'm looking to match with my regex:
Exactly 7 "/" characters
Start and end of the string cannot be "/"
In between the slashes it must be either a number from 1-8 or the letters PNBRQK uppercase or lowercase
Example of a match
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
Examples of non-match
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/
/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/
rnbqkbnr/pppppppp/8/8/8/10/PPPPPPPP/RNBQKBNR
rnbqkbnr/Z/8/8/8/8/PPPPPPPP/RNBQKBNR
Currently, I have been able to implement exactly 7 "/" anywhere in the string with the following regex:
/^(?:[^\/]*\/){7}[^\/]*$/gm
I'm unsure how to implement the rest as RegEx is not my strong suit.
This should do the trick (it passes all your tests):
/^(?:(?:[PNBRQK]+|[1-8])\/){7}(?:[PNBRQK]+|[1-8])$/gim
All you needed was to use positive matching for the characters you're after instead of "not slash". The key addition is the non-capturing group with one or more PNBRQK or a digit from 1-8. The same group is repeated at the end of the expression.
Oh, and I added the i flag for case insensitive matching.
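The claim that it passes the tests can be checked with a few lines of JavaScript. This is a sketch using the examples from the question; the g and m flags are left out because each string is tested on its own, and the names fenPlacement, shouldMatch and shouldFail are made up for the example.

```javascript
// No g flag: with g, test() keeps state in lastIndex between calls.
const fenPlacement = /^(?:(?:[PNBRQK]+|[1-8])\/){7}(?:[PNBRQK]+|[1-8])$/i;

const shouldMatch = ["rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"];
const shouldFail = [
  "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/", // trailing slash
  "/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR/",         // leading and trailing slash
  "rnbqkbnr/pppppppp/8/8/8/10/PPPPPPPP/RNBQKBNR", // 10 is not in 1-8
  "rnbqkbnr/Z/8/8/8/8/PPPPPPPP/RNBQKBNR",         // Z is not a piece letter
];

console.log(shouldMatch.every((s) => fenPlacement.test(s))); // true
console.log(shouldFail.some((s) => fenPlacement.test(s)));   // false
```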
/^([1-8PNBRQK]+\/){7}[1-8PNBRQK]+$/gim
/gim = global, case insensitive, and multiline.
I got the above working on https://regexr.com/ - one of my favorite places for working out regex problems (but I know there are many other good resources online).
Hope this helps.
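One practical difference between the two patterns above shows up once a rank mixes digits and letters, as it does in most real positions. The sketch below compares them on the position after 1. e4; the variable names are made up for the example, and neither pattern checks that a rank adds up to eight squares.

```javascript
// First answer: each rank is EITHER a run of letters OR a single digit.
const strictAlternation = /^(?:(?:[PNBRQK]+|[1-8])\/){7}(?:[PNBRQK]+|[1-8])$/i;
// Second answer: digits and letters share one character class.
const singleClass = /^([1-8PNBRQK]+\/){7}[1-8PNBRQK]+$/i;

// Piece placement after 1. e4: the ranks "4P3" and "PPPP1PPP" mix both kinds.
const afterE4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR";

console.log(strictAlternation.test(afterE4)); // false
console.log(singleClass.test(afterE4));       // true
```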

Adding an additional letter matching group to an existing regex

I have the following regex: (?:\/us)?\/[a-z]{2}[_-][a-z]{2}(?:\/?$|(?=\/))|\/[a-z]{2}(?:\/?$|(?=\/))^([a-z]{2}\/retail)
As you can see, it's not particularly easy on the eyes. You can see it in action here: https://regex101.com/r/4AZwuP/1 (enable substitutions to see the desired result - the removal of matches)
Here are a few entries it's supposed to match:
/us/en_us/retail/en (matches /us/ and /us/en_us/)
/us/en_us/retail (matches /us/ and /en_us/)
/gb/en_gb/retail/en-uk (matches /en_gb and /en-uk)
Note that these are just prefixes, and the full URL might look something like:
/de/de_de/retail/de_de/products/catalog
The goal is to run the regex and delete matches so that this line becomes:
de/retail/products/catalog
The above regex accomplishes this with one exception: in the first example, I need it to match not only /us/en_us but also /en (or /de or /mx; in other words, there's an additional country code there). Unfortunately, it does not.
What I do know for a fact is that if those two characters are present, it'll be one of these two:
.../retail/en
.../retail/en/something/or/other
In either case it's always two characters either alone or followed by a forward slash.
How can I modify the original regex to deal with this annoying edge case?
Bonus: how does the original work?
If a lookbehind is supported you might use:
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b
(?:\/[a-z]{2})? Optionally match / and 2 chars a-z
\/[a-z]{2}[-_][a-z]{2}\b Match / and 2 chars a-z, then either - or _ and 2 chars a-z
| Or
(?<=\/retail)\/[a-z]{2}\b Match / and 2 chars a-z, asserting /retail directly to the left
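As a sketch of how the lookbehind version behaves in JavaScript (lookbehind needs an ES2018-capable engine), the pattern can be passed to replace with an empty replacement. The names localePrefix and stripLocale are made up for the example. Note that the leading country segment is removed as well, so the last result starts with /retail rather than de/retail.

```javascript
const localePrefix = /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b/g;

// Delete every match from the path.
const stripLocale = (path) => path.replace(localePrefix, "");

const strippedShort = stripLocale("/us/en_us/retail/en");
const strippedLong = stripLocale("/us/en_us/retail/en/something/or/other");
const strippedCatalog = stripLocale("/de/de_de/retail/de_de/products/catalog");

console.log(strippedShort);   // "/retail"
console.log(strippedLong);    // "/retail/something/or/other"
console.log(strippedCatalog); // "/retail/products/catalog"
```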
Or use a capture group, and in the callback of replace check if group 1 exists. If it does, use it in the replacement to keep it.
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b
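The capture-group variant could look roughly like this. It is a sketch only: the names localeOrRetail and stripWithCallback are made up, and the callback puts /retail back whenever group 1 took part in the match.

```javascript
const localeOrRetail = /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b/g;

// If group 1 ("retail") matched, keep it; otherwise drop the whole match.
const stripWithCallback = (path) =>
  path.replace(localeOrRetail, (match, retail) => (retail ? "/" + retail : ""));

const keptShort = stripWithCallback("/us/en_us/retail/en");
const keptCatalog = stripWithCallback("/de/de_de/retail/de_de/products/catalog");

console.log(keptShort);   // "/retail"
console.log(keptCatalog); // "/retail/products/catalog"
```

This form avoids lookbehind, so it also runs on engines that predate ES2018.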
I suppose you want to remove the country codes; in that case the leading /gb is a country code as well.
My regex is this: (\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))
Let's break it into two parts:
(\/\w{2}(?=\/|$)) matches / plus two word characters, followed by / or the end of the string
(\/\w{2}(-|_)\w{2}(?=\/|$)) matches / plus two word characters, then - or _, then two more word characters, again followed by / or the end of the string
It matches all the examples in your regex101, but it will fail if your URL contains other two-letter segments, since those match as well.
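The caveat about other two-letter segments is easy to reproduce. In this sketch (the names twoLetterSegments and stripSegments are made up), the /or segment in the second path is removed along with the locale codes.

```javascript
const twoLetterSegments = /(\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))/g;

// Delete every two-letter or xx_yy / xx-yy segment.
const stripSegments = (path) => path.replace(twoLetterSegments, "");

const segmentsOk = stripSegments("/gb/en_gb/retail/en-uk");
const segmentsTooMuch = stripSegments("/us/en_us/retail/en/something/or/other");

console.log(segmentsOk);      // "/retail"
console.log(segmentsTooMuch); // "/retail/something/other" ("/or" is gone too)
```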

Regular Expression to find a pattern and replace just part of it

I want to know how I can use RegEx to find a pattern and replace just a part of it in JavaScript.
Let's say, for example, I want to replace a pattern like -foo, but only if it has a - after it, as in -foo-, and replace just the -foo part.
Can someone please explain in detail how to construct a RegEx that achieves this?
I did not find a detailed explanation of it here, just code with minimal explanation.
You need to use a positive look-ahead (?=-) that will check the existence of - after -foo but will not consume it:
var s = "-foo- -foo";
alert(s.replace(/-foo(?=-)/g, 'REPLACED'));
You can read more about look-aheads (and look-behinds, which the JS regex engine did not support when this was written; they were added in ES2018) at regular-expressions.info.
The main idea is that the text is checked for presence or absence of some patterns defined in the look-around, and based on that either allow or fail the match. They can actually be used efficiently together with anchors, but this is not the case here.
Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions... lookaround actually matches characters, but then gives up the match, returning only the result: match or no match... They do not consume characters in the string, but only assert whether a match is possible or not.
As the first poster said, you need to make use of a lookahead (?=) to check for an additional character or characters. In this situation, the character you need to look for is -, so your pattern would end with a lookahead containing -, i.e. (?=-).
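To see why "does not consume" matters, it may help to compare the lookahead against a pattern that matches the trailing - directly. This is a small sketch; the input string and the variable names are made up for the example.

```javascript
const input = "-foo-foo-";

// Lookahead: the trailing "-" is only checked, so it is still available
// as the leading "-" of the next match.
const withLookahead = input.replace(/-foo(?=-)/g, "X");

// Consuming version: the trailing "-" becomes part of the first match,
// so the second "-foo" no longer has a leading "-" to start from.
const withConsume = input.replace(/-foo-/g, "X-");

console.log(withLookahead); // "XX-"
console.log(withConsume);   // "X-foo-"
```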

Regex to match mostly alphanumeric paths

I tried creating a regular expression to verify path names for a filesystem API I'm writing using GridFS.
My current RegEx ^[A-Za-z0-9\-\[\]()$#_./]*$ fulfills this criterion:
Allow a-z, A-Z, 0-9, -[]()$#_./
However it doesn't meet these additional criteria:
First Character has to be /
There mustn't be any occurrence of multiple / in a row.
Questions:
Can anybody help me fix my RegEx?
Are there any possible issues for using my criteria for path names? (Did I miss anything important?)
Not sure about the path criteria, but regarding the RegExp, pretty simple:
^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$
\/(?!\/) means a slash / not followed by a slash (?!\/). I used it twice, once as the first character, and again as one of the possible matches after the first character.
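A few sample paths show how the lookahead version behaves. This is only a sketch: the paths and the name pathWithLookahead are made up for the example. Note that a lone / and a trailing slash are both accepted.

```javascript
const pathWithLookahead = /^\/(?!\/)([A-Za-z0-9\-\[\]()$#_.]|(\/(?!\/)))*$/;

console.log(pathWithLookahead.test("/"));                // true
console.log(pathWithLookahead.test("/dir/file(1).txt")); // true
console.log(pathWithLookahead.test("/dir/"));            // true
console.log(pathWithLookahead.test("dir/file"));         // false (no leading /)
console.log(pathWithLookahead.test("//dir"));            // false (double slash)
console.log(pathWithLookahead.test("/dir//file"));       // false (double slash)
console.log(pathWithLookahead.test("/dir/my file"));     // false (space not allowed)
```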
Here's how you could address your requirements. To enforce that the first character is /, simply add it after the ^.
^\/[A-Za-z0-9\-\[\]()$#_./]*$
To disallow consecutive slashes, remove / from your character set and think of the set as one character of a path portion. Portions are separated by a single slash, so the final regex would be:
^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$
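The same kind of check works for the segment-based pattern. Again a sketch with made-up sample paths and a made-up name, pathBySegments.

```javascript
// Each repetition is one allowed character, optionally followed by a single "/".
const pathBySegments = /^\/([A-Za-z0-9\-\[\]()$#_.]\/?)*$/;

console.log(pathBySegments.test("/dir/file(1).txt")); // true
console.log(pathBySegments.test("/dir/"));            // true
console.log(pathBySegments.test("/dir//file"));       // false
console.log(pathBySegments.test("//dir"));            // false
console.log(pathBySegments.test("dir/file"));         // false
```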

RegEx string for three letters and two numbers with pre- and post- spaces

Two quick questions:
What would be a RegEx string for three letters and two numbers with space before and after them (i.e. " LET 12 ")?
Would you happen to know any good RegEx resources/tools?
For a good resource, try this website and the program RegexBuddy. You may even be able to figure out the answer to your question yourself using these sites.
To start you off you want something like this:
/^[a-zA-Z]{3}\s+[0-9]{2}$/
But the exact details depend on your requirements. It's probably a better idea that you learn how to use regular expressions yourself and then write the regular expression instead of just copying the answers here. The small details make a big difference. Examples:
What is a "letter"? Just A-Z or also foreign letters? What about lower case?
What is a "number"? Just 0-9 or also foreign numerals? Only integers? Only positive integers? Can there be leading zeros?
Should there be a single space between the letters and numbers? Or any amount of any whitespace? Even none?
Do you want to search for this string in a larger text? Or match a line exactly?
etc..
The answers to these questions will change the regular expression. It would be much faster for you in the long run to learn how to create the regular expression than to completely specify your requirements and wait for other people to reply.
I forgot to mention that there will be a space before and after. How do I include that?
Again you need to consider the questions:
Do you mean just one space or any amount of spaces? Possibly not always a space but only sometimes?
Do you mean literally a space character or any whitespace characters?
My guess is:
/^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/
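How much the "one space or any whitespace" question matters can be seen by comparing that guess with a variant that insists on exactly one literal space. A sketch with made-up names (anyWhitespace, singleSpaces):

```javascript
// The guess above: one or more whitespace characters in each position.
const anyWhitespace = /^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/;
// Stricter variant: exactly one literal space in each position.
const singleSpaces = /^ [a-zA-Z]{3} [0-9]{2} $/;

console.log(anyWhitespace.test(" LET 12 "));     // true
console.log(anyWhitespace.test("  let\t12\n"));  // true (tab and newline count as \s)
console.log(anyWhitespace.test("LET 12"));       // false (no surrounding whitespace)
console.log(singleSpaces.test(" LET 12 "));      // true
console.log(singleSpaces.test("  LET 12 "));     // false (two leading spaces)
```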
/[a-z]{3} [0-9]{2}/i will match 3 letters followed by a space, and then 2 digits. [a-z] is a character class containing the letters a through z, and the {3} means that you want exactly 3 members of that class. The space character matches a literal space (alternatively, you could use \s, which is a "shorthand" character class that matches any whitespace character). The i at the end is a pattern modifier specifying that your pattern is case-insensitive.
If you want the entire string to only be that, you need to anchor it with ^ and $:
/^[a-z]{3} [0-9]{2}$/i
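The effect of the anchors shows up on an input that merely contains the pattern. A short sketch (the names anywhere and wholeString and the sample input are made up):

```javascript
const anywhere = /[a-z]{3} [0-9]{2}/i;
const wholeString = /^[a-z]{3} [0-9]{2}$/i;

// Unanchored: finds a 3-letter, space, 2-digit run inside a longer string.
const found = "ABCD 123".match(anywhere);
console.log(found[0]); // "BCD 12"

// Anchored: the entire string has to have that shape.
console.log(wholeString.test("ABCD 123")); // false
console.log(wholeString.test("let 12"));   // true
```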
Regular expression resources:
http://www.regular-expressions.info - great tutorial with a lot of information
http://rexv.org/ - online regular expression tester that supports a variety of engines.
^([A-Za-z]{3}) ([0-9]{2})$ assuming one space between the letters/numbers, as in your example. This will capture the letters and numbers separately.
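In JavaScript, the two captured parts come back as elements 1 and 2 of the match result. A minimal sketch, with made-up variable names:

```javascript
const capturePattern = /^([A-Za-z]{3}) ([0-9]{2})$/;

// match() returns null when nothing matches, so guard before reading groups.
const captured = "LET 12".match(capturePattern);
const letters = captured ? captured[1] : null;
const digits = captured ? captured[2] : null;

console.log(letters); // "LET"
console.log(digits);  // "12"
```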
I use http://gskinner.com/RegExr/ - it allows you to build a regex and test it with your own text.
As you can probably tell from the wide variety of answers, RegEx is a complex subject with a wide variety of opinions and preferences, and often more than one way of doing things. Here's my preferred solution.
^[a-zA-Z]{3}\s*\d{2}$
I used [a-zA-Z] instead of \w because \w also matches digits and underscores.
The \s* is to allow zero or more spaces.
I try to use character classes wherever possible, which is why I went with \d.
\w{3}\s{1}\d{2}
And I like this site.
EDIT:[a-zA-Z]{3}\s{1}\d{2} - The \w supports numeric characters too.
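The difference between \w and [a-zA-Z], and the effect of \s*, can be checked directly. This is a sketch: the anchors ^ and $ are added here as an assumption so that the whole string is tested, and the variable names are made up.

```javascript
const wordClass = /^\w{3}\s{1}\d{2}$/;
const lettersOnly = /^[a-zA-Z]{3}\s{1}\d{2}$/;
const optionalSpace = /^[a-zA-Z]{3}\s*\d{2}$/;

// \w accepts digits and underscores where letters were intended.
console.log(wordClass.test("a_1 12"));   // true
console.log(lettersOnly.test("a_1 12")); // false

// \s* also accepts no whitespace at all between the two parts.
console.log(optionalSpace.test("LET12")); // true
console.log(lettersOnly.test("LET12"));   // false
```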
Try this regular expression:
[^"\r\n]{3,}
