I have a text:
Wheels – F/R_ Schwalbe TABLE TOP/Schwalbe Black Jack 26x2.2
And regex to parse wheels size from that string:
/.*Wheels.*(\d*)x/
But it does not work. Besides, when I remove the asterisk from \d* in the regex, I get the number 6 as the group match.
You need to make your .* before the digits lazy instead of greedy:
/.*Wheels.*?(\d*)x/
The .* will greedily consume everything up to the x, leaving nothing for the following \d*. Since * can validly match zero characters, an empty match for \d* is not an incorrect result.
By adding a ? to make a lazy .*? expression, it will try to match as few characters as possible, allowing the following \d* to match all the digits before the x.
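To see the difference, you can run the three variants against the sample string. This is only a minimal JavaScript sketch; the variable names are mine and not part of the question.

```javascript
// Sample string from the question.
const text = 'Wheels – F/R_ Schwalbe TABLE TOP/Schwalbe Black Jack 26x2.2';

// Greedy: .* runs to the end, then backs off only until "x" can match,
// so \d* is left with nothing to capture.
const greedy = text.match(/.*Wheels.*(\d*)x/)[1];

// Without the asterisk, \d must match exactly one digit: the "6".
const singleDigit = text.match(/.*Wheels.*(\d)x/)[1];

// Lazy: .*? grows one character at a time, so \d* can take both digits.
const lazy = text.match(/.*Wheels.*?(\d*)x/)[1];

console.log(JSON.stringify(greedy), singleDigit, lazy); // "" 6 26
```

The empty capture in the greedy case is still a successful match, which is why the original regex appears to "not work" rather than failing outright.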
You need to make your regex non-greedy, because .* will consume your digits and \d* is allowed to match zero digits:
.*Wheels.*?(\d*)x
.*? means "match any character, as few times as possible", which stops .* from consuming your digits.
Follow this Demo for an example.
Alternatively, if there are no digits between Wheels and the value you want, you can make it more efficient with the following regex:
.*Wheels[^\d]*(\d*)x
where [^\d]* matches any run of characters that are not digits.
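Here is a small JavaScript sketch of the negated-class variant. The second input string is a made-up example (not from the question) to show the stated precondition: if a digit appears between Wheels and the size, this pattern no longer matches.

```javascript
const wheelSize = /.*Wheels[^\d]*(\d*)x/;

// No digits between "Wheels" and the size: the capture is the full number.
const ok = 'Wheels – F/R_ Schwalbe TABLE TOP/Schwalbe Black Jack 26x2.2'.match(wheelSize);
console.log(ok[1]); // 26

// Hypothetical input with an earlier digit: [^\d]* cannot skip over the "2"
// in "2nd", so the pattern finds no match at all.
const blocked = 'Wheels 2nd set 26x2.2'.match(wheelSize);
console.log(blocked); // null
```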
Using regular expression, I want to select only the words which:
are alphanumeric
do not contain only numbers
do not contain only letters
have unique digits (1 or more)
I am not really good with regex, but so far I have tried [^\d\s]*(\d+)(?!.*\1), which takes me nowhere close to the desired output :(
Here are the input strings:
I would like abc123 to match but not 123.
ab12s should also match
Only number-words like 1234 should not match
Words containing same numbers like ab22s should not match
234 should not match
hel1lo2haha3hoho4
hel1lo2haha3hoho3
Expected Matches:
abc123
ab12s
hel1lo2haha3hoho4
You can use
\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b
https://regex101.com/r/TimjdW/3
Anchor the start and end of the pattern at word boundaries with \b, then:
(?=\d*[a-z]) - Lookahead for an alphabetical character somewhere in the word
(?=[a-z]*\d) - Lookahead for a digit somewhere in the word
(?:[a-z]|(\d)(?!\w*\1))+ Repeatedly match either:
[a-z] - Any alphabetical character, or
(\d)(?!\w*\1) - A digit which does not occur again in the same word
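As a quick check, here is a JavaScript sketch that runs this pattern over the words from the question. The g and i flags are my assumption (the pattern above is shown without flags), and the word list is just the question's sample input split into words.

```javascript
const words = ['abc123', '123', 'ab12s', '1234', 'ab22s', '234',
               'hel1lo2haha3hoho4', 'hel1lo2haha3hoho3'];

// g: collect every match; i: treat upper-case letters like lower-case ones.
const uniqueDigitWord = /\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b/gi;

const matches = words.join(' ').match(uniqueDigitWord);
console.log(matches); // [ 'abc123', 'ab12s', 'hel1lo2haha3hoho4' ]
```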
Here is a slightly shorter and faster regex. It is faster because it does not evaluate a negative lookahead for each character:
/\b(?=[a-z]*\d)(?=\d*[a-z])(?!\w*(\d)\w*\1)[a-z\d]+\b/ig
RegEx Details:
\b: Word boundary
(?=[a-z]*\d): Make sure we have at least a digit
(?=\d*[a-z]): Make sure we have at least a letter
(?!\w*(\d)\w*\1): Make sure digits are not repeated anywhere in the word
[a-z\d]+: Match 1+ alphanumericals
\b: Word boundary
You could assert all the conditions using one negative lookahead:
\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b
See live demo here
The important part is to start the match at \b and immediately rule out the disqualifying conditions:
[a-z]+\b - Only alphabetic
\d+\b - Only numeric
\w*(\d)\w*\1 - Has a repeating digit
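To see each rejected condition in isolation, here is a minimal JavaScript sketch. The i flag and the accepts helper are my additions for illustration; they are not part of the pattern above.

```javascript
// Non-global, so the same regex object can be reused safely across calls.
const candidate = /\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b/i;

// Hypothetical helper: true only when the whole word is matched.
const accepts = (word) => {
  const m = word.match(candidate);
  return m !== null && m[0] === word;
};

console.log(accepts('hello')); // false: only letters
console.log(accepts('1234'));  // false: only digits
console.log(accepts('ab22s')); // false: the digit 2 repeats
console.log(accepts('ab12s')); // true
```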
You can use this
\b(?!\w*(\d)\w*\1)(?=(?:[a-z]+\d+)|(?:\d+[a-z]+))[a-z0-9]+\b
\b - Word boundary.
(?!\w*(\d)\w*\1) - Condition to check unique digits.
(?=(?:[a-z]+\d+)|(?:\d+[a-z]+)) - Condition to check alphanumeric words.
[a-z0-9]+ - Matches a to z and 0 to 9
I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a JavaScript-based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part matches one or more digits, a colon, and one or more digits. (?:(?!\d+:\d)[\s\S])* then matches any character, zero or more times, as long as that character does not start a sequence of one or more digits followed by a colon and a digit. See this regex demo.
As the tempered greedy token is a resource-consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now the tempered greedy token (?:(?!\d+:\d)[\s\S])* is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed by 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
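As a sanity check, here is a JavaScript sketch that runs both patterns on the sample text from the question and compares the results. The tidy helper, which collapses line breaks and trims the ends, is my own addition to get the single-line output the question expects; the raw matches still contain the newlines.

```javascript
const input = '1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå';

const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolled = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Hypothetical helper: collapse whitespace runs (including line breaks) and trim.
const tidy = (s) => s.replace(/\s+/g, ' ').trim();

const fromTempered = input.match(tempered).map(tidy);
const fromUnrolled = input.match(unrolled).map(tidy);

console.log(fromTempered);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
console.log(JSON.stringify(fromTempered) === JSON.stringify(fromUnrolled)); // true
```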
You can include ñ and Ñ or leave them out, but this should work:
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include line breaks, you can add \n or \r to the set:
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try! Also tested on https://regex101.com/
For more characters:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$
Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be found in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.
Not quite. ^ and $ are actually "anchors": they mean "start" and "end". It's a little more complicated than that, but for now you can consider them to mean the start and end of a line; look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning: when it is the first character inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings, and that the definition in your head actually applies only to character class matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-space characters", so it'll match lots of strings! I think that perhaps you don't realize that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that each match the corresponding bits of the string.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
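To make the difference concrete, here is a short JavaScript sketch that contrasts the pattern from the question with the single negated character class. The test strings other than 'James', 'James1' and 'James Smith25' are examples I made up.

```javascript
// The attempt from the question: \D+ happily consumes the space,
// and [^\s]+ happily consumes the digits.
const attempt = /^\D+[^\s]+$/;
console.log(attempt.test('James1'));        // true
console.log(attempt.test('James Smith25')); // true

// One class, one quantifier: every character must pass all the exclusions.
const strict = /^[^\d\s!#£$%^&*()+=]+$/;
console.log(strict.test('James'));       // true
console.log(strict.test('James1'));      // false
console.log(strict.test('James Smith')); // false
console.log(strict.test('Jame$'));       // false
```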
A useful website for exploring regexes is http://regexr.com/3b9h7, which I've set up with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!
Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that contain no digits, no whitespace, and none of the symbols you do not allow.
Mind that to match a literal -, ] or [ inside a character class, you either need to escape them or place them at the start or end of the class. To play it safe, escape them.
Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
so the regex matches any string that
1. has a beginning,
2. contains 0 or more characters that are not whitespace, digits, or any of the symbols included in the character class `[...]` (in this example !#£$%^&*()+=),
3. has an ending.
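One consequence of using * rather than + is worth checking: the empty string is accepted. A minimal JavaScript sketch, with variable names of my own choosing:

```javascript
const allowEmpty = /^[^\s\d!#£$%^&*()+=]*$/; // 0 or more allowed characters
const requireOne = /^[^\s\d!#£$%^&*()+=]+$/; // 1 or more allowed characters

console.log(allowEmpty.test('')); // true
console.log(requireOne.test('')); // false
console.log(allowEmpty.test('James'), requireOne.test('James')); // true true
```

Which quantifier is right depends on whether empty input should count as valid.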
NOTE:
If the symbols you don't want also include a hyphen (-), don't put it between other characters, because it is a metacharacter inside a character class. Put it last in the character class.
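Here is a small JavaScript sketch of what goes wrong when the hyphen lands in the middle. The forbidden set (#, - and z) is a made-up example chosen to make the accidental range obvious.

```javascript
// Intended to forbid "#", "-" and "z", but "#-z" is read as a range
// (code points 35 to 122), which also covers every ASCII letter and digit.
const asRange = /^[^#-z]+$/;
console.log(asRange.test('James')); // false

// Hyphen last, or escaped: it is treated as a literal character.
const hyphenLast = /^[^#z-]+$/;
const hyphenEscaped = /^[^#\-z]+$/;
console.log(hyphenLast.test('James'), hyphenEscaped.test('James'));   // true true
console.log(hyphenLast.test('Ja-mes'), hyphenEscaped.test('Ja-mes')); // false false
```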
I am trying to get a regular expression to work but am stumped. What I want is to do the inverse of this:
/(\w)\1{5,}/
This regex does the exact opposite of what I'm trying to do. I would like to get everything but a string that has 6 repeating numbers i.e. 111111 or 999999.
Is there a way to use a negative look-around or something with this regex?
You can use this regex:
/^(?!.*?(\w)\1{5}).*$/gm
(?!.*?(\w)\1{5}) is a negative lookahead that will fail the match if there are 6 consecutive identical word characters in the input.
I'd rather go with the \d shorthand class for digits since \w also allows letters and an underscore.
^(?!.*(\d)\1{5}).*$
Regex explanation:
^ - Start of string/line anchor
(?!.*(\d)\1{5}) - The negative lookahead checking if after an optional number of characters (.*) we have a digit ((\d)) that is immediately followed with 5 identical digits (\1{5}).
.* - Match 0 or more characters up to the
$ - End of string/line.
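The following JavaScript sketch compares the \w and \d variants on a few inputs. The inputs other than 111111 and 999999 are my own examples, and the flags are dropped because each string is tested on its own.

```javascript
const noSixSameWordChars = /^(?!.*?(\w)\1{5}).*$/;
const noSixSameDigits = /^(?!.*(\d)\1{5}).*$/;

console.log(noSixSameDigits.test('111111'));    // false
console.log(noSixSameDigits.test('abc999999')); // false
console.log(noSixSameDigits.test('11111'));     // true: only five in a row
console.log(noSixSameDigits.test('aaaaaa'));    // true: letters are ignored
console.log(noSixSameWordChars.test('aaaaaa')); // false: \w counts letters too
```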
See demo. This regex will allow
The regex I have is...
^[A-z0-9]*[A-z0-9\s]{0,20}[A-z0-9]*$
The ultimate goal of this regex is not to allow leading and trailing spaces, while limiting the characters that are entered to 20, which the above regex doesn't do a good job at.
I found some questions similar to this, and the closest one would be How to validate a user name with regex?, but it did not limit the number of characters. It did solve the problem of leading and trailing spaces.
I also saw a way using negation and another negative lookahead, but that didn't work out so well for me.
Is there a better way to write the regex above with the 20 character limit? The repeat of the allowed characters is pretty ugly especially when the list of the allowed characters are large and specific.
Update:
I like this one even better. We use a negative lookahead to make sure there isn't ^\s (whitespace at the beginning of the string) or \s$ (whitespace at the end of the string), and then match 1 alphanumeric or whitespace character. We repeat this 1-20 times.
/^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i
^ (?# beginning of string)
(?: (?# non-capture group for repetition)
(?! (?# begin negative lookahead)
^\s (?# whitespace at beginning of string)
| (?# OR)
\s$ (?# whitespace at end of string)
) (?# end negative lookahead)
[a-z0-9\s] (?# match one alphanumeric/whitespace character)
){1,20} (?# repeat this process 1-20 times)
$ (?# end of string)
Initial:
I use a negative lookahead at the beginning of the string ((?!...)) to make sure that we don't start off with whitespace. Then we check for 0-19 alphanumeric (case-insensitive thanks to the i modifier) or whitespace characters. Finally, we make sure we end with a pure alphanumeric character (no whitespace), since lookbehinds are not available in older JavaScript engines (they were added in ES2018).
/^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i
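Both versions can be checked side by side. This is a minimal JavaScript sketch; the sample inputs are mine and were picked to hit the edges (leading space, trailing space, 20 and 21 characters, empty string).

```javascript
const updated = /^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i;
const initial = /^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i;

const samples = ['abc def', ' abc', 'abc ', 'a', 'a'.repeat(20), 'a'.repeat(21), ''];

// One [updated, initial] pair of booleans per sample.
const results = samples.map((s) => [updated.test(s), initial.test(s)]);
console.log(results);
// Both patterns accept 'abc def', 'a' and the 20-character string,
// and both reject the leading space, trailing space, 21 characters and ''.
```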
Hmm, if you need to exclude the single character text, I would go with:
^[A-z0-9][A-z0-9\s]{0,18}[A-z0-9]$
If a single character is also acceptable:
^[A-z0-9](?:[A-z0-9\s]{0,18}[A-z0-9])?$
I think your regex limits the input to 22 characters, not 20.
Are you aware that character range [A-z] includes characters [\]^_`?
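The [A-z] point is easy to verify. In this JavaScript sketch the test string holds the six ASCII characters that sit between Z and a; the variable names are mine.

```javascript
// The six characters between "Z" (code 90) and "a" (code 97).
const inBetween = '[\\]^_`';

const looseRange = /^[A-z]+$/;
const twoRanges = /^[A-Za-z]+$/;

console.log(looseRange.test(inBetween)); // true: [A-z] lets them through
console.log(twoRanges.test(inBetween));  // false
console.log(twoRanges.test('James'));    // true
```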
I think I'd do something like this:
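const MAX_INPUT_LENGTH = 20; // limit taken from the question
// Trim the ends, then collapse every run of whitespace (note the g flag).
input = input.trim().replace(/\s+/g, ' ');
// RegExp objects have test(), not match().
if (input.length > MAX_INPUT_LENGTH ||
    !/^[a-z ]+$/i.test(input)) {
  // raise exception?
}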
\S matches a non-whitespace character. Therefore this should match what you're looking for:
^\S.{0,18}\S$
That is, a non-space character \S, followed by up to 18 of any type of character . (space or not), and finally a non-space character.
The only limitation of the above regex is that the value must be at least 2 characters. If you need to allow 1 character, you can use:
^\S(.{0,18}\S)?$
If you're looking to validate a user name (as you implied but didn't explicitly state) you're probably looking to allow only numbers, letters, and underscores. In that case, ^\w{1,20}$ will suffice.
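Here is a short JavaScript sketch of the three patterns above. The sample inputs, including 'a$b' and 'user_1', are my own.

```javascript
const atLeastTwo = /^\S.{0,18}\S$/;
const atLeastOne = /^\S(.{0,18}\S)?$/;
const userName = /^\w{1,20}$/;

console.log(atLeastTwo.test('a'), atLeastOne.test('a'));       // false true
console.log(atLeastTwo.test('a b'), atLeastOne.test('a b'));   // true true
console.log(atLeastTwo.test(' ab'), atLeastOne.test('ab '));   // false false
console.log(atLeastOne.test('a'.repeat(20)), atLeastOne.test('a'.repeat(21))); // true false

// The \S patterns accept any symbol; \w is limited to letters, digits and _.
console.log(atLeastOne.test('a$b'), userName.test('a$b'));     // true false
console.log(userName.test('user_1'));                          // true
```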
Use this pattern: ^(?!\s).{0,20}(?<!\s)$
^(?!\s) start of line, not followed by whitespace
.{0,20} followed by 0 to 20 characters
(?<!\s)$ end of line, not preceded by whitespace
Or this pattern: ^(\S.{0,18}\S)?$
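The two patterns differ on very short input, which this JavaScript sketch shows. It assumes an engine with lookbehind support (ES2018 or later); the sample inputs are mine.

```javascript
const withLookbehind = /^(?!\s).{0,20}(?<!\s)$/; // needs lookbehind support
const optionalGroup = /^(\S.{0,18}\S)?$/;

const inputs = ['', 'a', 'ab', ' ab', 'ab ', 'a'.repeat(20), 'a'.repeat(21)];

// One [withLookbehind, optionalGroup] pair of booleans per input.
const verdicts = inputs.map((s) => [withLookbehind.test(s), optionalGroup.test(s)]);
console.log(verdicts);
// Both accept '' and reject leading or trailing spaces and 21 characters.
// Only the lookbehind version accepts the single character 'a'.
```

If empty input should be rejected, changing .{0,20} to .{1,20} in the first pattern is one option.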