Simple regex with repeated unordered matches


I have this regex
/^[a-z]{1,}( (?=[a-z])){0,}(_(?=[a-z])){0,}[a-z]{0,}$/
I want to match
ag_b_cf_ajk
or
zva b c de
or
hh_b opxop a_b
So: tokens of letters separated by a single space or underscore.
(In the regex above, we have a literal space, which is legal, and we have look-aheads that ensure that a space or underscore is followed by a character).
The problem is that my regex above only handles the first space or underscore, so these match:
axz_be
axz be
but these fail
axz_be_j
axz be j
I believe I am missing some regex concept needed to solve this, as I have been trying for the last few hours!

It seems you can just use
^[a-z]+(?:[_ ][a-z]+)*$
The regex matches
^ - start of string
[a-z]+ - one or more lowercase ASCII letters
(?:[_ ][a-z]+)* - zero or more sequences of:
[_ ] - a space or an underscore
[a-z]+ - one or more lowercase ASCII letters
$ - end of string
If the space or underscore must appear at least once, use the + quantifier instead of *:
^[a-z]+(?:[_ ][a-z]+)+$
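A minimal way to check both patterns is to run them with RegExp.prototype.test against the strings from the question plus a few invalid ones. This is only a sketch; the constant names and the extra invalid samples are illustrative choices, not part of the original answer.

```javascript
// * : zero or more "separator + letters" groups, so a single word also matches.
const tokensAnyCount = /^[a-z]+(?:[_ ][a-z]+)*$/;
// + : one or more groups, so at least one separator is required.
const tokensAtLeastOneSep = /^[a-z]+(?:[_ ][a-z]+)+$/;

const samples = [
  "ag_b_cf_ajk", "zva b c de", "hh_b opxop a_b", // from the question
  "axz_be_j", "axz be j",                         // previously failing
  "axz",                                          // single word
  "axz__be", "axz_", "_axz",                      // doubled, trailing, leading separator
];

for (const s of samples) {
  console.log(JSON.stringify(s), tokensAnyCount.test(s), tokensAtLeastOneSep.test(s));
}
```

The repeated group is what fixes the original problem: each [_ ] must be followed by its own [a-z]+, and that pair can repeat, so a doubled, leading or trailing separator can never match.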
To add a multicharacter alternative (here, the literal two-character sequence []) to the underscore and space, you need to introduce another non-capturing group:
^[a-z]+(?:(?:[_ ]|\[\])[a-z]+)+$
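As a sketch of the alternation, assuming the extra separator is the literal two-character sequence [] (written as \[\], which in a non-Unicode JavaScript regex is equivalent to \[]):

```javascript
// The separator is either one "_" or " ", or the two-character sequence "[]".
// The inner (?: ... ) keeps the | local to the separator choice.
const withBracketSep = /^[a-z]+(?:(?:[_ ]|\[\])[a-z]+)+$/;

console.log(withBracketSep.test("ab[]cd_ef")); // true
console.log(withBracketSep.test("ab[cd"));     // false: "[" alone is not a separator
console.log(withBracketSep.test("ab[]"));      // false: a separator must be followed by letters
```

Without the inner group, (?:[_ ]|\[\][a-z]+)+ would read as "either a single separator, or [] followed by letters", which is not the intent.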

Related

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string, but not at the beginning or at the end.
Oddly enough, the above regex will not accept fewer than 3 characters: e.g. aa does not match, but aaa does.
I am not sure why. Can anyone please help?

You have: [^\s] (requires matching exactly one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching exactly one more non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
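The difference is easy to see by testing short strings against both patterns. A small sketch (the sample strings are illustrative):

```javascript
// Three mandatory parts: [^\s], then [a-z0-9 ]+ (at least one), then [^\s].
const original = /^[^\s][a-z0-9 ]+[^\s]$/i;
// \b after ^ and before $ means the first and last characters must be word characters,
// which rules out a leading or trailing space without consuming a character.
const withBoundaries = /^\b[a-z0-9 ]+\b$/i;

for (const s of ["a", "aa", "aaa", "a b", " aa", "aa "]) {
  console.log(JSON.stringify(s), original.test(s), withBoundaries.test(s));
}
```

The original rejects "a" and "aa" because it needs three characters; the word-boundary version accepts them and still rejects " aa" and "aa ".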

Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string, not followed by a space (so here we exclude a leading space).
[a-z0-9 ]* - A possibly empty sequence of letters, digits and spaces (everything before the last character, see below).
[a-z0-9]$ - The last character, a letter or digit, followed by the end of the string (so here we exclude a trailing space).
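A quick check of this pattern in JavaScript. The i flag is an addition here (an assumption, to mirror the case-insensitive regex in the question); drop it if only lowercase input is expected.

```javascript
// (?! ) rejects a leading space; the mandatory final [a-z0-9] rejects a trailing one.
// The i flag is an assumption carried over from the question's regex.
const noEdgeSpaces = /^(?! )[a-z0-9 ]*[a-z0-9]$/i;

for (const s of ["a", "aa", "A b", " a", "a ", ""]) {
  console.log(JSON.stringify(s), noEdgeSpaces.test(s));
}
```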

You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
NOTE: If only one whitespace character is allowed between the alphanumeric chars, use:
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespace characters (\s+) followed by 1+ letters/digits
$ - end of string.
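To compare the two variants side by side, a sketch with illustrative inputs. Note that \s also matches tabs and line breaks, not just the space character.

```javascript
// 1+ whitespace characters between alphanumeric runs.
const multiSpace = /^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i;
// Exactly one whitespace character between runs.
const singleSpace = /^[a-z0-9]+(?:\s[a-z0-9]+)*$/i;

for (const s of ["aa", "a b", "a  b", " a", "a "]) {
  console.log(JSON.stringify(s), multiSpace.test(s), singleSpace.test(s));
}
```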

RegEx pattern to not allow special character except underscore

I have a special requirement where I need to achieve the following:
No special characters are allowed except _, and only in between strings.
The string should not start or end with _, . or a numeric value.
An underscore should not be allowed before or after any numeric value.
I am able to achieve most of it, but my RegEx pattern is also allowing other special characters.
How can I modify the RegEx pattern below so that it does not allow any special character apart from the underscore, and that only in between strings?
^[^0-9._]*[a-zA-Z0-9_]*[^0-9._]$

What you might do is use negative lookaheads to assert your requirements:
^(?![0-9._])(?!.*[0-9._]$)(?!.*\d_)(?!.*_\d)[a-zA-Z0-9_]+$
Explanation
^ Assert the start of the string
(?![0-9._]) Negative lookahead to assert that the string does not start with [0-9._]
(?!.*[0-9._]$) Negative lookahead to assert that the string does not end with [0-9._]
(?!.*\d_) Negative lookahead to assert that the string does not contain a digit followed by an underscore
(?!.*_\d) Negative lookahead to assert that the string does not contain an underscore followed by a digit
[a-zA-Z0-9_]+ Match what is specified in the character class one or more times. You can add anything else you want to allow to this character class, for example a . (dot).
$ Assert the end of the string
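A sketch exercising each lookahead with an input it should reject (the inputs are made up for illustration):

```javascript
const identifierLike = /^(?![0-9._])(?!.*[0-9._]$)(?!.*\d_)(?!.*_\d)[a-zA-Z0-9_]+$/;

const checks = [
  ["abc_def", true],
  ["ab1c", true],   // digits are fine when not next to "_" and not at either end
  ["_abc", false],  // starts with _
  ["abc.", false],  // ends with .
  ["ab1_c", false], // digit followed by _
  ["ab_1c", false], // _ followed by digit
  ["ab$c", false],  // $ is not in the character class
];

for (const [input, expected] of checks) {
  console.log(input, identifierLike.test(input) === expected ? "as expected" : "UNEXPECTED");
}
```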

Keep it simple. Only allow underscore and alphanumeric regex:
/^[a-zA-Z0-9_]+$/
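JavaScript (ES6) implementation (works for React):
const re = /^[a-zA-Z0-9_]+$/; // letters, digits and underscore only
const variable_to_test = "some_value_1"; // example input
re.test(variable_to_test); // true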

Your opening and closing sections, [^0-9._], match ANY character other than those listed, which is why other special characters get through.
So you need to change them to list what you can match:
/^[A-Z][A-Z0-9_]*[A-Z]$/i
And since you have now said that a single character is valid:
/^[A-Z]([A-Z0-9_]*[A-Z])?$/i
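A sketch comparing the two versions; only the second accepts a single letter. Neither pattern enforces the "no underscore next to a digit" rule from the question; they only fix the start/end and allowed-character parts.

```javascript
// At least two characters: a letter, then letters/digits/underscores, then a letter.
const atLeastTwo = /^[A-Z][A-Z0-9_]*[A-Z]$/i;
// The tail is optional, so a single letter is also accepted.
const oneOrMore = /^[A-Z]([A-Z0-9_]*[A-Z])?$/i;

for (const s of ["a", "ab_c", "_ab", "ab_", "ab$c", "a1"]) {
  console.log(JSON.stringify(s), atLeastTwo.test(s), oneOrMore.test(s));
}
```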

Regex to allow alphanumeric, spaces, some special characters

I have this regex, ^[a-zA-Z0-9 #&$]*$, but it does not work for me in a few cases.
The following inputs need to be rejected:
A string that consists only of digits (e.g. 1234567)
A string starting with a special character (e.g. &123abc)
Note that a special character can appear in the middle and at the end.

You seem to need to avoid matching strings that only consist of digits and make sure the strings start with an alphanumeric. I assume you also need to be able to match empty strings (the original regex matches empty strings).
That is why I suggest
^(?!\d+$)(?:[a-zA-Z0-9][a-zA-Z0-9 #&$]*)?$
Details
^ - start of string
(?!\d+$) - the negative lookahead that fails the match if a string is numeric only
(?:[a-zA-Z0-9][a-zA-Z0-9 #&$]*)? - an optional sequence of:
[a-zA-Z0-9] - a digit or a letter
[a-zA-Z0-9 #&$]* - 0+ digits, letters, spaces, #, & or $ chars
$ - end of string.
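Running the pattern against the two examples from the question, plus a few inputs that should pass (the extra inputs are illustrative):

```javascript
const allowedText = /^(?!\d+$)(?:[a-zA-Z0-9][a-zA-Z0-9 #&$]*)?$/;

for (const s of ["1234567", "&123abc", "abc 123#", "1abc&", "abc$", ""]) {
  console.log(JSON.stringify(s), allowedText.test(s));
}
```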

You can do it with the following regex:
^(?!\d+$)\w+\S+

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now, because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be found in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace character and [^\s] will match the first non-whitespace character. So, from my understanding, I just had to add this to what I already have to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end". It's actually a little more complicated, but for now you can consider them to mean the start and end of a line; look up the various modifiers (flags) on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning: if used as the first character inside square brackets, it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings, and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
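In JavaScript this equivalence is exact: \d matches only the ASCII digits 0 to 9, even with the u flag. A small sketch comparing the shorthands with the explicit classes over the first 0x3000 UTF-16 code units (the range is an arbitrary choice that includes non-ASCII digits such as the Arabic-Indic ones):

```javascript
// Compare \d with [0-9] and \D with [^0-9], one character at a time.
const pairs = [
  [/^\d$/, /^[0-9]$/],
  [/^\D$/, /^[^0-9]$/],
];

let disagreements = 0;
for (let code = 0; code < 0x3000; code++) {
  const ch = String.fromCharCode(code);
  for (const [shorthand, explicit] of pairs) {
    if (shorthand.test(ch) !== explicit.test(ch)) disagreements++;
  }
}
console.log(disagreements); // 0
```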
The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-space characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that will match bits of corresponding strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
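A quick check of the final pattern against names from the question and a few illustrative extras:

```javascript
// Negated class: any character except digits, whitespace and the listed symbols.
// Inside [...], ^ (when not first), $, *, (, ), + and = are all literal.
const noDigitsSpacesOrSymbols = /^[^\d\s!#£$%^&*()+=]+$/;

for (const s of ["James", "James1", "James Smith25", "James!", "O'Brien", ""]) {
  console.log(JSON.stringify(s), noDigitsSpacesOrSymbols.test(s));
}
```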
A useful website to explore regexes is http://regexr.com/3b9h7, which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit; without the g flag it finds only the first one, but with the g flag it matches all of them.
and [^\s] will match the first non-whitespace character
Almost. [^\s] matches any non-whitespace character; again, with the g flag it matches all of them, not just the first one.
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces: a space is not a digit.
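Adding capture groups (which does not change what matches) shows how the question's pattern splits each input. A sketch:

```javascript
// Same pattern as in the question, with each part wrapped in a capture group.
const questionPattern = /^(\D+)([^\s]+)$/;

for (const s of ["James Smith25", "James1", "James"]) {
  const m = questionPattern.exec(s);
  console.log(JSON.stringify(s), "->", JSON.stringify(m[1]), "+", JSON.stringify(m[2]));
}
```

In "James Smith25", \D+ takes "James Smith" (including the space) and [^\s]+ takes "25", so both the space and the digits slip through. Even "James" matches, with \D+ backing off to "Jame" so that [^\s]+ can take the final "s".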
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
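It matches strings that contain no digits, no whitespace and none of the symbols you want to disallow.
Mind that to match a literal -, ] or [ inside a character class, you should escape them. A - can also be placed at the very start or end of the class instead, but in JavaScript a ] cannot simply be placed first ([] is an empty class and [^] matches any character). To play it safe, escape them.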
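As a sketch, assuming you also want to reject -, [ and ], here is the same pattern with those three added as escaped characters:

```javascript
// \-, \[ and \] are escaped, so they are literal wherever they sit in the class.
const extendedExclusions = /^[^\d\s!#£$%^&*()+=\-\[\]]+$/;

for (const s of ["James", "Mary-Jane", "a[b", "a]b"]) {
  console.log(JSON.stringify(s), extendedExclusions.test(s));
}
```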

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace character (space, tab, newline, etc.)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
So the regex matches a string that:
1. starts at the beginning of the input,
2. contains 0 or more characters that are not whitespace, digits or any of the symbols included in the character class [...] (in this example !#£$%^&*()+=),
3. and then reaches the end of the input.
NOTE:
If the symbols you want to exclude also include - (a hyphen), don't put it between other characters, because it is a metacharacter inside a character class (it forms a range); put it last in the character class.
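Two details of this answer are worth checking: the * quantifier also accepts the empty string (the other answers use +), and a hyphen placed last in the class is literal. A sketch:

```javascript
// * allows zero repetitions, so an empty input matches.
const starVersion = /^[^\s\d!#£$%^&*()+=]*$/;
// + requires at least one allowed character.
const plusVersion = /^[^\s\d!#£$%^&*()+=]+$/;
// "-" added as the last character of the class, where it cannot form a range.
const withHyphen = /^[^\s\d!#£$%^&*()+=-]+$/;

console.log(starVersion.test(""));   // true
console.log(plusVersion.test(""));   // false
console.log(withHyphen.test("a-b")); // false
console.log(withHyphen.test("ab"));  // true
```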

Regex to match '-' delimited alphanumeric words

I would like to test whether the user typed only alphanumeric values, separated by single "-" characters.
hello-world -> MATCH
hello-first-world -> MATCH
this-is-my-super-world -> MATCH
hello--world -> NO MATCH
hello-world-------this-is -> NO MATCH
-hello-world -> NO MATCH (leading dash)
hello-world- -> NO MATCH (trailing dash)
Here is what I have so far, but I don't know how to check that each "-" appears only once, without repeating.
var regExp = /^[A-Za-z0-9-]+$/;

Try this:
/^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$/
This only matches one or more runs of alphanumeric characters separated by a single -. If you do not want to allow single words (e.g. just hello), replace the * quantifier with + so the last group must appear at least once.
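The examples from the question make a convenient check. A sketch (the cases table just restates the question's expected results):

```javascript
const dashWords = /^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$/;

// Expected results taken from the question.
const cases = [
  ["hello-world", true],
  ["hello-first-world", true],
  ["this-is-my-super-world", true],
  ["hello--world", false],
  ["hello-world-------this-is", false],
  ["-hello-world", false],
  ["hello-world-", false],
];

for (const [input, expected] of cases) {
  console.log(input, dashWords.test(input) === expected ? "ok" : "MISMATCH");
}
```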

Here you go (this works).
var regExp = /^[A-Za-z0-9]+([-]{1}[A-Za-z0-9]+)+$/;
Letters and numbers (greedy), a single dash, then letters and numbers again; the dash-plus-alphanumerics part repeats one or more times, so the string always ends with letters or numbers.

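(^-)|-{2,}|[^a-zA-Z0-9-]|(-$) looks for invalid input (a leading dash, two or more consecutive dashes, a character that is not a letter, digit or dash, or a trailing dash), so if this pattern finds no match, the input satisfies your requirement.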
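A sketch of this inverted approach: the input counts as valid when the "invalid part" pattern finds nothing. Here 0-9 is included in the allowed set so digits are accepted, as the question asks; isValidDashWords is a hypothetical helper name.

```javascript
// Any one of these means the input is invalid:
// a leading dash, 2+ consecutive dashes, a disallowed character, or a trailing dash.
const invalidPart = /(^-)|-{2,}|[^a-zA-Z0-9-]|(-$)/;

// Hypothetical helper: valid when nothing invalid is found.
function isValidDashWords(input) {
  return !invalidPart.test(input);
}

console.log(isValidDashWords("hello-first-world")); // true
console.log(isValidDashWords("hello--world"));      // false
console.log(isValidDashWords("-hello-world"));      // false
console.log(isValidDashWords("hello world"));       // false
```

One behavioral difference from the anchored patterns: an empty string contains nothing invalid, so isValidDashWords("") returns true; add a length check if empty input should be rejected.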

I'm not entirely sure if this works because I haven't done regex in a while, but it sounds like you need the following:
/^[A-Za-z0-9]+(-[A-Za-z0-9]+)+$/
Your requirement can be split up as follows:
One or more alphanumeric characters to start (that way the string ALWAYS starts with an alphanumeric character).
The second half is a "-" followed by one or more alphanumeric characters. As written, with +, this group must appear at least once; use * instead if a single word with no dash should also be accepted (the group then repeats 0 or more times). Either way, every dash is followed by 1+ alphanumeric characters.
I'm just not sure if I did the regex properly to follow that format.

The expression can be simplified to: /^[^\W_]+(?:-[^\W_]+)+$/
Explanation:
^ match the start of string
[^\W_]+ match one or more alphanumeric chars ([^\W_] is a word character other than _, i.e. a-zA-Z0-9)
(?:-[^\W_]+)+ match one or more groups of '-' followed by alphanumeric chars
$ match the end of string
Test: https://regex101.com/r/MODQxw/1
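To check that [^\W_] is the same as [A-Za-z0-9] here, a small sketch compares the two classes over all ASCII characters (without the u and i flags, \w is ASCII-only in JavaScript) and then tries the full pattern:

```javascript
// [^\W_] = "not a non-word character, and not _" = a word character other than _.
const shorthand = /^[^\W_]$/;
const explicit = /^[A-Za-z0-9]$/;

let mismatches = 0;
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (shorthand.test(ch) !== explicit.test(ch)) mismatches++;
}
console.log(mismatches); // 0

const simplified = /^[^\W_]+(?:-[^\W_]+)+$/;
console.log(simplified.test("this-is-my-super-world")); // true
console.log(simplified.test("hello_world"));            // false: _ is excluded
```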
