Search validation regex not working in JavaScript

I want to validate my textbox with a regex. The textbox's value should be like this – "/word/Search" – i.e. forward slash, then any word, then forward slash again, then the fixed word "Search".
Here is an example input:
/admin/Search
I tried the following pattern, but it's not matching:
[/\/][\w][/\/][\Search]
What is wrong in this pattern?

Try this instead:
\/\w+\/Search
\/ – the opening slash. There is no real need for a character class (i.e. opened by [ and closed by ]) to match a slash when you can just match the slash directly. Note that \/ presumes you are using a regex literal; if you construct a RegExp object from a string instead, a plain / does the trick.
\w+ – one or more word characters (similarly with no need for a character class)
\/ – the closing slash (same as at the outset)
Search – the fixed word Search that you expect following the closing slash (also with no need for a character class)
You can try the updated pattern against the sample textbox input you provided in a regex fiddle.
(Nitpicky) Caveat:
You may want to tighten up the word subpattern (i.e. \w+) because \w...
Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].
...according to MDN's JavaScript regular expressions reference. At least be conscious that it matches digits and underscore.
If you do want to tighten it up, then you have an occasion for a character class – for example:
[A-Za-z0-9]+ – like \w+ except that it will not match underscores
[A-Za-z]+ – like \w+ except that it will not match digits or underscores
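A quick way to check the suggested pattern is to run it against a few inputs. The sketch below assumes the whole textbox value has to match, so it adds ^ and $ anchors (the pattern as given is unanchored and would also accept longer strings that merely contain /word/Search). isValidSearchPath is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: validates a textbox value of the form "/word/Search".
// The ^ and $ anchors are an assumption: they force the whole value to match.
const searchPathPattern = /^\/\w+\/Search$/;

function isValidSearchPath(value) {
  return searchPathPattern.test(value);
}

console.log(isValidSearchPath("/admin/Search"));  // true
console.log(isValidSearchPath("/admin/search"));  // false (matching is case-sensitive)
console.log(isValidSearchPath("//Search"));       // false (\w+ needs at least one character)
console.log(isValidSearchPath("x/admin/Search")); // false (anchored at the start)

// Same pattern built from a string: the slash needs no escaping here,
// but the backslash in \w has to be doubled inside the string literal.
const fromString = new RegExp("^/\\w+/Search$");
console.log(fromString.test("/admin/Search"));    // true
```

If partial matches are acceptable, the anchors can simply be dropped.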

Since everyone seems to ignore your actual question, what's wrong with your regex is this:
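You wrapped every part in a character class [...]. A character class matches a single character from the listed set, so [\Search] matches one character (\S, i.e. any non-whitespace character, or one of e, a, r, c, h), not the literal word Search.
For some reason you wrote /\/ when you actually just want to match a single forward slash /.
You want to match multiple word characters, so you have to change \w to \w+.
Result:
/\w+/Search
(Inside a regex literal the slashes have to be escaped: \/\w+\/Search.)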
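To see the difference in practice, the following sketch runs the pattern from the question and a corrected one against the sample input. The variable names are made up for illustration, and String.raw is used only so that the pattern text can be written exactly as it appears in the question.

```javascript
// The pattern from the question, built from a raw string so no extra escaping is needed.
const original = new RegExp(String.raw`[/\/][\w][/\/][\Search]`);
// A corrected pattern written as a regex literal (slashes escaped).
const corrected = /\/\w+\/Search/;

// Each [...] matches exactly ONE character, so the original only fits
// "slash, one word character, slash, one character".
console.log(original.test("/admin/Search"));  // false
console.log(original.test("/a/S"));           // true

// [\Search] is a set: \S (any non-whitespace character) plus e, a, r, c, h.
console.log(/^[\Search]$/.test("x"));         // true
console.log(/^[\Search]$/.test("Search"));    // false

console.log(corrected.test("/admin/Search")); // true
```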

I think this is the expression you are looking for:
\/[\w]+\/Search

Related

How to modify this hashtag regex to check if the second character is a-z or A-Z?

I'm building on a regular expression I found that works well for my use case. The purpose is to check for what I consider valid hashtags (I know there's a ton of hashtag regex posts on SO but this question is specific).
Here's the regex I'm using
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g
The only problem I'm having is I can't figure out how to check if the second character is a-z (the first character would be the hashtag). I only want the first character after the hashtag to be a-z or A-Z. No numbers or non-alphanumeric.
Any help much appreciated, I'm very novice when it comes to regular expressions.

As I mentioned in the comments, you can replace [a-zA-Z0-9_]{1,20} with [a-zA-Z][a-zA-Z0-9_]{0,19} so that the first character is guaranteed to be a letter and then followed by 0 to 19 word characters (alphanumeric or underscore).
However, there are other unnecessary parts in your pattern. It appears that all you need is something like this:
/(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g
Demo.
Breakdown of (?:^|\B):
(?: # Start of a non-capturing group (don't use a capturing group unless needed).
^ # Beginning of the string/line.
| # Alternation (OR).
\B # The opposite of `\b`. In other words, it makes sure that
# the `#` is not preceded by a word character.
) # End of the non-capturing group.
Note: You may also replace [a-zA-Z0-9_] with \w.
References:
Word Boundaries.
Difference between \b and \B in regex.
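The simplified pattern can be tried out with String.prototype.match. This is a minimal sketch with a made-up sample string; the variable names are only for illustration.

```javascript
const hashtagPattern = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;

const sample = "#hello #1abc #a1 text#no #_x";
// With the g flag, match() returns every full match (or null if there is none).
const found = sample.match(hashtagPattern);
console.log(found); // [ '#hello', '#a1' ]

// A tag longer than 20 characters is rejected outright: after 20 characters
// the trailing \b cannot match in the middle of a word.
const tooLong = "#" + "a".repeat(21);
console.log(tooLong.match(hashtagPattern)); // null
```

#1abc and #_x are skipped because the first character after # is not a letter, and text#no is skipped because the # is preceded by a word character.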

The below should work.
(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)
If you only want to accept hashtags of two or more characters, change {0,19} to {1,19}.
You can test it here

In your pattern you use (?![0-9_]+\b), which only asserts that what is directly to the right is not made up entirely of digits and underscores up to the next word boundary. It does not require the first character to be a letter, so a hashtag such as #1abc still gets through.
If you want to keep the [a-zA-Z0-9_]{1,20} part, you have to use a positive lookahead (?=[a-zA-Z]) instead, to assert that what is directly to the right is an upper- or lower-case a-z.
(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b
Regex demo
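A small comparison, with hypothetical variable names, shows what the negative lookahead lets through and what the positive lookahead changes. The g flag is left off so that test() carries no state between calls.

```javascript
// Pattern from the question: rejects only tags made entirely of digits/underscores.
const negativeLookahead = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/;
// Pattern with the positive lookahead: the first character after # must be a letter.
const positiveLookahead = /(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b/;

for (const tag of ["#abc1", "#1abc", "#123", "#_abc"]) {
  console.log(tag, negativeLookahead.test(tag), positiveLookahead.test(tag));
}
// #abc1 true true
// #1abc true false
// #123 false false
// #_abc true false
```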

RegEx matching help: won't match on each appearance

I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above it does match the sections needed but would also match onto the abc part of 4_abc which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not really working like I hoped they would. If I use them on the example
a_d Dd_da 4_d d_4
they block matching the second item, Dd_da, because the space before it was already consumed by the first match. Sadly I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.

You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
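Because \b is zero-width, it does not consume the space between two items, so consecutive items can all be matched. A minimal sketch using the example input from the question (the variable names are illustrative):

```javascript
const identifierPattern = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;
const input = "a_d Dd_da 4_d d_4";

// With the g flag, match() returns the full text of every match.
const matches = input.match(identifierPattern);
console.log(matches); // [ 'a_d', 'Dd_da', 'd_4' ]

// "4_d" yields nothing: there is no word boundary between "4" and "_"
// or between "_" and "d", so the pattern cannot start at the "d".
console.log("4_d".match(identifierPattern)); // null
```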

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be found in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace character and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end", it's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning; if used inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-space characters", so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that each match the corresponding bits of the string.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexes is http://regexr.com/3b9h7, which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!
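The suggested pattern can be wrapped in a small check and tried against the names from the question. This is only a sketch: isAllowed is a hypothetical helper name, and the sample inputs other than 'James' and 'James1' are made up.

```javascript
// One or more characters, none of which is a digit, whitespace,
// or one of the listed symbols; anchored to the whole string.
const allowedPattern = /^[^\d\s!#£$%^&*()+=]+$/;

function isAllowed(input) {
  return allowedPattern.test(input);
}

console.log(isAllowed("James"));       // true
console.log(isAllowed("James1"));      // false (contains a digit)
console.log(isAllowed("James Smith")); // false (contains whitespace)
console.log(isAllowed("James!"));      // false (contains an excluded symbol)
console.log(isAllowed(""));            // false (+ requires at least one character)
console.log(isAllowed("Jean-Luc"));    // true  (the hyphen is not in the excluded set)
```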

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces: a space is not a digit. Likewise, [^\s]+ can match digits, since a digit is not whitespace, which is why 'James1' matches too.
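One way to see which part of the pattern swallows which part of the input is to add capture groups around the two pieces. The groups are an addition made purely for illustration; the pattern is otherwise the one from the question.

```javascript
// Same pattern as in the question, with each half wrapped in a capture group.
const instrumented = /^(\D+)([^\s]+)$/;

const spaced = instrumented.exec("James Smith25");
console.log(spaced[1]); // "James Smith"  (\D+ happily matches the space)
console.log(spaced[2]); // "25"           ([^\s]+ happily matches digits)

const withDigit = instrumented.exec("James1");
console.log(withDigit[1]); // "James"
console.log(withDigit[2]); // "1"
```

Each half only restricts the characters it consumes itself, so the two restrictions never apply to the same character.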
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that contain no digits, no whitespace, and none of the symbols you do not allow.
Mind that to match a literal -, ] or [ inside a character class, you either need to escape them or, in the case of -, place it at the start or end of the class. To play it safe, escape them.

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace character (space, newline, tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
$ - end of the string
so the regex matches any string that
1. has a beginning,
2. contains 0 or more characters that are not whitespace, digits, or any of the symbols included in the character class (in this example !#£$%^&*()+=), i.e. only characters that are not listed in the character class `[...]`,
3. has an ending.
NOTE:
If the symbols you want to exclude also include a hyphen (-), don't put it between other characters, because it is a metacharacter inside a character class; put it at the end of the character class.
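The hyphen pitfall is easy to demonstrate. In the sketch below (illustrative variable names), a hyphen placed between + and = turns into a range that covers every character from + (U+002B) to = (U+003D), which includes the digits.

```javascript
const accidentalRange = /[+-=]/; // range from "+" to "=", not three separate symbols
const hyphenAtEnd = /[+=-]/;     // "+", "=" and a literal "-"
const hyphenEscaped = /[+\-=]/;  // same set, with the hyphen escaped

console.log(accidentalRange.test("5")); // true  ("5" lies inside the range)
console.log(hyphenAtEnd.test("5"));     // false
console.log(hyphenEscaped.test("5"));   // false
console.log(hyphenAtEnd.test("-"));     // true
console.log(hyphenEscaped.test("-"));   // true
```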

RegEx works in tester not on site using JavaScript

I'm trying to write a RegEx that returns true if the string starts with / or http: and only allows alphanumeric characters, the dash and the underscore. Any whitespace and any other special characters should produce a false response when tested.
The pattern below works fine when tested at https://www.regex101.com/#javascript (except that it still allows special characters; I have not figured out how to prevent that yet). Unfortunately it returns false when I implement it on my site and test it with /products/homedecor/tablecloths. What am I doing wrong, and is there a better RegEx that would accomplish my goals?
^(\\/|(?:http:))\S+[a-zA-Z0-9-_]+$

Keep unescaped hyphen at first or at last position in character class:
^(\/|(?:http:))[/.a-zA-Z0-9_-]+$
Or even simpler:
^(\/|http:)[/\w.-]+$
Since \w is same as [a-zA-Z0-9_]
To match URL you may need to match DOT and forward slash as well.

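Just remove the \S+ from your regex and put the hyphen at the start or the end of the character class. Note that \S+ matches any non-space characters (including non-word characters).
^(\/|http:)[a-zA-Z0-9_-]+$
Note that this character class does not contain /, so a nested path such as /products/homedecor/tablecloths only matches once / is added to the class, e.g. [\/a-zA-Z0-9_-].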
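The sketch below runs two of the suggested patterns against the path from the question. It also shows one possible reason for the "works in the tester, not on the site" symptom. This is an assumption, since the question does not show the surrounding code: if the pattern is passed to the RegExp constructor as an ordinary string, \S inside the string literal collapses to a plain S before the regex engine ever sees it. All variable names and the example URL are hypothetical.

```javascript
const withSlashAndDot = /^(\/|http:)[/\w.-]+$/;
const withoutSlash = /^(\/|http:)[a-zA-Z0-9_-]+$/;
const path = "/products/homedecor/tablecloths";

console.log(withSlashAndDot.test(path));                       // true
console.log(withoutSlash.test(path));                          // false ("/" is not in the class)
console.log(withSlashAndDot.test("http://example.com/a-b_c")); // true
console.log(withSlashAndDot.test("/has space"));               // false
console.log(withSlashAndDot.test("/semi;colon"));              // false

// Assumed scenario: the question's pattern written as a normal string.
// "\\/" survives as \/ but "\S" becomes just "S", so the regex now
// demands a literal capital S right after the prefix.
const fromPlainString = new RegExp("^(\\/|(?:http:))\S+[a-zA-Z0-9-_]+$");
console.log(fromPlainString.test(path));                       // false

// The same pattern as a regex literal keeps \S intact.
const asLiteral = /^(\/|(?:http:))\S+[a-zA-Z0-9-_]+$/;
console.log(asLiteral.test(path));                             // true
```

When a pattern has to live in a string, every backslash needs doubling ("\\S"), or String.raw can be used to avoid the doubling.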

decoding a JS regular expression

I am going through some legacy code and I came across this regular expression:
var REGEX_STRING_REGEXP = /^\/(.+)\/([a-z]*)$/;
I am slightly confused as to what this regular expression signifies.
I have so far concluded the following:
Begin with /
Then any character (numeric, alphabetic, symbols, spaces)
then a forward slash
End with alphabetic characters
Can someone advise?

You can use a tool like Regexper to visualise your regular expressions. If we pass your regular expression into Regexper, we'll be given the following visualisation:
Direct link to Regexper result.

regex: /^\/(.+)\/([a-z]*)$/
^ : anchor regex to start of the string
\/ : a literal forward slash (escaped because / delimits the regex literal)
(.+) : 1 or more instances of any character except line terminators
\/ : another literal forward slash
([a-z]*) : 0 or more instances of any single lowercase character a-z
$ : anchor regex to end of the string
In summary, your regular expression matches strings that start with a forward slash, followed by 1 or more characters of any kind (other than line terminators), then another forward slash, then 0 or more lowercase characters a-z up to the end. Since only lowercase letters may follow it, that second forward slash is necessarily the last one in the string. Lastly, since both (.+) and ([a-z]*) are surrounded by parentheses, they will capture whatever they match when you use the regex to perform regular expression operations.
I would suggest going to rubular, placing the regex ^\/(.+)\/([a-z]*)$ in the top field and playing with example strings in the test string box to better understand what strings will fit that regex (/string/something, for example, will match your regular expression).
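Judging only by the constant's name, REGEX_STRING_REGEXP, the pattern may be meant to split the textual form of a regex literal such as /ab+c/gi into its body and flags. That purpose is a guess; the question does not confirm it. The sketch below shows what the two capture groups contain for a few inputs. parseRegexString is a hypothetical helper, not part of the legacy code.

```javascript
const REGEX_STRING_REGEXP = /^\/(.+)\/([a-z]*)$/;

// Hypothetical helper: returns the two captured groups, or null when there is no match.
function parseRegexString(text) {
  const match = REGEX_STRING_REGEXP.exec(text);
  return match ? { body: match[1], flags: match[2] } : null;
}

console.log(parseRegexString("/ab+c/gi"));          // { body: 'ab+c', flags: 'gi' }
console.log(parseRegexString("/string/something")); // { body: 'string', flags: 'something' }
console.log(parseRegexString("/a/b/g"));            // { body: 'a/b', flags: 'g' }
console.log(parseRegexString("no-slashes"));        // null
console.log(parseRegexString("//"));                // null (.+ needs at least one character)

// If the input really is a regex in text form, the groups can feed the RegExp constructor.
const parsed = parseRegexString("/ab+c/gi");
const rebuilt = new RegExp(parsed.body, parsed.flags);
console.log(rebuilt.test("xABBCx")); // true
```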
