RegExp extract specific string followed by any number with leading / trailing whitespace - javascript

I want to extract a string from another using JavaScript / RegExp.
Here is what I got:
var string = "wp-button wp-image-45 wp-label";
string.match(/(?:(?:.*)?\s+)?(wp-image-([0-9]+))(:?\s(?:.*)?)?/);
// returnes: ["wp-button ", "wp-image-45", "45", undefined]
I just want to have "wp-image-45", so:
(Optional) any character
(Optional) followed by whitespace
(Required) followed by "wp-image-"
(Required) followed by any number
(Optional) followed by whitespacy
(Optional) followed by any character
What is missing here? Is it just some kind of bracketing or more?
I also tried
string.match(/(?:(?:.*)?\s+)?(?=(wp-image-([0-9]+)))(?=(:?\s(?:.*)?)?)/)
Edit: In the end I just want to have the number. But I'd also make this step in between.

Regexps are not required to start matching at the beginning of the string, so your attempts to match whitespace and any character aren't necessary. Also, "any character" includes whitespace (except newlines in certain modes).
This should be all you need:
string.match(/\bwp-image-(\d+)\b/)
This will capture, for example, "wp-image-123" into matching group 0, and "123" into matching group 1.
\b means "word boundary", which ensures that you won't match "abcwp-image-123def". A word boundary is defined as any place where a non-word character is followed by a word character, or vice versa. A word character is consists of a letter, a number or an underscore.
Also, I used \d instead of [0-9] simply out of convenience. They have slightly different meaning (\d also matches characters considered numbers in other languages), but that won't make a difference in your case.

If all of that surrounding stuff is optional and all you want is the number then there's no point to matching for any of that stuff except for that "wp-image-" prefix, just do:
var string = "wp-button wp-image-45 wp-label";
string.match(/wp-image-([0-9]+)/);

Related

Write a regex for usernames

I want a Regex for my mongoose schema to test if a username contains only letters, numbers and underscore, dash or dot. What I got so far is
/[a-zA-Z0-9-_.]/
but somehow it lets pass everything.
Your regex is set to match a string if it contains ANY of the contained characters, but it doesn't make sure that the string is composed entirely of those characters.
For example, /[a-zA-Z0-9-_.]/.test("a&") returns true, because the string contains the letter a, regardless of the fact that it also includes &.
To make sure all characters are one of your desired characters, use a regex that matches the beginning of the string ^, then your desired characters followed by a quantifier + (a plus means one or more of the previous set, a * would mean zero or more), then end of string $. So:
const reg = /^[a-zA-Z0-9-_.]+$/
console.log(reg.test("")) // false
console.log(reg.test("I-am_valid.")) // true
console.log(reg.test("I-am_not&")) // false
Try like this with start(^) and end($),
^[a-zA-Z0-9-_.]+$
See demo : https://regex101.com/r/6v0nNT/3
/^([a-zA-Z0-9]|[-_\.])*$/
This regex should work.
^ matches at the beginning of the string. $ matches at the end of the string. This means it checks for the entire string.
The * allows it to match any number of characters or sequences of characters. This is required to match the entire password.
Now the parentheses are required for this as there is a | (or) used here. The first stretch was something you already included, and it is for capital/lowercase letters, and numbers. The second area of brackets are used for the other characters. The . must be escaped with a backslash, as it is a reserved character in regex, used for denoting that something can be any character.

Regex to allow alphanumeric, spaces, some special characters

I have this ^[a-zA-Z0-9 #&$]*$, but not working for me in few cases.
If someone types
A string that only consists of digits (e.g. 1234567)
A string starting with a special character (e.g. &123abc)
need to be rejected. Note that a special char can be in the middle and at the end.
You seem to need to avoid matching strings that only consist of digits and make sure the strings start with an alphanumeric. I assume you also need to be able to match empty strings (the original regex matches empty strings).
That is why I suggest
^(?!\d+$)(?:[a-zA-Z0-9][a-zA-Z0-9 #&$]*)?$
See the regex demo
Details
^ - start of string
(?!\d+$) - the negative lookahead that fails the match if a string is numeric only
(?:[a-zA-Z0-9][a-zA-Z0-9 #&$]*)? - an optional sequence of:
[a-zA-Z0-9] - a digit or a letter
[a-zA-Z0-9 #&$]* - 0+ digits, letters, spaces, #, & or $ chars
$ - end of string.
you can do it with the following regex
^(?!\d+$)\w+\S+
check the demo here

Regex: How do I remove the character BEFORE the matched string?

I am intercepting messages which contain the following characters:
*_-
However, whenever any one of these characters comes through, it will always be preceded by a \. The \ is just for formatting though and I want to remove it before sending it off to my server. I know how to easily create a regex which would remove this backslash from a single letter:
'omg\_bbq\_everywhere'.replace(/\\_/g, '')
And I recognize I could just do this operation 3 times: once for each character I want to remove the preceding backslash for. But how can I create a single regex which would detect all three characters and remove the preceding backslash in all 3 cases?
You can use a character class like [*_-].
To remove only the backslash before these characters:
document.body.innerHTML =
"omg\\-bbq\\*everywhere\\-".replace(/\\([*_-])/g, '$1');
When you place a subpattern into a capturing group ((...)), you capture that subtext into a numbered buffer, and then you can reference it with a $1 backreference (1 because there is only one (...) in the pattern.)
This is a good time to use atomic matching. Specifically you want to check for the slash and then positive lookahead for any of those characters.
Ignoring the code, the raw regex you want is:
\\(?=[*_-])
A literal backslash, with one of these characters in front of it: *_-
So now you are matching the slash. The atomic match is a 0 length match, so it doesn't match anything, but sets a requirement that "for this to be a valid match, it needs to be followed by [*_-]"
Atomic groups: http://www.regular-expressions.info/atomic.html
Lookaround statements: http://www.regular-expressions.info/lookaround.html
Positive and negative lookahead and lookbehind matches are available.

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be founds in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-white space character, I used + to match one or more whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.
Not quite; ^ and $ are actually "anchors" - they mean "start" and "end", it's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning; if used inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-numerical digits followed by 1+ non-space characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings, you're not adding constraints as you go, but rather building up bots of matchers that will match bits of corresponding strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexs is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to debuggerx is awesome!
Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that have no digits and spaces, and the symbols you do not allow.
Mind that to match a literal -, ] or [ with a character class, you either need to escape them, or use at the start or end of the expression. To play it safe, escape them.
Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats immediately preceeding element by 0 or more times
so the regex matches any string that has
1. string that has a beginning
2. containing 0 or more number of characters that is not whitesapce, digit, and all the symbols included in the character class ( In this example !#£$%^&*()+=) i.e., characters that are not included in the character class `[...]`
3.that has ending
NOTE:
If the symbols you don't want it to have also includes - , a hyphen, don't put it in between some other characters because it is a metacharacter in character class, put it at last of character class

Regex: string up to 20char long, without specific characters

I am trying to make regexp for validating string not containing
^ ; , & . < > | and having 1-20 characters. Any other Unicode characters are valid (asian letters for example).
How to do it?
You can use the following:
^[^^;,&.<>|]{1,20}$
Explanation:
^ assert starting of the string
[^ start of negated character class ([^ ])
^;,&.<>| all the characters you dont want to match
] close the negates character class
{1,20} range of matches
$ assert ending of the string
It will match any character other than specified characters within range of 1-20.
Your regex \w[^;,&.<>|]{1,20} contains \w that might not match all Unicode letters (I guess your regex flavor does not match Unicode letters with \w). Anyway, the \w only matches 1 character in your pattern.
Also, you say you need to exclude ^ but it is missing in your pattern.
When you want to validate length, you also must use ^/$ anchors to mark the beginning and end of a string.
To create a pattern for some range that does not match specific characters, you need a negated character class with anchors around it, and the length is set with limiting quantifiers:
^[^^;,&.<>|]{1,20}$
Or (this version makes sure we only match at the beginning and end of the string, never a line):
\A[^^;,&.<>|]{1,20}\z
Note that inside a character class, almost all special characters do not require escaping (only some of them, none in your case). Even the ^ caret symbol.
See demo

Categories