Match all numbers in regex except for phrase foo2020

Match all numbers in regex except for phrase foo2020 - javascript

In a jumbled up string such as:
dfnjvqifoo2020o43e25w54p32n5qvto4325432543nvgn4325gn2020repw
I want to match all numbers from the string except the phrase foo2020
I have tried /(?<!foo)\d+/g
The problem is that the 020 in foo2020 gets matched when it's not supposed to, I know it's because \d means matching a single digit but I don't know how to get around this

Your regex, /(?<!foo)\d+/g, matches any one or more digits that are not immediately preceded with foo. That means, if a leftmost digit is not immediately preceded with foo, it will be matched with this regex.
All you need to do is add another restriction: do not start matching digits if there is a digit immediately on the left, i.e. add (?<!\d), a left-hand digit boundary construct.
/(?<!foo)(?<!\d)\d+/g
// ^^^^^^^
See the regex demo.
Note the two consecutive lookbehinds, (?<!foo)(?<!\d) are executed at the same position one after another, which means there will be no match if there is foo or a digit immediately to the left of the current location.
If you need to make sure there is no digit immediately on the right, append (?!\d), a right-hand digit boundary construct, after \d+ pattern. It is not required here, but if you need to match 50 in abc50def and not in abc500def, you would need this negative lookahead.

If you want the match only, you could add \d* to the lookbehind. Note that this is not widely supported by all browsers. I will work in for example chrome or nodejs.
(?<!foo\d*)\d+
Explanation
(?<!foo\d*) Negative lookbehind, assert what is on the left is not foo followed by 0+ digits
\d+ Match 1+ digits
Regex demo
Another option is to use a capturing group, match what you don't want and capture what you do want.
foo\d+|(\d+)
Regex demo

Related

Replacements only in the first line with a regex

There is a transform of multiline string.
!a! b!
should become
.a. b.
And
!a! b!
c!
!d!
should become
.a. b.
c!
!d!
I approached it with a lookbehind:
str(/(?<!\n)([^\n!]*)!+/g, '$1.')
It didn't work as intended:
.a. b.
c.
!d.
Splitting a string and transforming the first line seems straightforward. But is there a reliable way to do replacements only in the first line of multiline string with a regex only?
Also would appreciate an explanation what exactly goes wrong with my approach so it fails.
The question is not limited to JS regex flavour but I'm interested in this one in the first place.

About the pattern you tried:
(?<!\n) Negative lookbehind, assert what is directly to the left is not a newline or !
([^\n!]*) Capture group 1, match 0+ times any char except a newline or !
!+ Match 1+ times ! (What you want to remove)
The pattern will match too much, as it will match all the individual parts. There is for example no rule that says match this pattern 2 times, so you will replace with group 1 for every time that pattern has a match.
Note that the quantifier in this part is 0+ times ([^\n!]*) it will also match a single ! except when preceded by a newline.
If you can make use of SKIP FAIL, you can first match what you want to avoid, which in this case is a line that optionally starts with an exclamation mark and ends with an exclamation mark with none in between.
After that match all the other exclamation marks and replace them with a dot.
^!?[^\r\n!]*!$(*SKIP)(*FAIL)|!
See a regex demo
Another option could be using 2 capturing groups.
The first group will match between the first set of exclamation marks, and the second group will match the whitespaces after followed by a char other than !.
Then match the ! at the end so it is not in the replacement
!([^\s!]+)!([^\S\r\n]+[^\s!])!
See another regex demo
In the replacement use the 2 capturing groups with the dots
.$1.$2.

REGEX: \d not working

I have a text:
Wheels – F/R_ Schwalbe TABLE TOP/Schwalbe Black Jack 26x2.2
And regex to parse wheels size from that string:
/.*Wheels.*(\d*)x/
But it does not work. Besides, when i'm removing asterisk from regex, i'm getting number 6 as group match.

You need to make your .* before the digits lazy instead of greedy:
/.*Wheels.*?(\d*)x/
The .* will greedily consume everything up to the x, leaving nothing for the following \d*. Since * can validly match zero characters, an empty match for \d* is not an incorrect result.
By adding a ? to make a lazy .*? expression, it will try match as few characters as possible, allowing the following \d* to match all the numbers before the x.

You need to make your regex non-greedy because .* will consume your digits and \d* mean zero or no match
.*Wheels.*?(\d*)x
.*? mean match as many characters as few time as possible to stop .* to consume your digits
Follow this Demo for example
Alternately you can make it more efficient if there are no digits after Wheel and your desired values with following regex
.*Wheels[^\d]*(\d*)x
where [^\d]* mean matches anything except digits

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be founds in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-white space character, I used + to match one or more whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end", it's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning; if used inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-numerical digits followed by 1+ non-space characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings, you're not adding constraints as you go, but rather building up bots of matchers that will match bits of corresponding strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexs is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to debuggerx is awesome!

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that have no digits and spaces, and the symbols you do not allow.
Mind that to match a literal -, ] or [ with a character class, you either need to escape them, or use at the start or end of the expression. To play it safe, escape them.

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats immediately preceeding element by 0 or more times
so the regex matches any string that has
1. string that has a beginning
2. containing 0 or more number of characters that is not whitesapce, digit, and all the symbols included in the character class ( In this example !#£$%^&*()+=) i.e., characters that are not included in the character class `[...]`
3.that has ending
NOTE:
If the symbols you don't want it to have also includes - , a hyphen, don't put it in between some other characters because it is a metacharacter in character class, put it at last of character class

Replace function does only replace every second regex match

I would like to use regex in javascript to put a zero before every number that has exactly one digit.
When i debug the code in the chrome debugger it gives me a strange result where only every second match the zero is put.
My regex
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
And the result i get from this
"03-3-07-3-09-8-10-05"
Thanks for the help

Use word boundaries,
(\b\d\b)
Replacement string:
0$1
DEMO
> "3-3-7-3-9-8-10-5".replace(/(\b\d\b)/g, "0$1")
'03-03-07-03-09-08-10-05'
Explanation:
( starting point of first Capturing group.
\b Matches between a word character and a non word character.
\d Matches a single digit.
\b Matches between a word character and a non word character.
) End of first Capturing group.

You can use this better lookahead based regex to prefix 0 before every single digit number:
"3-3-7-3-9-8-10-5".replace(/\b(\d)\b(?=-|$)/g, "0$1");
//=> "03-03-07-03-09-08-10-05"
Reason why you're getting alternate prefixes in your regex:
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
is that rather than looking ahead you're actually matching hyphen after the digit. Once a hyphen has been matched it is not matched again since internal regex pointer has already moved ahead.

use a positive lookahead to see the one digit numbers :
"3-3-7-3-9-8-10-5".replace(/(?=\b\d\b)/g, "0");

/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

Its given: /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
I don't understand why there is () around (\S) and 2nd (\1), but not around 1st (\1). Can anyone help in explaining how above regex works?
src: http://www.javascriptkit.com/javatutors/redev2.shtml
Thnx in advance.

The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
The g on the end means "apply this regex over the entire input string, or globally".

The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:
/(\S)\1\1+/g
Also, as #AlexD pointed out, the description should say that it matches at least three characters. If you replaced that regex with BONK in the string fooxxxxxxbar:
'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')
..you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:
Word
...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
Also, a regex doesn't know from words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).
If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.

We Keep Coding

JavaScript is the programming language of the Web.