Regex to match multiple white spaces until one character - javascript

I'm trying to create a regex that will detect markdown quoting blocks.
Here is my regex so far:
(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>])[\s\S])*)
(?:^|\n) beginning of string, or a line break
[ \t]* optional spaces
>[ \t]*\S `>` followed by optional spaces and at least one character
( ... [\s\S])* capture any following character\white-space multiple times
(?!\n(\s*\n)+[^>]) stop capture if next following characters are
at least 2 line breaks mixed with other white-spaces
followed by anything but `>`
Everything works fine, except for the negative lookahead: the capture stops if more than 2 line breaks are encountered.
Regex101 shows me 4 matches when I want 3. Any pointers?

I believe your issue is here:
(?!\n(\s*\n)+[^>])
The [^>] is matching a \n. Change it to this:
(?!\n(\s*\n)+[^>\s])
http://regex101.com/r/mT4wC4
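To see the difference without regex101, here is a minimal sketch that runs both patterns over the same text. The two regexes are copied from the question and the answer; the sample string and the helper name `quoteBlocks` are made up for illustration.

```javascript
// Both patterns are copied from the post; only the sample text is invented.
const original = /(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>])[\s\S])*)/g;
const fixed = /(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>\s])[\s\S])*)/g;

// One quote block with two blank lines inside it, then a plain paragraph.
const sample = "> a\n\n\n> b\n\nplain text";

// Hypothetical helper: collect capture group 1 (the quote block) of every match.
const quoteBlocks = (re, text) => [...text.matchAll(re)].map((m) => m[1]);

const withOriginal = quoteBlocks(original, sample);
const withFixed = quoteBlocks(fixed, sample);

console.log(withOriginal); // [ '> a', '> b' ]
console.log(withFixed);    // [ '> a\n\n\n> b' ]
```

With `[^>]`, the middle line break itself satisfies the last part of the lookahead, so the block is cut after `> a`. With `[^>\s]`, only a real non-quote character ends the block.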

Related

Regex max string length between commas but ignores whitespace

I am trying to set the max string length between commas to 4.
I am using ^([^,?]{0,4},)*[^,?]{0,4}$, which works fine. However if the user adds a space before the word, the current code counts that whitespace.
Example: 'this','will','be','fine'. <-- this works.
'this',' will','not','work' <-- this does Not work. Notice the whitespace before the ' will'. How do I modify my regex to not count this whitespace?
You can use
Validators.pattern('\\s*[^\\s,?]{0,4}(?:\\s*,\\s*[^\\s,?]{0,4})*\\s*')
Validators.pattern(/^\s*[^\s,?]{0,4}(?:\s*,\s*[^\s,?]{0,4})*\s*$/)
See the regex demo. Adjust the pattern by removing \s* anywhere you see fit.
Whenever a regex matches in the regex101.com tester but does not work in your code, always check the Code generator page link. See the Regex not working in Angular Validators.pattern() while working in online regex testers and Regular expression works on regex101.com, but not on prod.
Details:
^ - start of string
\s* - zero or more whitespaces
[^\s,?]{0,4} - zero to four chars other than whitespace, comma and a question mark
(?:\s*,\s*[^\s,?]{0,4})* - zero or more sequences of a comma enclosed with zero or more whitespaces followed with zero to four chars other than whitespace, comma and a question mark
\s* - zero or more whitespaces
$ - end of string
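The breakdown above can be checked outside Angular with a plain RegExp. This is only a sketch: the pattern is the regex-literal form from the answer, while the variable names and the sample inputs are assumptions for illustration.

```javascript
// Regex-literal form of the pattern from the answer.
const maxFourPerItem = /^\s*[^\s,?]{0,4}(?:\s*,\s*[^\s,?]{0,4})*\s*$/;

// Sample inputs are invented for this sketch.
const results = {
  plain: maxFourPerItem.test("this,will,be,fine"),      // no whitespace at all
  spaced: maxFourPerItem.test("this, will ,not,work"),  // whitespace around commas is not counted
  tooLong: maxFourPerItem.test("this,toolong"),         // second item has 7 characters
};

console.log(results); // { plain: true, spaced: true, tooLong: false }
```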

Replacements only in the first line with a regex

I need to transform a multiline string.
!a! b!
should become
.a. b.
And
!a! b!
c!
!d!
should become
.a. b.
c!
!d!
I approached it with a lookbehind:
str.replace(/(?<!\n)([^\n!]*)!+/g, '$1.')
It didn't work as intended:
.a. b.
c.
!d.
Splitting a string and transforming the first line seems straightforward. But is there a reliable way to do replacements only in the first line of multiline string with a regex only?
I would also appreciate an explanation of what exactly goes wrong with my approach.
The question is not limited to JS regex flavour but I'm interested in this one in the first place.
About the pattern you tried:
(?<!\n) Negative lookbehind, assert what is directly to the left is not a newline
([^\n!]*) Capture group 1, match 0+ times any char except a newline or !
!+ Match 1+ times ! (What you want to remove)
The pattern will match too much, as it will match all the individual parts. There is for example no rule that says match this pattern 2 times, so you will replace with group 1 for every time that pattern has a match.
Note that because the quantifier in ([^\n!]*) is 0+ times, the pattern will also match a single ! on its own, except when it is directly preceded by a newline.
If you can make use of (*SKIP)(*FAIL) (backtracking control verbs available in PCRE, but not in JavaScript), you can first match what you want to avoid, which in this case is a line that optionally starts with an exclamation mark and ends with an exclamation mark with none in between.
After that match all the other exclamation marks and replace them with a dot.
^!?[^\r\n!]*!$(*SKIP)(*FAIL)|!
See a regex demo
Another option could be using 2 capturing groups.
The first group will match between the first set of exclamation marks, and the second group will match the whitespaces after followed by a char other than !.
Then match the ! at the end so it is not in the replacement
!([^\s!]+)!([^\S\r\n]+[^\s!])!
See another regex demo
In the replacement use the 2 capturing groups with the dots
.$1.$2.
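For the JavaScript flavour the question asks about, the behaviour can be reproduced in a short sketch. The first two patterns come from the question and the answer. The third, a lookbehind anchored at the start of the string, is not from the answer; it is an alternative offered here as an assumption, and it relies on lookbehind support (ES2018 or later). The variable names are made up.

```javascript
const input = "!a! b!\nc!\n!d!";

// The attempt from the question: the lookbehind only protects a match that
// starts right after a newline, so "c!" and "d!" are still rewritten.
const attempt = input.replace(/(?<!\n)([^\n!]*)!+/g, "$1.");

// The two-group pattern from the answer, applied once.
const twoGroups = input.replace(/!([^\s!]+)!([^\S\r\n]+[^\s!])!/, ".$1.$2.");

// Alternative (assumption, not from the answer): match "!" only when
// everything between the start of the string and the "!" is newline-free.
const firstLineOnly = input.replace(/(?<=^[^\n]*)!/g, ".");

console.log(JSON.stringify(attempt));       // ".a. b.\nc.\n!d."
console.log(JSON.stringify(twoGroups));     // ".a. b.\nc!\n!d!"
console.log(JSON.stringify(firstLineOnly)); // ".a. b.\nc!\n!d!"
```

The two-group pattern depends on the exact shape of the first line, while the lookbehind variant replaces any number of exclamation marks on the first line.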

Regex: non-word-char preceded by space or at the beginning of a line

I'm trying to match #'s that are preceded by nothing or whitespace like in the following example:
#one
#two
who#three
/#four
My approach is (^|\s)#, but this captures one two three four. \s# only captures two. How do I get one two, without getting three four?
If it's relevant, all #'s are succeeded by a letter.
How about ^\s*# ? That is, start of line followed by zero or more whitespace then #. (Note, different languages have different regex rules)
This is one way:
/^ *#(.+)/ #allow zero or more spaces before # then start matching everything until a newline character.
Javascript Demo
If you can use lookbehind, this should work (JavaScript supports lookbehind since ES2018; older engines do not):
(?<=\s|^)#
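A quick way to compare the approaches is to run them over the four sample lines. In this sketch the `(\w+)` capture after the `#` is an addition for illustration (the question says every # is followed by a letter), the leading space before `#two` is assumed, and the variable names are made up.

```javascript
// Sample lines from the question, joined into one string.
const text = ["#one", " #two", "who#three", "/#four"].join("\n");

// Lookbehind version from the last answer, plus an illustrative capture group.
const withLookbehind = [...text.matchAll(/(?<=\s|^)#(\w+)/g)].map((m) => m[1]);

// Same idea without lookbehind: consume the whitespace in a non-capturing group.
const withGroup = [...text.matchAll(/(?:^|\s)#(\w+)/g)].map((m) => m[1]);

console.log(withLookbehind); // [ 'one', 'two' ]
console.log(withGroup);      // [ 'one', 'two' ]
```

Because a line break is itself whitespace, `\s` also covers a # at the start of a later line, so no multiline flag is needed here.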

Javascript regular expression quantifier

I am trying to write a javascript regular expression that matches a min and max number of words based on finding this pattern: any number of characters followed by a space. This matches one word followed by an empty space (for example: one ):
(^[a-zA-Z]+\s$)
Debuggex Demo
When I add in the range quantifier {1,3}, it doesn't match two occurrences of the pattern (for example: one two ). What do I need to change to the regular expression to match a min and max of this pattern?
(^[a-zA-Z]+\s$){1,3}
Debuggex Demo
Any explanation is greatly appreciated.
Take ^ and $ out of the quantified group, because you can't match the beginning and end of the string multiple times in one line.
^([a-zA-Z]+\s){1,3}$
DEMO
The following will work exactly as specified:
^([a-zA-Z]+ ){1,3}$
Replace the space with \s to match any single whitespace character:
^([a-zA-Z]+\s){1,3}$
Add a quantifier to the \s to set how many whitespace characters are acceptable. The following allows one or more by adding +:
^([a-zA-Z]+\s+){1,3}$
If the whitespace at the end is optional, then the following will work:
^([a-zA-Z]+(\s[a-zA-Z]+){0,2})\s*$
(^[a-zA-Z]+\s$) will start scanning from the start of the line ^, scan for a word [a-zA-Z]+, scan for a space \s, and expect the end of the line $
When you have two words, it does not find the end of the line, so it fails. If you take out $, the second word would fail because it is not the start of the line.
So the start line and end line have to go around the limit scan.
To make it more generic:
(\S+\s*){1,3}
\S+: At least one Non-whitespace
\s*: Any amount of Whitespace
This will allow scanning of words even if there is no space at the end of the string. If you want to force the whole line, then you can put ^ in the front and $ at the end:
^(\S+\s*){1,3}$
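The effect of moving the anchors can be checked with a few test strings. This is a small sketch: the three patterns are taken from the question and answers above, while the variable names and the inputs are assumptions.

```javascript
// Anchors inside the quantified group (the question's attempt).
const anchorsInside = /(^[a-zA-Z]+\s$){1,3}/;
// Anchors moved outside the group (the fix).
const anchorsOutside = /^([a-zA-Z]+\s){1,3}$/;
// Variant where the trailing whitespace is optional.
const optionalTrailingSpace = /^([a-zA-Z]+(\s[a-zA-Z]+){0,2})\s*$/;

const checks = [
  anchorsInside.test("one "),                 // true: a single word fits ^...$
  anchorsInside.test("one two "),             // false: $ cannot match after "one "
  anchorsOutside.test("one two "),            // true: two repetitions
  anchorsOutside.test("one two three four "), // false: more than three words
  anchorsOutside.test("one two"),             // false: last word lacks the space
  optionalTrailingSpace.test("one two"),      // true: trailing space not required
];

console.log(checks); // [ true, false, true, false, false, true ]
```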

Regex to create a group from an entire line, or just up to a given token

I'm using a JavaScript Regex Engine.
The regex ^(.*?)\s*(?=[*\[]).* will capture a group containing all the characters up to a [ or * character. It works well with these lines, matching the entire line and capturing the first section:
This should be captured up to here[ but no further]
This should be captured up to this asterisk* but not after it*
However, I would like to also capture an entire line if it contains neither of these characters:
This entire line should be captured.
This regex ^(.*?)\s*(?=[*\[]).*|^(.*)$ will match the entire line, but it will not capture anything in group \1.
Is it possible to modify the lookahead so that it will also find no more characters?
Just add an end of the line anchor inside the positive lookahead assertion.
^(.*?)\s*(?=[*\[]|$)
DEMO
You can use this regex:
/^(.*?)\s*(?=[*\[]|[^*\[]$)/
RegEx Demo
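As a quick check of the `(?=[*\[]|$)` pattern, the sketch below applies it to the three sample lines from the question and reads capture group 1. The variable names are assumptions for illustration.

```javascript
// Pattern from the first answer: stop before "[" or "*", or at the end of the line.
const upToToken = /^(.*?)\s*(?=[*\[]|$)/;

const lines = [
  "This should be captured up to here[ but no further]",
  "This should be captured up to this asterisk* but not after it*",
  "This entire line should be captured.",
];

// Group 1 holds the text before the token, or the whole line if there is none.
const captured = lines.map((line) => line.match(upToToken)[1]);

console.log(captured);
// [
//   'This should be captured up to here',
//   'This should be captured up to this asterisk',
//   'This entire line should be captured.'
// ]
```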
