I'm looking for a regular expression that can match both of these lines:
foo/bar
foo/bar baz
And capture foo, bar, and baz into separate match groups.
I've tried with this regex:
^([^\/]+)\/([^\/#]+)? (\w+)$
You can use the regex below:
^(\w+)\/(\w+)\s*(\w+)?$
^: Start-of-line anchor
(\w+): Match one or more word characters (letters, digits and underscore) and add them to a capturing group
\/: Match forward slash
\s*: Match any number of spaces
(\w+)?: Optional alphanumeric+underscore match
$: End-of-line anchor
Here's a demo on RegEx101.com.
This will match the first word before / in the first capture group, which can be accessed by $1, the word after / in the second group ($2), and the optional word in $3.
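As a quick check, the regex above can be run against both sample lines. This is only a minimal JavaScript sketch; the helper name splitLine is made up for illustration and is not part of the answer.

```javascript
// Regex from the answer: two required words around "/", then an optional third word.
const pattern = /^(\w+)\/(\w+)\s*(\w+)?$/;

// Hypothetical helper: returns the three groups, or null when the line does not match.
function splitLine(line) {
  const m = pattern.exec(line);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(splitLine("foo/bar"));     // third group is undefined here
console.log(splitLine("foo/bar baz")); // all three groups are filled
```

Note that in JavaScript an optional group that did not participate in the match is reported as undefined, not as an empty string.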
If the words may contain characters other than \w (i.e. [a-zA-Z0-9_]), you can use the regex below:
^([^\/]+)\/(\S+)\s*(\S+)?$
[^\/]+ will match one or more characters except /. \S+ will match one or more non-space characters.
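To see the effect of the looser character classes, here is a small JavaScript sketch. The input strings with punctuation are invented for illustration; they are not from the question.

```javascript
// Looser variant from the answer: anything but "/" first, then non-space runs.
const loose = /^([^\/]+)\/(\S+)\s*(\S+)?$/;

// Made-up inputs that contain characters outside \w.
const withThird = "foo-1/bar.x baz!".match(loose);
const withoutThird = "foo-1/bar.x".match(loose);

console.log(withThird.slice(1));    // the three captured groups
console.log(withoutThird.slice(1)); // third group is undefined
```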
Try using this ^([^\/]+)\/([^\/#\s]+)\s*(\w*)$ with g and m flags. Whitespace is excluded from the second group so that it cannot greedily swallow " baz" as well.
I need a regular expression that matches the complete string with a zero/even number of backslashes anywhere in the string. If the string contains an odd number of backslashes, it should not match the complete string.
Example:
\\ -> match
\\\ -> does not match
test\\test -> match
test\\\test-> does not match
test\\test\ -> does not match
test\\test\\ -> match
and so on...
Note: We can assume any string of any length in place of 'test' in the above example
I am using this ^[^\\]*(\\\\)*[^\\]*$ regular expression, but it does not match the backslashes after the second test.
For example:
test\\test(doesn't match anything after this)
Thanks for any help in advance.
You may use this regex:
^(?:(?:[^\\]*\\){2})*[^\\]*$
RegEx Breakdown:
^: Start
(?:: Start non-capture group #1
(?:: Start non-capture group #2
[^\\]*: Match 0 or more of any char except a \
\\: Match a \
){2}: End non-capture group #2. Repeat this group 2 times.
)*: End non-capture group #1. Repeat this group 0 or more times.
[^\\]*: Match 0 or more of any char except a \
$: End
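The breakdown above can be checked against the examples from the question. This is a minimal JavaScript sketch; the constant B is just a convenience introduced here so that the number of backslashes in each test string is easy to read.

```javascript
// Regex from the answer: backslashes are consumed two at a time.
const evenBackslashes = /^(?:(?:[^\\]*\\){2})*[^\\]*$/;

const B = "\\"; // a single backslash character

// The six examples from the question, in the same order.
const cases = [
  B.repeat(2),
  B.repeat(3),
  "test" + B.repeat(2) + "test",
  "test" + B.repeat(3) + "test",
  "test" + B.repeat(2) + "test" + B,
  "test" + B.repeat(2) + "test" + B.repeat(2),
];

const results = cases.map((s) => evenBackslashes.test(s));
cases.forEach((s, i) => console.log(s, "->", results[i]));
```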
The current regular expression ^[^\\]*(\\\\)*[^\\]*$ can be interpreted as Any(\\)*Any, where Any means any run of characters other than a backslash.
The expected language is Any\\Any\\Any\\..., which can be obtained by wrapping the current regular expression in a Kleene closure, that is, (Any(\\)*Any)*
The original regular expression after modification:
^([^\\]*(\\\\)*[^\\]*)*$
It can be further simplified as:
^((\\\\)*[^\\]*)*$
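A small JavaScript sketch to compare the two forms on the examples from the question. The constant B and the extra input lonePair are introduced here for illustration only. One thing the sketch makes visible: both forms only accept backslashes that sit next to each other in pairs, so a string with two backslashes that are not adjacent is rejected.

```javascript
const closure = /^([^\\]*(\\\\)*[^\\]*)*$/;
const simplified = /^((\\\\)*[^\\]*)*$/;

const B = "\\"; // a single backslash character

// The six examples from the question, in the same order.
const inputs = [
  B.repeat(2),
  B.repeat(3),
  "test" + B.repeat(2) + "test",
  "test" + B.repeat(3) + "test",
  "test" + B.repeat(2) + "test" + B,
  "test" + B.repeat(2) + "test" + B.repeat(2),
];

const closureResults = inputs.map((s) => closure.test(s));
const simplifiedResults = inputs.map((s) => simplified.test(s));
console.log(closureResults, simplifiedResults);

// Made-up extra case: two backslashes in total, but separated by "a".
const lonePair = B + "a" + B;
console.log(closure.test(lonePair), simplified.test(lonePair));
```

Because of the nested quantifiers, these patterns may also backtrack heavily on long inputs that fail to match, so it is probably worth testing them on realistic data.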
The input is this:
*Word. Word.* Word word. *…*
"…" Word word. "…"
"…" word. "…"
The following is matching the empty space on the right side of a sentence.
(?<=["*]*[A-Z].+?\.["*]*)\s
If I want to match the empty space on the left side, I have to do this:
\s(?=["*]*[A-Z].+?\.["*]*)
The output should be this (the [] symbolize the matches):
*Word.[]Word.*[]Word word.[]*…*
"…"[]Word word.[]"…"
"…" word.[]"…"
How can I modify this regex so that it matches the empty spaces on both sides of a sentence at the same time?
https://regexr.com/5tddc
For the examples shown, you may be able to use this regex with look arounds to match spaces:
(?<=\.\*?) |(?<!\w) (?=[A-Z])
RegEx Details:
(?<=\.\*?) : Match a space if that is preceded by a dot and optional *
|: OR
(?<!\w) (?=[A-Z]): Match a space that must be followed by an uppercase letter and must not be preceded by a word character
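Here is a minimal JavaScript sketch that applies this regex to the three input lines from the question and marks each matched space with []. It assumes a JavaScript engine with lookbehind support (for example a recent Node.js).

```javascript
// Regex from the answer, with the g flag so every space is replaced.
const sentenceGap = /(?<=\.\*?) |(?<!\w) (?=[A-Z])/g;

// The three input lines from the question.
const lines = [
  '*Word. Word.* Word word. *…*',
  '"…" Word word. "…"',
  '"…" word. "…"',
];

const marked = lines.map((line) => line.replace(sentenceGap, "[]"));
console.log(marked.join("\n"));
```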
Perhaps you can match a non word boundary and assert either an uppercase char A-Z or one of " * at the right.
\B[ ](?=[A-Z"*])
The pattern matches:
\B A position where \b does not match
[ ] Match a space (The brackets are for clarity only)
(?= Positive lookahead, assert what is at the right is
[A-Z"*] Match one of A-Z or " or *
) Close lookahead
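The same check can be done for this pattern. A minimal JavaScript sketch, using the three input lines from the question and [] as the marker for each matched space:

```javascript
// Pattern from the answer, with the g flag so every space is replaced.
const boundaryGap = /\B[ ](?=[A-Z"*])/g;

// The three input lines from the question.
const lines = [
  '*Word. Word.* Word word. *…*',
  '"…" Word word. "…"',
  '"…" word. "…"',
];

const marked = lines.map((line) => line.replace(boundaryGap, "[]"));
console.log(marked.join("\n"));
```

The space between "Word" and "word" is skipped because a word character sits directly before it, so \B cannot match there.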
I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex is meant to match a non-whitespace character that is followed by a whitespace character or the end of the line.
But I don't know how to stop it from excluding the following whitespace character from the result and why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes a part of the match. It also does not match at the end of the string, because inside a character class the $ symbol is not an end-of-string anchor; it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char that is not followed by a non-whitespace char (due to the (?!\S) negative lookahead).
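For the string from the question, a minimal JavaScript sketch of this lookahead approach could look like this:

```javascript
// A non-whitespace char that is not followed by another non-whitespace char.
const lastNonSpace = /\S(?!\S)/g;

const found = "Hi World!".match(lastNonSpace);
console.log(found); // [ 'i', '!' ]
```

The lookahead does not consume anything, so the space after "i" stays out of the match, and at the end of the string it succeeds because there is no character left to be non-whitespace.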
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
See another regex demo.
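On the same sample string the word-boundary version gives a different result, because "!" is not a word character. A minimal JavaScript sketch:

```javascript
// A word char immediately followed by a word boundary.
const lastWordChar = /\w\b/g;

const hits = "Hi World!".match(lastWordChar);
console.log(hits); // [ 'i', 'd' ]
```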
In Word text, if I want to highlight the last a in para, I search for all the words that match [space][para][space] to make sure I only have the word I want; when it is found, it is highlighted.
Next, I search within the selection for the last [a ] (the a followed by the space), which gives me only the last [a], and I highlight it or color it differently.
I have the following Regex
console.log("Test #words 100-200-300".toLowerCase().match(/(?:\B#)?\w+/g))
From the above you can see it is splitting "100-200-300". I want it to ignore "-" and keep the word in full as below:
--> ["test", "#words", "100-200-300"]
I need the Regex to keep the same rules, with the addition of not splitting words connected with "-"
For your current example, you could match an optional #, 1+ word chars and repeat 0+ times a part that matches a - and 1+ word chars again.
#?\w+(?:-\w+)*
#? Optional #
\w+ 1+ word characters
(?:-\w+)* Repeat as a group 0+ times matching - and 1+ word chars
console.log("Test #words 100-200-300".toLowerCase().match(/#?\w+(?:-\w+)*/g));
About the \B anchor (following text taken from the link)
\B is the negated version of \b. \B matches at every position where \b
does not. Effectively, \B matches at any position between two word
characters as well as at any position between two non-word characters.
If you do want to use that anchor, see for example some difference in matches with \B and without \B
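To make that difference concrete, here is a small JavaScript sketch. The input "a#b #c" is made up for illustration; it puts one # directly after a word character and one after a space.

```javascript
const sample = "a#b #c"; // hypothetical input, not from the question

// With \B, the # is only picked up where no word boundary sits in front of it.
const withB = sample.match(/(?:\B#)?\w+/g);

// Without \B, every # directly before word chars is picked up.
const withoutB = sample.match(/#?\w+/g);

console.log(withB);    // [ 'a', 'b', '#c' ]
console.log(withoutB); // [ 'a', '#b', '#c' ]
```

Between "a" and "#" there is a word boundary, so \B fails there and the # is left out; between the space and "#" there is none, so \B matches.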
I want to match only a dollar symbol without a backslash immediately before, as demonstrated below:
$not\$yes $
^.........^
So far, I have [^\\]\$, but this doesn't match any dollar that begins a line. The dollar could be the first symbol in the document, so matching a newline would not work. How do I match this? Is the regex I have so far even right?
You could use an alternation with the ^ anchor in order to match the $ character literally if it is the first character in the string or if it follows a character that is not a backslash.
/(?:^|[^\\])\$/
Explanation:
(?: - Start of a non-capturing group that is used to group the alternation.
^|[^\\] - Alternation that matches the start of the string using the ^ anchor or match a non-\ character
) - Close the non-capturing group that was used to group ^|[^\\]
\$ - The $ character literally
In other words, the ^ anchor will match the start of the string; while [^\\] will match anything but a backslash. The pipe | acts as an "or" operator that will match the start of the string or anything but a backslash (i.e., ^|[^\\]).
So in the string you provided, the first and the last $ characters would be matched.
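One detail worth seeing in practice: when the [^\\] branch is taken, the character before the $ is part of the match. A minimal JavaScript sketch on the string from the question (the variable names are only for illustration):

```javascript
const text = "$not\\$yes $"; // the sample string: $not\$yes $

// Alternation from the answer, with the g flag to find every occurrence.
const unescapedDollar = /(?:^|[^\\])\$/g;

const found = [...text.matchAll(unescapedDollar)];

// Matched text and where each match starts.
const spans = found.map((m) => [m[0], m.index]);
console.log(spans); // [ [ '$', 0 ], [ ' $', 9 ] ]

// The $ itself is always the last char of the match.
const dollarPositions = found.map((m) => m.index + m[0].length - 1);
console.log(dollarPositions); // [ 0, 10 ]
```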
Use a negative lookbehind assertion
(?<!\\)\$
In Action: https://regex101.com/r/dA8aA1/1
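A minimal JavaScript sketch of the lookbehind version on the string from the question, assuming an engine with lookbehind support (for example a recent Node.js). The second input, "$$", is made up here to show a case where consuming the preceding character would make a difference.

```javascript
const text = "$not\\$yes $"; // the sample string: $not\$yes $

// A $ that is not immediately preceded by a backslash.
const bareDollar = /(?<!\\)\$/g;

const positions = [...text.matchAll(bareDollar)].map((m) => m.index);
console.log(positions); // [ 0, 10 ]

// Hypothetical extra input: two dollars in a row.
const doubled = "$$";
const viaLookbehind = doubled.match(/(?<!\\)\$/g).length;
const viaAlternation = doubled.match(/(?:^|[^\\])\$/g).length;
console.log(viaLookbehind, viaAlternation); // 2 1
```

Since the lookbehind consumes nothing, each match is exactly one $ and adjacent dollars are all found.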