Javascript regex to match between two patterns, where first pattern is optional - javascript

I've tried so many things and tried adapting similar answers... but still lost today to this, if anyone can help I'd be eternally grateful!
I need to use a regex (the JS lexer-library I'm using doesn't allow for anything else) to match:
Any content between $$ and */
Must not include the opening $$
But must include the closing */
The "content" can be any character/digit/whitespace/newline
Given this:
xxx. 123 $$yyy.234 */zzz.567
^^^^^^^^^^
...I need the indicated string to be matched.
As such, this seems to work fine:
(?<=\$\$)(?:[\s\S])*?(?:[\s\S])*?\*\/
(...as seen here)
But there's an additional requirement of:
If there's no $$, then just match to the beginning of the string.
E.g.:
xxx. 123 yyy.234 */zzz.567
^^^^^^^^^^^^^^^^^^^
Yeah, at the limits of my regex knowledge and just can't land it! :-(
Might be worth mentioning the opening $$ symbol isn't quite that solid, it's more like:
\$[\p{L}0-9_]*?\$

When matching against www $$ xxx $$ yyy */ zzz, I'm assuming the result should be $$ yyy */ rather than $$ xxx $$ yyy */. The solution may be more complicated than it needs to be if this isn't a requirement.
(?: ^ | \$\$ ) # Starting at the start of the string or at "$$"
( (?: (?!\$\$). )* # A sequence of characters (.), none of which starts a "$$"
\*/ # Followed by "*/"
) # End capture
Except not quite. That will fail for $$$abc*/. So we fix:
(?: ^ | \$\$(?!\$) ) # Starting at the start of the string or at "$$" (but not "$$$")
( (?: (?!\$\$). )* # A sequence of characters (.), none of which starts a "$$"
\*/ # Followed by "*/"
)
We could also avoid lookaheads.
(?: ^ | \$\$ )
( (?: [^$]+ ( \$[^$]+ )* \$? )?
\*/
)
Regarding the updated question, the lookahead version can be modified to accommodate \$[\p{L}0-9_]*\$.
(?: ^
| \$ [\p{L}0-9_]* \$ (?! [\p{L}0-9_]* \$ )
)
( (?: (?! \$ [\p{L}0-9_]* \$ ) . )*
\*/
)
I've used line breaks and whitespace for readability. You will need to remove them (since JS's engine doesn't appear to have a flag to cause them to be ignored like some other engines do).
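With the layout whitespace and comments stripped, the lookahead versions might look like the following in JavaScript. This is only a sketch: the helper name contentUpToClose and the sample inputs are made up for illustration, the s flag is an assumption added so that . also matches newlines, and the u flag is needed for \p{L}. Because the match itself still starts with the opener, the wanted text is read from capture group 1.

```javascript
// Lookahead version with the layout whitespace and comments removed.
// The `s` flag (ES2018) is an added assumption so that `.` matches newlines too.
const plainPattern = /(?:^|\$\$(?!\$))((?:(?!\$\$).)*\*\/)/s;

// Variant for openers such as $tag$; \p{L} requires the `u` flag.
const taggedPattern =
  /(?:^|\$[\p{L}0-9_]*\$(?![\p{L}0-9_]*\$))((?:(?!\$[\p{L}0-9_]*\$).)*\*\/)/su;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function contentUpToClose(input, pattern = plainPattern) {
  const match = pattern.exec(input);
  return match ? match[1] : null;
}

console.log(contentUpToClose("xxx. 123 $$yyy.234 */zzz.567")); // yyy.234 */
console.log(contentUpToClose("xxx. 123 yyy.234 */zzz.567"));   // xxx. 123 yyy.234 */
console.log(contentUpToClose("www $$ xxx $$ yyy */ zzz"));     // " yyy */" (leading space kept)
console.log(contentUpToClose("a $tag$body */ rest", taggedPattern)); // body */
```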

I know this has already been answered and accepted. But here's the shortest way of doing it.
let str = "xxx. $$ 123 $$yyy.234 */zzz.567";
let regex = /\$?\w*\$?([\w \d.-]*\$?[\w \d.-]*\*\/)/gm;
console.log(regex.exec(str)[1]);
Update:
As mentioned in the comments, the above method fails for strings like a $ b */. So I came up with this. It isn't as good as #ikugami's, but it can definitely be another way.
let str = "$$xxx. $$gjjd*/ fhjgd";
let regex = /(\$?\w*\$?)([\w \d.-]*\$?[\w \d.-]*\*\/)/gm;
let result = regex.exec(str).slice(1);
if (result[0].startsWith('$')) {
result = result[1]
} else {
result = result[0] + result[1]
}
console.log(result);

Related

Regex match url with params to specific pattern but not query string

My regex pattern:
const pattern = /^\/(test|foo|bar\/baz|en|ppp){1}/i;
const mat = pattern.exec(myURL);
I want to match:
www.mysite.com/bar/baz/myParam/...anything here
but not
www.mysite.com/bar/baz/?uid=100/..
myParam can be any string with or without dashes but only after that anything else can occur like query strings but not immediately after baz.
Tried
/^\/(test|foo|bar\/baz\/[^/?]*|en|ppp){1}/i;
Nothing works.
This, I believe, is what you are asking for:
const myURL = "www.mysite.com/bar/baz/myParam/";
const myURL2 = "www.mysite.com/bar/baz/?uid=100";
const regex = /\/[^\?]\w+/gm;
console.log('with params', myURL.match(regex));
console.log('with queryParams', myURL2.match(regex))
You can test this and play further in Regex101. Even more, if you use that page, it tells you what does what in the regex string.
If it's not what you were asking for, there was another question related to yours, without regex: Here it is
For the 2 example strings, you might use
^[^\/]+\/bar\/baz\/[\w-]+\/.*$
Regex demo
If you want to use the alternations as well, it might look like
^[^\/]+\/(?:test|foo|bar)\/(?:baz|en|ppp)\/[\w-]+\/.*$
^ Start of string
[^\/]+ Match 1+ times any char except a /
\/ Match /
(?:test|foo|bar) Match 1 of the options
\/ Match /
(?:baz|en|ppp) Match 1 of the options
\/ Match /
[\w-]+ Match 1+ times a word char or -
\/ Match /
.* Match 0+ occurrences of any char except a newline
$ End of string
Regex demo
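A quick way to check the first pattern against the two URLs from the question. This is a sketch only; the variable names and the "anything-here" suffix are made up for illustration.

```javascript
// Pattern from the answer: host, then /bar/baz/, then a parameter made of
// word characters or dashes, then a slash and anything else.
const urlPattern = /^[^\/]+\/bar\/baz\/[\w-]+\/.*$/;

const withParam = urlPattern.test("www.mysite.com/bar/baz/myParam/anything-here");
const withQuery = urlPattern.test("www.mysite.com/bar/baz/?uid=100/..");
// The pattern insists on a slash after the parameter, so this one is rejected too.
const noTrailingSlash = urlPattern.test("www.mysite.com/bar/baz/myParam");

console.log(withParam, withQuery, noTrailingSlash); // true false false
```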
Using a negative lookahead or lookbehind will solve your problem. There are 2 options not clear from the question:
?uid=100 is not allowed after the starting part /bar/baz, so www.mysite.com/test/bar/baz?uid=100 should be valid.
?uid=100 is not allowed anywhere in the string following /bar/baz, which means that www.mysite.com/test/bar/baz/?uid=100 is invalid as well.
Option 1
In short:
\/(test|foo|bar\/baz(?!\/?\?)|en|ppp)(\/[-\w?=]+)*\/?
Explanation of the important parts:
| # OR
bar # 'bar' followed by
\/ # '/' followed by
baz # 'baz'
(?! # (negative lookahead) so, **not** followed by
\/? # 0 or 1 times '/'
\? # '?'
) # END negative lookahead
and
( # START group
\/ # '/'
[-\w?=]+ # any word char, or '-','?','='
)* # END group, occurrence 0 or more times
\/? # optional '/'
Examples Option 1
You can make the lookahead even more specific with something like (?!\/?\?\w+=\w+) to make explicit that ?a=b is not allowed, but that's up to you.
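The Option 1 pattern can be tried out like this. It is a minimal sketch; the variable names are illustrative and the sample URLs are the ones from the question.

```javascript
// Option 1: 'bar/baz' may not be followed directly by '?' or '/?'.
const option1 = /\/(test|foo|bar\/baz(?!\/?\?)|en|ppp)(\/[-\w?=]+)*\/?/;

const allowsParam = option1.test("www.mysite.com/bar/baz/myParam/");
const allowsQueryAfterBaz = option1.test("www.mysite.com/bar/baz/?uid=100");

console.log(allowsParam, allowsQueryAfterBaz); // true false
```

Note that the pattern is not anchored, so a URL is accepted as soon as any part of it matches.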
Option 2
To make explicit that ?a=b is not allowed anywhere we can use negative lookbehind. Let's first find a solution for not allowing bar/baz preceding the ?a=b.
Shorthand:
(?<!bar\/baz\/?)\?\w+=\w+
Explanation:
(?<! # Negative lookbehind: do **not** match preceding
bar\/baz # 'bar/baz'
\/? # optional '/'
)
\? # match '?'
\w+=\w+ # match e.g. 'a=b'
Let's make this part of the complete regex:
\/(test|foo|en|ppp|bar\/baz)(\/?((?<!bar\/baz\/?)\?\w+=\w+|[-\w]+))*\/?$
Explanation:
\/ # match '/'
(test|foo|en|ppp|bar\/baz) # start with 'test', 'foo', 'en', 'ppp', 'bar/baz'
(\/? # optional '/'
((?<!bar\/baz\/?)\?\w+=\w+ # match 'a=b', with negative lookbehind (see above)
| # OR
[-\w]+) # 1 or more word chars or '-'
)* # repeat 0 or more times
\/? # optional match for closing '/'
$ # end anchor
Examples Option 2
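A small check of the Option 2 pattern, assuming an engine with lookbehind support (ES2018 or later, for example a current Node.js). The variable names and the third URL are made up for illustration.

```javascript
// Option 2: a query part such as '?a=b' is rejected when 'bar/baz' or
// 'bar/baz/' comes directly before it; the pattern is anchored at the end.
const option2 =
  /\/(test|foo|en|ppp|bar\/baz)(\/?((?<!bar\/baz\/?)\?\w+=\w+|[-\w]+))*\/?$/;

const paramOk = option2.test("www.mysite.com/bar/baz/myParam/");
const queryAfterBaz = option2.test("www.mysite.com/bar/baz/?uid=100");
const queryElsewhere = option2.test("www.mysite.com/test/page?uid=100");

console.log(paramOk, queryAfterBaz, queryElsewhere); // true false true
```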

Capitalizing first letter of each word, \b\w also applies to I'm

Need to capitalize the first letter of each word in a sentence, my regex expression however is also capitalizing the 'm' in I'm.
The full expression is this:
/(?:^\w|[A-Z]|\b\w)/g
The problem here (I think) is that \b\w will grab the first letter after a word boundary. I'm assuming that single quotes denote a word boundary therefore also capitalizing the m of I'm into I'M.
Can anyone help me change the expression to exclude the 'm' after the single quotes?
Thanks in advance.
Finding a real word break in the middle of language might be a bit more
complicated than using regex word boundaries.
( \s* [\W_]* ) # (1), Not letters/numbers,
( [^\W_] ) # (2), Followed by letter/number
( # (3 start)
(?: # -----------
\w # Letter/number or _
| # or,
[[:punct:]_-] # Punctuation
(?= [\w[:punct:]-] ) # if followed by punctuation/letter/number or '-'
| #or,
[?.!] # (Add) Special word ending punctuation
)* # ----------- 0 to many times
) # (3 end)
var str = 'This "is the ,input _str,ng, the End ';
console.log(str);
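// JavaScript has no POSIX classes, so [[:punct:]] is written out as ASCII punctuation ranges.
var re = /(\s*[\W_]*)([^\W_])((?:\w|[!-\/:-@\[-`{-~](?=[\w!-\/:-@\[-`{-~])|[?.!])*)/g;
console.log(str.replace(re, function (match, p1, p2, p3) { return p1 + p2.toUpperCase() + p3; }));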
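To see why \b\w touches the m in I'm, and one simpler alternative: the sketch below is not the approach from the answer above. It swaps in a whitespace-based rule, assuming words are separated only by whitespace, and the names sentence and capitalizeWords are made up for illustration.

```javascript
const sentence = "i'm sure it's fine";

// The apostrophe is a non-word character, so \b also sits between ' and m.
const withBoundary = sentence.replace(/\b\w/g, (ch) => ch.toUpperCase());
console.log(withBoundary); // I'M Sure It'S Fine

// Alternative (assumption: whitespace is the only word separator):
// uppercase a character only at the start of the string or after whitespace.
function capitalizeWords(text) {
  return text.replace(/(^|\s)(\S)/g, (match, lead, ch) => lead + ch.toUpperCase());
}
console.log(capitalizeWords(sentence)); // I'm Sure It's Fine
```

The whitespace rule will not capitalize a word that directly follows an opening quote or bracket, which the longer pattern above handles.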

Skip char in regex

I need something like this (for '-' char):
when --- skip it, but when -- gives the position.
The problem is that with ---, the -- match starts at the first or second dash instead of the run being skipped.
What is the way to exclude --- from regex and continue to find the next -- in javascript regex ?
Thanks in advance for your help.
You haven't said what's around the dashes but you could do this.
Note that JS did not support lookbehind when this was written (it was added in ES2018), so you have to consume what's
before the dashes as well.
(?:[^-]|^)--(?!-)
Explanation
(?:
[^-] # Not a dash
| ^ # or, beginning of string
)
-- # Two dashes
(?! - ) # Not a dash after this
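A sketch of how the positions could be collected. It uses ^ for the start-of-string alternative, as the explanation describes. Because the character before the dashes is consumed by the match, the position of the -- is the end of the match minus two. The function name and the sample string are made up for illustration.

```javascript
// Two dashes that are preceded by a non-dash (or the start of the string)
// and not followed by another dash.
const doubleDash = /(?:[^-]|^)--(?!-)/g;

// Hypothetical helper: returns the index of every stand-alone "--".
function doubleDashPositions(text) {
  const positions = [];
  for (const match of text.matchAll(doubleDash)) {
    // The match may start one character early, so count back from its end.
    positions.push(match.index + match[0].length - 2);
  }
  return positions;
}

console.log(doubleDashPositions("--x a--b---c--")); // [ 0, 5, 12 ]
```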

Match only subregex, part of regex

Hello, I wanted to build an autofiller that matches the format "HH:MM".
I wanted to check only against this regex /^(0[1-9]|1[012]):[0-5][0-9]$/ but have no idea how to match a partial input against it. I've looked at Wikipedia and some other sites and can't find a modifier to check for a 'subregex'. Doesn't this option exist? I finally solved the problem with the code below, but this array could certainly be generated programmatically, so the solution I'm searching for should already exist. Or does it not exist, and I should write it myself?
patterns = [ /./, /^[0-9]$/, /^(0?[1-9]|1[012])$/, /^(0[1-9]|1[012]):$/, /^(0[1-9]|1[012]):[0-5]$/, /^(0[1-9]|1[012]):[0-5][0-9]$/]
unless patterns[newTime.length].test(newTime)
newTime = newTime.substring(0, newTime.length - 1)
You could probably accomplish the same thing a bit more efficiently.
Combine the regexes into a cascading optional form, then use the match length, substring
and a template to auto complete the time.
Pseudo code (don't know JS too well) and real regex.
# pseudo-code:
# -------------------------
# input = ....;
# template = '00:00';
# rx = ^(?:0(?:[0-9](?::(?:[0-5](?:[0-9])?)?)?)?|1(?:[0-2](?::(?:[0-5](?:[0-9])?)?)?)?)$
# match = regex( input, rx );
# input = input + substr( template, match.length(), -1 );
^
(?:
0
(?:
[0-9]
(?:
:
(?:
[0-5]
(?: [0-9] )?
)?
)?
)?
|
1
(?:
[0-2]
(?:
:
(?:
[0-5]
(?: [0-9] )?
)?
)?
)?
)
$
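The pseudo-code could be turned into JavaScript roughly as follows. This is a sketch under assumptions: the function name autocompleteTime is made up, the regex is the cascading one above (so it accepts an hour of 00, unlike the regex in the question), and since the anchored regex only matches whole inputs, the input length stands in for the match length.

```javascript
// Cascading optional form: every valid prefix of "HH:MM" matches.
const partialTime =
  /^(?:0(?:[0-9](?::(?:[0-5](?:[0-9])?)?)?)?|1(?:[0-2](?::(?:[0-5](?:[0-9])?)?)?)?)$/;

// Hypothetical helper: pads a valid partial input from the template,
// or returns null when the input is not a valid prefix.
function autocompleteTime(input, template = "00:00") {
  if (!partialTime.test(input)) {
    return null;
  }
  return input + template.slice(input.length);
}

console.log(autocompleteTime("1"));    // 10:00
console.log(autocompleteTime("09:3")); // 09:30
console.log(autocompleteTime("12:"));  // 12:00
console.log(autocompleteTime("13"));   // null
```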

Replace spaces but not when between parentheses

I guess I can do this with multiple regexs fairly easily, but I want to replace all the spaces in a string, but not when those spaces are between parentheses.
For example:
Here is a string (that I want to) replace spaces in.
After the regex I want the string to be
Hereisastring(that I want to)replacespacesin.
Is there an easy way to do this with lookahead or lookbehind operators?
I'm a little confused on how they work, and not real sure they would work in this situation.
Try this:
replace(/\s+(?=[^()]*(\(|$))/g, '')
A quick explanation:
\s+ # one or more white-space chars
(?= # start positive look ahead
[^()]* # zero or more chars other than '(' and ')'
( # start group 1
\( # a '('
| # OR
$ # the end of input
) # end group 1
) # end positive look ahead
In plain English: it matches one or more white space chars if either a ( or the end-of-input can be seen ahead without encountering any parenthesis in between.
An online Ideone demo: http://ideone.com/jaljw
The above will not work if:
there are nested parentheses
parentheses can be escaped
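A runnable sketch of the replacement, using the string from the question plus one nested input to illustrate the first limitation. The function name and the nested sample are made up for illustration.

```javascript
// Remove whitespace only where the next parenthesis ahead is '(' or
// where no parenthesis follows at all (i.e. we are outside a group).
function stripSpacesOutsideParens(text) {
  return text.replace(/\s+(?=[^()]*(\(|$))/g, "");
}

console.log(stripSpacesOutsideParens("Here is a string (that I want to) replace spaces in."));
// Hereisastring(that I want to)replacespacesin.

// Nested parentheses: the space between 'b' and the inner '(' is removed
// even though it sits inside the outer group.
console.log(stripSpacesOutsideParens("a (b (c d) e f) g"));
// a(b(c d) e f)g
```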
