Skip char in regex - javascript

I need something like this (for '-' char):
when there is ---, skip it, but when there is --, give its position.
The problem is that with ---, the -- match starts at the first or the second char instead of skipping it.
How can I exclude --- from the regex and continue to find the next -- with a JavaScript regex?
Thanks in advance for your help.

You haven't said what's around the dashes but you could do this.
Note that JS did not support lookbehind when this was answered (it was added in ES2018), so you have to consume what's
before the dashes as well.
(?:[^-]|^)--(?!-)
Explanation
(?:
[^-] # Not a dash
| ^ # or, beginning of string
)
-- # Two dashes
(?! - ) # Not a dash after this
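A minimal JavaScript sketch of both approaches, assuming the goal is the start index of each run of exactly two dashes. The function names are hypothetical. The first function uses the consuming pattern above and adjusts the index, since the character before the dashes is part of each match; the second uses the lookbehind form (?<!-)--(?!-), which needs an ES2018-capable engine (and matchAll, from ES2020).

```javascript
// Consuming version: the non-dash character before the dashes (or the
// start of the string) is part of the match, so the two dashes are
// always the last two characters of each match.
function findDoubleDashes(str) {
  const re = /(?:[^-]|^)--(?!-)/g;
  const positions = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    positions.push(m.index + m[0].length - 2);
  }
  return positions;
}

// Lookbehind version: nothing extra is consumed, so m.index is the
// position of the first dash.
function findDoubleDashesLookbehind(str) {
  return Array.from(str.matchAll(/(?<!-)--(?!-)/g), (m) => m.index);
}

console.log(JSON.stringify(findDoubleDashes("a--b---c--")));           // [1,8]
console.log(JSON.stringify(findDoubleDashesLookbehind("a--b---c--"))); // [1,8]
```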

Related

Javascript regex to match between two patterns, where first pattern is optional

I've tried so many things and tried adapting similar answers, but I've still lost today to this. If anyone can help, I'd be eternally grateful!
I need to use a regex (the JS lexer-library I'm using doesn't allow for anything else) to match:
Any content between $$ and */
Must not include the opening $$
But must include the closing */
The "content" can be any character/digit/whitespace/newline
Given this:
xxx. 123 $$yyy.234 */zzz.567
...I need yyy.234 */ (everything after the $$, up to and including the */) to be matched.
As such, this seems to work fine:
(?<=\$\$)(?:[\s\S])*?(?:[\s\S])*?\*\/
(...as seen here)
But there's an additional requirement of:
If there's no $$, then just match to the beginning of the string.
E.g.:
xxx. 123 yyy.234 */zzz.567
(here, xxx. 123 yyy.234 */ should be matched)
Yeah, at the limits of my regex knowledge and just can't land it! :-(
It might be worth mentioning that the opening $$ symbol isn't quite that fixed; it's more like:
\$[\p{L}0-9_]*?\$
When matching against www $$ xxx $$ yyy */ zzz, I'm assuming the result should be " yyy */" rather than " xxx $$ yyy */". The solution may be more complicated than it needs to be if this isn't a requirement.
(?: ^ | \$\$ ) # Starting at the start of the string or at "$$"
( (?: (?!\$\$). )* # A sequence of characters (.) none of which starts with "$$"
\*/ # Followed by "*/"
) # End capture
Except not quite. That will fail for $$$abc*/. So we fix:
(?: ^ | \$\$(?!\$) ) # Starting at the start of the string or at "$$" (but not "$$$")
( (?: (?!\$\$). )* # A sequence of characters (.) none of which starts with "$$"
\*/ # Followed by "*/"
)
We could also avoid lookaheads.
(?: ^ | \$\$ )
( (?: [^$]+ ( \$[^$]+ )* \$? )?
\*/
)
Regarding the updated question, the lookahead version can be modified to accommodate \$[\p{L}0-9_]*\$.
(?: ^
| \$ [\p{L}0-9_]* \$ (?! [\p{L}0-9_]* \$ )
)
( (?: (?! \$ [\p{L}0-9_]* \$ ) . )*
\*/
)
I've used line breaks and whitespace for readability. You will need to remove them (along with any # comments), since JS's regex engine has no flag to make it ignore them, unlike the x flag offered by some other engines.
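For reference, here is the final lookahead pattern collapsed onto one line as a JavaScript literal, wrapped in a hypothetical extractComment helper that returns capture group 1. Two assumptions on my part: the u flag is required for \p{L} to be recognized, and the s flag (ES2018) is added so that . also matches newlines, which the question says the content may contain.

```javascript
// The final lookahead pattern, with whitespace removed.
// Assumed flags: u (needed for \p{L}) and s (so "." also matches newlines).
const commentRe =
  /(?:^|\$[\p{L}0-9_]*\$(?![\p{L}0-9_]*\$))((?:(?!\$[\p{L}0-9_]*\$).)*\*\/)/su;

// Hypothetical helper: returns capture group 1 (the content plus the
// closing */), or null when there is no match.
function extractComment(str) {
  const m = commentRe.exec(str);
  return m ? m[1] : null;
}

console.log(JSON.stringify(extractComment("xxx. 123 $$yyy.234 */zzz.567"))); // "yyy.234 */"
console.log(JSON.stringify(extractComment("xxx. 123 yyy.234 */zzz.567")));   // "xxx. 123 yyy.234 */"
console.log(JSON.stringify(extractComment("www $$ xxx $$ yyy */ zzz")));     // " yyy */"
```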
I know this has already been answered and accepted, but here's a shorter way of doing it.
let str = "xxx. $$ 123 $$yyy.234 */zzz.567";
let regex = /\$?\w*\$?([\w \d.-]*\$?[\w \d.-]*\*\/)/gm;
console.log(regex.exec(str)[1]);
Update:
As mentioned in the comments, the above method fails for strings like a $ b */. So I came up with this. It isn't as good as #ikugami's answer, but it's another way to do it.
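let str = "$$xxx. $$gjjd*/ fhjgd";
let regex = /(\$?\w*\$?)([\w \d.-]*\$?[\w \d.-]*\*\/)/gm;
// [group 1: optional $, word chars, optional $; group 2: content up to */]
let result = regex.exec(str).slice(1);
if (result[0].startsWith('$')) {
  result = result[1]; // group 1 starts with $, so treat it as the opener and drop it
} else {
  result = result[0] + result[1]; // no opener, so keep both parts
}
console.log(result);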

Regex for matching string in parentheses including when opening or closing parenthesis is missing

I want to match strings in parentheses (including the parens themselves) and also match strings when a closing or opening parenthesis is missing.
From looking around, my ideal solution would involve a conditional regex; however, I need to work within the limitations of JavaScript's regex engine.
My current solution that almost works: /\(?[^()]+\)|\([^()]+/g. I could split this up (might be better for readability) but am curious to know if there is a way to achieve it without being overly verbose with multiple |'s.
Examples
Might help to understand what I'm trying to achieve through examples (highlighted sections are the parts I want to match):
(paren without closing
(paren in start) of string
paren (in middle) of string
paren (at end of string)
paren without opening)
string without any parens
(string with only paren)
string (with multiple) parens (in a row)
Here's a link to the tests I set up in regexr.com.
You can match the following regular expression.
^\([^()]*$|^[^()]*\)$|\([^()]*\)
Javascript's regex engine performs the following operations.
^ # match the beginning of the string
\( # match '('
[^()]* # match zero or more chars other than parentheses,
# as many as possible
$ # match the end of the string
| # or
^ # match the beginning of the string
[^()]* # match zero or more chars other than parentheses,
# as many as possible
\) # match ')'
$ # match the end of the string
| # or
\( # match '('
[^()]* # match zero or more chars other than parentheses,
# as many as possible
\) # match ')'
As of the question date (May 14th, 2020), Regexr's test mechanism was not working as expected: it matches (with multiple) but not (in a row). This seems to be a bug in the test mechanism. If you copy and paste the 8 items into Regexr's "text" mode and test your expression, you'll see it matches (in a row). The expression also works as expected in Regex101.
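A quick way to check the pattern outside Regexr is to run it on each example line as its own string, so that ^ and $ refer to that line. This is a sketch; matchParens is a hypothetical helper name, and the g flag makes match() return every parenthesized group in a line.

```javascript
// The answer's pattern; g returns every match within a line.
const parenRe = /^\([^()]*$|^[^()]*\)$|\([^()]*\)/g;

// Hypothetical helper: match() returns null when nothing matches.
function matchParens(line) {
  return line.match(parenRe) || [];
}

const samples = [
  "(paren without closing",
  "(paren in start) of string",
  "paren (in middle) of string",
  "paren (at end of string)",
  "paren without opening)",
  "string without any parens",
  "(string with only paren)",
  "string (with multiple) parens (in a row)",
];
for (const s of samples) {
  console.log(JSON.stringify(s), matchParens(s));
}
```

The last sample yields both (with multiple) and (in a row), and "string without any parens" yields no matches.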
I think you've done alright. The issue is that you need to match [()] in two places and only one of them needs to be true but both can't be false and regex isn't so smart as to keep state like that. So you need to check if there is 0 or 1 opening or 0 or 1 closing in alternatives like you have.
Update:
I stand corrected: since all you seem to care about is whether there is an opening or closing parenthesis, you could just do something like this:
.*[()]+.*
In English: any number of characters, followed by one or more ( or ) characters, followed by any number of characters. This will match them in any order, though, so if you need ( to come before ) (even though you don't seem to care whether both are present), this won't work.

regex pattern to match a type of strings

I need to match the below type of strings using a regex pattern in javascript.
E.g. /this/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>/<one or more than one word with hyphen>
So this single pattern should match both these strings:
1. /this/is/single-word
2. /this/is-more-than/single/word-patterns/to-match
Only the slash (/) and the 'this' string at the beginning are constant, and the words contain only letters.
You can use:
\/this\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+
I think you want something like this maybe?
(\/this\/(\w+\s?){1,}\/\w+\/(\w+\s?)+)
break down:
\/ # divider
this # keyword
\/ # divider
( # begin section
\w+ # one or more word characters
\s? # possibly followed by a space
) # end section
{1,} # match previous section at least once, more if possible
\/ # divider
\w+ # one or more word characters
\/ # divider
( # begin section
\w+ # one or more word characters
\s? # possibly followed by a space
) # end section
+ # match previous section one or more times
This might be obvious; however, to match each pattern as a separate result, I believe you want to place parentheses around the whole expression, like so:
(\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+\/[a-zA-Z ]+)
This makes sure that TWO results are returned, not just one big group.
Also, your question did not state that "this" would be static, as the other answers assumed... it says only the slashes are static. This should work for any text combo (no word this required).
Edit - actually looking back at your attempt, I see you used /this/ in your expression, so I assume that's why others did as well.
Demo: http://rubular.com/r/HGYp2qtmAM
Modified question samples:
/this/is/single-word
/this/is-more-than/single/word-patterns/to-match
Modified again: the sections may contain hyphens (no spaces), and there may be 3 or 4 sections after '/this/'.
Modified pattern /^\/this(?:\/[a-zA-Z]+(?:-[a-zA-Z]+)*){3,4}$/
^
/this
(?:
/ [a-zA-Z]+
(?: - [a-zA-Z]+ )*
){3,4}
$
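A small sketch for checking the modified pattern in JavaScript. Note that the first sample, /this/is/single-word, has only two sections after /this, so the {3,4} quantifier rejects it; twoToFour is a hypothetical adjustment that accepts 2 to 4 sections, in case that sample is meant to match.

```javascript
// The modified pattern: /this followed by 3 or 4 sections, each made of
// letters optionally joined by single hyphens.
const threeOrFour = /^\/this(?:\/[a-zA-Z]+(?:-[a-zA-Z]+)*){3,4}$/;

// Hypothetical variant that also accepts 2 sections.
const twoToFour = /^\/this(?:\/[a-zA-Z]+(?:-[a-zA-Z]+)*){2,4}$/;

const samples = [
  "/this/is/single-word",
  "/this/is-more-than/single/word-patterns/to-match",
];
for (const s of samples) {
  console.log(s, threeOrFour.test(s), twoToFour.test(s));
}
// /this/is/single-word false true
// /this/is-more-than/single/word-patterns/to-match true true
```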

Replace spaces but not when between parentheses

I guess I can do this with multiple regexes fairly easily, but I want to replace all the spaces in a string, but not when those spaces are between parentheses.
For example:
Here is a string (that I want to) replace spaces in.
After the regex I want the string to be
Hereisastring(that I want to)replacespacesin.
Is there an easy way to do this with lookahead or lookbehind operators?
I'm a little confused about how they work, and not really sure they would work in this situation.
Try this:
replace(/\s+(?=[^()]*(\(|$))/g, '')
A quick explanation:
\s+ # one or more white-space chars
(?= # start positive look ahead
[^()]* # zero or more chars other than '(' and ')'
( # start group 1
\( # a '('
| # OR
$ # the end of input
) # end group 1
) # end positive look ahead
In plain English: it matches one or more white space chars if either a ( or the end-of-input can be seen ahead without encountering any parenthesis in between.
An online Ideone demo: http://ideone.com/jaljw
The above will not work if:
there are nested parentheses
parentheses can be escaped
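Wrapped in a small function (the name removeSpacesOutsideParens is hypothetical), the replace call behaves as below. The second call uses nested parentheses to illustrate the first limitation: the space before the inner ( is still removed, because the lookahead only checks for the next ( or the end of input.

```javascript
// Remove whitespace runs that have a "(" or the end of input ahead of them
// with no parenthesis in between, i.e. whitespace outside (...) groups.
function removeSpacesOutsideParens(str) {
  return str.replace(/\s+(?=[^()]*(\(|$))/g, "");
}

console.log(removeSpacesOutsideParens("Here is a string (that I want to) replace spaces in."));
// Hereisastring(that I want to)replacespacesin.

// Nested parentheses: the space before the inner "(" is removed too.
console.log(removeSpacesOutsideParens("x (a (b) c) y"));
// x(a(b) c)y
```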

Regex explanation

I am looking at the code in the tumblr bookmarklet and was curious what the code below did.
try{
  if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))
    throw(0);
  tstbklt();
}
Can anyone tell me what the if line is testing? I have tried to decode the regex but have been unable to do so.
Setting aside the specifics of the regex for now, this code is:
if ( ! /.../.test(l.host) )
"if not regex.matches(l.host)" or "if l.host does not match this regex"
So l.host must match the regex for the condition to be false, which avoids throwing the error.
On to the regex itself:
^(.*\.)?tumblr[^.]*$
This checks for tumblr, optionally preceded by any string ending in a . (dot), and followed only by characters other than . up to the end:
^ # start of string
( # begin capturing group 1
.* # match any (non-newline) character, as many times as possible, but zero allowed
\. # match a literal .
) # end capturing group 1
? # make the whole preceding group optional
tumblr # match literal text tumblr
[^.]* # match any non . character, as many times as possible, but zero allowed
$ # match end of string
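To see what the pattern actually accepts, here is a sketch that tests it against a few made-up host names (they are illustrative, not taken from the bookmarklet). Because [^.]*$ forbids any . after tumblr, the pattern as quoted only accepts hosts whose last dot-separated label starts with tumblr; a host such as www.tumblr.com does not match.

```javascript
// The pattern exactly as it appears in the bookmarklet.
const tumblrRe = /^(.*\.)?tumblr[^.]*$/;

// Illustrative host names.
const hosts = ["tumblr", "www.tumblr", "example.tumblr-site", "www.tumblr.com", "tumblr.com"];
for (const host of hosts) {
  console.log(host, tumblrRe.test(host));
}
// tumblr true
// www.tumblr true
// example.tumblr-site true
// www.tumblr.com false
// tumblr.com false
```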
I thought it was testing to see if the host was tumblr
Yeah, it looked like it might be intended to check that, but if so it's the wrong way to do it.
For that, the first bit should be something like ^(?:[\w-]+\.)? to match an alphanumeric subdomain (the ?: makes it a non-capturing group, and [\w-]+ is at least 1 alphanumeric, underscore or hyphen character), and the last bit should be either \.(?:com|net|org)$ or perhaps something like (?:\.[a-zA-Z]+)+$, depending on how flexible the TLD section needs to be.
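Putting the two suggested pieces together gives something like the following sketch. The com|net|org list is just the example given above, and the optional group allows at most one subdomain label, so a.b.tumblr.com is rejected.

```javascript
// Hypothetical stricter check assembled from the two suggested parts:
// optional single subdomain + "tumblr" + one of the example TLDs.
const strictTumblrRe = /^(?:[\w-]+\.)?tumblr\.(?:com|net|org)$/;

for (const host of ["tumblr.com", "www.tumblr.com", "a.b.tumblr.com", "tumblr.com.example.net"]) {
  console.log(host, strictTumblrRe.test(host));
}
// tumblr.com true
// www.tumblr.com true
// a.b.tumblr.com false
// tumblr.com.example.net false
```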
My attempt to break it down (I'm no expert with regex, however):
if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))
The if(! ... ) wrapper isn't really regex; it tells us to execute the body of the if() only if this test does not pass.
The (.*\.)? part allows any characters before the word tumblr, as long as they are followed by a '.'. But it is all optional (see the ? at the end).
Next, [^.] matches any character except '.', the * repeats that (so it doesn't stop after one character), and the $ makes it continue until the end of the string.
Finally, .test() tests the regex against the current hostname, or whatever l.host contains (I'm not familiar with the tumblr bookmarklet).
So basically, it looks like that part checks whether the host is part of tumblr, and throws that exception if it isn't.
Looking forward to see how wrong I am :)
