JavaScript - regex odd number of quotes

I have this JavaScript regular expression:
var reg = new RegExp("(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))");
How can I escape the quotes so that they stay inside the pattern string? Right now they terminate the string early, so the lines after it end up quoted.
Edit:
regex expanded:
(?xi)
\b
( # Capture 1: entire matched URL
(?:
[a-z][\w-]+: # URL protocol and colon
(?:
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or '%'
# (Trying not to match e.g. "URI::Escape")
)
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)

If you use the regex literal form in JavaScript:
var reg = /regex here/;
Then you can freely use quotes in the regex without escaping anything. You will have to escape any forward slashes in the regex by putting a backslash in front of each one.
If you want to stick with the string form, you can escape a quote by putting a backslash in front of it, which keeps it from acting as a string terminator:
var reg = new RegExp('My dog\'s breath');
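As a minimal sketch of the difference between the two forms (the patterns and variable names here are made up for illustration and are not the URL regex from the question): in the string form, the quote that delimits the string needs a backslash, and every backslash meant for the regex engine has to be doubled. Flags such as `i` are passed as the literal suffix or as the second constructor argument, since an inline `(?i)` prefix is not accepted by JavaScript regexes.

```javascript
// Regex literal: quotes need no escaping, but a forward slash does.
const literalForm = /["']\/path/;

// String form: escape the quote that delimits the string, and double each
// backslash so that the regex engine still receives one.
const stringForm = new RegExp('["\']\\/path');

// Flags go after the closing slash or in the second constructor argument.
const wordLiteral = /\bdog's\b/i;
const wordString = new RegExp('\\bdog\'s\\b', 'i');

console.log(literalForm.test('"/path'), stringForm.test("'/path"));
console.log(wordLiteral.test("My DOG's breath"), wordString.test("My DOG's breath"));
```

Forgetting to double the backslash in the string form is easy to miss: `'\b'` inside a string is a backspace character, not a word boundary.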

Related

Regex match url with params to specific pattern but not query string

My regex pattern:
const pattern = /^\/(test|foo|bar\/baz|en|ppp){1}/i;
const mat = pattern.exec(myURL);
I want to match:
www.mysite.com/bar/baz/myParam/...anything here
but not
www.mysite.com/bar/baz/?uid=100/..
myParam can be any string, with or without dashes. Anything else, such as a query string, may follow it, but a query string must not come immediately after baz.
Tried
/^\/(test|foo|bar\/baz\/[^/?]*|en|ppp){1}/i;
Nothing works.
This, I believe, is what you are asking for:
const myURL = "www.mysite.com/bar/baz/myParam/";
const myURL2 = "www.mysite.com/bar/baz/?uid=100";
const regex = /\/[^\?]\w+/gm;
console.log('with params', myURL.match(regex));
console.log('with queryParams', myURL2.match(regex))
You can test this and experiment further on Regex101, which also explains what each part of the regex does.
If it's not what you were asking for, there was another question related to yours, without regex: Here it is
For the 2 example strings, you might use
^[^\/]+\/bar\/baz\/[\w-]+\/.*$
If you want to use the alternations as well, it might look like
^[^\/]+\/(?:test|foo|bar)\/(?:baz|en|ppp)\/[\w-]+\/.*$
^ Start of string
[^\/]+ Match 1+ times any char except a /
\/ Match /
(?:test|foo|bar) Match 1 of the options
\/ Match /
(?:baz|en|ppp) Match 1 of the options
\/ Match /
[\w-]+ Match 1+ times a word char or -
\/ Match /
.* Match 0+ occurrences of any char except a newline
$ End of string
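A quick way to check the first pattern against the two URLs from the question is sketched below. The variable names are illustrative only, and the third sample string is an extra case added here to show that the pattern expects a slash after the parameter.

```javascript
// Pattern from the answer above, anchored at both ends.
const barBazPattern = /^[^\/]+\/bar\/baz\/[\w-]+\/.*$/;

const sampleUrls = [
  'www.mysite.com/bar/baz/myParam/anything-here', // parameter after baz
  'www.mysite.com/bar/baz/?uid=100/',             // query string right after baz
  'www.mysite.com/bar/baz/myParam',               // no slash after the parameter
];

for (const url of sampleUrls) {
  console.log(url, barBazPattern.test(url));
}
```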
Using a negative lookahead or lookbehind will solve your problem. There are 2 options not clear from the question:
?uid=100 is not allowed after the starting part /bar/baz, so www.mysite.com/test/bar/baz?uid=100 should be valid.
?uid=100 is not allowed anywhere in the string following /bar/baz, which means that www.mysite.com/test/bar/baz/?uid=100 is invalid as well.
Option 1
In short:
\/(test|foo|bar\/baz(?!\/?\?)|en|ppp)(\/[-\w?=]+)*\/?
Explanation of the important parts:
| # OR
bar # 'bar' followed by
\/ # '/' followed by
baz # 'baz'
(?! # (negative lookahead) so, **not** followed by
\/? # 0 or 1 times '/'
\? # '?'
) # END negative lookahead
and
( # START group
\/ # '/'
[-\w?=]+ # any word char, or '-','?','='
)* # END group, occurrence 0 or more times
\/? # optional '/'
You can make the lookahead even more specific with something like (?!\/?\?\w+=\w+) to make explicit that ?a=b is not allowed, but that's up to you.
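The Option 1 pattern can be tried out as in the following sketch. The constant name is hypothetical, and the pattern is used unanchored, exactly as written above, so it only reports whether a qualifying segment occurs somewhere in the string.

```javascript
// Option 1: 'bar/baz' must not be followed by an optional '/' and then '?'.
const option1 = /\/(test|foo|bar\/baz(?!\/?\?)|en|ppp)(\/[-\w?=]+)*\/?/;

console.log(option1.test('www.mysite.com/bar/baz/myParam/')); // parameter follows baz
console.log(option1.test('www.mysite.com/bar/baz/?uid=100')); // '/?' follows baz
console.log(option1.test('www.mysite.com/bar/baz?uid=100'));  // '?' follows baz
```

Because the lookahead only inspects the one or two characters after `baz`, a query string later in the URL is still accepted by this variant.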
Option 2
To make explicit that ?a=b is not allowed anywhere, we can use a negative lookbehind. Let's first find a solution that disallows bar/baz directly preceding the ?a=b.
Shorthand:
(?<!bar\/baz\/?)\?\w+=\w+
Explanation:
(?<! # Negative lookbehind: do **not** match preceding
bar\/baz # 'bar/baz'
\/? # optional '/'
)
\? # match '?'
\w+=\w+ # match e.g. 'a=b'
Let's make this part of the complete regex:
\/(test|foo|en|ppp|bar\/baz)(\/?((?<!bar\/baz\/?)\?\w+=\w+|[-\w]+))*\/?$
Explanation:
\/ # match '/'
(test|foo|en|ppp|bar\/baz) # start with 'test', 'foo', 'en', 'ppp', 'bar/baz'
(\/? # optional '/'
((?<!bar\/baz\/?)\?\w+=\w+ # match '?a=b', with negative lookbehind (see above)
| # OR
[-\w]+) # 1 or more word chars or '-'
)* # repeat 0 or more times
\/? # optional match for closing '/'
$ # end anchor
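A small sketch for exercising the Option 2 pattern follows. It assumes a JavaScript engine with lookbehind support (ES2018 or later), and the constant name and sample URLs are chosen here for illustration.

```javascript
// Option 2: a '?key=value' part is rejected when 'bar/baz' (with an optional
// '/') comes directly before it. Requires lookbehind support.
const option2 = /\/(test|foo|en|ppp|bar\/baz)(\/?((?<!bar\/baz\/?)\?\w+=\w+|[-\w]+))*\/?$/;

console.log(option2.test('www.mysite.com/bar/baz/myParam/'));        // no query string
console.log(option2.test('www.mysite.com/bar/baz/myParam?uid=100')); // query after the parameter
console.log(option2.test('www.mysite.com/bar/baz/?uid=100'));        // query right after 'bar/baz/'
console.log(option2.test('www.mysite.com/bar/baz?uid=100'));         // query right after 'bar/baz'
```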

JavaScript RegEx - Minimum characters with Wildcard

I'm working on matching a wildcard search input for a name field.
Below are the conditions I need to match.
The user must enter at least 3 alphanumeric characters if they choose to do a wildcard search.
The user may or may not enter a wildcard; if present, it can be at the start or the end of the string.
Spaces are allowed between words.
I want to mention that I'm trimming the string before doing a match. This is what I have tried so far.
^[^\W_](\s?\w?)*$|^[^\W_]{3,}(\s?\w?)*\*$|^[\*][^\W_]{3,}(\s?\w?)*$
Below are some examples I tried -
someone xxx, someone xxx yyy - Passed
someone* xxx- Failed
someone , someone - Passed
This is the nearest match to what I want, but it fails for these test cases:
AB asf* -- Fails; this will pass: ABC asf*
*AB asf -- Fails; this will pass: *ABC asf
I know my pattern has a condition that says: start with at least 3 alphanumeric characters, then repeat spaces and alphanumeric characters.
That's what I need help with.
Thanks.
UPDATE2 This pattern should do:
/^([a-zA-Z0-9]{3,}[^\n*]*\*?|\*[a-zA-Z0-9]{2,}[^\n*]*|[a-zA-Z0-9]{2}\*)$/gm
EXPLANATION:
^ # assert start of line
( # 1st capturing group starts
[a-zA-Z0-9]{3,} # match 3+ times alphanumeric characters
[^\n*]* # match 0 or more non-newline and non-star (*) characters
\*? # match 0 or one literal star (*) character;
| # OR
\* # match one literal star (*) character
[a-zA-Z0-9]{2,} # match 2+ times alphanumeric characters
[^\n*]* # match 0 or more non-newline and non-star (*) characters;
| # OR
[a-zA-Z0-9]{2} # match exactly 2 alphanumeric characters
\* # match one literal star (*) character
) # 1st capturing group ends
$ # assert end of line
Try this one:
^(?:[^\W_]+|\*[^\W_]{3,}|[^\W_]{3,}\*)(?:\s+(?:[^\W_]+|\*[^\W_]{3,}|[^\W_]{3,}\*))*$
NOTE: using [^\W_] instead of \w just as in your original regex.
However, I argue that this task cannot be solved in a clean way using a regex. Maybe a proper javascript function would be more readable.
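One possible shape for such a function is sketched below. It rests on a reading of the requirements that the question does not fully confirm: at most one wildcard, only at the very start or very end; words made of ASCII letters and digits separated by spaces; and at least 3 alphanumeric characters in total whenever a wildcard is present. The function name is hypothetical.

```javascript
// Sketch only: validates a name search under the assumptions stated above.
function isValidNameSearch(input) {
  const value = input.trim();
  const startsWithStar = value.startsWith('*');
  const endsWithStar = value.endsWith('*');

  // Assumption: a wildcard on both sides (or a lone '*') is rejected.
  if (startsWithStar && endsWithStar) return false;

  // Remove the single wildcard, if any, and inspect what is left.
  let core = value;
  if (startsWithStar) core = value.slice(1);
  else if (endsWithStar) core = value.slice(0, -1);

  // Words of letters/digits separated by spaces; no '*' anywhere else.
  if (!/^[a-z0-9]+(?: +[a-z0-9]+)*$/i.test(core)) return false;

  // With a wildcard, require at least 3 alphanumeric characters overall.
  const hasWildcard = startsWithStar || endsWithStar;
  const alphanumericCount = core.replace(/[^a-z0-9]/gi, '').length;
  return !hasWildcard || alphanumericCount >= 3;
}

for (const sample of ['someone xxx', 'someone* xxx', 'AB asf*', '*AB asf', 'ab*']) {
  console.log(JSON.stringify(sample), isValidNameSearch(sample));
}
```

Splitting the checks into named steps makes each rule easy to adjust, for example if the minimum should apply to the word next to the wildcard rather than to the whole input.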
If I understand the requirements correctly, this might work. It does in my tests.
^(?:\*[^\W_]{3,}(?:\s*[^\W_]\s*)*|(?:\s*[^\W_]\s*)*[^\W_]{3,}\*|(?:\s*[^\W_]\s*)+)$
Expanded
^ # BOS
(?: # One of either ---
\* # Star at beginning
[^\W_]{3,} # 3 or more alphanumeric characters
(?: \s* [^\W_] \s* )* # Any number of alphanumerics, each optionally surrounded by whitespace
| # or,
(?: \s* [^\W_] \s* )* # Any number of alphanumerics, each optionally surrounded by whitespace
[^\W_]{3,} # 3 or more alphanumeric characters
\* # Star at end
| # or,
(?: \s* [^\W_] \s* )+ # One or more alphanumerics, each optionally surrounded by whitespace
) # ---------
$ # EOS

Any way to match a pattern EITHER preceded OR followed by a certain character?

For instance, I'd like to match all of the following strings:
'abc'
'cba'
'bca'
'cab'
'racb'
'rcab'
'bacr'
but NOT any of the following:
'rabcr'
'rbacr'
'rbcar'
Is this possible with regex?
The easiest way would be to use alternation:
/^(?:[abc]{3}r?|r[abc]{3})$/
Explanation:
^ # Start of string
(?: # Non-capturing group:
[abc]{3} # Either match abc,cba,bac etc.
r? # optionally followed by r
| # or
r # match r
[abc]{3} # followed by abc,cba,bac etc.
) # End of group
$ # End of string
Some regex engines support conditionals, but JavaScript is not among them. But in .NET, you could do
^(r)?[abc]{3}(?(1)|r?)$
without the need to write your character class twice in the same regex.
Explanation:
^ # Start of string
(r)? # Match r in group 1, but make the group optional
[abc]{3} # Match abc,cab etc.
(?(1) # If group 1 participated in the match,
# then match nothing,
| # else
r? # match r (or nothing)
) # End of conditional
$ # End of string
Another solution in JavaScript would be to use a negative lookahead assertion:
/^(?:r(?!.*r$))?[abc]{3}r?$/
Explanation:
^ # Start of string
(?: # Non-capturing group:
r # Match r
(?!.*r$) # only if the string doesn't end in r
)? # Make the group optional
[abc]{3} # Match abc etc.
r? # Match r (optionally)
$ # End of string
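The two JavaScript patterns above can be compared on the sample strings from the question with a short sketch like this one (the variable names are illustrative):

```javascript
const withAlternation = /^(?:[abc]{3}r?|r[abc]{3})$/;
const withLookahead = /^(?:r(?!.*r$))?[abc]{3}r?$/;

const shouldMatch = ['abc', 'cba', 'bca', 'cab', 'racb', 'rcab', 'bacr'];
const shouldFail = ['rabcr', 'rbacr', 'rbcar'];

// Print the result of both patterns side by side for every sample.
for (const sample of [...shouldMatch, ...shouldFail]) {
  console.log(sample, withAlternation.test(sample), withLookahead.test(sample));
}
```

Note that `[abc]{3}` accepts any three of those letters, including repeats such as `aaa`; whether that is acceptable depends on what the question intends.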

How do I combine these two regular expressions into one?

I'm writing a rudimentary lexer using regular expressions in JavaScript and I have two regular expressions (one for single quoted strings and one for double quoted strings) which I wish to combine into one. These are my two regular expressions (I added the ^ and $ characters for testing purposes):
var singleQuotedString = /^'(?:[^'\\]|\\'|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*'$/gi;
var doubleQuotedString = /^"(?:[^"\\]|\\"|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*"$/gi;
Now I tried to combine them into a single regular expression as follows:
var string = /^(["'])(?:[^\1\\]|\\\1|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*\1$/gi;
However when I test the input "Hello"World!" it returns true instead of false:
alert(string.test('"Hello"World!"')); //should return false as a double quoted string must escape double quote characters
I figured that the problem is in [^\1\\] which should match any character besides matching group \1 (which is either a single or a double quote - the delimiter of the string) and \\ (which is the backslash character).
The regular expression correctly filters out backslashes and matches the delimiters, but it doesn't filter out the delimiter within the string. Any help will be greatly appreciated. Note that I referred to Crockford's railroad diagrams to write the regular expressions.
You can't refer to a matched group inside a character class: (['"])[^\1\\]. Try something like this instead:
(['"])((?!\1|\\).|\\[bnfrt]|\\u[a-fA-F\d]{4}|\\\1)*\1
(you'll need to add some more escapes, but you get my drift...)
A quick explanation:
(['"]) # match a single or double quote and store it in group 1
( # start group 2
(?!\1|\\). # if group 1 or a backslash isn't ahead, match any non-line break char
| # OR
\\[bnfrt] # match an escape sequence
| # OR
\\u[a-fA-F\d]{4} # match a Unicode escape
| # OR
\\\1 # match an escaped quote
)* # close group 2 and repeat it zero or more times
\1 # match whatever group 1 matched
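The sketch below illustrates both points. As far as I know, in a JavaScript regex without the `u` flag a `\1` inside a character class is read as a legacy octal escape (the character U+0001) rather than as a backreference, which is why the class still accepts the delimiter. The second pattern is the one from this answer with `^` and `$` added for testing, as in the question; the variable names are made up.

```javascript
// Inside [...] the \1 is not a backreference, so a double quote is still
// accepted by this class.
const classWithBackref = /[^\1\\]/;
console.log(classWithBackref.test('"'));

// Lookahead version: consume a character only if it is neither the captured
// quote nor a backslash.
const quotedString = /^(['"])((?!\1|\\).|\\[bnfrt]|\\u[a-fA-F\d]{4}|\\\1)*\1$/;

console.log(quotedString.test('"Hello"World!"'));   // unescaped quote inside
console.log(quotedString.test('"Hello\\"World!"')); // escaped quote inside
console.log(quotedString.test("'it\\'s'"));         // single-quoted, escaped quote
```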
This should work too (raw regex).
If speed is a factor, this is the 'unrolled' method, said to be the fastest for this kind of thing.
(['"])(?:(?!\\|\1).)*(?:\\(?:[\/bfnrt]|u[0-9A-F]{4}|\1)(?:(?!\\|\1).)*)*\1
Expanded
(['"]) # Capture a quote
(?:
(?!\\|\1). # As many non-escape and non-quote chars as possible
)*
(?:
\\ # escape plus,
(?:
[\/bfnrt] # /,b,f,n,r,t or u[0-9A-F]{4} or captured quote
| u[0-9A-F]{4}
| \1
)
(?:
(?!\\|\1). # As many non-escape and non-quote chars as possible
)*
)*
\1 # Captured quote
Well, you can always just create a larger regex by just using the alternation operator on the smaller regexes
/(?:single-quoted-regex)|(?:double-quoted-regex)/
Or explicitly:
var string = /(?:^'(?:[^'\\]|\\'|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*'$)|(?:^"(?:[^"\\]|\\"|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*"$)/gi;
Finally, if you want to avoid the code duplication, you can build up this regex dynamically, using the RegExp constructor.
var quoted_string = function(delimiter){
return ('^' + delimiter + '(?:[^' + delimiter + '\\]|\\' + delimiter + '|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*' + delimiter + '$').replace(/\\/g, '\\\\');
// In the general case you could consider using a regex escaping function to avoid backslash hell.
};
var string = new RegExp( '(?:' + quoted_string("'") + ')|(?:' + quoted_string('"') + ')' , 'gi' );
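An alternative sketch of the same idea uses `String.raw`, so that the pattern source can be written with the same backslashes as in the regex literals and no doubling pass is needed. The helper name `quotedBody` and the variable `combinedString` are hypothetical. The `g` flag is left out on purpose here: a global regex keeps state in `lastIndex` between `test()` calls, which can make repeated tests on the same object give surprising results.

```javascript
// Builds the pattern source for a string delimited by the quote character q.
// String.raw keeps every backslash exactly as typed.
const quotedBody = (q) =>
  String.raw`${q}(?:[^${q}\\]|\\${q}|\\\\|\\\/|\\[bfnrt]|\\u[0-9A-F]{4})*${q}`;

// Either a single-quoted or a double-quoted string, anchored for testing.
const combinedString = new RegExp(`^(?:${quotedBody("'")}|${quotedBody('"')})$`, 'i');

console.log(combinedString.test('"Hello"World!"'));   // unescaped quote inside
console.log(combinedString.test('"Hello\\"World!"')); // escaped quote inside
console.log(combinedString.test("'single quoted'"));
```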

Replace spaces but not when between parentheses

I guess I could do this fairly easily with multiple regexes, but I want to replace all the spaces in a string, except when those spaces are between parentheses.
For example:
Here is a string (that I want to) replace spaces in.
After the regex I want the string to be
Hereisastring(that I want to)replacespacesin.
Is there an easy way to do this with lookahead or lookbehind operators?
I'm a little confused about how they work, and not really sure they would work in this situation.
Try this:
replace(/\s+(?=[^()]*(\(|$))/g, '')
A quick explanation:
\s+ # one or more white-space chars
(?= # start positive look ahead
[^()]* # zero or more chars other than '(' and ')'
( # start group 1
\( # a '('
| # OR
$ # the end of input
) # end group 1
) # end positive look ahead
In plain English: it matches one or more white space chars if either a ( or the end-of-input can be seen ahead without encountering any parenthesis in between.
An online Ideone demo: http://ideone.com/jaljw
The above will not work if:
there are nested parentheses
parentheses can be escaped
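The sketch below first applies the regex to the example sentence from the question, then shows one way to cope with nested parentheses by scanning the string and tracking depth. The scanner is an alternative technique, not part of the answer above; the function name is hypothetical, and escaped parentheses are still not handled.

```javascript
const sentence = 'Here is a string (that I want to) replace spaces in.';

// Regex from the answer: drop whitespace that can "see" a '(' or the end of
// the input without crossing any parenthesis first.
console.log(sentence.replace(/\s+(?=[^()]*(\(|$))/g, ''));

// Depth-counting scanner: keeps whitespace whenever we are inside at least
// one pair of parentheses, so nesting is handled.
function stripSpacesOutsideParens(text) {
  let depth = 0;
  let result = '';
  for (const ch of text) {
    if (ch === '(') depth += 1;
    else if (ch === ')' && depth > 0) depth -= 1;
    if (depth === 0 && /\s/.test(ch)) continue; // whitespace outside parentheses
    result += ch;
  }
  return result;
}

console.log(stripSpacesOutsideParens('a (b (c d) e) f'));
```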
