Why does this regex select the parenthesis and what follows even though I use a lookahead? - javascript

I use this regex
/\.(.+)?(?=(\(?)|\r\n)/gi
with
part1.part2
part1.part2(part3) part4
I want to match only .part2 in both cases,
but in the second case I get .part2(part3) part4

You should make the .+ part non-greedy, by using .+?, as otherwise it will also capture the opening parenthesis you want to see in the look-ahead part.
Also, in the look-ahead, don't make the \( optional; otherwise the look-ahead can succeed by matching nothing at all, so it never constrains the match.
Finally, don't match \r\n, but the end-of-line anchor $ in combination with the m flag (so that it matches the end of each line instead of the whole input).
So:
\.(.+?)(?=\(|$)
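As a quick check of the corrected pattern in JavaScript, here is a minimal sketch. The input is the two sample lines from the question joined with a newline; the variable names are only illustrative.

```javascript
// The two sample lines from the question, joined with a newline.
const sampleInput = "part1.part2\npart1.part2(part3) part4";

// Lazy .+? stops as soon as the look-ahead sees "(" or,
// because of the m flag, the end of the current line.
const lazyPattern = /\.(.+?)(?=\(|$)/gm;

const lazyMatches = sampleInput.match(lazyPattern);
console.log(lazyMatches); // [ '.part2', '.part2' ]
```

With the g flag, String.prototype.match returns the full matches only; use matchAll if you need capture group 1 (the text without the dot).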

You see the parenthesis in the match as the . can also match (.
The pattern matches the rest of the line after the first dot and never backtracks to a (, because the parenthesis in the lookahead is optional (\(?) and the assertion is therefore always true.
You could make use of a negated character class not crossing parenthesis or a newline when matching.
\.([^()\r\n]+)
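A small sketch of the negated character class approach in JavaScript, using the sample lines from the question (variable names are illustrative):

```javascript
// The two sample lines from the question, joined with a newline.
const sampleLines = "part1.part2\npart1.part2(part3) part4";

// [^()\r\n]+ can never run past a parenthesis or a line break,
// so no look-ahead and no m flag are needed.
const negatedPattern = /\.([^()\r\n]+)/g;

// matchAll exposes the capture groups; group 1 is the text after the dot.
const partNames = Array.from(sampleLines.matchAll(negatedPattern), (m) => m[1]);
console.log(partNames); // [ 'part2', 'part2' ]
```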

Related

How to not match given prefix in RegEx without negative lookbehind?

Goal
The goal is matching a string in JavaScript without certain delimiters, i.e. a string between two characters (the characters can be included in the match).
For example, this string should be fully matched: $ test string $. This can appear anywhere in a string. That would be trivial, however, we want to allow escaping the syntax, e.g. The price is 5\$ to 10\$.
Summarized:
Match any string that is enclosed by two $ signs.
Do not match it if the dollar signs are escaped using \$.
Solution using negative lookbehind
A solution that achieves this goal perfectly is: (?<!\\)\$(.*?)(?<!\\)\$.
Problem
This solution uses negative lookbehind, which is not supported on Safari. How can the same matches be achieved without using negative lookbehind (i.e. on Safari)?
A solution that partially works is (?<!\\)\$(.*?)(?<!\\)\$. However, this will also match the character in front of the $ sign if it is not a \.
You might rule out what you don't want by matching it, and capture what you want to keep in group 1:
\\\$.*?\$|\$.*?\\\$|(\$.*?\$)
You may use this regex and grab your inner text using capture group #1 as you are already doing in your current regex using lookbehind:
(?:^|[^\\])\$((?:\\.|[^$])*)\$
RegEx Details:
(?:^|[^\\]): Match start position or a non-backslash character in a non-capturing group
\$: Match starting $
(: Start capturing group
(?:\\.|[^$])*: Match any escaped character or a non-$ character. Repeat this group 0 or more times
): End capturing group
\$: Match closing $
PS: This regex will give the same matches as your current regex: (?<!\\)\$(.*?)(?<!\\)\$
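To see how the lookbehind-free pattern behaves, here is a hedged sketch in JavaScript. The helper name innerDollarText and the test strings are made up for illustration; only the regex comes from the answer.

```javascript
// Pattern from the answer: a "$" at the start of the input or after a
// non-backslash character, then escaped characters or non-"$" characters
// (captured in group 1), then the closing "$".
const dollarPattern = /(?:^|[^\\])\$((?:\\.|[^$])*)\$/g;

// Hypothetical helper: returns the text captured between unescaped "$" signs.
function innerDollarText(text) {
  return Array.from(text.matchAll(dollarPattern), (m) => m[1]);
}

const delimited = innerDollarText("foo $ test string $ bar");
const escapedOnly = innerDollarText("The price is 5\\$ to 10\\$.");

console.log(delimited);   // [ ' test string ' ]
console.log(escapedOnly); // []
```

Note that the full match (m[0]) includes the character in front of the opening $, which is why group 1 is used. Because that leading character is consumed, two delimited spans that touch directly, such as $a$$b$, will not both be found.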

How to form regex to match everything up to a "("

In JavaScript, how can a regular expression be formed to match everything up to and NOT including an opening parenthesis "("?
example input:
"12(pm):00"
"12(am):))"
"8(am):00"
I've found /^(.*?)\(/ to be successful with the "up to" part, but the match returned includes the "(".
On regex101.com, it says the first capturing group is what I'm looking for. Is there a way to return only the captured group?
There are three ways to deal with this. The first is to restrict the characters you match to not include the parenthesis:
let match = "12(pm):00".match(/[^(]*/);
console.log(match[0]);
The second is to only get the part of the match you are interested in, using capture groups:
let match = "12(pm):00".match(/(.*?)\(/);
console.log(match[1]);
The third is to use lookahead to explicitly exclude the parenthesis from the match:
let match = "12(pm):00".match(/.*?(?=\()/);
console.log(match[0]);
As in the OP, note the non-greedy modifier in the second and third cases: it is needed to stop the quantifier at the first open parenthesis in case there is another one further inside the string. It is not necessary in the first case, since the character class explicitly forbids the quantifier from gobbling up a parenthesis.
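The effect of the non-greedy modifier can be sketched as follows. The input string is a made-up variation of the question's examples with a second "(" added, so it is an assumption, not one of the OP's inputs.

```javascript
// Hypothetical input with a second "(" further inside the string.
const timeText = "12(pm):(00";

// Greedy: .* runs to the end and backtracks only to the LAST "(".
const greedyResult = timeText.match(/.*(?=\()/)[0];
// Lazy: .*? stops in front of the FIRST "(".
const lazyResult = timeText.match(/.*?(?=\()/)[0];
// Negated class: cannot cross a "(" at all, so greediness is harmless.
const negatedResult = timeText.match(/[^(]*/)[0];

console.log(greedyResult);  // 12(pm):
console.log(lazyResult);    // 12
console.log(negatedResult); // 12
```

Keep in mind that match returns null when the lookahead or capture-group patterns find no "(" at all, so real code should check the result before indexing into it.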
Try
^\d+
^ asserts position at start of a line
\d matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/r/C9XNT4/1

How to modify this hashtag regex to check if the second character is a-z or A-Z?

I'm building on a regular expression I found that works well for my use case. The purpose is to check for what I consider valid hashtags (I know there's a ton of hashtag regex posts on SO but this question is specific).
Here's the regex I'm using
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g
The only problem I'm having is I can't figure out how to check if the second character is a-z (the first character would be the hashtag). I only want the first character after the hashtag to be a-z or A-Z. No numbers or non-alphanumeric.
Any help is much appreciated; I'm a novice when it comes to regular expressions.
As I mentioned in the comments, you can replace [a-zA-Z0-9_]{1,20} with [a-zA-Z][a-zA-Z0-9_]{0,19} so that the first character is guaranteed to be a letter and then followed by 0 to 19 word characters (alphanumeric or underscore).
However, there are other unnecessary parts in your pattern. It appears that all you need is something like this:
/(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g
Breakdown of (?:^|\B):
(?: # Start of a non-capturing group (don't use a capturing group unless needed).
^ # Beginning of the string/line.
| # Alternation (OR).
\B # The opposite of `\b`. In other words, it makes sure that
# the `#` is not preceded by a word character.
) # End of the non-capturing group.
Note: You may also replace [a-zA-Z0-9_] with \w.
References:
Word Boundaries.
Difference between \b and \B in regex.
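Here is a small sketch of the simplified pattern applied to some text. The sample post is invented for illustration.

```javascript
// Simplified pattern: "#" not preceded by a word character, then one letter,
// then up to 19 more word characters, ending at a word boundary.
const hashtagPattern = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;

// Made-up sample text: two valid tags and three that should be rejected.
const post = "#hello #1abc #a1 foo#bar #_x";

const tags = post.match(hashtagPattern);
console.log(tags); // [ '#hello', '#a1' ]
```

#1abc and #_x fail because the first character after # is not a letter, and foo#bar fails because the # directly follows a word character, so \B does not hold there.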
The below should work.
(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)
If you only want to accept hashtags of two or more characters, then change {0,19} to {1,19}.
In your pattern, (?![0-9_]+\b) only asserts that what is directly to the right is not a run of digits and underscores ending at a word boundary. It still allows the first character to be a digit or an underscore as long as a letter follows later, as in #1a.
If you want, you can keep the [a-zA-Z0-9_]{1,20} part, but then you have to use a positive lookahead (?=[a-zA-Z]) instead, to assert that what is directly to the right is an upper- or lowercase a-z.
(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b
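A minimal sketch of the lookahead variant used as a yes/no check; the candidate strings are made up for illustration.

```javascript
// No g flag here: RegExp.prototype.test on a global regex keeps state
// in lastIndex between calls, which gives surprising results in a loop.
const lookaheadHashtag = /(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b/;

// Hypothetical candidates: the last two start with a digit or an underscore.
const candidates = ["#Tag_2024", "#a", "#9lives", "#_tag"];

const verdicts = candidates.map((c) => lookaheadHashtag.test(c));
console.log(verdicts); // [ true, true, false, false ]
```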

Matching Regex till a character

This is my string:
<address>tel+1234567890</address>
This is my regex:
([\d].*<)
which matches this:
1234567890<
but I don't want to match the last < character.
You can use a positive lookahead:
\d+(?=<)
The (?=...) syntax makes sure what's inside the parens matches at that position, without moving the match cursor forward, thus without consuming the input string. It's also called a zero-width assertion.
By the way, the square brackets in [\d] are redundant, so you can omit them. Also, I've changed the regex, but perhaps you really meant to match this:
\d.*?(?=<)
This pattern matches everything between a digit and a <, including the digit. It makes use of an ungreedy quantifier (*?) to match up until the first < if there are several.
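For comparison, here is a short JavaScript sketch running the original pattern and both lookahead patterns against the string from the question (the question does not name a language, so JavaScript is an assumption here):

```javascript
const addressHtml = "<address>tel+1234567890</address>";

// Original pattern: the "<" is consumed and ends up in the match.
const withBracket = addressHtml.match(/\d.*</)[0];
// Lookahead: "<" must follow, but it is not part of the match.
const digitsOnly = addressHtml.match(/\d+(?=<)/)[0];
// Lazy variant: anything from the first digit up to the first "<".
const upToBracket = addressHtml.match(/\d.*?(?=<)/)[0];

console.log(withBracket); // 1234567890<
console.log(digitsOnly);  // 1234567890
console.log(upToBracket); // 1234567890
```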
([\d]+)
This should work; try it out and let me know.
Also, as @LucasTrzesniewski said, you can use a lookahead:
(\d+.(?=<))

What does ?=^ mean in a regexp?

I want to write regexp which allows some special characters like #-. and it should contain at least one letter. I want to understand below things also:
/(?=^[A-Z0-9. '-]{1,45}$)/i
In this regexp what is the meaning of ?=^ ? What is a subexpression in regexp?
(?=...) is a lookahead: it looks ahead in the string to see whether the pattern matches, without actually consuming any characters.
^ means it matches at the BEGINNING of the input (for example with the string a test, ^test would not match as it doesn't start with "test" even though it contains it)
Overall, your expression says the input has to start (^) and end ($) with 1 to 45 ({1,45}) characters from your character class [A-Z0-9. '-] (case-insensitive because of /i). Because everything sits inside a lookahead, the match itself consumes nothing (a zero-length match).
?= is a positive lookahead
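The zero-length match can be observed directly in JavaScript. This is only a sketch; the sample inputs are invented.

```javascript
// The pattern from the question: the whole check lives inside a lookahead.
const namePattern = /(?=^[A-Z0-9. '-]{1,45}$)/i;

// Made-up inputs: one that satisfies the character class, one that does not.
const hit = namePattern.exec("O'Neil Jr.");
const miss = namePattern.exec("bad#name");

// The lookahead succeeds at position 0 but consumes nothing,
// so the matched text is the empty string.
console.log(hit[0] === "", hit.index); // true 0
console.log(miss);                     // null
```

So the pattern works as a validity test (match or no match), but it cannot be used to extract the text it checked.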
