Regex: Match until first occurrence met - javascript

I am trying to match up to the first occurrence of &. Right now it matches up to the last occurrence of & instead.
My regular expression is
(?!^)(http[^\\]+)\&
And I'm trying to match against this text:
https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg
What I need is:
http://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052

Use the non-greedy mode like this:
/(?!^)(http[^\\]+?)&/
//               ^
In non-greedy mode (or lazy mode) the match will be as short as possible.
If you want to get rid of the &, wrap it in a lookahead group so that it is not part of the match, like this:
/(?!^)(http[^\\]+?)(?=&)/
//                  ^^ ^
Or you could optimize the regular expression, as @apsillers suggested in the comments below, by excluding & from the character class:
/(?!^)(http[^\\&]+)/
Note: & is not a special character, so you don't need to escape it.
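To see the three variants side by side, here is a minimal sketch that runs them against the URL from the question. The variable names are only illustrative.

```javascript
// The sample URL from the question.
const input = "https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg";

const greedy = /(?!^)(http[^\\]+)&/;     // greedy: runs on to the last &
const lazy = /(?!^)(http[^\\]+?)(?=&)/;  // lazy: stops before the first &
const negated = /(?!^)(http[^\\&]+)/;    // negated class: cannot cross an & at all

// Group 1 holds the captured URL in each case.
const greedyUrl = input.match(greedy)[1];
const lazyUrl = input.match(lazy)[1];
const negatedUrl = input.match(negated)[1];

console.log(lazyUrl);
console.log(negatedUrl === lazyUrl);            // true
console.log(greedyUrl.length > lazyUrl.length); // true
```

The lazy and negated-class versions capture the same text here; the negated class simply avoids the backtracking that the lazy quantifier needs.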

Related

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
In the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it matches every IF.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for test.
Need to work with JS.
I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It consists of three groups:
(?:^|\s) - Matches either the beginning of the line or a single whitespace character.
(IF|ELSE) - Matches one of your keywords.
(?:$|\s) - Matches either the end of the line or a single whitespace character.
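As a rough check, the pattern can be run against the lines from the question. This sketch assumes the g and m flags and uses illustrative variable names.

```javascript
// The five sample lines from the question, joined into one multiline string.
const source = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// g: find every occurrence; m: let ^ and $ match at line boundaries.
const keyword = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// Keep only the captured keyword (group 1), not the surrounding whitespace.
const found = Array.from(source.matchAll(keyword), (m) => m[1]);

console.log(found); // [ 'IF', 'IF' ]
```

Because the surrounding whitespace is part of the overall match, read the keyword from group 1 rather than from the full match.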
You can do it with lookaround (lookahead + lookbehind). This is closer to what you really want, because it states the condition explicitly: instead of checking for other characters such as the start of the string or whitespace around the match, it matches exactly "IF or ELSE not surrounded by dots".
/(?<!\.)(IF|ELSE)(?!\.)/g
Explanation:
Use the g flag to find all occurrences.
(?<!X)Y is a negative lookbehind, which matches a Y not preceded by an X.
Y(?!X) is a negative lookahead, which matches a Y not followed by an X.
Working example: https://regex101.com/r/oS2dZ6/1
PS: lookbehind is only available in JavaScript engines that implement ES2018 or later. If you don't have to write the regex for JS, use a flavor that supports lookbehind, such as PCRE on regex101.com.
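Here is a small sketch of the lookaround version, assuming an engine with lookbehind support (ES2018 or later). It counts the matches on each sample line; the names are illustrative.

```javascript
const lines = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
];

// IF or ELSE, not preceded by a dot and not followed by a dot.
const notDotted = /(?<!\.)(IF|ELSE)(?!\.)/g;

// String.prototype.match returns null when nothing matches, hence the fallback.
const matchesPerLine = lines.map((line) => (line.match(notDotted) || []).length);

console.log(matchesPerLine); // [ 1, 1, 0, 0, 0 ]
```

As written, the pattern has no word boundaries, so it would also match the IF inside a longer word such as GIFT. Adding \b around the group is one way to tighten it if that matters for your input.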

JavaScript Alternation without parenthesis

I have a string in JavaScript which looks like the following:
"This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows"
And I have a regex which is used to extract the strings between $|.*?$| and {{.*?}} as follows:
/{{(.*?)}}|\$\|(.*?)\$\|/g
Example: https://regex101.com/r/mV3uR1/1
I would like to combine the alternation so there is only one matching group, e.g.:
/{{|\$\|(.*?)}}|\$\|/g
But this seems to ignore my quantifier for 0 or 1 times (the ?) and it matches the entire string up to ... {{as.
Example: https://regex101.com/r/qZ7iI5/1
Why is that happening?
If I enhance that regex to include parentheses as follows, it does work:
/({{|\$\|)(.*?)(}}|\$\|)/g
Example: https://regex101.com/r/fT5qH0/1
But this then includes the curly braces/dollar-pipe in my matching group which is what I am trying to avoid (as I only care about the string between these delimiters so only want one matching group).
Can anybody shed some light on this please?
Let's compare:
working regex:
/({{|\$\|)(.*?)(}}|\$\|)/g
and not working regex:
/{{|\$\|(.*?)}}|\$\|/g
In the second regex, (.*?) has to be followed by }}, and the alternation applies to the whole \$\|(.*?)}} sub-pattern, so it effectively means match:
{{ OR \$\|(.*?)}} OR \$\|
In the first regex, by contrast, the grouping makes the alternation apply correctly before and after (.*?).
You can use non-capturing groups as well:
/(?:{{|\$\|)(.*?)(?:}}|\$\|)/g
Now it means:
{{ OR \$\| followed by (.*?) followed by }} OR \$\|.
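The difference can be checked on the string from the question. This is a minimal sketch with illustrative variable names; it relies on { and } being treated as literals in a JavaScript regex without the u flag.

```javascript
const text = "This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows";

// Ungrouped: three alternatives, namely "{{", "$|...}}" and "$|".
const ungrouped = /{{|\$\|(.*?)}}|\$\|/g;
// Grouped with non-capturing groups: opening delimiter, content, closing delimiter.
const delimited = /(?:{{|\$\|)(.*?)(?:}}|\$\|)/g;

// Group 1 of every match, in order of appearance.
const ungroupedInner = Array.from(text.matchAll(ungrouped), (m) => m[1]);
const inner = Array.from(text.matchAll(delimited), (m) => m[1]);

console.log(ungroupedInner); // [ undefined, 'test$| string with $|#string$| delimiters {{as' ]
console.log(inner);          // [ '#is', 'test', '#string', 'as' ]
```

With the non-capturing groups there is only one capturing group, so m[1] is always the text between the delimiters.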
(?:{{|\$\|)(.*?)(?:}}|\$\|)
 ^^              ^^
You can try this; see the demo. By making the other groups non-capturing, you will have only one group.
https://regex101.com/r/cT0hV4/13
Note that this will also match mixed delimiters such as {{asd$|.
Your regex {{|\$\|(.*?)}}|\$\| will match any of the following three alternatives:
{{
\$\|(.*?)}} (look at the start and end of this alternative and you will understand)
\$\|
That is the reason you are getting that match.

What's wrong with the non capture group in my regular expression

I'm trying to write a regular expression that will match strings similar to the ones below:
Yu MSBE26
w AWAQBNL
I am using Javascript and have come up with the following regular expression:
(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
In words, I start my capture group off by matching everything until the [AWMS\d]{2} pattern is encountered, then I match the [AWMS\d]{2} pattern, the [A-Z]{2} that follows and finally the [\dA-Za-z]{1,3} to match the final one to three characters.
From what I have read, this should be working, but I'm not getting any matches.
For example, when I use a regex tester I don't get any matches.
Remove the second [AWMS\d]{2} - it looks like an accidental addition and is the reason your regex doesn't work:
(.*?(?:[AWMS\d]{2})[A-Z]{2}[\dA-Za-z]{1,3})
Edit: you don't even need the non capture group, the square brackets are enough:
(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
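A quick sketch comparing the original pattern with the simplified one on the two sample strings (variable names are illustrative):

```javascript
const samples = ["Yu MSBE26", "w AWAQBNL"];

// Original: needs four consecutive characters from [AWMS\d] before the two capitals.
const original = /(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})/;
// Simplified: the duplicated [AWMS\d]{2} and the non-capturing group are removed.
const fixed = /(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})/;

const originalHits = samples.map((s) => original.test(s));
const fixedHits = samples.map((s) => {
  const m = s.match(fixed);
  return m ? m[1] : null; // group 1, or null when there is no match
});

console.log(originalHits); // [ false, false ]
console.log(fixedHits);    // [ 'Yu MSBE26', 'w AWAQBNL' ]
```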
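Your regex doesn't match your values because the pattern requires four consecutive characters from [AWMS\d] (two, then two more), and neither string contains such a run.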
Your pattern is:
(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
Yu MSBE26
     ^--- fails here
w AWAQBNL
     ^--- fails here
By the way, you can match your strings with this version of your regex:
(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})

What does ?=^ mean in a regexp?

I want to write a regexp that allows some special characters like #-. and requires at least one letter. I would also like to understand the following:
/(?=^[A-Z0-9. '-]{1,45}$)/i
In this regexp what is the meaning of ?=^ ? What is a subexpression in regexp?
(?=...) is a lookahead: it looks ahead in the string to see whether the pattern matches, without consuming any characters.
^ means it matches at the BEGINNING of the input (for example with the string a test, ^test would not match as it doesn't start with "test" even though it contains it)
Overall, your expression is saying it has to ^ start and $ end with 1-45 {1,45} items that exist in your character group [A-Z0-9. '-] (case insensitive /i). The fact that it is within a lookahead just means it is not going to consume anything (it produces a zero-length match).
?= is a positive lookahead
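To make the zero-length behaviour concrete, here is a minimal sketch. The second pattern, withLetter, is not from the question: it is one possible way to add the "at least one letter" requirement with an extra lookahead, and it keeps the character class from the question (which does not include #).

```javascript
// The pattern from the question: the whole check lives inside a lookahead.
const allowed = /(?=^[A-Z0-9. '-]{1,45}$)/i;

const result = "O'Neil".match(allowed);
console.log(result[0] === ""); // true: the match succeeded but consumed nothing
console.log(result.index);     // 0

// Hypothetical variant: a second lookahead requires at least one letter,
// then the allowed characters are consumed normally.
const withLetter = /^(?=.*[A-Z])[A-Z0-9. '-]{1,45}$/i;

console.log(withLetter.test("Suite 4-B")); // true
console.log(withLetter.test("123-45"));    // false: no letter
console.log(withLetter.test("hello!"));    // false: ! is not in the class
```

Adjust the character class if other special characters such as # should be accepted.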

How to write a RegEx to check for a certain, specific number of characters?

I am trying to test a string for a state code, the regex I have is
^A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]$
The issue is, if I have something like "CTA12" as a test string, it will get a match of CT. How can I modify my regex to make it only match state codes that are not part of a larger string?
Your use of anchors with alternation is incorrect: ^AB|DC$ means "strings that start with AB or end with DC". To get the ^ and $ to both apply to each element of the alternation, you need to put the alternation in a group, for example ^(AB|DC)$.
Try changing your regex to the following:
^(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
The alternative to using a group is to put the ^ and $ in each element of the alternation, for example ^AB$|^DC$, but that would make your regex significantly longer, so a group is the way to go.
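The effect of the group can be checked with the "CTA12" example from the question. This is a minimal sketch; the variable names are illustrative.

```javascript
// Anchors bind only to the first and last alternatives here.
const loose = /^A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]$/;
// With the group, ^ and $ apply to every alternative.
const strict = /^(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$/;

const looseMatch = "CTA12".match(loose);

console.log(looseMatch[0]);        // 'CT'
console.log(strict.test("CTA12")); // false
console.log(strict.test("CT"));    // true
```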
