What's wrong with the non capture group in my regular expression - javascript

I'm trying to write a regular expression that will match strings similar to the ones below:
Yu MSBE26
w AWAQBNL
I am using JavaScript and have come up with the following regular expression:
(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
In words, I start my capture group off by matching everything until the [AWMS\d]{2} pattern is encountered, then I match the [AWMS\d]{2} pattern, the [A-Z]{2} that follows and finally the [\dA-Za-z]{1,3} to match the final two or three characters.
From what I have read, this should be working, but I'm not getting any matches.
For example, when I use a regex tester I don't get any matches.

Remove the second [AWMS\d]{2} - it looks like an accidental addition and is the reason your regex doesn't work:
(.*?(?:[AWMS\d]{2})[A-Z]{2}[\dA-Za-z]{1,3})
Edit: you don't even need the non-capturing group; the character class with its quantifier is enough:
(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
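A quick way to check this is to run both patterns against the sample strings. The sketch below uses only the patterns and inputs from the question; the variable names are made up for illustration.

```javascript
// Pattern from the question and the simplified fix from this answer.
const original = /(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})/;
const corrected = /(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})/;

const samples = ["Yu MSBE26", "w AWAQBNL"];

// For each sample, record whether the original matches and what the fix captures.
const results = samples.map((s) => {
  const m = corrected.exec(s);
  return {
    input: s,
    originalMatches: original.test(s),
    correctedCapture: m ? m[1] : null,
  };
});

console.log(results);
```

The original pattern needs four consecutive characters from [AWMS\d] before the two uppercase letters, and neither sample has such a run, so only the corrected pattern should report a capture.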

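Your regex doesn't match your values because the pattern requires four consecutive characters from [AWMS\d] (two from the non-capturing group plus two more) before the two uppercase letters, and neither string has such a run.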
Your pattern is:
(.*?(?:[AWMS\d]{2})[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})
Yu MSBE26
     ^--- fails here
w AWAQBNL
     ^--- fails here
By the way, you can match your strings with this version of your regex:
(.*?[AWMS\d]{2}[A-Z]{2}[\dA-Za-z]{1,3})

Related

Is it possible to replace a negative "look around" in a regular expression with something that doesn't use a look-behind?

I have a regex that matches a single character x but not xx.
(?<!x)x(?!x)
This works great, but the regex engine in Firefox doesn't support look-behinds.
Is there any way to rewrite this so it doesn't use a look-behind?
Edit: the solution in the linked duplicate doesn't work in this case. If I replace the negative look-behind with
(?!^x)x(?!x)
It doesn't match correctly.
Here is a workaround:
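(?:^|[^x])x(?!x)
It matches x if it is preceded by the beginning of the line or by a non-x character. Note that the preceding character, when there is one, becomes part of the match.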
You could use non-capturing groups:
(?:[^x]|^)(x)(?:[^x]|$)
This searches for a single "x" in the string. The full match can span up to three characters, but the x itself is available as capture group 1 ($1).
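As a rough illustration of the look-behind-free approach, the sketch below combines the two ideas above: a non-capturing group for the left side, a capture group for the x, and the original negative look-ahead on the right. The function name and the test string are invented for this example.

```javascript
// Return the index of every "x" that has no "x" directly before or after it,
// without using a look-behind.
function lonelyXIndices(text) {
  const re = /(?:^|[^x])(x)(?!x)/g;
  const indices = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // The captured x is always the last character of the overall match.
    indices.push(m.index + m[0].length - 1);
  }
  return indices;
}

console.log(lonelyXIndices("x axxb cx xxx x")); // [0, 8, 14]
console.log(lonelyXIndices("xax"));             // [0, 2]
```

Keeping the look-ahead on the right side means the character after the x is not consumed. With (?:[^x]|$) on the right instead, an input such as "xax" would yield only the first x, because the "a" is consumed by the first match and is no longer available as the left-hand neighbour of the second x.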

Regex: Match until first occurrence met

I am trying to match up to the first occurrence of &. Right now my pattern matches up to the last occurrence of & instead.
My regular expression is
(?!^)(http[^\\]+)\&
And I'm trying to match against this text:
https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg
What I need is:
http://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052
Use the non-greedy mode like this:
/(?!^)(http[^\\]+?)&/
//               ^
In non-greedy mode (or lazy mode) the match will be as short as possible.
If you want to get rid of the &, just wrap it in a lookahead group so it won't be part of the match, like this:
/(?!^)(http[^\\]+?)(?=&)/
//                  ^^ ^
Or you could optimize the regular expression, as @apsillers suggested in the comment below, like this:
/(?!^)(http[^\\&]+)/
Note: & is not a special character, so you don't need to escape it.
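To see the difference between the variants, here is a small sketch. The input is a shortened, made-up URL with the same shape as the one in the question (example.com is a placeholder), and the variable names are only for illustration.

```javascript
// Shortened stand-in for the URL in the question (hypothetical host and path).
const text =
  "https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://example.com/article---4820052&ct3Dga&usg3DAFQ";

// Greedy: [^\\]+ runs to the end and backtracks to the LAST "&".
const greedy = text.match(/(?!^)(http[^\\]+)&/)[1];

// Lazy: [^\\]+? stops at the FIRST "&".
const lazy = text.match(/(?!^)(http[^\\]+?)&/)[1];

// Lookahead: same as lazy, but the "&" is not part of the overall match.
const lookahead = text.match(/(?!^)(http[^\\]+?)(?=&)/)[0];

// Negated class: exclude "&" so no lazy quantifier is needed.
const negated = text.match(/(?!^)(http[^\\&]+)/)[1];

console.log(greedy);    // http://example.com/article---4820052&ct3Dga
console.log(lazy);      // http://example.com/article---4820052
console.log(lookahead); // http://example.com/article---4820052
console.log(negated);   // http://example.com/article---4820052
```

The negated-class version usually needs the least backtracking, since the engine never has to try extending the match past an &.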

How to find which part/group of regular expression fails

I need to find which part of an expression fails. Let's say I have the expression ^-?(\\d*)(,\\d{1,3})*(?:[,]|([.]\\d{0,2}))?$ and I want to know whether it fails while matching the comma (,) or the decimal part. How can I find the unmatched group in a given regular expression?
Break it into smaller chunks and test that each part matches what you expect it to.
Also, as @Avinash Raj has mentioned, online regex checkers like regex101 are indispensable.
These tools highlight what has and hasn't been matched in a given set of data. This will show you where the regex is failing.
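One way to do the "smaller chunks" check in code is to build the pattern up piece by piece and report how much of the input each cumulative prefix consumes. This is only a sketch: the split into four parts, the part names, and the explain function are assumptions made for this example, and a greedy prefix match does not always consume exactly what the full pattern would.

```javascript
// The expression from the question, split into named pieces (the split is hypothetical).
const parts = [
  { name: "sign", src: "-?" },
  { name: "leading digits", src: "(\\d*)" },
  { name: "comma groups", src: "(,\\d{1,3})*" },
  { name: "trailing comma or decimals", src: "(?:[,]|([.]\\d{0,2}))?" },
];

function explain(input) {
  let src = "^";
  let consumed = 0;
  const steps = [];
  for (const part of parts) {
    src += part.src;
    // Match the cumulative prefix pattern against the start of the input.
    const m = new RegExp(src).exec(input);
    consumed = m ? m[0].length : consumed;
    steps.push({ name: part.name, consumed });
  }
  return {
    full: new RegExp(src + "$").test(input), // does the whole pattern match?
    steps,
    leftover: input.slice(consumed),          // text no part could account for
  };
}

console.log(explain("1,234.567"));
console.log(explain("12,3456"));
```

For "1,234.567" the decimal part consumes ".56" and the leftover is "7", which points at the \d{0,2} limit. For "12,3456" the comma group stops after ",345" and the leftover is "6", which points at the \d{1,3} limit.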

How to write a RegEx to check for a certain, specific number of characters?

I am trying to test a string for a state code, the regex I have is
^A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]$
The issue is, if I have something like "CTA12" as a test string, it will get a match of CT. How can I modify my regex to make it only match state codes that are not part of a larger string?
Your use of anchors with alternation is incorrect: ^AB|DC$ means "strings that start with AB or end with DC". To get the ^ and $ to both apply to each element of the alternation, you need to put the alternation in a group, for example ^(AB|DC)$.
Try changing your regex to the following:
^(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
The alternative to using a group is to put the ^ and $ as a part of each element in the alternation, for example ^AB$|^DC$, but that would make your regex significantly longer so a group is the way to go.
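The effect of the group is easy to check. The sketch below uses a shortened version of the state-code pattern with only three of the alternatives kept, purely to keep the example readable.

```javascript
// Shortened patterns (only three alternatives from the question are kept).
const ungrouped = /^A[LKSZRAEP]|C[AOT]|W[AIVY]$/;
const grouped = /^(A[LKSZRAEP]|C[AOT]|W[AIVY])$/;

// Without the group, the middle alternative C[AOT] is not anchored at all.
console.log(ungrouped.test("CTA12")); // true, matches "CT"
console.log(grouped.test("CTA12"));   // false
console.log(grouped.test("CT"));      // true
```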

Alternation operator inside square brackets does not work

I'm creating a JavaScript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
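A short sketch of the difference, using anchored test patterns that are made up for this example: inside square brackets, every character (including |) is just one member of the set, so the class matches exactly one character.

```javascript
// Character class: matches ONE character out of w, d, |, o, r, q.
const charClass = /^[wd|word|qw]{1}=$/;
// Group with alternation: matches one of the whole words.
const group = /^(?:wd|word|qw)=$/;

console.log(charClass.test("wd="));   // false, the class only covers one character
console.log(charClass.test("|="));    // true, "|" is a literal member of the class
console.log(group.test("wd="));       // true
console.log(group.test("word="));     // true
console.log(group.test("w="));        // false
```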
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf. That is because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (\.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
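To illustrate the point about the unescaped dot, here is a minimal comparison. The two URLs are invented test inputs, and the only difference between the patterns is the escaped dot in baidu\.com.

```javascript
// Same pattern as above, with and without the escaped dot.
const loose = /.*baidu.com.*[\/?].*(wd|word|qw)=/;
const strict = /.*baidu\.com.*[\/?].*(wd|word|qw)=/;

const real = "http://www.baidu.com/s?wd=test";  // hypothetical query URL
const fake = "http://www.baiduXcom/s?wd=test";  // "X" where the dot should be

console.log(loose.test(real));  // true
console.log(loose.test(fake));  // true, because "." accepts the "X"
console.log(strict.test(real)); // true
console.log(strict.test(fake)); // false
```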
