For context I am using Mongoose and regex to match a string in a database using find().
Given an example string {W}{W}{U}{U}{B}{B}{R}{R}{G}{G} I need to match occurrences of certain letters. I'm trying to make a RegExp that will match only when I have the required number of letters.
{W}{W}{U}{U}{B}{B}{R}{R}{G}{G} => wwuubbrrgg, ggrrbbuuww, wuwubrbrgg, etc
{W}{W}{U} => wwu, wuw, uww, etc
Solutions I found were not able to account for the order of the string being somewhat random and multiple letters potentially being in the same bracket: {U/R}. Because of that I only want to take into account the actual letters and only match when it's found the sufficient number of letters and not encountered any letters that are not present.
Regex is really, really bad at counting. Wanting a specific number of a specific character in no specific order is not something regex handles well. It can be done, but not with any reasonable measure of efficiency. As an example, here is a working regex for your scenario:
^(?=[^wW\n]*[wW][^wW\n]*[wW][^wW\n]*)(?=[^uU\n]*[uU][^uU\n]*[uU][^uU\n]*)(?=[^bB\n]*[bB][^bB\n]*[bB][^bB\n]*)(?=[^rR\n]*[rR][^rR\n]*[rR][^rR\n]*)(?=[^gG\n]*[gG][^gG\n]*[gG][^gG\n]*).{10}$
As we can see, it's very, very long for something so simple. That's because this behavior is not really what regex is designed for, as the desired functionality isn't much of a pattern. I would personally recommend going through and simply counting occurrences of each character. But, if you're dead set on regex, here's the breakdown:
^(?=[^wW\n]*[wW][^wW\n]*[wW][^wW\n]*)(?=[^uU\n]*[uU][^uU\n]*[uU][^uU\n]*)(?=[^bB\n]*[bB][^bB\n]*[bB][^bB\n]*)(?=[^rR\n]*[rR][^rR\n]*[rR][^rR\n]*)(?=[^gG\n]*[gG][^gG\n]*[gG][^gG\n]*).{10}$
^ //anchor to start of string
(?= //start lookahead
[^wW\n]* //any number of characters that aren't a 'w' or new line
[wW] //followed by the first instance of a character we're looking for
[^wW\n]* //any number of characters that aren't a 'w' or new line
[wW] //followed by the second instance of a character we're looking for
[^wW\n]* //any number of characters that aren't a 'w' or new line
) //end lookahead
... //repeat this for every character we want to be sure is in the string
.{10} //now actually match the ten characters, now that we know the number of each is correct
$ //then validate that that takes us to the end of the string
EDIT: Actually, this regex can be reduced slightly down to:
^(?=[^wW\n]*[wW][^wW\n]*[wW])(?=[^uU\n]*[uU][^uU\n]*[uU])(?=[^bB\n]*[bB][^bB\n]*[bB])(?=[^rR\n]*[rR][^rR\n]*[rR])(?=[^gG\n]*[gG][^gG\n]*[gG]).{10}$
Essentially, this just gets rid of the final negated character class in each lookahead. It is not necessary, since we are constraining the total match length to the sum of the per-character requirements. That condition is enough to guarantee that we do not have MORE than 2 of any given character. Still, I'd avoid the regex solution to this problem: in the time taken to generate and run this regex for a given combination of characters, you could already have counted the instances of each character and arrived at the same result.
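For comparison, the counting approach recommended above could look roughly like this in JavaScript. This is only a sketch: the function name hasExactLetterCounts and the shape of the `required` object are made up for illustration, and it runs in application code rather than inside a Mongoose find() query.

```javascript
// Hypothetical helper (not from the original answer): returns true when
// `text` contains exactly the required number of each letter and no
// letters outside the requirement. Non-letters such as braces are ignored.
function hasExactLetterCounts(text, required) {
  const counts = {};
  for (const ch of text.toLowerCase()) {
    if (ch >= "a" && ch <= "z") {
      counts[ch] = (counts[ch] || 0) + 1;
    }
  }
  // Every required letter must appear exactly the requested number of times.
  const allPresent = Object.keys(required).every(
    (letter) => counts[letter] === required[letter]
  );
  // No letter outside the requirement may appear.
  const noExtras = Object.keys(counts).every((letter) =>
    Object.prototype.hasOwnProperty.call(required, letter)
  );
  return allPresent && noExtras;
}

const need = { w: 2, u: 2, b: 2, r: 2, g: 2 };
console.log(hasExactLetterCounts("wuwubrbrgg", need)); // true
console.log(hasExactLetterCounts("{W}{W}{U}{U}{B}{B}{R}{R}{G}{G}", need)); // true
console.log(hasExactLetterCounts("wwuubbrrg", need)); // false (only one g)
```

Because non-letters are skipped, the same check works on the bracketed form of the string as well as on the bare letters.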
I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?
Yes
Just like + means one or more you can use {3,30} to match between 3 and 30
For example [a-z]{3,30} matches between 3 and 30 lowercase alphabet letters
From the documentation of the Pattern class
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That pattern requires trailing whitespace. If you don't want that, you can use the following (letter+space 2-29 times, then a final letter):
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like whitespace to count toward the character limit, divide the repeat counts by 2 to get:
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's .length property. It's more readable.
This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} matches (\s?[a-zA-Z]) 2 to 29 times
Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match "AAA" and "A A", and will fail to match "AA  A", since that contains two consecutive spaces.
I tested this at http://regexpal.com/ and it does the trick.
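A quick way to check the combined pattern in JavaScript is sketched below, next to a variant that tests .length separately. The function names are made up for illustration, and the two versions are only expected to agree on ordinary single-line input.

```javascript
// The combined pattern from this answer: a lookahead bounds the total
// length, the rest allows letters separated by at most one whitespace.
const combinedPattern = /(?=^.{3,30}$)^([A-Za-z][\s]?)+$/;

// Hypothetical wrapper around the combined pattern.
function isValidByRegex(text) {
  return combinedPattern.test(text);
}

// Hypothetical alternative: check the length in code, keep the regex
// only for the letters-and-single-spaces rule.
function isValidByLength(text) {
  return text.length >= 3 && text.length <= 30 && /^([A-Za-z]\s?)+$/.test(text);
}

for (const sample of ["AAA", "A A", "AA  A", "AA"]) {
  console.log(JSON.stringify(sample), isValidByRegex(sample), isValidByLength(sample));
}
```

"AAA" and "A A" pass both checks; "AA  A" (two consecutive spaces) and "AA" (too short) fail both.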
You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}
I'm trying to parse shorthand notation into an integer representation. This works fine for Hours, Seconds, and Minutes, but not with Milliseconds, where the regex is failing to match.
'50ms'.match(/^(\d+)([MS|S|M|H|ms|s|m|h])$/);
I wasn't sure how to phrase the question correctly, but I did perform several searches prior to asking here.
jsfiddle
If you need to match sequences of characters, you need to use alternation groups defined with (...|...) constructs.
A character class only matches a single character defined in it. See more details on character class here.
Your regex does not work with milliseconds because it requires exactly 1 character after the digits, followed immediately by the end of the string. Thus, there is no room for the 2 letters "ms".
So, the correct way is to use
'50ms'.match(/^(\d+)(MS|S|M|H|ms|s|m|h)$/);
As Tushar suggests, you can further contract the pattern using /i modifier and reducing the number of alternatives.
/^(\d+)(MS|ms|[SMH])$/i
See this demo
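Building on the contracted pattern, a minimal sketch of converting the shorthand into an integer number of milliseconds might look like this. The names UNIT_TO_MS and parseDuration are hypothetical, and returning null for unparseable input is an assumption, not something the question specifies.

```javascript
// Hypothetical lookup table: milliseconds per unit.
const UNIT_TO_MS = { ms: 1, s: 1000, m: 60 * 1000, h: 60 * 60 * 1000 };

// Hypothetical helper: parses strings such as "50ms" or "2h" into
// milliseconds, or returns null when the input does not match.
function parseDuration(text) {
  // An alternation group is needed so that the two-letter unit "ms"
  // can match; a character class would only match a single character.
  const match = /^(\d+)(ms|[smh])$/i.exec(text);
  if (!match) return null;
  return Number(match[1]) * UNIT_TO_MS[match[2].toLowerCase()];
}

console.log(parseDuration("50ms")); // 50
console.log(parseDuration("2h"));   // 7200000
console.log(parseDuration("5M"));   // 300000
console.log(parseDuration("10x"));  // null
```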
Say I have a string which contains some units (which may or may not have prefixes) that I want to break into the individual units. For example the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft" where mBtu(th) is the bastardised unit milli thermochemical BTU's (this is purely an example).
I currently have the following (simplified) regex however it fails for the case "mBtu(th).ft":
/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g
Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?
Additional notes
The regex presented above is greatly simplified around the prefixes and units groups. The prefixes could span multiple characters like 'Ki' and therefore character sets are not suitable.
The desire is for each match to capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].
The prefix match needs to be lazy so that the string 'm' would be matched as the unit metres rather than the prefix milli. Likewise the match for 'mm' would need to be the prefix milli and the unit metres.
I would try with:
/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g
At least with the example above, it matches all units merged into one string.
DEMO
EDIT
Another try (DEMO):
/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g
This one again matches only one part at a time, but if you use $1, $2, $3, $4, etc. (DEMO) you can extract the other fragments. It ignores the ., ( and ) characters. The difficulty is keeping track of which groups matched, but it works to some degree.
Or if you accept multiple separate matches I think simple alternative is:
/(m|k|Btu|th|ft)/g
A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:
[mk]??(Btu\(th\)|ft|m)(?!\w)
Demo
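As a sketch of how this could be applied in JavaScript: the version below is a slight adaptation of the pattern above, swapping [mk]?? for the capturing group (m|k)?? from the question so that the prefix is available as group 1. The function name splitUnits is made up for illustration, and a missing prefix is reported as an empty string.

```javascript
// Adapted pattern: lazy optional prefix (group 1), unit (group 2), and a
// negative lookahead so the unit is not followed by a word character.
const unitPattern = /(m|k)??(Btu\(th\)|ft|m)(?!\w)/g;

// Hypothetical helper: returns one [prefix, unit] pair per match.
function splitUnits(text) {
  return Array.from(text.matchAll(unitPattern), (match) => [
    match[1] || "", // group 1 is undefined when no prefix matched
    match[2],
  ]);
}

console.log(splitUnits("mBtu(th).ft")); // [ [ 'm', 'Btu(th)' ], [ '', 'ft' ] ]
console.log(splitUnits("mm"));          // [ [ 'm', 'm' ] ]
console.log(splitUnits("m"));           // [ [ '', 'm' ] ]
```

The lazy quantifier makes a lone 'm' parse as the unit metres, while 'mm' falls back to prefix milli plus unit metres because the first attempt fails the lookahead.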
I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.
/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g
https://regex101.com/r/eQ5nR4/2
If you don't care about being able to match the parentheses but just return the elements you can just do
/(m|k?)(\w+)/g
https://regex101.com/r/oC1eP5/1
I know how to write a regex that validates that a string contains only letters and numbers, with no white space:
/^[0-9a-zA-Z]+$/
but how do I add to this regex also such that it cannot contain just numbers, so for example this is not valid:
08128912382
Any ideas?
"Must contain only letters and numbers and at least one letter" is equivalent to "must contain a letter surrounded by numbers or letters":
/^[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*$/
I would like to add that this answer shows a way you can think about the problem so writing the regexp is simpler. It is not meant to be the best solution to the problem. I just took what you had and gave it a nudge in the right direction.
With several more nudges, you end up with other different answers (posted by ZER0, Tomalak and OGHaza respectively) :
You could notice that if there is a letter in the first or last group, the middle part is satisfied. In other words, since you have the middle part, you don't need to allow letters in the first or last part (but not both!):
/^[0-9]*[a-zA-Z][0-9a-zA-Z]*$/ - some numbers, followed by a letter, followed by some more numbers and letters
/^[0-9a-zA-Z]*[a-zA-Z][0-9]*$/ - equivalent if you read from the end
Knowing about lookaheads you can assert that there is at least one letter in the string:
/^(?=.*[a-zA-Z])/ - matches the start of any string that contains at least 1 letter
Or the other way around, as you expressed it, assert that there aren't only numbers in the string:
/^(?!\d+$)/ - matches the start of any string which doesn't contain just digits
The 2nd and 3rd solutions should also be combined with your original regexp that validates that the string contains only the characters you want it to (letters and numbers)
I for one am particularly fond of the 2nd solution, which is, I believe, the fastest of all attempted so far.
A look-ahead can do it:
/^(?=.*[a-z])[0-9a-z]+$/i
I think the most elegant solution is a negative lookahead to check it's not only numbers
/^(?!\d+$)[0-9a-zA-Z]+$/
RegExr Example
So basically you need at least one letter to be in the string. In that case you can just check for the presence of a letter, possibly preceded by one or more numbers, and possibly followed by letters and numbers:
/^[0-9]*[a-z][0-9a-z]*$/i
Notice that it will return true if you test it against a string like "A", for instance, because in this case all the numbers are considered optional.
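To compare the variants from the answers above side by side, here is a small sketch in JavaScript. The object name validators and the sample strings are made up for illustration; the three patterns are the ones quoted in the answers.

```javascript
// The three full-string variants quoted in the answers above.
const validators = {
  positiveLookahead: /^(?=.*[a-z])[0-9a-z]+$/i,
  negativeLookahead: /^(?!\d+$)[0-9a-zA-Z]+$/,
  noLookahead: /^[0-9]*[a-z][0-9a-z]*$/i,
};

// Hypothetical helper: runs one sample through every variant.
function checkAll(sample) {
  const result = {};
  for (const [name, pattern] of Object.entries(validators)) {
    result[name] = pattern.test(sample);
  }
  return result;
}

for (const sample of ["08128912382", "abc123", "123abc", "A", "abc 123"]) {
  console.log(JSON.stringify(sample), checkAll(sample));
}
```

All three reject "08128912382" (digits only) and "abc 123" (contains a space), and accept "abc123", "123abc" and "A".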
Finding a specific string is relatively easy, but I am not sure where to begin on this one. I would need to extract a string that would be different every time, but with similar characteristics.
Here are some example strings I need to find in a paragraph, either at the beginning, end or somewhere in the middle.
7b.9t.7iv.4x
4ir.4i.5i.6t
7ix.7t.4t.0z
As you can see the string will always begin with a number, and would have up to 2 characters after it and will always contain 4 octets separated by dots.
Let me know if you may need more details.
EDIT:
Thanks to the answer below I came up with this. While not pretty, it does what I need.
$body="test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";
$var=preg_match_all("([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})",$body,$matches);
$count=count($matches[0]);
$stack = array();
while($count > 0){
    $count--;
    array_push($stack, "<span id='ip_".$matches[0][$count]."'>".$matches[0][$count]."</span>");
}
$stack=array_reverse($stack);
$body=str_replace($matches[0],$stack,$body);
You can use a regular expression.
Something like this to get you started. There may be a better way to match since it's repeated, but....
([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})
( Start a capture group
[0-9] match any character 0 through 9
[a-z] match any character [a-z]
{1,2} but only match the previous 1 or 2 times
\. match a literal . the \ is needed as an escape because . is a special character
) End capture group
Both php and javascript allow for regular expression use.
For an even better visual representation you can check out this tool: http://www.debuggex.com/
If you need each octet by itself (as a match) you can add more parenthesis () around each [0-9][a-z]{1,2} which will then store those octets individually.
Also note that \d is the same as [0-9], but I prefer the latter as I find it a little more readable.
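Since the pattern repeats the same piece four times, one possible shorter form uses a repeated non-capturing group. The sketch below shows it in JavaScript (the answer notes that both PHP and JavaScript support regular expressions); the function name findOctetStrings is made up for illustration, and the pattern is intended to find the same strings as the long form above.

```javascript
// One digit plus 1-2 lowercase letters, then three more ".digit+letters" parts.
const octetPattern = /\d[a-z]{1,2}(?:\.\d[a-z]{1,2}){3}/g;

// Hypothetical helper: returns every matching string found in `body`.
function findOctetStrings(body) {
  return body.match(octetPattern) || []; // match() returns null when nothing is found
}

const body = "test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";
console.log(findOctetStrings(body)); // [ '1f.9t.7iv.4x', '1a.9a.7ab.4xa' ]
```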