I am trying to write a regex that matches only integers (e.g., 23, 234, 45) and does not select numbers with decimal points.
For context:
I need it in a larger regex that I am writing to convert mixed-fraction LaTeX input.
For example:
5\frac{7}{8}
But it should not select LaTeX such as:
3.5\frac{7}{8}
The regex that I have so far is:
(^(.*)(?!(\.))(.*))\\frac{([^{}]+(?:{(?:[^{}]+)}|))}{([^{}]+(?:{(?:[^{}]+)}|))}
But it matches integer and decimal numbers alike. I need to change the regex for group 1.
Maybe this will do it for you:
(?<!\d\.)\b(\d+)\\frac{([^{}]+(?:{(?:[^{}]+)}|))}{([^{}]+(?:{(?:[^{}]+)}|))}
It captures an integer expression before \frac, unless it's preceded by a digit and a full stop.
(?<!\d\.) This ensures the number isn't preceded by a digit followed by a full stop, i.e. by the integer part of a floating-point number.
\b Must be at the start of a number (to make sure we don't match just the end of a multi-digit number).
(\d+) Captures the integer number
\\frac Matches the string "\frac"
The rest is the same as your original expression.
See it here at regex101.
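A minimal JavaScript sketch of the lookbehind version, assuming a runtime that supports lookbehind (e.g. a recent Node.js). The helper parseMixedFraction is hypothetical; since the question doesn't say what the mixed fraction is converted into, it simply returns the three captured parts.

```javascript
// Lookbehind version: the integer part must not follow "digit + full stop".
const MIXED_FRACTION =
  /(?<!\d\.)\b(\d+)\\frac{([^{}]+(?:{(?:[^{}]+)}|))}{([^{}]+(?:{(?:[^{}]+)}|))}/;

// Hypothetical helper: returns the whole, numerator and denominator parts,
// or null when the input isn't an integer mixed fraction.
function parseMixedFraction(latex) {
  const m = latex.match(MIXED_FRACTION);
  if (!m) return null;
  return { whole: m[1], numerator: m[2], denominator: m[3] };
}

// Note the doubled backslash: "\\frac" in a JS string is the LaTeX text \frac.
console.log(parseMixedFraction("5\\frac{7}{8}"));   // { whole: '5', numerator: '7', denominator: '8' }
console.log(parseMixedFraction("3.5\\frac{7}{8}")); // null
```

The optional `{...}` group inside each brace pair is what lets one level of nesting through, e.g. a numerator like x^{2}.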
Edit
Since there obviously are people out there who, however unbelievably, still haven't moved to a real browser (one that supports lookbehind) ;), the answer has to change:
It depends on the syntax of LaTeX whether you can do it or not.
(And since I don't know that, I shouldn't have said anything in the first place ;)
The problem is that, without lookbehinds, you can't do it without matching characters outside the expression you're interested in. In your regexr example you clearly show that you want to match expressions not only at the beginning of strings, but also in the middle of them. Thus we need to be able to tell that the part before our match isn't the integer part of a decimal number. Now, doing this by matching isn't a problem. E.g.
(?:^|[^\d.])(\d+)\\frac{...
as Wiktor suggested in the comments, matches if the expression is at the start of the line (^) or is preceded by something that isn't a decimal point or a digit ([^\d.]). That should do it. (Here at regex101.)
Well, as pointed out earlier, it depends on the syntax of LaTeX. If two expressions can be directly adjacent, without any operators or other content between them, you can't (as far as I can tell) do it with a JS regex that lacks lookbehind. Consider 1\frac{0}{1}2\frac{2}{3} (I have no idea whether that is syntactically correct). The first expression, 1\frac{0}{1}, is a piece of cake. But after it has been matched, we'd need to match a character before the second expression to verify that it doesn't start with a decimal number, and the first match has already eaten that character (the closing }). The only character left for the test (?:^|[^\d.]) is the 2 in 2\frac{2}{3}, which it rejects because it is a digit; and even if it consumed it, the remaining part would no longer start with the digit needed to satisfy the rest of the regex, (\d+)\\frac{....
If, however, an expression always starts the tested string, or is preceded by an operator or similar, then it should be possible using
(?:^|[^\d.])(\d+)\\frac{([^{}]+(?:{(?:[^{}]+)})?)}{([^{}]+(?:{(?:[^{}]+)})?)}
Here at regex101.
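To make the adjacency problem concrete, here is a small JavaScript sketch (Node.js 12+ assumed, for matchAll and lookbehind) that runs the lookbehind-free pattern and the earlier lookbehind pattern, both with the g flag added, over two made-up inputs. The helper wholeParts is hypothetical.

```javascript
// Lookbehind-free version: consume the start of input or one character
// that is neither a digit nor a full stop before the integer part.
const NO_LOOKBEHIND =
  /(?:^|[^\d.])(\d+)\\frac{([^{}]+(?:{(?:[^{}]+)})?)}{([^{}]+(?:{(?:[^{}]+)})?)}/g;

// Lookbehind version from the first part of the answer, with the g flag added.
const WITH_LOOKBEHIND =
  /(?<!\d\.)\b(\d+)\\frac{([^{}]+(?:{(?:[^{}]+)}|))}{([^{}]+(?:{(?:[^{}]+)}|))}/g;

// Collect the captured integer part of every match.
const wholeParts = (re, text) => [...text.matchAll(re)].map((m) => m[1]);

const separated = "x = 5\\frac{7}{8} + 3.5\\frac{1}{2}";
const adjacent = "1\\frac{0}{1}2\\frac{2}{3}";

console.log(wholeParts(NO_LOOKBEHIND, separated));  // [ '5' ]
console.log(wholeParts(NO_LOOKBEHIND, adjacent));   // [ '1' ]  (the 2 is missed)
console.log(wholeParts(WITH_LOOKBEHIND, adjacent)); // [ '1', '2' ]
```

Both patterns skip the 3.5 case; only the lookbehind one also picks up the second of two adjacent expressions, because a lookbehind inspects the preceding } without needing to consume it.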
(Sorry for my rambling)
Good day!
I don't know regular expressions very well, but I tried to compose one. I need a regular expression that matches a value as follows:
The user enters a value in the text field that can range from 00x00 to 12x99. It must contain only the separator "x", and the first pair of numbers (the one before "x") must not exceed 12.
I tried a pattern like this:
/^(00|01|02|03|04|05|06|07|08|09|10|11|12)x([0-9]{2,2})$/
and it works for me, but it's too long an expression; I'm sure there's something shorter. Any help is appreciated!
You can shorten the expression quite a bit.
^(0\d|1[0-2])x\d{2}$
First, you can remove the parentheses around ([0-9]{2,2}); they are not required if you only want a full match.
Then you can replace every [0-9] block with the \d token.
Then the quantifier can be simplified from {2,2} to {2}, since you want an exact count.
The first part is a bit trickier. You can actually split the match into two parts: you need to match every number from 00 to 09, and every number from 10 to 12.
So this is exactly what we are going to do.
First the match from 00 to 09, the first digit doesn't change, so that's easy. The second digit is a full range from 0 to 9, so we use \d as previously mentioned. That gives us 0\d.
The second half has the same fixed first digit, 1. Again that's easy. Then it's actually a shortened range from 0 to 2. That gives us 1[0-2].
It could be one or the other, so we wrap that part in a group and use the | (or) token.
And that's it, we combine everything and get the expression above!
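A quick JavaScript check of the shortened pattern against a few made-up inputs, including values just outside the allowed range:

```javascript
// First pair 00-09 (0\d) or 10-12 (1[0-2]), a literal "x", then exactly two digits.
const PAIR = /^(0\d|1[0-2])x\d{2}$/;

for (const value of ["00x00", "07x45", "12x99", "13x00", "1x99", "12X99", "12x999"]) {
  console.log(value, PAIR.test(value));
}
// Prints true for the first three values and false for the rest.
```

Note that the uppercase "X" is rejected; add the i flag if that should be allowed.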
I'm new to learning Regular Expressions, and I came across this answer which uses positive lookahead to validate passwords.
The regular expression is (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,}$/), and the breakdown provided by the user is:
(/^
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,} //should contain at least 8 from the mentioned characters
$/)
However, I'm not very clear on chaining multiple lookaheads together. From what I have learned, a positive lookahead checks whether the expression is followed by what is specified in the lookahead. As an example, this answer says:
The regex is(?= all) matches the letters is, but only if they are immediately followed by the letters all
So, my question is: how do the individual lookaheads work? If I break it down:
The first part is ^(?=.*\d). Does this mean: at the start of the string, look for zero or more occurrences of any character, followed by one digit (thereby checking for the presence of a digit)?
If the first part is correct, does the second part, (?=.*[a-z]), check (after Step 1 at the start of the string) for zero or more occurrences of any character followed by a lowercase letter? Or are the two lookaheads completely unrelated to each other?
Also, what is the use of the ( ) around every lookahead? Does it create a capturing group?
I have also looked at the RexEgg article on lookaheads, but it didn't help much.
Would appreciate any help.
As mentioned in the comments, the key point here is not the lookaheads but backtracking:
(?=.*\d) first matches the rest of the line (.*), then backtracks until it finds a digit (\d).
This is repeated throughout the different lookaheads and could be optimized like so:
(/^
(?=\D*\d) // should contain at least one digit
(?=[^a-z]*[a-z]) // should contain at least one lower case
(?=[^A-Z]*[A-Z]) // should contain at least one upper case
[a-zA-Z0-9]{8,} // should contain at least 8 from the mentioned characters
$/)
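A small JavaScript sketch comparing the original and optimized patterns on a few made-up passwords. Because none of the lookaheads consumes input, all three are tried from the same position (the start of the string), so the two patterns should accept exactly the same inputs.

```javascript
// Original pattern: each lookahead runs .* to the end, then backtracks.
const original = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,}$/;
// Optimized pattern: each lookahead stops right before the first wanted character.
const optimized = /^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])[0-9a-zA-Z]{8,}$/;

const samples = ["Passw0rdX", "password1", "PASSWORD1", "Passw0r", "Pass word1", "Passwordd"];
for (const s of samples) {
  // Both columns should always agree; only "Passw0rdX" passes.
  console.log(s, original.test(s), optimized.test(s));
}
```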
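Here, the principle of contrast applies: \D* can only match non-digits, so when a digit exists, \D* stops right before it and \d matches immediately, without backtracking.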
Assertions are atomic, independent expressions with separate context
from the rest of the regex.
It is best to visualize them as existing between characters.
Yes, there is such a place.
Being independent though, they receive the current search position,
then they start moving through the string trying to match something.
They literally advance their private (local) copy of the search position
to do this.
They return true or false, depending on whether they matched something.
The caller of the assertion maintains its own copy of the search position.
So, when the assertion returns, the caller's search position has not changed.
Thus, you can weave in and out of places without having to worry about
the search position.
You can see this a little more dramatically, when assertions are nested:
Target1: Boy1 has a dog and a train
Target2: Boy2 has a dog
Regex: Boy\d(?= has a dog(?! and a train))
Objective: Find the Boy# that matches the regex.
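Applying the regex to both targets in JavaScript shows that only Boy2 qualifies: in Target1 the nested negative lookahead sees " and a train", fails, and takes the outer lookahead down with it. The match itself is just "Boy2", since neither lookahead consumes anything.

```javascript
// Outer lookahead: " has a dog" must follow; nested one: " and a train" must not.
const boy = /Boy\d(?= has a dog(?! and a train))/;

console.log("Boy1 has a dog and a train".match(boy)); // null
console.log("Boy2 has a dog".match(boy)[0]);          // Boy2
```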
Other noteworthy things about assertions:
They are atomic (i.e. independent) in that they are immune to backtracking
from external forces.
Internally, they can backtrack just like anywhere else.
But, when it comes to the position they were given, that cannot change.
Also, inside assertions, it is possible to capture just like anywhere else.
For example, ^(?=.*\b(\w+)\b) captures the last word in the string; however, the search position does not change.
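The same example in JavaScript, using the g flag only so that lastIndex reports where the search position ended up after the match:

```javascript
// The lookahead captures the last word, but the overall match is empty
// and sits at index 0: the search position never moved.
const lastWord = /^(?=.*\b(\w+)\b)/g;
const m = lastWord.exec("hello big world");

console.log(m[1]);                        // world
console.log(JSON.stringify(m[0]));        // ""
console.log(m.index, lastWord.lastIndex); // 0 0
```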
Also, assertions are like a red light. The immediate expression that follows the assertion
must wait until it gets the green light.
This is the result the assertion passes back, true or false.
The title might seem a bit recursive, and indeed it is.
I am working on a JavaScript program that highlights/colors JavaScript code displayed in HTML. In the browser, comments will be turned green, keywords (for, if, while, etc.) dark blue and italic, numbers red, and so on for other elements. However, the coloring is not all that important.
I am trying to figure out two different regular expressions which have started to cause a minor headache.
1. Finding a regular expression using a regular expression
I want to find regular expressions within the script tags of HTML using JavaScript, such as:
match(/findthis/i);
where the regex part, of course, is "/findthis/i".
The rules are as follows:
1. Finding multiple occurrences (/g) is not important.
2. It must be on the same line (not /m).
3. Case-insensitive (/i).
4. If a backslash (escape character) is followed directly by a forward slash, "/", the forward slash is part of the expression, not the end of it. E.g.: /itdoesntstop\/untilnow:/
5. Two forward slashes right next to each other (//) are: (A) At the beginning: not a regex; it's a comment. (B) Later on: the first slash is the end of the regex and the second slash is nothing but a character.
6. The regex continues until a line break or the end of input (\n|$), or until the closing forward slash (a slash not escaped as described in rule 4) is encountered. However, as long as only alphabetic characters follow the closing slash, they are also considered part of the regex. E.g.: /aregex/allthisispartoftheregex
So far what I've got is this:
'\\/(?:[^\\/\\\\]|\\/\\*)*\\/([a-zA-Z]*)?'
However, it isn't consistent. Any suggestions?
2. Find digits (alphanumeric, floating) using a regular expression
Finding digits on their own is simple. However, finding floating-point numbers (possibly with multiple periods) and numbers containing letters, including underscores, is more of a challenge.
All of the below are considered numbers (a new number starts after each space):
3 3.1 3.1.4 3a 3.A 3.a1 3_.1
The rules:
1. Finding multiple occurrences (/g) is not important.
2. It must be on the same line (not /m).
3. Case-insensitive (/i).
4. A number must begin with a digit. However, the number can be preceded or followed by a non-word (\W) character. E.g.: in "=9.9;", "9.9" is the actual number. "a9" is not a number. A period before the number, as in ".9", is not part of the number, so the actual number is "9".
5. Allowed characters: [a-zA-Z0-9_.]
What I've got:
'(^|\\W)\\d([a-zA-Z0-9_.]*?)(?=([^a-zA-Z0-9_.]|$))'
It doesn't work quite the way I want it.
For the first part, I think you are quite close. Here is what I would use (as a regex literal, to avoid all the double escapes):
/\/(?:[^\/\\\n\r]|\\.)+\/([a-z]*)/i
I don't know what you intended with your second alternative after the character class. But here the second alternative is used to consume backslashes and anything that follows them. The last part is important, so that you can recognize the regex ending in something like this: /backslash\\/. And the ? at the end of your regex was redundant. Otherwise this should be fine.
Test it here.
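A quick JavaScript check of the suggested literal against examples built from the question's rules (backslashes are doubled because they sit inside JS string literals). As a heuristic it has limits: a line such as a / b / c would also match, since telling division apart from a regex literal needs more context than a single regex provides.

```javascript
// A slash, then one or more characters that are either ordinary (not a slash,
// backslash or line break) or an escape pair (backslash + any character),
// then the closing slash and any trailing letters (the flags, captured in group 1).
const REGEX_LITERAL = /\/(?:[^\/\\\n\r]|\\.)+\/([a-z]*)/i;

const cases = [
  "match(/findthis/i);",
  "/itdoesntstop\\/untilnow:/",
  "/backslash\\\\/",
  "/aregex/allthisispartoftheregex",
  "// just a comment",
];
for (const line of cases) {
  const m = line.match(REGEX_LITERAL);
  console.log(line, "=>", m ? m[0] : null);
}
```

Using + instead of * is what keeps a leading // (a comment) from being treated as an empty regex.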
Your second regex is just fine for your specification. There are a few redundant elements though. The main thing you might want to do is capture everything but the possible first character:
/(?:^|\W)(\d[\w.]*)/i
Now the actual number (without the possible leading non-word character) will be in capturing group 1. Note that I removed the laziness and the lookahead, because greediness alone does exactly the same.
Test it here.
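A JavaScript sketch of the simplified number pattern. The g flag is added here only so that every number in the sample line can be listed with matchAll; the helper numbersIn is hypothetical.

```javascript
// Start of input or one non-word character, then a digit followed by any
// word characters or periods; group 1 holds the number itself.
const NUMBER = /(?:^|\W)(\d[\w.]*)/gi;

const numbersIn = (text) => [...text.matchAll(NUMBER)].map((m) => m[1]);

console.log(numbersIn("3 3.1 3.1.4 3a 3.A 3.a1 3_.1").join(", "));
// 3, 3.1, 3.1.4, 3a, 3.A, 3.a1, 3_.1
console.log(numbersIn("x=9.9;"), numbersIn(".9"), numbersIn("a9"));
// [ '9.9' ] [ '9' ] []
```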
I'm trying to create a regular expression in JavaScript for a UK bank sort code, so that the user can input 6 digits, or 6 digits with a hyphen between pairs, for example "123456" or "12-34-56". Also, the digits can't all be 0.
So far I've got /(?!0{2}(-?0{2}){2})(\d{2}(-\d{2}){2})|(\d{6})/ and this jsFiddle to test.
This is my first regular expression, so I'm not sure I'm doing it right. The test for six zeros should fail, and I thought the optional hyphen (-?) in the lookahead would make it treat six zeros the same as six zeros with hyphens, but it doesn't.
I'd appreciate some help and any criticism if I'm doing it completely incorrectly!
Just to answer your question, you can validate user input with:
/^(?!(?:0{6}|00-00-00))(?:\d{6}|\d\d-\d\d-\d\d)$/.test(inputString)
It will strictly match only input of the form XX-XX-XX or XXXXXX, where each X is a digit, and will reject 00-00-00 and 000000, along with any other forms (e.g. XX-XXXX or XXXX-XX).
However, in my opinion, as stated in other comments, it is still better to force the user to either always enter the hyphens or never enter them. Being extra strict when dealing with anything related to money saves unforeseen trouble later.
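A quick JavaScript check of the strict pattern against a few made-up inputs, including the mixed forms it is meant to reject:

```javascript
// Six digits, or three hyphen-separated pairs, but never all zeros.
const SORT_CODE = /^(?!(?:0{6}|00-00-00))(?:\d{6}|\d\d-\d\d-\d\d)$/;

const inputs = ["123456", "12-34-56", "000000", "00-00-00", "12-3456", "1234-56", "000001"];
for (const input of inputs) {
  console.log(input, SORT_CODE.test(input));
}
// true for 123456, 12-34-56 and 000001; false for the rest
```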
Since any of the digits can be zero, but not all at once, you should treat the one case where they are all zero as a single, special case.
You are checking for two digits (\d{2}), then an optional hyphen (-?), then another two digits (\d{2}) and another optional hyphen (-?), before another two digits (\d{2}).
Putting this together gives \d{2}-?\d{2}-?\d{2}, but you can simplify this further:
(\d{2}-?){2}\d{2}
You can then use the following JavaScript to match the format but not 000000 or 00-00-00:
if (/(\d{2}-?){2}\d{2}/.test(string) && !/(00-?){2}00/.test(string)) {
  // then it's a valid code; you could also use /(0{2}-?){2}0{2}/ to check for zeros
}
You may wish to add the string anchors ^ (start) and $ (end) to check the entire string.
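With the anchors added, the two-check approach might look like this minimal sketch (the function name isValidSortCode is hypothetical). Because each hyphen is optional independently, this looser check also accepts mixed forms such as 12-3456, unlike the strict single regex above.

```javascript
// Format check: three pairs of digits, the first two each optionally followed by a hyphen.
const FORMAT = /^(\d{2}-?){2}\d{2}$/;
// All-zero check, in both hyphenated and unhyphenated forms.
const ALL_ZEROS = /^(00-?){2}00$/;

// Hypothetical helper: valid format and not all zeros.
function isValidSortCode(code) {
  return FORMAT.test(code) && !ALL_ZEROS.test(code);
}

console.log(isValidSortCode("12-34-56"), isValidSortCode("00-00-00")); // true false
```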
I want to build a regular expression which allows following cases:
M1 1AA
B33 8TH
CR2 6XH
DN55 1PT
W1A 1HQ
EC1A 1BB
and it should not allow values made up only of letters or only of numbers, e.g.:
A
2
AA AAA
22 343
Also, how do I call this in JavaScript? I am currently using the regex
^[A-Z0-9 ]*[A-Z0-9][A-Z0-9 ]*$
which validates all of the above cases, including the ones made up only of letters or only of numbers.
Please help me with how to call a JavaScript function on a textbox change and validate the input using the above regular expression.
You're close. One way to solve this problem would be
^[A-Z0-9 ]*[A-Z][A-Z0-9 ]*\d[A-Z0-9 ]*$|^[A-Z0-9 ]*\d[A-Z0-9 ]*[A-Z][A-Z0-9 ]*$
which allows arbitrary letters, numbers, or spaces ([A-Z0-9 ]*) before, after, and between the letter and the number, and allows either the letter or the number to come first.
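A JavaScript check of the combined pattern against the examples from the question. Note that it only requires at least one letter and one digit, so something like "A1" also passes; it is not a full UK postcode format check.

```javascript
// Either a letter somewhere before a digit, or a digit somewhere before a letter,
// with only capital letters, digits and spaces anywhere in the string.
const POSTCODE_LIKE =
  /^[A-Z0-9 ]*[A-Z][A-Z0-9 ]*\d[A-Z0-9 ]*$|^[A-Z0-9 ]*\d[A-Z0-9 ]*[A-Z][A-Z0-9 ]*$/;

const accepted = ["M1 1AA", "B33 8TH", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB"];
const rejected = ["A", "2", "AA AAA", "22 343"];

console.log(accepted.every((s) => POSTCODE_LIKE.test(s))); // true
console.log(rejected.some((s) => POSTCODE_LIKE.test(s)));  // false
```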
However, there's another way to solve this. You could use two regular expressions. First, check against the regular expression
^[A-Z0-9 ]*$
which checks that you have only letters, numbers, and spaces. Then, check against the regular expression
[A-Z][A-Z0-9 ]*\d|\d[A-Z0-9 ]*[A-Z]
which checks for at least one letter ([A-Z]) and one number (\d), with both orders allowed. Note that this pattern doesn't include the caret and dollar sign, so the letter and number can occur anywhere in the string.
The two-regular-expression approach has the benefit of being easier to read and modify later.
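A minimal sketch of the two-regex approach, wired to a textbox's change event as the question asks. The function name isValidPostcodeInput and the element id "postcode" are hypothetical; the wiring is skipped when there is no DOM (e.g. in Node.js). As written, lowercase input is rejected; uppercase the value first or add the i flag if that is not what you want.

```javascript
// Step 1: only capital letters, digits and spaces.
const ALLOWED_CHARS = /^[A-Z0-9 ]*$/;
// Step 2: at least one letter and one digit, in either order.
const LETTER_AND_DIGIT = /[A-Z][A-Z0-9 ]*\d|\d[A-Z0-9 ]*[A-Z]/;

// Hypothetical helper combining both checks.
function isValidPostcodeInput(value) {
  return ALLOWED_CHARS.test(value) && LETTER_AND_DIGIT.test(value);
}

// Browser wiring (hypothetical element id "postcode"); skipped outside a browser.
if (typeof document !== "undefined") {
  const box = document.getElementById("postcode");
  if (box) {
    box.addEventListener("change", () => {
      // A non-empty message marks the field invalid; "" clears the error.
      box.setCustomValidity(isValidPostcodeInput(box.value) ? "" : "Please enter a valid postcode.");
      box.reportValidity();
    });
  }
}
```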