This is my string:
<address>tel+1234567890</address>
This is my regex:
([\d].*<)
which matches this:
1234567890<
but I don't want to match the trailing < character.
You can use a positive lookahead:
\d+(?=<)
The (?=...) syntax asserts that what's inside the parentheses matches at that position, without moving the match cursor forward, and thus without consuming any of the input string. It's also called a zero-width assertion.
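A minimal JavaScript sketch of the lookahead on the string from the question (the variable names are illustrative, not from the original):

```javascript
// The string from the question.
const input = "<address>tel+1234567890</address>";

// \d+ consumes the digits; (?=<) only checks that a "<" comes next,
// so the "<" itself is not part of the match.
const match = input.match(/\d+(?=<)/);

console.log(match[0]); // 1234567890
```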
By the way, the square brackets in [\d] are redundant, so you can omit them. Also, I've changed the regex, but perhaps you really meant to match this:
\d.*?(?=<)
This pattern matches everything between a digit and a <, including the digit. It uses a lazy (ungreedy) quantifier (*?) so that it only matches up to the first < if there are several.
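To show what the lazy quantifier changes, here is a sketch that runs the greedy and lazy variants on a made-up input containing two < characters (the sample string is an assumption, not from the question):

```javascript
const sample = "tel+12ab<cd<";

// Greedy: .* runs to the end, then backs off only as far as the LAST "<".
const greedy = sample.match(/\d.*(?=<)/)[0];

// Lazy: .*? grows one character at a time and stops before the FIRST "<".
const lazy = sample.match(/\d.*?(?=<)/)[0];

console.log(greedy); // 12ab<cd
console.log(lazy);   // 12ab
```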
([\d]+)
This should work; try it out and let me know.
Also, as @LucasTrzesniewski said, you can use a lookahead:
(\d+.(?=<))
I use this regex
/\.(.+)?(?=(\(?)|\r\n)/gi
with
part1.part2
part1.part2(part3) part4
I want to match only .part2 in both cases,
but in the second case I get .part2(part3) part4
You should make the .+ part non-greedy, by using .+?, as otherwise it will also capture the opening parenthesis you want to see in the look-ahead part.
Also, in the second part, don't make the \( optional; otherwise the lookahead can succeed by matching nothing at all.
Finally, don't match \r\n; use the end-of-line anchor $ in combination with the m flag instead (so that it matches at the end of each line rather than only at the end of the whole input).
So:
\.(.+?)(?=\(|$)
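A minimal sketch applying this pattern in JavaScript to both inputs from the question, assuming they arrive as two lines of one string separated by \n:

```javascript
// The two inputs from the question, one per line.
const text = "part1.part2\npart1.part2(part3) part4";

// m: $ matches at the end of every line, not just the end of the input.
const re = /\.(.+?)(?=\(|$)/gm;

console.log(text.match(re)); // [ '.part2', '.part2' ]

// The capture group alone, without the leading dot:
const names = [...text.matchAll(re)].map((m) => m[1]);
console.log(names); // [ 'part2', 'part2' ]
```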
You see the parenthesis in the match because the . can also match (.
The greedy (.+)? matches the rest of the line after the first dot and never has to backtrack to a (, because the parenthesis in the lookahead is optional (\(?), so the assertion is already true at the end of the line.
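The diagnosis above can be reproduced with a short sketch that applies the question's original pattern to the second input (the variable names are illustrative):

```javascript
// The pattern from the question, applied to the second input.
const original = /\.(.+)?(?=(\(?)|\r\n)/gi;
const line = "part1.part2(part3) part4";

// (.+)? is greedy and the lookahead alternative (\(?) can match the empty
// string, so the assertion succeeds at the end of the line and nothing
// forces the engine to stop before the "(".
console.log(line.match(original)); // [ '.part2(part3) part4' ]
```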
You could use a negated character class, so that the match cannot cross a parenthesis or a newline:
\.([^()\r\n]+)
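A sketch of the negated-class version on the same two inputs, again assuming they are two lines of one string:

```javascript
const text = "part1.part2\npart1.part2(part3) part4";

// [^()\r\n]+ can never step over a parenthesis or a line break,
// so each match stops right before "(" or the end of the line.
const re = /\.([^()\r\n]+)/g;
const parts = [...text.matchAll(re)].map((m) => m[1]);

console.log(parts); // [ 'part2', 'part2' ]
```

Unlike the lookahead version, this needs no m flag because it uses no anchors. Note that the class still allows a dot, so a line such as a.b.c would give .b.c.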
I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it matches the sections I need, but it also matches the abc part of 4_abc, which is not intended. I tried to exclude that with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end don't work the way I hoped. If I use them on the example
a_d Dd_da 4_d d_4
they prevent matching the second word, Dd_da, because the space was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but it matches: a_d (with a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding or the following character is a letter, digit or underscore, but not both. It also matches at the start/end of the string if the first/last character is a letter, digit or underscore.
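A minimal sketch running the \b version on the sample input from the question (the variable names are illustrative):

```javascript
const input = "a_d Dd_da 4_d d_4";

// \b is zero-width, so the spaces between words are never consumed
// and adjacent words can all be matched.
const words = input.match(/\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g);

console.log(words); // [ 'a_d', 'Dd_da', 'd_4' ]
```

4_d is skipped because there is no boundary between 4 and _ or between _ and d, so the pattern can never start inside that word.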
I want to match all words that start with a dollar sign, but not those that start with a backslash and a dollar sign.
I have already tried a few regexes:
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After finding all the expected matching words, I want to replace them using my object.
One option is to eliminate escape sequences first, and then match the cleaned-up string:
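const s = String.raw`$10<i class="">$i01d</i>\$id`;
// Drop every backslash escape (a "\" plus the character after it),
// then collect the remaining $words.
const found = s.replace(/\\./g, '').match(/\$\w+/g);
console.log(found); // [ '$10', '$i01d' ]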
The big problem here is that you need a negative lookbehind, which JavaScript engines did not support before ES2018. It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work:
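var input = '$10<i class="">$i01d</i>\\$id';
// the reversed form of /(?<!\\)\$\b\w+\b/g: the lookbehind becomes a lookahead
var regex = /\b\w+\b\$(?!\\)/g;
//sample implementation of a string reversal function. There are better implementations out there
function reverseString(string) {
  return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = (reverseInput.match(regex) || []) // match() returns null when nothing matches
  .map(reverseString) // turn each match back the right way round
  .reverse();         // restore the left-to-right order of the original string
console.log(matches); // [ '$10', '$i01d' ]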
It is not elegant but it will do the job. Here is how it works:
JavaScript does support a positive lookahead ((?=)) and a negative lookahead ((?!)). Since a lookahead is the mirror image of a lookbehind, you can reverse the string and reverse the regex, which will then match exactly what you want. Since all the matches are going to be reversed as well, you also need to reverse them back to the original.
It is not elegant, as I said, since it does a lot of string manipulation, but it does produce exactly what you want.
Regex explanation
Normally, "match x as long as it's not preceded by y" is expressed as (?<!y)x, so in your case the regex would be
/(?<!\\)\$\b\w+\b/g
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, all the tokens have to be reversed as well for the regex to match. Furthermore, the negative lookbehind becomes a negative lookahead of the form x(?!y), so the new regular expression is
/\b\w+\b\$(?!\\)/g;
This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This is NOT available in JS engines that predate ES2018. Alternatively, you could specify a boundary that you know exists and use a capture group, like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via \b, because there's no such boundary before '\'.
If you're using a language that supports negative lookbehind assertions, you can use something like this:
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
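Lookbehind was added to JavaScript in ES2018, so in engines that implement it (for example current Node.js and current major browsers) the pattern can be used directly. A sketch, including the replacement step the question asks for; the `values` object is hypothetical and stands in for the asker's own object:

```javascript
const input = String.raw`$10<i class="">$i01d</i>\$id`;

// (?<!\\) rejects a "$" that directly follows a backslash.
// Requires an engine with ES2018 lookbehind support.
const words = input.match(/(?<!\\)\$\w+/g);
console.log(words); // [ '$10', '$i01d' ]

// Hypothetical lookup object standing in for "my object" from the question.
const values = { $10: "ten", $i01d: "ID" };

// Replace each unescaped $word that has an entry; leave everything else alone.
const replaced = input.replace(/(?<!\\)\$\w+/g, (w) => (w in values ? values[w] : w));
console.log(replaced); // ten<i class="">ID</i>\$id
```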
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
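A literal $ at the beginning of the input (or of a line, with the m flag), followed by one or more word characters. Or...
A literal $ followed by word characters and preceded by any character except a backslash; that preceding character is consumed as part of the match, so the word itself is captured in the second group.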
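A sketch of how the two capture groups could be read back in JavaScript with matchAll (the variable names are illustrative):

```javascript
const input = String.raw`$10<i class="">$i01d</i>\$id`;
const re = /(?:(^\$\w+)|[^\\](\$\w+))/g;

// The second alternative also consumes the character before "$",
// so the word itself is in group 1 (start of input) or group 2.
const words = [...input.matchAll(re)].map((m) => m[1] || m[2]);

console.log(words); // [ '$10', '$i01d' ]
```

Because the preceding character is consumed, a $word directly followed by another one (for example $a$b) only yields the first.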
I am trying to use a regexp to match some specific keywords.
For the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or suffix. The regexp I am using now is \b(IF|ELSE)\b, and it gives me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for testing.
It needs to work in JS.
I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It's made up of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single space character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single space character.
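A minimal sketch of this pattern on the code from the question, reading the keyword from capture group 1 (the variable names are illustrative):

```javascript
const code = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// m: ^ and $ match at each line start/end. Group 1 holds the keyword
// without the surrounding whitespace that the outer groups consume.
const re = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;
const keywords = [...code.matchAll(re)].map((m) => m[1]);

console.log(keywords.length); // 2
```

Since the surrounding whitespace is consumed, two keywords separated by a single space (IF ELSE) would only yield the first one.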
You can do it with lookaround (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for: you don't want to check for other characters like the string start or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots".
/(?<!\.)(IF|ELSE)(?!\.)/g
explanation:
use the g-flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
working example: https://regex101.com/r/oS2dZ6/1
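In JavaScript engines that support ES2018 lookbehind (an assumption about your runtime; older engines reject this syntax), the pattern can be checked on the sample like this:

```javascript
const code = "IF A > B THEN STOP\nIF B < C THEN STOP\nLOL.IF\nIF.LOL\nIF.ELSE";

// (?<!\.) rejects a keyword right after a dot, (?!\.) one right before a dot.
const keywords = code.match(/(?<!\.)(IF|ELSE)(?!\.)/g);

console.log(keywords); // [ 'IF', 'IF' ]
```

The pattern does not check word boundaries, so it would also match the IF inside a word such as LIFE; adding \b on both sides of the group guards against that.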
PS: If you don't have to write the regex for JS, you are better off using a tool that supports other flavors (such as PCRE), like regex101.com.
I have this regex, which looks for %{any characters, including new lines}%:
/[%][{]\s*((.|\n|\r)*)\s*[}][%]/gm
If I test the regex on a string like "%{hey}%", the regex returns "hey" as a match.
However, if I give it "%{hey}%%{there}%", it doesn't match "hey" and "there" separately; it returns one match: "hey}%%{there".
How do I make it non-greedy so that it returns a match for each %{}%?
Add a question mark after the star.
/[%][{]\s*((.|\n|\r)*?)\s*[}][%]/gm
Firstly, to make a wildcard match non-greedy, just append it with ? (so *? instead of * and +? instead of +).
Secondly, your pattern can be simplified in a number of ways.
/%\{\s*([\s\S]*?)\s*\}%/gm
There's no need to put a single character in square brackets.
Lastly, for the expression in the middle that you want to capture, note that I used [\s\S]. That comes from "Matching newlines in JavaScript", as a replacement for the DOTALL behaviour, because . does not match line breaks in JavaScript.
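A sketch of the simplified pattern in JavaScript, on the input from the question plus a made-up multi-line entry. The m flag is left out here because the pattern contains no ^ or $ anchors:

```javascript
const re = /%\{\s*([\s\S]*?)\s*\}%/g;
const input = "%{hey}%%{there}%%{ multi\nline }%";

// Lazy *? stops at the first "}%", so each %{...}% is its own match;
// [\s\S] lets the capture span line breaks.
const parts = [...input.matchAll(re)].map((m) => m[1]);

console.log(parts); // [ 'hey', 'there', 'multi\nline' ]
```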
Shorter and faster:
/%\{([^}]*)\}%/gm
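This shorter pattern behaves differently from the previous one in two ways that a small sketch makes visible (the sample inputs and the `capture` helper are made up for illustration):

```javascript
const re = /%\{([^}]*)\}%/g;
// Hypothetical helper: return capture group 1 of every match.
const capture = (s) => [...s.matchAll(re)].map((m) => m[1]);

console.log(capture("%{hey}%%{there}%")); // [ 'hey', 'there' ]
// No \s* trimming, so surrounding whitespace stays in the capture:
console.log(capture("%{ hey }%")); // [ ' hey ' ]
// [^}] cannot step over a "}", so content containing one is not matched:
console.log(capture("%{a}b}%")); // []
```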