Regex exclude doesn't exclude string only first character


Firstly we have the following string:
aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd
We want our regex to find whitespace and any special characters like {}, but if { is followed by exactly ignoreme}, then exclude that sequence
This is where we are right now:
(?!{ignoreme})[\s\[\]{}()<>\\'"|^`]
The problem is that our regex finds the } after ignoreme
Here is the link https://regex101.com/r/bU1oG0/2
Any help is appreciated,
Thanks

The } is matched because your (?!{ignoreme}) lookahead only prevents a match at a { that starts {ignoreme}. The } after ignoreme does not start a {ignoreme} char sequence, so the lookahead passes and the character class matches it. Also, JS engines that predate ES2018 do not support lookbehind, so something like (?<!{ignoreme)} is not an option there.
This is a kind of issue that can be handled with a regex that matches what you do not need, and matches and captures what you need:
/{ignoreme}|([\s[\]{}()<>\\'"|^`])/g
See the regex demo
Now, {ignoreme} is matched (and you do not have to use this value), while ([\s[\]{}()<>\\'"|^`]) is captured into Group 1, and that is the value you need to use.
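A minimal sketch of how the match-and-capture pattern above could be consumed in JS. It assumes an engine with String.prototype.matchAll (ES2020); the variable names are illustrative, and the braces in {ignoreme} are escaped here only as a precaution.

```javascript
// Sample string from the question
const input = 'aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd';

// Left alternative: the sequence to skip. Right alternative: the characters we want (Group 1).
const re = /\{ignoreme\}|([\s[\]{}()<>\\'"|^`])/g;

const found = [];
for (const m of input.matchAll(re)) {
  // Group 1 is undefined when the {ignoreme} alternative matched, so those matches are dropped
  if (m[1] !== undefined) {
    found.push({ char: m[1], index: m.index });
  }
}

console.log(found);
```

The } that closes {ignoreme} never reaches Group 1 because the left alternative consumes the whole sequence first.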

Related

Javascript RegEx positive lookahead not working as expected

First of all, I am not very good at dealing with regex, but I am trying to create a regex that matches a specific string and replaces it while skipping the first character of the matched string, using a positive lookahead. Please see the details below.
Test String asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka
RegEx (?=[^\w])df\.
Replacement String kkk.
Expected Result asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
But the regex above does not find any match, so it replaces nothing and returns the original test string.
Without the positive lookahead (skip-first-character strategy) it matches my requirement; see the matching regex sample on regex101.com
With the positive lookahead it gives unexpected results; see the regex with positive lookahead on regex101.com
Thanks in advance for any help.

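The (?=[^\w]) lookahead is zero-width: it requires the next character to be a non-word char, yet that same character must then be the d of df, so the pattern can never match. Using [^\w] without the lookahead means you match and consume a char other than a word char before the match you need to replace.
However, this char is consumed and you cannot restore it without capturing it first. You might use Gurman's approach to match /(^|\W)df\./g and replace with '$1kkk.', but you may also use a word boundary: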
\bdf\.
See the regex demo
JS demo:
var s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/\bdf\./g, 'kkk.')
);
However, if you do not want to replace df. at the start of the string, use
var s = "df. asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/(\W)df\./g, '$1kkk.')
);
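As a side-by-side check, here is a small sketch that runs the question's lookahead pattern and the two suggested replacements on the test string. The variable names are illustrative only.

```javascript
const s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";

// The lookahead demands a non-word char at the very position where "d" must match
const lookaheadMatches = /(?=[^\w])df\./.test(s);

// Capture the preceding char (or string start) and put it back with $1
const viaGroup = s.replace(/(^|\W)df\./g, '$1kkk.');

// Word boundary: nothing is consumed, so nothing has to be restored
const viaBoundary = s.replace(/\bdf\./g, 'kkk.');

console.log(lookaheadMatches);
console.log(viaGroup);
console.log(viaBoundary);
```

Both replacements leave _df.waka untouched, because _ counts as a word character for \W and \b alike.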

RegEx matching help: won't match on each appearence

I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above it does match the sections needed but would also match onto the abc part of 4_abc which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not really working like I hoped they would. If I use them on the example
a_d Dd_da 4_d d_4
they would block matching the second Dd_da because the space was used in the first match. Sadly I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.

You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
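A short sketch of the \b version applied to the sample input from the question (variable names are illustrative). With the g flag, String.prototype.match returns only the full matches, so the optional capture group does not show up in the result.

```javascript
const input = 'a_d Dd_da 4_d d_4';

// \b on both sides consumes nothing, so neighbouring matches can share the space between them
const re = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;

const matches = input.match(re) || [];
console.log(matches);
```

The d in 4_d is skipped because there is no word boundary between _ and d, and the 4 itself cannot start a match.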

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
For the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it gives me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for test.
Need to work with JS.

I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It's comprised of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single whitespace character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single whitespace character.
Regexr
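A minimal sketch of this pattern in JS, assuming the g and m flags and an engine with matchAll (ES2020). Since the surrounding whitespace is part of each match, the keyword itself is read from Group 1; the variable names are illustrative.

```javascript
const code = [
  'IF A > B THEN STOP',
  'IF B < C THEN STOP',
  'LOL.IF',
  'IF.LOL',
  'IF.ELSE',
].join('\n');

// m flag: ^ and $ also match at line starts and line ends
const re = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// Collect only the keyword, not the whitespace matched around it
const keywords = Array.from(code.matchAll(re), m => m[1]);
console.log(keywords);
```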

You can do it with lookarounds (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for. You don't want to check for other characters like the string start or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots":
/(?<!\.)(IF|ELSE)(?!\.)/g
explanation:
use the g-flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
working example: https://regex101.com/r/oS2dZ6/1
PS: if you don't have to write the regex for JS, better use a tool that supports a flavor with lookbehind, like PCRE on regex101.com
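For completeness, a sketch of the same lookaround pattern in JS. This assumes an engine that implements ES2018 lookbehind assertions (recent Node.js and current browsers do); on older engines the regex literal is a syntax error. The variable names are illustrative.

```javascript
const code = [
  'IF A > B THEN STOP',
  'IF B < C THEN STOP',
  'LOL.IF',
  'IF.LOL',
  'IF.ELSE',
].join('\n');

// Negative lookbehind and lookahead: the keyword must not touch a dot on either side
const re = /(?<!\.)(IF|ELSE)(?!\.)/g;

const keywords = code.match(re) || [];
console.log(keywords);
```

Note that this pattern only rules out dots; without \b it would also find IF inside a longer word.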

Regular expression match array of words excluding words found in contractions

Say I have an array of words, for example: (hi|ll|this|that|etc) and I want to find it in the following text:
Hi, I'll match this and ll too
I'm using: \\b(hi|ll|this|that|etc)\\b
But I want to match only whole words, excluding words found in contractions. Basically, treat apostrophes as another "word separator". In this case, it shouldn't match the "ll" in "I'll".
Ideas?

Use the apostrophe in addition to \b to begin and end a match:
(?:\b|')(hi|ll|this|that|etc)(?:\b|')
(?:...) means a non-capturing group. Stub on Regex101
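A quick way to check this pattern is to run it against the sample sentence. The sketch below assumes the g and i flags (the i flag so that hi also matches Hi), which the answer does not state. Note that \b also matches between an apostrophe and a letter, so in this run the ll of I'll is still captured; the output is worth inspecting before relying on the pattern.

```javascript
const text = "Hi, I'll match this and ll too";

// Pattern from the answer, with assumed flags g and i
const re = /(?:\b|')(hi|ll|this|that|etc)(?:\b|')/gi;

// Group 1 holds the word without any apostrophe consumed around it
const words = Array.from(text.matchAll(re), m => m[1]);
console.log(words);
```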

If you want to match just words, you can try:
(?:^|(?=[^']).\b)(hi|ll|th(?:is|at)|etc)\b
DEMO
and get the words from group 1. However, the \b will still allow matching fragments like -this or #ll. I don't know whether that is the desired result.
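A small sketch of this pattern on the sample sentence. The g and i flags are assumptions (i so that hi matches Hi), and the variable names are illustrative.

```javascript
const text = "Hi, I'll match this and ll too";

// (?=[^']). consumes one character that is not an apostrophe, so a word
// directly after an apostrophe cannot start a match
const re = /(?:^|(?=[^']).\b)(hi|ll|th(?:is|at)|etc)\b/gi;

// The consumed leading character is part of the match, so read Group 1
const words = Array.from(text.matchAll(re), m => m[1]);
console.log(words);
```

Here the ll inside I'll is skipped, while the standalone ll is kept.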

Javascript Regex for all words not between certain characters

I'm trying to return a count of all words NOT between square brackets. So given ..
[don't match these words] but do match these
I get a count of 4 for the last four words.
This works in .net:
\b(?<!\[)[\w']+(?!\])\b
but it won't work in Javascript because it doesn't support lookbehind
Any ideas for a pure js regex solution?

Ok, I think this should work:
\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b
You can test it here:
http://regexpal.com/
If you need an alternative with text in square brackets coming after the main text, it could be added as a second alternative and the current second one would become third.
It's a bit complicated but I can't think of a better solution right now.
If you need to do something with the actual matches you will find them in the capturing groups.
UPDATE:
Explanation:
So, we've got two alternatives here:
1 \[[^\]]+\](?:^|\s)([\w']+)(?!\])\b
This is saying:
\[[^\]]+\] - match everything in square brackets (don't capture)
(?:^|\s) - followed by line start or a whitespace character. Looking at it now, the caret makes no sense after a closing bracket, so this becomes just \s
([\w']+) - capture the following word characters and apostrophes
(?!\]) - as long as the next character is not a closing bracket. This is probably also unnecessary now, so the lookahead could be removed as well
\b - and match a word boundary
2 (?:^|\s)([\w']+)(?!\])\b
If alternative 1 does not match, do just the word matching, without looking for square brackets, as the first part already ensured that they are not here.
Ok, so I removed all the things that we don't need (they stayed there because I tried quite a few options before it worked:-) and the revised regex is the one below:
\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b
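A sketch of counting words with the revised regex, checked only against the sample from the question. A word lands in Group 1 or Group 2 depending on which alternative matched; the variable names are illustrative and matchAll (ES2020) is assumed.

```javascript
const text = "[don't match these words] but do match these";
const re = /\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b/g;

const words = [];
for (const m of text.matchAll(re)) {
  // First alternative fills Group 1, second alternative fills Group 2
  words.push(m[1] !== undefined ? m[1] : m[2]);
}

console.log(words.length, words);
```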

I would use something like \[[^\]]*\] to remove the words between square brackets, and then split the returned string on spaces to count the remaining words.
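A minimal sketch of that remove-then-split idea in JS. Replacing each bracketed group with a space (rather than an empty string) is a choice made here so that words on either side of a group do not get glued together; the variable names are illustrative.

```javascript
const text = "[don't match these words] but do match these";

// Drop every [bracketed group], leaving a space in its place
const stripped = text.replace(/\[[^\]]*\]/g, ' ');

// Split on runs of whitespace; filter(Boolean) drops the empty string
// that split returns when nothing is left
const words = stripped.trim().split(/\s+/).filter(Boolean);

console.log(words.length);
```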

Chris, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)
Here's our simple regex (see it at work on regex101, looking at the Group captures in the bottom right panel):
\[[^\]]*\]|(\b\w+\b)
The left side of the alternation matches complete [bracketed groups]. We will ignore these matches. The right side matches and captures words to Group 1, and we know they are the right words because they were not matched by the expression on the left.
This program shows how to use the regex (see the count result in the online demo):
<script>
var subject = '[match ye not these words] but do match these';
var regex = /\[[^\]]*\]|(\b\w+\b)/g;
var group1Caps = [];
var match = regex.exec(subject);
// put Group 1 captures in an array
while (match != null) {
    if (match[1] != null) group1Caps.push(match[1]);
    match = regex.exec(subject);
}
document.write("<br>*** Number of Matches ***<br>");
document.write(group1Caps.length);
</script>
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
