String in negative look ahead being partially captured

My regular expression:
/(?!#REF!)([^!,]{1,99})!/g
My test string:
foo,#REF!,bar!,baz,qux!
It currently matches REF!, but the desired outcome is for only bar! and qux! to be matched. I used the negative look-ahead (?!#REF!) to prevent that, but REF! is still being captured because it matches [^!,]{1,99}.
How can I prevent REF! from getting matched? Is using a negative look-ahead the correct approach?

Since your string is a comma separated item list, you may split the string with a comma, remove all empty items (if any), get only those ending with a ! and then remove the ! from the end of the strings:
var s = "foo,#REF!,bar!,baz,qux!";
console.log(s.split(',')
.filter(Boolean) // remove empty items
.filter(function (x) {return x.charAt(x.length-1)==="!" && x!== "#REF!";} ) // ends with ! and not #REF!
.map(function(y) {return y.substr(0, y.length-1)}) // remove !
);
If for some reason you still need to use a regex, you may use
/(?:^|,)(?!#REF!)([^!,]{1,99})!/g
Access Group 1 value. See the regex demo here.
NOTE: You only have 1 capturing group here, as (?!...) is a lookahead that is a special regex construct. (?:...) is a non-capturing group, its value is not stored in any additional memory buffer as compared to a capturing group.
Details
(?:^|,) - either start of string or ,
(?!#REF!) - no #REF! is allowed to appear right after the current location
([^!,]{1,99}) - Capturing group 1: 1 to 99 chars other than ! and ,
! - a ! char
var s = "foo,#REF!,bar!,baz,qux!";
var rx = /(?:^|,)(?!#REF!)([^!,]{1,99})!/g, m, res=[];
while (m=rx.exec(s)) {
res.push(m[1]);
}
console.log(res);
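To see why the original pattern still picks up REF!, it helps to run both patterns side by side. The lookahead is only checked at the position where a match attempt starts: it fails at the #, so the engine moves one character forward to R, where (?!#REF!) succeeds and REF! is matched. The sketch below is only an illustration; the variable names are not from the question.

```javascript
const input = "foo,#REF!,bar!,baz,qux!";

// Original pattern: the lookahead blocks a match starting at "#",
// but not one starting one character later, at "R".
const original = /(?!#REF!)([^!,]{1,99})!/g;
const originalMatches = Array.from(input.matchAll(original), m => m[0]);
console.log(originalMatches); // [ 'REF!', 'bar!', 'qux!' ]

// Anchored pattern: a match may only start at the string start or after a comma,
// so the engine cannot slide past the "#".
const anchored = /(?:^|,)(?!#REF!)([^!,]{1,99})!/g;
const anchoredMatches = Array.from(input.matchAll(anchored), m => m[1]);
console.log(anchoredMatches); // [ 'bar', 'qux' ]
```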

You can use the following regex:
(?<=^|,)(?!#REF!)([^!,]{1,99})!
Explanations:
Adding (?<=^|,) forces the start of your regex matching to either the beginning of the line or to the previous comma. If you don't add it REF! will also be matched. The , will not be part of the result because it is in a lookbehind clause.
DEMO
If you cannot use lookbehind, you can go for a solution like the one proposed by WiktorStribizew:
(?:^|,)(?!#REF!)([^!,]{1,99}!)
and read the result from the 1st capturing group.
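A minimal sketch of the lookbehind variant in JavaScript, assuming an engine with lookbehind support (ES2018 or later) and String.prototype.matchAll (ES2020). The variable names are illustrative only.

```javascript
const csv = "foo,#REF!,bar!,baz,qux!";

// (?<=^|,) requires the match to start at the string start or right after a comma.
// The comma itself is not consumed, so m[0] is the item plus "!" and m[1] is the item.
const lookbehindRx = /(?<=^|,)(?!#REF!)([^!,]{1,99})!/g;

const wholeMatches = Array.from(csv.matchAll(lookbehindRx), m => m[0]);
const groupOne = Array.from(csv.matchAll(lookbehindRx), m => m[1]);

console.log(wholeMatches); // [ 'bar!', 'qux!' ]
console.log(groupOne);     // [ 'bar', 'qux' ]
```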

Related

How to limit the search scope without regex lookbehinds?

Given a regular expression, I can easily decide where to start looking for a match from in a string using lastIndex.
Now, I want to make sure that the match I get doesn't go past a certain point in the string.
I would happily enclose the regular expression in a non-capturing group and append, for instance, (?<=^.{0,8}).
But how can I achieve the same goal without lookbehinds, that still aren't globally supported?
Note:
While it might be the only reasonable fallback, slicing the string is not a good option as it results in a loss of context for the search.
Example
https://regex101.com/r/7bWtSW/1
with the base regular expression that:
matches the letter 'a', at least once and as many times as possible
as long as an 'X' comes later
We can see that we can achieve our goal with a lookbehind: we still get a match, shorter.
However, if we sliced the string, we would lose the match (because the lookahead in the base regular expression would fail).

Your pattern in the regex demo, (?:a+(?=.*X))(?<=^.{0,4}), uses a lookbehind assertion that can yield multiple separate matches.
See a regex demo for the same pattern with multiple matches in the same string.
Without using a lookbehind, you cannot get those separate matches with a single pattern.
What you might do is use an extra step: first match the part of the string that fulfills the length restriction (in this case the group 1 value), then get all the runs of consecutive a chars within that part.
^([^\nX]{0,3}a)[^\nX]*X
The pattern matches
^ Start of string
( Capture group 1
[^\nX]{0,3}a Match 0-3 times a char other than a newline or X and then match a
) Close group 1
[^\nX]*X Match optional chars other than a newline or X and then match X
Regex demo
const regex = /^([^\nX]{0,3}a)[^\nX]*X/;
[
"aaaaaaaaX",
"baaaaaaaaX",
"bbaaaaaaaaX",
"bbbaaaaaaaaX",
"bbbbaaaaaaaaX",
"babaaaaaaaaX",
"aX",
"abaaX"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[1].match(/a+/g))
}
})

Slice the match instead of slicing the string.
In your example, you want the match to account for the positive lookahead for X. But X is outside the limited scope, so we don't want to limit the search scope, essentially slicing the string, instead we want to limit match length relative to its position in the string.
To do that we'll use the index property of the returned match array.
const string = 'aaaaaaaX'
const regex = /a+(?=X)/
function limitedMatch(string, regex, lastIndex) {
const match = string.match(regex)
if (!match) return null // no match anywhere in the string
const {index} = match;
const matchLength = Math.max(lastIndex - index,0) // characters allowed before lastIndex
return match[0].slice(0, matchLength)
}
console.log(limitedMatch(string, regex, 4))
console.log(limitedMatch(string, regex, 2))
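The same idea can be extended to every match in the string rather than only the first. The following is a sketch under assumptions: clipMatches is a hypothetical helper, not part of the question or the answers, and the clipped text is not re-validated against the pattern, so it only mirrors the lookbehind behaviour for patterns (like a+) whose prefixes are themselves valid matches.

```javascript
// Hypothetical helper: collect all matches that start before `limit`
// and cut each one so that it does not extend past `limit`.
function clipMatches(text, regex, limit) {
  // matchAll requires the g flag, so add it if the caller did not.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  const clipped = [];
  for (const m of text.matchAll(globalRegex)) {
    if (m.index >= limit) break;                   // match starts outside the scope
    clipped.push(m[0].slice(0, limit - m.index));  // keep only the part inside the scope
  }
  return clipped;
}

// The lookahead still sees the whole string, so the X at the end is found.
const clippedRuns = clipMatches("aabaaaaX", /a+(?=.*X)/, 4);
console.log(clippedRuns); // [ 'aa', 'a' ]
```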

regex to coordinates WGS84?

I'm trying this regular expression, but I can't correctly validate the trailing white space and the letter:
/^\d{0,2}(\-\d{0,2})?(\-\d{0,2})?(\ ?\d[W,E]?)?$/
Examples of correct values:
33-39-10 N //OK
85-50 W //OK
-85-50 E //Wrong
What's wrong?

The quantifier in \d{0,2} also allows zero digits, so the pattern can skip the first number and let the optional group (\-\d{0,2}) match the leading - in the 3rd example.
In the character class [W,E] you could omit the comma and list the characters you allow to match [ENW]
If only the third group is optional you could try including the whitespace before the end of the line $
^\d{2}(-\d{2})(-\d{2})? [ENW] $
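The points above can be checked against the sample values with a small sketch. The pattern used here is an assumption, not the answer's exact regex: it allows one or two digits per part, requires two or three dash-separated parts, accepts S as well, and expects exactly one space before the letter with nothing after it. The name isCoordinate is hypothetical.

```javascript
// Assumed format: DD-MM or DD-MM-SS, one space, then a hemisphere letter.
const coordinateRx = /^\d{1,2}(?:-\d{1,2}){1,2} [NSEW]$/;

// Hypothetical helper that reports whether a value fits the assumed format.
function isCoordinate(value) {
  return coordinateRx.test(value);
}

console.log(isCoordinate("33-39-10 N")); // true
console.log(isCoordinate("85-50 W"));    // true
console.log(isCoordinate("-85-50 E"));   // false (leading dash: at least one digit is required first)
```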

I have used this regular expression : ^(?!\-)\d{0,2}?(\-\d{0,2}).+\s(N|E|W|S)$
Using a negative lookahead, we have excluded anything that starts with a dash (-).
(?!\-) = Starting at the current position in the expression,
ensures that the given pattern will not match
\s(N|E|W|S) matches anything with a space (\s) and one of the letters using OR operator |.
You may also use \s+(N|E|W|S).
+ = Matches between one and unlimited times, as many times as
possible, giving back as needed

regex - don't allow name to finish with hyphen

I'm trying to create a regex using javascript that will allow names like abc-def but will not allow abc-
(hyphen is also the only nonalpha character allowed)
The name has to be a minimum of 2 characters. I started with
^[a-zA-Z-]{2,}$, but it's not good enough so I'm trying something like this
^([A-Za-z]{2,})+(-[A-Za-z]+)*$.
It can have more than one - in a name but it should never start or finish with -.
It's allowing names like xx-x but not names like x-x. I'd like to achieve that x-x is also accepted but not x-.
Thanks!

Option 1
This option matches strings that begin and end with a letter and ensures that two - are never consecutive, so a string like a--a is invalid. To allow that case, see Option 2.
^[a-z]+(?:-?[a-z]+)+$
^ Assert position at the start of the line
[a-z]+ Match any lowercase ASCII letter one or more times (with i flag this also matches uppercase variants)
(?:-?[a-z]+)+ Match the following one or more times
-? Optionally match -
[a-z]+ Match any ASCII letter (with i flag)
$ Assert position at the end of the line
var a = [
"aa","a-a","a-a-a","aa-aa-aa","aa-a", // valid
"aa-a-","a","a-","-a","a--a" // invalid
]
var r = /^[a-z]+(?:-?[a-z]+)+$/i
a.forEach(function(s) {
console.log(`${s}: ${r.test(s)}`)
})
Option 2
If you want to match strings like a--a then you can instead use the following regex:
^[a-z]+[a-z-]*[a-z]+$
var a = [
"aa","a-a","a-a-a","aa-aa-aa","aa-a","a--a", // valid
"aa-a-","a","a-","-a" // invalid
]
var r = /^[a-z]+[a-z-]*[a-z]+$/i
a.forEach(function(s) {
console.log(`${s}: ${r.test(s)}`)
})

You can use a negative lookahead:
/(?!.*-$)^[a-z][a-z-]+$/i
Regex101 Example
Breakdown:
// Negative lookahead so that it can't end with a -
(?!.*-$)
// The actual string must begin with a letter a-z
[a-z]
// Any following characters can be a-z or -, there must be at least 1 of these
[a-z-]+
let regex = /(?!.*-$)^[a-z][a-z-]+$/i;
let test = [
'xx-x',
'x-x',
'x-x-x',
'x-',
'x-x-x-',
'-x',
'x'
];
test.forEach(string => {
console.log(string, ':', regex.test(string));
});

The problem is that the first assertion accepts 2 or more [A-Za-z]. You will need to modify it to accept one or more character:
^[A-Za-z]+((-[A-Za-z]{1,})+)?$
Edit: solved some commented issues
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-dfe'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('x-d'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-'); // Logs false
Edit 2: Edited to accept characters only
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('abc'); // Logs true
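Note that the pattern above also accepts a single letter such as x, while the question asks for a minimum of 2 characters. One possible way to add that constraint is a length lookahead at the start. This is a sketch, not part of the original answer, and the name minTwoRx is illustrative.

```javascript
// (?=.{2,}$) asserts that the whole string is at least 2 characters long;
// the rest is the same letters-separated-by-single-hyphens structure.
const minTwoRx = /^(?=.{2,}$)[A-Za-z]+(?:-[A-Za-z]+)*$/;

console.log(minTwoRx.test("x"));     // false (too short)
console.log(minTwoRx.test("x-d"));   // true
console.log(minTwoRx.test("abc"));   // true
console.log(minTwoRx.test("xggg-")); // false (ends with a hyphen)
```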

Use this if you want to accept such as A---A as well :
^(?!-|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/4UYd9l/4/
If you don't want to accept such as A---A do this:
^(?!-|.*[-]{2,}.*|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/qH4Q0q/4/
Both patterns accept only strings of at least two characters from [A-Za-z-] that do not start or end with -, which is enforced by the negative lookahead (?!-|.*-$).
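A quick way to compare the two patterns is to run them over the same samples. The sample list and variable names below are illustrative only.

```javascript
const allowRepeatedHyphens = /^(?!-|.*-$)[A-Za-z-]{2,}$/;
const rejectRepeatedHyphens = /^(?!-|.*[-]{2,}.*|.*-$)[A-Za-z-]{2,}$/;

const samples = ["A---A", "x-x", "abc-", "-abc", "x"];

const looseResults = samples.map(s => allowRepeatedHyphens.test(s));
const strictResults = samples.map(s => rejectRepeatedHyphens.test(s));

console.log(looseResults);  // [ true, true, false, false, false ]
console.log(strictResults); // [ false, true, false, false, false ]
```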

Try this /([a-zA-Z]{1,}-[a-zA-Z]{1,})/g

I suggest the following :
^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$
It validates :
that the matched string is at least composed of two characters (the first and last character classes are matched exactly once)
that the first and the last characters aren't dashes (the first and last character classes do not include -)
that the string can contain dashes and be longer than 2 characters (the second character class includes dashes and will consume as many characters as needed, dashes included).
Try it online.
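The three properties listed above can be checked with a short sketch; the sample values are illustrative. Note that this pattern does accept consecutive dashes such as a--a.

```javascript
const nameRx = /^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$/;

const nameChecks = {
  "abc-def": nameRx.test("abc-def"), // true
  "x-x": nameRx.test("x-x"),         // true
  "ab": nameRx.test("ab"),           // true (minimum length of 2)
  "a--a": nameRx.test("a--a"),       // true (consecutive dashes are allowed)
  "abc-": nameRx.test("abc-"),       // false (ends with a dash)
  "-abc": nameRx.test("-abc"),       // false (starts with a dash)
  "x": nameRx.test("x")              // false (too short)
};

console.log(nameChecks);
```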

^(?=[A-Za-z](?:-|[A-Za-z]))(?:(?:-|^)[A-Za-z]+)+$
Asserts that
the first character is a-z
the second is a-z or hyphen
If this matches
looks for groups of one or more letters prefixed by a hyphen or start of string, all the way to end of string.
You can also use the i flag to make it case insensitive.
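A short sketch of this pattern in use, written with the case-insensitive flag so the character classes can be reduced to [a-z]. The sample values are illustrative.

```javascript
// The lookahead checks the first two characters; the repeated group then
// consumes runs of letters, each introduced by a hyphen or by the string start.
const groupedRx = /^(?=[a-z](?:-|[a-z]))(?:(?:-|^)[a-z]+)+$/i;

const groupedResults = ["xx-x", "X-x", "ab", "x-", "x", "-a", "a--a"].map(s => groupedRx.test(s));

console.log(groupedResults); // [ true, true, true, false, false, false, false ]
```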

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?

var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as are necessary for the RegExp to match. Example: on 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. With the g flag, match returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>

The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

It is notoriously difficult to use a Regex to do balanced parenthesis matching, especially in Javascript.
You would be way better off creating your own parser. Here's a clever way to do this that will utilize the strength of Regex's:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parenthesis
Only splits if there are balanced quotes
Allows escaping of parenthesis, quotes, and splits using \
This code will work perfectly for your example.
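The linked live code is not reproduced here. As a rough illustration of the same features (balanced brackets, balanced quotes, backslash escapes), here is a minimal sketch that uses a hand-written character scanner rather than the string.replace callback described above. The function name splitTopLevel and its behaviour on unbalanced input are assumptions of this sketch.

```javascript
// Hypothetical helper: split `input` on `sep`, but only where the separator is
// not escaped, not inside quotes, and not inside (), [] or {}.
function splitTopLevel(input, sep = ".") {
  const parts = [];
  let buffer = "";
  let depth = 0;    // current nesting depth of brackets
  let quote = null; // the quote character currently open, or null

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\" && i + 1 < input.length) {
      buffer += ch + input[++i]; // keep the escaped pair verbatim
    } else if (quote) {
      if (ch === quote) quote = null; // closing quote
      buffer += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote
      buffer += ch;
    } else if ("([{".includes(ch)) {
      depth++;
      buffer += ch;
    } else if (")]}".includes(ch)) {
      depth = Math.max(0, depth - 1); // tolerate stray closers
      buffer += ch;
    } else if (ch === sep && depth === 0) {
      parts.push(buffer); // top-level separator: emit the buffer
      buffer = "";
    } else {
      buffer += ch;
    }
  }
  parts.push(buffer);
  return parts;
}

const source = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
const pieces = splitTopLevel(source);
console.log(pieces);
```

For the question's string this yields seven pieces, with the escaped dots and the parenthesised expressions left intact.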

You do not need a complicated regex for this.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]

So, I was working with this, and now I see that @FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.

Javascript regex to get substring, excluding a pattern?

I am still a beginner :)
I need to get a substring ignoring the last section inside [] (including the brackets []), i.e. ignore the [something inside] section in the end.
Note - There could be other single occurrences of [ in the string, and they should appear in the result.
Example
Input of the form -
1 checked arranged [1678]
Desired output -
1 checked arranged
I tried with this
var item = "1 checked arranged [1678]";
var parsed = item.match(/([a-zA-Z0-9\s]+)([(\[d+\])]+)$/);
|<-section 1 ->|<-section 2->|
alert(parsed);
I tried to mean the following -
section 1 - multiple occurrences of words (containing literals and nos.) followed by spaces
section 2 - ignore the pattern [something] in the end.
But I am getting 1678],1678,] and I am not sure which way it is going.
Thanks

OK here is the problem in your expression
([a-zA-Z0-9\s]+)([(\[d+\])]+)$
The Problem is only in the last part
([(\[d+\])]+)$
^ ^
here you are creating a character class,
which you don't want, because everything inside it is matched as a set of literal characters.
((\[d+\])+)$
^ ^^
here you create a capturing group and repeat this at least once ==> not needed
(\[d+\])$
^
here you want to match digits but forgot to escape
That brings us to
([a-zA-Z0-9\s]+)(\[\d+\])$
See it here on Regexr, the complete string is matched, the section 1 in capturing group 1 and section 2 in group 2.
When you now replace the whole thing with the content of group 1 you are done.
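The replacement step can be sketched as follows. The trimEnd call is an addition of this sketch: group 1 includes the space before the bracket, so without it the result keeps a trailing space.

```javascript
const bracketed = "1 checked arranged [1678]";

// "$1" in the replacement string refers to capturing group 1 (section 1).
const withoutBracket = bracketed.replace(/([a-zA-Z0-9\s]+)(\[\d+\])$/, "$1").trimEnd();

console.log(withoutBracket); // "1 checked arranged"
```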

You could do this
var s = "1 checked arranged [1678]";
var a = s.indexOf('[');
var b = s.substring(0,a);
alert(b);
http://jsfiddle.net/jasongennaro/ZQe6Y/1/
This s.indexOf('['); checks for where the first [ appears in the string.
This s.substring(0,a); chops the string, from the beginning to the first [.
Of course, this assumes the string is always in a similar format
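Since the question notes that other [ characters may appear earlier in the string, a variation that looks for the last [ instead of the first may be safer. This is a sketch under that assumption; stripTrailingBracket is a hypothetical name, and the helper leaves the string unchanged when it does not end with ].

```javascript
// Hypothetical helper: drop a trailing "[...]" section, keeping earlier "[" characters.
function stripTrailingBracket(s) {
  const open = s.lastIndexOf("[");
  if (open === -1 || !s.endsWith("]")) return s; // nothing to strip
  return s.slice(0, open).trimEnd();             // also drop the space before "["
}

console.log(stripTrailingBracket("1 checked arranged [1678]")); // "1 checked arranged"
console.log(stripTrailingBracket("1 [checked arranged [1678]")); // "1 [checked arranged"
```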

var item = '1 check arranged [1678]',
matches = item.match(/(.*)(?=\[\d+\])/);
alert(matches[1]); // "1 check arranged " (note the trailing space; call .trim() if it is unwanted)
The regular expression I used makes use of a positive lookahead to exclude the undesired portion of the string. The bracketed number must be a part of the string for the match to succeed, but it will not be returned in the results.

Here you can find how to delete stuff inside square brackets. This will leave you with the rest. :)
Regex: delete contents of square brackets

try this if you only want to get rid of that [] in the end
var parsed = item.replace(/\s*\[[^\]]*\]$/,"")

var item = "1 checked arranged [1678]";
var parsed = item.replace(/\s\[.*/,"");
alert(parsed);
That work as desired?

Use escaped brackets and non-capturing parentheses:
var item = "1 checked arranged [1678]";
var parsed = item.match(/([\w\s]+)(?:\s+\[\d+\])$/);
alert(parsed[1]); //"1 checked arranged"
Explanation of regex:
([\w\s]+) //Match alphanumeric characters and spaces
(?: //Start of non-capturing parentheses
\s+ //Match the whitespace before the bracket (one or more), keeping it out of group 1
\[ //Bracket literal
\d+ //One or more digits
\] //Bracket literal
) //End of non-capturing parentheses
$ //End of string
