How to limit the search scope without regex lookbehinds? - javascript

Given a regular expression, I can easily decide where to start looking for a match from in a string using lastIndex.
Now, I want to make sure that the match I get doesn't go past a certain point in the string.
I would happily enclose the regular expression in a non-capturing group and append, for instance, (?<=^.{0,8}).
But how can I achieve the same goal without lookbehinds, that still aren't globally supported?
Note:
While it might be the only reasonable fallback, slicing the string is not a good option as it results in a loss of context for the search.
Example
https://regex101.com/r/7bWtSW/1
with the base regular expression that:
matches the letter 'a', at least once and as many times as possible
as long as an 'X' comes later
We can see that we can achieve our goal with a lookbehind: we still get a match, shorter.
However, if we sliced the string, we would lose the match (because the lookahead in the base regular expression would fail).

Your pattern in the regex demo, (?:a+(?=.*X))(?<=^.{0,4}), uses a lookbehind assertion that can yield multiple separate matches.
See a regex demo for the same pattern with multiple matches in the same string
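As a rough illustration of those multiple matches, here is a small sketch on a hypothetical input string (an assumption, not the string from the demo). It only runs in an engine that supports lookbehinds, which is exactly the limitation being discussed:

```javascript
// Hypothetical input; character positions: a0 a1 b2 a3 a4 a5 a6 X7
const sample = 'aabaaaaX';

// The pattern from the demo: runs of "a" with an X somewhere later,
// where the match has to end within the first 4 characters.
const limited = /(?:a+(?=.*X))(?<=^.{0,4})/g;

const separateMatches = sample.match(limited);
console.log(separateMatches); // [ 'aa', 'a' ]
```

The second run of a characters is cut short at position 4, so the same string produces two separate matches.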
Without a lookbehind, you cannot get those separate matches in a single pass.
What you can do instead is add an extra step: first match the part of the string that fulfills the length restriction (in this case, the group 1 value), then collect all runs of consecutive a characters within that part.
^([^\nX]{0,3}a)[^\nX]*X
The pattern matches
^ Start of string
( Capture group 1
[^\nX]{0,3}a Match 0-3 times a char other than a newline or X and then match a
) Close group 1
[^\nX]*X Match optional chars other than a newline or X and then match X
Regex demo
const regex = /^([^\nX]{0,3}a)[^\nX]*X/;
[
"aaaaaaaaX",
"baaaaaaaaX",
"bbaaaaaaaaX",
"bbbaaaaaaaaX",
"bbbbaaaaaaaaX",
"babaaaaaaaaX",
"aX",
"abaaX"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[1].match(/a+/g))
}
})

Slice the match instead of slicing the string.
In your example, the match has to account for the positive lookahead for X, but X lies outside the limited scope. So rather than limiting the search scope (which essentially means slicing the string), we want to limit the length of the match relative to its position in the string.
To do that we'll use the index property of the returned match array.
const string = 'aaaaaaaX'
const regex = /a+(?=X)/
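function limitedMatch(string, regex, lastIndex) {
  const match = string.match(regex)
  // No match at all: return an empty string instead of throwing on null
  if (match === null) return ''
  const {index} = match;
  // Number of characters of the match that lie before lastIndex
  const matchLength = Math.max(lastIndex - index,0)
  return match[0].slice(0, matchLength)
}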
console.log(limitedMatch(string, regex, 4))
console.log(limitedMatch(string, regex, 2))
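The same idea can be combined with the regex's lastIndex so that both ends of the scope are controlled. The following is only a sketch: matchWithin is a hypothetical helper name, and the bounds are assumed to be half-open, [start, end). Truncating a match is only equivalent to the lookbehind version for patterns such as a+, where every prefix of a match is itself acceptable.

```javascript
// Hypothetical helper: search from `start`, never report text at or past `end`.
function matchWithin(str, pattern, start, end) {
  // lastIndex is only honoured with the g (or y) flag, so work on a copy that has it
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  re.lastIndex = start;

  const m = re.exec(str); // sees the whole string, so lookaheads keep their context
  if (m === null || m.index >= end) return null;

  // Keep only the part of the match that lies before `end`
  return { index: m.index, text: m[0].slice(0, end - m.index) };
}

const fromStart = matchWithin('aaaaaaaX', /a+(?=.*X)/, 0, 4);
const fromTwo = matchWithin('aaaaaaaX', /a+(?=.*X)/, 2, 4);
const tooLate = matchWithin('bbbbbaX', /a+(?=.*X)/, 0, 4);

console.log(fromStart); // { index: 0, text: 'aaaa' }
console.log(fromTwo);   // { index: 2, text: 'aa' }
console.log(tooLate);   // null
```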

Related

Regex javascript match the second letter by the first letter, but the second letter must be an uppercase

This regex matches the second letter by the first letter, but the second letter is not an uppercase
([a-z])\1
Now regex matches letters like aa or bb,... but I need that my regex can match all aA,bB,... from this string "abcaAcvbBklmM"
So how to make that regex can match the second value by the first value, but the second must be an uppercase
You could phrase this by matching all ([a-z])\1 in case-insensitive mode and then checking whether each match also matches [a-z][A-Z]:
var input = "abcaAcvbBklmMzzQQ";
var matches = input.match(/([a-z])\1/gi)
.filter(x => /^[a-z][A-Z]$/.test(x));
console.log(matches);
It's ugly, but if you must use a simple regex, you could just brute-force the full set of pairings:
var myText = "aAaaABbCCddEEeEfF";
var letterPairs = myText.match(/aA|bB|cC|dD|eE|fF|gG|hH|iI|jJ|kK|lL|mM|nN|oO|pP|qQ|rR|sS|tT|uU|vV|wW|xX|yY|zZ/g);
console.log(letterPairs);
It doesn't feel very satisfactory as a solution, but it'll get the job done.
I don't think you can do this with just a JavaScript regular expression (well, not unless you list all possible combinations). Regular expressions in some other environments may be able to do it, but not JavaScript's.
It's fairly straight-forward, though, to match a lower-case character followed by an upper-case character and then filter out ones where the one isn't the lower-case equivalent of the other:
const str = "abcaAcvbBklmM";
const results = [];
for (const [, lower, upper] of str.matchAll(/([a-z])([A-Z])/g)) {
if (lower === upper.toLocaleLowerCase()) {
results.push(lower + upper);
}
}
console.log(results);
How that works:
The regular expression /([a-z])([A-Z])/g matches a lower-case letter followed by an upper-case letter (but they aren't necessarily the "same" letter), capturing each in a capture group.
matchAll matches all occurrences of that in a string, returning an iterator for the results.
The for-of loop loops through the results from the iterator. The results from matchAll are an augmented array,¹ where the first element in the array is the overall match, followed by elements for any capture groups. The code uses destructuring in the for-of to pick out the second and third elements of the array (the two capture groups).
If the first letter matches the second letter converted to lower case, it's a match and we retain it.
We can also do this with multiple passes using array methods rather than a single for-of:
const str = "abcaAcvbBklmM";
const results = [...str.matchAll(/([a-z])([A-Z])/g)]
.filter(([, lower, upper]) => lower === upper.toLocaleLowerCase())
.map(([match]) => match);
console.log(results);
¹ This code isn't using the augmented parts of the array, but basically, in addition to being an array, it has an index property saying where the match occurred, a groups property containing named capture group information (which we could have used, but it was just as easy to use the array elements for this simple regex), and an input property with the original string.
You could use the capture group with the backreference and make the pattern case insensitive with the /i flag.
Then compare the matched string with the expected lower and uppercase variants.
const input = "yyabcaAcvbBklmMzzQQ";
const regex = /([a-z])\1/ig;
const matches = input.match(regex)
.filter(s =>
s[0].toLowerCase() + s[1].toUpperCase() === s
)
console.log(matches);

Regex: Equivalent expression without using positive lookbehind

I need to split 4d6/2d6/1d6 on /. However, I only want to split where a \d precedes the / and a \d or d follows it, so that other expressions like 1d6+db/2 don't get split. I came up with a solution that uses a positive lookbehind, /(?<=\d)\/(?=\d|d)/g, but I need my regex to be valid on iOS too. Is there a way to create an equivalent expression without using a positive lookbehind?
You can use an extracting, matching approach rather than splitting:
/(?:\/|^)(.*?\d)(?=\/\d|$)/g
/(?:\/|^)([\w\W]*?\d)(?=\/\d|$)/g
See the regex demo. Your value is in Group 1. The regex with [\w\W] matches across lines, too (as . does not match line breaks by default).
Details:
(?:\/|^) - a non-capturing group that matches either / or start of string
(.*?\d) - Group 1: any zero or more chars (here, other than line break chars) as few as possible, and then a digit
(?=\/\d|$) - a location that is immediately followed with / + digit, or end of string.
In JavaScript, you can either use const matches = Array.from(text.matchAll(/(?:\/|^)(.*?\d)(?=\/\d|$)/g), x => x[1]), or - if you cannot use this syntax - a more verbose legacy extraction like
var s = "4d6/2d6/1d6";
var re = /(?:\/|^)(.*?\d)(?=\/\d|$)/g;
var matches=[], m;
while(m=re.exec(s)) {
matches.push(m[1]);
}
console.log(matches)
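To check the extraction approach against both inputs from the question, here is a small sketch. The function name splitDice is an assumption made for illustration; the regex is the one from this answer.

```javascript
// Hypothetical wrapper around the extraction regex from the answer above
function splitDice(text) {
  const re = /(?:\/|^)(.*?\d)(?=\/\d|$)/g;
  // Group 1 holds each piece without the separating slash
  return Array.from(text.matchAll(re), m => m[1]);
}

console.log(splitDice('4d6/2d6/1d6')); // [ '4d6', '2d6', '1d6' ]
console.log(splitDice('1d6+db/2'));    // [ '1d6+db/2' ]
```

The second input stays in one piece because its only / is preceded by a letter rather than a digit.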

How to match regular expression In Javascript

I have string [FBWS-1] comes first than [FBWS-2]
In this string, I want to find all occurance of [FBWS-NUMBER]
I tried this :
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/^([[A-Z]-[0-9]])$/.test(term));
I want to get all the NUMBERS where [FBWS-NUMBER] string is matched.
But no success. I m new to regular expressions.
Can anyone help me please.
Note that ^([[A-Z]-[0-9]])$ matches the start of the string (^), a [ or an uppercase ASCII letter (with [[A-Z]), a -, an ASCII digit and a ] char at the end of the string. So, basically, it only matches whole strings like [-2] or Z-3].
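A quick sketch to confirm that reading of the pattern from the question (the two short test strings are just illustrative inputs):

```javascript
// The pattern from the question, unchanged
const original = /^([[A-Z]-[0-9]])$/;

const matchesBracketDash = original.test('[-2]'); // "[" comes from the class [[A-Z]
const matchesLetterDash = original.test('Z-3]');  // "Z" comes from the same class
const matchesSentence = original.test('[FBWS-1] comes first than [FBWS-2]');

console.log(matchesBracketDash); // true
console.log(matchesLetterDash);  // true
console.log(matchesSentence);    // false
```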
You may use
/\[[A-Z]+-[0-9]+]/g
See the regex demo.
NOTE If you need to "hardcode" FBWS (to only match values like FBWS-123 and not ABC-3456), use it instead of [A-Z]+ in the pattern, /\[FBWS-[0-9]+]/g.
Details
\[ - a [ char
[A-Z]+ - one or more (due to + quantifier) uppercase ASCII letters
- - a hyphen
[0-9]+ - one or more (due to + quantifier) ASCII digits
] - a ] char.
The /g modifier used with String#match() returns all found matches.
JS demo:
var term = "[FBWS-1] comes first than [FBWS-2]";
console.log(term.match(/\[[A-Z]+-[0-9]+]/g));
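The question asks for the numbers themselves. One possible way to get them, sketched here as an assumption rather than as part of the original answer, is to put a capture group around the digits and read it from each matchAll result:

```javascript
const term = '[FBWS-1] comes first than [FBWS-2]';

// Same pattern as above, with the digits wrapped in capture group 1
const numbers = Array.from(
  term.matchAll(/\[[A-Z]+-([0-9]+)\]/g),
  m => Number(m[1])
);

console.log(numbers); // [ 1, 2 ]
```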
You can use:
\[\w+-\d+\]
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/\[\w+-\d+\]/.test(term));
There are several reasons why your existing regex doesn't work.
You're trying to match the beginning and the end of your string when you actually want what is in between, so don't use ^ and $.
You're only matching a single letter with [A-Z]; add the + quantifier to match one or more.
You can shorten [A-Z] and [0-9] by using the shorthands \w and \d (note that \w also matches digits and underscores). The brackets around them are generally unnecessary, but the literal [ and ] in the text must be escaped.
Note that your code only returns a true/false value (you're using test); at the moment it's unclear whether this is what you want. You may want to use match with the global modifier (/.../g) instead of test to get a collection.
Here is an example using string.match(reg) to get all matched strings:
var term = "[FBWS-1] comes first than [FBWS-2]";
var reg1 = /\[[A-Z]+-[0-9]+\]/g;
var reg2 = /\[FBWS-[0-9]+\]/g;
var arr1 = term.match(reg1);
var arr2 = term.match(reg2)
console.log(arr1);
console.log(arr2);
Your regular expression /^([[A-Z]-[0-9]])$/ is wrong.
Give this regex a try: /\[FBWS-\d+\]/g
Remove the g if you only want to find one match, as g will find all matches.
Edit: Someone mentioned that you want ["any combination"-"number"]; if that's what you're looking for, then this should work: /\[[A-Z]+-\d+\]/

javascript find protocol, domain, plus first slash with regexp from a src tag, replace with empty string

I tried to construct a regex for this task but I'm afraid I am still failing to have an intuitive understanding of regexp.
The problem is the regex matches until the last slash in a string. I want it to stop at the first match of the string.
My pathetic attempt at regex:
/^http(s?):\/\/.+\/{1}/
Test subject:
http://foo.com/bar/test/foo.jpeg
The goal is to obtain bar/test/foo.jpeg, so that I may then split the string, pop the last element and then join the remainder, resulting in having the path to the JavaScript file.
Example
var str = 'http://foo.com/bar/test/foo.jpeg';
str.replace(regexp,'');
While the other answer shows how to match a part of a string, I think a replace solution is more appropriate for the current task.
The issue you have is that .+ matches one or more characters other than a newline greedily, that is, all the string is grabbed first in one go, and then the regex engine starts backtracking (moving backwards along the input string looking for a / to accommodate in the match). Thus, you get the match from http until the last /.
To restrict the match from http to the first / use a negated character class [^/]+ instead of .+.
^https?:\/\/[^\/]+\/
            ^^^^^^
See the regex demo
Note that you do not need to place s into a capturing group to make it optional, unescaped ? is a quantifier that makes the preceding character match one or zero times. Also, {1} is a redundant quantifier since this is default behavior, c will only match 1 c, (?:something) will only match one something.
var re = /^https?:\/\/[^\/]+\//;
var str = 'http://foo.com/bar/test/foo.jpeg';
var result = str.replace(re, '');
document.getElementById("r").innerHTML = result;
<div id="r"/>
Note that you will need to assign the replace result to some variable, since in JS, strings are immutable.
Regex explanation:
^ - start of string
https? - either http or https substring
:\/\/ - a literal sequence of ://
[^\/]+ - 1 or more characters other than a /
\/ - a literal / symbol
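The difference between the greedy .+ and the negated character class can be seen side by side in this small sketch, using the test subject from the question:

```javascript
const url = 'http://foo.com/bar/test/foo.jpeg';

// Greedy .+ grabs everything, then backtracks to the LAST slash
const greedy = url.match(/^https?:\/\/.+\//)[0];

// [^\/]+ cannot cross a slash, so the match stops at the FIRST one
const restricted = url.match(/^https?:\/\/[^\/]+\//)[0];

console.log(greedy);     // http://foo.com/bar/test/
console.log(restricted); // http://foo.com/
```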
Use capturing group based regex.
> var s = "http://foo.com/bar/test/foo.jpeg"
> s.match(/^https?:\/\/[^\/]+((?:\/[^\/]*)*)/)[1]
'/bar/test/foo.jpeg'
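For the follow-up step described in the question (split, pop the last element, join the rest), a minimal sketch could look like this. The variable names are assumptions, and the replace pattern is the one with the negated character class:

```javascript
const src = 'http://foo.com/bar/test/foo.jpeg';

// Strip protocol, domain and the first slash
const path = src.replace(/^https?:\/\/[^\/]+\//, '');

// Drop the file name and keep the directory part
const segments = path.split('/');
segments.pop();
const dir = segments.join('/');

console.log(path); // bar/test/foo.jpeg
console.log(dir);  // bar/test
```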

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as are necessary for the RegExp to match. Example: on 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. With the g flag, match returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
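Both points can be checked with a short sketch; the input 'a1b2' and its pattern are illustrative assumptions, not part of the original problem:

```javascript
// Lazy versus greedy quantifier on the same input
const lazy = '123'.match(/\d+?/)[0];
const greedy = '123'.match(/\d+/)[0];
console.log(lazy, greedy); // 1 123

// With g: every full match, no capture groups
const withG = 'a1b2'.match(/([a-z])\d/g);
console.log(withG); // [ 'a1', 'b2' ]

// Without g: first match only, followed by its capture groups
const withoutG = 'a1b2'.match(/([a-z])\d/);
console.log(withoutG[0], withoutG[1]); // a1 a
```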
The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
It is notoriously difficult to use a regex to match balanced parentheses, especially in JavaScript.
You would be way better off creating your own parser. Here's a clever way to do this that still uses the strengths of regexes:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
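The steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the code this answer refers to: splitOnTopLevelDots is a hypothetical name, it iterates tokens with matchAll instead of a replace callback, and it also tracks quotes so that a bracket inside a string literal is not counted.

```javascript
// Hypothetical sketch: split on dots that are not escaped, quoted or nested.
function splitOnTopLevelDots(input) {
  // Tokens: escape sequence | quote | opener | closer | dot | run of other chars
  const token = /\\[\s\S]?|["']|[(\[{]|[)\]}]|\.|[^\\"'()\[\]{}.]+/g;
  const parts = [];
  let buffer = '';
  let depth = 0;    // number of currently open brackets
  let quote = null; // the quote character we are inside of, if any

  for (const [tok] of input.matchAll(token)) {
    if (tok[0] === '\\') {
      buffer += tok;                        // escaped char, kept verbatim
    } else if (quote !== null) {
      if (tok === quote) quote = null;      // closing quote
      buffer += tok;
    } else if (tok === '"' || tok === "'") {
      quote = tok;                          // opening quote
      buffer += tok;
    } else if ('([{'.includes(tok)) {
      depth++;
      buffer += tok;
    } else if (')]}'.includes(tok)) {
      depth = Math.max(0, depth - 1);
      buffer += tok;
    } else if (tok === '.' && depth === 0) {
      parts.push(buffer);                   // balanced split point
      buffer = '';
    } else {
      buffer += tok;
    }
  }
  parts.push(buffer);
  return parts;
}

const subject = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
const pieces = splitOnTopLevelDots(subject);
console.log(pieces);
```

For the example string this should yield seven pieces, with the escaped dots and the parenthesised groups left intact. The sketch does not check that closing brackets match the type of the opening ones.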
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if the parentheses are balanced
Only splits if the quotes are balanced
Allows escaping of parentheses, quotes, and splits using \
This code will work perfectly for your example.
You do not need a complicated regex for this.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
So, was working with this, and now I see that #FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
