Possible to get 'regex source' from match? - javascript

I can get the source of a regex when it's defined separately. For example:
let r1 = new RegExp("el*");
console.log(r1.source);
// el*
Or:
let r2 = /el*/;
console.log(r2.source);
// el*
Is there a way to extract that if the regex isn't defined separately? For example, something along the lines of:
let m = "Hello".match(/el*/);
console.log(m.source?);

No. Quoting the documentation of the match() function:
Return value
An Array whose contents depend on the presence or absence of the
global (g) flag, or null if no matches are found.
So the return value is an array (you can check with Array.isArray(m), which returns true).
The returned array carries some extra information about the match (such as groups, index and the original input), but none of it includes the regex that was used.
So there is no way to get that information from the match, because the matching function does not return it.
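As a quick check of the above, here is a minimal sketch that inspects what a non-global match() result carries. The string and pattern are the ones from the question; nothing else is assumed.

```javascript
// Inspect what a non-global match() result actually carries.
const m = "Hello".match(/el*/);

console.log(Array.isArray(m)); // true
console.log(m[0]);             // "ell"
console.log(m.index);          // 1
console.log(m.input);          // "Hello"
console.log(m.groups);         // undefined (the pattern has no named groups)
console.log("source" in m);    // false
```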

The match result by itself cannot lead back to the original regex, simply because different regexes can produce the same result, even on the same string. Take for example the string "abcd": all of the regexes /abcd/, /a..d/, /a.*/ and many more would match it in exactly the same way.
The only way you could retrieve the original regex is if a reference to it were stored by the match() method inside the returned object. There is no reason to think that's the case, but you can implement your own match function that does so. Something like:
function myMatch(str, regex) {
  var match = str.match(regex);
  if (match === null) {
    match = [null];
  }
  match.source = regex;
  return match;
}
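A possible variation on the same idea, as a rough sketch: return a plain object that pairs the match with the pattern text, so that a failed match stays null. The name matchWithSource and the shape of the returned object are made up for illustration.

```javascript
// Hypothetical helper: pairs the match result with the pattern that produced it.
function matchWithSource(str, regex) {
  return {
    match: str.match(regex), // array or null, exactly as match() returns it
    source: regex.source,    // pattern text, without delimiters or flags
    flags: regex.flags
  };
}

// Two different patterns, same match on the same string.
const a = matchWithSource("abcd", /a..d/);
const b = matchWithSource("abcd", /a.*/);
console.log(a.match[0], b.match[0]); // abcd abcd
console.log(a.source, b.source);     // a..d a.*
```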

Related

Whats wrong with this regex logic

I am trying to fetch the value after the equals sign. It works, but I am getting duplicated values. Any idea what's wrong here?
// Regex for finding a word after "=" sign
var myregexpNew = /=(\S*)/g;
// Regex for finding a word before "=" sign
var mytype = /(\S*)=/g;
//Setting data from Grid Column
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
var newtype = mytype.exec(strNew);
alert(matchNew);
https://jsfiddle.net/6vjjv0hv/
exec returns an array: the first element is the full match and the following ones are the submatches. That's why you get ["=20", "20"] (using console.log here instead of alert would make it clearer what you get).
When looking for submatches with exec, you're usually interested in the elements starting at index 1.
Regarding the parsing as a whole, there are clearly better solutions, such as using a single regex with two submatches, but it depends on the real goal.
You can try without using Regex like this:
var val = 'QCById=20';
var myString = val.substring(val.indexOf("=") + 1);
alert(myString);
At present, exec is returning the whole match followed by the captured value.
REGEXP.exec(SOMETHING) returns an array (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec).
The first item in the array is the full match and the remaining items are the parenthesized substrings.
You do not get duplicated values, you just get an array of a matched value and the captured text #1.
See RegExp#exec() help:
If the match succeeds, the exec() method returns an array and updates properties of the regular expression object. The returned array has the matched text as the first item, and then one item for each capturing parenthesis that matched containing the text that was captured.
Just use the [1] index to get the captured text only.
var myregexpNew = /=(\S*)/g;
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
if (matchNew) {
console.log(matchNew[1]);
}
To get values on both sides of =, you can use /(\S*)=(\S*)/g regex:
var myregexpNew = /(\S*)=(\S*)/g;
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
if (matchNew) {
console.log(matchNew[1]);
console.log(matchNew[2]);
}
Also, you may want to check that the captured values are not undefined or empty, since \S* may capture an empty string. Alternatively, use the /(\S+)=(\S+)/g regex, which requires at least one non-whitespace character before and after the = sign.
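Putting the two suggestions together, a small sketch might look like the following. The helper name parsePair and the returned object shape are hypothetical. The g flag is left out on purpose here, because exec on a global regex remembers its position in lastIndex between calls, which is not needed for a single lookup.

```javascript
// Hypothetical helper: split "key=value" into its two parts with one regex.
function parsePair(text) {
  // \S+ (not \S*) so that an empty key or value does not count as a match.
  const m = /(\S+)=(\S+)/.exec(text);
  return m ? { key: m[1], value: m[2] } : null;
}

const pair = parsePair("QCById=20");
console.log(pair.key, pair.value);  // QCById 20
console.log(parsePair("QCById="));  // null
```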

JavaScript regex not returning match group

I'm trying to get the content in between square brackets within a string but my Regex isn't working.
RegExp: /\[([^\n\]]+)\]/g
It returns the correct match groups on regex101 but when I try something like '[a][b]'.match(/\[([^\n\]]+)\]/g), I get ['[a]', '[b]'] instead of ['a', 'b'].
I can get the correct results if I iterate through and do RegExp.exec, but from looking at examples online it seems like I should be able to get the match groups using String.match
You're using the String .match() method, which has different behavior from RegExp .exec() in the case of regular expressions with the "g" flag. The .match() method gives you all the complete matches across the entire searched string for "g" regular expressions.
If you change your code to
/\[([^\n\]]+)\]/g.exec('[a][b]')
you'll get the result you expect: an array in which the first entry (index 0) is the entire match, and the second and subsequent entries are the groups from the regex.
You'll have to iterate to match all of them:
var re = /\[([^\n\]]+)\]/g, search = "[a][b]", bracketed = [];
for (var m = null; m = re.exec(search); bracketed.push(m[1]));
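In environments that support String.prototype.matchAll (ES2020), the same loop can probably be written more compactly. This is a sketch using the string and pattern from the question; only the variable names are reused from the loop above.

```javascript
const search = "[a][b]";
const re = /\[([^\n\]]+)\]/g; // matchAll requires the g flag

// Each item yielded by matchAll is a full exec-style result, so m[1] is group 1.
const bracketed = Array.from(search.matchAll(re), m => m[1]);
console.log(bracketed); // [ 'a', 'b' ]
```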

Shared part in RegEx matched string

In the following code:
"a sasas b".match(/sas/g) //returns ["sas"]
The string actually includes two sas strings, a [sas]as b and a sa[sas] b.
How can I modify the RegEx to match both?
Another example:
"aaaa".match(/aa/g); //actually includes [aa]aa, a[aa]a, aa[aa]
Please consider the issue in general, not just the instances above.
A pure RegEx solution is preferred.
If you want to match at least one such "merged" occurrence, then you could do something like:
"a sasas b".match(/s(as)+/g)
If you want to retrieve the matches as separate results, then you have a bit more work to do; this is not a case that regular expressions are designed to handle. The basic algorithm would be:
Attempt a match. If it was unsuccessful, stop.
Extract the match you are interested in and do whatever you want with it.
Take the substring of the original target string, starting from one character following the first character in your match.
Start over, using this substring as the new input.
(To be more efficient, you could match with an offset instead of using substrings; that technique is discussed in this question.)
For example, you would start with "a sasas b". After the first match, you have "sas". Taking the substring that starts one character after the match starts, we would have "asas b". The next match would find the "sas" here, and you would again repeat the process with "as b". This would fail to match, so you would be done.
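The steps above can be sketched roughly as follows. The function name overlappingMatches and the shape of the result objects are made up for illustration, and the sketch assumes the regex is passed without the g flag (with g, match() returns a plain list of strings that has no index property).

```javascript
// Hypothetical helper implementing the substring-based steps described above.
function overlappingMatches(input, regex) {
  const results = [];
  let offset = 0; // where the current substring starts within input
  while (offset <= input.length) {
    const m = input.substring(offset).match(regex);
    if (m === null) break; // no further match: stop
    // m.index is relative to the substring, so add the offset back.
    results.push({ text: m[0], index: offset + m.index });
    // Restart one character after the start of this match.
    offset += m.index + 1;
  }
  return results;
}

console.log(overlappingMatches("a sasas b", /sas/).map(r => r.index)); // [ 2, 4 ]
console.log(overlappingMatches("aaaa", /aa/).length);                  // 3
```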
This significantly-improved answer owes itself to @EliGassert.
String.prototype.match_overlap = function(re)
{
  if (!re.global)
    re = new RegExp(re.source,
      'g' + (re.ignoreCase ? 'i' : '')
          + (re.multiline ? 'm' : ''));
  var matches = [];
  var result;
  while (result = re.exec(this)) {
    matches.push(result);
    re.lastIndex = result.index + 1;
  }
  return matches.length ? matches : null;
}
@EliGassert points out that there is no need to walk through the entire string character by character; instead we can find a match anywhere (i.e. do without the anchor), and then continue one character after the index of the found match. While researching how to retrieve said index, I found that the re.lastIndex property, used by exec to keep track of where it should continue its search, is in fact settable! This works rather nicely with what we intend to do.
The only bit needing further explanation might be the beginning. In the absence of the g flag, exec may never return null (always returning its one match, if it exists), thus possibly going into an infinite loop. Since, however, match_overlap by design seeks multiple matches, we can safely recompile any non-global RegExp as a global RegExp, importing the i and m options as well if set.
Here is a new jsFiddle: http://jsfiddle.net/acheong87/h5MR5/.
document.write("<pre>");
document.write('sasas'.match_overlap(/sas/));
document.write("\n");
document.write('aaaa'.match_overlap(/aa/));
document.write("\n");
document.write('my1name2is3pilchard'.match_overlap(/[a-z]{2}[0-9][a-z]{2}/));
document.write("</pre>");
Output:
sas,sas
aa,aa,aa
my1na,me2is,is3pi
var match = "a sasas b".match(/s(?=as)/g);
for(var i =0; i != match.length; ++i)
alert(match[i]);
Going off of the comment by Q. Sheets and the response by cdhowie, I came up with the above solution: it consumes ONE character in the regular expression and does a lookahead for the rest of the match string. With these two pieces, you can construct all the positions and matching strings in your regular expression.
I wish there was an "inspect but don't consume" operator that you could use to actually include the rest of the matching (lookahead) string in the results, but there unfortunately isn't -- at least not in JS.
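One workaround that seems to come close to "inspect but don't consume": put a capture group inside the lookahead itself. The overall match is then zero-width, but group 1 still holds the text seen at that position. This is only a sketch, and it assumes an environment with String.prototype.matchAll (ES2020).

```javascript
// A zero-width lookahead that captures: the match itself is empty,
// but group 1 holds the text that would have matched at that position.
const re = /(?=(sas))/g;
const found = Array.from("a sasas b".matchAll(re), m => m[1]);
console.log(found); // [ 'sas', 'sas' ]

const pairs = Array.from("aaaa".matchAll(/(?=(aa))/g), m => m[1]);
console.log(pairs.length); // 3
```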
Here's a generic way to do it:
String.prototype.match_overlap = function(regexp)
{
  // Use the pattern text directly; wrap it in a group so that the ^ anchor
  // applies to the whole pattern, including any top-level alternation
  var re = new RegExp('^(?:' + regexp.source + ')');
  var matches = [];
  var result;
  for (var i = 0; i < this.length; i++)
    if (result = re.exec(this.substr(i)))
      matches.push(result);
  return matches.length ? matches : null;
}
Usage:
var results = 'sasas'.match_overlap(/sas/);
Returns:
An array of (overlapping) matches, or null.
Example:
Here's a jsFiddle in which this:
document.write("<pre>");
document.write('sasas'.match_overlap(/sas/));
document.write("\n");
document.write('aaaa'.match_overlap(/aa/));
document.write("\n");
document.write('my1name2is3pilchard'.match_overlap(/[a-z]{2}[0-9][a-z]{2}/));
document.write("</pre>");
returns this:
sas,sas
aa,aa,aa
my1na,me2is,is3pi
Explanation:
To explain a little: the user passes a RegExp object to the new function, match_overlap, just as they normally would with match. From this we create a new RegExp object anchored at the beginning of the string. The anchor prevents duplicate overlapped matches: without it, a match that starts later in the string would be found again from every earlier substring. Then we simply match against each substring of the subject string this and push the results to an array, which is returned if non-empty (otherwise null is returned). Note that if the user passes in an expression that is already anchored, the result is inherently wrong. At first I stripped anchors out, but then I realized I was making an assumption on the user's behalf, which we should avoid. Finally, one could go further and somehow merge the resulting array of matches into a single match result resembling what would normally occur with the //g option. One could go even further and make up a new flag, e.g. //o, that gets parsed to do overlap matching, but this is getting a little crazy.

remove all but a specific portion of a string in javascript

I am writing a little app for Sharepoint. I am trying to extract some text from the middle of a field that is returned:
var ows_MetaInfo="1;#Subject:SW|NameOfADocument
vti_parservers:SR|23.0.0.6421
ContentTypeID:SW|0x0101001DB26Cf25E4F31488B7333256A77D2CA
vti_cachedtitle:SR|NameOfADocument
vti_title:SR|ATitleOfADocument
_Author:SW:|TheNameOfOurCompany
_Category:SW|
ContentType:SW|Document
vti_author::SR|mrwienerdog
_Comments:SW|This is very much the string I need extracted
vti_categories:VW|
vtiapprovallevel:SR|
vti_modifiedby:SR|mrwienerdog
vti_assignedto:SR|
Keywords:SW|Project Name
ContentType _Comments"
So......All I want returned is "This is very much the string I need extracted"
Do I need a regex and a string replace? How would you write the regex?
Yes, you can use a regular expression for this (this is the sort of thing they are good for). Assuming you always want the string after the pipe (|) on the line starting with "_Comments:SW|", here's how you can extract it:
var matchresult = ows_MetaInfo.match(/^_Comments:SW\|(.*)$/m);
var comment = (matchresult==null) ? "" : matchresult[1];
Note that the .match() method of the String object returns an array. The first element (index 0) is the entire match. Here the entire match is the whole line, because we anchored the regex with ^ and $; adding the "m" after the regex makes it a multiline regex, which lets ^ and $ match the start and end of any line within the multi-line input. The rest of the array holds the submatches that we capture using parentheses. Above we've captured the part of the line that you want, so it will be present in the second item of the array (index 1).
If there is no match ("_Comments:SW|" doesn't appear in ows_MetaInfo), then .match() will return null, which is why we test it before pulling out the comment.
If you need to adjust the regex for other scenarios, have a look at the Regex docs on Mozilla Dev Network: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
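To see what the "m" flag changes, here is a small sketch that runs the same regex with and without it. The metaInfo string is a shortened stand-in for the field in the question (sample data, not the real value).

```javascript
// Shortened stand-in for the field in the question (sample data only).
const metaInfo = [
  "vti_title:SR|ATitleOfADocument",
  "_Comments:SW|This is very much the string I need extracted",
  "vti_categories:VW|"
].join("\n");

// With the m flag, ^ and $ match at the start and end of each line.
const withM = metaInfo.match(/^_Comments:SW\|(.*)$/m);
console.log(withM ? withM[1] : ""); // This is very much the string I need extracted

// Without it, ^ only matches at the very start of the whole string.
const withoutM = metaInfo.match(/^_Comments:SW\|(.*)$/);
console.log(withoutM); // null
```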
You can use this code:
var match = ows_MetaInfo.match(/_Comments:SW\|([^\n]+)/);
if (match)
document.writeln(match[1]);
I'm far from competent with RegEx, so here is my RegEx-less solution. See comments for further detail.
var extractedText = ExtractText(ows_MetaInfo);
function ExtractText(arg) {
  // Use the pipe delimiter to turn the string into an array
  var aryValues = arg.split("|");
  // The comment text sits between "_Comments:SW|" and the next pipe, so it shares
  // an array element with the key of the following line, "vti_categories:VW"
  for (var i = 0; i < aryValues.length; i++) {
    if (aryValues[i].indexOf("vti_categories:VW") != -1)
      // Drop the trailing key and the line break in front of it
      return aryValues[i].replace("vti_categories:VW", "").trim();
  }
  return null;
}
Here's a working fiddle to demonstrate.

Javascript Regexp - Match Characters after a certain phrase

I was wondering how to use a regexp to match a phrase that comes after a certain match. Like:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var match = /phrase=.*/;
That will match from the phrase= to the end of the string, but is it possible to get everything after the phrase= without having to modify a string?
You use capture groups (denoted by parentheses).
When you execute the regex via the match or exec function, it returns an array consisting of the full match followed by the substrings captured by the capture groups. You can then access what got captured via that array. E.g.:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var myRegexp = /phrase=(.*)/;
var match = myRegexp.exec(phrase);
alert(match[1]);
or
var arr = phrase.match(/phrase=(.*)/);
if (arr != null) { // Did it match?
alert(arr[1]);
}
phrase.match(/phrase=(.*)/)[1]
returns
"thisiswhatIwantmatched"
The brackets specify a so-called capture group. Contents of capture groups get put into the resulting array, starting from 1 (0 is the whole match).
It is not so hard. Assume your context is:
const context = "https://example.com/pa/GIx89GdmkABJEAAA+AAAA";
We want the part that comes after pa/, so use this code:
const pattern = context.match(/pa\/(.*)/)[1];
The first item of the result (index 0) includes pa/, while the second item (index 1) holds only the captured group, without pa/. Use whichever you need.
Try this; I hope it works:
var p = /\b([\w|\W]+)\1+(\=)([\w|\W]+)\1+\b/;
console.log(p.test('case1 or AA=AA ilkjoi'));
console.log(p.test('case2 or AA=AB'));
console.log(p.test('case3 or 12=14'));
If you want the value that follows the phrase, excluding the phrase itself, use this:
/(?:phrase=)(.*)/
the result will be
0: "phrase=thisiswhatIwantmatched" //full match
1: "thisiswhatIwantmatched" //matching group
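If the goal is for the match itself (index 0) to exclude the phrase, a lookbehind assertion may be an option. This sketch assumes an engine that supports lookbehind (ES2018 and later); older browsers will reject the pattern with a syntax error.

```javascript
const phrase = "yesthisismyphrase=thisiswhatIwantmatched";

// (?<=phrase=) asserts that "phrase=" precedes the match without including it.
const m = phrase.match(/(?<=phrase=).*/);
console.log(m ? m[0] : null); // thisiswhatIwantmatched
```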
