Regex grouping and context


I've been messing around with something I saw on Twitter, which makes sense to me but doesn't work as expected.
Why doesn't the first example work? (My regex could be wrong, I suppose, but it looks OK.)
I expected the context (this) of my toUpperCase call to be the group, making it a shorthand version of the second example.
var output = "james".replace(/(^.+?)/,"".toUpperCase.call("$1"));
var output2 = "james".replace(/(^.+?)/, function(a) {
  return "".toUpperCase.call(a);
});
console.log(output); // outputs james
console.log(output2); // outputs James
Edit: I fixed the regex following a comment from M42. Bad pasting on my part.

@M42 is correct, in that there's no reason your first regex would match "james". But fixing it won't work either:
var output = "james".replace(/(^.+?)/,"".toUpperCase.call("$1"));
console.log(output); // outputs james
That's because there are two options for the second argument to replace(): a string or a function. If it's a string, then "$1" is replaced with the text captured by the first group ("$2" with the second, and so on). If it's a function, it is called with the full match as its first argument, followed by each captured group in order.
In your second example, you're passing a function, so a receives the match ("j", which here is identical to the first group). But in your first example, you're passing in the result of the call "".toUpperCase.call("$1"), which is evaluated before replace() even runs and returns the string "$1" (upper-casing "$1" changes nothing). So the first example is actually using the string "$1" as the replacement, which does nothing but replace the match with itself:
"james".replace(/(^.+?)/,"$1"); // "james"
That's why this won't work as a shorthand - you're not actually passing in a function.
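To make the evaluation order visible, here is a minimal sketch (the variable names are made up for illustration). It shows that the argument is computed once, before replace() runs, and contrasts the string and function forms. The last variant uses Function.prototype.call.bind to build a real function out of toUpperCase; that point-free trick is an assumption about what the shorthand was aiming for, not something from the question.

```javascript
// The argument is evaluated *before* replace() is called, so this is just "$1".
const eager = "".toUpperCase.call("$1");
console.log(eager); // "$1"

// With a string replacement, "$1" re-inserts the text of group 1 unchanged.
const viaString = "james".replace(/(^.+?)/, eager);

// With a function, replace() calls it as (match, group1, offset, input).
const viaFunction = "james".replace(/(^.+?)/, (match, group1) => group1.toUpperCase());

// Point-free variant: upper(x, ...rest) runs String.prototype.toUpperCase with `this` = x.
// replace() passes the full match first, so that match is what gets upper-cased.
const upper = Function.prototype.call.bind(String.prototype.toUpperCase);
const viaBound = "james".replace(/(^.+?)/, upper);

console.log(viaString, viaFunction, viaBound); // james James James
```

The extra arguments replace() passes (group, offset, input) are simply ignored by toUpperCase, which is why the bound version behaves like the second example.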

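The first example doesn't work because there is no match: the pattern requires literal parentheses around the name, and "james" has none. Moreover, the anchor ^ must not be placed after another character: here it comes after \(, which has already consumed one, so ^ can never match.
Your regex: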
/ : delimiter
\( : open parens
( : beginning of group 1
^ : start of string
.+? : one or more characters, non-greedy
) : end of group 1
\) : close parens
/ : delimiter
This doesn't match "james", so nothing is replaced or upper-cased.
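A quick check of the breakdown above, as a sketch (the pattern is the one broken down above; the test strings and the anchored variant are illustrative). The pattern fails on "james" and even on "(james)", because ^ comes after \( has already consumed a character. Moving the anchor in front of \( makes it match.

```javascript
// The pattern from the question, as broken down above.
const posted = /\((^.+?)\)/;

console.log(posted.test("james"));   // false: there are no literal parentheses
console.log(posted.test("(james)")); // false: ^ cannot match after \( has consumed "("

// With the anchor moved to the front, the parentheses are matched literally
// and group 1 captures the text between them.
const anchored = /^\((.+?)\)/;
console.log(anchored.exec("(james)")[1]); // "james"
```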

Related

regular expression find end of string

I'm having trouble with a regular expression.
I want to replace all occurrences of myData=xxxx&, where xxxx can change but is always followed by &, except for the last occurrence, which is just myData=xxx with no trailing &.
var data = "the text myData=data1& and &myData=otherData& and end myData=endofstring"
data.replace(/myData=.*?&/g,'newData');
it returns:
the text newData and &newData and end myData=endofstring
which is correct, but how can I also match the last one?

Two things:
You need to assign the result of replace somewhere, which you're not doing in your question's code.
You can use an alternation (|) to match either & or end of string
So:
var data = "the text myData=data1& and &myData=otherData& and end myData=endofstring"
data = data.replace(/myData=.*?(?:&|$)/g,'newData');
// 1: assign the result back   2: (?:&|$) matches & or the end of the string
console.log(data);
Note the use of a non-capturing group ((?:...)) to limit the scope of the alternation.
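To see why the group matters, here is a sketch comparing the grouped pattern with the same alternation written without it (the variable names are illustrative). Without the group, | splits the whole pattern into myData=.*?& or $, so the last myData=... is left alone and an extra newData is appended at the very end of the string.

```javascript
const data = "the text myData=data1& and &myData=otherData& and end myData=endofstring";

// Grouped: the alternation only applies to the terminator (& or end of string).
const grouped = data.replace(/myData=.*?(?:&|$)/g, "newData");

// Ungrouped: the alternation applies to the whole pattern, so `$` alone is
// a second alternative that matches the empty string at the end of the input.
const ungrouped = data.replace(/myData=.*?&|$/g, "newData");

console.log(grouped);   // the text newData and &newData and end newData
console.log(ungrouped); // the text newData and &newData and end myData=endofstringnewData
```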

What about:
data="myData=abc& and linked with something else";
data.replace(/myData=.*?&/g,'newData');
https://jsfiddle.net/ob8c2j9v/

Javascript regex match returning a string with comma at the end

Just as the title says, I'm trying to parse a string, for example
2x + 3y
and I'm trying to get only the coefficients (i.e. 2 and 3).
I first tokenized it with a space as the delimiter, giving me "2x" "+" "3y",
then I ran each token through this statement to get only the coefficient:
var number = eqTokens[i].match(/(\-)?\d+/);
I tried printing the output, but it gave me "2,".
Why is it printing like this, and how do I fix it? I tried using:
number = number.replace(/[,]/, "");
but this just gives me an error that number.replace is not a function.

What's wrong with this?
> "2x + 3y".match(/-?\d+(?=[A-Za-z]+)/g)
[ '2', '3' ]
The regex above matches a number (with an optional leading minus) only if it is followed by one or more letters; the g flag makes match return every such number.

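Without the g flag, match returns an array containing the full match followed by every capture group. Since you put the optional minus sign in parentheses, it's a capture group. That group is optional, so when there is no minus sign it doesn't participate, and its slot in the array is undefined, alongside your actual match.
Input 2x -> your output: ["2", undefined], which prints as "2," (the array is joined with commas, and undefined becomes an empty string)
Input -2x -> your output: ["-2", "-"]
This is also why number.replace is not a function: number is an array, not a string.
Remove the parentheses around the minus sign, and read element [0] of the result to get the string itself.
This is just to explain why your case is breaking, but personally I'd use Avinash's answer.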
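The behaviour described above can be reproduced with a short sketch (the inputs are tokens like those in the question; the variable names are illustrative). It shows the undefined group slot, how the array prints, and that element [0] is an ordinary string.

```javascript
const token = "2x";

// Without the g flag, match() returns [fullMatch, group1, ...].
const result = token.match(/(\-)?\d+/);
console.log(result.length);  // 2
console.log(result[1]);      // undefined: the optional group did not participate
console.log(String(result)); // "2,": the array is joined with commas

// The full match is element 0, and that is a real string (so .replace works on it).
const coefficient = result[0]; // "2"

// For a negative coefficient, group 1 holds the minus sign.
const negative = "-2x".match(/(\-)?\d+/); // ["-2", "-"]

// With the g flag, match() returns only the full matches (capture groups are ignored).
const all = "-2x + 3y".match(/-?\d+/g); // ["-2", "3"]
console.log(coefficient, negative, all);
```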

`match` and `exec` with non-global regex appear to return the first match twice

I do not quite understand the behaviour of the JavaScript regex methods.
The problem is that I can’t get regexes of type /(something|something)/ to work with the match or exec methods without the global flag, as in /(somereg1|somereg2)/g.
When the global flag is there, the methods correctly return every instance they find. But when it is not there, both methods correctly return only the first match they find; the problem is that they appear to return it twice. For instance:
const str = "Here is somereg1 and somereg2";
str.match(/(somereg1|somereg2)/)
I would expect this match call to return "somereg1". Instead it appears to return "somereg1,somereg1".
Check this JSFiddle. The code should be fairly self-explanatory. The first example is taken from W3Schools.

The first element is the full match of the regex. If you tried this:
const str = "Here is somereg1 and somereg2";
str.match(/.*(somereg1|somereg2)/)
Your result would be [ "Here is somereg1 and somereg2", "somereg2" ].
This same behaviour occurs with an .exec(str) method call.
You might want to read about .match and .exec.
About the “sub parentheses matches”: in regexes, parentheses delimit capture groups. So, if you had this regex:
/.*(somereg1).*?(somereg2)/
Your .match result would be [ "Here is somereg1 and somereg2", "somereg1", "somereg2" ]. So, as you can see, the result array consists of the full match followed by all capture groups matches.
And to force a group not to be captured, just delimit with (?: and ):
"Here is somereg1 and somereg2".match(/.*(?:somereg1).*?(somereg2)/);
// Will result in [ "Here is somereg1 and somereg2", "somereg2" ].
Note that the g (global) flag changes the return semantics of match: it returns an array of all full matches, and capture groups are ignored. exec, on the other hand, always returns a single match, consisting of the full match followed by its capture groups; with the g (or y) flag it starts searching at the RegExp instance's current lastIndex and updates it afterwards, so repeated calls walk through the string. For convenience, matchAll can be used instead: it returns an iterator of all matches, each including its capture groups (the regex passed to it must have the g flag).
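A sketch contrasting the calls discussed above on the question's string (the variable names are illustrative). The exec loop records each match with its index to show how lastIndex moves the search forward.

```javascript
const str = "Here is somereg1 and somereg2";

// No g flag: [fullMatch, group1], and here both are "somereg1".
const first = str.match(/(somereg1|somereg2)/);

// g flag: only the full matches, capture groups dropped.
const allFull = str.match(/(somereg1|somereg2)/g); // ["somereg1", "somereg2"]

// exec with the g flag: each call resumes at lastIndex and returns one match with its groups.
const reG = /(somereg1|somereg2)/g;
const viaExec = [];
let m;
while ((m = reG.exec(str)) !== null) {
  viaExec.push(`${m[0]}@${m.index}`);
}

// matchAll (requires the g flag): an iterator of match arrays, groups included.
const viaMatchAll = Array.from(str.matchAll(/(somereg1|somereg2)/g), (found) => found[1]);

console.log(first, allFull, viaExec, viaMatchAll);
```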

You can use the following to get the required result:
var str = "Here is somereg1 and somereg2";
str.match(/(?=(somereg1|somereg2))/)
(The lookahead makes the full match an empty string, so this returns ["", "somereg1"], with the wanted text in element 1.)
As for match vs. exec: I would go for match, as it uses a regex object and saves you from double-escaping strings used as regexes.

Modify your second line as below:
str.match(/somereg1|somereg2/)

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the format below, which I've saved as a string. How can I write a regular expression in JavaScript to trim the string to only the characters after the last slash in the URL?
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'

Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);

A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);

Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()

This is easy to understand: (?!.*/).+
Let me explain:
First, consider everything that ends with a slash.
That's the part we don't want.
.*/ matches everything up to the last slash.
Wrapping it in a negative lookahead, (?!.*/), says "only match here if no slash follows anywhere later in the string".
The lookahead doesn't consume anything; it just rules out every position before the last slash.
The first position where it succeeds is right after the last slash, so we can happily take whatever comes next with
.+
In a JavaScript regex literal you need to escape the /, so it becomes:
(?!.*\/).+
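A small sketch to check the explanation above (the id is the one from the question). Because the lookahead is zero-width, the match starts at the first position with no slash ahead of it, which is exactly one past the last slash.

```javascript
const id = "http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9";

// (?!.*\/) succeeds only where no "/" appears anywhere later in the string.
const m = id.match(/(?!.*\/).+/);

console.log(m[0]);                                // "nabb80191e23b7d9"
console.log(m.index === id.lastIndexOf("/") + 1); // true
```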

This regexp, [^\/]+$, works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"

This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9

I don't know JS, but based on others' examples (and a guess):
id = id.match(/[^\/]*$/)[0]; // match() returns an array, so [0] gives the string itself

Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"
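The answers above all agree on the ID from the question, but they differ when the string ends with a slash. The sketch below runs each approach on both inputs; the property names and the second URL are hypothetical, chosen only for illustration, and the regexes that can fail to match are guarded so they return null instead of throwing.

```javascript
// One entry per approach from the answers above (names are illustrative).
const approaches = {
  lastIndexOf: (s) => s.substr(s.lastIndexOf("/") + 1),
  captureOptionalSlash: (s) => {
    const m = s.match(/\/([^\/]+)\/?$/);
    return m ? m[1] : null;
  },
  splitPop: (s) => s.split("/").pop(),
  lookahead: (s) => {
    const m = s.match(/(?!.*\/).+/);
    return m ? m[0] : null;
  },
  nonSlashRun: (s) => {
    const m = s.match(/[^\/]+$/);
    return m ? m[0] : null;
  },
  captureAfterSlash: (s) => {
    const m = s.match(/\/([^/]*)$/);
    return m ? m[1] : null;
  },
  replacePrefix: (s) => s.replace(/(.*\/)*/, ""),
};

const id = "http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9";
const trailing = "http://example.com/users/42/"; // hypothetical input with a trailing slash

for (const [name, fn] of Object.entries(approaches)) {
  console.log(name, JSON.stringify(fn(id)), JSON.stringify(fn(trailing)));
}
```

On the trailing-slash input, only the \/? variant returns "42"; the string-based versions and the empty-capture regex return "", and the [^\/]+$ and lookahead regexes find no match at all.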

How to with extract url from tweet using Regular Expressions

Ok so i'm executing the following line of code in javascript
RegExp('(http:\/\/t.co\/)[a-zA-Z0-9\-\.]{8}').exec(tcont);
where tcont is equal to some string like 'Test tweet to http://t.co/GXmaUyNL' (the content of a tweet obtained by jquery).
However it is returning, in the case above for example, 'http://t.co/GXmaUyNL,http://t.co/'.
This is frustrating because I want the URL without the bit on the end (after and including the comma).
Any ideas why this is appearing? Thanks

First, get rid of the parens in the pattern - they're unnecessary:
RegExp('http:\/\/t.co\/[a-zA-Z0-9\-\.]{8}').exec(tcont);
Second, a regex match returns an array of matching groups - you want the first item in it (the entire match):
var match = RegExp('http:\/\/t.co\/[a-zA-Z0-9\-\.]{8}').exec(tcont);
if (match) {
  var result = match[0];
}
The reason you had "a part on the end" is because your result is actually an array - the parens you had in the expression were resulting in an extra matching group (the portion they were around), which would be match[1].
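A sketch of the explanation above, using the example tweet from the question. It is written with regex literals, so the dot in t\.co is genuinely escaped. Printing the exec result joins the full match and group 1 with a comma; reading [0] gives the URL alone.

```javascript
const tcont = "Test tweet to http://t.co/GXmaUyNL";

// With the original parentheses: [fullMatch, group1].
const withGroup = /(http:\/\/t\.co\/)[a-zA-Z0-9\-\.]{8}/.exec(tcont);
console.log(String(withGroup)); // "http://t.co/GXmaUyNL,http://t.co/"

// Without them, the array holds only the full match; read it via [0].
const match = /http:\/\/t\.co\/[a-zA-Z0-9\-\.]{8}/.exec(tcont);
const url = match ? match[0] : null;
console.log(url); // "http://t.co/GXmaUyNL"
```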

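Try this: /http:\/\/t\.co\/[a-zA-Z0-9\-\.]{8}/.exec(tcont); (written as a regex literal: inside a string passed to RegExp(), '\.' is just '.', so the dot would not actually be escaped)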

We Keep Coding
