I have the following regular expressions:
var regEx = /^\W*(.*?)\W*$/;
var regEx2 = /^\W*(.*)\W*$/;
What does (.*?) actually mean? What's the difference between (.*?) and (.*)?
Why does regEx.exec("abc ") returns ['abc ', 'abc'] in Javascript?
Why does regEx2.exec("abc ") returns ['abc ', 'abc '] in Javascript?
Adding ? after quantifier *, +, {n,m}, etc. makes reluctant/lazy matching, as opposed to the default greedy matching. It's quite intuitive from the name. Greedy means it will try to match as many as possible. Lazy means that it will try to match as few as possible.
There is no non-word \W token, so \W* matches empty string. Then (.*?) will match as few as possible but checking whether \W* can match something. So (.*?) will match and capture "abc", and \W* (non-word) will match the space.
Almost the same as above, but (.*) will eat up as much as possible and will match and capture "abc " , and \W* will be left with empty string, which it matches.
For 2 and 3, the 2nd element in the return array is the captured text by the first capturing group in the regex. The 1st element in the array is the text that matches the entire regex.
What does (.*?) actually mean?
Non-greedily match any character zero or more times, in a matching group.
Why does regEx.exec("abc ") returns ['abc ', 'abc'] in Javascript?
You get one member of the array for each matching group. The element at index 0 is the entire match, the next element is from the first (and only) matching group above.
Why does regEx2.exec("abc ") returns ['abc ', 'abc '] in Javascript?
For the same reason as above, except this time, the greedy match will match the space at the end as well, so your first capture group is identical to the full match in this case.
Okay, the easiest thing to do when looking at regular expressions I find is to break them down and write out what each part is doing.
So taking the first regular expression /^\W*(.*?)\W*$/
^ Start of search string
\W* Match a non-word character zero or more times
( Start of group
.*? Match any character (except a line terminator) zero or more times but as few as possible
) End of group
\W* Match a non-word character zero or more times
$ End of search string
The exec method searches the text and returns an array of strings (or null if it fails). The string at element 0 is the substring matched by the entire expression, strings after this are those which correspond to the individual capture groups.
So for your first example, the entire expression is capturing "abc " but the (.*?) group is capturing "abc" and so you get two items in your array
Related
I have a text like this;
[Some Text][1][Some Text][2][Some Text][3][Some Text][4]
I want to match [Some Text][2] with this regex;
/\[.*?\]\[2\]/
But it returns [Some Text][1][Some Text][2]
How can i match only [Some Text][2]?
Note : There can be any character in Some Text including [ and ] And the numbers in square brackets can be any number not only 1 and 2. The Some Text that i want to match can be at the beginning of the line and there can be multiple Some Texts
JSFiddle
The \[.*?\]\[2\] pattern works like this:
\[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
.*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, as there are subsequent patterns, see below
\]\[2\] - ][2] substring.
So, the .*? gets expanded upon each failure until it finds the leftmost ][2]. Note the lazy quantifiers do not guarantee the "shortest" matches.
Solution
Instead of a .*? (or .*) use negated character classes that match any char but the boundary char.
\[[^\]\[]*\]\[2\]
See this regex demo.
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
Other examples:
Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
Strings between double quotation marks: "[^"]*" matches "..." with no " inside
Strings between curly braces: \{[^{}]*} matches "..." with no " inside
In other situations, when the starting pattern is a multichar string or complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
You could try the below regex,
(?!^)(\[[A-Z].*?\]\[\d+\])
DEMO
I have a text like this;
[Some Text][1][Some Text][2][Some Text][3][Some Text][4]
I want to match [Some Text][2] with this regex;
/\[.*?\]\[2\]/
But it returns [Some Text][1][Some Text][2]
How can i match only [Some Text][2]?
Note : There can be any character in Some Text including [ and ] And the numbers in square brackets can be any number not only 1 and 2. The Some Text that i want to match can be at the beginning of the line and there can be multiple Some Texts
JSFiddle
The \[.*?\]\[2\] pattern works like this:
\[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
.*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, as there are subsequent patterns, see below
\]\[2\] - ][2] substring.
So, the .*? gets expanded upon each failure until it finds the leftmost ][2]. Note the lazy quantifiers do not guarantee the "shortest" matches.
Solution
Instead of a .*? (or .*) use negated character classes that match any char but the boundary char.
\[[^\]\[]*\]\[2\]
See this regex demo.
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
Other examples:
Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
Strings between double quotation marks: "[^"]*" matches "..." with no " inside
Strings between curly braces: \{[^{}]*} matches "..." with no " inside
In other situations, when the starting pattern is a multichar string or complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
You could try the below regex,
(?!^)(\[[A-Z].*?\]\[\d+\])
DEMO
let a = ' lots of spaces in this ! '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows lots of spaces in this!
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the words This and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first RegEx /\s+(\W)/g, '$1'replaces 1 or more spaces with a single space.
What your regex says is "match one or more spaces, followed by one or more non-alphanumeric character, and replace that whole result with that one or more non-alphanumeric character". The key is that the \s+ is greedy, meaning that it will try and match as many characters as possible. So in any given string of spaces it will try and match all of the spaces it can. However, your regex also requires one or more non-word characters (\W+). Because in your case the next character after each final space is a word character (i.e. a letter), this last part of the regex must match the last space.
Therefore, given the string a b, and using parens to mark the \s+ and \W+ matches, a( )( )b is the only way for the regex to be valid (\s+ matches the first two spaces and \W+ matches the last space). Now it's just a simple substitution. Since you wrapped the \W+ in parentheses that makes it the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against a !b will result in the match looking like a( )(!)b (since ! is now the last non-word character), so the final replaced result will be a!b.
Lets take this string 'aaa &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces ' &' with '&'
That same principal applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
How do I retrieve an entire word that has a specific portion of it that matches a regex?
For example, I have the below text.
Using ^.[\.\?\!:;,]{2,} , I match the first 3, but not the last. The last should be matched as well, but $ doesn't seem to produce anything.
a!!!!!!
n.......
c..,;,;,,
huhuhu..
I want to get all strings that have an occurrence of certain characters equal to or more than twice. I produced the aforementioned regex, but on Rubular it only matches the characters themselves, not the entire string. Using ^ and $
I've read a few stackoverflow posts similar, but not quite what I'm looking for.
Change your regex to:
/^.*[.?!:;,]{2,}/gm
i.e. match 0 more character before 2 of those special characters.
RegEx Demo
If I understand well you are trying to match an entire string that contains at least the same punctuation character two times:
^.*?([.?!:;,])\1.*
Note: if your string has newline characters, change .* to [\s\S]*
The trick is here:
([.?!:;,]) # captures the punct character in group 1
\1 # refers to the character captured in group 1
Hy, is there a way to find the first letter of the last word in a string? The strings are results in a XML parser function. Inside the each() loop i get all the nodes and put every name inside a variable like this: var person = xml.find("name").find().text()
Now person holds a string, it could be:
Anamaria Forrest Gump
John Lock
As you see, the first string holds 3 words, while the second holds 2 words.
What i need are the first letters from the last words: "G", "L",
How do i accomplish this? TY
This should do it:
var person = xml.find("name").find().text();
var names = person.split(' ');
var firstLetterOfSurname = names[names.length - 1].charAt(0);
This solution will work even if your string contains a single word. It returns the desired character:
myString.match(/(\w)\w*$/)[1];
Explanation: "Match a word character (and memorize it) (\w), then match any number of word characters \w*, then match the end of the string $". In other words : "Match a sequence of word characters at the end of the string (and memorize the first of these word characters)". match returns an array with the whole match in [0] and then the memorized strings in [1], [2], etc. Here we want [1].
Regexps are enclosed in / in javascript : http://www.w3schools.com/js/js_obj_regexp.asp
You can hack it with regex:
'Marry Jo Poppins'.replace(/^.*\s+(\w)\w+$/, "$1"); // P
'Anamaria Forrest Gump'.replace(/^.*\s+(\w)\w+$/, "$1"); // G
Otherwise Mark B's answer is fine, too :)
edit:
Alsciende's regex+javascript combo myString.match(/(\w)\w*$/)[1] is probably a little more versatile than mine.
regular expression explanation
/^.*\s+(\w)\w+$/
^ beginning of input string
.* followed by any character (.) 0 or more times (*)
\s+ followed by any whitespace (\s) 1 or more times (+)
( group and capture to $1
\w followed by any word character (\w)
) end capture
\w+ followed by any word character (\w) 1 or more times (+)
$ end of string (before newline (\n))
Alsciende's regex
/(\w)\w*$/
( group and capture to $1
\w any word character
) end capture
\w* any word character (\w) 0 or more times (*)
summary
Regular expressions are awesomely powerful, or as you might say, "Godlike!" Regular-Expressions.info is a great starting point if you'd like to learn more.
Hope this helps :)