Explanation of this Regular expression

I'm not very good with regular expressions, and I didn't fully understand this one. All I get from it is that it finds every h1 and adds a class to its last word.
$("h1").html(function(index, old) {
return old.replace(/(\b\w+)$/, '<span class="myClass">$1</span>');
});
I'm trying to make it work on the last two characters instead.

Here is an explanation:
/ : regex delimiter
( : begin capture group #1
\b : word boundary
\w+ : one or more word character (same as [a-zA-Z0-9_]+)
) : end of group
$ : end of string
/ : regex delimiter
It matches the last word of the string, ie the last word of the h1 tag.

This (poorly written) regex finds a sequence of word characters (Latin letters, digits and the underscore) at the end of the input. The same can be achieved more simply with /\w+$/: neither \b nor the parentheses are actually necessary here, since $& in the replacement string stands for the whole match.
To match the last two words, you'll need something like
/\w+(?=(\W+\w+)?$)/g
which means "a word, optionally followed by another word before the end of the input".
To match the last two characters -- well, this is something you should be able to figure out on your own (hint: any character is . (dot) in regex language).
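As a rough sketch of the variants discussed above, here they are applied to a plain string instead of an h1 (the sample text and variable names are made up for illustration; in the jQuery callback the string would be the old HTML of the heading):

```javascript
// Hypothetical heading text, standing in for the old HTML of an h1.
const heading = 'Regex is fun today';

// Original pattern: wrap the last word using capture group 1.
const lastWord = heading.replace(/(\b\w+)$/, '<span class="myClass">$1</span>');

// Simplified pattern: no \b, no group; $& inserts the whole match.
const lastWordSimple = heading.replace(/\w+$/, '<span class="myClass">$&</span>');

// Last two words: with the g flag each of the two words is wrapped separately.
const lastTwoWords = heading.replace(/\w+(?=(\W+\w+)?$)/g, '<span class="myClass">$&</span>');

// Last two characters: a dot matches any character except a line terminator.
const lastTwoChars = heading.replace(/..$/, '<span class="myClass">$&</span>');

console.log(lastWord);
console.log(lastTwoWords);
console.log(lastTwoChars);
```

Note that /..$/ needs at least two characters at the end of the string and will happily match a space or punctuation; use /\w\w$/ if only word characters should count.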

Related

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of spaces in this ! '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows lots of spaces in this!
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the word this and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first regex, /\s+(\W)/g with '$1', replaces 1 or more spaces with a single space.

What your regex says is "match one or more whitespace characters, followed by one non-word character, and replace that whole result with that one non-word character". The key is that the \s+ is greedy, meaning that it will try to match as many characters as possible. So in any given run of spaces it will try to match all of the spaces it can. However, your regex also requires one non-word character (\W), and a space is itself a non-word character. Because in your case the next character after each final space is a word character (i.e. a letter), the \s+ has to give back one space so that this last part of the regex can match it.
Therefore, given the string 'a   b' (three spaces), and using parens to mark the \s+ and \W matches, a(  )( )b is the only way for the regex to match (\s+ matches the first two spaces and \W matches the last space). Now it's just a simple substitution. Since you wrapped the \W in parentheses, that makes it the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against a !b will result in the match looking like a( )(!)b (since ! is now the last non-word character), so the final replaced result will be a!b.
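The backtracking described above can be observed directly. This is a small sketch using made-up sample strings; exec() returns the whole match in [0] and the captured group in [1]:

```javascript
// Three spaces between the letters: \s+ first grabs all three, then gives
// one back so that (\W) has something to match.
const m = /\s+(\W)/.exec('a   b');
console.log(JSON.stringify(m[0])); // the whole match: all three spaces
console.log(JSON.stringify(m[1])); // the captured group: a single space

// The replacement keeps only the captured character.
const collapsed = 'a   b'.replace(/\s+(\W)/g, '$1');  // 'a b'
const punctuated = 'a  !b'.replace(/\s+(\W)/g, '$1'); // 'a!b'
console.log(collapsed, punctuated);
```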

Let's take this string 'aaa   &bbb' (three spaces) and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces '   &' with '&'
That same principle applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.

List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
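The match list above can be reproduced with matchAll(). The exact spacing of the original string is not visible in the question, so the input below is a reconstruction from the match list (an assumption); underscores are used instead of ☹ to show the spaces:

```javascript
// Reconstructed sample input (an assumption): five spaces after "of" and
// after "spaces", two before "!" and two after it.
const input = ' lots of     spaces     in this  !  ';

// For each match, show what \s+ took, then the captured character in parens.
const shown = [...input.matchAll(/\s+(\W)/g)].map((match) => {
  const spaces = match[0].slice(0, -1); // everything except the captured char
  return (spaces + '(' + match[1] + ')').replace(/ /g, '_');
});
console.log(shown);

const result = input.replace(/\s+(\W)/g, '$1');
console.log(JSON.stringify(result));
```

The single spaces in " lots", "lots of" and "in this" are never matched at all: each is followed by a letter, so there is nothing left for (\W) once \s+ has taken the only space.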

Grab full regex word if pattern inside it matches

How do I retrieve an entire word that has a specific portion of it that matches a regex?
For example, I have the below text.
Using ^.[\.\?\!:;,]{2,} , I match the first 3, but not the last. The last should be matched as well, but $ doesn't seem to produce anything.
a!!!!!!
n.......
c..,;,;,,
huhuhu..
I want to get all strings that have an occurrence of certain characters equal to or more than twice. I produced the aforementioned regex, but on Rubular it only matches the characters themselves, not the entire string. Using ^ and $
I've read a few similar Stack Overflow posts, but they're not quite what I'm looking for.

Change your regex to:
/^.*[.?!:;,]{2,}/gm
i.e. match zero or more characters before two or more of those special characters.
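A quick way to check this in JavaScript, using the four sample lines from the question plus one extra line (hypothetical) that should not match:

```javascript
const text = ['a!!!!!!', 'n.......', 'c..,;,;,,', 'huhuhu..', 'hello.'].join('\n');

// With the m flag, ^ anchors at the start of every line; .* is greedy but
// backtracks far enough to leave two punctuation characters for {2,}.
const found = text.match(/^.*[.?!:;,]{2,}/gm);
console.log(found);
```

The line 'hello.' is skipped because it contains only one of the listed characters.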

If I understand correctly, you are trying to match an entire string that contains the same punctuation character at least twice in a row:
^.*?([.?!:;,])\1.*
Note: if your string has newline characters, change .* to [\s\S]*
The trick is here:
([.?!:;,]) # captures the punct character in group 1
\1 # refers to the character captured in group 1
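A short sketch of how the backreference version differs from a plain {2,} quantifier (the test strings other than the ones from the question are made up):

```javascript
const sameTwice = /^.*?([.?!:;,])\1.*/;

console.log(sameTwice.test('huhuhu..'));  // doubled "." is present
console.log(sameTwice.test('c..,;,;,,')); // doubled "." is present

// Two different punctuation characters in a row do not satisfy \1 ...
const mixed = sameTwice.test('a.,');
// ... although they do satisfy a character class repeated twice.
const mixedWithClass = /^.*[.?!:;,]{2,}/.test('a.,');

// Group 1 holds the character that was repeated.
const hit = sameTwice.exec('c..,;,;,,');
console.log(mixed, mixedWithClass, hit[1]);
```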

Replace function does only replace every second regex match

I would like to use a regex in JavaScript to put a zero before every number that has exactly one digit.
When I debug the code in the Chrome debugger it gives me a strange result where the zero is only added to every second match.
My regex
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
And the result i get from this
"03-3-07-3-09-8-10-05"
Thanks for the help

Use word boundaries,
(\b\d\b)
Replacement string:
0$1
DEMO
> "3-3-7-3-9-8-10-5".replace(/(\b\d\b)/g, "0$1")
'03-03-07-03-09-08-10-05'
Explanation:
( starting point of first Capturing group.
\b Matches between a word character and a non word character.
\d Matches a single digit.
\b Matches between a word character and a non word character.
) End of first Capturing group.

You can use this better lookahead based regex to prefix 0 before every single digit number:
"3-3-7-3-9-8-10-5".replace(/\b(\d)\b(?=-|$)/g, "0$1");
//=> "03-03-07-03-09-08-10-05"
Reason why you're getting alternate prefixes in your regex:
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
is that, rather than looking ahead, you're actually matching (consuming) the hyphen after the digit. Once a hyphen has been consumed, it can't also serve as the leading hyphen of the next match, since the internal regex pointer has already moved past it.
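To see the consumed hyphens, it helps to list what the original pattern actually matches. The second half is one possible fix that keeps the original structure and only turns the trailing group into a lookahead; the replacer function is used here just to avoid any ambiguity around "$10" in a replacement string:

```javascript
const s = '3-3-7-3-9-8-10-5';

// Each match swallows the hyphen that follows the digit, so the next digit
// no longer has a leading hyphen (or start of string) available.
const consumed = [...s.matchAll(/(-|^)(\d)(-|$)/g)].map((m) => m[0]);
console.log(consumed);

// Same idea, but the trailing hyphen is only looked at, not consumed.
const padded = s.replace(/(-|^)(\d)(?=-|$)/g, (whole, sep, digit) => sep + '0' + digit);
console.log(padded);
```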

Use a positive lookahead to find the one-digit numbers:
"3-3-7-3-9-8-10-5".replace(/(?=\b\d\b)/g, "0");

/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

It's given that /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
I don't understand why there are parentheses around \S and around the second \1, but not around the first \1. Can anyone help explain how the above regex works?
src: http://www.javascriptkit.com/javatutors/redev2.shtml
Thanks in advance.

The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
The g on the end means "apply this regex over the entire input string, or globally".

The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:
/(\S)\1\1+/g
Also, as @AlexD pointed out, the description should say that it matches at least three characters. If you replaced that regex with BONK in the string fooxxxxxxbar:
'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')
...you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
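The difference the + makes can be checked side by side (the third sample string is made up for illustration):

```javascript
const sample = 'fooxxxxxxbar';

// With the +, the whole run of six x's is one match.
const atLeastThree = sample.replace(/(\S)\1\1+/g, 'BONK');

// Without the +, the run is split into two matches of exactly three.
const exactlyThree = sample.replace(/(\S)\1\1/g, 'BONK');

// Runs shorter than three characters are never matched.
const runs = 'aa bbb cccc'.match(/(\S)\1\1+/g);

console.log(atLeastThree, exactlyThree, runs);
```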
I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:
Word
...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
Also, a regex doesn't know from words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).
If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.
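The points about \b, \B and [^\b] above can be made visible by inserting a marker at every matching position. This is only an illustrative sketch; the '|' marker and the second sample string are arbitrary choices:

```javascript
// \b matches at the start and end of 'Word', although there is no space.
const boundaries = 'Word'.replace(/\b/g, '|');

// \B matches at every position that is NOT a word boundary ...
const nonBoundaries = 'Word'.replace(/\B/g, '|');
// ... and (?!\b) matches at exactly the same positions.
const negativeLookahead = 'Word'.replace(/(?!\b)/g, '|');

// Inside a character class, \b means backspace, so [^\b] consumes
// characters instead of matching positions.
const notBackspace = 'Word'.match(/[^\b]/g);

// A hyphen is not a word character, so boundaries appear around it too.
const hyphenated = 'a-b'.replace(/\b/g, '|');

console.log(boundaries, nonBoundaries, negativeLookahead, notBackspace, hyphenated);
```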

Find the first letter of the last word with jquery inside a string (string can have multiple words)

Hi, is there a way to find the first letter of the last word in a string? The strings are results from an XML parser function. Inside the each() loop I get all the nodes and put every name into a variable like this: var person = xml.find("name").find().text()
Now person holds a string, it could be:
Anamaria Forrest Gump
John Lock
As you see, the first string holds 3 words, while the second holds 2 words.
What I need are the first letters of the last words: "G", "L".
How do I accomplish this? Thanks.

This should do it:
var person = xml.find("name").find().text();
var names = person.split(' ');
var firstLetterOfSurname = names[names.length - 1].charAt(0);

This solution will work even if your string contains a single word. It returns the desired character:
myString.match(/(\w)\w*$/)[1];
Explanation: "Match a word character (and memorize it) (\w), then match any number of word characters \w*, then match the end of the string $". In other words : "Match a sequence of word characters at the end of the string (and memorize the first of these word characters)". match returns an array with the whole match in [0] and then the memorized strings in [1], [2], etc. Here we want [1].
Regexes are enclosed in / in JavaScript: http://www.w3schools.com/js/js_obj_regexp.asp
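One caveat with the one-liner: match() returns null when nothing matches (for example on an empty string), so indexing [1] directly would throw. A minimal sketch of a guarded version follows; the function name and the trim() call are additions for illustration, not part of the answer above:

```javascript
// Hypothetical helper: returns the first letter of the last word, or null
// when the text contains no word characters at its end.
function firstLetterOfLastWord(text) {
  // trim() so that trailing whitespace does not hide the last word from $.
  const match = text.trim().match(/(\w)\w*$/);
  return match ? match[1] : null;
}

console.log(firstLetterOfLastWord('Anamaria Forrest Gump'));
console.log(firstLetterOfLastWord('John Lock'));
console.log(firstLetterOfLastWord('Cher'));
```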

You can hack it with regex:
'Marry Jo Poppins'.replace(/^.*\s+(\w)\w+$/, "$1"); // P
'Anamaria Forrest Gump'.replace(/^.*\s+(\w)\w+$/, "$1"); // G
Otherwise Mark B's answer is fine, too :)
edit:
Alsciende's regex+javascript combo myString.match(/(\w)\w*$/)[1] is probably a little more versatile than mine.
regular expression explanation
/^.*\s+(\w)\w+$/
^ beginning of input string
.* followed by any character (.) 0 or more times (*)
\s+ followed by any whitespace (\s) 1 or more times (+)
( group and capture to $1
\w followed by any word character (\w)
) end capture
\w+ followed by any word character (\w) 1 or more times (+)
$ end of input string
Alsciende's regex
/(\w)\w*$/
( group and capture to $1
\w any word character
) end capture
\w* any word character (\w) 0 or more times (*)
$ end of input string
summary
Regular expressions are awesomely powerful, or as you might say, "Godlike!" Regular-Expressions.info is a great starting point if you'd like to learn more.
Hope this helps :)
