What's the meaning of the below regex in JavaScript?

data.replace(/(.*)/g, '$1')
I encountered the above in Smashing Node.js. Can someone quickly explain this syntax? I'm new to regex.

. means match characters except new line.
* matches 0 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
$1 refers to the matched group.
g modifier means global, which in turn means,
"don't stop at the first match. Continue to match even after that"
Basically, it captures every character up to a \n (newline) into a group and replaces the match with that same group.
The result is identical to the input, so the call changes nothing and you should avoid doing this.

. can be any character except the newline character, and the * quantifier means that . can be matched 0 to unlimited times. So .* matches all the characters on a line of the data. The parentheses around .* collect the matched characters into a group, and $1 refers to the first captured group. So we basically match all the characters and replace them with the matched characters.
It is similar to doing
str.replace(str1, str1)

You found it in "Smashing Node.js". I looked and found it too. The code there is actually data.replace(/(.*)/g, '  $1'). Please notice the two leading spaces before $1. They indent the whole text.
.* matches the whole line,
replaces it with "  " + the same line,
and repeats until the end of the input because the g modifier is there
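A minimal sketch to check the indentation behaviour described above. The sample text and variable names are made up for illustration. One detail it surfaces: because .* can also match the empty string, the global replace matches once more at the end of each line, so the two spaces are also appended after the text. The anchored variant is one possible alternative, not the book's code.

```javascript
// Hypothetical sample input; any multi-line string works.
const data = 'first line\nsecond line';

// The pattern from the book: prefix each match with two spaces.
// .* also matches the empty string at the end of each line,
// so two extra spaces end up after the text as well.
const indented = data.replace(/(.*)/g, '  $1');

// Alternative (an assumption, not from the book): match only line starts.
const indentedClean = data.replace(/^/gm, '  ');

console.log(JSON.stringify(indented));      // "  first line  \n  second line  "
console.log(JSON.stringify(indentedClean)); // "  first line\n  second line"
```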

Related

How to form regex to match everything up to a "("

In javascript, how can a regular expression be formed to match everything up to and NOT including an opening parenthesis "("?
example input:
"12(pm):00"
"12(am):))"
"8(am):00"
I've found /^(.*?)\(/ to be successful with the "up to" part, but the match returned includes the "(".
regex101.com says the first capturing group is what I'm looking for. Is there a way to return only the captured group?
There are three ways to deal with this. The first is to restrict the characters you match to not include the parenthesis:
let match = "12(pm):00".match(/[^(]*/);
console.log(match[0]);
The second is to only get the part of the match you are interested in, using capture groups:
let match = "12(pm):00".match(/(.*?)\(/);
console.log(match[1]);
The third is to use lookahead to explicitly exclude the parenthesis from the match:
let match = "12(pm):00".match(/.*?(?=\()/);
console.log(match[0]);
As in the OP, note the non-greedy modifier in the second and third cases: it is necessary to restrict the quantifier in case there is another open parenthesis further inside the string. This is not necessary in the first case, since the quantifier is explicitly forbidden to gobble up the parenthesis.
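The three approaches above can be compared side by side on the sample inputs from the question. This is only a sketch: the helper names are made up, and the string 'a(b(c' is a hypothetical input added to show what the greedy quantifier does when a second parenthesis is present.

```javascript
const inputs = ['12(pm):00', '12(am):))', '8(am):00'];

// 1. Negated character class: always matches (possibly the empty string).
const byNegatedClass = (s) => s.match(/[^(]*/)[0];

// 2. Capture group: the full match includes "(", the group does not.
const byCaptureGroup = (s) => {
  const m = s.match(/(.*?)\(/);
  return m ? m[1] : null;
};

// 3. Lookahead: "(" must follow but is not part of the match.
const byLookahead = (s) => {
  const m = s.match(/.*?(?=\()/);
  return m ? m[0] : null;
};

console.log(inputs.map(byNegatedClass)); // [ '12', '12', '8' ]
console.log(inputs.map(byCaptureGroup)); // [ '12', '12', '8' ]
console.log(inputs.map(byLookahead));    // [ '12', '12', '8' ]

// Why the lazy quantifier matters when there is a second "(":
const greedy = 'a(b(c'.match(/(.*)\(/)[1];  // 'a(b'
const lazy = 'a(b(c'.match(/(.*?)\(/)[1];   // 'a'
console.log(greedy, lazy);
```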
Try
^\d+
^ asserts position at start of a line
\d matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/r/C9XNT4/1
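A short check of this suggestion on the question's inputs. Note the assumption baked into it: it only works when the part before the parenthesis consists of digits, which happens to hold for these samples. The variable names are illustrative.

```javascript
const times = ['12(pm):00', '12(am):))', '8(am):00'];

// ^\d+ takes the run of digits at the start of the string, if there is one.
const leadingDigits = times.map((s) => {
  const m = s.match(/^\d+/);
  return m ? m[0] : null;
});

console.log(leadingDigits); // [ '12', '12', '8' ]
```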

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of     spaces     in this  !  '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows ' lots of spaces in this! '
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the word this and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first regex, /\s+(\W)/g with '$1', replaces 1 or more spaces with a single space.
What your regex says is "match one or more whitespace characters followed by one non-word character, and replace that whole match with just that non-word character". The key is that the \s+ is greedy, meaning that it will try to match as many characters as possible. So in any given run of spaces it will try to match all of the spaces it can. However, your regex also requires a non-word character (\W) right after them, and a space is itself a non-word character. Because in your case the character after each run of spaces is a word character (i.e. a letter), \s+ has to give back the last space so that \W can match it.
Therefore, given the string a   b (three spaces), and using parens to mark the \s+ and \W matches, a(  )( )b is the only way for the regex to match (\s+ matches the first two spaces and \W matches the last space). Now it's just a simple substitution. Since you wrapped the \W in parentheses, that makes it the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against a !b will result in the match looking like a( )(!)b (since ! is now the non-word character that follows the spaces), so the final replaced result will be a!b.
Let's take this string 'aaa   &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces '   &' with '&'
That same principle applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
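The match list above can be reproduced with a short sketch. The input is rebuilt from the ☹ diagram (five spaces, five spaces, two spaces before the "!", two after), which is an assumption about the original string; the variable names are illustrative.

```javascript
// Input reconstructed from the match diagram; repeat() keeps the space counts explicit.
const a = ' lots of' + ' '.repeat(5) + 'spaces' + ' '.repeat(5) +
          'in this' + ' '.repeat(2) + '!' + ' '.repeat(2);
const re = /\s+(\W)/g;

// Every full match, with spaces shown as ☹. The last character of each
// match is what the (\W) group captured.
const matches = [...a.matchAll(re)].map((m) => m[0].replace(/ /g, '☹'));

// Replacing each match with its captured character.
const result = a.replace(re, '$1');

console.log(matches); // [ '☹☹☹☹☹', '☹☹☹☹☹', '☹☹!', '☹☹' ]
console.log(JSON.stringify(result)); // " lots of spaces in this! "
```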

Regex get entire word starting with #

Basically I'm trying to create a regex pattern that gets every word that starts with a #
For example :
#Server1:IP:Name Just a few words more
the pattern should find "#Server1:IP:Name"
I've created a regex pattern that has worked so far:
/#\w+/
The problem is that everything after a colon doesn't get matched. If I use this regex, I get this as a result, for example:
#Server1
How do I make sure it gets the entire word starting with a #, including any colons in it?
This works fine, try it:
#\w\S+
https://regex101.com/
\w matches any word character (equal to [a-zA-Z0-9_])
\S matches any non-whitespace character
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
You can use this
#[\w\s:]+
\w matches any word character (equal to [a-zA-Z0-9_])
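\s matches any whitespace character (equal to [\r\n\t\f\v ]); note that with \s in the class the match continues across spaces into the following words, so leave it out if you only want the single word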
: matches the character : literally (case sensitive)
If your string contains any of (!@#$%^&*()_+.) you could add them too.
Try this: #\S+
It gives you everything between "#" and the next space.
\S matches any non-whitespace character.
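A small comparison of the patterns suggested in this thread, run against the example line from the question. The helper name first and the second sample string are made up for illustration.

```javascript
const line = '#Server1:IP:Name Just a few words more';

// Return the first match of a pattern against the example line, or null.
const first = (re) => {
  const m = line.match(re);
  return m ? m[0] : null;
};

const results = {
  wordOnly: first(/#\w+/),               // stops at the first colon
  nonWhitespace: first(/#\S+/),          // stops at the first space
  wordThenNonWhitespace: first(/#\w\S+/),
  classWithWhitespace: first(/#[\w\s:]+/), // \s lets it run across spaces
};
console.log(results);

// #\w\S+ needs at least two characters after the "#".
const tooShort = '#a'.match(/#\w\S+/); // null

// With the g flag, match() returns every tagged word (hypothetical input).
const allTags = 'see #one and #two:x'.match(/#\S+/g); // [ '#one', '#two:x' ]
console.log(tooShort, allTags);
```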

Regex, replace all words starting with #

I have this regular expression that puts all the words that start with # into span tags.
I've accomplished what is needed, but I'm not sure that I completely understand what I did here.
content.replace(/(#\S+)/gi,"<span>$1</span>")
The () means to match a whole word, right?
The # means start with #.
The \S means "followed by anything until a whitespaces" .
But how come, if I don't add the + sign after the \S, it matches only the first letter?
Any input would be appreciated .
\S is any non-whitespace character and a+ means one or more of a. So
#\S -> An # followed by one non-whitespace character.
#\S+ -> An # followed by one or more non-whitespace characters
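The difference is easy to see by running both variants on the same text. The sample string and variable names below are made up for illustration.

```javascript
// Hypothetical sample text.
const content = 'Test #abc and #123';

// Without +: "#" plus exactly one non-whitespace character is wrapped.
const withoutPlus = content.replace(/(#\S)/g, '<span>$1</span>');

// With +: "#" plus the whole run of non-whitespace characters is wrapped.
const withPlus = content.replace(/(#\S+)/g, '<span>$1</span>');

console.log(withoutPlus); // Test <span>#a</span>bc and <span>#1</span>23
console.log(withPlus);    // Test <span>#abc</span> and <span>#123</span>
```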
Sharing code to change hashtags into links
var p = $("p");
var string = p.text();
p.html(string.replace(/#(\S+)/gi,'#$1'));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>Test you code here #abc #123 #xyz</p>
content.replace(/(#\S+)/gi,"<span>$1</span>")
(#\S+) is a capturing group which captures # followed by 1 or more (+ means 1 or more) non-whitespace characters (\S is a non-whitespace character)
g means global, ie replace all instances, not just the first match
i means case insensitive
$1 fetches what was captured by the first capturing group.
So, the i is unnecessary, but won't affect anything.
/(#\S+)/gi
1st Capturing group (#\S+)
# matches the character # literally
\S+ match any non-white space character [^\r\n\t\f ]
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
g - all the matches not just first
i - case insensitive match
The \S means "followed by anything until a whitespaces" .
That's not what \S means. It's "any character that's not a whitespace", that is, one character that's not a whitespace.

/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

It's given: /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
I don't understand why there are () around \S and the 2nd \1, but not around the 1st \1. Can anyone help in explaining how the above regex works?
src: http://www.javascriptkit.com/javatutors/redev2.shtml
Thanks in advance.
The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
The g on the end means "apply this regex over the entire input string, or globally".
The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:
/(\S)\1\1+/g
Also, as @AlexD pointed out, the description should say that it matches at least three characters. If you replaced that regex with BONK in the string fooxxxxxxbar:
'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')
..you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
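The claims above can be checked directly. This sketch runs the site's original pattern, the simplified one, and the version without the +, all on the same input; the variable names are illustrative.

```javascript
const input = 'fooxxxxxxbar';

// The site's pattern: the second group is redundant, the + means "three or more".
const original = input.replace(/(\S)\1(\1)+/g, 'BONK');

// Same matches without the extra capturing group.
const threeOrMore = input.replace(/(\S)\1\1+/g, 'BONK');

// Without the +, each match is exactly three equal characters.
const exactlyThree = input.replace(/(\S)\1\1/g, 'BONK');

console.log(original);     // fooBONKbar
console.log(threeOrMore);  // fooBONKbar
console.log(exactlyThree); // fooBONKBONKbar
```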
I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:
Word
...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
Also, a regex doesn't know from words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).
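One way to see where word boundaries fall is to insert a marker at every position that \b or \B matches. A minimal sketch, using "|" as an arbitrary marker and 'a-b' as a made-up second input with no spaces in it:

```javascript
const word = 'Word';

// \b matches positions, not characters: here the start and the end of the string.
const boundaries = word.replace(/\b/g, '|');

// \B matches every position that is not a word boundary.
const nonBoundaries = word.replace(/\B/g, '|');

// Inside a character class, \b means backspace, so [^\b] consumes
// any character that is not a backspace.
const notBackspace = word.replace(/[^\b]/g, '|');

// Boundaries do not need spaces: a hyphen next to a letter creates one too.
const hyphenated = 'a-b'.replace(/\b/g, '|');

console.log(boundaries);    // |Word|
console.log(nonBoundaries); // W|o|r|d
console.log(notBackspace);  // ||||
console.log(hyphenated);    // |a|-|b|
```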
If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.
