Regex get entire word starting with # - javascript

Basically, I'm trying to create a regex pattern that gets every word that starts with a #.
For example:
#Server1:IP:Name Just a few words more
The pattern should find "#Server1:IP:Name".
I've created a regex pattern that has worked so far:
/#\w+/
The problem is that everything after a colon no longer gets matched. If I use this regex, I get this as a result, for example:
#Server1
How do I make sure it gets the entire word starting with a #, without stopping at the colons in it?

This works fine, try it:
#\w\S+
https://regex101.com/
\w matches any word character (equal to [a-zA-Z0-9_])
\S matches any non-whitespace character
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

You can use this:
#[\w:]+
\w matches any word character (equal to [a-zA-Z0-9_])
: matches the character : literally
Do not add \s (any whitespace character, equal to [\r\n\t\f\v ]) to the class: it would let the match run past the space and take in the following words as well.
If your words contain any other characters, such as !#$%^&*()+. you could add them to the class too.

Try this: #\S+
It gives you everything from "#" up to the next whitespace character.
\S matches any non-whitespace character.
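
The difference between the suggested patterns is easiest to see side by side. Below is a minimal JavaScript sketch run against the sample line from the question; the variable names are illustrative, and the g flag is an assumption added so that every #-word in a longer string would be returned.

```javascript
// Sample line from the question.
const line = '#Server1:IP:Name Just a few words more';

// \w stops at the first colon, because ':' is not a word character.
const wordOnly = line.match(/#\w+/g);

// \S keeps going until the next whitespace character.
const nonSpace = line.match(/#\S+/g);

// A class that also contains \s does not stop at the space,
// so it swallows the rest of the line.
const withWhitespace = line.match(/#[\w\s:]+/g);

console.log(wordOnly);       // [ '#Server1' ]
console.log(nonSpace);       // [ '#Server1:IP:Name' ]
console.log(withWhitespace); // [ '#Server1:IP:Name Just a few words more' ]
```

For this input, #\S+ is the only one of the three that returns exactly the tagged word.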

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of     spaces     in this  !  '
console.log(a.replace(/\s+(\W)/g, '$1'))
The log shows ' lots of spaces in this! '
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the word "this" and the !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first regex, /\s+(\W)/g with '$1', replaces 1 or more spaces with a single space.
What your regex says is "match one or more whitespace characters, followed by one non-word character, and replace that whole match with just that non-word character". The key is that \s+ is greedy, meaning that it will try to match as many characters as possible. So in any given run of spaces it will try to match all of the spaces it can. However, your regex also requires one non-word character (\W) after them, and a space is itself a non-word character. Because in your case the next character after each final space is a word character (i.e. a letter), \s+ has to give back one space so that this last part of the regex can match the last space.
Therefore, given the string "a   b" (three spaces), and using parens to mark the \s+ and \W matches, a(  )( )b is the only way for the regex to match (\s+ matches the first two spaces and \W matches the last space). Now it's just a simple substitution. Since you wrapped the \W in parentheses, that makes it the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against "a !b" will result in the match looking like a( )(!)b (since ! is now the last non-word character), so the final result will be a!b.
Let's take the string 'aaa   &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces '   &' with '&'
That same principle applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
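
The match list above can be reproduced with a short JavaScript sketch. The exact spacing of the input is an assumption reconstructed from that list (five spaces after "of" and after "spaces", two before and two after the "!"), and the helper name visible is made up for illustration.

```javascript
// Input reconstructed from the match list above; the spacing is an assumption.
const a = ' lots of     spaces     in this  !  ';
const pattern = /\s+(\W)/g;

// Make spaces visible as "_" when printing.
const visible = (s) => s.replace(/ /g, '_');

// Each match is a run of whitespace plus one captured non-word character.
const matches = [...a.matchAll(pattern)];
for (const m of matches) {
  console.log(`match "${visible(m[0])}" -> kept "${visible(m[1])}"`);
}

// Replacing each match with its capture group keeps only that last character.
const cleaned = a.replace(pattern, '$1');
console.log(JSON.stringify(cleaned)); // " lots of spaces in this! "
```

The single spaces before "lots", "of" and "this" are untouched because each is followed directly by a letter, so there is nothing for (\W) to capture.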

Lookaheads to delimit text

I'm trying to delimit a huge text with several documents inside. Each document starts with the word 'MINISTÉRIO', so I'm trying to use lookaheads to catch everything from MINISTÉRIO until the next MINISTÉRIO:
(MINISTÉRIO)[\s\S]*?(^(?=\1))
http://regexr.com/3dk6k
I also was trying to:
(^MINISTÉRIO)[\s\S]*?(?=\1)
http://regexr.com/3dk6h
Neither is working. I have two questions: why is my regex not working? It should, I think. And how do I fix it?
Thanks!
Issue Description
The /(MINISTÉRIO)[\s\S]*?(^(?=\1))/gm regex matches the word MINISTÉRIO anywhere in the text and captures it into Group 1. [\s\S]*? then lazily matches any characters, 0 or more times, up to the beginning of a line that is followed by the word MINISTÉRIO. Thus, the last "document", which runs from some place in the string up to its end, is never matched, because no MINISTÉRIO follows it. You cannot add the $ anchor as an alternative end either, since the /m modifier redefines it to match the end of every line.
Using /(^MINISTÉRIO)[\s\S]*?(?=\1)/g, you match and capture the word MINISTÉRIO at the beginning of the whole string only, and then match as few characters as possible up to the first MINISTÉRIO substring, wherever it occurs in the string. There is no check that it is at the beginning of a line.
Solution
You may use an unrolled regex like
/^MINISTÉRIO\b.*(?:\n(?!MINISTÉRIO\b).*)*/gm
When the text is too long, lazy matching like in your pattern takes too much time, and using negated character classes can greatly increase performance.
In short:
^MINISTÉRIO\b - matches MINISTÉRIO as a whole word at the start of a line:
  ^ - start of a line (due to the /m modifier)
  MINISTÉRIO\b - the whole word MINISTÉRIO, as \b is a word boundary
.*(?:\n(?!MINISTÉRIO\b).*)* - matches any text that does not have MINISTÉRIO at the start of a line:
  .* - 0+ chars other than a newline
  (?:\n(?!MINISTÉRIO\b).*)* - 0+ sequences of:
    \n(?!MINISTÉRIO\b) - a newline not followed by MINISTÉRIO as a whole word
    .* - 0+ chars other than a newline
It is basically the same as /^MINISTÉRIO\b(?:(?!^MINISTÉRIO\b)[\s\S])*/gm, but should be much faster as the tempered greedy token ((?:(?!^MINISTÉRIO\b)[\s\S])*) is rather resource consuming.
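
As a rough illustration of the unrolled pattern, here is a JavaScript sketch that splits a small text into documents. The sample text is hypothetical (the question's real input is not shown), and it includes MINISTÉRIO in the middle of a line to show that only occurrences at the start of a line begin a new document.

```javascript
// Hypothetical sample text: two "documents", each starting with MINISTÉRIO
// at the beginning of a line. The body lines are made up for illustration.
const text = [
  'MINISTÉRIO DA SAÚDE',
  'Portaria 1',
  'cita o MINISTÉRIO no meio da linha',
  'MINISTÉRIO DA EDUCAÇÃO',
  'Portaria 2',
].join('\n');

// Unrolled pattern from the answer above.
const docPattern = /^MINISTÉRIO\b.*(?:\n(?!MINISTÉRIO\b).*)*/gm;

// With the g flag, match() returns every document as one array element.
const documents = text.match(docPattern);

console.log(documents.length); // 2
documents.forEach((doc, i) => {
  console.log(`--- document ${i + 1} ---\n${doc}`);
});
```

The last document is captured even though no further MINISTÉRIO follows it, which is the case the lookahead-only attempts missed.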

Grab full regex word if pattern inside it matches

How do I retrieve an entire word that has a specific portion of it that matches a regex?
For example, I have the below text.
Using ^.[\.\?\!:;,]{2,} , I match the first 3, but not the last. The last should be matched as well, but $ doesn't seem to produce anything.
a!!!!!!
n.......
c..,;,;,,
huhuhu..
I want to get all strings that have an occurrence of certain characters equal to or more than twice. I produced the aforementioned regex, but on Rubular it only matches the characters themselves, not the entire string. Using ^ and $
I've read a few similar Stack Overflow posts, but none were quite what I'm looking for.
Change your regex to:
/^.*[.?!:;,]{2,}/gm
i.e. match 0 or more characters before 2 or more of those special characters.
If I understand correctly, you are trying to match an entire string that contains the same punctuation character at least two times in a row:
^.*?([.?!:;,])\1.*
Note: if your string has newline characters, change .* to [\s\S]*
The trick is here:
([.?!:;,]) # captures the punct character in group 1
\1 # refers to the character captured in group 1
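
The two answers above differ in what counts as a repeat. The sketch below compares them in JavaScript rather than on Rubular, which is an assumption about the target engine; both patterns use only syntax that the two engines share. The last two input lines are hypothetical additions to the question's four samples.

```javascript
// The four sample lines from the question plus two hypothetical extras.
const lines = ['a!!!!!!', 'n.......', 'c..,;,;,,', 'huhuhu..', 'hello.', 'a.,b'];

// Two or more punctuation characters in a row (they may differ).
const anyRun = /^.*[.?!:;,]{2,}/;

// The same punctuation character twice in a row, via the \1 backreference.
const sameTwice = /^.*?([.?!:;,])\1.*/;

for (const line of lines) {
  console.log(line, anyRun.test(line), sameTwice.test(line));
}

const matchedAnyRun = lines.filter((l) => anyRun.test(l));
const matchedSameTwice = lines.filter((l) => sameTwice.test(l));

// The leading .* is what pulls the whole word into the match.
const wholeWord = 'huhuhu..'.match(anyRun)[0];
console.log(wholeWord); // huhuhu..
```

Here "a.,b" passes the first pattern but not the second, because "." and "," are two different characters, while "hello." fails both.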

Replace function does only replace every second regex match

I would like to use regex in JavaScript to put a zero before every number that has exactly one digit.
When I debug the code in the Chrome debugger, it gives me a strange result where the zero is only added to every second match.
My regex
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
And the result i get from this
"03-3-07-3-09-8-10-05"
Thanks for the help
Use word boundaries,
(\b\d\b)
Replacement string:
0$1
> "3-3-7-3-9-8-10-5".replace(/(\b\d\b)/g, "0$1")
'03-03-07-03-09-08-10-05'
Explanation:
( starting point of first Capturing group.
\b Matches between a word character and a non word character.
\d Matches a single digit.
\b Matches between a word character and a non word character.
) End of first Capturing group.
You can use this better lookahead based regex to prefix 0 before every single digit number:
"3-3-7-3-9-8-10-5".replace(/\b(\d)\b(?=-|$)/g, "0$1");
//=> "03-03-07-03-09-08-10-05"
Reason why you're getting alternate prefixes in your regex:
"3-3-7-3-9-8-10-5".replace(/(\-|^)(\d)(\-|$)/g, "$10$2$3");
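is that rather than looking ahead, you're actually matching (and consuming) the hyphen after the digit. Once a hyphen has been matched it cannot be matched again, since the internal regex pointer has already moved past it, so the next digit has no leading hyphen left for (\-|^) to match and is skipped.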
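
To make the consumed-hyphen effect concrete, here is a small JavaScript sketch comparing the original pattern with the two zero-width fixes above. The original replacement string is written as a function here, which is a choice made for this sketch to avoid the ambiguous "$10" reference; the variable names are illustrative.

```javascript
const input = '3-3-7-3-9-8-10-5';

// Original attempt: each match consumes the hyphen after the digit, so the
// next digit has no leading hyphen left to match against and is skipped.
const consuming = input.replace(
  /(-|^)(\d)(-|$)/g,
  (match, before, digit, after) => `${before}0${digit}${after}`
);

// Word boundaries are zero-width, so nothing between the digits is consumed.
const withBoundaries = input.replace(/\b(\d)\b/g, '0$1');

// The lookahead checks for the hyphen (or end of input) without consuming it.
const withLookahead = input.replace(/\b(\d)\b(?=-|$)/g, '0$1');

console.log(consuming);      // 03-3-07-3-09-8-10-05
console.log(withBoundaries); // 03-03-07-03-09-08-10-05
console.log(withLookahead);  // 03-03-07-03-09-08-10-05
```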
Use a positive lookahead to find the one-digit numbers:
"3-3-7-3-9-8-10-5".replace(/(?=\b\d\b)/g, "0");

what's the meaning of the below regex in javascript

data.replace(/(.*)/g, '$1')
I encountered the above in smashing nodejs, can someone quickly explain this syntax? I'm new to Regex.
. means match characters except new line.
* matches 0 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
$1 refers to the matched group.
g modifier means global, which in turn means,
"don't stop at the first match. Continue to match even after that"
Basically what it is doing is capturing every character into a group until it encounters a \n(newline) and replacing it with the same.
There is no change in this operation and you should avoid doing this.
. can be any character, except the newline character, and * quantifier means that . can be matched 0 to unlimited times. So, it matches all the characters in the data. The parenthesis around .*, group all the matched characters into a group and $1 refers to the first captured group. So, we basically match all the characters and replace that with the matched characters.
It is similar to doing
str.replace(str1, str1)
You found it in "Smashing Node.js". I looked and found it too. The code there is actually data.replace(/(.*)/g, '  $1'). Please notice the two leading spaces before $1. They indent the whole text.
.* matches the whole line,
replaces it with "  " + the same line,
repeats it until EOF because the g modifier is there
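
A short JavaScript sketch of both variants follows, using a made-up two-line input. The ^-anchored version at the end is an alternative suggested here, not something taken from the book.

```javascript
// Hypothetical two-line input.
const data = 'first line\nsecond line';

// Replacing every match with its own capture group leaves the text unchanged.
const unchanged = data.replace(/(.*)/g, '$1');
console.log(unchanged === data); // true

// With two leading spaces in the replacement, each line gets indented.
// /(.*)/g also produces an empty match at the end of each line, so the
// two spaces are added there as well.
const indented = data.replace(/(.*)/g, '  $1');
console.log(JSON.stringify(indented)); // "  first line  \n  second line  "

// Alternative (not from the book): anchoring with ^ and the m flag indents
// each line without the trailing spaces.
const indentedOnly = data.replace(/^/gm, '  ');
console.log(JSON.stringify(indentedOnly)); // "  first line\n  second line"
```

The trailing spaces are invisible when the text is printed, which is probably why the simpler pattern is good enough for indenting console output.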
