Regex, replace all words starting with # - javascript

I have this regular expression that puts all the words that start with # into span tags.
I've accomplished what is needed, but I'm not sure that I completely understand what I did here.
content.replace(/(#\S+)/gi,"<span>$1</span>")
The () means to match a whole word, right?
The # means start with #.
The \S means "followed by anything until a whitespace".
But how come that if I don't add the + sign after the \S, it matches only the first letter?
Any input would be appreciated.

\S is any non-whitespace character and a+ means one or more of a. So
#\S -> An # followed by one non-whitespace character.
#\S+ -> An # followed by one or more non-whitespace characters
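A small sketch of the difference between the two patterns. The sample string and variable names are made up for illustration and are not from the question:

```javascript
// Sample text is an assumption, chosen only to show the effect of "+".
const sample = "Tag #abc and #123 here";

// Without "+": \S consumes exactly one character after "#".
const oneChar = sample.replace(/(#\S)/g, "<span>$1</span>");

// With "+": \S+ consumes every non-whitespace character up to the next space.
const wholeWord = sample.replace(/(#\S+)/g, "<span>$1</span>");

console.log(oneChar);   // Tag <span>#a</span>bc and <span>#1</span>23 here
console.log(wholeWord); // Tag <span>#abc</span> and <span>#123</span> here
```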

Sharing code to change hashtags into links
var p = $("p");
var string = p.text();
p.html(string.replace(/#(\S+)/gi,'#$1'));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>Test your code here #abc #123 #xyz</p>

content.replace(/(#\S+)/gi,"<span>$1</span>")
(#\S+) is a capturing group which captures # followed by 1 or more (+ means 1 or more) non-whitespace characters (\S is a non-whitespace character)
g means global, ie replace all instances, not just the first match
i means case insensitive
$1 fetches what was captured by the first capturing group.
So, the i is unnecessary, but won't affect anything.
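To see what the g flag and the capturing group each contribute, here is a minimal sketch. The sample string is an assumption; $& is the standard replacement token for the whole match, shown here only as an alternative to the parentheses:

```javascript
// Sample text is an assumption for illustration.
const text = "#one #two";

// No g flag: only the first match is replaced.
const firstOnly = text.replace(/(#\S+)/, "<span>$1</span>");

// With g: every match is replaced. The parentheses do not "match a whole word";
// they only capture the matched text so that $1 can refer to it.
const allMatches = text.replace(/(#\S+)/g, "<span>$1</span>");

// Same result without a capturing group, using $& (the whole match).
const noGroup = text.replace(/#\S+/g, "<span>$&</span>");

console.log(firstOnly);  // <span>#one</span> #two
console.log(allMatches); // <span>#one</span> <span>#two</span>
console.log(noGroup);    // <span>#one</span> <span>#two</span>
```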

/(#\S+)/gi
1st Capturing group (#\S+)
# matches the character # literally
\S+ matches any non-whitespace character [^\r\n\t\f ]
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
g - all the matches not just first
i - case insensitive match

The \S means "followed by anything until a whitespace".
That's not what \S means. It means "any character that's not whitespace", that is, exactly one non-whitespace character.

Related

Regex get entire word starting with #

Basically, I'm trying to create a regex pattern that gets every word that starts with #.
For example:
#Server1:IP:Name Just a few words more
The pattern should find "#Server1:IP:Name".
I've created a regex pattern that has worked so far:
/#\w+/
The problem is that nothing after a colon gets matched. With this regex I get the following result, for example:
#Server1
How do I make sure it gets the entire word starting with #, including any colons in it?
This works fine, try it:
#\w\S+
https://regex101.com/
\w matches any word character (equal to [a-zA-Z0-9_])
\S matches any non-whitespace character
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
You can use this:
#[\w:]+
\w matches any word character (equal to [a-zA-Z0-9_])
: matches the character : literally
If your string contains any of (!@#$%^&*()_+.) you could add them to the character class too. Leave \s out of the class, otherwise the match continues past the first space.
try this #\S+
It gives you everything between "#" and the next space.
\S matches any non-whitespace character.
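The suggestions in this thread can be compared side by side on the example line from the question. This is only a sketch; the variable names and the short "#a b" input are assumptions added for illustration:

```javascript
const line = "#Server1:IP:Name Just a few words more";

// \w does not include ":", so the match stops at the first colon.
const wordCharsOnly = line.match(/#\w+/)[0];

// \S+ runs until the next whitespace character.
const untilSpace = line.match(/#\S+/)[0];

// A character class that lists ":" explicitly gives the same result here.
const classWithColon = line.match(/#[\w:]+/)[0];

// Adding \s to the class lets the match run past the spaces.
const classWithSpace = line.match(/#[\w\s:]+/)[0];

// #\w\S+ needs at least two characters after "#", so "#a" alone is not matched.
const tooShort = "#a b".match(/#\w\S+/);

console.log(wordCharsOnly);  // #Server1
console.log(untilSpace);     // #Server1:IP:Name
console.log(classWithColon); // #Server1:IP:Name
console.log(classWithSpace); // #Server1:IP:Name Just a few words more
console.log(tooShort);       // null
```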

Regex to match words with hyphens and/or apostrophes

I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optionals, though perhaps there's another more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
use this pattern
(?=\S*['-])([a-zA-Z'-]+)
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
[\w'-]+ would match pretty much any occurrence of words with (or without) hyphens and apostrophes, but also in cases where those characters are adjacent.
(?:\w|['-]\w)+ should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent maybe try \w*(?:['-](?!['-])\w*)+. But that would also match ' and - alone.
debuggex.com is a great resource for visualizing these sorts of things
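A rough way to compare the three patterns above is to anchor each one with ^ and $ and test it against single words. The anchors, the variable names, and the choice of test words are assumptions made for this sketch, not part of the answer:

```javascript
// Anchored copies of the three patterns, so each test covers the whole word.
const loose = /^[\w'-]+$/;                       // allows adjacent ' and -
const notAdjacent = /^(?:\w|['-]\w)+$/;          // every ' or - must be followed by a word char
const mustContain = /^\w*(?:['-](?!['-])\w*)+$/; // needs at least one ' or -, none adjacent

const results = {
  looseAdjacent: loose.test("some'-thing"),             // true
  notAdjacentAdjacent: notAdjacent.test("some'-thing"), // false
  notAdjacentNormal: notAdjacent.test("qu'est-ce"),     // true
  mustContainNormal: mustContain.test("qu'est-ce"),     // true
  mustContainPlain: mustContain.test("word"),           // false, no ' or -
  mustContainLoneQuote: mustContain.test("'"),          // true, the caveat noted above
};

console.log(results);
```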
\b\w*[-']\w*\b should do the trick
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word-break
[a-z]+ # one or more
(?: # start non-capturing group
['-]? # zero or one
[a-z]+ # one or more
)* # end of non-capturing group, zero or more
\b # word-break
which will match any word that begins and ends with a letter and can contain zero or more groups of an apostrophe or a hyphen followed by one or more letters. As written it matches lowercase letters only; add the i flag to match uppercase letters as well.
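Running that pattern over the examples from the question shows what it does and does not pick up. This is a sketch only: the g and i flags are added here as an assumption so that all words, including capitalised ones, are returned:

```javascript
// The examples from the question, joined into one string for the test.
const words = "Mary's High-school 'tis Chambers' Qu'est-ce";

// Same pattern as above, with g (all matches) and i (ignore case) added.
const found = words.match(/\b[a-z]+(?:['-]?[a-z]+)*\b/gi);

// Leading and trailing apostrophes are left out of the match,
// which is the restriction this answer states.
console.log(found); // [ "Mary's", 'High-school', 'tis', 'Chambers', "Qu'est-ce" ]
```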
How about: \'?\w+([-']\w+)*\'?
I suppose these words shouldn't be matched:
something- or -something: start or end with -
some--thing or some'-thing: - not followed by a character
some'': two apostrophes
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' in any place in the word but single ' is ignored).
If needed \w could be replaced with [a-zA-Z] etc.

Regarding JavaScript RegEx - Replace all punctuation including underscore

In javascript, how can I replace all punctuation (including underscore) marks with hyphen? Moreover, it should not contain more than one hyphen sequentially.
I tried "h....e l___l^^0".replace(/[^\w]/g, "-") but it gives me h----e---l___l--0
What should I do so that it returns me h-e-l-l-0 instead?
+ repeats the previous token one or more times.
> "h....e l___l^^0".replace(/[\W_]+/g, "-")
'h-e-l-l-0'
[\W_]+ matches non-word characters or _ one or more times.
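A short sketch of why both the underscore and the + matter, using the input string as it appears in the question (variable names are illustrative):

```javascript
const input = "h....e l___l^^0";

// [^\w]+ collapses each run of non-word characters, but \w includes "_",
// so the underscores survive.
const underscoresKept = input.replace(/[^\w]+/g, "-");

// [\W_]+ treats "_" as punctuation too, and "+" turns each run into one hyphen.
const collapsed = input.replace(/[\W_]+/g, "-");

console.log(underscoresKept); // h-e-l___l-0
console.log(collapsed);       // h-e-l-l-0
```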
All you need to do is add a + quantifier to the regex:
"h....e l___l^^0".replace(/[^a-zA-Z0-9]+/g, "-")
NOTE
Use [^a-zA-Z0-9]+ instead of [^\w], because \w includes _, so underscores won't be replaced if you use [^\w].
Regex101
[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+
Description
[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
!"#$%&'()*+, a single character in the list !"#$%&'()*+, literally (case sensitive)
\- matches the character - literally
. the literal character .
\/ matches the character / literally
:;<=>?@[ a single character in the list :;<=>?@[ literally (case sensitive)
\\ matches the character \ literally
\] matches the character ] literally
^_`{|}~ a single character in the list ^_`{|}~ literally
g modifier: global. All matches (don't return on first match)
JS
alert("h....e l___l^^0".replace(/[!"#$%&'()*+,\-.\/ :;<=>?@[\\\]^_`{|}~]+/g, "-"));
Result:
h-e-l-l-0

Regex match string until whitespace Javascript

I want to be able to match the following examples:
www.example.com
http://example.com
https://example.com
I have the following regex which does NOT match www. but will match http:// https://. I need to match any prefix in the examples above and up until the next white space thus the entire URL.
var regx = /((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?=\s{1})/;
Lets say I have a string that looks like the following:
I have found a lot of help off www.stackoverflow.com and the people on there!
I want to run the matching on that string and get
www.stackoverflow.com
Thanks!
You can try
(?:www|https?)[^\s]+
sample code:
var str="I have found a lot of help off www.stackoverflow.com and the people on there!";
var found=str.match(/(?:www|https?)[^\s]+/gi);
alert(found);
Pattern explanation:
(?: group, but do not capture:
www 'www'
| OR
http 'http'
s? 's' (optional)
) end of grouping
[^\s]+ any character except: whitespace
(\n, \r, \t, \f, and " ") (1 or more times)
You have an error in your regex.
Use this:
((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?!\s{1})
^--- Change to negative lookahead
Btw, I think you can use:
(?:(http?:\/\/)|(https?:\/\/)|(www\.))(?!\s{1})
MATCH 1
3. [0-4] `www.`
MATCH 2
1. [16-23] `http://`
MATCH 3
2. [35-43] `https://`
Not quite sure what you're trying to do, but this should match any group of non-space characters not immediately preceded with "www." case insensitive.
/(https?:\/\/)?(?<!(www\.))[^\s]*/i
... [edit] but you did want to match www.
/(https?:\/\/)?([^\s\.]{2,}\.?)+/i
First things first, to match any non-whitespace char, use the \S construct (in POSIX, you would use [^[:space:]], but JavaScript regex is not POSIX compliant). Here are some common patterns with \S:
\S* - zero or more non-whitespace chars
\S+ - one or more non-whitespace chars
Matching any text until first whitespace can mean match any zero or more chars other than whitespace, so, the answer to the current OP problem is
(?:www|https?)\S*
// ^^^
See the regex demo. This pattern will match up to the first whitespace or end of string. If there must be a whitespace char on the right use
(?:www|https?)\S*(?=\s)
The (?=\s) positive lookahead requires a whitespace immediately to the right of the current location.
Whenever there is a need to match until the last whitespace, you could match any zero or more chars that are followed by a whitespace, \s, pattern:
/(?:www|https?)[\w\W]*(?=\s)/
/(?:www|https?)[^]*(?=\s)/
// Or even (for ECMAScript 2018+):
/(?:www|https?).*(?=\s)/s
The [\w\W], [^] and . with s flag match any char including line break chars.
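The three variants behave quite differently on real input. A minimal sketch, using the sentence from the question plus one made-up string ("see www.example.com") to show the end-of-string case:

```javascript
const sentence = "I have found a lot of help off www.stackoverflow.com and the people on there!";

// \S* stops at the first whitespace (or at the end of the string).
const upToFirstSpace = sentence.match(/(?:www|https?)\S*/)[0];

// [\w\W]* is greedy across everything, then backs off to the last whitespace.
const upToLastSpace = sentence.match(/(?:www|https?)[\w\W]*(?=\s)/)[0];

// With (?=\s), a URL at the very end of the string is not matched at all.
const atEndOfString = "see www.example.com".match(/(?:www|https?)\S*(?=\s)/);

console.log(upToFirstSpace); // www.stackoverflow.com
console.log(upToLastSpace);  // www.stackoverflow.com and the people on
console.log(atEndOfString);  // null
```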

what's the meaning of the below regex in javascript

data.replace(/(.*)/g, '$1')
I encountered the above in smashing nodejs, can someone quickly explain this syntax? I'm new to Regex.
. means match characters except new line.
* matches 0 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
$1 refers to the matched group.
g modifier means global, which in turn means,
"don't stop at the first match. Continue to match even after that"
Basically what it is doing is capturing every character into a group until it encounters a \n(newline) and replacing it with the same.
There is no change in this operation and you should avoid doing this.
. can be any character, except the newline character, and * quantifier means that . can be matched 0 to unlimited times. So, it matches all the characters in the data. The parenthesis around .*, group all the matched characters into a group and $1 refers to the first captured group. So, we basically match all the characters and replace that with the matched characters.
It is similar to doing
str.replace(str1, str1)
You found it in "Smashing Node.js". I looked and found it too. The code there is: data.replace(/(.*)/g, '  $1'). Please notice the two leading spaces before $1. It indents the whole text.
.* matches the whole line,
replaces it with " " + the same line,
repeats it until eof because g modifier is there
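A small sketch of both readings of that call. The sample text is made up, and the /^/gm variant at the end is an alternative suggested here, not something taken from the book. One detail worth knowing: with the g flag, .* also produces an empty match at the end of each line, so the two-space version appends trailing spaces as well.

```javascript
// Sample text is an assumption for illustration.
const data = "first line\nsecond line";

// Replacing each match with itself leaves the text unchanged.
const unchanged = data.replace(/(.*)/g, "$1");

// With two spaces before $1, every line is indented. The empty match at the
// end of each line is replaced too, which adds two trailing spaces per line.
const viaDotStar = data.replace(/(.*)/g, "  $1");

// Alternative (an assumption, not from the book): insert the indent at each
// line start using ^ with the m flag, which avoids the trailing spaces.
const viaAnchor = data.replace(/^/gm, "  ");

console.log(unchanged === data);         // true
console.log(JSON.stringify(viaDotStar)); // "  first line  \n  second line  "
console.log(JSON.stringify(viaAnchor));  // "  first line\n  second line"
```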
