linkify words using regex

I'm trying to linkify hashtags using a regex. Most cases work, except when a hashtag is followed by a dot, as in "#hot.": there, only #hot should be linkified. At the same time, #hot.hot is a valid hashtag and should be linkified as a whole.
Here is my regex code:
var text = "#hot#hot hot #hot #hot.hot #hót #hot_hot #hot, (#hot) #hot. hot";
text.replace(/#([^\b#,() ]*)/g, '#$1');
output:
#hot#hot hot #hot #hot.hot #hót #hot_hot #hot, (#hot) #hot. hot
The only issue is "#hot.", where only #hot should be linkified, while #hot.hot must remain valid as a whole.

Your regex is fine, but you have to add a word boundary at the end:
#([^\b#,() ]*)\b
              ^-------- Here
Working demo

Give this regex a try instead:
/#([^\W]+)/g
\w matches only ASCII letters, digits, and underscores, so its opposite, \W, matches everything else. Put \W in a negated character class ([^\W]) and you get a class equivalent to \w, which stops at the trailing dot in "#hot." as desired. Note that in JavaScript it also stops at accented characters (for #hót it matches only #h) and at the dot in #hot.hot.
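To see how the two suggested patterns behave, here is a minimal sketch that runs both against the sample text from the question. The `/tags/` link target in `linkify` is a hypothetical placeholder, not something taken from the question. One detail worth knowing: inside a character class, `\b` means the backspace character rather than a word boundary, so only the trailing `\b` acts as a boundary.

```javascript
const text = "#hot#hot hot #hot #hot.hot #hót #hot_hot #hot, (#hot) #hot. hot";

// First suggestion: the original pattern plus a trailing word boundary.
const withBoundary = /#([^\b#,() ]*)\b/g;
// Second suggestion: [^\W] is equivalent to \w (ASCII letters, digits, underscore).
const wordOnly = /#([^\W]+)/g;

const tagsWithBoundary = text.match(withBoundary);
// ["#hot", "#hot", "#hot", "#hot.hot", "#hót", "#hot_hot", "#hot", "#hot", "#hot"]
const tagsWordOnly = text.match(wordOnly);
// ["#hot", "#hot", "#hot", "#hot", "#h", "#hot_hot", "#hot", "#hot", "#hot"]

// Wrap each hashtag in a link. "/tags/" is an assumed URL prefix.
function linkify(input) {
  return input.replace(withBoundary, '<a href="/tags/$1">#$1</a>');
}

console.log(tagsWithBoundary);
console.log(tagsWordOnly);
console.log(linkify("so #hot. right"));
// so <a href="/tags/hot">#hot</a>. right
```

The trailing-boundary version keeps #hot.hot and #hót intact while dropping the final dot of "#hot.". It would still cut a tag short if the tag itself ended in an accented character, because `\b` only recognizes ASCII word characters.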

Related

Regex including all special characters except space

I have a regex which checks all the special characters except space but that looks weird and too long.
const specialCharsRegex = new RegExp(/#|#|\$|!|%|&|\^|\*|-|\+|_|=|{|}|\[|\]|\(|\)|~|`|\.|\?|\<|\>|,|\/|:|;|"|'|\\/);
This looks too long, and if I use the regex (\W) it also matches the space.
Is there any way I can achieve this?

Well you could use:
[^\w ]
This matches any non-word character except the space. To exclude further characters from the match, add them to the character class above.

To match anything that is neither a word character nor a whitespace character (CR, LF, FF, space, tab):
const specialCharsRegex = new RegExp(/[^\w\s]+|_+/, 'g');
See this demo at regex101 or a JS replace demo at tio.run (used g flag for all occurrences)
The underscore belongs to word characters [A-Za-z0-9_] and needs to be matched separately.
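As a rough illustration of the pattern above, the following sketch tests strings for special characters and lists the ones found. The helper names `hasSpecialChar` and `listSpecialChars` are made up for this example. The test helper deliberately omits the g flag, since a global regex keeps state in `lastIndex` between `test` calls.

```javascript
// One character that is neither a word character nor whitespace,
// or an underscore (which \w would otherwise treat as a word character).
const specialChar = /[^\w\s]|_/;

function hasSpecialChar(input) {
  return specialChar.test(input);
}

function listSpecialChars(input) {
  // match() with the g flag returns null when nothing matches.
  return input.match(/[^\w\s]|_/g) || [];
}

console.log(hasSpecialChar("hello world")); // false
console.log(hasSpecialChar("hello_world")); // true
console.log(hasSpecialChar("a+b"));         // true
console.log(listSpecialChars("a_b + c!"));  // ["_", "+", "!"]
```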

Try a negated character class over A-Z, a-z, and 0-9 (shown in Java syntax; note that this class also matches the space):
Pattern regex = Pattern.compile("[^A-Za-z0-9]");

Ignore newlines in a regex that doesn't care about order

I have a regex here at scriptular.com
/(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i
How do I make the regex successfully match the string?
Without the newline characters in the string, the regex would succeed. I could remove them... but I'd rather not.

1.) The dot matches any character except a newline, so .* cannot skip past a line break to find words that appear on later lines. Many regex flavors offer a dotall (single-line) s flag that makes the dot match newlines as well; JavaScript only gained it in ES2018, so it is unavailable in older engines.
The workaround is a character class that matches any character: [\s\S] (any whitespace or non-whitespace character), [\w\W] (any word or non-word character), or even [^] ("not nothing") in place of the dot.
2.) Anchor the lookaheads to ^ (start of string), since there is no need to retry them at every position in the string. This can drastically improve performance, especially when the string does not match.
3.) Use lazy matching so that each lookahead is satisfied by the first occurrence of its word.
/^(?=[\s\S]*?net)(?=[\s\S]*?income)(?=[\s\S]*?total)(?=[\s\S]*?depreciation)/i
See demo at regex101 (dunno why this doesn't work in your demo tool)
Additionally, you can put \b word boundaries around the words so that, for example, net is not matched inside brunet or network. The regex then becomes ^(?=[\s\S]*?\bnet\b)...
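The difference between these variants can be checked with a short sketch. The `report` string is invented for illustration, since the original test string is not shown in the question. The `dotAllVersion` relies on the ES2018 s flag and will not parse in older engines.

```javascript
// Hypothetical multi-line input; the question's actual string is not shown.
const report = "Total assets\nNet Income\nDepreciation expense";

// Original pattern: "." never crosses a newline, so the words must share a line.
const dotVersion = /(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i;

// Workaround: [\s\S] matches any character, including newlines.
const classVersion = /^(?=[\s\S]*?net)(?=[\s\S]*?income)(?=[\s\S]*?total)(?=[\s\S]*?depreciation)/i;

// ES2018 alternative: the s (dotAll) flag lets "." match newlines.
const dotAllVersion = /^(?=.*?net)(?=.*?income)(?=.*?total)(?=.*?depreciation)/is;

// Word boundaries keep "net" from matching inside "network".
const strictVersion = /^(?=[\s\S]*?\bnet\b)(?=[\s\S]*?\bincome\b)(?=[\s\S]*?\btotal\b)(?=[\s\S]*?\bdepreciation\b)/i;

console.log(dotVersion.test(report));    // false
console.log(classVersion.test(report));  // true
console.log(dotAllVersion.test(report)); // true
console.log(strictVersion.test(report)); // true
console.log(classVersion.test("network income total depreciation"));  // true
console.log(strictVersion.test("network income total depreciation")); // false
```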

regex to match alphanumeric and hyphen only, strip everything else in javascript

I want to strip everything except alphanumeric characters and hyphens.
So far I've got this, but it's not working:
String = String.replace(/^[a-zA-Z0-9-_]+$/ig,'');
Any help appreciated.

If you want to remove everything except alphanumeric characters, hyphens, and underscores, then negate the character class, like this:
String = String.replace(/[^a-zA-Z0-9-_]+/ig,'');
Also, the ^ and $ anchors should not be there: with them, the pattern only matches when the entire string consists of allowed characters.
Apart from that, the character class already covers both uppercase and lowercase letters, so the i flag is not needed. The regex becomes
String = String.replace(/[^a-zA-Z0-9-_]+/g,'');
There is a special character class, which matches a-zA-Z0-9_, \w. You can make use of it like this
String = String.replace(/[^\w-]+/g,'');
Since \w doesn't cover -, we included that separately.
Quoting from MDN RegExp documentation,
\w
Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
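A small sketch of the difference between the original attempt and the negated class follows. The function names are made up for this example, and the variant without the underscore is an assumption based on the question's title, which asks for alphanumerics and hyphens only.

```javascript
// Keeps letters, digits, underscores, and hyphens; removes everything else.
function keepWordAndHyphen(input) {
  return input.replace(/[^\w-]+/g, "");
}

// Stricter variant (assumed from the question title): underscores are removed too.
function keepAlnumAndHyphen(input) {
  return input.replace(/[^a-zA-Z0-9-]+/g, "");
}

const sample = "Hello, World! foo-bar_baz 123";

console.log(keepWordAndHyphen(sample));  // HelloWorldfoo-bar_baz123
console.log(keepAlnumAndHyphen(sample)); // HelloWorldfoo-barbaz123

// The anchored original only matches when the whole string is already clean,
// so it deletes valid strings and leaves dirty ones untouched.
const anchored = /^[a-zA-Z0-9-_]+$/ig;
console.log("abc".replace(anchored, ""));  // (empty string)
console.log("abc!".replace(anchored, "")); // abc!
```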

Accented characters and regular expression

I have this regexp:
(\b)(emozioni|gioia|felicità)(\b)
In a string like the one below:
emozioni emozioniamo felicità felicitàs
It should match the first and the third word. Instead, it matches the first and the last. I assume this is because of the accented character. I tried this alternative:
(\b)(emozioni|gioia|felicità\s)(\b)
but it matches "felicità" only if another word follows it. To be specific, it matches only in this context:
emozioni emozioniamo felicità felicitàs
and not in this other:
emozioni emozioniamo felicitàs felicità
I've found an article about accented characters in French (where they occur at the beginning of the word) here, and I followed the second answer. If anyone knows a better solution, it would be very welcome.

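A word boundary \b only recognizes characters in the \w character class, i.e. [0-9a-zA-Z_]. It matches between a \w character and a non-\w character, and an accented character like à counts as non-\w. So there is no boundary between à and a following space, but there is one between à and a following s, which is why felicitàs matches and felicità does not.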
You can solve the problem in your case using a lookahead:
felicità(?=\s|$)
or shorter:
felicità(?!\S)
(or use \W in place of \s, as Sniffer suggested, but then you risk matching something like felicitàà)

Try the following alternative:
\b(emozioni|gioia|felicità)(?=\W|$)
This will match any of your listed words, as long as any of those words is followed by either a non-word character \W or end-of-string $.
Regex101 Demo
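Here is a minimal sketch comparing the original trailing \b with the lookahead (?!\S), using the two sample strings from the question. It assumes the à in the source is the precomposed character U+00E0.

```javascript
// Original: a trailing \b fails after "à" because "à" is not a \w character.
const withBoundary = /\b(emozioni|gioia|felicità)\b/;

// Fix: require that the word is followed by whitespace or the end of the string.
const withLookahead = /\b(emozioni|gioia|felicità)(?!\S)/g;

console.log(withBoundary.test("felicità "));  // false (no boundary between "à" and " ")
console.log(withBoundary.test("felicitàs"));  // true  (boundary between "à" and "s")

console.log("emozioni emozioniamo felicità felicitàs".match(withLookahead));
// ["emozioni", "felicità"]
console.log("emozioni emozioniamo felicitàs felicità".match(withLookahead));
// ["emozioni", "felicità"]
console.log("felicitàs".match(withLookahead));
// null
```

The leading \b still works here because every listed word starts with an ASCII letter; a word starting with an accented character would need a similar lookaround at the front.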

Regex to match words unless they have a character before them

I have this regex that detects hashtags. It shouldn't match things with letters before them, so we've got a space character at the beginning of the regex:
/( #[a-zA-Z_]+)/gm
The issue is it no longer matches words at the beginning of sentences. How can I modify this regex so that instead of matching with spaces, it simply DOESN'T match things with letters before them.
Thanks!

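A leading \b will not work here: # is not a word character, so \b# only matches when the # follows a letter, digit, or underscore, which is the opposite of what you want.
Instead, check for the start of the string or whitespace before the hashtag: (?:^|\s)(\#[a-zA-Z_]+)
The # is escaped so that it is not read as a comment in flavors with a free-spacing (x) mode. JavaScript has no such mode, so the escape is optional there.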

Without lookbehind:
pattern = /(?:^|[^a-zA-Z])#[a-zA-Z]+/
With lookbehind (supported in JavaScript since ES2018, but not in older engines):
pattern = "(?:^|(?<![a-zA-Z]))#[a-zA-Z]+"
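Both approaches can be sketched as follows. The sample sentence and the function names are invented for illustration. The first function needs an engine with ES2018 lookbehind support. The second avoids lookbehind by capturing the hashtag in a group, because the match itself also includes the preceding character.

```javascript
const sample = "#start of text, mid #tag and no#match";

// With lookbehind: the character before "#" must not be a letter.
function findTagsLookbehind(input) {
  return input.match(/(?<![a-zA-Z])#[a-zA-Z_]+/g) || [];
}

// Without lookbehind: consume the preceding non-letter (or start of string)
// and read the hashtag from capture group 1.
function findTagsCapture(input) {
  const pattern = /(?:^|[^a-zA-Z])(#[a-zA-Z_]+)/g;
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(findTagsLookbehind(sample)); // ["#start", "#tag"]
console.log(findTagsCapture(sample));    // ["#start", "#tag"]
```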
