Regex including all special characters except space - javascript

I have a regex that matches all the special characters except space, but it looks weird and is too long.
const specialCharsRegex = new RegExp(/#|#|\$|!|%|&|\^|\*|-|\+|_|=|{|}|\[|\]|\(|\)|~|`|\.|\?|\<|\>|,|\/|:|;|"|'|\\/);
If I use \W instead, it also matches the space.
Is there any way I can achieve this?

Well you could use:
[^\w ]
This matches any character that is neither a word character nor a space. To keep further characters out of the match, add them to the character class above.
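As a quick check of the difference between \W and [^\w ], here is a minimal sketch. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample: letters, a space, an underscore and two symbols.
const sample = "a b_c#d!";

// \W matches every non-word character, including the space.
const nonWord = sample.match(/\W/g);

// [^\w ] excludes the space as well, so only the symbols remain.
const nonWordNoSpace = sample.match(/[^\w ]/g);

console.log(nonWord);        // [ ' ', '#', '!' ]
console.log(nonWordNoSpace); // [ '#', '!' ]
```

Note that the underscore is not matched by either pattern, because it counts as a word character.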

To match anything that is neither a word character nor a whitespace character (CR, LF, FF, space, tab):
const specialCharsRegex = new RegExp(/[^\w\s]+|_+/, 'g');
See this demo at regex101 or a JS replace demo at tio.run (the g flag is used to match all occurrences).
The underscore belongs to word characters [A-Za-z0-9_] and needs to be matched separately.
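A small sketch of how this pattern behaves in a replace call. The input string is invented for the example, and the second regex (without the g flag) is an assumption about how one might test for a single special character.

```javascript
// Same pattern as above, written as a literal with the g flag.
const specialCharsRegex = /[^\w\s]+|_+/g;

// Hypothetical input containing punctuation, an underscore and a tab.
const input = "Hello, wor_ld! 100% (ok)\tyes";

// Remove every run of special characters; whitespace is left alone.
const cleaned = input.replace(specialCharsRegex, "");
console.log(JSON.stringify(cleaned)); // "Hello world 100 ok\tyes"

// For a yes/no check, a regex without the g flag avoids lastIndex state.
const hasSpecial = /[^\w\s]|_/;
console.log(hasSpecial.test("abc def")); // false
console.log(hasSpecial.test("abc_def")); // true
```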

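Try a negated character class listing the ranges a-z, A-Z and 0-9 (shown here in Java syntax; note that it also matches a space unless you add one to the class):
Pattern regex = Pattern.compile("[^A-Za-z0-9]");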

Related

Regex catch from the hash sign "#" to the next white space

I have a script line like this:
#type1 this is the text of the note
I've tried this, but it didn't work out for me:
^\#([^\s]+)
I want to catch type; in other words, I want to get what's between the hash sign "#" and the next white space, excluding the hash sign itself. The string that I want to catch is alphanumeric.
With the regex functionality provided by Javascript:
exec_result = /#(\w*)/.exec('#whatever string comes here');
I believe exec_result[1] should be the string you want.
The return value of exec() method could be found over here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
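To make the shape of the exec() return value concrete, here is a minimal sketch using the same pattern. The variable names are illustrative only.

```javascript
// exec() returns an array-like match object, or null when nothing matches.
const execResult = /#(\w*)/.exec("#whatever string comes here");

console.log(execResult[0]);    // "#whatever"  (the whole match)
console.log(execResult[1]);    // "whatever"   (first capturing group)
console.log(execResult.index); // 0            (where the match starts)

// No "#" in the input, so there is no match at all.
const noMatch = /#(\w*)/.exec("no hash here");
console.log(noMatch); // null
```

Because the result can be null, it is worth checking it before reading index 1.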
You're really close:
/^\#(\w+)\s/
The \w matches any letters or numbers (and underscores too). And the space should be outside the matching group since I guess you don't want to capture it.
To get an alphanumeric match (which will get you type1), instead of the negated character class [^\s] which matches not a whitespace character, you could use a character class and specify what you want to match like [A-Za-z0-9].
Then use a negative lookahead to assert what is on the right is not a non-whitespace char:
^#([A-Za-z0-9]+)(?!\S)
Your match is in the first capturing group. Note that you don't have to escape the #.
For example using the case insensitive flag /i
const regex = /^#([A-Za-z0-9]+)(?!\S)/i;
const str = `#type1 this is the text of the note`;
console.log(str.match(regex)[1]);
If you only want to match type, you might use:
^#([a-z]+)[a-z0-9]*(?!\S)
const regex = /^#([a-z]+)[a-z0-9]*(?!\S)/i;
const str = `#type1 this is the text of the note`;
console.log(str.match(regex)[1]);
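The difference between requiring a trailing \s and using the (?!\S) lookahead shows up at the end of the string and before punctuation. The sketch below compares both; the helper firstGroup and the test strings are hypothetical.

```javascript
const withLookahead = /^#([A-Za-z0-9]+)(?!\S)/;
const withTrailingSpace = /^#(\w+)\s/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function firstGroup(regex, str) {
  const m = str.match(regex);
  return m ? m[1] : null;
}

const results = {
  // Followed by a space: both approaches find the tag.
  lookaheadSpace: firstGroup(withLookahead, "#type1 this is the text"),
  // Tag at the very end of the string: the lookahead still succeeds...
  lookaheadEnd: firstGroup(withLookahead, "#type1"),
  // ...but a mandatory \s has nothing to consume, so there is no match.
  trailingSpaceEnd: firstGroup(withTrailingSpace, "#type1"),
  // A non-whitespace character right after the tag makes the lookahead fail.
  lookaheadPunct: firstGroup(withLookahead, "#type1! note"),
};

console.log(results);
// { lookaheadSpace: 'type1', lookaheadEnd: 'type1',
//   trailingSpaceEnd: null, lookaheadPunct: null }
```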
I've figured it out.
/^\#([^\s]+)+(.*)$/

linkify words using regex

I'm trying to linkify hashtags using regex. Most of the cases work, except when a hashtag is followed by a dot, as in #hot.: this should linkify only #hot, but at the same time #hot.hot is valid.
Here is my regex code:
var text = "#hot#hot hot #hot #hot.hot #hót #hot_hot #hot, (#hot) #hot. hot";
text.replace(/#([^\b#,() ]*)/g, '#$1');
output:
#hot#hot hot #hot #hot.hot #hót #hot_hot #hot, (#hot) #hot. hot
The only issue is that #hot. should linkify only #hot, while #hot.hot remains valid.
Your regex is fine, but you have to add a word boundary at the end:
#([^\b#,() ]*)\b
              ^-------- Here
Give this regex a try instead:
/#([^\W]+)/g
\w matches only letters, numbers, and underscores. So its opposite, \W, matches everything that's not a letter, number, or underscore. Put that \W in a negated character class ([^\W]) and you get the equivalent of \w. Note that in JavaScript \w covers only ASCII [A-Za-z0-9_], so accented characters such as ó are not matched by it.
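For comparison, here is one possible sketch that keeps inner dots (#hot.hot), drops a trailing dot (#hot.) and accepts accented letters. It is not taken from the answers above: it assumes an engine with Unicode property escapes (the u flag with \p{L} and \p{N}), and the /tags/ URL and anchor markup are hypothetical.

```javascript
// A tag is one or more letters/digits/underscores, optionally followed by
// ".segment" groups. A dot is consumed only if more tag characters follow it.
const tagPattern = /#([\p{L}\p{N}_]+(?:\.[\p{L}\p{N}_]+)*)/gu;

// \u00f3 is the precomposed letter "ó" from the question's sample text.
const text = "#hot#hot hot #hot #hot.hot #h\u00f3t #hot_hot #hot, (#hot) #hot. hot";

const tags = [...text.matchAll(tagPattern)].map((m) => m[1]);
console.log(tags);

// Hypothetical link format, for illustration only.
const linkify = (s) => s.replace(tagPattern, '<a href="/tags/$1">#$1</a>');
console.log(linkify("#hot. hot")); // <a href="/tags/hot">#hot</a>. hot

// Plain \w stops at the accented letter because it is ASCII-only.
const asciiOnly = /#(\w+)/.exec("#h\u00f3t")[1];
console.log(asciiOnly); // "h"
```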

Regex-Groups in Javascript

I have a problem using a JavaScript regexp.
This is a very simplified regexp that demonstrates my problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (everything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if a digit [0-9] follows it.
A + must not appear within words (44+44 is not a valid match, but +4ad is).
In order to allow the + only at the beginning, I required a preceding whitespace character. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are 2 issues with that regexp:
If I use it in my JS code, the space is always part of the match.
If +42 appears at the beginning of a line, it won't be matched.
My questions:
What should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
What should the regex look like?
You've got multiple choices to reach your goal:
It's fine as you have it. You might allow the string beginning in place of the whitespace as well, though. Just read the capturing groups (groups 1 and 2 of each match, for example via exec() in a loop) instead of the whole match; they will not include the whitespace. Note that match() with the g flag returns only whole matches, not groups.
A lookbehind could help. Unfortunately it was not supported in JavaScript when this was written; it was added to the language in ES2018.
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w*|\w+/
Why does this regex add the space to the matches?
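Because the regex does match the whitespace: the overall match, which is what match() returns with the g flag, contains everything the pattern consumed. The non-capturing group (?:\s(\+\d\w*)) only keeps the \s out of the captured groups.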
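The options above can be sketched as follows. This assumes String.prototype.matchAll is available (ES2020); the variable names are illustrative, and the third pattern uses \w* after the digit, as in the question's regex.

```javascript
const input = "+5ad6 +5ad6 sd asd+as +we";

// The question's regex: whole matches include the consumed whitespace,
// and the leading "+" at position 0 is skipped because no \s precedes it.
const original = input.match(/(?:\s(\+\d\w*))|(\w+)/g);
console.log(original); // [ '5ad6', ' +5ad6', 'sd', 'asd', 'as', 'we' ]

// Option 1: also allow start-of-string, then read the groups per match.
const grouped = /(?:^|\s)(\+\d\w*)|(\w+)/g;
const fromGroups = [...input.matchAll(grouped)].map((m) => m[1] || m[2]);
console.log(fromGroups); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]

// Option 2: \B before "+" fails when a word character precedes it,
// so nothing extra is consumed and the whole match is already the token.
const fromNonBoundary = input.match(/\B\+\d\w*|\w+/g);
console.log(fromNonBoundary); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```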

Regex to match words unless they have a character before them

I have this regex that detects hashtags. It shouldn't match things with letters before them, so we've got a space character at the beginning of the regex:
/( #[a-zA-Z_]+)/gm
The issue is it no longer matches words at the beginning of sentences. How can I modify this regex so that instead of matching with spaces, it simply DOESN'T match things with letters before them.
Thanks!
Use \b at the start to indicate a word boundary.
\b won't work, since # isn't a word starter.
Just check for the start of the string or a space before: (?:^|\s)(\#[a-zA-Z_]+)
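Also, in regex flavors with a free-spacing (extended) mode, make sure you escape the #, so it doesn't get interpreted as a comment. JavaScript has no such mode, so the escape is optional there.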
Without lookbehind:
pattern = /(?:^|[^a-zA-Z])#[a-zA-Z]+/
With lookbehind (not available in JavaScript before ES2018):
pattern = "(?:^|(?<![a-zA-Z]))#[a-zA-Z]+"
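Both variants can be compared side by side. This is a sketch under two assumptions: the sample sentence is invented, and the lookbehind version needs an engine that implements ES2018 lookbehind assertions. The character class includes the underscore, as in the question's pattern.

```javascript
// Hypothetical sample: a tag at the start, one after a space,
// and one glued to a preceding letter (which should be skipped).
const text = "#start of line, mid #tag and no#match here";

// Without lookbehind: the character before "#" is consumed,
// so the hashtag itself is read from capture group 1.
const withGroup = /(?:^|[^a-zA-Z])(#[a-zA-Z_]+)/g;
const viaGroup = [...text.matchAll(withGroup)].map((m) => m[1]);

// With a negative lookbehind: nothing before "#" is consumed,
// so the whole match is already the hashtag.
const withLookbehind = /(?<![a-zA-Z])#[a-zA-Z_]+/g;
const viaLookbehind = text.match(withLookbehind);

console.log(viaGroup);      // [ '#start', '#tag' ]
console.log(viaLookbehind); // [ '#start', '#tag' ]
```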

regex and javascript

using http://www.regular-expressions.info/javascriptexample.html I tested the following regex
^\\{1}([0-9])+
this is designed to match a backslash and then a number.
It works there
If I then try this directly in code
var reg = /^\\{1}([0-9])+/;
reg.exec("/123")
I get no matches!
What am I doing wrong?
Update:
Regarding the update to your question: if the input starts with a forward slash, the regex has to be:
var reg = /^\/(\d+)/;
You have to escape the slash inside the regex with \/.
The backslash needs to be escaped in the string too:
reg.exec("\\123")
Otherwise \1 will be treated as an escape sequence inside the string literal.
Btw, the regular expression can be simplified:
var reg = /^\\(\d+)/;
Note that I moved the quantifier + inside the capture group, otherwise it will only capture a single digit (namely 3) and not the whole number 123.
You need to escape the backslash in your string:
"\\123"
Also, for various implementation bugs, you may want to set reg.lastIndex = 0;.
In addition, {1} is completely redundant, you can simplify your regex to /^\\(\d)+/.
One last note: (\d)+ will only capture the last digit, you may want (\d+).
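The points from these answers can be checked together in a short sketch. The variable names are illustrative only.

```javascript
// "\\123" is the four-character string: backslash, 1, 2, 3.
const input = "\\123";
console.log(input.length); // 4

// Quantifier inside the group: the whole number is captured.
const whole = /^\\(\d+)/.exec(input);
console.log(whole[1]); // "123"

// Quantifier outside the group: only the last repetition is kept.
const lastDigit = /^\\(\d)+/.exec(input);
console.log(lastDigit[1]); // "3"

// The question's regex expects a backslash, so a forward slash never matches.
const questionResult = /^\\{1}([0-9])+/.exec("/123");
console.log(questionResult); // null

// For a leading forward slash, escape it inside the regex literal instead.
const slash = /^\/(\d+)/.exec("/123");
console.log(slash[1]); // "123"
```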
