I want a regex matching a specific word that is not surrounded by any alphanumeric character. My thought was to include a negation before and after:
[^a-zA-Z\d]myspecificword[^a-zA-Z\d]
So it would match:
myspecificword
_myspecificword_
-myspecificword
And not match:
notmyspecificword
myspecificword123
But this simple regex won't match the word by itself unless it is preceded by whitespace:
myspecificword // no match
myspecificword // match
Using the flags "gmi" and testing with JavaScript. What am I doing wrong? Shouldn't it be as simple as that?
https://regex101.com/r/BCkbVQ/3
Try using:
(?<![^\s_-])myspecificword(?![^\s_-])
This says to match myspecificword when it is surrounded, on both sides, by either the start/end of the input, whitespace, an underscore, or a dash.
Demo
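It is not whitespace that is required but any character that matches [^a-zA-Z\d]. A character class always consumes one character, so it cannot match at the very start or end of the input.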
You should use: (Demo)
(?:^|[^a-zA-Z\d])myspecificword(?:[^a-zA-Z\d]|$)
The main benefit is support across all Regexp parsers.
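To make the difference concrete, here is a minimal sketch that runs the original attempt and the anchored alternation over the samples from the question. The variable names are illustrative and not taken from the posts.

```javascript
// Original attempt: needs one non-alphanumeric character on BOTH sides.
const originalAttempt = /[^a-zA-Z\d]myspecificword[^a-zA-Z\d]/i;
// Anchored alternation: the start or end of the input is accepted as well.
const anchoredAttempt = /(?:^|[^a-zA-Z\d])myspecificword(?:[^a-zA-Z\d]|$)/i;

const samples = [
  "myspecificword",
  "_myspecificword_",
  "-myspecificword",
  "notmyspecificword",
  "myspecificword123",
];

// One row per sample, recording whether each pattern finds a match.
const comparison = samples.map((s) => ({
  input: s,
  original: originalAttempt.test(s),
  anchored: anchoredAttempt.test(s),
}));
console.table(comparison);
```

Unlike the lookaround versions, both of these patterns include the neighbouring character in the match, so wrap the word in a capturing group if only the word itself is needed.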
If you truly mean "not surrounded by alphanumerics", with _ not counted as alphanumeric (and in your attempted regex you seem to be willing to match anything that isn't a letter or digit), then any of the following should be acceptable:
'myspecificword'
'_myspecificword_'
' myspecificword '
'-myspecificword-'
'(myspecificword)'
And the regex should be:
(?<![^_\W])myspecificword(?![^_\W])
let tests = ['myspecificword',
'_myspecificword_',
' myspecificword ',
'-myspecificword-',
'(myspecificword)',
'amyspecificword',
'1myspecificword'
];
let regex = /(?<![^_\W])myspecificword(?![^_\W])/;
for (let test of tests) {
console.log(regex.test(test));
}
The "accepted" answer will not match (myspecificword), for example.
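A small side-by-side check of the two lookaround patterns, as a sketch (the variable names are made up for illustration; lookbehind needs a reasonably recent JavaScript engine):

```javascript
// Allows only start/end of input, whitespace, underscore, or dash as neighbours.
const separatorList = /(?<![^\s_-])myspecificword(?![^\s_-])/;
// Rejects only letters and digits as neighbours ([^_\W] is \w without the underscore).
const alnumOnly = /(?<![^_\W])myspecificword(?![^_\W])/;

const inputs = [
  "myspecificword",
  "_myspecificword_",
  "(myspecificword)",
  "1myspecificword",
];

const separatorListResults = inputs.map((s) => separatorList.test(s));
const alnumOnlyResults = inputs.map((s) => alnumOnly.test(s));

console.log(separatorListResults); // [true, true, false, false]
console.log(alnumOnlyResults);     // [true, true, true, false]
```

The only disagreement is the parenthesised input: "(" is neither whitespace, underscore, nor dash, so the first pattern rejects it, while the second accepts any neighbour that is not a letter or digit.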
The title of this question is
Regex for word not surrounded by alphanumeric characters
The other answers have all addressed a different question (which may well be the one intended):
Regex for word neither preceded nor followed by alphanumeric characters
I will refer to these statements as #1 and #2 respectively.
If the specified word were 'cat' and the string were '9cat', 'cat' is not surrounded by alphanumeric characters in the string, so there is a match with #1, but not with #2.
For #1, one could use the regex:
/cat(?!\p{Alnum})|(?<!\p{Alnum})cat/
("match 'cat' not followed by a Unicode alphanumeric character or 'cat' not preceded by a Unicode alphanumeric character"), though it's easier to test for the negation:
/(?<=\p{Alnum})cat(?=\p{Alnum})/
The test passes if the string does not match this regex.
With interpretation #2, the regex is:
/(?<!\p{Alnum})cat(?!\p{Alnum})/
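The patterns in this answer use POSIX-style property names. As far as I know JavaScript does not accept \p{Alnum}, so the sketch below assumes that [\p{L}\p{N}] (Unicode letters and numbers, with the u flag) is an acceptable stand-in for "alphanumeric". The helper names are hypothetical.

```javascript
// Assumption: letters plus numbers approximate "alphanumeric" in JavaScript.
const ALNUM = "[\\p{L}\\p{N}]";
const word = "cat";

// Interpretation #2: neither preceded nor followed by an alphanumeric character.
const neitherSide = new RegExp(`(?<!${ALNUM})${word}(?!${ALNUM})`, "u");
// Negation of interpretation #1: preceded AND followed by an alphanumeric character.
const surrounded = new RegExp(`(?<=${ALNUM})${word}(?=${ALNUM})`, "u");

// "\u00e9cat" is "écat", written with an escape so the file encoding does not matter.
const report = ["cat", "9cat", "scatter", "\u00e9cat"].map((s) => ({
  input: s,
  passes1: s.includes(word) && !surrounded.test(s),
  passes2: neitherSide.test(s),
}));
console.table(report);
```

For '9cat' this reports a pass for #1 and a fail for #2, which is exactly the distinction the answer draws. The negation test treats the string as a whole, so it is only reliable when the word occurs once.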
I think this will work:
/[^a-z0-9]?myspecificword[^a-z0-9]?/i
Related
I want to split a string with regex matching spaces, commas, question marks, and exclamation points. But I'd like to include the matched punctuation in the resulting array. (Spaces should be discarded.) For example:
Regex irritates me, I can't take it!
the string above should split() to:
["Regex", "irritates", "me", ",", "I", "can't", "take", "it", "!"]
I'm starting easy with just spaces and commas for now; I have the following code:
inputStr.split(/\s|(,)/);
Unfortunately, it gives me undefined items - I'm doing it wrong. I spent a couple hours researching (as usual) and coming up empty. I read about "lookahead" but can't figure it out either. Can any regex gurus give me a hand?
Try using String.prototype.match() with RegExp /(\w+'\w+)|\w+|,|\!/g
(\w+'\w+) is a capturing group: it matches one or more word characters, followed by an apostrophe, followed by one or more word characters (for example, can't), and remembers the match.
\w matches any alphanumeric character from the basic Latin alphabet, including the underscore; it is equivalent to [A-Za-z0-9_]. + matches the preceding item 1 or more times; it is equivalent to {1,}.
\w+ matches a run of one or more such word characters.
, Matches comma
\! Matches exclamation mark
See RegExp
var str = "Regex irritates me, I can't take it!";
var res = str.match(/(\w+'\w+)|\w+|,|\!/g);
console.log(res)
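Since the question also mentions question marks, and asks why split() produced undefined items, here is a hedged sketch of both routes. With split(), a capturing group that does not take part in a match (here, when the separator is whitespace) contributes undefined to the result, which is where the stray items come from. The variable names are illustrative.

```javascript
const sentence = "Regex irritates me, I can't take it!";

// match(): a word (optionally with inner apostrophes) or one punctuation mark.
const viaMatch = sentence.match(/\w+(?:'\w+)*|[,?!]/g);

// split(): the capture group keeps punctuation in the output. Whitespace
// separators leave undefined or empty entries, which filter(Boolean) drops.
const viaSplit = sentence.split(/\s+|([,?!])/).filter(Boolean);

console.log(viaMatch);
console.log(viaSplit);
// both: ["Regex", "irritates", "me", ",", "I", "can't", "take", "it", "!"]
```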
This should work
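String pat = "Regex irritates me, I can't take it!";
String[] parts = pat.split("\\s"); // Java: splits on whitespace only, so punctuation stays attached to the words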
The regex string is ([\w'!,]*)\S
Explanation:
the ( ) form a capturing group.
the [\w'!,]* matches any run of word characters, apostrophes, exclamation marks, or commas.
the \S matches a single non-whitespace character, so spaces are never part of a match.
Try it in regexpal.com
I want to strip everything except alphanumeric and hyphens.
So far I've got this, but it's not working:
String = String.replace(/^[a-zA-Z0-9-_]+$/ig,'');
Any help appreciated.
If you want to remove everything except alphanumerics, hyphens, and underscores, then negate the character class, like this
String = String.replace(/[^a-zA-Z0-9-_]+/ig,'');
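Also, the ^ and $ anchors should not be there: with them, the pattern only matches when the whole string consists of the listed characters.
Apart from that, you have already covered both uppercase and lowercase characters in the character class itself, so the i flag is not needed. So, the RegEx becomes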
String = String.replace(/[^a-zA-Z0-9-_]+/g,'');
There is a special character class, which matches a-zA-Z0-9_, \w. You can make use of it like this
String = String.replace(/[^\w-]+/g,'');
Since \w doesn't cover -, we included that separately.
Quoting from MDN RegExp documentation,
\w
Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
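Putting the pieces together, a minimal sketch; stripDisallowed is a hypothetical helper name and the sample strings are made up:

```javascript
// Remove every run of characters that is not a word character or a hyphen.
function stripDisallowed(input) {
  return input.replace(/[^\w-]+/g, "");
}

const cleaned = stripDisallowed("Hello, World! foo-bar_baz 42");
console.log(cleaned); // "HelloWorldfoo-bar_baz42"

// The anchored, non-negated pattern from the question only matches when the
// WHOLE string consists of allowed characters, so this input comes back unchanged.
const anchoredResult = "Hello, World!".replace(/^[a-zA-Z0-9-_]+$/gi, "");
console.log(anchoredResult); // "Hello, World!"
```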
Now, I know I can check whether a string contains a particular substring using this:
if(str.indexOf("substr") > -1){
}
Having my substring 'GRE' I want to match for my autocomplete list:
GRE:Math
GRE-Math
But I don't want to match:
CONGRES
and I particularly need to match:
NON-WORD-CHARGRENON-WORD-CHAR
and also
GRE
What should be the perfect regex in my case?
Maybe you want to use \b word boundaries:
(\bGRE\b)
Here is the explanation
Demo: http://regex101.com/r/hJ3vL6
MD, if I understood your spec, this simple regex should work for you:
\W?GRE(?!\w)(?:\W\w+)?
But I would prefer something like [:-]?GRE(?!\w)(?:[:-]\w+)? if you are able to specify which non-word characters you are willing to allow (see explanation below).
This will match
GRE
GRE:Math
GRE-Math
but not CONGRES
Ideally though, I would like to replace the \W (non-word character) with a list of allowable characters, for instance [-:] Why? Because \W would match non-word characters you do not want, such as spaces and carriage returns. So what goes in that list is for you to decide.
How does this work?
\W? optionally matches one single non-word character as you specified. Then we match the literal GRE. Then the lookahead (?!\w) asserts that the next character cannot be a word character. Then, optionally, we match a non-word character (as per your spec) followed by any number of word characters.
Depending on where you see this appearing, you could add boundaries.
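A quick sketch of the variant with an explicit separator list, assuming ":" and "-" are the only separators to allow (that choice is the reader's, as noted above):

```javascript
// Optional separator, the literal GRE, no word character right after it,
// then optionally a separator followed by more word characters.
const grePattern = /[:-]?GRE(?!\w)(?:[:-]\w+)?/;

const greMatches = ["GRE", "GRE:Math", "GRE-Math", "CONGRES", "CONGRE"].map((s) => {
  const m = s.match(grePattern);
  return m ? m[0] : null; // matched text, or null when there is no match
});

console.log(greMatches); // ["GRE", "GRE:Math", "GRE-Math", null, "GRE"]
```

Because the leading separator is optional, an input such as CONGRE still matches; putting a \b in front of GRE is one way to add the boundary mentioned above.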
You can use the regex: /\bGRE\b/g
if(/\bGRE\b/g.test("string")){
// the string has GRE
}
else{
// it doesn't have GRE
}
\b means word boundary
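As a sketch, here is the word-boundary test applied to an autocomplete-style list (the entries are examples, not from a real data set). It also shows one caveat: a regex object with the g flag keeps state between test() calls, so when the regex is stored and reused it is usually safer to omit g.

```javascript
const autocomplete = ["GRE:Math", "GRE-Math", "CONGRES", "GRE", "NON-GRE-WORD"];

// Keep only the entries where GRE stands alone as a word.
const hits = autocomplete.filter((entry) => /\bGRE\b/.test(entry));
console.log(hits); // ["GRE:Math", "GRE-Math", "GRE", "NON-GRE-WORD"]

// With the g flag, test() resumes from lastIndex on the same regex object.
const globalRe = /\bGRE\b/g;
const firstCall = globalRe.test("GRE");  // true, lastIndex is now 3
const secondCall = globalRe.test("GRE"); // false, the search started at index 3
console.log(firstCall, secondCall);
```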
I have this regexp:
(\b)(emozioni|gioia|felicità)(\b)
In a string like the one below:
emozioni emozioniamo felicità felicitàs
it should match the first and the third word. Instead it matches the first and the last. I assume it is because of the accented character. I tried this alternative:
(\b)(emozioni|gioia|felicità\s)(\b)
but it matched "felicità" only if there is another word after it. To be specific, it matches only in this context:
emozioni emozioniamo felicità felicitàs
and not in this other:
emozioni emozioniamo felicitàs felicità
I've found an article about accented characters in French (so at the beginning of the word) here; I followed the second answer. If anyone knows a better solution, it is very welcome.
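A word boundary \b only works with characters that are in the \w character class, i.e. [0-9a-zA-Z_]. An accented character like à counts as a non-word character, so there is no \b between à and a following space, while there is one between à and a following letter such as s.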
You can solve the problem in your case using a lookahead:
felicità(?=\s|$)
or shorter:
felicità(?!\S)
(or \W in place of \s, as suggested by @Sniffer, but then you risk matching something like felicitàà)
Try the following alternative:
\b(emozioni|gioia|felicità)(?=\W|$)
This will match any of your listed words, as long as any of those words is followed by either a non-word character \W or end-of-string $.
Regex101 Demo
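To check where each approach actually matches, here is a sketch in JavaScript that records the start index of every match in the sample string. The third pattern is an assumption of mine rather than something from the answers: it uses Unicode property escapes (u flag) so that accented letters count as word characters on both sides.

```javascript
// "\u00e0" is "à" (precomposed), escaped so the example does not depend on file encoding.
const text = "emozioni emozioniamo felicit\u00e0 felicit\u00e0s";

// Helper (illustrative): start index of every match of a global regex.
const startsOf = (re) => [...text.matchAll(re)].map((m) => m.index);

// Plain \b on both sides: picks "felicità" inside "felicitàs" (index 30).
const withWordBoundary = startsOf(/\b(?:emozioni|gioia|felicit\u00e0)\b/g);
// Lookahead for "no non-space character next": picks the standalone word (index 21).
const withLookahead = startsOf(/\b(?:emozioni|gioia|felicit\u00e0)(?!\S)/g);
// Unicode-aware boundaries on both sides (assumption, needs the u flag).
const unicodeAware = startsOf(
  /(?<![\p{L}\p{N}_])(?:emozioni|gioia|felicit\u00e0)(?![\p{L}\p{N}_])/gu
);

console.log(withWordBoundary); // [0, 30]
console.log(withLookahead);    // [0, 21]
console.log(unicodeAware);     // [0, 21]
```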
Hi I have this regex.
/^[\w]|[åäöæøÅÄÖÆØ]$/
"tå" is ok but "åå" is not. Why is that? How can I make it accept words starting with åäöæøÅÄÖÆØ?
Note that the \w (and \W, \b, and \B) are English-centric. \w just means [A-Za-z0-9_], where A-Z means only the 26 English letters. Other letters are not considered part of a "word" by JavaScript's built-in character classes.
You'll need to build a character class including all of the letters you want to treat as word characters (then use the negated version of that wherever you need a "non-word character").
But that's not the only problem. Your regular expression says:
Match one English word character at the beginning of the string, or match one of this list of characters at the end of the string.
The | operator has very low precedence: in this case it treats ^[\w] and [åäöæøÅÄÖÆØ]$ as the two alternatives. I don't get the impression that's what you wanted.
"tå" is ok but "åå" is not.
I guess it depends on what you mean by "ok". Both match the expression:
console.log("tå".match(/^[\w]|[åäöæøÅÄÖÆØ]$/)); // ["t", index: 0, input: "tå"]
console.log("åå".match(/^[\w]|[åäöæøÅÄÖÆØ]$/)); // ["å", index: 1, input: "åå"]
"tå" matches because it matches the ^[\w] alternative. "åå" matches because it matches the [åäöæøÅÄÖÆØ]$ alternative.
How can I make it accept words starting with åäöæøÅÄÖÆØ?
If the goal is to accept only strings containing exactly one word, where "word" includes digits and the underscore (since \w does), then:
/^[A-Za-z0-9_åäöæøÅÄÖÆØ]+$/
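A short sketch contrasting the two expressions (the variable names and the extra sample inputs are illustrative):

```javascript
// Original: "one word character at the start" OR "one listed letter at the end".
const looseAlternation = /^[\w]|[åäöæøÅÄÖÆØ]$/;
// Whole-string version: every character must be a word character or a listed letter.
const wholeWord = /^[A-Za-z0-9_åäöæøÅÄÖÆØ]+$/;

const candidates = ["tå", "åå", "t!!!", "!å!"];

const looseResults = candidates.map((s) => looseAlternation.test(s));
const wholeWordResults = candidates.map((s) => wholeWord.test(s));

console.log(looseResults);     // [true, true, true, false]
console.log(wholeWordResults); // [true, true, false, false]
```

The "t!!!" case shows the effect of the alternation: the original expression accepts it simply because the string starts with a word character.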
Why do you think it fails? I would not put the \w in square brackets but various systems seem to allow that and both the following match the text being tested.
Javascript
var test = 'åå';
if (test.match(/^[\w]|[åäöæøÅÄÖÆØ]$/)) { alert("Match"); }
PHP
echo(preg_match("/^[\w]|[åäöæøÅÄÖÆØ]$/","åå")."<br>");
What are you trying to achieve here?