I know I can check whether a string contains a particular substring using this:
if(str.indexOf("substr") > -1){
}
Having my substring 'GRE' I want to match for my autocomplete list:
GRE:Math
GRE-Math
But I don't want to match:
CONGRES
and I particularly need to match:
NON-WORD-CHARGRENON-WORD-CHAR
and also
GRE
What should be the perfect regex in my case?
Maybe you want to use \b word boundaries:
(\bGRE\b)
Here is the explanation
Demo: http://regex101.com/r/hJ3vL6
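As a quick check of the word-boundary approach, here is a minimal sketch that filters a made-up autocomplete list. The list contents and the matchesGre helper are illustrative assumptions, not part of the question.

```javascript
// Hypothetical autocomplete entries, based on the examples in the question.
const entries = ["GRE", "GRE:Math", "GRE-Math", "CONGRES", "(GRE)"];

// \b matches between a word character and a non-word character (or a string edge),
// so GRE must not be glued to other letters, digits, or underscores.
const grePattern = /\bGRE\b/;

// Hypothetical helper: true when the entry contains GRE as a whole word.
const matchesGre = (entry) => grePattern.test(entry);

const hits = entries.filter(matchesGre);
console.log(hits); // ["GRE", "GRE:Math", "GRE-Math", "(GRE)"]
```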
MD, if I understood your spec, this simple regex should work for you:
\W?GRE(?!\w)(?:\W\w+)?
But I would prefer something like [:-]?GRE(?!\w)(?:[:-]\w+)? if you are able to specify which non-word characters you are willing to allow (see explanation below).
This will match
GRE
GRE:Math
GRE-Math
but not CONGRES
Ideally though, I would like to replace the \W (non-word character) with a list of allowable characters, for instance [-:] Why? Because \W would match non-word characters you do not want, such as spaces and carriage returns. So what goes in that list is for you to decide.
How does this work?
\W? optionally matches one single non-word character as you specified. Then we match the literal GRE. Then the lookahead (?!\w) asserts that the next character cannot be a word character. Then, optionally, we match a non-word character (as per your spec) followed by any number of word characters.
Depending on where you see this appearing, you could add boundaries.
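To see what the restricted version returns, here is a small sketch run against a few sample strings. The sample list is made up, and allowing only ':' and '-' as separators is an assumption you would adjust.

```javascript
// Only ':' or '-' are allowed as separators (an assumption; change the class as needed).
const greWithSuffix = /[:-]?GRE(?!\w)(?:[:-]\w+)?/;

// Hypothetical inputs, partly taken from the question.
const samples = ["GRE", "GRE:Math", "GRE-Math", "CONGRES", "GRE Math"];

// Collect the matched text for each sample, or null when nothing matches.
const results = samples.map((s) => {
  const m = s.match(greWithSuffix);
  return m ? m[0] : null;
});

console.log(results); // ["GRE", "GRE:Math", "GRE-Math", null, "GRE"]
```

Note that nothing in this pattern guards the left side, so a string such as "CONGRE:Math" would still match from "GRE" onward. That is where a leading boundary would help.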
You can use the regex: /\bGRE\b/g
if(/\bGRE\b/g.test("string")){
// the string has GRE
}
else{
// it doesn't have GRE
}
\b means word boundary
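One caveat about the g flag, as a sketch: a regex object created with g remembers its lastIndex between test() calls, so if you store the pattern in a variable and reuse it, the results can alternate. The variable names below are illustrative. For a plain yes/no check the flag can be dropped.

```javascript
// A stored regex with the g flag keeps state in lastIndex between calls.
const withG = /\bGRE\b/g;
const first = withG.test("GRE-Math");  // true, lastIndex is now 3
const second = withG.test("GRE-Math"); // false, the search resumed at index 3

// Without g, every call starts from the beginning of the string.
const withoutG = /\bGRE\b/;
const third = withoutG.test("GRE-Math");  // true
const fourth = withoutG.test("GRE-Math"); // true

console.log(first, second, third, fourth); // true false true true
```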
Related
I'm building on a regular expression I found that works well for my use case. The purpose is to check for what I consider valid hashtags (I know there's a ton of hashtag regex posts on SO but this question is specific).
Here's the regex I'm using
/(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g
The only problem I'm having is I can't figure out how to check if the second character is a-z (the first character would be the hashtag). I only want the first character after the hashtag to be a-z or A-Z. No numbers or non-alphanumeric.
Any help much appreciated, I'm very novice when it comes to regular expressions.
As I mentioned in the comments, you can replace [a-zA-Z0-9_]{1,20} with [a-zA-Z][a-zA-Z0-9_]{0,19} so that the first character is guaranteed to be a letter and then followed by 0 to 19 word characters (alphanumeric or underscore).
However, there are other unnecessary parts in your pattern. It appears that all you need is something like this:
/(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g
Demo.
Breakdown of (?:^|\B):
(?: # Start of a non-capturing group (don't use a capturing group unless needed).
^ # Beginning of the string/line.
| # Alternation (OR).
\B # The opposite of `\b`. In other words, it makes sure that
# the `#` is not preceded by a word character.
) # End of the non-capturing group.
Note: You may also replace [a-zA-Z0-9_] with \w.
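A minimal sketch of the simplified pattern in use, assuming you want to pull every valid hashtag out of a piece of text. The sample text is made up.

```javascript
// '#' must sit at the start or after a non-word character, the first character
// after it must be a letter, and up to 19 more word characters may follow.
const hashtagPattern = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;

// Hypothetical input: two valid tags, one starting with a digit, one glued to a word.
const text = "#hello #1abc #a_1 foo#bar";

// match() with the g flag returns all matches, or null when there are none.
const tags = text.match(hashtagPattern) || [];

console.log(tags); // ["#hello", "#a_1"]
```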
References:
Word Boundaries.
Difference between \b and \B in regex.
The below should work.
(^|\B)#(?![0-9_]+\b)([a-zA-Z][a-zA-Z0-9_]{0,19})(\b|\r)
If you only want to accept hashtags of two or more characters, then change {0,19} to {1,19}.
You can test it here
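In your pattern you use (?![0-9_]+\b), which only asserts that what is directly on the right is not a run of digits and underscores ending at a word boundary. It still allows a digit or an underscore as the first character when letters follow, so it does not force the first character to be an upper or lower case a-z.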
If you want you can use this part [a-zA-Z0-9_]{1,20} but then you have to use a positive lookahead instead (?=[a-zA-Z]) to assert what is directly to the right is an upper or lower case a-z.
(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b
Regex demo
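To compare the variants side by side, here is a rough sketch. The sample inputs and the findAll helper are assumptions for illustration. The original pattern is the one from the question.

```javascript
// Pattern from the question: the negative lookahead only rejects all-digit/underscore tags.
const original = /(^|\B)#(?![0-9_]+\b)([a-zA-Z0-9_]{1,20})(\b|\r)/g;
// First character spelled out as a letter.
const explicitFirstLetter = /(?:^|\B)#[a-zA-Z][a-zA-Z0-9_]{0,19}\b/g;
// First character checked with a positive lookahead instead.
const lookaheadFirstLetter = /(?:^|\B)#(?=[a-zA-Z])[a-zA-Z0-9_]{1,20}\b/g;

// Hypothetical helper: all full matches of re in s.
const findAll = (re, s) => [...s.matchAll(re)].map((m) => m[0]);

const inputs = ["#tag", "#9lives", "#_x", "#a", "see #Tag_2 here"];
for (const input of inputs) {
  console.log(input, findAll(original, input),
    findAll(explicitFirstLetter, input), findAll(lookaheadFirstLetter, input));
}
```

On these inputs the two fixed patterns agree with each other, while the original still accepts "#9lives" and "#_x".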
I have looked through previous questions and answers, however they do not solve the following:
https://stackoverflow.com/questions/ask#notHashTag
The closest I got to is this: (^#|(?:\s)#)(\w+), which finds the hashtag in half the necessary cases and also includes the leading space in the returned text. Here are all the cases that need to be matched:
#hashtag
a #hashtag
a #hashtag world
cool.#hashtag
##hashtag, but only until the comma and starting at second hash
#hashtag#hashtag two separate matches
And these should be skipped:
https://stackoverflow.com/questions/ask#notHashTag
Word#notHashTag
#ab is too short to be a hashtag, 3 characters minimum
This should work for everything but #hashtag#duplicates. Because JS didn't support lookbehind until ES2018, matching that case with the regex alone is probably not possible in older engines.
\B#\w{3,}
\B is designed to match only between two word characters or two non-word characters. Since # is a non-word character, this forces the match to be preceded by a space or punctuation, or the beginning of the string.
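Here is a small sketch that runs this pattern over the cases from the question (some shortened). The array layout and variable names are just for illustration.

```javascript
// '#' preceded by a non-word character or the string start, then 3+ word characters.
const simpleHashtag = /\B#\w{3,}/g;

// Cases taken from the question.
const cases = [
  "#hashtag",
  "a #hashtag world",
  "cool.#hashtag",
  "##hashtag, but only until the comma",
  "#hashtag#hashtag",
  "https://stackoverflow.com/questions/ask#notHashTag",
  "Word#notHashTag",
  "#ab",
];

// For each case, list every match (an empty array when nothing matches).
const found = cases.map((s) => s.match(simpleHashtag) || []);
found.forEach((matches, i) => console.log(cases[i], "->", matches));
```

Every case that should match yields a single "#hashtag", including "#hashtag#hashtag", where only the first tag is found. The last three cases yield nothing.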
Try this regex:
(?:^|[\s.])(#+\w{3,})(#+\w{3,})?
Online Demo: http://regex101.com/r/kG1nD5
I have this regexp:
(\b)(emozioni|gioia|felicità)(\b)
In a string like the one below:
emozioni emozioniamo felicità felicitàs
it should match the first and the third word. Instead it matches the first and the last. I assume it is because of the accented character. I tried this alternative:
(\b)(emozioni|gioia|felicità\s)(\b)
but it matched "felicità" only if there is another word after it. To be specific, it matches only in this context:
emozioni emozioniamo felicità felicitàs
and not in this other:
emozioni emozioniamo felicitàs felicità
I've found an article about accented characters in French (so at the beginning of the word) here, and I followed the second answer. If anyone knows a better solution, it is very welcome.
A word boundary \b works only with characters that are in the \w character class, i.e. [0-9a-zA-Z_], thus a \b after an accented character like à does not behave as an end-of-word boundary.
You can solve the problem in your case using a lookahead:
felicità(?=\s|$)
or shorter:
felicità(?!\S)
(or \W in place of \s as suggested by @Sniffer, but you take the risk of matching something like :felicitàà)
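To make the difference concrete, here is a sketch that compares the original \b version with the lookahead version on the string from the question. The indexes helper is an assumption for illustration, and the string is assumed to use the precomposed à character.

```javascript
const sentence = "emozioni emozioniamo felicità felicitàs";

// Original approach: \b after à only "fires" when a word character follows.
const withBoundary = /\b(?:emozioni|gioia|felicità)\b/g;
// Lookahead approach: the match must be followed by whitespace or the end of the string.
const withLookahead = /\b(?:emozioni|gioia|felicità)(?!\S)/g;

// Hypothetical helper: start index of every match.
const indexes = (re, s) => [...s.matchAll(re)].map((m) => m.index);

console.log(indexes(withBoundary, sentence));  // [0, 30] -> first word and inside "felicitàs"
console.log(indexes(withLookahead, sentence)); // [0, 21] -> first and third word
```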
Try the following alternative:
\b(emozioni|gioia|felicità)(?=\W|$)
This will match any of your listed words, as long as any of those words is followed by either a non-word character \W or end-of-string $.
Regex101 Demo
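If your engine supports ES2018 features, a Unicode-aware boundary can be sketched with lookarounds and property escapes instead of \b. This is only one possible approach, not something from the answers above. The word list and variable names are taken from the question or made up for illustration.

```javascript
// Words from the question; they contain no regex metacharacters, so no escaping is done here.
const words = ["emozioni", "gioia", "felicità"];

// Treat any Unicode letter, any Unicode digit, and '_' as word characters.
// Requires the u flag (for \p{...}) and lookbehind support, both from ES2018.
const wordChar = "[\\p{L}\\p{N}_]";
const unicodeAware = new RegExp(`(?<!${wordChar})(?:${words.join("|")})(?!${wordChar})`, "gu");

const reordered = "emozioni emozioniamo felicitàs felicità";
const unicodeHits = [...reordered.matchAll(unicodeAware)].map((m) => [m[0], m.index]);

console.log(unicodeHits); // [["emozioni", 0], ["felicità", 31]]
```

Unlike the \W variant, this also rejects ":felicitàà", because the trailing à counts as a letter.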
Hi I have this regex.
/^[\w]|[åäöæøÅÄÖÆØ]$/
"tå" is ok but "åå" is not. Why is that? How can I make it accept words starting with åäöæøÅÄÖÆØ?
Note that the \w (and \W, \b, and \B) are English-centric. \w just means [A-Za-z0-9_], where A-Z means only the 26 English letters. Other letters are not considered part of a "word" by JavaScript's built-in character classes.
You'll need to build a character class including all of the letters you want to treat as word characters (then use the negated version of that wherever you need a "non-word character").
But that's not the only problem. Your regular expression says:
Match one English word character at the beginning of the string, or match one of this list of characters at the end of the string.
The | operator has very low precedence, so in this case it treats ^[\w] and [åäöæøÅÄÖÆØ]$ as the two alternatives. I don't get the impression that's what you wanted.
"tå" is ok but "åå" is not.
I guess it depends on what you mean by "ok". Both match the expression:
console.log("tå".match(/^[\w]|[åäöæøÅÄÖÆØ]$/)); // ["t", index: 0, input: "tå"]
console.log("åå".match(/^[\w]|[åäöæøÅÄÖÆØ]$/)); // ["å", index: 1, input: "åå"]
"tå" matches because it matches the ^[\w] alternative. "åå" matches because it matches the [åäöæøÅÄÖÆØ]$ alternative.
How can I make it accept words starting with åäöæøÅÄÖÆØ?
If the goal is to accept only strings containing exactly one word, where "word" includes digits and the underscore (since \w does), then:
/^[A-Za-z0-9_åäöæøÅÄÖÆØ]+$/
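As a sketch, the extended character class can be built once and reused both for the "starts with" check from the question and for the whole-word check. The variable names are made up, and the list of extra letters is the one from the question.

```javascript
// Extra letters to treat as word characters, taken from the question.
const extraLetters = "åäöæøÅÄÖÆØ";

// True when the string starts with an English word character or one of the extra letters.
const startsWithWordChar = new RegExp(`^[A-Za-z0-9_${extraLetters}]`);
// True when the whole string is one "word" made of those characters.
const isSingleWord = new RegExp(`^[A-Za-z0-9_${extraLetters}]+$`);

console.log(startsWithWordChar.test("åå")); // true
console.log(startsWithWordChar.test("!å")); // false
console.log(isSingleWord.test("tå"));       // true
console.log(isSingleWord.test("å å"));      // false
```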
Why do you think it fails? I would not put the \w in square brackets but various systems seem to allow that and both the following match the text being tested.
Javascript
var test = 'åå';
if (test.match(/^[\w]|[åäöæøÅÄÖÆØ]$/)) { alert("Match"); }
PHP
echo(preg_match("/^[\w]|[åäöæøÅÄÖÆØ]$/","åå")."<br>");
What are you trying to achieve here?
I have this regex that detects hashtags. It shouldn't match things with letters before them, so we've got a space character at the beginning of the regex:
/( #[a-zA-Z_]+)/gm
The issue is that it no longer matches words at the beginning of sentences. How can I modify this regex so that, instead of requiring a leading space, it simply doesn't match things with letters before them?
Thanks!
Use \b at the start to indicate a word boundary.
\b won't work, since # isn't a word starter.
Just check for the start of the string or a space before: (?:^|\s)(\#[a-zA-Z_]+)
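Also, if your regex flavor runs in free-spacing (extended) mode, make sure you escape the #, so it doesn't get interpreted as a comment. In a JavaScript regex literal the escape is not needed.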
Without lookbehind:
pattern = /(?:^|[^a-zA-Z])#[a-zA-Z]+/
With lookbehind (not available in JavaScript engines before ES2018):
pattern = "(?:^|(?<![a-zA-Z]))#[a-zA-Z]+"
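A sketch of both variants in JavaScript follows. The version without lookbehind consumes the character before the #, so a capture group is added here to get the tag on its own. The underscore is kept in the class as in the question, and the sample text is made up. The lookbehind variant assumes an ES2018-capable engine.

```javascript
// Without lookbehind: the preceding non-letter is part of the match, so capture the tag.
const consuming = /(?:^|[^a-zA-Z])(#[a-zA-Z_]+)/g;
// With lookbehind: nothing before the '#' is consumed.
const lookbehind = /(?<![a-zA-Z])#[a-zA-Z_]+/g;

// Hypothetical input: one tag at the start, one after a space, one glued to a word.
const post = "#start of line, mid #tag, and word#nope";

const viaGroup = [...post.matchAll(consuming)].map((m) => m[1]);
const viaLookbehind = post.match(lookbehind) || [];

console.log(viaGroup);      // ["#start", "#tag"]
console.log(viaLookbehind); // ["#start", "#tag"]
```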