Why does the "." not get caught in the regex? - javascript

I want a regular expression that checks whether a given email address is valid. First, it checks that a specific provider is present, in this case #test.de; this part is not the problem. However, the name part may consist only of letters or dots, so .#test.de is valid. This specific case is not accepted, though. My regex looks like this:
[A-Za-z\.]{1,}\b#test\.de\b
It works fine for all other cases, but if a "." is directly in front of the #, it does not match.
Any pointers what I am doing wrong?

The first word boundary \b in your pattern requires a word character immediately before the #. A dot is not a word character, so when a dot sits directly in front of the #, the match fails.
You need to remove the word boundary. Use
[A-Za-z.]+#test\.de\b
Note you do not need to escape a dot inside a character class, it already denotes a literal dot.
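A quick way to see the difference, as a minimal sketch. The sample addresses other than .#test.de are made up for illustration, and the variable names are not from the question:

```javascript
// Original pattern: the \b before # needs a word character right before the #.
const original = /[A-Za-z\.]{1,}\b#test\.de\b/;
// Suggested pattern: boundary removed, dot left unescaped inside the class.
const fixed = /[A-Za-z.]+#test\.de\b/;

// Hypothetical sample inputs; only ".#test.de" comes from the question.
const samples = [".#test.de", "a.#test.de", "a.b#test.de"];

const results = samples.map(function (s) {
  return { input: s, original: original.test(s), fixed: fixed.test(s) };
});

// The original pattern only accepts the sample with a letter before the #;
// the fixed pattern accepts all three.
console.log(results);
```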
If you still want to match "whole" words after removing \b, you might use lookbehinds (if the regex engine supports them):
(?<!\w)[A-Za-z.]+#test\.de\b
or to only match after whitespace/start of string:
(?<!\S)[A-Za-z.]+#test\.de\b
Or just use a word boundary if the name starts with a letter, and a non-word boundary if it starts with a dot:
(?:\b[A-Za-z]|\B\.)[A-Za-z.]*#test\.de\b
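The boundary-based variant can be tried out like this. It is only a sketch: the helper name firstMatch and the sample strings are assumptions made for the example.

```javascript
// \b before a leading letter, \B before a leading dot.
const wholeWord = /(?:\b[A-Za-z]|\B\.)[A-Za-z.]*#test\.de\b/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function firstMatch(text) {
  const m = text.match(wholeWord);
  return m ? m[0] : null;
}

const dotOnly = firstMatch("mail .#test.de now");    // name is a single dot
const normal = firstMatch("mail a.b#test.de now");   // name starts with a letter
const glued = firstMatch("1a#test.de");              // letter glued to a digit

console.log(dotOnly, normal, glued);
```

In this sketch the last input yields null, because there is no word boundary between "1" and "a", so the name cannot start in the middle of a longer word.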

Related

javascript regular expression allow name with one space and special Alphabets

How do I write a regular expression that allows a name with one space and special alphabetic characters?
I tried [a-zA-Z]+(?:(?:\. |[' ])[a-zA-Z]+)* but it is not working for me.
Example string: Björk Guðmundsdóttir
You may try something along these lines:
^(?!.*[ ].*[ ])[ A-Za-zÀ-ÖØ-öø-ÿ]+$
The negative lookahead asserts that the name does not contain two spaces, so at most one space is present (possibly none). Then we match one or more letters, with most accented Latin letters included. The character class also admits a space, but the lookahead has already ensured that at most one can appear.
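As a rough check of that pattern, here is a sketch that writes the accented ranges as \u escapes (À-Ö is \u00C0-\u00D6, Ø-ö is \u00D8-\u00F6, ø-ÿ is \u00F8-\u00FF) so it does not depend on file encoding. The inputs other than the name from the question are made up.

```javascript
// Same pattern as above, with the Latin-1 letter ranges spelled as escapes.
const nameRe = /^(?!.*[ ].*[ ])[ A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/;

// "Björk Guðmundsdóttir", written with escapes for the accented letters.
const bjork = "Bj\u00F6rk Gu\u00F0mundsd\u00F3ttir";

const checks = {
  oneSpace: nameRe.test(bjork),               // one space, accented letters
  noSpace: nameRe.test("Bj\u00F6rk"),         // no space at all
  twoSpaces: nameRe.test("Anna Maria Smith"), // rejected by the lookahead
  withDigit: nameRe.test("Anna2"),            // digit is outside the class
};

console.log(checks);
```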
Use this one:
[a-zA-Z\u00C0-\u00ff]*[ ]{1}[a-zA-Z\u00C0-\u00ff]*
Answer from other question

Regex for property paths

I am trying to match property-syntax with a Javascript regex. Is there a reliable way to do this? I would need to match a string like the following-
someobject.somekey.somechildkey.somegrandchildkey
I don't need the path members, I just need to know if a string contains a path. For example, given a string like this
This is some long string that contains a property.path.syntax, and I need to test it.
Try this:
/\b(?:\S+?\.)+\S+\b/g
This is bounded by two word boundaries, which should work in most cases (a word character next to a non-word character). Then we lazily repeat 1+ non-whitespace character followed by a . (which needs to be escaped). We use \S for non-whitespace, because like #TJCrowder said, properties can contain many characters. There always has to be another set of non-whitespace characters after the last period.
Working within the limits you've identified in the comments:
/(?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+/g
(Use the g flag if you need to do this repeatedly.)
That says:
Anything starting with a-z, A-Z, _, or $ (emphasizing again this is an incomplete list)
...followed by any number of those plus digits
Followed by one or more non-capturing groups of the same thing, but starting with a .
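Applied to the two sample strings from this thread, the pattern behaves as sketched below. The variable names are chosen for the example only.

```javascript
// Identifier-like segment, followed by one or more ".segment" groups.
const pathRe = /(?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+/g;

const sentence =
  "This is some long string that contains a property.path.syntax, and I need to test it.";

// String.prototype.match with the g flag returns every whole match.
const found = sentence.match(pathRe);

// The unanchored pattern also picks pieces out of a longer dotted token.
const pieces = "blah one.that.1should.not blah".match(pathRe);

console.log(found);  // [ 'property.path.syntax' ]
console.log(pieces); // [ 'one.that', 'should.not' ]
```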
Or if you need it not to match one.that and should.not in:
blah one.that.1should.not blah
Then:
/(?:\s|^)((?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+)(?:\s|$)/g
That says the same thing as the one earlier, but plus:
Requires whitespace or beginning-of-input at the start ((?:\s|^)) and whitespace or end-of-input at the end ((?:\s|$)).
Uses a capture group so you can get just the property path without the optional whitespace on either side of it
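A sketch of reading the capture group with an exec loop follows. The function name extractPaths and the sample inputs are assumptions for illustration.

```javascript
// Stricter pattern: the path must be delimited by whitespace or input edges.
const strictPathRe =
  /(?:\s|^)((?:[a-zA-Z_$]+[\w$]*)(?:\.[a-zA-Z_$]+[\w$]*)+)(?:\s|$)/g;

// Hypothetical helper: collects group 1 (the path without the whitespace).
function extractPaths(text) {
  const paths = [];
  strictPathRe.lastIndex = 0; // reset state, since the regex is shared and global
  let m;
  while ((m = strictPathRe.exec(text)) !== null) {
    paths.push(m[1]);
  }
  return paths;
}

const rejected = extractPaths("blah one.that.1should.not blah");
const accepted = extractPaths("see foo.bar.baz here");
const withComma = extractPaths("contains a property.path.syntax, and more");

console.log(rejected, accepted, withComma);
```

Note that under this stricter rule a path followed directly by punctuation, such as the comma in the third input, is not reported either, and that the trailing whitespace is consumed by each match.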
Just to recap, the valid list of JavaScript identifier characters is very large, much larger than \w (which is [a-zA-Z0-9_]). It's not like some languages that only allow those characters. All sorts of normal-to-large-numbers-of-people characters are allowed, such as ç, ö, ñ (and Arabic, and Japanese, and Chinese, and ...). And there are basically no limits on property names (e.g., if you express them as strings), only property name literals. More: http://ecma-international.org/ecma-262/5.1/#sec-7.6
var expr = /[a-zA-Z_]([a-zA-Z0-9_]*\.[a-zA-Z_][a-zA-Z0-9_]*)+/i;
expr.test("your.test.case");
The above regexp:
doesn't match .test
doesn't match test.
doesn't match test
doesn't match 0test, because it cannot be a Javascript property (you cannot start the name of a variable with a number)
EDIT: as suggested by Paulchenkiller, and also considering that the i at the end stands for "case insensitive", you can also use the following shorter form:
var expr = /[a-z_](\w*\.[a-z_]\w*)+/i;
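The listed cases can be checked as sketched below. One caveat worth knowing: the expression is not anchored, so it reports whether a path occurs anywhere in the string. The anchored variant is an assumption added here for whole-string validation, not part of the answer above, and the input "0test.foo" is made up.

```javascript
const expr = /[a-z_](\w*\.[a-z_]\w*)+/i;

// Assumed variant for validating a whole string rather than searching in it.
const anchored = /^[a-z_]\w*(\.[a-z_]\w*)+$/i;

const inputs = [".test", "test.", "test", "0test", "your.test.case", "0test.foo"];

const report = inputs.map(function (s) {
  return { input: s, search: expr.test(s), whole: anchored.test(s) };
});

// "0test.foo" passes the search form because "test.foo" matches inside it,
// but fails the anchored form because the string starts with a digit.
console.log(report);
```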

Regex-Groups in Javascript

I have a problem using a Javascript-Regexp.
This is a very simplified regexp, which demonstrates my Problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that do not contain forbidden characters (everything that is not a word character).
The only exception is the symbol +.
A match is allowed to start with this symbol if a digit ([0-9]) follows it.
A + must not appear within words (44+44 is not a valid match, but +4ad is).
In order to allow the + only at the beginning, I said that there must be a whitespace preceding. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are 2 issues with that regexp:
If I use it in my JS-Code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
What should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
What should the regex look like?
You've got multiple choices to reach your goal:
It's fine as you have it. You might allow the string beginning in place of the whitespace as well, though. Just read the capturing groups (groups 1 and 2 of each match, for example via regexp.exec) instead of the whole match; they will not include the whitespace.
A lookbehind could help. It was not supported in JavaScript when this was written, but engines implementing ES2018 or later accept it.
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w+|\w+/
Why does this regex add the space to the matches?
Because the regex does match the whitespace: \s is part of the overall match. It sits outside the capturing group (\+\d\w*), though, so the captured group does not include it.
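The difference between whole matches and capture groups can be seen in the following sketch. The function name tokenize is hypothetical, the (?:^|\s) prefix is the "allow the string beginning" suggestion, and the g flag on the \B pattern is added here so that match returns all tokens.

```javascript
var input = "+5ad6 +5ad6 sd asd+as +we";

// match() with the g flag returns whole matches, so the consumed space shows up
// and the leading +5ad6 loses its + (there is no whitespace before it).
const wholeMatches = input.match(/(?:\s(\+\d\w*))|(\w+)/g);

// Reading the capture groups instead, and allowing start of input before the +.
function tokenize(text) {
  const re = /(?:^|\s)(\+\d\w*)|(\w+)/g;
  const out = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    out.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return out;
}
const viaGroups = tokenize(input);

// The non-word-boundary variant needs no groups at all.
const viaNonBoundary = input.match(/\B\+\d\w+|\w+/g);

console.log(wholeMatches);   // [ '5ad6', ' +5ad6', 'sd', 'asd', 'as', 'we' ]
console.log(viaGroups);      // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
console.log(viaNonBoundary); // [ '+5ad6', '+5ad6', 'sd', 'asd', 'as', 'we' ]
```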

Email verification regex failing on hyphens

I'm attempting to verify email addresses using this regex: ^.*(?=.{8,})[\w.]+#[\w.]+[.][a-zA-Z0-9]+$
It's accepting emails like a-bc#def.com but rejecting emails like abc#de-f.com (I'm using the tool at http://tools.netshiftmedia.com/regexlibrary/ for testing).
Can anybody explain why?
Here is the explanation:
In your regular expression, the part that has to match the domain (everything after the #) in a-bc#def.com and abc#de-f.com is [\w.]+[.][a-zA-Z0-9]+$
It means:
There should be one or more word characters (letters, digits, and underscores) or '.'. See the reference for '\w'.
It is followed by a '.',
Then it is followed by one or more characters from the set a-zA-Z0-9.
So the - in de-f.com does not match the first [\w.]+ in rule 1.
The modified solution
You could adjust this part to [\w.-]+[.][a-zA-Z0-9]+$ to allow - in the part after the #.
Because after the # you're looking for letters, numbers, _, or ., then a period, then alphanumeric. You don't allow for a - anywhere after the #.
You'd need to add the - to one of the character classes (except for the single literal period one, which I would have written \.) to allow hyphens.
\w is letters, numbers, and underscores.
A . inside a character class, indicated by [], is just a period, not any character.
In your first expression, you don't limit to \w, you use .*, which is 0+ occurrences of any character (which may not actually be what you want).
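To see the effect of adding the hyphen, here is a small sketch using the two addresses from the question. The variable names are made up for the example.

```javascript
// Pattern from the question: no hyphen allowed after the #.
const originalEmail = /^.*(?=.{8,})[\w.]+#[\w.]+[.][a-zA-Z0-9]+$/;
// Same pattern with - added to the character class after the #.
const hyphenEmail = /^.*(?=.{8,})[\w.]+#[\w.-]+[.][a-zA-Z0-9]+$/;

const addresses = ["a-bc#def.com", "abc#de-f.com"];

const outcome = addresses.map(function (a) {
  return { address: a, original: originalEmail.test(a), withHyphen: hyphenEmail.test(a) };
});

// The first address passes the original only because the leading .* swallows
// "a-"; the second fails until the hyphen is added to the class.
console.log(outcome);
```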
Use this Regex:
var emailRegex = /^[^#]+#[^#]+\.[^#\.]{2,}$/;
It will accept a-bc#def.com as well as emails like abc#de-f.com.
You may also refer to a similar question on SO:
Why won't this accept email addresses with a hyphen after the #?
Hope this helps.
Instead you can use a regex like this to allow any email address.
^[a-zA-Z][\w\.-]*[a-zA-Z0-9]#[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$
Following regex works:
([A-Za-z0-9]+[-.-_])*[A-Za-z0-9]+#[-A-Za-z0-9-]+(\.[-A-Z|a-z]{2,})+

Regex expression using word boundary for matching alphanumeric and non alphanumeric characters in javascript

I am trying to highlight a set of keywords using JavaScript and regex, and I am facing one problem: my keywords may contain both letters and special characters, as in #text, #number, etc. I am using word boundaries to match and replace the whole word and not a partial word (contained within another word).
var pattern = new RegExp('\\b(' + keyword + ')\\b', 'gi');
This expression matches whole keywords and highlights them. However, a keyword like "number:" does not get highlighted.
I am aware that \bword\b matches for a word boundary and special characters are non alphanumeric characters hence are not matched by the above expression.
Can you let me know what regex expression I can use to accomplish the above.
==Update==
For the above I tried Tim Pietzcker's suggestion for the below regex,
expr: (?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)
The above seems to be working for getting me a match for the whole word with alphanumeric and non alphanumeric characters, however whenever a keyword has consecutive html tag before or after the keyword without a space, it does not highlight that keyword (e.g. social security *number:< br >*)
I tried the following regex, but it replaces the html tag preceding the keyword
expr: (?:^|\b|\s|<[^>]+>)number:(?:$|\b|\s|<[^>]+>)
Here for the keyword number: which has < br > (space added intentionally for br tag to avoid browser interpreting the tag) coming next without space in between gets highlighted with the keyword.
Can you suggest an expression which would ignore the consecutive html tag for the whole word with both alphanumeric and non alphanumeric characters.
2021 update: JS now supports lookbehind so this answer is a little outdated.
OK, so you have two problems: JavaScript doesn't support lookbehind, and \b only finds boundaries between alphanumeric and non-alphanumeric characters.
The first question: What exactly does constitute a word boundary for your keywords? My guess is that it must be either a \b boundary or whitespace. If that is the case, you could search for
"(?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)"
Of course the whitespace characters around keywords like #number# would also become part of the match, but perhaps highlighting those isn't such a problem. In other cases, i. e. if there is an actual word boundary that can match, the spaces won't be part of the match so it should work fine in the majority of cases.
The actual word you're interested in will be in backreference #1, so if you can highlight that separately, even better.
EDIT:
If other characters than space may occur after/before a keyword, then I think the only thing you can do (if you're stuck with JavaScript) is:
Check if your keyword starts with an alnum character.
If so, prepend \b to your regex.
Check if your keyword ends with an alnum character.
If so, append \b to your regex.
So, for keyword, use \bkeyword\b; for number:, use \bnumber:; for #twitter, use #twitter\b.
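The steps above could be sketched as follows. The helper names escapeRegExp and keywordRegex are hypothetical, the escaping step is an addition so that keywords containing regex metacharacters are taken literally, and \w is used for the "alnum" check because that is the class \b is defined on (it also includes the underscore).

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Add \b only on the sides where the keyword begins or ends with a word character.
function keywordRegex(keyword) {
  const prefix = /^\w/.test(keyword) ? "\\b" : "";
  const suffix = /\w$/.test(keyword) ? "\\b" : "";
  return new RegExp(prefix + "(" + escapeRegExp(keyword) + ")" + suffix, "gi");
}

const tests = {
  colonBeforeTag: keywordRegex("number:").test("social security number:<br>"),
  insideWord: keywordRegex("number:").test("mynumber: 1"),
  hashKeyword: keywordRegex("#twitter").test("see #twitter now"),
  hashPrefixOnly: keywordRegex("#twitter").test("see #twitterfeed"),
  plainPartial: keywordRegex("keyword").test("keywords"),
};

console.log(tests);
```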
We need to look for a substring that has a whitespace character on both sides. If JavaScript supported lookbehind, this would look like:
var re = new RegExp('(?<!\\S)' + keyword + '(?!\\S)', 'gi');
That won't work though (but would in Perl and other scripting languages). Instead, we need to include the leading whitespace character (or beginning of string) as the beginning part of the match (and optionally capture what we are really looking for into $1):
var re = new RegExp('(?:^|\\s)(' + keyword + ')(?!\\S)', 'gi');
Just consider that the real place where any match starts will be one character after what is returned by the .index property returned by re.exec(string), and that if you are accessing the matched string, you either need to remove the first character with .slice(1) or simply access what is captured.
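Both approaches can be compared side by side in engines that implement ES2018 lookbehind. This is a sketch only: the function names are hypothetical, the escaping helper is an addition, and a non-empty keyword is assumed.

```javascript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Lookbehind version: requires an engine with ES2018 lookbehind support.
function findStandalone(text, keyword) {
  const re = new RegExp("(?<!\\S)" + escapeRegExp(keyword) + "(?!\\S)", "gi");
  const hits = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    hits.push({ index: m.index, text: m[0] });
  }
  return hits;
}

// Capture-group version: the leading whitespace is matched, then skipped
// when computing where the keyword itself starts.
function findStandaloneLegacy(text, keyword) {
  const re = new RegExp("(?:^|\\s)(" + escapeRegExp(keyword) + ")(?!\\S)", "gi");
  const hits = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    hits.push({ index: m.index + m[0].length - m[1].length, text: m[1] });
  }
  return hits;
}

const sample = "number: is a number: here";
const modern = findStandalone(sample, "number:");
const legacy = findStandaloneLegacy(sample, "number:");
const beforeTag = findStandalone("number:<br>", "number:");

console.log(modern, legacy, beforeTag);
```

In this sketch both functions report the keyword at indexes 0 and 13, and neither matches when the keyword is followed directly by a tag, which is the limitation discussed in the question.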
maybe what you're trying to do is
'\b\W*(' + keyword + ')\W*\b'
Lookahead and lookbehind are your answer: "(?<=\\s|^)" + keyword + "(?=\\s|$)". The lookaround groups aren't included in the match, so put whatever characters aren't permitted in the keywords in there. (Lookbehind requires an engine that implements ES2018 or later.)
As Tim correctly points out, \b are tricky things that work differently than the way people often think they work. Read this answer for more details about this matter, and what you can do about it.
In brief, this is a boundary to the left:
(?(?=\w)(?<!\w)|(?<!\W))
and this is a boundary to the right:
(?(?<=\w)(?!\w)|(?!\W))
People always think there are spaces involved, but there aren’t. However, now that you know the real definitions, it’s easy to build that into them. One could swap out \w and \W in exchange for \s and \S in the two patterns above. Or one could add in whitespace awareness to the else blocks.
Try this it should work...
var pattern = new RegExp("\\b" + keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "\\b", "gi");
