I was maintaining some old code when I hit a head-scratcher. I am confused by this regex pattern: /^.*$/ (supplied as an argument in textFieldValidation(this,'true',/^.*$/,'','')).
I interpret this regex as:
/^=open pattern
.=match a single character of any value (Except EOL)
*=match 0 or more times
$=match end of the line
/=close pattern
So…I think this pattern matches everything, which means the function does nothing but waste processing cycles. Am I correct?
It matches a single line of text.
It will fail to match a multiline string, because ^ matches the beginning of input, $ matches the end of input, and . does not match line terminators. If there are any newline (\n) or carriage return (\r) characters in between, it fails.
For example, 'foo'.match(/^.*$/) returns foo.
But 'foo\nfoo'.match(/^.*$/) returns null.
^ "Starting at the beginning."
. "Matching anything..."
* "0 or more times"
$ "To the end of the line."
Yep, you're right on: it matches an empty string or any single line of text.
And a handy little cheat sheet.
The regexp checks that the string doesn't contain any \n or \r. Dots do not match new-lines.
Examples:
/^.*$/.test(""); // => true
/^.*$/.test("aoeu"); // => true
/^.*$/.test("aoeu\n"); // => false
/^.*$/.test("\n"); // => false
/^.*$/.test("aoeu\nfoo"); // => false
/^.*$/.test("\nfoo"); // => false
Yes, you are quite correct. This regex matches any string that does not contain an EOL (if dotall=false) or any string at all (if dotall=true).
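As a rough illustration of the dotall remark, the sketch below runs the same pattern with no flags, with the s (dotAll) flag, and with the m (multiline) flag. The sample strings are made up for the example, and the s flag assumes an ES2018-capable engine.

```javascript
// The same pattern under three flag settings (sample inputs are illustrative).
const plain = /^.*$/;      // "." stops at line terminators, "$" is end of input
const dotAll = /^.*$/s;    // "." also matches line terminators
const multiline = /^.*$/m; // "^" and "$" also match at line boundaries

const samples = ["", "aoeu", "aoeu\nfoo"];
const results = samples.map((text) => ({
  text,
  plain: plain.test(text),
  dotAll: dotAll.test(text),
  multiline: multiline.test(text),
}));

console.log(results);
```

Only the flagless version rejects "aoeu\nfoo", so if textFieldValidation passes the regex through unchanged, the check amounts to "no line breaks allowed".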
I have a requirement in JavaScript where a word must start and end with a letter, with only digits in between.
e.g. S652354536667U
I tried with pattern
(/[A-Z]\d+[A-Z]$/).test("S652354536667U") // returns true, ok
(/[A-Z]\d+[A-Z]$/).test("S65235Y4536667U") // returns true, needed false
but it is allowing characters in between like S65235Y4536667U is being accepted.
Any help is appreciated.
Thanks
You need to put a caret at the start of the regex, to indicate the first letter is at the start of the string, i.e.:
^[A-Z]\d+[A-Z]$
You are missing the ^. It should be:
^[A-Z]\d*[A-Z]$
You can use this site to quickly test your regex: https://regexr.com/
You left out the caret at the beginning.
["S652354536667U", "S65235Y4536667U"].forEach(item => {
console.log(/^[A-Z]\d+[A-Z]$/.test(item))
})
You need to put ^ at the start and $ at the end of the regex. Otherwise, if the pattern matches anywhere else in the input, test will return true.
(/^[A-Z]\d+[A-Z]$/).test("S652354536667U") // returns true
(/^[A-Z]\d+[A-Z]$/).test("S65235Y4536667U") // returns false
Without the start anchor, the regex finds a match beginning at the character 'Y' (that is, Y4536667U). With the start anchor, the match has to begin at the first character, so that input is rejected.
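To see which part of the input the unanchored pattern actually latches onto, a small sketch using String.prototype.match can help. The variable names are only for illustration.

```javascript
// Compare the unanchored and anchored patterns on the problematic input.
const unanchored = /[A-Z]\d+[A-Z]$/;
const anchored = /^[A-Z]\d+[A-Z]$/;
const input = "S65235Y4536667U";

// match() returns the matched substring and where it starts.
const partial = input.match(unanchored);
console.log(partial[0], partial.index); // "Y4536667U" 6

console.log(anchored.test(input));            // false
console.log(anchored.test("S652354536667U")); // true
```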
I'm trying to come up with a Regexp that detects whether a string in Javascript is trimmed or not. That means, it doesn't have whitespace at the beginning or end.
So far I have
^[^\s].*[^\s]$
This works if the string is long, but for short strings such as a, it will not work because the pattern wants a non-space, any character, then another non-space. That a is trimmed, but doesn't follow the pattern.
Help me find the correct regex.
Try this to make a second char optional:
^[^\s](.*[^\s])?$
One option is to take your existing regex (^[^\s].*[^\s]$) and add a separate part to it specifically to test for a single non-space character (^[^\s]$), combining the two parts with |:
/(^[^\s]$)|(^[^\s].*[^\s]$)/
But I find sometimes it is simpler to test for the opposite case and then invert the result with !. So in your case, have the regex test for a space at the beginning or end:
!/(^\s)|(\s$)/.test(someString)
Encapsulated in a function:
function isTrimmed(s) {
return !/(^\s)|(\s$)/.test(s)
}
console.log(isTrimmed(" a ")) // false
console.log(isTrimmed("a ")) // false
console.log(isTrimmed(" a")) // false
console.log(isTrimmed("a")) // true
console.log(isTrimmed("abc")) // true
console.log(isTrimmed("a b c")) // true
console.log(isTrimmed("")) // true
You can use the below regex :
^([\s]+[\w]*)?([\w]*[\s]+)?$
Might not be the nicest answer but Negative lookahead should work:
/^(?!(?:.*\s$)|(?:^\s.*)).*$/
(?=^\S)[\W\w]*\S$
(?=^\S) Lookahead for non-space character at the start of the string (matches without consuming the string; will start match from first character again)
[\W\w]* Match any number of characters
\S$ Match non-space character at the end of the string
So for a:
(?=^\S) Matches a at the start of the string (resets current position to the start of the string)
[\W\w]* Matches no characters, as it needs to match the next part
\S$ Matches a at the end of the string
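The suggestions in this thread do not all agree on edge cases. The sketch below runs three of them over the same inputs; the object keys and the input list are my own choices for illustration, not part of any answer.

```javascript
// Three of the proposed checks, wrapped as predicates (names are illustrative).
const candidates = {
  optionalGroup: (s) => /^[^\s](.*[^\s])?$/.test(s),
  negated: (s) => !/(^\s)|(\s$)/.test(s),
  lookahead: (s) => /(?=^\S)[\W\w]*\S$/.test(s),
};

const inputs = ["a", "a b", " a", "a ", "", "a\nb"];

// Build one row of booleans per candidate, in the same order as `inputs`.
const table = {};
for (const [name, check] of Object.entries(candidates)) {
  table[name] = inputs.map((s) => check(s));
}

console.log(table);
```

The three agree on the first four inputs. They differ on the empty string (only the negated test calls it trimmed) and on "a\nb" (the optional-group version rejects it, because . does not cross a line break).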
I was going through a piece of code and I hit against this syntax
str.replace(re,function(raw, p1, p2, p3){
if (!/\/\//.test(p1)) { // <---- this one
//some more code
}
});
I understand that the test method checks whether a string matches a pattern. But what is this regex /\/\// matching the string against?
I checked the regex, and
\/ matches the character / literally
\/ matches the character / literally
so what is if (!/\/\//.test(p1)) doing?
The conditional is true if the string does not contain two consecutive slashes.
If the first captured group, p1, contains //, test returns true, and the ! operator turns that into false, so the if body is skipped.
\/ matches the character / literally. The if body runs only when p1 does not contain two consecutive / characters.
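For a literal like this, the regex does the same job as a plain substring check. The sketch below compares the two on a few made-up strings; hasDoubleSlash is a hypothetical helper, not something from the original code.

```javascript
// /\/\// is just "two slashes in a row", written with escaped slashes.
const doubleSlash = /\/\//;

// Hypothetical equivalent without a regex.
function hasDoubleSlash(text) {
  return text.includes("//");
}

const samples = ["http://example.com", "a/b/c", "", "// comment"];
const rows = samples.map((s) => [s, doubleSlash.test(s), hasDoubleSlash(s)]);

for (const [text, viaRegex, viaIncludes] of rows) {
  console.log(JSON.stringify(text), viaRegex, viaIncludes);
}
```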
Regular expression
[A-Za-z_-]+
should match strings that only have upper and lower case letters, underscores, and a dash
but when I run in chrome console
/[A-Za-z_-]+/.test("johnSmith12")
Why does it return true?
Because you didn't anchor the expression. You need to add ^ and $, which match beginning and end of string.
For example:
^[A-Za-z_-]+$
Just the [A-Za-z_-]+ will match johnSmith in your example, leaving out the 12 (as David Starkey pointed out).
It is due to your regex looking for any sequence of characters within the test string that matches the regex. In your example, "johnSmith" matches your regex criteria, and so test returns true.
If you instead put ^ (start of string) and $ (end of string) at the ends of your regex, then you would assert that the entire string must match your regex:
/^[A-Za-z_-]+$/.test("johnSmith12");
This will return false.
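If it helps to see exactly what the unanchored pattern picked up, exec exposes the matched text. This is only a sketch; the second sample name is invented.

```javascript
const loose = /[A-Za-z_-]+/;
const strict = /^[A-Za-z_-]+$/;

// exec() reports the first run of allowed characters it finds.
const found = loose.exec("johnSmith12");
console.log(found[0]); // "johnSmith"

console.log(strict.test("johnSmith12"));   // false
console.log(strict.test("john_Smith-Jr")); // true
```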
Here's a fun snippet I ran into today:
/\ba/.test("a") --> true
/\bà/.test("à") --> false
However,
/à/.test("à") --> true
Firstly, wtf?
Secondly, if I want to match an accented character at the start of a word, how can I do that? (I'd really like to avoid using over-the-top selectors like /(?:^|\s|'|\(\) ....)
This worked for me:
/^[a-z\u00E0-\u00FC]+$/i
With help from here
The reason why /\bà/.test("à") doesn't match is because "à" is not a word character. The escape sequence \b matches only between a boundary of word character and a non word character. /\ba/.test("a") matches because "a" is a word character. Because of that, there is a boundary between the beginning of the string (which is not a word character) and the letter "a" which is a word character.
Word characters in JavaScript's regex are defined as [a-zA-Z0-9_].
To match an accented character at the start of a string, just use the ^ character at the beginning of the regex (e.g. /^à/). That character means the beginning of the string (unlike \b, which matches at any word boundary within the string). It's about as basic and standard as regular expressions get, so it's definitely not over the top.
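A short sketch of what this means in practice. Because à counts as a non-word character, \b in front of it only matches when the previous character is a word character, which is the opposite of what one would expect. The escapes (\u00E0 is à in composed form) are used only to keep the example independent of file encoding.

```javascript
const aGrave = "\u00E0"; // "à", composed form

console.log(/\w/.test("a"));    // true
console.log(/\w/.test(aGrave)); // false: à is not in [a-zA-Z0-9_]

// No boundary between start-of-string and a non-word character...
console.log(/\b\u00E0/.test(aGrave));       // false
// ...but there is one between the word character "x" and "à".
console.log(/\b\u00E0/.test("x" + aGrave)); // true

// Anchoring to the start of the string works as described.
console.log(/^\u00E0/.test(aGrave)); // true
```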
If you want to match letters, whether or not they're accented, unicode property escapes can be helpful.
/\p{Letter}*/u.test("à"); // true
/\p{Letter}/u.test('œ'); // true
/\p{Letter}/u.test('a'); // true
/\p{Letter}/u.test('3'); // false
/\p{Letter}/u.test('a'); // true
Matching to the start of a word is tricky, but (?<=(?:^|\s)) seems to do the trick. The (?<= ) is a positive lookbehind, ensuring that something exists before the main expression. The (?: ) is a non-capture group, so you don't end up with a reference to this part in whatever match you use later. Then the ^ will match the start of the string if the multiline flag isn't set or the start of the line if the multiline flag is set and the \s will match a whitespace character (space/tab/linebreak).
So using them together, it would look something like:
/(?<=(?:^|\s))\p{Letter}*/u
If you want to only match accented characters at the start of a word, you'd want a negated character set for a-zA-Z.
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("bœ") // false
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("œb") // true
// Match characters, accented or not
let regex = /\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log(regex.test("16 tons")); // true
console.log(regex.test("3 œ")); // true
console.log('-----');
// Match characters to start of line, only match characters
regex = /(?<=(?:^|\s))\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log('----');
// Match accented character to start of word, only match characters
regex = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // false
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
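One possible variation, if words can also follow an apostrophe or a parenthesis: instead of requiring whitespace before the word, require that the previous character is not a letter, digit, or underscore. This is a sketch of my own, not something from the answers above; wordStart and the sample sentence are made up, and it assumes an engine with lookbehind and Unicode property escapes (ES2018).

```javascript
// "Not preceded by a letter, digit or underscore" as a Unicode-aware word start.
const wordStart = /(?<![\p{L}\p{N}_])\p{L}+/gu;

// "l'été (à Paris)" written with escapes so the accents are in composed form.
const sentence = "l'\u00E9t\u00E9 (\u00E0 Paris)";
const words = sentence.match(wordStart);
console.log(words); // [ 'l', 'été', 'à', 'Paris' ]

// The same idea for one specific accented letter at the start of a word.
const startsWithAGrave = /(?<![\p{L}\p{N}_])\u00E0/u;
console.log(startsWithAGrave.test("(\u00E0")); // true
console.log(startsWithAGrave.test("l\u00E0")); // false
```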
Stack Overflow also had an issue with non-ASCII characters in regexes; you can find it here. They are not dealing with word boundaries, but it may give you useful hints anyway.
There is another page, but the asker there wants to match strings, not words.
I don't know of, and could not find, an anchor for your problem. But given the monster regexes used in my first link, the group you want to avoid is not over the top and is, in my opinion, your solution.
const regex = /^[\-/A-Za-z\u00C0-\u017F ]+$/;
const test1 = regex.test("à");
const test2 = regex.test("Martinez-Cortez");
const test3 = regex.test("Leonardo da vinci");
const test4 = regex.test("ï");
console.log('test1', test1);
console.log('test2', test2);
console.log('test3', test3);
console.log('test4', test4);
Building off of Wak's and Cœur's answer:
/^[\-/A-Za-z\u00C0-\u017F ]+$/
Works for spaces and dashes too.
Example: Leonardo da vinci, Martinez-Cortez
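One thing worth checking with a raw code point range: \u00C0-\u017F also covers the multiplication sign × (U+00D7) and the division sign ÷ (U+00F7), which are not letters. The sketch below compares the range with a \p{L}-based class; propertyBased is my own variant and assumes Unicode property escape support.

```javascript
const rangeBased = /^[\-/A-Za-z\u00C0-\u017F ]+$/;
// Hypothetical alternative: any Unicode letter, plus space, slash and dash.
const propertyBased = /^[\p{L} /-]+$/u;

const inputs = ["Martinez-Cortez", "Leonardo da vinci", "\u00EF", "\u00D7", "\u00F7"];
const comparison = inputs.map((s) => ({
  input: s,
  range: rangeBased.test(s),
  property: propertyBased.test(s),
}));

console.log(comparison);
```

Both accept the names and ï; only the range-based version accepts × and ÷.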
Unicode allows for two alternative but equivalent representations of some accented characters. For example, é has two Unicode representations: '\u00E9' and '\u0065\u0301'. The former is called composed form and the latter is called decomposed form. JavaScript allows for conversion between the two:
'é'.normalize('NFD') // decompose: '\u00E9' -> '\u0065\u0301'
'é'.normalize('NFC') // compose: '\u0065\u0301' -> '\u00E9'
'\u00E9'.length // composed form: -> 1
'\u0065\u0301'.length // decomposed form: -> 2 (looks identical but has different representation)
'\u00E9' == '\u0065\u0301' // -> false (composed and decomposed strings are not equal)
The code point '\u0301' belongs to the Unicode Combining Diacritical Marks code block 0300-036F. So one way to match these accented characters is to compare them in decomposed form:
// matching accented characters
/[a-zA-Z][\u0300-\u036f]+/.test('é'.normalize('NFD')) // -> true
/\b\u00E9/.test('\u00E9') // -> false
/\b\u0065\u0301/.test('é'.normalize('NFD')) // -> true (NOTE: this regex uses the decomposed form)
// matching accented words
/^\w+$/.test('résumé') // -> false
/^(?:[a-zA-Z][\u0300-\u036f]*)+$/.test('résumé'.normalize('NFD')) // -> true
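Wrapping the normalization step in small helpers makes it harder to forget. The two function names below are hypothetical, and the sketch only covers letters that decompose into an ASCII base letter plus combining marks; a letter with no canonical decomposition, such as ø, would still be rejected.

```javascript
// Hypothetical helper: a "word" made of ASCII letters with optional combining marks.
function isAccentedLatinWord(text) {
  return /^(?:[a-zA-Z][\u0300-\u036f]*)+$/.test(text.normalize("NFD"));
}

// Hypothetical helper: compare strings regardless of composed/decomposed form.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const composed = "r\u00E9sum\u00E9";     // résumé, composed
const decomposed = "re\u0301sume\u0301"; // résumé, decomposed

console.log(composed === decomposed);         // false
console.log(sameText(composed, decomposed));  // true
console.log(isAccentedLatinWord(composed));   // true
console.log(isAccentedLatinWord(decomposed)); // true
console.log(isAccentedLatinWord(composed + " 2")); // false
```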