JavaScript: Detect if a string is trimmed

I'm trying to come up with a Regexp that detects whether a string in Javascript is trimmed or not. That means, it doesn't have whitespace at the beginning or end.
So far I have
^[^\s].*[^\s]$
This works if the string is long, but for short strings such as a, it will not work because the pattern wants a non-space, any character, then another non-space. That a is trimmed, but doesn't follow the pattern.
Help me find the correct regex.

Try this to make a second char optional:
^[^\s](.*[^\s])?$
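A quick way to check this pattern is to run it against a few inputs. This is only a sketch; the helper name isTrimmedOptional is made up for the example. Two edge cases follow from the pattern itself: the empty string is rejected (the first character is required), and because . does not match line terminators, a string with an embedded newline is rejected too.

```javascript
// Hypothetical helper wrapping the pattern suggested above.
const trimmedPattern = /^[^\s](.*[^\s])?$/;

function isTrimmedOptional(s) {
  return trimmedPattern.test(s);
}

console.log(isTrimmedOptional("a"));    // true
console.log(isTrimmedOptional("a b"));  // true
console.log(isTrimmedOptional(" a"));   // false
console.log(isTrimmedOptional("a "));   // false
console.log(isTrimmedOptional(""));     // false (first character is mandatory)
console.log(isTrimmedOptional("a\nb")); // false (. does not match \n)
```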

One option is to take your existing regex (^[^\s].*[^\s]$) and add a separate part to it specifically to test for a single non-space character (^[^\s]$), combining the two parts with |:
/(^[^\s]$)|(^[^\s].*[^\s]$)/
But I find sometimes it is simpler to test for the opposite case and then invert the result with !. So in your case, have the regex test for a space at the beginning or end:
!/(^\s)|(\s$)/.test(someString)
Encapsulated in a function:
function isTrimmed(s) {
  return !/(^\s)|(\s$)/.test(s);
}
console.log(isTrimmed(" a ")) // false
console.log(isTrimmed("a ")) // false
console.log(isTrimmed(" a")) // false
console.log(isTrimmed("a")) // true
console.log(isTrimmed("abc")) // true
console.log(isTrimmed("a b c")) // true
console.log(isTrimmed("")) // true
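If a regex is not strictly required, a different technique (not taken from the answers here) is to compare the string with its own trimmed copy. The sketch below assumes that String.prototype.trim() and \s agree on what counts as whitespace, which they should in modern engines; the function name isTrimmedByCompare is hypothetical.

```javascript
// Alternative sketch: a string is trimmed if trim() leaves it unchanged.
function isTrimmedByCompare(s) {
  return s === s.trim();
}

console.log(isTrimmedByCompare(" a "));   // false
console.log(isTrimmedByCompare("a\n"));   // false (line terminators are trimmed too)
console.log(isTrimmedByCompare("\ta"));   // false
console.log(isTrimmedByCompare("a"));     // true
console.log(isTrimmedByCompare("a b c")); // true
console.log(isTrimmedByCompare(""));      // true
```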

You can use the below regex :
^([\s]+[\w]*)?([\w]*[\s]+)?$

Might not be the nicest answer but Negative lookahead should work:
/^(?!(?:.*\s$)|(?:^\s.*)).*$/

(?=^\S)[\W\w]*\S$
(?=^\S) Lookahead for non-space character at the start of the string (matches without consuming the string; will start match from first character again)
[\W\w]* Match any number of characters
\S$ Match non-space character at the end of the string
So for a:
(?=^\S) Matches a at the start of the string (resets current position to the start of the string)
[\W\w]* Matches no characters, as it needs to match the next part
\S$ Matches a at the end of the string
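The walkthrough above can be checked with a short sketch. The variable name lookaheadPattern is just for illustration. Note that [\W\w] also matches line terminators, so unlike a pattern built on ., this one accepts strings with embedded newlines, while still rejecting the empty string.

```javascript
// The lookahead pattern from the explanation above.
const lookaheadPattern = /(?=^\S)[\W\w]*\S$/;

console.log(lookaheadPattern.test("a"));    // true
console.log(lookaheadPattern.test("a b"));  // true
console.log(lookaheadPattern.test("a\nb")); // true ([\W\w] matches \n)
console.log(lookaheadPattern.test(" a"));   // false
console.log(lookaheadPattern.test("a "));   // false
console.log(lookaheadPattern.test(""));     // false (needs at least one non-space)
```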

Related

Do not allow '.'(dot) anywhere in a string (regular expression)

I have a regular expression for allowing unicode chars in names(Spanish, Japanese etc), but I don't want to allow '.'(dot) anywhere in the string.
I have tried this regex but it fails when string length is less than 3. I am using xRegExp.
^[^.][\\pL ,.'-‘’][^.]+$
For Example:
NOËL // true
Sanket ketkar // true
.sank // false
san. ket // false
NOËL.some // false
Basically it should return false when name has '.' in it.
Your pattern ^[^.][\\pL ,.'-‘’][^.]+$ matches at least three characters because it uses three character classes: the first two each match exactly one character, and the last one matches one or more.
You could remove the dot from the character class and repeat that class one or more times, so that strings shorter than three characters also match:
^[\p{L} ,'‘’-]+$
Or you could use a negated character class:
^[^.\r\n]+$
^ Start of string
[^.\r\n]+ Negated character class, match any char except a dot or newline
$ End of string
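For anyone trying these two patterns in plain JavaScript rather than XRegExp, here is a minimal sketch. It assumes an engine with ES2018 Unicode property escapes, which require the u flag; the variable names are made up, and the curly quotes are written as \u2018 and \u2019 escapes to avoid encoding problems.

```javascript
// Allow-list version: letters, space, comma, apostrophe, curly quotes, hyphen.
const allowedChars = /^[\p{L} ,'\u2018\u2019-]+$/u;
// Deny-list version: anything except a dot or a line break.
const noDot = /^[^.\r\n]+$/;

// "NO\u00CBL" is NOËL with a composed Ë.
const names = ["NO\u00CBL", "Sanket ketkar", ".sank", "san. ket", "NO\u00CBL.some", "ab"];

for (const name of names) {
  console.log(name, allowedChars.test(name), noDot.test(name));
}
// NOËL, "Sanket ketkar" and "ab" pass both checks; the three names with a dot fail both.
```

The allow-list is stricter: a name containing a digit would pass noDot but fail allowedChars.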
You could try:
^[\p{L},\-\s‘’]+(?!\.)$
As seen here: https://regex101.com/r/ireqbW/5
Explanation -
The first part of the regex, [\p{L},\-\s‘’]+, matches one or more Unicode letters, commas, hyphens, whitespace characters (given by \s), or curly quotes.
(?!\.) is a negative lookahead, which asserts that the match is not followed by a .
^[^.]+$
It will match any non-empty string that does not contain a dot between the start and the end of the string.
If there is a dot somewhere between start to end (i.e. anywhere) it will fail.

regex - don't allow name to finish with hyphen

I'm trying to create a regex using javascript that will allow names like abc-def but will not allow abc-
(hyphen is also the only nonalpha character allowed)
The name has to be a minimum of 2 characters. I started with
^[a-zA-Z-]{2,}$, but it's not good enough so I'm trying something like this
^([A-Za-z]{2,})+(-[A-Za-z]+)*$.
It can have more than one - in a name but it should never start or finish with -.
It's allowing names like xx-x but not names like x-x. I'd like to achieve that x-x is also accepted but not x-.
Thanks!
Option 1
This option matches strings that begin and end with a letter and ensures two - are not consecutive, so a string like a--a is invalid. To allow this case, see Option 2.
^[a-z]+(?:-?[a-z]+)+$
^ Assert position at the start of the line
[a-z]+ Match any lowercase ASCII letter one or more times (with i flag this also matches uppercase variants)
(?:-?[a-z]+)+ Match the following one or more times
-? Optionally match -
[a-z]+ Match any ASCII letter (with i flag)
$ Assert position at the end of the line
var a = [
  "aa", "a-a", "a-a-a", "aa-aa-aa", "aa-a", // valid
  "aa-a-", "a", "a-", "-a", "a--a" // invalid
];
var r = /^[a-z]+(?:-?[a-z]+)+$/i;
a.forEach(function (s) {
  console.log(`${s}: ${r.test(s)}`);
});
Option 2
If you want to match strings like a--a then you can instead use the following regex:
^[a-z]+[a-z-]*[a-z]+$
var a = [
  "aa", "a-a", "a-a-a", "aa-aa-aa", "aa-a", "a--a", // valid
  "aa-a-", "a", "a-", "-a" // invalid
];
var r = /^[a-z]+[a-z-]*[a-z]+$/i;
a.forEach(function (s) {
  console.log(`${s}: ${r.test(s)}`);
});
You can use a negative lookahead:
/(?!.*-$)^[a-z][a-z-]+$/i
Breakdown:
// Negative lookahead so that it can't end with a -
(?!.*-$)
// The actual string must begin with a letter a-z
[a-z]
// Any following characters can be a-z or -; there must be at least 1 of these
[a-z-]+
let regex = /(?!.*-$)^[a-z][a-z-]+$/i;
let test = [
  'xx-x',
  'x-x',
  'x-x-x',
  'x-',
  'x-x-x-',
  '-x',
  'x'
];
test.forEach(string => {
  console.log(string, ':', regex.test(string));
});
The problem is that the first group requires two or more letters ([A-Za-z]{2,}). You will need to modify it to accept one or more characters:
^[A-Za-z]+((-[A-Za-z]{1,})+)?$
Edit: solved some commented issues
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-dfe'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('x-d'); // Logs true
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('xggg-'); // Logs false
Edit 2: Edited to accept characters only
/^[A-Za-z]+((-[A-Za-z]{1,})+)?$/.test('abc'); // Logs true
Use this if you also want to accept names such as A---A:
^(?!-|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/4UYd9l/4/
If you don't want to accept names such as A---A, use this instead:
^(?!-|.*[-]{2,}.*|.*-$)[A-Za-z-]{2,}$
https://regex101.com/r/qH4Q0q/4/
Both patterns accept only strings of at least two characters drawn from [A-Za-z-], and the (?!-|.*-$) part of the negative lookahead ensures the string does not start or end with -.
Try this /([a-zA-Z]{1,}-[a-zA-Z]{1,})/g
I suggest the following :
^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$
It validates :
that the matched string is at least composed of two characters (the first and last character classes are matched exactly once)
that the first and the last characters aren't dashes (the first and last character classes do not include -)
that the string can contain dashes and be greater than 2 characters (the second character class includes dashes and will consume as much characters as needed, dashes included).
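The three properties listed above can be seen in a small sketch. The name namePattern is only for illustration. Note that this pattern accepts consecutive dashes such as a--a.

```javascript
// Letter, then any mix of letters and dashes, then a letter.
const namePattern = /^[a-zA-Z][a-zA-Z-]*[a-zA-Z]$/;

const accepted = ["xx", "x-x", "abc-def", "a--a"];
const rejected = ["x", "x-", "-x", "abc-"];

for (const s of accepted) console.log(s, namePattern.test(s)); // all true
for (const s of rejected) console.log(s, namePattern.test(s)); // all false
```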
^(?=[A-Za-z](?:-|[A-Za-z]))(?:(?:-|^)[A-Za-z]+)+$
Asserts that
the first character is a-z
the second is a-z or hyphen
If this matches
looks for groups of one or more letters prefixed by a hyphen or start of string, all the way to end of string.
You can also use the i flag to make it case insensitive.
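Here is a sketch of that pattern with the i flag applied, so the character classes are shortened to [a-z]. The variable name strictName is hypothetical. Because every group of letters must be prefixed by exactly one hyphen (or the start of the string), consecutive dashes are rejected.

```javascript
// Lookahead enforces the two-character minimum; the groups enforce single hyphens.
const strictName = /^(?=[a-z](?:-|[a-z]))(?:(?:-|^)[a-z]+)+$/i;

console.log(strictName.test("x-x"));     // true
console.log(strictName.test("xx"));      // true
console.log(strictName.test("ABC-def")); // true (i flag)
console.log(strictName.test("x"));       // false (too short)
console.log(strictName.test("x-"));      // false (trailing hyphen)
console.log(strictName.test("-x"));      // false (leading hyphen)
console.log(strictName.test("a--a"));    // false (consecutive hyphens)
```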

Javascript test method not working ( as expected )

Regular expression
[A-Za-z_-]+
should match strings that only have upper and lower case letters, underscores, and a dash
but when I run in chrome console
/[A-Za-z_-]+/.test("johnSmith12")
Why does it return true?
Because you didn't anchor the expression. You need to add ^ and $, which match beginning and end of string.
For example:
^[A-Za-z_-]+$
Just the [A-Za-z_-]+ will match johnSmith in your example, leaving out the 12 (as David Starkey pointed out).
It is due to your regex looking for any sequence of characters within the test string that matches the regex. In your example, "johnSmith" matches your regex criteria, and so test returns true.
If you instead put ^ (start of string) and $ (end of string) at the ends of your regex, then you would assert that the entire string must match your regex:
/^[A-Za-z_-]+$/.test("johnSmith12");
This will return false.
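To see what the unanchored regex actually matched, you can look at the result of String.prototype.match. A small sketch (the variable names are made up):

```javascript
const unanchored = /[A-Za-z_-]+/;
const anchored = /^[A-Za-z_-]+$/;
const input = "johnSmith12";

// The unanchored pattern finds a matching substring and stops there.
const partial = input.match(unanchored)[0];
console.log(partial);                        // "johnSmith"
console.log(unanchored.test(input));         // true
// The anchored pattern must cover the whole string.
console.log(anchored.test(input));           // false
console.log(anchored.test("john_Smith-jr")); // true
```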

What does this `/^.*$/` regex match?

I was maintaining some old code when I hit a head-scratcher. I am confused by this regex pattern: /^.*$/ (as supplied as an argument in textFieldValidation(this,'true',/^.*$/,'','')).
I interpret this regex as:
/^=open pattern
.=match a single character of any value (Except EOL)
*=match 0 or more times
$=match end of the line
/=close pattern
So…I think this pattern matches everything, which means the function does nothing but waste processing cycles. Am I correct?
It matches a single line of text.
It will fail to match a multiline string, because ^ matches the beginning of input, $ matches the end of input, and . does not match line terminators. If there is a newline (\n) or carriage return (\r) anywhere in between, the match fails.
For example, 'foo'.match(/^.*$/) returns foo.
But 'foo\nfoo'.match(/^.*$/) returns null.
^ "Starting at the beginning."
. "Matching anything..."
* "0 or more times"
$ "To the end of the line."
Yep, you're right: that matches an empty string or any single line of text.
The regexp checks that the string doesn't contain any \n or \r. Dots do not match new-lines.
Examples:
/^.*$/.test(""); // => true
/^.*$/.test("aoeu"); // => true
/^.*$/.test("aoeu\n"); // => false
/^.*$/.test("\n"); // => false
/^.*$/.test("aoeu\nfoo"); // => false
/^.*$/.test("\nfoo"); // => false
Yes, you are quite correct. This regex matches any string that does not contain an EOL (if dotall=false), or any string at all (if dotall=true).
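In JavaScript, the dotall behaviour mentioned above corresponds to the s flag (ES2018), and the m flag changes what ^ and $ mean. A short sketch, assuming an engine that supports the s flag; the variable names are just for the example:

```javascript
const text = "foo\nbar";

const plain = /^.*$/.test(text);      // false: . stops at \n, $ needs end of input
const dotAll = /^.*$/s.test(text);    // true: with s, . also matches line terminators
const firstLine = text.match(/^.*$/m)[0]; // "foo": with m, ^ and $ match at line boundaries
const allLines = text.match(/^.*$/gm);    // ["foo", "bar"]

console.log(plain, dotAll, firstLine, allLines);
```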

Matching accented characters with Javascript regexes

Here's a fun snippet I ran into today:
/\ba/.test("a") --> true
/\bà/.test("à") --> false
However,
/à/.test("à") --> true
Firstly, wtf?
Secondly, if I want to match an accented character at the start of a word, how can I do that? (I'd really like to avoid using over-the-top selectors like /(?:^|\s|'|\(\) ....)
This worked for me:
/^[a-z\u00E0-\u00FC]+$/i
The reason why /\bà/.test("à") doesn't match is because "à" is not a word character. The escape sequence \b matches only between a boundary of word character and a non word character. /\ba/.test("a") matches because "a" is a word character. Because of that, there is a boundary between the beginning of the string (which is not a word character) and the letter "a" which is a word character.
Word characters in JavaScript's regex are defined as [a-zA-Z0-9_].
To match an accented character at the start of a string, just use the ^ character at the beginning of the regex (e.g. /^à/). That character matches the beginning of the string (unlike \b, which matches at any word boundary within the string). It's one of the most basic and standard regular expression constructs, so it's definitely not over the top.
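The ASCII-only definition of word characters has a second consequence: \b can report a boundary in the middle of an accented word. The sketch below shows this and, as one possible workaround (an assumption on my part, not something from the question), uses a negative lookbehind with Unicode property escapes as a Unicode-aware "start of word" check. It needs an engine with lookbehind and the u flag; à is written as \u00E0 so the composed form is used.

```javascript
const asciiBoundary = /\b\u00E0/;                     // \b followed by à
const unicodeWordStart = /(?<![\p{L}\p{N}_])\u00E0/u; // à not preceded by a letter, digit or _

const alone = "\u00E0";                 // "à"
const insideWord = "d\u00E9j\u00E0";    // "déjà"
const sentence = "il y a \u00E0 peine"; // à starts a word

console.log(asciiBoundary.test(alone));         // false: no word character on either side
console.log(asciiBoundary.test(insideWord));    // true: \b sits between j (word) and à (non-word)
console.log(unicodeWordStart.test(alone));      // true
console.log(unicodeWordStart.test(insideWord)); // false
console.log(unicodeWordStart.test(sentence));   // true
```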
If you want to match letters, whether or not they're accented, unicode property escapes can be helpful.
/\p{Letter}*/u.test("à"); // true
/\p{Letter}/u.test('œ'); // true
/\p{Letter}/u.test('a'); // true
/\p{Letter}/u.test('3'); // false
Matching to the start of a word is tricky, but (?<=(?:^|\s)) seems to do the trick. The (?<= ) is a positive lookbehind, ensuring that something exists before the main expression. The (?: ) is a non-capture group, so you don't end up with a reference to this part in whatever match you use later. Then the ^ will match the start of the string if the multiline flag isn't set or the start of the line if the multiline flag is set and the \s will match a whitespace character (space/tab/linebreak).
So using them together, it would look something like:
/(?<=(?:^|\s))\p{Letter}*/u
If you want to only match accented characters to the start of the string, you'd want a negated character set for a-zA-Z.
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("bœ") // false
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("œb") // true
// Match characters, accented or not
let regex = /\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log(regex.test("16 tons")); // true
console.log(regex.test("3 œ")); // true
console.log('-----');
// Match characters to start of line, only match characters
regex = /(?<=(?:^|\s))\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log('----');
// Match accented character to start of word, only match characters
regex = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // false
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
Stack Overflow also has a question about non-ASCII characters in regexes; you can find it here. It does not deal with word boundaries, but it may still give you useful hints.
There is another page, but its author wants to match whole strings rather than words.
I don't know of an anchor that solves your problem, and I couldn't find one just now. But seeing what monster regexes are used in my first link, the group that you want to avoid is not over the top and is, in my opinion, your solution.
const regex = /^[\-/A-Za-z\u00C0-\u017F ]+$/;
const test1 = regex.test("à");
const test2 = regex.test("Martinez-Cortez");
const test3 = regex.test("Leonardo da vinci");
const test4 = regex.test("ï");
console.log('test1', test1);
console.log('test2', test2);
console.log('test3', test3);
console.log('test4', test4);
Building off of Wak's and Cœur's answer:
/^[\-/A-Za-z\u00C0-\u017F ]+$/
Works for spaces and dashes too.
Example: Leonardo da vinci, Martinez-Cortez
Unicode allows for two alternative but equivalent representations of some accented characters. For example, é has two Unicode representations: '\u00E9' and '\u0065\u0301'. The former is called the composed form and the latter the decomposed form. JavaScript allows for conversion between the two:
'\u00E9'.normalize('NFD') // decompose: '\u00E9' -> '\u0065\u0301'
'\u0065\u0301'.normalize('NFC') // compose: '\u0065\u0301' -> '\u00E9'
'\u00E9'.length // composed form: -> 1
'\u0065\u0301'.length // decomposed form: -> 2 (looks identical but has different representation)
'\u00E9' == '\u0065\u0301' // -> false (composed and decomposed strings are not equal)
The code point '\u0301' belongs to the Unicode Combining Diacritical Marks code block 0300-036F. So one way to match these accented characters is to compare them in decomposed form:
// matching accented characters
/[a-zA-Z][\u0300-\u036f]+/.test('é'.normalize('NFD')) // -> true
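/\b\u00E9/.test('\u00E9') // -> false (composed form: é is not a word character)
/\be\u0301/.test('\u00E9'.normalize('NFD')) // -> true (NOTE: the pattern uses the decomposed form, which starts with the word character e)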
// matching accented words
/^\w+$/.test('résumé') // -> false
/^(?:[a-zA-Z][\u0300-\u036f]*)+$/.test('résumé'.normalize('NFD')) // -> true
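Building on the decomposed form, one related trick is to drop the combining marks entirely and then match with plain ASCII patterns. This is only a sketch: the helper name stripDiacritics is made up, and it only handles accents that decompose into a base letter plus marks in the 0300-036F block (so characters such as œ or ø are left as they are).

```javascript
// Decompose, then remove combining diacritical marks (U+0300 to U+036F).
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

const composed = "r\u00E9sum\u00E9"; // "résumé" in composed form

console.log(composed.length);                           // 6
console.log(composed.normalize("NFD").length);          // 8 (two extra combining marks)
console.log(stripDiacritics(composed));                 // "resume"
console.log(/^\w+$/.test(stripDiacritics(composed)));   // true
```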
