Regex match line with all words starting in uppercase - javascript

I am attempting to create a regex pattern that matches a line where all the words begin with uppercase letters, regardless of length. It must also account for any number of equals signs ('=') being on either side.
Example matches:
==This Would Match==
===I Like My Cats===
====Number Of Equals Signs Does Not Matter===
=====Nor Does Line Length Etc.=====
Non-matches:
==This would not regardless of its length==
===Nor would this match, etc===
How do I write this pattern?

You can match one or more equals signs on either side with =+.
A word that begins with a capital letter can be matched with [A-Z] followed by \w characters. If you want to allow more characters than \w, you can use a character class such as [\w.], which also matches a dot.
The pattern below matches, between the equals signs, zero or more words that start with an uppercase character and are followed by a space, and then a final word that starts with an uppercase character:
^=+(?:[A-Z]\w* )*(?:[A-Z][\w.]+)=+$
const strings = [
  "==This Would Match==",
  "===I Like My Cats===",
  "====Number Of Equals Signs Does Not Matter===",
  "=====Nor Does Line Length Etc.=====",
  "==This would not regardless of its length==",
  "===Nor would this match, etc===",
  "=aaaa="
];
let pattern = /^=+(?:[A-Z]\w* )*(?:[A-Z][\w.]+)=+$/;
strings.forEach((s) => {
  console.log(s + " ==> " + pattern.test(s));
});

This matches your desired results:
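var test = [
  "==This Would Match==",
  "===I Like My Cats===",
  "====Number Of Equals Signs Does Not Matter===",
  "=====Nor Does Line Length Etc.=====",
  "==This would not regardless of its length==",
  "===Nor would this match, etc==="
];
var reg = /=*([A-Z]\w*\W*)+=*/g;
// With the g flag, match() returns an array of all matches. The loose == coerces that
// array to a string, so for these inputs it is true only when one match covers the whole line.
console.log(test.map(t => t.match(reg) == t));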

Try this regex:
^=*[A-Z][^ ]*( [A-Z][^ ]*)*=*$
It allows for any number (including 0) of = signs on either side and requires every word to start with a capital letter.
The * quantifier means 0 or more times.
[^ ] is a negated character class, meaning it matches anything except a space.
You can try it online here.
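A quick way to check this pattern against the sample lines from the question is sketched below. The helper name isTitleCaseLine and the last two inputs are assumptions added for illustration; they are not part of the original answer.

```javascript
// Pattern from this answer: optional '=' runs, then space-separated words
// that each start with an uppercase ASCII letter.
const titleCaseLine = /^=*[A-Z][^ ]*( [A-Z][^ ]*)*=*$/;

// Hypothetical helper name, used only for this sketch.
function isTitleCaseLine(line) {
  return titleCaseLine.test(line);
}

const samples = [
  "==This Would Match==",
  "===I Like My Cats===",
  "====Number Of Equals Signs Does Not Matter===",
  "=====Nor Does Line Length Etc.=====",
  "==This would not regardless of its length==",
  "===Nor would this match, etc===",
  "No Equals Signs At All", // assumed extra input: zero '=' is allowed by =*
  "=aaaa="                  // assumed extra input: lowercase first letter
];

samples.forEach((s) => console.log(s + " ==> " + isTitleCaseLine(s)));
```

Because [^ ] also matches =, the trailing equals signs are consumed by the last word rather than by the final =*; the result for these inputs is the same either way.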

Related

Write a regex that matches with string separated by dash but should be all upper case or lower case

I am writing a regex that checks for strings like
ju-NIP-er-us skop-u-LO-rum and ui-LA-ui LO-iu-yu
that is, sets of characters separated by -.
This is what I have so far:
let str = "ju-NIP-er-us skop-u-LO-rum";
let str2 = "jU-NiP-Er-us skop-u-LO-rum"
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str)) // this matches
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str2)) // this also matches but this shouldn't match
The problem is that each set of characters separated by - should be either all uppercase or all lowercase, but right now the regex also matches mixed-case sets, as the snippet shows. How can I make this regex match only all-lowercase or all-uppercase sets between dashes?
You may use this regex to make sure to match same case substrings between - or space or ' delimiters:
^(?:\p{Lu}+|\p{Ll}+)(?:[- '](?:\p{Lu}+|\p{Ll}+))*$
Non-capturing group (?:\p{Lu}+|\p{Ll}+) matches either 1+ of uppercase unicode letters or 1+ of lowercase unicode letters but not a mix of both cases.
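As a minimal sketch, the pattern can be tried in JavaScript against the two strings from the question. Note that \p{Lu} and \p{Ll} need the u flag in JavaScript; the helper name hasSameCaseChunks is an assumption made for this example.

```javascript
// The u flag is required for \p{...} Unicode property escapes in JavaScript.
const sameCaseChunks = /^(?:\p{Lu}+|\p{Ll}+)(?:[- '](?:\p{Lu}+|\p{Ll}+))*$/u;

// Hypothetical helper name, used only for this sketch.
function hasSameCaseChunks(text) {
  return sameCaseChunks.test(text);
}

console.log(hasSameCaseChunks("ju-NIP-er-us skop-u-LO-rum")); // true
console.log(hasSameCaseChunks("jU-NiP-Er-us skop-u-LO-rum")); // false: "jU" mixes cases
console.log(hasSameCaseChunks("ui-LA-ui LO-iu-yu"));          // true
```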

How can I port this JavaScript logic as a single-line RegEx to be used in PHP?

I am trying to recreate the following conditions to validate string syntax in a single RegEx to be used with PHP. What would be the fastest regular expression to use, returning TRUE if these conditions are all met?
String must not be empty
Total string length: >= 3 and <= 16 characters
Allowed characters in string: a-z,0-9,-,.
No upper case letters allowed
First character must be a lower case letter: a-z
Last character must be a lower case letter or digit: a-z or 0-9
Cannot have two consecutive hyphens - anywhere in string
Bonus Trickiness: If a dot/period . exists, the string becomes segmented using the . as a delimiter. These "segments" have special rules:
All previous rules also still apply to the string as a whole
Segment must start with a letter: a-z
Segment must end with a letter or digit: a-z 0-9
Segment can only have letters, digits, or hyphens: a-z 0-9 -
Segment must be >= 3 characters long
This "segment" logic is really throwing me for a loop (no pun intended). I'm not sure how to incorporate everything together.
Here's an example of JavaScript that achieves the goal. I am working in PHP and need a single-line RegEx that will validate TRUE if all criteria are met. I don't need the logic to return why it failed (out of scope). I only need a TRUE/FALSE RegEx. Just leaving this snippet in case it is helpful:
export function validateAccountName(value) {
  let i, label, len, suffix;
  suffix = "Account name should ";
  if (!value) {
    return suffix + "not be empty.";
  }
  const length = value.length;
  if (length < 3) {
    return suffix + "be longer.";
  }
  if (length > 16) {
    return suffix + "be shorter.";
  }
  if (/\./.test(value)) {
    suffix = "Each account segment should ";
  }
  const ref = value.split(".");
  for (i = 0, len = ref.length; i < len; i++) {
    label = ref[i];
    if (!/^[a-z]/.test(label)) {
      return suffix + "start with a letter.";
    }
    if (!/^[a-z0-9-]*$/.test(label)) {
      return suffix + "have only letters, digits, or dashes.";
    }
    if (/--/.test(label)) {
      return suffix + "have only one dash in a row.";
    }
    if (!/[a-z0-9]$/.test(label)) {
      return suffix + "end with a letter or digit.";
    }
    if (!(label.length >= 3)) {
      return suffix + "be longer";
    }
  }
  return null;
}
Thanks!
You might use a positive lookahead to assert the length, including the first and last character, and another negative lookahead to assert not 2 consecutive hyphens.
^(?=[a-z][a-z\d.-]{1,14}[a-z\d]$)(?!.*--)[a-z\d-]+(?:\.[a-z][a-z\d-]+[a-z\d])*$
The pattern matches
^ Start of string
(?=[a-z][a-z\d.-]{1,14}[a-z\d]$) Positive lookahead to assert 3-16 chars, starting with a-z and end with a-z\d
(?!.*--) Negative lookahead to assert not --
[a-z\d-]+ Match 1+ times any of the listed chars (without a dot)
(?: Non capture group
\.[a-z][a-z\d-]+[a-z\d] Match a . and repeat segments of at least 3 chars starting with a-z, then 1+ times any of the listed without a dot and ending on a-z\d without the hyphen
)* Close the non capture group and optionally repeat
$ End of string
If all the parts should be at least 3 characters long when a dot is present, you can match at least 3 characters per segment and shorten the pattern a bit by repeating the first group with (?1), a subroutine call that PCRE (and therefore PHP) supports:
^(?=[a-z][a-z\d.-]{1,14}[a-z\d]$)(?!.*--)([a-z][a-z\d-]+[a-z\d]+)(?:\.(?1))*$
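For comparison with the JavaScript from the question, here is a rough sketch of the same pattern in JavaScript. JavaScript regular expressions have no (?1) subroutine call, so the segment sub-pattern is written out twice. The names accountName and isValidByRules and the list of candidates are assumptions made for this example; isValidByRules is a boolean rewrite of the question's validateAccountName checks.

```javascript
// Segment: starts with a-z, ends with a-z or a digit, at least 3 characters.
const segment = "[a-z][a-z\\d-]+[a-z\\d]+";

// No (?1) in JavaScript, so the segment is repeated explicitly.
const accountName = new RegExp(
  "^(?=[a-z][a-z\\d.-]{1,14}[a-z\\d]$)(?!.*--)" +
    segment + "(?:\\." + segment + ")*$"
);

// Boolean version of the question's validation rules (hypothetical name).
function isValidByRules(value) {
  if (!value || value.length < 3 || value.length > 16) return false;
  return value.split(".").every(
    (label) =>
      /^[a-z]/.test(label) &&
      /^[a-z0-9-]*$/.test(label) &&
      !/--/.test(label) &&
      /[a-z0-9]$/.test(label) &&
      label.length >= 3
  );
}

// Assumed sample inputs, chosen to exercise each rule.
const candidates = [
  "abc", "ab", "abc.def", "ab.def", "abc--def", "abc-def",
  "abc-.def", "1abc", "Abc", "abcdefghijklmnop", "abcdefghijklmnopq"
];

for (const c of candidates) {
  console.log(c, accountName.test(c), isValidByRules(c));
}
```

Both columns should agree for every candidate; a mismatch would point to a rule that the regex does not capture.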
You may try this regex as well:
^[a-z](?!.*--)(?!.*\.([^.]{0,2}(\.|$)|[.\d-]))[a-z.\d-]{1,14}[a-z\d]$
RegEx Explained:
^: Start
[a-z]: Match a character in a-z
(?!.*--): Negative lookahead to fail the match if 2 consecutive hyphens are found anywhere
(?!.*\.([^.]{0,2}(\.|$)|[.\d-])): Negative lookahead to fail the match if fewer than 3 non-dot characters follow a dot, or if a dot is followed by a non-letter character
[a-z.\d-]{1,14}: Match 1 to 14 instances of a-z, a digit, a dot or a hyphen
[a-z\d]: Match a-z or a digit
$: End

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
// The symbol class covers the four ASCII punctuation ranges: !-/  :-@  [-`  {-~
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex; it uses some JS helper code.
Rough explanation: We first split the string into an array of substrings, divided by spaces. We then transform each substring by stripping the leading and trailing special characters. We match those special characters with [^a-zA-Z0-9ê]*: the ^ right after the opening bracket negates the class, so it matches every character except those listed, i.e. all special characters. Between these two parts we capture all relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? after the * makes the quantifier lazy, so the group stops capturing as soon as the following part, which matches the trailing special characters, can match through to the end.
We also start the regex with ^ and end it with $ so that it covers the entire substring (they anchor the match to the start and the end of the string, respectively). With .exec(s)[1] we then run the regex on the substring and return the first capturing group from our transform function. Note that this is an empty string if a substring contains none of the allowed characters. At the end we join the substrings with spaces.
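The same split-and-strip idea can be sketched with Unicode property escapes, so that letters from other alphabets do not have to be listed by hand. This is a variant, not the code from the answers above. It assumes that "special" means anything that is neither a Unicode letter (\p{L}) nor a Unicode number (\p{N}); the function name trimSpecialChars is made up for this example.

```javascript
// Assumption: a "special" character is anything that is not a Unicode letter or number.
// The u flag is required for \p{...} in JavaScript.
function trimSpecialChars(text) {
  return text
    .split(" ")
    // Strip a leading run and a trailing run of special characters from each word.
    .map((word) => word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ""))
    .join(" ");
}

const input = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
console.log(trimSpecialChars(input));
// a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
```

Special characters between letters are left alone because each alternative is anchored to the start or the end of the word.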

How to split a string based of capital letters?

I have a string I need to split based on capital letters. My code is below:
let s = 'OzievRQ7O37SB5qG3eLB';
var res = s.split(/(?=[A-Z])/)
console.log(res);
But there is a twist: if the capital letters are contiguous, I need the regex to "eat" until the sequence ends. In the example above it returns
..R,Q7,O37,S,B5q,G3e,L,B
And the result should be
RQ7,O37,SB5q,G3e,LB
Thoughts? Thanks.
You need to match these chunks with /[A-Z]+[^A-Z]*|[^A-Z]+/g instead of splitting with a zero-width assertion pattern, because the latter (in your case, it is a positive lookahead only regex) will have to check each position inside the string and it is impossible to tell the regex to skip a position once the lookaround pattern is found.
s = 'and some text hereOzievRQ7O37SB5qG3eLB';
console.log(s.match(/[A-Z]+[^A-Z]*|[^A-Z]+/g));
See the online regex demo at regex101.com.
Details:
[A-Z]+ - one or more uppercase ASCII letters
[^A-Z]* - zero or more (to allow matching uppercase only chunks) chars other than uppercase ASCII letters
| - or
[^A-Z]+ - one or more chars other than uppercase ASCII letters (to allow matching non-uppercase ASCII letters at the start of the string).
The g global modifier will let String#match() return all found non-overlapping matches.
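Applied to the string from the question, the matching approach can be wrapped in a small function. The name splitOnCapitalRuns is an assumption for this sketch, as is the fallback to an empty array when nothing matches.

```javascript
// Hypothetical helper: returns chunks that start with a run of uppercase ASCII letters,
// plus a leading chunk of non-uppercase characters if the string starts with one.
function splitOnCapitalRuns(text) {
  // String#match returns null when there is no match (e.g. for an empty string).
  return text.match(/[A-Z]+[^A-Z]*|[^A-Z]+/g) || [];
}

console.log(splitOnCapitalRuns("OzievRQ7O37SB5qG3eLB"));
// ["Oziev", "RQ7", "O37", "SB5q", "G3e", "LB"]
```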

validating variable in javascript

Hi, I have a field in PHP that will be validated in JavaScript, using a regex like this one for emails:
var emailRegex = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;
What I'm after is a validation check which will look for:
the first letter as a capital Q
then the next letters can be numbers only
then followed by a .
then two numbers only
and then an optional letter
e.g. Q100.11 or Q100.11a
I must admit I look at the above email validation check and I have no clue how it works, but it does ;)
Many thanks for any help on this.
Steve
The ^ marks the beginning of the string and $ matches the end of the string. In other words, the whole string must match this regular expression exactly.
[\w-\.]+: I think you wanted to match letters, digits, dots and - only. In that case, the - should be escaped (\-): [\w\-\.]+. The plus sign makes it match one or more times.
@: a literal @ match
([\w-]+\.)+: letters, digits and - are allowed one or more times, followed by a dot (the part between the parentheses). This group may occur several times (at least once).
[\w-]{2,4}: this should match the TLD, like com, net or org. Because a TLD can only contain letters, it should be replaced by [a-z]{2,4}. This means: lowercase letters may occur two to four times. Note that a TLD can be longer than 4 characters.
A regular expression for your case should follow these rules:
a capital Q (Q)
followed by one or more digits (\d+)
a literal dot (\.)
two digits (\d{2})
one optional letter ([a-z]?)
Result (anchored with ^ and $ so that the whole string must match):
var regex = /^Q\d+\.\d{2}[a-z]?$/;
If you need case-insensitive matching, add the i (case-insensitive) modifier:
var regex = /^Q\d+\.\d{2}[a-z]?$/i;
Validating a string using a regexp can be done in several ways, one of them:
if (regex.test(str)) {
  // success
} else {
  // no match
}
var emailRegex = /^Q\d+\.\d{2}[a-zA-Z]?@([\w-]+\.)+[a-zA-Z]+$/;
var str = "Q100.11@test.com";
alert(emailRegex.test(str));
var regex = /^Q[0-9]+\.[0-9]{2}[a-z]?$/;
+ means one or more
the period must be escaped - \.
[0-9]{2} means 2 digits, same as \d{2}
[a-z]? means 0 or 1 letter
You can check your regex at http://regexpal.com/
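To see how the anchored pattern behaves, here is a minimal sketch that runs it against a few inputs. The helper name isValidCode and every sample other than Q100.11 and Q100.11a are assumptions added for illustration.

```javascript
// Anchored pattern: Q, one or more digits, a dot, exactly two digits, optional lowercase letter.
const codePattern = /^Q[0-9]+\.[0-9]{2}[a-z]?$/;

// Hypothetical helper name, used only for this sketch.
function isValidCode(value) {
  return codePattern.test(value);
}

const codes = ["Q100.11", "Q100.11a", "q100.11", "Q100.1", "Q.11", "Q100.11ab", "xQ100.11"];
codes.forEach((code) => console.log(code + " ==> " + isValidCode(code)));
```

Only the first two inputs pass. Without the ^ and $ anchors, inputs such as xQ100.11 and Q100.11ab would also be accepted, because the pattern would match a substring.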
