How to split a string based of capital letters? - javascript

I have a string I need to split based on capital letters,my code below
let s = 'OzievRQ7O37SB5qG3eLB';
var res = s.split(/(?=[A-Z])/)
console.log(res);
But there is a twist,if the capital letters are contiguous I need the regex to "eat" until this sequence ends.In the example above it returns
..R,Q7,O37,S,B5q,G3e,L,B
And the result should be
RQ7,O37,SB5q,G3e,LB
Thoughts?Thanks.

You need to match these chunks with /[A-Z]+[^A-Z]*|[^A-Z]+/g instead of splitting with a zero-width assertion pattern, because the latter (in your case, it is a positive lookahead only regex) will have to check each position inside the string and it is impossible to tell the regex to skip a position once the lookaround pattern is found.
s = 'and some text hereOzievRQ7O37SB5qG3eLB';
console.log(s.match(/[A-Z]+[^A-Z]*|[^A-Z]+/g));
See the online regex demo at regex101.com.
Details:
[A-Z]+ - one or more uppercase ASCII letters
[^A-Z]* - zero or more (to allow matching uppercase only chunks) chars other than uppercase ASCII letters
| - or
[^A-Z]+ - one or more chars other than uppercase ASCII letters (to allow matching non-uppercase ASCII letters at the start of the string.
The g global modifier will let String#match() return all found non-overlapping matches.

Related

Write a regex that matches with string separated by dash but should be all upper case or lower case

I am writing a regex that checks for the strings like
ju-NIP-er-us skop-u-LO-rum and ui-LA-ui LO-iu-yu
the set of characters separated by the -
this is what I have got
let str = "ju-NIP-er-us skop-u-LO-rum";
let str2 = "jU-NiP-Er-us skop-u-LO-rum"
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str)) // this matches
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str2)) // this also matches but this shouldn't match
The problem is set of characters separated by the - should either be all capital or all smaller ,but right now it is matching the mix characters as well ,see the snippet .How to make this regex match only all small or all caps between dashes.
You may use this regex to make sure to match same case substrings between - or space or ' delimiters:
^(?:\p{Lu}+|\p{Ll}+)(?:[- '](?:\p{Lu}+|\p{Ll}+))*$
RegEx Demo
Non-capturing group (?:\p{Lu}+|\p{Ll}+) matches either 1+ of uppercase unicode letters or 1+ of lowercase unicode letters but not a mix of both cases.

Matching Arabic and English letters only javascript regex

I'm trying to write a regex that matches Arabic and English letters only (numbers and special characters are not allowed) spaces are allowed.
This regex worked fine but allows numbers in the middle of the string
/[\u0620-\u064A\040a-zA-Z]+$/
for example, it matches (سم111111ر) which suppose not to match.
The question is there a way not to match numbers in the middle of the letters.
Note in JavaScript you will have to use the ECMAScript 2018+ with Unicode category class support:
const texts = ['أسبوع أسبوع','week week','hunāka','سم111111ر'];
const re = /^(?:(?=[\p{Script=Arabic}A-Za-z])\p{L}|\s)+$/u;
for (const text of texts) {
console.log(text, '=>', re.test(text))
}
The ^(?:(?=[\p{Script=Arabic}A-Za-z])\p{L}|\s)+$ means
^ - start of string
(?: - start of a non-capturing group container:
(?=[\p{Script=Arabic}A-Za-z]) - a positive lookahead that requires a char from the Arabic script or an ASCII letter to occur immediately to the right of the current location
\p{L} - any Unicode letter (note \p{Alphabetic} includes a bit more "letter" chars, you may want to try it out)
| - or
\s - whitespace
)+ - repeat one or more times
$ - end of string.

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-#\[-``\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-#\[-``\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*([^ ]*?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*(?![!-\/:-#\[-`\{-~])([^ ]+?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex, but uses JS help code.
Rough explanation: We first split the string into an array of strings, divided by spaces. We then transform each of the substrings by stripping
the leading and trailing special characters. We do this by capturing all special characters with [^a-zA-Z0-9ê]*, because of the leading ^ character it matches all characters except those listed, so all special characters. Between these two groups we capture all relevant characters with ([\w\W]*?). \w catches words, \W catches non-words, so \w\W catches all possible characters. By appending the ? after the *, we make the quantifier * lazy, so that the group stops catching as soon as the next group, which catches trailing special characters, catches something. We also start the regex with a ^ symbol and end it with an $ symbol to capture the entire string (they respectively set anchors to the start end the end of the string). With .exec(s)[1] we then execute the regex on the substring and return the first capturing group result in our transform function. Note that this might be null if a substring does not include proper characters. At the end we join the substrings with spaces.

Javascript in regexp not matching something

I want to match everything except the one with the string '1AB' in it. How do I do that? When I tried it, it said nothing is matched.
var text = "match1ABmatch match2ABmatch match3ABmatch";
var matches = text.match(/match(?!1AB)match/g);
console.log(matches[0]+"..."+matches[1]);
Lookarounds do not consume the text, i.e. the regex index does not move when their patterns are matched. See Lookarounds Stand their Ground for more details. You still must match the text with a consuming pattern, here, the digits.
Add \w+ word matching pattern after the lookahead. NOTE: You may also use \S+ if there can be any one or more non-whitespace chars. If there can be any chars, use .+ (to match 1 or more chars other than line break chars) or [^]+ (matches even line breaks).
var text = "match100match match200match match300match";
var matches = text.match(/match(?!100(?!\d))\w+match/g);
console.log(matches);
Pattern details
match - a literal substring
(?!100(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there is 100 substring not followed with a digit (if you want to fail the matches where the number starts with 100, remove the (?!\d) lookahead)
\w+ - 1 or more word chars (letters, digits or _)
match - a literal substring
See the regex demo online.

Regex match line with all words starting in uppercase

I am attempting to create a regex pattern that matches a line where all the words begin with uppercase letters, regardless of length. It must also account for any number of equals signs ('=') being on either side.
For example matches:
==This Would Match==
===I Like My Cats===
====Number Of Equals Signs Does Not Matter===
=====Nor Does Line Length Etc.=====
Non-matches:
==This would not regardless of its length==
===Nor would this match, etc===
How do I write this pattern?
You could match one or more equals signs at either side like =+.
To match words that begin with a capital letter could start with [A-Z] followed by \w one or more times. If you want to match more characters than \w, you could create a character class [\w.] to add matching a dot for example.
This pattern would match between equals sign(s) zero or more times a word that starts with an uppercase character followed by a whitespace, and ends with a word that starts with an uppercase character:
^=+(?:[A-Z]\w* )*(?:[A-Z][\w.]+)=+$
const strings = [
"==This Would Match==",
"===I Like My Cats===",
"====Number Of Equals Signs Does Not Matter===",
"=====Nor Does Line Length Etc.=====",
"==This would not regardless of its length==",
"===Nor would this match, etc===",
"=aaaa="
];
let pattern = /^=+(?:[A-Z]\w* )*(?:[A-Z][\w.]+)=+$/;
strings.forEach((s) => {
console.log(s + " ==> " + pattern.test(s));
});
This matches your desired results:
var test = [
"==This Would Match==",
"===I Like My Cats===",
"====Number Of Equals Signs Does Not Matter===",
"=====Nor Does Line Length Etc.=====",
"==This would not regardless of its length==",
"===Nor would this match, etc==="
]
var reg = /=*([A-Z]\w*\W*)+=*/g;
console.log(test.map(t => t.match(reg) == t));
Try this regex:
^=*[A-Z][^ ]*( [A-Z][^ ]*)*=*$
It allows for any number (including 0) of = signs on either side and requires every word to start with a capital letter.
The * quantifier means 0 or more times.
[^ ] is a negated character class, meaning it matches anything except a space.
You can try it online here.

Categories