I am trying to create a pipe in Angular, but I cannot wrap my head around regular expressions.
I am trying to match only lone (single) uppercase letters and split the string at them.
Let us say for instance I have the following:
thisIsAnApple should return [this, Is, An, Apple]
thisIsAString should return [this, Is, AString]
BB88GGFR should return [BB88GGFR]
So the plan is to match a capital letter if it is not accompanied by another capital letter.
This is what I have come up with:
const regExr = new RegExp(/(?=[A-Z])(?![A-Z]{2,})/g);
let split = string.split(regExr);
You may use
string.split(/(?<=[a-z])(?=[A-Z])/)
Or, to support all Unicode letters:
string.split(/(?<=\p{Ll})(?=\p{Lu})/u)
Basically, you match an empty string between a lowercase and an uppercase letter.
JS demo:
const strings = ['thisIsAnApple', 'thisIsAString', 'BB88GGFR'];
const regex = /(?<=[a-z])(?=[A-Z])/;
for (let s of strings) {
console.log(s, '=>', s.split(regex));
}
JS demo #2:
const strings = ['thisIsĄnApple', 'этоСТрокаТакая', 'BB88GGFR'];
const regex = /(?<=\p{Ll})(?=\p{Lu})/u;
for (let s of strings) {
console.log(s, '=>', s.split(regex));
}
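If the code has to run on an engine without lookbehind support (lookbehind assertions were only added to the language in ES2018), a similar split can be sketched without (?<=...). The helper name splitAtCaseBoundary and the NUL marker are assumptions made for illustration, and the sketch assumes the input never contains a NUL character. It covers the ASCII variant only.

```javascript
// Lookbehind-free sketch: mark each lowercase-to-uppercase boundary, then split on the marker.
// Assumption: the input never contains the NUL character used as the marker.
function splitAtCaseBoundary(input) {
  const MARKER = '\u0000';
  // Capture a lowercase letter that is followed by an uppercase one,
  // then re-emit that letter with the marker appended.
  return input.replace(/([a-z])(?=[A-Z])/g, '$1' + MARKER).split(MARKER);
}

for (const s of ['thisIsAnApple', 'thisIsAString', 'BB88GGFR']) {
  console.log(s, '=>', splitAtCaseBoundary(s));
}
```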
I have a CLI where the user can declare an alphabet and pass it to my code. My code generates a string from that alphabet.
For example, if the user declares the alphabet groups abc, abcAB1234 and ##1$2%, I need to generate a string where every single character is in at least one group and the generated string has all the characters defined by the alphabet. No repetitions are allowed (case sensitive).
So, if the alphabet is abc abcAB1234 ##1$2%, the accepted output can be B#1a or #ca41% but not aA## (same character 'a' repeated), aBcZ# ('Z' is not part of the alphabet) or aBA43 (some characters of the alphabet are not present).
I tried this: ^(?!.*([abcabcAB1234##1$2%])\1{1})(?!.*([abc])\1{1})(?!.*([abcAB1234])\1{1})(?!.*([##1$2%])\1{1})[abcabcAB1234##1$2%]{8,}$ but, obviously, it doesn't work.
Can someone please help me understand where I'm going wrong with my regexp?
I don't think this is possible with a RegExp, but it is easy to achieve using a Set.
const alphabet = 'abcBEL'
const wordToMatch = 'BLa'
const wordToMatch2 = 'BLaa'
const wordToMatch3 = 'zBLa'
function checkWord(alphabet, word) {
const set = new Set(alphabet.split(''))
for (const c of word){
if (!set.has(c)) return false
set.delete(c)
}
return true
}
console.log(checkWord(alphabet, wordToMatch))
console.log(checkWord(alphabet, wordToMatch2))
console.log(checkWord(alphabet, wordToMatch3))
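The Set check above covers membership and repetition. The question's examples also seem to reject a word when one of the declared groups contributes no character at all (aBA43 contains nothing from ##1$2%). A minimal sketch of that extra rule follows, assuming that reading of the requirement is right; the function name isValidWord and the groups array are illustrative, not part of the original code.

```javascript
// Sketch: validate a word against several alphabet groups.
// Assumed rules (inferred from the question's examples):
//   1. no character may appear twice (case sensitive),
//   2. every character must belong to at least one group,
//   3. every group must contribute at least one character.
function isValidWord(groups, word) {
  const chars = [...word];
  const allowed = new Set(groups.join(''));
  if (new Set(chars).size !== chars.length) return false;      // rule 1
  if (!chars.every(c => allowed.has(c))) return false;         // rule 2
  return groups.every(g => chars.some(c => g.includes(c)));    // rule 3
}

const groups = ['abc', 'abcAB1234', '##1$2%'];
for (const word of ['B#1a', '#ca41%', 'aA##', 'aBcZ#', 'aBA43']) {
  console.log(word, '=>', isValidWord(groups, word));
}
```

With these rules, the first two words are accepted and the last three are rejected, which matches the examples in the question.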
Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
REGEX is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, which is 'en-gb', 'fr-fr' or 'de-de'.
How do I extract just that part of the filename?
I modified the regex a little to match the numbers and text.
Explanation
To capture a group, wrap that part of the regex in (); it is then captured as a numbered group.
For a named capture, use (?<name_of_group>...) and then access the group by name.
Here is how the matching works:
[a-z]{2} matches 2 characters from a-z
[a-zA-Z0-9] matches any character from a-z, A-Z or 0-9
g is the global flag, i.e. match all occurrences.
i means ignore case.
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of named group capturing.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);
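One caveat when combining exec() with the g flag, as the snippets above do: a global regex keeps its position in lastIndex between calls, so reusing the same regex object on another string (or on the same string again) can return null unexpectedly. A small sketch of the difference, using a shortened pattern and one of the concrete filenames from this thread; the variable names are illustrative.

```javascript
const name = 'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt';

// With the g flag, exec() resumes from lastIndex, which it updates after each match.
const globalRe = /FDIP_([a-z]{2}-[a-z]{2})/gi;
const firstCall = globalRe.exec(name);  // matches at index 0
const secondCall = globalRe.exec(name); // resumes after the first match and finds nothing

// Without the g flag, every call starts from index 0.
const plainRe = /FDIP_([a-z]{2}-[a-z]{2})/i;
const thirdCall = plainRe.exec(name);
const fourthCall = plainRe.exec(name);

console.log(firstCall && firstCall[1], secondCall);
console.log(thirdCall && thirdCall[1], fourthCall && fourthCall[1]);
```

For extracting a single value per filename, dropping the g flag avoids the issue.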
To solve your problem, you should:
Fix your regex:
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt
Get the value of the first group by using the regex.exec function
const fileNames = [
'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt']
const cultureNames = fileNames.map(name => {
const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt/.exec(name)
return matched && matched[1]
})
console.log(cultureNames)
Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[\w]{8}_[\w]{14}.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);
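In the patterns above, the dot before txt is unescaped, so it matches any character, and the pattern is not anchored. A stricter variant might look like the sketch below. It assumes the real filenames carry an 8-digit date and a 14-digit sequence number, as the question's original regex suggests; the names FILE_PATTERN and extractLanguageCode are made up for this example.

```javascript
// Sketch: anchored pattern with an escaped dot.
// Assumption: date is 8 digits and sequence number is 14 digits.
const FILE_PATTERN = /^FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_\d{8}_\d{14}\.txt$/;

// Returns the language code (e.g. 'en-gb') or null when the name does not match.
function extractLanguageCode(fileName) {
  const m = FILE_PATTERN.exec(fileName);
  return m ? m[1] : null;
}

console.log(extractLanguageCode('FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt'));
console.log(extractLanguageCode('FDIP_en-gb-nn_Text_v1_20190101_12345678901234Xtxt'));
```

Returning null instead of indexing the match directly keeps a non-matching filename from throwing.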
I am trying to retrieve the first alphabetic word of a string, which might include tags as well.
I have tried using split(" ") but it gives me the spaces.
let letter = ' <section class="contact" id="contact">';
let firstWord = letter.split (" ");
It should just show section as the first word. Is there any way I can do this? Thank you.
Simple regex to match alphabetic (not alphanumeric) words /[a-zA-Z]+/g
let letter = ' <section class="contact" id="contact">';
let words = letter.match(/[a-zA-Z]+/g) || []; // Match all alphabetic words; match() returns null when there are none
let firstWord = words.length > 0 ? words[0] : '';
console.log(firstWord);
You may use several solutions based on what you really need.
For the current scenario, you may match a chunk of 1+ ASCII letters
let letter = ' <section class="contact" id="contact">';
let first_word = (letter.match(/[a-z]+/i) || [""])[0];
console.log(first_word)
You may tell the regex engine to only match it if there are no digits or underscores around it using \b, word boundaries:
/\b[a-z]+\b/i
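To see what the word boundaries change, here is a small comparison on a made-up input ('abc1 def' is just an illustration): the plain pattern picks up the letters glued to a digit, while the \b version skips them and returns the first standalone word.

```javascript
const sample = 'abc1 def'; // hypothetical input: letters glued to a digit, then a standalone word

// Plain letter run: matches the first letters it finds, even inside 'abc1'.
const plain = (sample.match(/[a-z]+/i) || [''])[0];

// With word boundaries: 'abc' is rejected because a digit follows it directly.
const bounded = (sample.match(/\b[a-z]+\b/i) || [''])[0];

console.log(plain);   // abc
console.log(bounded); // def
```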
And in case you want to match any Unicode letter word and target ECMAScript 2018 and newer, you may use
let regex = /\p{Alphabetic}+/u;
console.log("Один,два".match(regex)[0]); // => Один
Or, with Unicode word boundaries,
let regex = /(?<![\p{Alphabetic}\p{N}_])\p{Alphabetic}+(?![\p{Alphabetic}\p{N}_])/u;
// Or,
// let regex = /(?<!\p{L}\p{M}*|[\p{N}_])\p{Alphabetic}+(?![\p{L}\p{N}_])/u
console.log("1Один2,два-три".match(regex)[0]); // => два
That is, match 1+ alphabetic chars that are neither preceded nor followed by letters, digits or underscores.
To get the first word, match a non-letter, then one or more letters inside a capturing group, then another non-letter:
let letter = ' <section class="contact" id="contact">';
let [, firstWord] = letter.match(/[^a-z]([a-z]+)[^a-z]/i) || []; // fall back to [] so a failed match does not throw
console.log(firstWord);
How can I select RQR-1BN6Q360090-0001 (without quotes) using a regex in the markup below?
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
I tried this but it does not work
RptNum=([A-Za-z]+)$
You may use
/RptNum=([\w-]+)/
The pattern matches RptNum= and then captures 1 or more occurrences of word chars (letters, digits and _) or hyphens.
Note that
/RptNum=([A-Z0-9-]+)/
might be a more restrictive pattern that should work, too. It does not match _ and lowercase letters.
In JS, use it with String#match() and grab the second array item upon a match:
var s = 'Object moved to here';
var m = s.match(/RptNum=([\w-]+)/);
if (m) {
console.log(m[1]);
}
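The sample markup in this thread has lost its link, so here is a self-contained sketch with a hypothetical redirect link that carries the RptNum parameter. The href value and the helper name extractReportNumber are assumptions for illustration only. It also shows why the pattern from the question finds nothing: [A-Za-z]+ cannot get past the hyphens and digits, and $ requires the match to end at the end of the input.

```javascript
// Hypothetical markup: the href below is an assumption, not the original link.
const html = '<h2>Object moved to <a href="/Report.aspx?RptNum=RQR-1BN6Q360090-0001">here</a>.</h2>';

// Returns the report number or null when RptNum= is not present.
function extractReportNumber(markup) {
  const m = markup.match(/RptNum=([\w-]+)/);
  return m ? m[1] : null;
}

console.log(extractReportNumber(html));
// The pattern from the question: letters only, anchored at the end of the input.
console.log(html.match(/RptNum=([A-Za-z]+)$/));
```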
Here, we can also use an expression that collects the new lines, such as:
[\s\S]*RptNum=(.+?)"[\s\S]*
[\w\W]*RptNum=(.+?)"[\w\W]*
[\d\D]*RptNum=(.+?)"[\d\D]*
and our desired output is saved in (.+?).
Test
const regex = /[\s\S]*RptNum=(.+?)"[\s\S]*/gm;
const str = `<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
If this expression is not what you need, it can be modified or tested on regex101.com.
const text = 'RptNum=RQR-1BN6Q360090-0001';
console.log(text.match(/RptNum=.*/).map(m => m.match(/RptNum=.*/)[0])[0].split('RptNum=')[1]);
I suppose that works
I need to extract values from a string using a regex (for performance reasons).
Cases might be as follows:
RED,100
RED,"100"
RED,"100,"
RED,"100\"ABC\"200"
The resulting separated [label, value] array should be:
['RED','100']
['RED','100']
['RED','100,']
['RED','100"ABC"200']
I looked into existing solutions, and even a popular library just splits the entire string to get the values;
e.g. 'RED,100'.split(/,/) might just do the thing.
But I was trying to write a regex that splits on a comma only if that comma is not enclosed in a quoted value.
This might not be standard CSV behaviour, but it makes it very easy for end users to enter values:
enter label,value. Do whatever you like inside the value if it is surrounded by quotes. If you want it to contain quotes, escape them with a backslash.
Any help is appreciated.
You can use this regex that takes care of escaped quotes in string:
/"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g
RegEx Explanation:
": Match a literal opening quote
[^"\\]*: Match 0 or more of any character that is not \ and not a quote
(?:\\.[^"\\]*)*: Match an escaped character (a backslash plus any character) followed by 0 or more characters that are neither a quote nor a backslash. The group repeats 0 or more times to get through all escaped characters
": Match closing quote
|: OR (alternation)
[^,"]+: Match 1+ of non-quote, non-comma string
RegEx Demo
const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g;
const arr = [`RED,100`, `RED,"100"`, `RED,"100,"`,
`RED,"100\\"ABC\\"200"`];
let m;
for (var i = 0; i < arr.length; i++) {
var str = arr[i];
var result = [];
while ((m = regex.exec(str)) !== null) {
result.push(m[0]);
}
console.log("Input:", str, ":: Result =>", result);
}
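Note that the tokens produced by this regex still include the surrounding quotes and the escaping backslashes. To get the exact [label, value] arrays listed in the question, a post-processing step is needed. Here is a minimal sketch; the names TOKEN and parseLabelValue are illustrative, and it assumes a backslash always escapes the single character after it.

```javascript
const TOKEN = /"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g;

// Sketch: tokenize a line, then strip the outer quotes and unescape \x to x.
function parseLabelValue(line) {
  const tokens = line.match(TOKEN) || [];
  return tokens.map(tok =>
    tok.startsWith('"')
      ? tok.slice(1, -1).replace(/\\(.)/g, '$1') // drop quotes, then remove escaping backslashes
      : tok
  );
}

const lines = ['RED,100', 'RED,"100"', 'RED,"100,"', 'RED,"100\\"ABC\\"200"'];
for (const line of lines) {
  console.log(line, '=>', parseLabelValue(line));
}
```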
You could use String#match and take only the groups.
var array = ['RED,100', 'RED,"100"', 'RED,"100,"', 'RED,"100\\"ABC\\"200"']; // \\ keeps the backslashes in the string literal
console.log(array.map(s => s.match(/^([^,]+),(.*)$/).slice(1)))