Regular expression for matching alphabet

I have a CLI where the user can declare an alphabet and pass it to my code. My code generates a string with that alphabet.
For example, if the user declares the groups abc, abcAB1234 and ##1$2%, I need to generate a string where every single character belongs to at least one group and the generated string has all the characters defined by the alphabet. No repetition is allowed (case-sensitive).
So, if the alphabet is abc abcAB1234 ##1$2%, accepted outputs could be B#1a or #ca41%, but not aA## (a repeated character), aBcZ# ('Z' is not part of the alphabet) or aBA43 (some characters of the alphabet are not present).
I tried this: ^(?!.*([abcabcAB1234##1$2%])\1{1})(?!.*([abc])\1{1})(?!.*([abcAB1234])\1{1})(?!.*([##1$2%])\1{1})[abcabcAB1234##1$2%]{8,}$ but, obviously, it doesn't work.
Can someone please help me understand where I'm going wrong with my regexp?
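For reference, the attempted pattern has two main problems: `.*([...])\1{1}` (the same as `.*([...])\1`) only rejects a character that is repeated immediately, as in "aa", and `{8,}` demands at least eight characters while the accepted examples have four to six. A regex-only check is possible with lookaheads. The sketch below assumes the rules are: every character belongs to at least one group, no character appears twice (case-sensitive), and every group contributes at least one character. `buildAlphabetRegex` and `escapeForClass` are hypothetical helpers, not part of any library.

```javascript
// Escape the characters that are special inside a [...] class: \ ] [ ^ -
function escapeForClass(s) {
  return s.replace(/[\\\]\[^-]/g, '\\$&');
}

// Hypothetical helper: builds one regex from the user's groups.
function buildAlphabetRegex(groups) {
  // (?!.*(.).*\1) fails if any character occurs twice anywhere in the string
  const noRepeats = '(?!.*(.).*\\1)';
  // One lookahead per group: at least one character from each group must appear
  const perGroup = groups.map(g => `(?=.*[${escapeForClass(g)}])`).join('');
  // Every character must come from the union of all groups
  const allowed = `[${escapeForClass(groups.join(''))}]+`;
  return new RegExp(`^${noRepeats}${perGroup}${allowed}$`);
}

const re = buildAlphabetRegex(['abc', 'abcAB1234', '##1$2%']);
console.log(re.test('B#1a'));   // true
console.log(re.test('#ca41%')); // true
console.log(re.test('aBcZ#'));  // false: 'Z' is in no group
console.log(re.test('aBA43'));  // false: nothing from '##1$2%'
```

Escaping matters here because the groups come from user input; a group such as `a-z` would otherwise be read as a range.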

I don't think this is possible with a RegExp, but it is easy to achieve using a Set.
const alphabet = 'abcBEL'
const wordToMatch = 'BLa'
const wordToMatch2 = 'BLaa'
const wordToMatch3 = 'zBLa'
function checkWord(alphabet, word) {
  const set = new Set(alphabet.split(''))
  for (const c of word) {
    // Reject a character outside the alphabet, or one that was already used
    if (!set.has(c)) return false
    set.delete(c)
  }
  return true
}
console.log(checkWord(alphabet, wordToMatch))  // true
console.log(checkWord(alphabet, wordToMatch2)) // false: 'a' is used twice
console.log(checkWord(alphabet, wordToMatch3)) // false: 'z' is not in the alphabet
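The checkWord function above tests a single alphabet. A sketch of extending the same Set idea to the question's groups, assuming the rules are that each character is in some group, no character repeats (case-sensitive), and each group contributes at least one character. `checkGroups` is a hypothetical name:

```javascript
// Hypothetical helper: validates a candidate string against a list of groups.
function checkGroups(groups, word) {
  const allowed = new Set(groups.join('')); // union of all groups
  const seen = new Set();                   // characters used so far
  for (const c of word) {
    // Reject characters outside every group, and any repeat
    if (!allowed.has(c) || seen.has(c)) return false;
    seen.add(c);
  }
  // Every group must contribute at least one character
  return groups.every(g => [...g].some(c => seen.has(c)));
}

const groups = ['abc', 'abcAB1234', '##1$2%'];
console.log(checkGroups(groups, 'B#1a'));   // true
console.log(checkGroups(groups, '#ca41%')); // true
console.log(checkGroups(groups, 'aBcZ#'));  // false: 'Z' is not in any group
console.log(checkGroups(groups, 'aBA43'));  // false: no character from '##1$2%'
```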

Related

Javascript: GUID: RegEx: string to GUID

I have a textbox that a user can paste into using Ctrl+V. I would like to restrict the textbox to accept only GUIDs. I tried to write a small function that formats an input string as a GUID using a RegEx, but I can't seem to get it to work. I tried following the post below:
Javascript string to Guid
function stringToGUID()
{
  var strInput = 'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76';
  var strOutput = strInput.replace(/([0-f]{8})([0-f]{4})([0-f]{4})([0-f]{4})([0-f]{12})/, "$1-$2-$3-$4-$5");
  console.log(strOutput);
  // From my understanding, the input string could be any sequence of 0-9 or a-f of any length, and a valid GUID-patterned string would be the result of the above code. This doesn't seem to be the case.
  // I would like to extract the first 32 characters; how do I do that?
}
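To see what the question's replace call actually produces: a regex without ^ and $ only rewrites the part it matches, so the 13 characters after the first 32 are left in place. This sketch uses [0-9a-f] with the i flag in place of [0-f] (the two behave the same on this all-hex input) and takes the first 32 characters with slice() before formatting:

```javascript
// Same five groups as the question, but with [0-9a-f] and the i flag
const guidParts = /([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})/i;
const input = 'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76';

// replace() rewrites only the 32 matched characters; the 13 after them are kept
const whole = input.replace(guidParts, '$1-$2-$3-$4-$5');
console.log(whole); // b6b954d9-cbac-4b18-b0d5-a0f725695f1ca98d64e456f76

// Cutting the input to 32 characters first drops that tail
const first32 = input.slice(0, 32).replace(guidParts, '$1-$2-$3-$4-$5');
console.log(first32); // b6b954d9-cbac-4b18-b0d5-a0f725695f1c
```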

I suggest that you remove the dashes, truncate to 32 characters, and then test if the remaining characters are valid before inserting the dashes:
function stringToGUID()
{
  var input = 'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76';
  // Remove every dash (a plain string pattern would replace only the first one)
  let g = input.replace(/-/g, "");
  g = g.substring(0, 32);
  if (/^[0-9A-F]{32}$/i.test(g)) {
    g = g.replace(/(.{8})(.{4})(.{4})(.{4})(.{12})/, "$1-$2-$3-$4-$5");
  }
  console.log(g);
}
stringToGUID();
(The i at the end of the regex makes it case-insensitive.)

You are already matching 32 characters with the pattern, so there is no need for a separate operation to get 32 characters to test against.
You can replace all the hyphens with an empty string, and then match the pattern from the start of the string using ^.
Then check whether there is a match; if there is, build the result from the 5 groups with hyphens in between. If there is no match, return the original string.
The function stringToGUID() by itself does not do anything except log a string that is hard-coded in the function. To make it reusable, pass the string in as a parameter.
function stringToGUID(s) {
  // [0-9a-f] with the i flag accepts only hex digits, in either case
  const regex = /^([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})/i;
  const m = s.replace(/-+/g, '').match(regex);
  return m ? `${m[1]}-${m[2]}-${m[3]}-${m[4]}-${m[5]}` : s;
}
[
  'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76',
  'b6b954d9-cbac-4b18-b0d5-a0f725695f1c',
  '----54d9cbac4b18b0d5a0f725695f1ca98d64e456f76',
  '!##$%'
].forEach(s => {
  console.log(stringToGUID(s));
});
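A caveat about the character class in the question's pattern: [0-f] is a range by code point, from '0' (U+0030) to 'f' (U+0066), so besides the hex digits it also accepts characters such as ':', '@', 'A'-'Z' and '_'. A quick comparison with [0-9a-f] plus the i flag:

```javascript
const loose = /^[0-f]+$/;      // code-point range '0'..'f'
const strict = /^[0-9a-f]+$/i; // hex digits only, either case

for (const s of ['b6b9', 'B6B9', 'ZZ:_']) {
  // Prints the input, then the result of each pattern
  console.log(s, loose.test(s), strict.test(s));
}
// b6b9 true true
// B6B9 true true
// ZZ:_ true false
```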

Is there a way for the content.replace to sort of split them into more words than these?

const filter = ["bad1", "bad2"];
client.on("message", message => {
  var content = message.content;
  var stringToCheck = content.replace(/\s+/g, '').toLowerCase();
  for (var i = 0; i < filter.length; i++) {
    if (content.includes(filter[i])) {
      message.delete();
      break;
    }
  }
});
So my code above is a Discord bot that deletes messages when someone writes "bad1" or "bad2"
(plus some more filtered bad words that I'm going to add), and luckily there are no errors whatsoever.
But right now the bot only deletes these words when they are written in lowercase, without spaces in between or special characters.
I think I have found a solution, but I can't seem to fit it into my code. I tried different ways, but it either deleted lowercase words or didn't react at all, and instead I got errors like "cannot read property of undefined" etc.
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
bot.on('message', message => {
  var words = message.content.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g);
  var containsBadWord = words.some(word => {
    return badWords.includes(word);
  });
  // ...
});
This is what I am looking at: the var words line, specifically .match(/\w+|\s+|[^\s\w]+/g).
Is there any way to implement that in my const filter code (above), or a different approach?
Thanks in advance.

Well, I'm not sure what you're trying to do with .match(/\w+|\s+|[^\s\w]+/g). That's some unnecessary regex just to get an array of words and spaces. And it won't even work if someone were to split their bad word into something like "t h i s".
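To see what that pattern produces, here is the question's tokenizer on a couple of inputs; `badWords` is a hypothetical one-entry list. Note also that match() returns null when nothing matches (for example, an empty message), so calling .some() on the result throws a TypeError, which is one possible source of "cannot read property ..." errors:

```javascript
const badWords = ['bad1']; // hypothetical list for illustration

// The tokenizer from the question: runs of word chars, of whitespace, or of anything else
const tokenize = s => s.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g);

// match() returns null when nothing matches, so fall back to an empty array
const containsBadWord = s => (tokenize(s) || []).some(word => badWords.includes(word));

console.log(tokenize('Hey, BAD1!!'));        // [ 'hey', ',', ' ', 'bad1', '!!' ]
console.log(containsBadWord('Hey, BAD1!!')); // true
console.log(containsBadWord('b a d 1'));     // false: tokens are 'b', ' ', 'a', ...
console.log(tokenize(''));                   // null
```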
If you want your filter to be case insensitive and account for spaces/special characters, a better solution would probably require more than one regex, and separate checks for the split letters and the normal bad word check. And you need to make sure your split letters check is accurate, otherwise something like "wash it" might be considered a bad word despite the space between the words.
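Separately, the first snippet in the question already builds a lowercased, whitespace-free stringToCheck, but then tests content.includes(...), so that normalization is never used. A minimal sketch of testing the normalized string instead; `shouldDelete` is a hypothetical helper standing in for the body of the message handler. It catches case and spacing tricks but not punctuation, and removing spaces can join neighboring words, which is the "wash it" problem described above:

```javascript
const filter = ['bad1', 'bad2'];

// Hypothetical stand-in for the body of the bot's "message" handler:
// returns true when the message should be deleted.
function shouldDelete(content) {
  // Strip whitespace and lowercase, as the question's first snippet already does...
  const stringToCheck = content.replace(/\s+/g, '').toLowerCase();
  // ...and then test that normalized string, not the raw content
  return filter.some(word => stringToCheck.includes(word));
}

console.log(shouldDelete('BAD1'));    // true
console.log(shouldDelete('b a d 2')); // true
console.log(shouldDelete('b.a.d.1')); // false: punctuation is not stripped
```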
A Solution
So here's a possible solution. Note that it is just a solution, and is far from the only solution. I'm just going to use hard-coded string examples instead of message.content, to allow this to be in a working snippet:
//Our array of bad words
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
//A function that tests if a given string contains a bad word
function testProfanity(string) {
  //Removes all non-letter, non-digit, and non-space chars
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  //Replaces all non-letter, non-digit chars with spaces
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  //Checks if a condition is true for at least one element in badWords
  return badWords.some(swear => {
    //Removes any non-letter, non-digit chars from the bad word (for normal)
    var filtered = swear.replace(/\W/g, "");
    //Splits the bad word into a 's p a c e d' word (for spaced)
    var spaced = filtered.split("").join(" ");
    //Two different regexes for normal and spaced bad word checks
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    //If the normal or spaced checks are true in the string, return true
    //so that '.some()' will return true for satisfying the condition
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
var result;
//Includes one banned word; expected result: true
var test1 = "I am a bannedWord1";
result = testProfanity(test1);
console.log(result);
//Includes one banned word; expected result: true
var test2 = "I am a b a N_N e d w o r d 2";
result = testProfanity(test2);
console.log(result);
//Includes one banned word; expected result: true
var test3 = "A bann_eD%word4, I am";
result = testProfanity(test3);
console.log(result);
//Includes no banned words; expected result: false
var test4 = "No banned words here";
result = testProfanity(test4);
console.log(result);
//This is a tricky one. 'bannedWord2' is technically present in this string,
//but is 'bannedWord22' really the same? This prevents something like
//"wash it" from being labeled a bad word; expected result: false
var test5 = "Banned word 22 isn't technically on the list of bad words...";
result = testProfanity(test5);
console.log(result);
I've commented each line thoroughly so that you can follow what I am doing in each line. Here it is again, without the comments or the tests:
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
function testProfanity(string) {
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  return badWords.some(swear => {
    var filtered = swear.replace(/\W/g, "");
    var spaced = filtered.split("").join(" ");
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
Explanation
As you can see, this filter is able to deal with all sorts of punctuation, capitalization, and even single spaces/symbols in between the letters of a bad word. However, note that in order to avoid the "wash it" scenario I described (potentially resulting in the unintentional deletion of a clean message), I made it so that something like "bannedWord22" would not be treated the same as "bannedWord2". If you want it to do the opposite (therefore treating "bannedWord22" the same as "bannedWord2"), you must remove both of the \\b phrases in the normal check's regex.
I will also explain the regexes so that you fully understand what is going on here:
[^a-zA-Z0-9 ] means "select any character not in the ranges of a-z, A-Z, 0-9, or space" (meaning all characters not in those specified ranges will be replaced with an empty string, essentially removing them from the string).
\W means "select any character that is not a word character", where "word character" refers to the characters in ranges a-z, A-Z, 0-9, and underscore.
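\b means "word boundary": a zero-width position between a word character and a non-word character, or between a word character and the start or end of the string. In practice it marks the edges of a word next to spaces, punctuation, or the ends of the string. \b is written as \\b here because the pattern is built from a template string, where \b on its own is the backspace escape sequence; the extra \ makes the string contain a literal \b for the RegExp.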
The flags g and i used in both of the regex checks indicate "global" and "case-insensitive", respectively.
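A small demonstration of why \b is doubled when a pattern is built from a string. In a template literal, \b alone becomes a backspace character (U+0008), so the resulting RegExp looks for backspaces rather than word boundaries. The word 'bad' is just an example:

```javascript
const word = 'bad';
const wrong = new RegExp(`\b${word}\b`, 'i');   // string contains backspace characters
const right = new RegExp(`\\b${word}\\b`, 'i'); // string contains \b, a word boundary

console.log(wrong.test('a bad word')); // false
console.log(right.test('a bad word')); // true
console.log(right.test('a badge'));    // false: no boundary after "bad"
```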
Of course, to get this working with your discord bot, all you have to do in your message handler is something like this (and be sure to replace badWords with your filter variable in testProfanity()):
if (testProfanity(message.content)) return message.delete();
If you want to learn more about regex, or if you want to mess around with it and/or test it out, this is a great resource for doing so.

Extract a part of a regex name

Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
The regex is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, which is 'en-gb', 'fr-fr', 'de-de'.
How do I extract just that part of the filename?

I modified the regex a little bit to match the numbers and text. You can play around with it here.
Explanation
To capture a group, wrap that part of the regex in (); it will then be captured as a group.
For a named capture, use (?<name_of_group>...); you can then access the group by name.
Here is how the matching works:
[a-z]{2} matches 2 characters from a-z.
[a-zA-Z0-9] matches any character from a-z, A-Z or 0-9.
g is the global flag, i.e. match all.
i means ignore case.
// \. matches a literal dot (an unescaped . would match any character)
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}\.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of a named capturing group.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}\.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);
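One caution about the g flag used in both snippets above: when a regex object with g is reused, exec() starts searching at its lastIndex, so a second filename can be missed. For extracting one group per filename, dropping g avoids this. The two filenames are samples with the date and sequence placeholders filled in with digits:

```javascript
const a = 'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt';
const b = 'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt';

// With g, exec() resumes from lastIndex (14 after the first match)
const withG = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_/gi;
console.log(withG.exec(a)[1]); // en-gb
console.log(withG.lastIndex);  // 14
console.log(withG.exec(b));    // null: the search began at index 14 of b

// Without g, every exec() starts at index 0
const noG = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_/i;
console.log(noG.exec(a)[1], noG.exec(b)[1]); // en-gb fr-fr
```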

To solve your problem, you should:
Fix your regex:
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}\.txt
Get the value of the first group using the regex's exec function:
const fileNames = [
  'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
  'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
  'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt'
];
const cultureNames = fileNames.map(name => {
  // \. matches the literal dot before "txt"
  const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}\.txt/.exec(name);
  return matched && matched[1];
});
console.log(cultureNames);

Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_\w{8}_\w{14}\.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);

I need help getting the first n characters of a string up to when a number character starts

I'm working with strings where I need to extract the first n characters, up to where the numbers begin. What would be the best way to do this, given that sometimes the string starts with a number? For 7EUSA8889er898 I would need to extract 7EUSA, but for another example, SWFX74849948, I would need to extract SWFX.
I'm not sure how to do this with a regex; my limited knowledge is blocking me at this point:
^(\w{4}) just gets me the first four characters, but I don't really have a stopping point, since sometimes the string could be somelongstring292894830982, which would require me to get somelongstring.

Using \w matches a word character, which includes letters, digits and the underscore, so it does not stop at a number.
You could match an optional digit [0-9]? at the start of the string ^ and then match A-Za-z one or more times:
^[0-9]?[A-Za-z]+
const regex = /^[0-9]?[A-Za-z]+/;
[
  "7EUSA8889er898",
  "somelongstring292894830982",
  "SWFX74849948"
].forEach(s => console.log(s.match(regex)[0]));

You can use this regex:
(^\d+?[a-zA-Z]+)|(^\d+|[a-zA-Z]+)
I tried it with these examples and it worked:
1- somelongstring292894830982 -> somelongstring
2- 7sdfsdf5456 -> 7sdfsdf
3- 875werwer54556 -> 875werwer

If you want a function where the RegExp is parametrized by the n parameter, it would be:
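function getStr(str, n) {
  // \w also matches digits, so this does not stop at the first number; n only limits the length
  var pattern = "\\d?\\w{0," + n + "}";
  var reg = new RegExp(pattern);
  var result = reg.exec(str);
  if (result[0]) return result[0].slice(0, n);
}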

There are already answers to this, but here is another way to do it.
var string1 = '7EUSA8889er898';
var string2 = 'SWFX74849948';
var Extract = function (args) {
  var C = args.split(''); // Split the string into an array of characters
  var NI = []; // Store the indexes of all digits
  // Loop through the characters; use the loop index rather than indexOf,
  // which always returns the first occurrence of a repeated digit
  C.forEach(function (ch, i) {
    if (/^\d$/.test(ch)) NI.push(i);
  });
  // If the string starts with a digit, keep it and stop at the next digit;
  // otherwise stop at the first digit
  return C.slice(0, NI[0] === 0 ? NI[1] : NI[0]).join('');
};
console.log(Extract(string1));
console.log(Extract(string2));
Output
7EUSA
SWFX

Since it's hard to tell exactly what you are trying to match, I'd go with a general regex:
^\d?\D+(?=\d)
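To illustrate the lookahead in that pattern: (?=\d) requires a digit right after the matched text without consuming it, so a string with no digit after its letters produces no match at all. Note that \D also accepts punctuation and spaces, not just letters. The third sample string is made up for illustration:

```javascript
const re = /^\d?\D+(?=\d)/;

for (const s of ['7EUSA8889er898', 'SWFX74849948', 'lettersonly']) {
  const m = s.match(re);
  // Print the input and the extracted prefix (or null when nothing matches)
  console.log(s, '->', m ? m[0] : null);
}
// 7EUSA8889er898 -> 7EUSA
// SWFX74849948 -> SWFX
// lettersonly -> null
```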

check if for every single char in string

I am trying to make an if condition for a large number of chars.
I can use
if (str == '!' || str == '#' || str == '$' || str == '^' || str == '&')
and so on, but this seems very inefficient. I would like the condition to be true if the char is one of these:
!##%$^&()_-+=\?/.,'][{}<>`~
Is there any shorter and more efficient way of doing it?
for (var c0 = 0; c0 < fn.length; c0++) {
  var str = fn.charAt(c0);
  if (str == "!##%$^&()_-+=\?/.,'][{}<>`~") {
  }
}
I want the check to occur on every single char of the string above.

You can use a regular expression character class to check if your character matches a particular character:
/^[\!##%$\^&\(\)_\-\+=\?\/\.,'\]\[\{\}\<\>`~]$/
Here I have escaped the special characters so that they are treated as regular characters.
See the working example below:
const regex = /^[\!##%$\^&\(\)_\-\+=\?\/\.,'\]\[\{\}\<\>`~]$/,
  charA = '#', // appears in char set
  charB = 'A'; // doesn't appear in char set
console.log(regex.test(charA)); // true
console.log(regex.test(charB)); // false
Alternatively, if you don't want to use regular expressions you can instead put all your characters into an array and use .includes to check if your character is in your array.
const chars = "!##%$^&()_-+=\?/.,'][{}<>`~",
  charArr = [...chars],
  charA = '#', // is in char set
  charB = 'A'; // isn't in char set
console.log(charArr.includes(charA)); // true
console.log(charArr.includes(charB)); // false

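Just use a regular expression rather than checking each character manually. Put the characters in a character class, so that test() returns true if str contains at least one of them. (Passing the characters to new RegExp() as a plain string does not work: the pattern is then read as a sequence rather than a set, and the unescaped [ makes it throw a SyntaxError.)
const pattern = /[!##%$^&()_\-+=?\/.,'\][{}<>`~]/;
const exists = pattern.test(str);
if (exists) {
  // code logic for when a special character exists in the string
}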

First, you can use split('') to split a string into an array of characters. Then you can use .some to check whether a condition is true for at least one element of the array:
"!##%$^&()_-+=\?/.,'][{}<>`~".split('').some(x => x === str)

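Since the question wants the check applied to every character of a string, here is a sketch that builds a Set from the special characters once and then uses some() or every() over the input. The character set is taken from the question, reading its \? as a plain ?; anySpecial and allSpecial are hypothetical names, and which one you need depends on whether a single special character is enough:

```javascript
// Assumed character set from the question ("\?" read as a plain "?")
const special = new Set("!##%$^&()_-+=?/.,'][{}<>`~");

// True if at least one character of the input is in the set
const anySpecial = input => [...input].some(ch => special.has(ch));
// True only if every character of the input is in the set
const allSpecial = input => [...input].every(ch => special.has(ch));

console.log(anySpecial('abc#def')); // true
console.log(allSpecial('abc#def')); // false
console.log(allSpecial('#$%'));     // true
```

A Set lookup costs roughly the same per character no matter how many special characters are listed, which is what makes it shorter and cheaper than a long chain of == comparisons.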