Regex: Matching the first letter(s) of multiple words - javascript

So I am really bad at regex stuff ... I have googled for an hour now and the best answer I could find is this one.
But I still can't for the life of me figure it out ...
What I need is the following:
I have a JS array I need to .filter(). This array contains for example:
[ 'Gurken halbiert 2kg', 'Karotten geschnitten 5kg', 'Gurken RW' ]
For example: All of the following inputs should match the first entry ("Gurken halbiert 2kg"):
"Gu ha"
"gur h"
"gu halbi"
The following inputs should not match it:
"ken halbi"
"urk ha"
"gur biert"
Why? Because one or more letters at the beginning of a word are missing.
More example inputs and the entries they should match:
input: "Gur" -> matches 1st and 3rd entry
input: "Kar g 5" -> matches 2nd
input: "G r" -> matches 3rd
I really hope someone can help as I am totally lost in this RegEx chaos - I never really understood them or how to use them.

Since the input varies, you would need to dynamically generate the regular expression.
In the function below, you will notice that we are basically building a string and then creating the regular expression using new RegExp(string, 'i').
The expression starts with a caret, and then basically follows the pattern:
^[[nth input char(s)]]\w*\s+[[nth input char(s)]]\w*\s+[[nth input char(s)]]\w*
It's worth pointing out that \w* is added after each input string and \s+ is added if it's not the last input string (i.e., not the end).
function generateRegex (input) {
var string = '^', arr = input.trim().split(' ');
arr.forEach(function (chars, i) {
string += chars + '\\w*' + (arr.length - 1 > i ? '\\s+' : '');
});
return new RegExp(string, 'i');
}
Then you can use the .filter() method on your array and return the elements that match:
var array = ['Gurken halbiert 2kg', 'Karotten geschnitten 5kg', 'Gurken RW'];
var filteredArray = array.filter(function (value) {
return value.match(generateRegex('Gur Ha'));
});
Output:
'Gur Ha' would match: ["Gurken halbiert 2kg"]
'Gur' would match: ["Gurken halbiert 2kg", "Gurken RW"]
'Kar g 5' would match: ["Karotten geschnitten 5kg"]
'G r' would match: ["Gurken RW"]
Example:
function generateRegex (input) {
var string = '^', arr = input.trim().split(' ');
arr.forEach(function (chars, i) {
string += chars + '\\w*' + (arr.length - 1 > i ? '\\s+' : '');
});
return new RegExp(string, 'i');
}
var array = ['Gurken halbiert 2kg', 'Karotten geschnitten 5kg', 'Gurken RW'];
var filteredArray = array.filter(function (value) {
return value.match(generateRegex('Gur'));
});
document.body.textContent = JSON.stringify(filteredArray);

Here is an example of how to filter user input as specified.
Notes
Escaping of regular expression characters from user input pattern (always sanitise user input).
Explicitly not using "\w" in order to support characters like "é".
Supports white space characters other than <space> like <tab> which can be copied and pasted into user input fields causing the user to think it's broken.
function doSubmit() {
// Get the user input search pattern string
var userInput = document.getElementById("myinput").value,
// List of strings to search
testList = [
'Gurken halbiert 2kg',
'Karotten geschnitten 5kg',
'Gurken RW'
],
// Between our "words" we allow zero or more non-space characters "[^\s]*"
// (this eats any extra characters the user might not have specified in their search pattern)
// followed by one or more white-space characters "\s+"
// (eating that space between the "words")
// Note that we are escaping the "\" characters here.
// Note we also don't use "\w" as this doesn't allow for characters like "é".
regexBetween = '[^\\s]*\\s+',
// Match the start of the string "^"
// Optionally allow one or more "words" at the start
// (this eats a "word" followed by a space zero or more times).
// Using an empty string here would allow "o g" to match the 2nd item in our test array.
regexStart = '^(?:' + regexBetween + ')*',
// Clean whitespace at beginning and end
regexString = userInput.trim()
// Escape any characters that might break a regular expression
// Taken from: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
// Split into array of "words"
.split(/\s+/)
// Combine the "words" building our regular expression string
.join(regexBetween),
// Create the regular expression from the string (non-case-sensitive 'i')
regexObject = new RegExp(regexStart + regexString, 'i'),
// Filter the input array testing for matches against the regular expression.
resultsList = testList.filter(function(item) {
return regexObject.test(item);
});
// Output the array into the results text area, one per line.
document.getElementById('output').value = resultsList.join('\n') + '\n===the end===';
}
<form id="myform" onsubmit="doSubmit(); return false;">
<input type="text" id="myinput" value="" />
<input type="submit" name="submit" />
</form>
<textarea id="output" rows="5" cols="30">
</textarea>
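The same construction can be checked without a form. Below is a minimal sketch that wraps the expression-building steps from doSubmit() in a DOM-free function and runs it against the inputs from the question. The names buildPrefixRegex, entries and search are made up for this illustration and are not part of the original answer.

```javascript
// Hypothetical helper: builds the same expression as doSubmit(), without the DOM.
function buildPrefixRegex(userInput) {
  // Rest of the current "word" (non-space characters), then the gap to the next word
  const between = '[^\\s]*\\s+';
  // Optionally skip whole leading words so the search may start at a later word
  const start = '^(?:' + between + ')*';
  const body = userInput.trim()
    // Escape regex metacharacters typed by the user
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    .split(/\s+/)
    .join(between);
  return new RegExp(start + body, 'i');
}

const entries = ['Gurken halbiert 2kg', 'Karotten geschnitten 5kg', 'Gurken RW'];

function search(userInput) {
  const regex = buildPrefixRegex(userInput);
  return entries.filter(item => regex.test(item));
}

for (const input of ['Gu ha', 'gur h', 'ken halbi', 'Kar g 5', 'G r']) {
  console.log(JSON.stringify(input), '->', search(input));
}
```

Note that, like the answer it is based on, this variant also lets a search start at a later word, so an input such as "ha" would match the first entry.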

Related

Regex expression to get numbers without parentheses ()

I'm trying to create a regex that selects the numbers (with or without their commas; I can trim the commas later) that are not followed by parentheses. The numbers inside the parentheses should not be selected either.
It will be used with JavaScript's String.match method.
Example strings
9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
What I have so far:
/((^\d+[^\(])|(,\d+,)|(,*\d+$))/gm
I tried this in regex101, underlining the numbers I would like to match and putting an x on the ones that should not match.
You could start with a substitution to remove all the unwanted parts:
/\d*\(.*?\),?//gm
Demo
This leaves you with
5,10
10,2,5,
10,7,2,4
which makes the matching pretty straight forward:
/(\d+)/gm
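As a rough JavaScript sketch of this two-step approach (strip first, then match), the following applies the substitution with String.replace and collects the remaining numbers per line. The function name numbersOutsideParens and the lines array are made up for this example.

```javascript
const lines = [
  '9(296,178),5,3(123),10',
  '10,9(296,178),2,5,3(123),3(124,125)',
  '10,7,5(296,293,444,1255),3(218),2,4',
];

// Hypothetical helper: remove "number(...)" groups, then collect what is left.
function numbersOutsideParens(line) {
  // Step 1: drop an optional number, its parenthesised list, and a trailing comma
  const stripped = line.replace(/\d*\(.*?\),?/g, '');
  // Step 2: every remaining run of digits is a wanted number
  return stripped.match(/\d+/g) || [];
}

const results = lines.map(numbersOutsideParens);
console.log(results);
// [ [ '5', '10' ], [ '10', '2', '5' ], [ '10', '7', '2', '4' ] ]
```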
If you want it as a single match expression you could use a negative lookbehind:
/(?<!\([\d,]*)(\d+)(?:,|$)/gm
Demo - and here's the same matching expression as a runnable javascript (skeleton code borrowed from Wiktor's answer):
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/(?<!\([\d,]*)(\d+)(?:,|$)/gm), x=>x[1])
console.log(matches);
Here, I'd recommend the so-called "best regex trick ever": just match what you do not need (negative contexts) and then match and capture what you need, and grab the captured items only.
If you want to match integer numbers that are not matched with \d+\([^()]*\) pattern (a number followed with a parenthetical substring), you can match this pattern or match and capture the \d+, one or more digit matching pattern, and then simply grab Group 1 values from matches:
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), x=> x[1] ?? "").filter(Boolean)
console.log(matches);
Details:
text.matchAll(/\d+\([^()]*\)|(\d+)/g) - matches one or more digits (\d+) + ( (with \() + any zero or more chars other than ( and ) (with [^()]*) + \) (see \)), or (|) one or more digits captured into Group 1 ((\d+))
Array.from(..., x=> x[1] ?? "") - gets Group 1 value, or, if not assigned, just adds an empty string
.filter(Boolean) - removes empty strings.
Using several replacement regexes
var textA = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`
console.log('A', textA)
var textB = textA.replace(/\(.*?\),?/g, ';')
console.log('B', textB)
var textC = textB.replace(/^\d+|\d+$|\d*;\d*/gm, '')
console.log('C', textC)
var textD = textC.replace(/,+/g, ' ').trim() // trim() takes no arguments; it only strips leading/trailing whitespace
console.log('D', textD)
With a loop
Here is a solution which splits the lines on comma and loops over the pieces:
var inside = false;
var result = [];
`9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`.split("\n").map(line => {
let pieceArray = line.split(",")
pieceArray.forEach((piece, k) => {
if (piece.includes('(')) {
inside = true
} else if (piece.includes(')')) {
inside = false
} else if (!inside && k > 0 && k < pieceArray.length-1 && !pieceArray[k-1].includes(')')) {
result.push(piece)
}
})
})
console.log(result)
It does print the expected result: ["5", "7"]

Javascript: GUID: RegEx: string to GUID

I have a textbox that a user can paste into using Ctrl+V. I would like to restrict the textbox to accept just GUIDs. I tried to write a small function that would format an input string to a GUID based on RegEx, but I can't seem to be able to do it. I tried following the below post:
Javascript string to Guid
function stringToGUID()
{
var strInput = 'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76';
var strOutput = strInput.replace(/([0-f]{8})([0-f]{4})([0-f]{4})([0-f]{4})([0-f]{12})/,"$1-$2-$3-$4-$5");
console.log(strOutput );
//from my understanding, the input string could be any sequence of 0-9 or a-f of any length and a valid GUID-patterned string would be the result of the above code. This doesn't seem to be the case;
//I would like to extract first 32 characters; how do I do that?
}
I suggest that you remove the dashes, truncate to 32 characters, and then test if the remaining characters are valid before inserting the dashes:
function stringToGUID()
{
var input = 'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76';
let g = input.replace(/-/g, ""); // a string pattern would remove only the first dash
g = g.substring(0, 32);
if (/^[0-9A-F]{32}$/i.test(g)) {
g = g.replace(/(.{8})(.{4})(.{4})(.{4})(.{12})/, "$1-$2-$3-$4-$5");
}
console.log(g);
}
stringToGUID();
(The i at the end of the regex makes it case-insensitive.)
Your pattern already matches 32 characters, so there is no need for a separate operation to extract 32 characters to test against.
You can replace all the hyphens with an empty string, and then match the pattern from the start of the string using ^.
Then first check whether there is a match, and if there is, do the replacement with the 5 groups and hyphens in between. If there is no match, return the original string.
The function stringToGUID() by itself does not do anything except log a string that is hardcoded in the function. To extend its functionality, you can pass a parameter.
function stringToGUID(s) {
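// [0-9a-f] with the i flag; the range [0-f] would also match characters such as ':' and 'G'
const regex = /^([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{12})/i;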
const m = s.replace(/-+/g, '').match(regex);
return m ? `${m[1]}-${m[2]}-${m[3]}-${m[4]}-${m[5]}` : s;
}
[
'b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76',
'b6b954d9-cbac-4b18-b0d5-a0f725695f1c',
'----54d9cbac4b18b0d5a0f725695f1ca98d64e456f76',
'!##$%'
].forEach(s => {
console.log(stringToGUID(s));
});
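One detail worth checking is the character class [0-f] used in the question's pattern. In a regex, a range covers every code point between its two endpoints, so [0-f] runs from "0" (0x30) to "f" (0x66) and includes punctuation and all uppercase letters. The sketch below compares it with an explicit hexadecimal class; the helper name toGuid and the choice to return null for invalid input are assumptions made for this illustration.

```javascript
const loose = /^[0-f]$/;       // everything from '0' (0x30) to 'f' (0x66)
const strict = /^[0-9a-f]$/i;  // hexadecimal digits only, either case

const report = [':', 'G', 'Z', '_', 'a', 'F', 'g'].map(ch => ({
  ch,
  loose: loose.test(ch),
  strict: strict.test(ch),
}));
console.log(report);

// Hypothetical helper: strip dashes, keep the first 32 characters,
// and format them only if they are all hexadecimal digits.
function toGuid(s) {
  const hex = s.replace(/-/g, '').slice(0, 32);
  if (!/^[0-9a-f]{32}$/i.test(hex)) {
    return null; // assumption: signal invalid input with null
  }
  return hex.replace(/^(.{8})(.{4})(.{4})(.{4})(.{12})$/, '$1-$2-$3-$4-$5');
}

console.log(toGuid('b6b954d9cbac4b18b0d5a0f725695f1ca98d64e456f76'));
// b6b954d9-cbac-4b18-b0d5-a0f725695f1c
```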

Is there a javascript method to recognize a string even if the words are out of order? [duplicate]

I would like to find all the matches of given strings (divided by spaces) in a string.
(The way for example, iTunes search box works).
For example, both "ab de" and "de ab" should return true on "abcde" (and "bc e a", or any other order, should return true as well).
If I replace the white space with a wild card, "ab*de" would return true on "abcde", but not "de*ab".
[I use * and not Regex syntax just for this explanation]
I could not find any pure Regex solution for that.
The only solution I could think of is splitting the search term and running multiple regexes.
Is it possible to find a pure Regex expression that will cover all these options ?
Returns true when all parts (divided by , or ' ') of a searchString occur in text. Otherwise false is returned.
function filter(text, searchString) {
const regexStr = '(?=.*' + searchString.split(/\,|\s/).join(')(?=.*') + ')';
const searchRegEx = new RegExp(regexStr, 'gi');
return text.match(searchRegEx) !== null;
}
I'm pretty sure you could come up with a regex to do what you want, but it may not be the most efficient approach.
For example, the regex pattern (?=.*bc)(?=.*e)(?=.*a) will match any string that contains bc, e, and a.
var isMatch = 'abcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals true
var isMatch = 'bcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals false
You could write a function to dynamically create an expression based on your search terms, but whether it's the best way to accomplish what you are doing is another question.
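A minimal sketch of such a function follows, assuming the search terms are separated by whitespace and should be treated as literal text (so they are escaped first). The names escapeRegExp and containsAllTerms are made up for this example.

```javascript
// Escape regex metacharacters so each term is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true when every whitespace-separated term occurs in text,
// in any order. One lookahead is generated per term.
function containsAllTerms(text, searchString) {
  const terms = searchString.trim().split(/\s+/).filter(Boolean);
  const pattern = '^' + terms.map(t => '(?=.*' + escapeRegExp(t) + ')').join('');
  // No g flag, so test() has no lastIndex state between calls
  return new RegExp(pattern, 'i').test(text);
}

console.log(containsAllTerms('abcde', 'ab de'));   // true
console.log(containsAllTerms('abcde', 'de ab'));   // true
console.log(containsAllTerms('abcde', 'bc e a'));  // true
console.log(containsAllTerms('abcde', 'ab x'));    // false
```

Because . does not match line breaks without the s flag, this sketch assumes single-line text.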
Alternations are order insensitive:
"abcde".match(/(ab|de)/g); // => ['ab', 'de']
"abcde".match(/(de|ab)/g); // => ['ab', 'de']
So if you have a list of words to match you can build a regex with an alternation on the fly like so:
function regexForWordList(words) {
return new RegExp('(' + words.join('|') + ')', 'g');
}
'abcde'.match(regexForWordList(['a', 'e'])); // => ['a', 'e']
Try this:
var str = "your string";
str = str.split( " " );
for( var i = 0 ; i < str.length ; i++ ){
// your regexp match
}
This is the script I use; it also works with single-word search strings:
var what="test string with search cool word";
var searchString="search word";
var search = new RegExp(searchString, "gi"); // one-word searching
// multiple search words
if(searchString.indexOf(' ') != -1) {
search="";
var words=searchString.split(" ");
for(var i = 0; i < words.length; i++) {
search+="(?=.*" + words[i] + ")";
}
search = new RegExp(search + ".+", "gi");
}
if(search.test(what)) {
// found
} else {
// notfound
}
I assume you are matching words, or parts of words. You want space-separated search terms to limit search results, and it seems you intend to return only those entries which have all the words that the user supplies. And you intend a wildcard character * to stand for 0 or more characters in a matching word.
For example, if the user searches for the words term1 term2, you intend to return only those items which have both words term1 and term2. If the user searches for the word term*, it would match any word beginning with term.
There are suitable regular expressions which are equivalent to this search language and can be generated from it.
A simple example, the word term, can be asserted in regex by converting to \bterm\b. But two or more words which must match in any order require lookahead assertions. Using extended syntax, the equivalent regex is:
(?= .* \b term1 \b )
(?= .* \b term2 \b )
The asterisk wildcard can be asserted in regex with a character class followed by asterisk. The character class identifies which letters you consider to be part of word. For example, you might find that [A-Za-z0-9]* fits the bill.
In short, you might be satisfied if you convert an expression such as:
foo ba* quux
to:
(?= .* \b foo \b )
(?= .* \b ba[A-Za-z0-9]* \b )
(?= .* \b quux \b )
That is a simple matter of search and replace. But do be careful to sanitize the input string to avoid injection attacks by removing punctuation, etc.
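The conversion described above could be sketched as follows. JavaScript has no extended (free-spacing) syntax, so the lookaheads are concatenated without whitespace. The function name queryToRegex is hypothetical, and escaping everything except the asterisk is one assumed way of sanitizing the input; terms that begin or end with punctuation will interact oddly with \b.

```javascript
// Escape regex metacharacters so the text is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turn "foo ba* quux" into one lookahead per term.
function queryToRegex(query) {
  const lookaheads = query.trim().split(/\s+/).filter(Boolean).map(term => {
    // Escape the literal parts and turn each * into "zero or more word characters"
    const body = term.split('*').map(escapeRegExp).join('[A-Za-z0-9]*');
    return '(?=.*\\b' + body + '\\b)';
  });
  return new RegExp('^' + lookaheads.join(''), 'i');
}

const regex = queryToRegex('foo ba* quux');
console.log(regex.test('quux and bar meet foo')); // true: all three terms present
console.log(regex.test('quux foo'));              // false: nothing matches ba*
console.log(regex.test('food bar quux'));         // false: "food" is not the word "foo"
```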
I think you may be barking up the wrong tree with RegEx. What you might want to look at is the Levenshtein distance of two input strings.
There's a Javascript implementation here and a usage example here.

Dynamically split string using regex

I am receiving an Australian phone number from the user as a text input. The string will be 10 characters long and begin with 04. I want to split the string as the user is entering it so it turns out like 0411 111 111.
My current solution is value.toString().replace(/^(04\d{2})(\d{3})(\d{3})$/, "$1 $2 $3")
This solution splits the string correctly, but only when the user has entered the entire 10 characters. I want it to start splitting after the first 4 characters have been entered ie 0411 1 etc.
Here is a one liner which will work for your exact use case:
var results = "0411111111".split(/(?=\d{6}$)|(?=\d{3}$)/);
console.log(results);
We may split your string on a regex which targets the point after 4 digits and the point after 7 digits.
Consider something like below that checks the length of the currently input mobile number and then applies a different regex depending on the length:
var mobileInput = document.getElementById('mobile');
mobileInput.addEventListener('keyup', foo);
function foo() {
var unformatted = mobileInput.value;
var pattern, replacer;
if (unformatted.length < 5) {
pattern = /(04\d{2})/;
replacer = '$1 ';
} else if (unformatted.length < 9) {
pattern = /(04\d{2})\s{1}(\d{3})/;
replacer = '$1 $2 ';
} else {
pattern = /^(04\d{2})(\d{3})(\d{3})$/;
replacer = '$1 $2 $3';
}
var formatted = unformatted.replace(pattern, replacer);
mobileInput.value = formatted;
}
<input type="text" id="mobile" />
I have managed to come up with a bit of a solution. It is not exactly what I was aiming for, but it does the job.
value.toString()
.replace(/^(04\d{2})(\d{3})(\d{3})$/, "$1 $2 $3")
.replace(/[\s-]+$/, "")
This strips out the white space on each keypress (each time the regex is called) and reformats it.
Here is my solution:
Remove the spaces added by earlier formatting (to recover the unsplit phone number).
Try to match the input using a regex.
Combine the matches and handle a few other situations.
Code:
document.getElementById("phone").addEventListener("input", function() {
var matches = this.value.replace(/ /g, "").match(/^(04\d{2})(\d{3})?(\d{3})?(\d*?)$/);
this.value = matches && matches.length > 2 ?
matches.slice(1, matches.length - 1).join(" ")
+ (matches[matches.length - 1] || "")
: this.value;
});
<input id="phone" maxlength="12">
I'd probably do something like this:
let phone = document.getElementById('phone');
phone.addEventListener('keyup', evt => {
// get value, removing anything that isn't a number
let text = phone.value.replace(/\D/g, '');
// turn it into an array
text = text.split('');
// create a new array containing each group of digits, separated by spaces
let out = [...text.slice(0, 4), ' ', ...text.slice(4, 7), ' ', ...text.slice(7, 10)];
// turn it back into a string, remove any trailing spaces
phone.value = out.join('').trim();
}, false);
<input id="phone">
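The slicing idea from the last answer can also be written as a pure function, which makes the partial-input behaviour easy to check outside the browser. This is only a sketch: the name formatAuMobile is made up, and it does not verify that the number starts with 04.

```javascript
// Hypothetical helper: format whatever digits have been typed so far as "0411 111 111".
function formatAuMobile(raw) {
  // Keep digits only, at most 10 of them
  const digits = raw.replace(/\D/g, '').slice(0, 10);
  // Groups of 4, 3 and 3 digits; empty groups are dropped so no trailing space appears
  return [digits.slice(0, 4), digits.slice(4, 7), digits.slice(7, 10)]
    .filter(part => part.length > 0)
    .join(' ');
}

console.log(formatAuMobile('0411'));        // "0411"
console.log(formatAuMobile('04111'));       // "0411 1"
console.log(formatAuMobile('0411 111 1'));  // "0411 111 1"
console.log(formatAuMobile('0411111111'));  // "0411 111 111"
```

In an input handler it would be used as `phone.value = formatAuMobile(phone.value)`, since re-running it on already formatted text gives the same result.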

How to ban words with diacritics using a blacklist array and regex?

I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against words with diacritics from the array:
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new RegExp('\\b' + bannedWords.join("\\b|\\b") + '\\b', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
alert(valid);
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check'>
Now on the word băţ it returns true instead of false for example.
Chiu's comment is right: 'aaáaa'.match(/\b.+?\b/g) yields the quite counter-intuitive [ "aa", "á", "aa" ], because "word character" (\w) in JavaScript regular expressions is just a shorthand for [A-Za-z0-9_] ('case-insensitive-alpha-numeric-and-underscore'), so a word boundary (\b) matches any place between a chunk of alpha-numerics and any other character. This makes extracting "Unicode words" quite hard.
For non-unicase writing systems it is possible to identify "word character" by its dual nature: ch.toUpperCase() != ch.toLowerCase(), so your altered snippet could look like this:
var bannedWords = ["bad", "mad", "testing", "băţ", "bať"];
var bannedWordsRegex = new RegExp('-' + bannedWords.join("-|-") + '-', 'i');
$(function() {
$("input").on("input", function() {
var invalid = bannedWordsRegex.test(dashPaddedWords(this.value));
$('#log').html(invalid ? 'bad' : 'good');
});
$("input").trigger("input").focus();
function dashPaddedWords(str) {
return '-' + str.replace(/./g, wordCharOrDash) + '-';
};
function wordCharOrDash(ch) {
return isWordChar(ch) ? ch : '-'
};
function isWordChar(ch) {
return ch.toUpperCase() != ch.toLowerCase();
};
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check' value="ba">
<p id="log"></p>
Let's see what's going on:
alert("băţ".match(/\w\b/));
This is [ "b" ] because word boundary \b doesn't recognize word characters beyond ASCII. JavaScript's "word characters" are strictly [0-9A-Z_a-z], so aä, pπ, and zƶ match \w\b\W since they contain a word character, a word boundary, and a non-word character.
I think the best you can do is something like this:
var bound = '[^\\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]';
var regex = new RegExp('(?:^|' + bound + ')(?:'
+ bannedWords.join('|')
+ ')(?=' + bound + '|$)', 'i');
where bound is a reversed list of all ASCII word characters plus most Latin-esque letters, used with start/end of line markers to approximate an internationalized \b. (The second of which is a zero-width lookahead that better mimics \b and therefore works well with the g regex flag.)
Given ["bad", "mad", "testing", "băţ"], this becomes:
/(?:^|[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe])(?:bad|mad|testing|băţ)(?=[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]|$)/i
This doesn't need anything like ….join('\\b|\\b')… because there are parentheses around the list (and that would create things like \b(?:hey\b|\byou)\b, which is akin to \bhey\b\b|\b\byou\b, including the nonsensical \b\b – which JavaScript interprets as merely \b).
You can also use var bound = '[\\s!-/:-@[-`{-~]' for a simpler ASCII-only list of acceptable non-word characters. Be careful about that order! The dashes indicate ranges between characters.
You need a Unicode aware word boundary. The easiest way is to use XRegExp package.
Although its \b is still ASCII based, there is a \p{L} (or a shorter \pL version) construct that matches any Unicode letter from the BMP plane. Building a custom word boundary using this construct is easy:
\b word \b
---------------------------------------
| | |
([^\pL0-9_]|^) word (?=[^\pL0-9_]|$)
The leading word boundary can be represented with a (non)capturing group ([^\pL0-9_]|^) that matches (and consumes) either a character other than a Unicode letter from the BMP plane, a digit and _ or a start of the string before the word.
The trailing word boundary can be represented with a positive lookahead (?=[^\pL0-9_]|$) that requires a character other than a Unicode letter from the BMP plane, a digit and _ or the end of string after the word.
See the snippet below that will detect băţ as a banned word, and băţy as an allowed word.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new XRegExp('(?:^|[^\\pL0-9_])(?:' + bannedWords.join("|") + ')(?=$|[^\\pL0-9_])', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
//alert(valid);
console.log("The word is", valid ? "allowed" : "banned");
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
<input type='text' name='word_to_check'>
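In engines that support ES2018 Unicode property escapes and lookbehind, a similar Unicode-aware boundary can be written with the built-in RegExp and the u flag, without an extra library. The sketch below is one possible way to do it, not the approach of the answer above; the names escapeRegExp and buildBannedRegex are made up for this example.

```javascript
// Escape regex metacharacters so each banned word is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: a banned word must not be preceded or followed by
// a Unicode letter, a Unicode number, or an underscore.
function buildBannedRegex(words) {
  const wordChar = '[\\p{L}\\p{N}_]';
  return new RegExp(
    '(?<!' + wordChar + ')(?:' + words.map(escapeRegExp).join('|') + ')(?!' + wordChar + ')',
    'iu' // i: case-insensitive, u: enables \p{...}
  );
}

const bannedWords = ['bad', 'mad', 'testing', 'băţ'];
const bannedRegex = buildBannedRegex(bannedWords);

for (const value of ['băţ', 'un băţ mare', 'băţy', 'BAD idea', 'badge']) {
  console.log(JSON.stringify(value), bannedRegex.test(value) ? 'banned' : 'allowed');
}
```

Both boundaries are zero-width here, so nothing around the word is consumed.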
Instead of using a word boundary, you could do it with
(?:[^\w\u0080-\u02af]+|^)
to check for start of word, and
(?=[^\w\u0080-\u02af]|$)
to check for the end of it.
The [^\w\u0080-\u02af] matches any characters not (^) being basic Latin word characters - \w - or the Unicode 1_Supplement, Extended-A, Extended-B and Extensions. This includes some punctuation, but would get very long to match just letters. It may also have to be extended if other character sets have to be included. See for example Wikipedia.
When this was written, JavaScript didn't support look-behinds (they were added in ES2018), so the start-of-word test consumes any of the aforementioned non-word characters, but I don't think that should be a problem. The important thing is that the end-of-word test doesn't.
Also, putting these tests outside a non-capturing group that alternates the words makes it significantly more efficient.
var bannedWords = ["bad", "mad", "testing", "băţ", "båt", "süß"],
regex = new RegExp('(?:[^\\w\\u00c0-\\u02af]+|^)(?:' + bannedWords.join("|") + ')(?=[^\\w\\u00c0-\\u02af]|$)', 'i');
function myFunction() {
document.getElementById('result').innerHTML = 'Banned = ' + regex.test(document.getElementById('word_to_check').value);
}
<!DOCTYPE html>
<html>
<body>
Enter word: <input type='text' id='word_to_check'>
<button onclick='myFunction()'>Test</button>
<p id='result'></p>
</body>
</html>
When dealing with characters outside my base set (which can show up at any time), I convert them to an appropriate base equivalent (8-bit, 16-bit, 32-bit) before running any character matching over them.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var bannedWordsBits = {};
bannedWords.forEach(function(word){
bannedWordsBits[word] = "";
for (var i = 0; i < word.length; i++){
bannedWordsBits[word] += word.charCodeAt(i).toString(16) + "-";
}
});
var bannedWordsJoin = []
var keys = Object.keys(bannedWordsBits);
keys.forEach(function(key){
bannedWordsJoin.push(bannedWordsBits[key]);
});
var regex = new RegExp(bannedWordsJoin.join("|"), 'i');
function checkword(word) {
var wordBits = "";
for (var i = 0; i < word.length; i++){
wordBits += word.charCodeAt(i).toString(16) + "-";
}
return !regex.test(wordBits);
};
The separator "-" is there to make sure that unique characters don't bleed together creating undesired matches.
This is very useful, as it brings all the characters down to a common base that everything can interact with. And this can be re-encoded back to its original form without having to ship it in a key/value pair.
For me the best thing about it is that I don't have to know all of the rules for all of the character sets that I might intersect with, because I can pull them all into a common playing field.
As a side note:
To speed things up, rather than passing the large regex statement that you probably have, which takes exponentially longer to pass with the length of the words that you're banning, I would pass each separate word in the sentence through the filter. And break the filter up into length based segments. like;
checkword3Chars();
checkword4Chars();
checkword5chars();
whose functions you can generate systematically and even create on the fly as and when they become required.
