Default behavior after a loop - javascript

I have a bad word that contains illegal letters. I have a list of legal letters (which can be more than one character long). The following nested loop iterates over both the characters in the word and the legal letters, replacing any illegal letter with null.
In order to both retain the legal letters and substitute illegal ones, it's important that all legal letters are looped through. The illegal substitution would happen once that loop finishes.
// acceptable letters:
const legalLetters = ["ND", "CH", "S"]
// bad, evil word containing unacceptable letters:
let word = "SANDWICH"
const filteredLetters = []
while (word.length > 0) {
for (const letter of legalLetters) {
if (word.startsWith(letter)) {
// remove that many letters from the start of the word
word = word.slice(letter.length)
filteredLetters.push(letter)
// break back to the while loop to re-scan the truncated word
break
}
} else {
// this is the part I'm having trouble with
// if the word does not start with an acceptable letter, remove that letter
word = word.slice(1)
filteredLetters.push(null)
}
// some filteredLetter was added and the length of the word has been reduced
// repeat until the word is all gone
}
console.log(filteredLetters) // should be ["S", null, "ND", null, null, "CH"]
In the above example I've used Python's for ... else construct, which executes the code in the else block only if there was no break in the for block. Such a syntax does not exist in Javascript (and therefore the snippet above is nonsense).
How would I go about creating this 'default behaviour' in Javascript?
Lodash and co. answers are okay for my purposes, and this may well be an XY problem, so I welcome any restructuring advice.
Related question: For else loop in Javascript?
The answers to this question recommend either setting a boolean flag or breaking to a label. I'd prefer to avoid these approaches, if possible - the flag approach feels messy, creating an unnecessary variable, and the label approach just doesn't feel quite right.

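I hope this helps. Instead of a boolean flag, keep the matched letter in a variable that starts out as null. After the for loop it holds either the matched letter or the default, and the same value tells you how many characters to remove: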
// acceptable letters:
const legalLetters = ["ND", "CH", "S"]
// bad, evil word containing unacceptable letters:
let word = "SANDWICH"
const filteredLetters = []
while (word.length > 0) {
let pushedItem = null;
for (const letter of legalLetters) {
if (word.startsWith(letter)) {
pushedItem = letter;
break
}
}
word = word.slice(pushedItem ? pushedItem.length : 1)
filteredLetters.push(pushedItem)
}
console.log(filteredLetters)
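Another way to get the "default" without a flag or a label, as a minimal sketch: Array.prototype.find returns undefined when no element passes the test, which can stand in for Python's else branch. The function name filterLetters is made up for this example; legalLetters and the sample word come from the question.

```javascript
// acceptable letters (from the question)
const legalLetters = ["ND", "CH", "S"];

// filterLetters is a hypothetical helper name, not part of any library
function filterLetters(input) {
  let word = input;
  const filtered = [];
  while (word.length > 0) {
    // find() yields the first legal letter the word starts with, or undefined
    const match = legalLetters.find(letter => word.startsWith(letter)) ?? null;
    filtered.push(match);
    // drop the matched letter, or a single illegal character
    word = word.slice(match ? match.length : 1);
  }
  return filtered;
}

console.log(filterLetters("SANDWICH")); // ["S", null, "ND", null, null, "CH"]
```

The order of legalLetters decides which letter wins when several of them match at the same position, exactly as in the loop version.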


Shorter more efficient way to tackle this codewars challenge?

I'm fairly new to JavaScript (and development in general). I wanted to try a challenge from Codewars. The challenge was to process a string through a function that would flip any words of 5 or more characters and return the original string with those flipped words. Here's the code I came up with (it did work!).
//this function turns each word into an array that will get flipped.
let wordFlipper = (word) => {
var splitWord = word.split(''); //convert word to array
var reversedWord = splitWord.reverse(); //flips the indexes for the array
var joinReversedWord = reversedWord.join('').toString(); //turns array back to a string.
return joinReversedWord;
}
function spinWords(phrase){
let finalArray = [];
let wordsToArray = phrase.split(' ');
const processFlipWords = wordsToArray.forEach(word => {
if (word.toString().length > 4) {
var flippedWord = wordFlipper(word); //here's where we call the function wordFlipper()
finalArray.push(flippedWord);
}
else {
finalArray.push(word);
}
});
return finalArray.join(' ');
}
How would you experts suggest writing this? I'm sure I'm not being too efficient at writing this code.
Thank you!
I'd use a regular expression to match 5 or more word characters (\w{5,}), and have a replacer function (String.replace()) return the reversed (reverse()) word:
const spinWords = phrase => phrase.replace(
/\w{5,}/g,
word => [...word].reverse().join('')
);
console.log(spinWords('foo barbar more words go here'));
\w matches a word character - something from A to Z, case-insensitive, or a digit, or an underscore.
The braces indicate the number of times to repeat the previous token. {5,} starts with a 5 (so, "at least 5") and has nothing after the comma ("up to any number").
Then /g, the global flag, matches and replaces every substring that matches this pattern, not just the first.
The callback function runs for every matched substring, where the argument is the matched substring, and what is returned gets replaced at that point in the original string.
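For comparison, here is a sketch of the same idea without a regular expression, closer to the structure of the original code. The name spinWordsBySplit is made up for this example. It assumes words are separated by single spaces, and unlike the \w-based version it counts punctuation attached to a word as part of that word.

```javascript
// spinWordsBySplit is a hypothetical name used only for this comparison
const spinWordsBySplit = phrase =>
  phrase
    .split(' ')                                   // words separated by single spaces
    .map(word => (word.length >= 5
      ? [...word].reverse().join('')              // flip words of 5+ characters
      : word))                                    // leave shorter words alone
    .join(' ');

console.log(spinWordsBySplit('Hey fellow warriors')); // "Hey wollef sroirraw"
console.log(spinWordsBySplit('hello, world'));        // ",olleh dlrow" (comma is flipped too)
```

If punctuation should stay in place, the regex version is the simpler choice, because \w{5,} never includes the comma in the match.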

is there a way for the content.replace to sort of split them into more words than these?

const filter = ["bad1", "bad2"];
client.on("message", message => {
var content = message.content;
var stringToCheck = content.replace(/\s+/g, '').toLowerCase();
for (var i = 0; i < filter.length; i++) {
if (content.includes(filter[i])){
message.delete();
break
}
}
});
So my code above is a Discord bot that deletes a message when someone writes "bad1" or "bad2" (I'm going to add some more filtered bad words), and luckily there are no errors whatsoever.
But right now the bot only deletes these words when they are written in lowercase, without spaces or special characters in between.
I think I have found a solution, but I can't seem to put it into my code. I tried different ways, but it either only deleted lowercase words or didn't react at all, and I got errors like "cannot read property of undefined".
var badWords = [
'bannedWord1',
'bannedWord2',
'bannedWord3',
'bannedWord4'
];
bot.on('message', message => {
var words = message.content.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g);
var containsBadWord = words.some(word => {
return badWords.includes(word);
});
This is what I am looking at: the var words line, specifically (/\w+|\s+|[^\s\w]+/g).
Is there any way to implement that in my const filter code (above), or is there a different approach?
Thanks in advance.
Well, I'm not sure what you're trying to do with .match(/\w+|\s+|[^\s\w]+/g). That's some unnecessary regex just to get an array of words and spaces. And it won't even work if someone were to split their bad word into something like "t h i s".
If you want your filter to be case insensitive and account for spaces/special characters, a better solution would probably require more than one regex, and separate checks for the split letters and the normal bad word check. And you need to make sure your split letters check is accurate, otherwise something like "wash it" might be considered a bad word despite the space between the words.
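To make the problem with that tokenizing regex concrete, here is a small sketch. The tokenize helper is a made-up name wrapping the expression from the question, and the filter array is the one from the question's first snippet.

```javascript
const filter = ["bad1", "bad2"];

// tokenize is a hypothetical helper around the regex from the question
const tokenize = text =>
  text.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g) ?? [];

// true when any token is exactly one of the filtered words
const hasBadToken = text => tokenize(text).some(token => filter.includes(token));

console.log(tokenize("That is BAD1!")); // ["that", " ", "is", " ", "bad1", "!"]
console.log(hasBadToken("That is BAD1!")); // true
console.log(tokenize("b a d 1"));       // ["b", " ", "a", " ", "d", " ", "1"]
console.log(hasBadToken("b a d 1"));    // false, the spaced-out word slips through
```

The tokens are only ever compared one at a time, so a word that has been split into single letters never lines up with an entry in the list.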
A Solution
So here's a possible solution. Note that it is just a solution, and is far from the only solution. I'm just going to use hard-coded string examples instead of message.content, to allow this to be in a working snippet:
//Our array of bad words
var badWords = [
'bannedWord1',
'bannedWord2',
'bannedWord3',
'bannedWord4'
];
//A function that tests if a given string contains a bad word
function testProfanity(string) {
//Removes all non-letter, non-digit, and non-space chars
var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
//Replaces all non-letter, non-digit chars with spaces
var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
//Checks if a condition is true for at least one element in badWords
return badWords.some(swear => {
//Removes any non-letter, non-digit chars from the bad word (for normal)
var filtered = swear.replace(/\W/g, "");
//Splits the bad word into a 's p a c e d' word (for spaced)
var spaced = filtered.split("").join(" ");
//Two different regexes for normal and spaced bad word checks
var checks = {
spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
normal: new RegExp(`\\b${filtered}\\b`, "gi")
};
//If the normal or spaced checks are true in the string, return true
//so that '.some()' will return true for satisfying the condition
return spacerString.match(checks.spaced) || normalString.match(checks.normal);
});
}
var result;
//Includes one banned word; expected result: true
var test1 = "I am a bannedWord1";
result = testProfanity(test1);
console.log(result);
//Includes one banned word; expected result: true
var test2 = "I am a b a N_N e d w o r d 2";
result = testProfanity(test2);
console.log(result);
//Includes one banned word; expected result: true
var test3 = "A bann_eD%word4, I am";
result = testProfanity(test3);
console.log(result);
//Includes no banned words; expected result: false
var test4 = "No banned words here";
result = testProfanity(test4);
console.log(result);
//This is a tricky one. 'bannedWord2' is technically present in this string,
//but is 'bannedWord22' really the same? This prevents something like
//"wash it" from being labeled a bad word; expected result: false
var test5 = "Banned word 22 isn't technically on the list of bad words...";
result = testProfanity(test5);
console.log(result);
I've commented each line thoroughly so that you can see what I am doing in each one. And here it is again, without the comments or testing parts:
var badWords = [
'bannedWord1',
'bannedWord2',
'bannedWord3',
'bannedWord4'
];
function testProfanity(string) {
var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
return badWords.some(swear => {
var filtered = swear.replace(/\W/g, "");
var spaced = filtered.split("").join(" ");
var checks = {
spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
normal: new RegExp(`\\b${filtered}\\b`, "gi")
};
return spacerString.match(checks.spaced) || normalString.match(checks.normal);
});
}
Explanation
As you can see, this filter is able to deal with all sorts of punctuation, capitalization, and even single spaces/symbols in between the letters of a bad word. However, note that in order to avoid the "wash it" scenario I described (potentially resulting in the unintentional deletion of a clean message), I made it so that something like "bannedWord22" would not be treated the same as "bannedWord2". If you want it to do the opposite (therefore treating "bannedWord22" the same as "bannedWord2"), you must remove both of the \\b phrases in the normal check's regex.
I will also explain the regex, so that you fully understand what is going on here:
[^a-zA-Z0-9 ] means "select any character not in the ranges of a-z, A-Z, 0-9, or space" (meaning all characters not in those specified ranges will be replaced with an empty string, essentially removing them from the string).
\W means "select any character that is not a word character", where "word character" refers to the characters in ranges a-z, A-Z, 0-9, and underscore.
\b means "word boundary": the position between a word character and a non-word character, which is where a word starts or stops. It also matches at the very beginning or end of the string when a word character sits there. In the code, \b is written as \\b because the pattern is built from a string (a template literal), where a lone \b would be read as the backspace escape sequence rather than the regex token.
The flags g and i used in both of the regex checks indicate "global" and "case-insensitive", respectively.
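Here is a short sketch of what the word boundaries and the double backslash change in practice. The variable names strict and loose are made up for this example, and the g flag is left out on purpose because RegExp.prototype.test is stateful on global regexes.

```javascript
const word = "bannedWord2";

// with boundaries: the word must stand on its own
const strict = new RegExp(`\\b${word}\\b`, "i");
// without boundaries: any occurrence counts, even inside a longer word
const loose = new RegExp(word, "i");

console.log(strict.test("I am a BANNEDWORD2!")); // true
console.log(strict.test("bannedWord22"));        // false
console.log(loose.test("bannedWord22"));         // true

// inside a string literal, a single \b is the backspace character
console.log("\b" === "\u0008"); // true
console.log("\\b".length);      // 2, a backslash followed by the letter b
```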
Of course, to get this working with your discord bot, all you have to do in your message handler is something like this (and be sure to replace badWords with your filter variable in testProfanity()):
if (testProfanity(message.content)) return message.delete();
If you want to learn more about regex, or if you want to mess around with it and/or test it out, this is a great resource for doing so.

Non-backtracking regex hangs node on strings with newlines

I have no idea why this hangs the javascript engine but it does. Anyone else have a clue?
function isEnglish(text) {
const checker = /^(\p{Emoji}|\p{ASCII})+$/u;
return !!checker.exec(text.replace(/\\n/g, ""));
}
text = `
RT #PROMOSIGROUP: FOLL TWITTER
3K:20rb
5k:30rb
10K:50rb
Foll IG aktif WW
100F:15rb
500F:50rb
1K:100rb
Jual Akun Twitter+IG
081327927525/…`
isEnglish(text);
Ok, figured it out, the "…" character causes the regex engine to spin. Anyone know why this might be?
It looks like your isEnglish() test is supposed to return true when the source text consists solely of:
US-ASCII characters,
Emoji (not sure why this would count as "english", but whatever), and
Punctuation
and false otherwise.
I might point out that US-ASCII covers U+0000 to U+007F: that includes the C0 Control Characters (U+0000 to U+001F), as well as [DEL] (U+007F), none of which, save whitespace, are printable characters.
But, you're making a mountain out of a molehill: it will be much faster (and clearer) to just search for the first character that's not part of your desired alphabet:
function isEnglish(s) {
return !rxIsNonEnglishAlphabet.test(s);
}
// -------------------------------------------------------
// this regular expression matches characters that are NOT
// * Whitespace
// * US-ASCII (u+0000 through U+007F)
// * Emoji
// * Punctuation
// -------------------------------------------------------
const rxIsNonEnglishAlphabet = /[^\s\p{ASCII}\p{Emoji}\p{Punctuation}]/u;
It turns out that my regex turns into a big backtracking bug even though there isn't anything obvious to me that would cause that. My function now looks much different to get it to work:
function isEnglish(text) {
const ascii = /\p{ASCII}/ug;
const emoji = /\p{Emoji}/ug;
const punct = /\p{Punctuation}/ug;
text = text.replace(ascii, "");
text = text.replace(emoji, "");
text = text.replace(punct, "");
return text.length === 0;
}
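A likely explanation for the hang, offered as a sketch rather than a confirmed diagnosis: the digits, # and * carry the Unicode Emoji property as well as being ASCII, so in (\p{Emoji}|\p{ASCII})+ each of those characters can be matched by either alternative. When the match fails at the end (the "…" is neither ASCII nor Emoji), the engine may retry every combination of choices for all those digits. Putting both properties into one character class removes the ambiguity. The helper name isAsciiOrEmojiOnly is made up for this example.

```javascript
// characters that are BOTH ASCII and Emoji, so the alternation is ambiguous for them
const ambiguous = [..."#*0123456789"].filter(
  ch => /\p{ASCII}/u.test(ch) && /\p{Emoji}/u.test(ch)
);
console.log(ambiguous.join("")); // "#*0123456789"

// one character class instead of an alternation: each character has a single way to match
const isAsciiOrEmojiOnly = text => /^[\p{ASCII}\p{Emoji}]+$/u.test(text);

console.log(isAsciiOrEmojiOnly("3K:20rb\n10K:50rb")); // true (newline is ASCII)
console.log(isAsciiOrEmojiOnly("081327927525/…"));    // false, and it returns immediately
```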

Check if each word is existing in database

Issue
I need to check whether each word of a string is spelled correctly by searching a mongoDB collection for each word.
This should use as few DB queries as possible.
The first word of each sentence must be upper case, but that word could be stored in upper or lower case in the dictionary. So I need a case-sensitive match for every word except the first word of each sentence, which should be matched case-insensitively.
Sample string
This is a simple example. Example. This is another example.
Dictionary structure
Assume there is a dictionary collection like this
{ word: 'this' },
{ word: 'is' },
{ word: 'a' },
{ word: 'example' },
{ word: 'Name' }
In my case, there are 100,000 words in this dictionary. Of course, names are stored capitalized, verbs are stored in lower case, and so on.
Expected result
The words simple and another should be recognized as 'misspelled' words, as they do not exist in the DB.
The array of all existing words should in this case be ['This', 'is', 'a', 'example']. This is capitalized because it is the first word of a sentence; in the DB it is stored as lower-case this.
My attempt so far (Updated)
const sentences = string.replace(/([.?!])\s*(?= [A-Z])/g, '$1|').split('|');
let search = [],
words = [],
existing,
missing;
sentences.forEach(sentence => {
const w = sentence.trim().replace(/[^a-zA-Z0-9äöüÄÖÜß ]/gi, '').split(' ');
w.forEach((word, index) => {
const regex = new RegExp(['^', word, '$'].join(''), index === 0 ? 'i' : '');
search.push(regex);
words.push(word);
});
});
existing = Dictionary.find({
word: { $in: search }
}).map(obj => obj.word);
missing = _.difference(words, existing);
Problem
The case-insensitive matches don't work properly: /^Example$/i does give me a result, but existing then contains the original lower-case example from the DB, which means Example ends up in the missing array. So the case-insensitive search works as expected, but the result arrays have a mismatch. I don't know how to solve this.
Is it possible to optimize the code? I'm using two forEach loops and a difference...
This is how I would face this issue:
Use a regex to collect each word, together with the spaces and '.' between the words, into an array.
var words = para.match(/(.+?)(\b)/g); //this expression is not perfect but will work
Now put all the words from your collection into an array by using find(). Let's say the name of that array is wordsOfColl.
Now check whether the words are the way you want them:
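var prevWord = ""; //previous token, used to detect the first word of a sentence
//lower-case copy of the collection's words, built once for the case-insensitive lookup
var lowerWordsOfColl = wordsOfColl.map(function(w) { return w.toLowerCase(); });
words.forEach(function(word) {
if(!/\w/.test(word)) {
//spaces and punctuation are not words: remember the token and move on
prevWord = word;
return;
}
if(lowerWordsOfColl.indexOf(word.toLowerCase()) !== -1) {
if(prevWord.replace(/\s/g, '') === '.') {
//this is first word of sentence
if(word[0] !== word[0].toUpperCase()) {
//not capital, so generate error
}
}
} else {
//not in collection, generate error
}
prevWord = word;
});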
I haven't tested it so please let me know in comments if there's some issue. Or some requirement of yours I missed.
Update
Since the author of the question indicated that he doesn't want to load the whole collection on the client, you can create a method on the server that returns an array of words instead of giving the client access to the collection.
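The mismatch between the two result arrays described in the question can be avoided by classifying the original words directly instead of diffing against what the database returned. The following is only a sketch under assumptions: checkWords is a made-up name, and dictionaryWords is assumed to be a plain array of strings that has already been fetched (for example by a server method as suggested above). It is not MongoDB code.

```javascript
// checkWords and dictionaryWords are hypothetical; the character class is the one from the question
function checkWords(text, dictionaryWords) {
  const exact = new Set(dictionaryWords);                             // case-sensitive lookup
  const lower = new Set(dictionaryWords.map(w => w.toLowerCase()));   // for sentence starts
  const existing = new Set();
  const missing = new Set();

  // split after '.', '?' or '!' followed by whitespace
  for (const sentence of text.split(/(?<=[.?!])\s+/)) {
    const words = sentence
      .replace(/[^a-zA-Z0-9äöüÄÖÜß ]/g, '')
      .split(' ')
      .filter(Boolean);
    words.forEach((word, index) => {
      // only the first word of a sentence is compared case-insensitively
      const known = index === 0 ? lower.has(word.toLowerCase()) : exact.has(word);
      (known ? existing : missing).add(word);
    });
  }
  return { existing: [...existing], missing: [...missing] };
}

const result = checkWords(
  'This is a simple example. Example. This is another example.',
  ['this', 'is', 'a', 'example', 'Name']
);
console.log(result.existing); // ["This", "is", "a", "example", "Example"]
console.log(result.missing);  // ["simple", "another"]
```

Because each word is kept in the form it appeared in the text, a capitalized sentence start such as Example lands in existing rather than in missing.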

How to split a string by a difference in character as delimiter?

What I'd like to achieve is splitting a string like this, i.e. the delimiters are the indexes where the character before that index is different from the character after that index:
"AAABBCCCCDEEE" -> ["AAA", "BB", "CCCC", "D", "EEE"]
I've been trying to come up with a concise solution, but I ended up with this rather verbose code: http://jsfiddle.net/b39aM/1/.
var arr = [], // output
text = "AAABBCCCCDEEE", // input
current;
for(var i = 0; i < text.length; i++) {
var char = text[i];
if(char !== current) { // new letter
arr.push(char); // create new array element
current = char; // update current
} else { // current letter continued
arr[arr.length - 1] += char; // append letter to last element
}
}
It's naive and I don't like it:
I'm manually iterating over each character, and I'm appending to the array character by character
It's a little too long for the simple thing I want to achieve
I was thinking of using a regexp but I'm not sure what the regexp should be. Is it possible to define a regexp that means "one character and a different character following"?
Or more generally, is there a more elegant solution for achieving this splitting method?
Yes, you can use a regular expression:
"AAABBCCCCDEEE".match(/(.)\1*/g)
Here . will match any character and \1* will match any following characters that are the same as the formerly matched one. And with a global match you’ll get all matching sequences.
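Wrapped in a small function, the same expression can be sketched as follows. The name splitRuns is made up for this example. Two details are assumptions layered on top of the answer: the s flag, so that . also matches line breaks, and the ?? [] fallback, because match returns null when nothing matches (for instance on an empty string).

```javascript
// splitRuns is a hypothetical helper name
const splitRuns = text => text.match(/(.)\1*/gs) ?? [];

console.log(splitRuns("AAABBCCCCDEEE")); // ["AAA", "BB", "CCCC", "D", "EEE"]
console.log(splitRuns(""));              // []
console.log(splitRuns("a\n\nb"));        // ["a", "\n\n", "b"]
```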
