Check if each word exists in database - javascript

Issue
I need to check whether each word of a string is spelled correctly by looking each word up in a MongoDB collection.
This should take as few DB queries as possible.
The first word of each sentence must be in upper case, but that word may be stored in upper or lower case in the dictionary. So I need a case-sensitive match for every word except the first word of each sentence, which should be matched case-insensitively.
Sample string
This is a simple example. Example. This is another example.
Dictionary structure
Assume there is a dictionary collection like this
{ word: 'this' },
{ word: 'is' },
{ word: 'a' },
{ word: 'example' },
{ word: 'Name' }
In my case, there are 100,000 words in this dictionary. Names are stored capitalized, verbs are stored in lower case, and so on.
Expected result
The words simple and another should be recognized as misspelled, as they do not exist in the DB.
The array of existing words should in this case be ['This', 'is', 'a', 'example']. This is upper case because it is the first word of a sentence; in the DB it is stored as lower case this.
My attempt so far (Updated)
const sentences = string.replace(/([.?!])\s*(?= [A-Z])/g, '$1|').split('|');
let search = [],
  words = [],
  existing,
  missing;
sentences.forEach(sentence => {
  const w = sentence.trim().replace(/[^a-zA-Z0-9äöüÄÖÜß ]/gi, '').split(' ');
  w.forEach((word, index) => {
    const regex = new RegExp(['^', word, '$'].join(''), index === 0 ? 'i' : '');
    search.push(regex);
    words.push(word);
  });
});
existing = Dictionary.find({
  word: { $in: search }
}).map(obj => obj.word);
missing = _.difference(words, existing);
Problem
The case-insensitive matches don't work properly: /^Example$/i will give me a result, but existing will contain the original lowercase example, which means Example ends up in the missing array. So the case-insensitive search works as expected, but the result arrays have a mismatch. I don't know how to solve this.
Is it possible to optimize the code? I'm using two forEach loops and a difference...

This is how I would approach this issue:
Use a regex to get each word and each separator (including '.') into an array.
var words = para.match(/(.+?)(\b)/g); //this expression is not perfect but will work
Now add all words from your collection to an array by using find(). Let's say the name of that array is wordsOfColl.
Now check whether the words are the way you want them or not:
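//wordsOfColl is an array, so lowercase each entry before comparing
var lowerColl = wordsOfColl.map(function(w) { return w.toLowerCase(); });
var prevWord = "."; //to check first word of sentence; the text itself starts a sentence
words.forEach(function(word) {
  if(!/\w/.test(word)) {
    //separator (spaces/punctuation): remember it and move on
    prevWord = word;
    return;
  }
  if(lowerColl.indexOf(word.toLowerCase()) !== -1) {
    if(/[.?!]/.test(prevWord)) {
      //this is first word of sentence
      if(word[0] !== word[0].toUpperCase()) {
        //not capital, so generate error
      }
    }
  } else {
    //not in collection, generate error
  }
  prevWord = word;
});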
I haven't tested it, so please let me know in the comments if there's some issue, or if I missed one of your requirements.
Update
As the author of the question said they don't want to load the whole collection on the client, you can create a method on the server that returns an array of words instead of giving the client access to the collection.
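The mismatch described in the question (Example is found case-insensitively but then compared against the stored example) can be avoided by comparing each word against the query result with the same rule that was used for the search. The following is only a sketch under assumptions: checkSpelling is a hypothetical helper, and dictionaryWords stands in for whatever array of words the single DB query (or the server method) returns.

```javascript
// Hypothetical helper. `dictionaryWords` stands in for the words returned by
// the single DB query (the `existing` array in the question).
function checkSpelling(text, dictionaryWords) {
  const exact = new Set(dictionaryWords);
  const lower = new Set(dictionaryWords.map(w => w.toLowerCase()));
  const existing = new Set();
  const missing = new Set();

  // Split into sentences on ., ? or !, then into words.
  const sentences = text.match(/[^.?!]+/g) || [];
  sentences.forEach(sentence => {
    const words = sentence
      .replace(/[^a-zA-Z0-9äöüÄÖÜß\s]/g, '')
      .split(/\s+/)
      .filter(Boolean);
    words.forEach((word, index) => {
      // First word of a sentence: compare case-insensitively.
      // Every other word: require the exact stored spelling.
      const found = index === 0 ? lower.has(word.toLowerCase()) : exact.has(word);
      (found ? existing : missing).add(word);
    });
  });
  return { existing: [...existing], missing: [...missing] };
}

const result = checkSpelling(
  'This is a simple example. Example. This is another example.',
  ['this', 'is', 'a', 'example']
);
console.log(result.missing); // [ 'simple', 'another' ]
```

Because the words keep their original spelling from the text, Example is reported as existing rather than missing, and the two forEach loops no longer need a separate difference step.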

Related

Is there a way for the content.replace to sort of split them into more words than these?

const filter = ["bad1", "bad2"];
client.on("message", message => {
  var content = message.content;
  var stringToCheck = content.replace(/\s+/g, '').toLowerCase();
  for (var i = 0; i < filter.length; i++) {
    if (content.includes(filter[i])){
      message.delete();
      break
    }
  }
});
So my code above is a Discord bot that deletes a message when someone writes "bad1" or "bad2"
(plus some more filtered bad words that I'm going to add), and luckily there are no errors whatsoever.
But right now the bot only deletes these words when they are written in lowercase letters, without spaces in between or special characters.
I think I have found a solution, but I can't seem to put it into my code. I tried different ways, but it either deleted only lowercase words or didn't react at all, and I got errors like "cannot read property of undefined" instead.
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
bot.on('message', message => {
  var words = message.content.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g);
  var containsBadWord = words.some(word => {
    return badWords.includes(word);
  });
});
This is what I am looking at: the var words line, specifically (/\w+|\s+|[^\s\w]+/g).
Is there any way to implement that in my const filter code (above), or is there a different approach?
Thanks in advance.
Well, I'm not sure what you're trying to do with .match(/\w+|\s+|[^\s\w]+/g). That's some unnecessary regex just to get an array of words and spaces. And it won't even work if someone were to split their bad word into something like "t h i s".
If you want your filter to be case insensitive and account for spaces/special characters, a better solution would probably require more than one regex, and separate checks for the split letters and the normal bad word check. And you need to make sure your split letters check is accurate, otherwise something like "wash it" might be considered a bad word despite the space between the words.
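To make the limitation concrete, here is a small sketch of what that tokenizing regex produces. The helper names tokenize and containsBadWord are made up for this illustration, and the banned word is written in lower case because the tokens are lowercased before the comparison.

```javascript
// Hypothetical helpers illustrating the tokenizer from the question.
const tokenize = text =>
  text.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g) || [];

// Lower case on purpose: tokens are lowercased, so 'bannedWord1' would never match.
const badWords = ['bannedword1'];
const containsBadWord = text => tokenize(text).some(t => badWords.includes(t));

console.log(tokenize('Hi, t h i s'));
// [ 'hi', ',', ' ', 't', ' ', 'h', ' ', 'i', ' ', 's' ]
console.log(containsBadWord('a BannedWord1!'));        // true
console.log(containsBadWord('b a n n e d w o r d 1')); // false
```

Each spaced-out letter becomes its own token, so a per-token comparison can never see the whole word.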
A Solution
So here's a possible solution. Note that it is just a solution, and is far from the only solution. I'm just going to use hard-coded string examples instead of message.content, to allow this to be in a working snippet:
//Our array of bad words
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
//A function that tests if a given string contains a bad word
function testProfanity(string) {
  //Removes all non-letter, non-digit, and non-space chars
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  //Replaces all non-letter, non-digit chars with spaces
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  //Checks if a condition is true for at least one element in badWords
  return badWords.some(swear => {
    //Removes any non-letter, non-digit chars from the bad word (for normal)
    var filtered = swear.replace(/\W/g, "");
    //Splits the bad word into a 's p a c e d' word (for spaced)
    var spaced = filtered.split("").join(" ");
    //Two different regexes for normal and spaced bad word checks
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    //If the normal or spaced checks are true in the string, return true
    //so that '.some()' will return true for satisfying the condition
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
var result;
//Includes one banned word; expected result: true
var test1 = "I am a bannedWord1";
result = testProfanity(test1);
console.log(result);
//Includes one banned word; expected result: true
var test2 = "I am a b a N_N e d w o r d 2";
result = testProfanity(test2);
console.log(result);
//Includes one banned word; expected result: true
var test3 = "A bann_eD%word4, I am";
result = testProfanity(test3);
console.log(result);
//Includes no banned words; expected result: false
var test4 = "No banned words here";
result = testProfanity(test4);
console.log(result);
//This is a tricky one. 'bannedWord2' is technically present in this string,
//but is 'bannedWord22' really the same? This prevents something like
//"wash it" from being labeled a bad word; expected result: false
var test5 = "Banned word 22 isn't technically on the list of bad words...";
result = testProfanity(test5);
console.log(result);
I've commented each line thoroughly, such that you understand what I am doing in each line. And here it is again, without the comments or testing parts:
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
function testProfanity(string) {
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  return badWords.some(swear => {
    var filtered = swear.replace(/\W/g, "");
    var spaced = filtered.split("").join(" ");
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
Explanation
As you can see, this filter is able to deal with all sorts of punctuation, capitalization, and even single spaces/symbols in between the letters of a bad word. However, note that in order to avoid the "wash it" scenario I described (potentially resulting in the unintentional deletion of a clean message), I made it so that something like "bannedWord22" would not be treated the same as "bannedWord2". If you want it to do the opposite (therefore treating "bannedWord22" the same as "bannedWord2"), you must remove both of the \\b phrases in the normal check's regex.
I will also explain the regex, such that you fully understand what is going on here:
[^a-zA-Z0-9 ] means "select any character not in the ranges of a-z, A-Z, 0-9, or space" (meaning all characters not in those specified ranges will be replaced with an empty string, essentially removing them from the string).
\W means "select any character that is not a word character", where "word character" refers to the characters in ranges a-z, A-Z, 0-9, and underscore.
\b means "word boundary": the position between a word character and a non-word character, or between a word character and the start or end of the string. It essentially indicates where a word starts or stops. It is written as \\b in the code because the regex is built from a string, and a single \b inside a JavaScript string literal is the backspace escape sequence, so the backslash itself has to be escaped.
The flags g and i used in both of the regex checks indicate "global" and "case-insensitive", respectively.
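As a small illustration of the \b behaviour described above, the sketch below compares a bounded and an unbounded pattern. The variable names are only for this example, and the g flag is left out on purpose because test() on a global regex keeps state in lastIndex between calls.

```javascript
const word = 'bannedWord2';

// Same construction as in testProfanity, minus the g flag.
const bounded = new RegExp(`\\b${word}\\b`, 'i');
// Without the \\b tokens the word may match inside a longer word.
const unbounded = new RegExp(word, 'i');

console.log(bounded.test('a BANNEDWORD2 here')); // true
console.log(bounded.test('bannedWord22'));       // false, no boundary between the two 2s
console.log(unbounded.test('bannedWord22'));     // true
```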
Of course, to get this working with your discord bot, all you have to do in your message handler is something like this (and be sure to replace badWords with your filter variable in testProfanity()):
if (testProfanity(message.content)) return message.delete();
If you want to learn more about regex, or if you want to mess around with it and/or test it out, this is a great resource for doing so.

Default behavior after a loop

I have a bad word that contains illegal letters. I have a list of legal letters (which can be more than one character long). The following nested loop iterates over both the characters in the word and the legal letters, replacing any illegal letter with null.
In order to both retain the legal letters and substitute illegal ones, it's important that all legal letters are looped through. The illegal substitution would happen once that loop finishes.
// acceptable letters:
const legalLetters = ["ND", "CH", "S"]
// bad, evil word containing unacceptable letters:
let word = "SANDWICH"
const filteredLetters = []
while (word.length > 0) {
  for (const letter of legalLetters) {
    if (word.startsWith(letter)) {
      // remove that many letters from the start of the word
      word = word.slice(letter.length)
      filteredLetters.push(letter)
      // break back to the while loop to re-scan the truncated word
      break
    }
  } else {
    // this is the part I'm having trouble with
    // if the word does not start with an acceptable letter, remove that letter
    word = word.slice(1)
    filteredLetters.push(null)
  }
  // some filteredLetter was added and the length of the word has been reduced
  // repeat until the word is all gone
}
console.log(filteredLetters) // should be ["S", null, "ND", null, null, "CH"]
In the above example I've used Python's for ... else construct, which executes the code in the else block only if there was no break in the for block. Such a syntax does not exist in Javascript (and therefore the snippet above is nonsense).
How would I go about creating this 'default behaviour' in Javascript?
Lodash and co. answers are okay for my purposes, and this may well be an XY problem, so I welcome any restructuring advice.
Related question: For else loop in Javascript?
The answers to this question recommend either setting a boolean flag or breaking to a label. I'd prefer to avoid these approaches, if possible - the flag approach feels messy, creating an unnecessary variable, and the label approach just doesn't feel quite right.
I hope this can help you.
// acceptable letters:
const legalLetters = ["ND", "CH", "S"]
// bad, evil word containing unacceptable letters:
let word = "SANDWICH"
const filteredLetters = []
while (word.length > 0) {
  let pushedItem = null;
  for (const letter of legalLetters) {
    if (word.startsWith(letter)) {
      pushedItem = letter;
      break
    }
  }
  word = word.slice(pushedItem ? pushedItem.length : 1)
  filteredLetters.push(pushedItem)
}
console.log(filteredLetters)
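Since the question asks for an approach without a flag or a label, one possible variation is to let Array.prototype.find do the inner loop: it returns undefined when nothing matches, which plays the role of the else branch. This is a sketch rather than the only way to do it, and the wrapper function filterLetters is a name invented here.

```javascript
// Hypothetical wrapper around the loop from the question.
function filterLetters(word, legalLetters) {
  const filtered = [];
  while (word.length > 0) {
    // find() returns the first legal letter the word starts with, or undefined.
    const match = legalLetters.find(letter => word.startsWith(letter));
    // No match (or an empty legal letter): record null and drop one character.
    filtered.push(match || null);
    word = word.slice(match ? match.length : 1);
  }
  return filtered;
}

console.log(filterLetters("SANDWICH", ["ND", "CH", "S"]));
// [ 'S', null, 'ND', null, null, 'CH' ]
```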

How to search for letters in words using javascript?

Let's say I have the letters a, b, s, d.
I have hundreds of words in an array.
I want to use JS to search for words containing all of the letters, and return a word only if every letter is present.
How do I do it?
OK, so here's an expanded version of the code originally posted by user4703663. I wanted to wait until they had a chance to undelete their answer but they never did.
var words = ['absd', 'dfsd', 'dsfefe', 'dfdddr', 'dfsgbbgah', 'dfggr'];
var str = 'absd';
function find(words, str) {
  // split the string into an array
  str = str.split('');
  // `filter` returns an array of array elements according
  // to the specification in the callback when a new word
  // is passed to it
  return words.filter(function(word) {
    // that callback says to take every element
    // in the `str` array and see if it appears anywhere
    // in the word. If it does, it's a match, and
    // `filter` adds that word to the output array
    return str.every(function(char) {
      return word.includes(char);
    });
  });
}
const output = find(words, str); // [ "absd", "dfsgbbgah" ]
console.log(output);

How to compare a string with each string in a list in js?

The problem is:
Given a word and a list of possible anagrams, select the correct sublist.
Given "listen" and a list of candidates like "enlists" "google"
"inlets" "banana" the program should return a list containing
"inlets".
The test cases given are of type:
var anagram = require('./anagram');
describe('Anagram', function() {
  xit("detects simple anagram",function() {
    var subject = anagram("ant");
    var matches = subject.matches(['tan', 'stand', 'at']);
    expect(matches).toEqual(['tan']);
  });
});
Here is what I have been thinking:
1. Take in the given word, split it into characters, and sort them alphabetically.
2. Take in the list, split it into words, then split each word into characters and sort them alphabetically.
3. Compare the result of 1 with 2; if a string matches, return its original form.
But the problem is, I don't know how to begin. Please help.
Yeah, your thinking is correct, here's how you can implement it:
var anagrams = function( input, wordlist ){
  // sort the input word
  var sortedinput = input.split('').sort().join('');
  // filter the array by checking if...
  return wordlist.filter( function( word ){
    // ...the word after sorting matches the sorted input
    return word.split('').sort().join('') == sortedinput;
  });
}
anagrams( 'listen', ["enlists", "google", "inlets", "banana"] );
// ["inlets"]
http://jsfiddle.net/2kkvw4u5/
After splitting the current word into a char array:
iterate over the list of words;
check whether the length of each candidate's char array equals the length of the current word's char array;
sort both char arrays and check that their elements are equal at every index.
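The steps above could be sketched in the shape the test case expects, that is anagram(word).matches(list). This is an assumption-laden sketch: the helper sortedChars is invented here, the comparison is case-sensitive, and in the exercise the function would additionally have to be exported from anagram.js (for example via module.exports).

```javascript
// Hypothetical helper: the characters of a word as a sorted array.
const sortedChars = word => word.split('').sort();

function anagram(subject) {
  const subjectChars = sortedChars(subject);
  return {
    matches(candidates) {
      return candidates.filter(candidate => {
        // Cheap check first: anagrams must have the same length.
        if (candidate.length !== subject.length) return false;
        // Then compare the sorted characters index by index.
        const chars = sortedChars(candidate);
        return chars.every((ch, i) => ch === subjectChars[i]);
      });
    }
  };
}

console.log(anagram('ant').matches(['tan', 'stand', 'at']));
// [ 'tan' ]
console.log(anagram('listen').matches(['enlists', 'google', 'inlets', 'banana']));
// [ 'inlets' ]
```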

how to get array of sentences and words passing as parameter to a function in javascript?

I have a function say
var init = function(data){
all sentences: " I have to get all the sentences and return an array containing all sentences"
all words : "this method should return an array of words in ‘Data’ when there is no parameter passed in. Optionally, when there is a parameter passed in that is a number, return the words in the sentence indicated by the input parameter"
all reverse sentences: "this method is the same as all Sentences, except it should return the sentences in reverse order"
reverse words :" same as all words but in reverse order"
countWordsBeginningWith:" this method should return the amount of words that begin with an inputted string. The optional second parameter should determine what sentence the words come from when present."
thanks
I think what you need to do is decompose your argument (assuming it's a string) into an array. For instance:
function getAllWords(sentences) {
  var result = sentences.split(' ');
  return result;
}
var init = function(data){
  // use a plain object (not an array) for named results
  var result = {};
  result['words'] = getAllWords(data.text);
  // result['sentences'] = getAllSentences(data.text);
  // result['sentencesreversed'] = getReverseSentences(data.text);
  // result['sentencewords'] = getReverseWords(data.text);
  // result['beginswith'] = getWordsBeginningWith(data.text, data.beginswith);
  return result;
}
var getIt = {
  'beginswith': 't',
  'text': 'This is stuff. I am a sentence. Stuff happens now.'
};
console.log(init(getIt));
This is a very simplistic answer that does not account for periods, commas, and other punctuation. But that's the general idea. Some for loops and/or regexes may be needed after this point, buyer beware.
In JavaScript you can pass any data structure as a function parameter.
First learn how to construct an array of strings to contain your words or sentences. Once you have accomplished this, the rest will be straightforward.
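For reference, the five methods described in the question could be sketched roughly as follows. Everything here rests on assumptions: data is taken to be a plain string, sentence numbers are 0-based, the prefix comparison ignores case, and the method names are guesses based on the wording of the question.

```javascript
// Hypothetical implementation; `data` is assumed to be a plain string and
// sentence numbers are assumed to be 0-based.
function init(data) {
  // A sentence is a run of text ending in ., ? or ! (or the end of the input).
  const sentences = (data.match(/[^.?!]+[.?!]*/g) || [])
    .map(s => s.trim())
    .filter(Boolean);

  // Strip punctuation, then split on whitespace.
  const wordsOf = sentence =>
    sentence.replace(/[^\w\s']/g, '').split(/\s+/).filter(Boolean);

  // With a numeric argument, use only that sentence; otherwise use all.
  function allWords(n) {
    const source = typeof n === 'number' ? [sentences[n] || ''] : sentences;
    return source.flatMap(wordsOf);
  }

  return {
    allSentences: () => sentences.slice(),
    allWords,
    allReverseSentences: () => sentences.slice().reverse(),
    reverseWords: n => allWords(n).reverse(),
    countWordsBeginningWith: (prefix, n) =>
      allWords(n).filter(w => w.toLowerCase().startsWith(prefix.toLowerCase())).length
  };
}

const doc = init('This is stuff. I am a sentence. Stuff happens now.');
console.log(doc.allSentences());
// [ 'This is stuff.', 'I am a sentence.', 'Stuff happens now.' ]
console.log(doc.allWords(1));                  // [ 'I', 'am', 'a', 'sentence' ]
console.log(doc.countWordsBeginningWith('s')); // 3
```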
