Count number of words in string using JavaScript - javascript

I am trying to count the number of words in a given string using the following code:
var t = document.getElementById('MSO_ContentTable').textContent;
if (t == undefined) {
var total = document.getElementById('MSO_ContentTable').innerText;
} else {
var total = document.getElementById('MSO_ContentTable').textContent;
}
countTotal = cword(total);
function cword(w) {
var count = 0;
var words = w.split(" ");
for (i = 0; i < words.length; i++) {
// inner loop -- do the count
if (words[i] != "") {
count += 1;
}
}
return (count);
}
In this code I read the text of a div tag and pass it to the cword() function for counting. However, the return value is different in IE and Firefox. Is there any change required in the regular expression? I have checked that both browsers pass the same string, so the problem must be inside the cword() function.

[edit 2022, based on comment] Nowadays, one would not extend the native prototype this way. A way to extend the native prototype without the danger of naming conflicts is to use an ES20xx Symbol. Here is an example of a wordcounter using that.
Old answer: you can use split and add a wordcounter to the String prototype:
if (!String.prototype.countWords) {
String.prototype.countWords = function() {
return this.length && this.split(/\s+\b/).length || 0;
};
}
console.log(`'this string has five words'.countWords() => ${
'this string has five words'.countWords()}`);
console.log(`'this string has five words ... and counting'.countWords() => ${
'this string has five words ... and counting'.countWords()}`);
console.log(`''.countWords() => ${''.countWords()}`);

I would prefer a RegEx only solution:
var str = "your long string with many words.";
var wordCount = str.match(/(\w+)/g).length;
alert(wordCount); //6
The regex is
\w+ between one and unlimited word characters
/g global - don't stop after the first match
The brackets create a group around every match. So the length of all matched groups should match the word count.

This is the best solution I've found:
function wordCount(str) {
var m = str.match(/[^\s]+/g)
return m ? m.length : 0;
}
This inverts the whitespace selection, which is better than \w+ because \w only matches the Latin letters, the digits and _ (see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.6)
If you're not careful with whitespace matching you'll count empty strings, strings with leading and trailing whitespace, and all whitespace strings as matches while this solution handles strings like ' ', ' a\t\t!\r\n#$%() d ' correctly (if you define 'correct' as 0 and 4).
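A small sketch of the difference, assuming the text read from the page contains tabs or line breaks. The sample string and both function names are made up for illustration:

```javascript
// Hypothetical sample: words separated by a tab, a newline and a double space
var sample = ' one\ttwo\nthree  four ';

// Splitting on a single space only sees the space characters
function countBySpace(str) {
  return str.split(' ').filter(function (w) { return w !== ''; }).length;
}

// Matching runs of non-whitespace treats tabs and newlines as separators too
function countByNonWhitespace(str) {
  var m = str.match(/\S+/g);
  return m ? m.length : 0;
}

console.log(countBySpace(sample));         // 2
console.log(countByNonWhitespace(sample)); // 4
```

This may be one reason the original cword() gives different results when two browsers return differently formatted whitespace for the same element.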

You can make a clever use of the replace() method although you are not replacing anything.
var str = "the very long text you have...";
var counter = 0;
// let's loop through the string and count the words
// (\b+ is not a valid pattern, so match runs of word characters instead)
str.replace(/\w+/g, function (a) {
// for each word found increase the counter value by 1
counter++;
})
alert(counter);
The regex can be improved, for example to exclude HTML tags.

//Count words in a string or what appears as words :-)
function countWordsString(string){
// Trim first so leading and trailing whitespace is not counted
string = string.trim();
// An empty or whitespace-only string contains no words
if (string === '') return 0;
var counter = 1;
// Loop through the whitespace runs; each one separates two words
string.replace(/\s+/g, function (a) {
// For each separator found increase the counter value by 1
counter++;
});
return counter;
}
var numberWords = countWordsString(string);

Related

How to get total sum of matches from a loop?

I'm trying to loop through an array to check whether any of the words in the array are in a body of text:
for(var i = 0; i < wordArray.length; i++ ) {
if(textBody.indexOf(wordArray[i]) >= 1) {
console.log("One or two words.");
// do something
}
else if (textBody.indexOf(wordArray[i]) >= 3) {
console.log("Three or more words.");
// do something
}
else {
console.log("No words match.");
// do something
}
}
where >= 1 and >= 3 are supposed to determine the number of matched words (although it might just be determining their index position in the array? As, in its current state it will console.log hundreds of duplicate strings from the if / else statement).
How do I set the if / else statement to do actions based off of the amount of matched words?
Any help would be greatly appreciated!
Try this:
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
var numberOfMatches = matches ? matches.length : 0;
console.log(wordArray[i] + ' found ' + numberOfMatches + " times");
}
indexOf will do partial matches. For example "This is a bust".indexOf("bus") would match even though that is probably not what you want. It is better to use a regular expression with the word boundary token \b to eliminate partial word matches. In the RegExp constructor you need to escape the backslash, so \b becomes \\b. The regex uses the i flag to ignore case and the g flag to find all matches. Replace the console.log line with your if/else logic based on the numberOfMatches variable.
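A minimal sketch of the partial-match problem, using the "bust" sentence from the explanation. It assumes the search word contains no regex special characters, since those would need escaping first:

```javascript
var text = 'This is a bust';

// indexOf finds "bus" inside "bust"
var partial = text.indexOf('bus') !== -1;

// \b on both sides requires "bus" to stand alone as a word
var whole = new RegExp('\\b' + 'bus' + '\\b', 'i').test(text);

console.log(partial, whole); // true false
```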
UPDATE: Per your clarification you would change the above to
var numberOfMatches = 0;
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
numberOfMatches += matches ? matches.length : 0;
}
console.log(numberOfMatches);
indexOf() provides the index of the first match, not the number of matches. So currently you're first testing whether the word appears at index 1 or later, then whether it appears at index 3 or later, rather than counting the number of matches.
I can think of a couple different approaches off the top of my head that would work, but I'm not going to write them for you because this sounds like school work. One would be to use match: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match and Count number of matches of a regex in Javascript
If you're scared of using regex, or can't be assed to spend the time learning how they work, you could get the index of the match, and if it matches make a substring excluding the portion up to that match, and test if it matches again, while incrementing a counter. indexOf() will return -1 if no matches are found.
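The regex-free idea can be sketched as follows. Instead of cutting substrings, this version passes the optional second argument of indexOf() so the search resumes after the previous match; the function name countOccurrences is made up for illustration:

```javascript
// Count non-overlapping occurrences of `word` in `text` using only indexOf
function countOccurrences(text, word) {
  if (word === '') return 0; // avoid an endless loop on an empty search term
  var count = 0;
  var pos = text.indexOf(word);
  while (pos !== -1) {
    count++;
    // continue searching just past the match that was found
    pos = text.indexOf(word, pos + word.length);
  }
  return count;
}

console.log(countOccurrences('a test to test how many test in this', 'test')); // 3
```

Like indexOf() itself, this counts partial matches too, so "test" would also be counted inside "testing".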
You can split the text into words with a RegExp and then find all occurrences of your word in this way:
var text = "word1, word2, word word word word3"
var allWords = text.split(/\b/);
var getOccurrenceCount = function(word, allWords) {
return allWords.reduce(function(count, nextWord) {
count += word == nextWord ? 1 : 0;
return count;
}, 0);
};
getOccurrenceCount("word", allWords);
This may help you:
You have to use .match instead of .indexOf (which gets the index of the first occurrence inside the string)
var textBody = document.getElementById('inside').innerHTML;
var wordArray = ['check','test'];
for(var i = 0; i < wordArray.length; i++ ) {
var regex = new RegExp( wordArray[i], 'g' );
var wordCount = (textBody.match(regex) || []).length;
console.log(wordCount + " times the word ["+ wordArray[i] +"]");
}
<body>
<p id="inside">
this is your test, check the test, how many test words check
</p>
</body>
I would first put the array into a hashmap, something like
_.each(array, function(a){map[a]=1})
Second, split the string into an array on spaces and punctuation marks.
Loop through the new array to check whether each word exists in the map.
Make sure to compare the words case-insensitively.
This approach will help you improve the run time efficiency to linear.
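The steps above can be sketched roughly as follows. This version swaps the plain object and the underscore helper for a built-in Set, and the function name countListedWords is an assumption made for illustration:

```javascript
// Count how many words of `textBody` appear in `wordArray`, ignoring case
function countListedWords(textBody, wordArray) {
  // Build the lookup once so that each later check is a constant-time operation
  var lookup = new Set(wordArray.map(function (w) { return w.toLowerCase(); }));
  // Split on anything that is not a letter, digit or underscore
  var tokens = textBody.toLowerCase().split(/\W+/);
  var count = 0;
  tokens.forEach(function (token) {
    if (lookup.has(token)) count++;
  });
  return count;
}

console.log(countListedWords('This is your test, check the Test!', ['check', 'test'])); // 3
```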
Yes .indexOf gives you the first position of the word in the string. Many methods available to count a word in a string, I'm sharing my crazy version :
function matchesCount(word, str) {
return (' ' + str.replace(/[^A-Za-z]+/gi,' ') + ' ')
.split(' '+word+' ').length - 1;
}
console.log(matchesCount('test', 'A test to test how many test in this'));

Checking if combination of any amount of strings exists

I'm solving a puzzle and I have an idea of how to solve this problem, but I would like some guidance and hints.
Suppose I am given n words as input, followed by m strings that combine words without spaces. The expected behaviour is the following:
4
this
is
my
dog
5
thisis // outputs 1
thisisacat // 0, since a or cat wasn't in the four words
thisisaduck // 0, no a or duck
thisismy // 1 this,is,my are among the four words
thisismydog // 1
My thoughts
First, I was thinking of storing the n words in an array. After that, I check whether any of those words is the first word of each of the 5 strings.
Example: check if this is in the first word thisis. It is! Great, now remove that this, from thisis to get simply just is, now delete the original string that corresponded to that equality and keep iterating over the left overs (now is,my,dog are available). If we can keep doing this process, until we get an empty string. We return 1, else return 0!
Are my thoughts on the right track? I think this would be a good approach (By the way I would like to implement this in javascript)
Sorting words from long to short may in some cases help to find a solution quicker, but it is not a guarantee. Sentences that contain the longest word might only have a solution if that longest word is not used.
Take for instance this test case:
Words: toolbox, stool, boxer
Sentence: stoolboxer
If "toolbox" is taken as a word in that sentence, then the remaining characters cannot be matched with other valid words. Yet, there is a solution, but only if the word "toolbox" is not used.
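A small sketch of that test case, showing what is left over once "toolbox" has been committed to. The variable names are made up for illustration:

```javascript
var words = ['toolbox', 'stool', 'boxer'];
var sentence = 'stoolboxer';

// Committing to the longest word leaves pieces that no word can cover
var leftovers = sentence.split('toolbox'); // ['s', 'er']
var leftoversAreWords = leftovers.every(function (part) {
  return part === '' || words.indexOf(part) !== -1;
});

// Skipping "toolbox" gives a clean decomposition
var alternative = 'stool' + 'boxer';

console.log(leftovers, leftoversAreWords);  // [ 's', 'er' ] false
console.log(alternative === sentence);      // true
```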
Solution with a Regular Expression
When regular expressions are allowed as part of the solution, then it is quite simple. For the above example, the regular expression would be:
^(toolbox|stool|boxer)*$
If a sentence matches that expression, it is a solution. If not, then not. This is quite straightforward, and doesn't really require an algorithm. All is done by the regular expression interpreter. Here is a snippet:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var regex = new RegExp('^(' + words.join('|') + ')*$');
sentences.forEach(sentence => {
// search returns a position. It should be 0:
console.log(sentence + ': ' + (sentence.search(regex) ? 'No' : 'Yes'));
});
But using regular expressions in an algorithm-challenge feels like cheating: you don't really write the algorithm, but rely on the regular expression implementation to do the job for you.
Without Regular Expressions
You could use this algorithm: first check whether a word matches at the start of the input sentence, and if so, remove that first occurrence from it. Then repeat this for the remaining part of the sentence. If this can be repeated until no characters are left over, you have a solution.
If characters are left over which cannot be matched with any word... well, then you cannot really conclude there is no solution for that sentence. It might be that some earlier made word choice was the wrong one, and there was an alternative. So to cope with that, your algorithm could backtrack and try other words.
This principle can be implemented through recursion. To gain memory-efficiency, you could leave the original sentence intact, and work with an index in that sentence instead.
The algorithm is implemented in arrow-function testString:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var testString = (words, str, i = 0) =>
i >= str.length || words.some( word =>
str.substr(i, word.length) == word && testString(words, str, i + word.length)
);
sentences.forEach(sentence => {
console.log(sentence + ': ' + (testString(words, sentence) ? 'Yes' : 'No'));
});
Or, the same in non-arrow-function syntax:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var testString = function (words, str, i = 0) {
return i >= str.length || words.some(function (word) {
return str.substr(i, word.length) == word
&& testString(words, str, i + word.length);
});
}
sentences.forEach(function (sentence) {
console.log(sentence + ': ' + (testString(words, sentence) ? 'Yes' : 'No'));
});
... and without some(), forEach() or ternary operator:
var words = ['this','is','a','string'];
var sentences = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
function testString (words, str, i = 0) {
if (i >= str.length) return true;
for (var k = 0; k < words.length; k++) {
var word = words[k];
if (str.substr(i, word.length) == word
&& testString(words, str, i + word.length)) {
return true;
}
}
// no word fits at position i, so report failure explicitly
return false;
}
for (var n = 0; n < sentences.length; n++) {
var sentence = sentences[n];
if (testString(words, sentence)) {
console.log(sentence + ': Yes');
} else {
console.log(sentence + ': No');
}
}
Take the 4 words, put them into a regex.
Use that regex to split each string.
Take the length of the resulting array (subtract one for the initial length of one).
var size = 'thisis'.split(/this|is|my|dog/).length - 1
Or if your list of words is an array
var search = new RegExp(words.join('|'))
var size = 'thisis'.split(search).length - 1
Either way you are splitting up the string by the list of words you have defined.
You can sort the words by length to ensure that larger words are matched first by
words.sort(function (a, b) { return b.length - a.length })
Here is the solution for anyone interested
var input = ['this','is','a','string']; // This will work for any input, but this is a test case
var orderedInput = input.sort(function(a,b){
return b.length - a.length;
});
var inputRegex = new RegExp(orderedInput.join('|'));
// our combination of words can be any size in an array, just doing this since prompt in js is spammy
var testStrings = ['thisis','thisisastring','thisisaduck','thisisastringg','stringg'];
var foundCombos = (regex,str) => !str.split(regex).filter(str => str.length).length;
var finalResult = testStrings.reduce((all,str)=>{
all[str] = foundCombos(inputRegex,str);
if (all[str] === true){
all[str] = 1;
}
else{
all[str] = 0;
}
return all;
},{});
console.log(finalResult);

Javascript all words are 3 characters or longer

Let's say I have a string = 'all these words are three characters orr longer'
I want to check it
if (string.someWayToCheckAllWordsAre3CharactersOrLonger) {
alert("it's valid!");
}
How can I do that?
Split the string into an array, then check that each word is at least 3 characters long using every.
var string = 'all these words are three characters orr longer';
// Split on the regex \s+ so that only the words end up in the array
string.trim().split(/\s+/).every(e => e.length >= 3);
You can use every.
Before you can use every, the string needs to be pre-processed:
trim the string to remove the leading and trailing spaces;
split the string on one or more whitespace characters.
Then use every to check whether the length of every element of the array is greater than or equal to three.
Demo
var string = 'all these words are three characters orr longer';
string.trim().split(/\s+/).every(function(e) { return e.length >= 3; });
how about something like this
var string = "all these words are three characters orr longer";
var words = string.split(' ');
var allWordsAreLongerThanThreeChars = true;
for(var i=0;i<words.length;i++){
if(words[i].length < 3){
allWordsAreLongerThanThreeChars = false;
break;
}
}
Two simple steps: split the string into an array using .split(), then loop through the array and check the length of each word using the .length property. Hope this helps.
var string = 'all these words are three characters orr longer';
var stringArray = string.split(" ");
for (var i = 0; i < stringArray.length; i++){
if(stringArray[i].length >= 3) {
alert(stringArray[i]);
}
};

How to match an out of order string with Javascript regular expressions

I've written a live filter in javascript that takes a value from a field and hides the rows in a table that do not match.
The RegEx I use for this is very simple: /inputValue/i
Although this works great it only matches characters that are in order. For example:
inputValue = test
string to match = this is a test sentence
This example would match, but if I tried:
inputValue = this sentence
string to match = this is a test sentence
This won't match because the input value is out of order.
How would I go about writing a RegEx that is in order but can skip words?
Here is the loop I currently use:
for (var i=0; i < liveFilterDataArray.length; i++) {
var comparisonString = liveFilterDataArray[i],
comparisonString = comparisonString.replace(/['";:,.\/?\\-]/g, '');
RE = eval("/" + liveFilterValue + "/i");
if (comparisonString.match(RE)) {
rowsToShow.push(currentRow);
}
if(currentRow < liveFilterGridRows.length - 1) {
currentRow++;
} else {
currentRow = 0;
}
}
Many thanks for your time.
Chris
It is recommended to use RegExp instead of eval.
DEMO
var words = liveFilterValue.split(" ");
var searchArg = (words.length==1)?words:words.join(".*")+'|'+words.reverse().join(".*")
var RE = new RegExp(searchArg,"i");
It will create /this.*sentence|sentence.*this/i
Remove +'|'+words.reverse().join(".*") if you only want to find this.....sentence and not sentence....this
You could split the input string on spaces and then run the filter sequentially for each word.
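A minimal sketch of that idea, assuming a plain case-insensitive substring test per word is enough. The function name matchesAllWords is hypothetical, and note that this variant ignores the order of the words:

```javascript
// True when every space-separated word of `filterValue` occurs somewhere in `candidate`
function matchesAllWords(candidate, filterValue) {
  var haystack = candidate.toLowerCase();
  return filterValue.toLowerCase().split(/\s+/).every(function (word) {
    // an empty piece (from extra spaces) matches trivially
    return word === '' || haystack.indexOf(word) !== -1;
  });
}

console.log(matchesAllWords('this is a test sentence', 'this sentence')); // true
console.log(matchesAllWords('this is a test sentence', 'sentence this')); // true
console.log(matchesAllWords('this is a test sentence', 'this duck'));     // false
```

Because it uses indexOf() rather than a regular expression built from user input, characters such as . or ( in the filter value need no escaping.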

Finding uppercase characters within a string

I am trying to write a function that decrypts an encrypted message that has uppercase letters (showing it's a new word) and lowercase characters (which is the word itself). The function needs to search through the encrypted message for all the uppercase letters and then return each uppercase character along with the lowercase characters that follow it. I have been given a function to call within the decrypt function:
function isUpperCase(aCharacter)
{
return (aCharacter >= 'A') && (aCharacter <= 'Z');
}
I was thinking that I would search through the word for all the uppercase characters first and assign that as a new string. I could then do while loop that will pick up each of the letters in the new string and then search for the lower case characters that are next to it in the old string.
However, I am completely stuck at the first part - I can't even work out the structured English.
The code is:
encryptMessage is a string containing uppercase and lowercase characters
indexCharacter is used at a later date for another function
upperAlphabet - alphabet of uppercase characters - used later
lowerAlphabet - alphabet lowercase characters - used later
The function:
function decryptMessage(encryptMessage, indexCharacter, upperAlphabet, lowerAlphabet)
{
var letter
var word = "";
for (var count = 0; count < encryptMessage.length; count = count +1);
{
letter = encryptMessage.charAt(count)
if (isUpperCase(letter));
{
word = word + letter;
}
document.write(word); //this is just to test to see if it returns the uppercase - I would use the return word
}
The above just doesn't seem to work, so I can't even continue with the rest of the code. Can anyone help me identify where I have gone wrong? Have I gone completely in the wrong direction with this anyway? Reading it back, I don't think it really makes much sense. It's very basic code; I have only learnt for and while loops and if/else statements, and I am just so stuck.
thanks in advance for your advice :-)
Issy
I'm not too sure I follow, but you can strip using the replace method and regular expressions
var str = 'MaEfSdsfSsdfsAdfssdGsdfEsdf';
var newmsg = str.replace(/[a-z]/g, '');
var old = str.replace(/[A-Z]/g, '');
In this case, newmsg = 'MESSAGE'.
A simple condition for checking uppercase characters in a string would be...
var str = 'aBcDeFgHiJkLmN';
var sL = str.length;
var i = 0;
for (; i < sL; i++) {
if (str.charAt(i) === str.charAt(i).toUpperCase()) {
console.log('uppercase:',str.charAt(i));
}
}
/*
uppercase: B
uppercase: D
uppercase: F
uppercase: H
uppercase: J
uppercase: L
uppercase: N
*/
EDIT
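var input = "ThisIsASecretText";
for (var i = 0; i < input.length; i++)
{
if (isUpperCase(input.charAt(i)))
{
// Start the word with the uppercase letter, then append the lowercase letters that follow it
var nextWord = input.charAt(i);
for (var j = i + 1; j < input.length && !isUpperCase(input.charAt(j)); j++)
{
nextWord += input.charAt(j);
}
// Continue the outer loop at the last character of this word
i = j - 1;
CallSomeFunctionWithTheNextWord(nextWord);
}
}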
The following calls would be made:
CallSomeFunctionWithTheNextWord("This");
CallSomeFunctionWithTheNextWord("Is");
CallSomeFunctionWithTheNextWord("A");
CallSomeFunctionWithTheNextWord("Secret");
CallSomeFunctionWithTheNextWord("Text");
You can do the same thing with much less code using regular expressions, but since you said that you are taking a very basic course on programming, this solution might be more appropriate.
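For comparison, a regular-expression sketch of the same split, assuming the message only uses the ASCII letters A-Z and a-z. The variable name wordsFound is made up for illustration:

```javascript
var input = 'ThisIsASecretText';

// One uppercase letter followed by any number of lowercase letters
var wordsFound = input.match(/[A-Z][a-z]*/g) || [];

console.log(wordsFound); // [ 'This', 'Is', 'A', 'Secret', 'Text' ]
```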
Use Unicode property escapes, in particular the "Lu" value of the General_Category property, which matches uppercase letters. There are categories for numbers, punctuation, currency, and just about any other category of character you might be interested in.
In the example below, the "u" modifier enables Unicode matching.
"HeLlo WoRld".match(/\p{Lu}/gu) // [ 'H', 'L', 'W', 'R' ]
I would rather use Array.reduce as follows:
const sample = 'SampleStringAsFollows';
let capWord = [...sample].reduce((caps,char) => (char.match(/[A-Z]/)) ? caps + char : caps,'');
console.log(capWord); //SSAF
capWord will be a string of the capital characters, and this also handles the boundary cases where the string contains special characters.
Please use the code below to get the capital letters of a sentence:
Demo Code
var str = 'i am a Web developer Student';
var sL = str.length;
var i = 0;
for (; i < sL; i++) {
if (str.charAt(i) != " ") {
if (str.charAt(i) === str.charAt(i).toUpperCase()){
console.log(str.charAt(i));
}
}
}
