Using regular expression to split a string

I have a string which I need to separate correctly:
self.view.frame.size.height = 44
I need to get only view, frame, size, and height. And I need to do it with a regular expression.
So far I've tried a lot of variants, none of them are even close to what I want to get. And my code now looks like this:
var testString = 'self.view.frame.size.height = 44'
var re = new RegExp('\\.(.*)\\.', "g")
var array = re.exec(testString);
console.log('Array length is ' + array.length)
for (var i = 0; i < array.length; i++) {
console.log('<' + array[i] + ">");
}
And it doesn't work at all:
Array length is 2
<.view.frame.size.>
<view.frame.size>
I'm new at Javascript, so maybe I want the impossible, let me know.
Thanks.

In JavaScript, executing a regexp with the g flag doesn't return all the matches at once. You have to call exec repeatedly on the same input string, and each call returns the next match (or null when there are no more).
You also need to change the regexp so it matches only one word at a time. .* is greedy, so it returns the longest possible match, which is why you got everything between the first and last dots. [^.]* matches a sequence of non-dot characters, so it returns just one word. You can't include the second . in the regexp, because that interferes with the repetition: each call starts searching after the end of the previous match, so the . that ended the previous word is no longer available to begin the next one. Also, there's no . after height, so the last word would never match.
EDIT: I've changed the regexp to use \w* instead of [^.]*, because [^.]* was grabbing the whole height = 44 string instead of just height.
var testString = 'self.view.frame.size.height = 44';
var re = /\.(\w*)/g;
var array = [];
var result;
while (result = re.exec(testString)) {
array.push(result[1]);
}
console.log('Array length is ' + array.length)
for (var i = 0; i < array.length; i++) {
console.log('<' + array[i] + ">");
}
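To make the repetition visible, here is a small sketch (the names demoString, demoRe and steps are made up for illustration) that records where each match starts and where the regexp's lastIndex points afterwards. It assumes the same pattern as the answer above.

```javascript
var demoString = 'self.view.frame.size.height = 44';
var demoRe = /\.(\w*)/g;
var steps = [];
var m;
// Each exec call resumes at demoRe.lastIndex and returns null when done.
while ((m = demoRe.exec(demoString)) !== null) {
  steps.push({ word: m[1], start: m.index, next: demoRe.lastIndex });
}
steps.forEach(function (s) {
  console.log(s.word + ' starts at ' + s.start + ', next search from ' + s.next);
});
```

Each match begins exactly where the previous one ended, which is why the trailing . cannot be part of the pattern.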

If you're sure that your data will be always in the same format you can use this:
function parse (string) {
  // slice(1) drops the leading "self" without mutating the intermediate array
  return string.split(" = ").shift().split(".").slice(1);
}

In your context, split is a MUCH better option:
var str = "self.view.frame.size.height = 44";
var bits1 = str.split(" ")[0];
var bits2 = bits1.split(".");
bits2.shift(); // get rid of the unwanted self
console.log(bits2);

Related

How to get total sum of matches from a loop?

I'm trying to loop through an array to check whether any of the words in the array are in a body of text:
for(var i = 0; i < wordArray.length; i++ ) {
if(textBody.indexOf(wordArray[i]) >= 1) {
console.log("One or two words.");
// do something
}
else if (textBody.indexOf(wordArray[i]) >= 3) {
console.log("Three or more words.");
// do something
}
else {
console.log("No words match.");
// do something
}
}
where >= 1 and >= 3 are supposed to determine the number of matched words (although it might just be determining their index position in the array? As, in its current state it will console.log hundreds of duplicate strings from the if / else statement).
How do I set the if / else statement to do actions based off of the amount of matched words?
Any help would be greatly appreciated!

Try this:
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
var numberOfMatches = matches ? matches.length : 0;
console.log(wordArray[i] + ' found ' + numberOfMatches + " times");
}
indexOf will do partial matches. For example "This is a bust".indexOf("bus") would match even though that is probably not what you want. It is better to use a regular expression with the word boundary token \b to eliminate partial word matches. In the RegExp constructor you need to escape the backslash, so \b becomes \\b. The regex uses the i flag to ignore case and the g flag to find all matches. Replace the console.log line with your if/else logic based on the numberOfMatches variable.
UPDATE: Per your clarification you would change the above to
var numberOfMatches = 0;
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
numberOfMatches += matches ? matches.length : 0;
}
console.log(numberOfMatches);

indexOf() provides the index of the first match, not the number of matches. So currently you're testing first whether the word first appears at index 1 or later, then whether it first appears at index 3 or later; you are not counting matches at all.
I can think of a couple different approaches off the top of my head that would work, but I'm not going to write them for you because this sounds like school work. One would be to use match: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match and Count number of matches of a regex in Javascript
If you're scared of using regex, or can't be assed to spend the time learning how they work, you could get the index of the match, and if it matches make a substring excluding the portion up to that match, and test if it matches again, while incrementing a counter. indexOf() will return -1 if no matches are found.
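A minimal sketch of that indexOf-based counter, assuming a hypothetical helper named countOccurrences. Instead of building substrings it passes a start position to indexOf, which has the same effect. Like indexOf itself, it counts partial-word matches.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of word in text.
function countOccurrences(text, word) {
  if (word === '') return 0; // an empty needle would never advance
  var count = 0;
  var pos = text.indexOf(word);
  while (pos !== -1) {
    count++;
    // Resume the search just past the match we have already counted.
    pos = text.indexOf(word, pos + word.length);
  }
  return count;
}
console.log(countOccurrences('this is your test, check the test', 'test'));
```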

You can split the text into words with a regexp and then count all occurrences of your word this way:
var text = "word1, word2, word word word word3"
var allWords = text.split(/\b/);
var getOccurrenceCount = function(word, allWords) {
return allWords.reduce(function(count, nextWord) {
count += word == nextWord ? 1 : 0;
return count;
}, 0);
};
getOccurrenceCount("word", allWords);

This may help you:
You have to use .match instead of .indexOf (which only gets the index of the first occurrence inside the string):
var textBody = document.getElementById('inside').innerHTML;
var wordArray = ['check','test'];
for(var i = 0; i < wordArray.length; i++ ) {
var regex = new RegExp( wordArray[i], 'g' );
var wordCount = (textBody.match(regex) || []).length;
console.log(wordCount + " times the word ["+ wordArray[i] +"]");
}
<body>
<p id="inside">
this is your test, check the test, how many test words check
</p>
</body>

I would first put the array into a hashmap, something like
_.each(array, function(a){map[a]=1})
Second, split the string into an array on whitespace and punctuation.
Then loop through the new array and check whether each word exists in the map.
Make sure to compare the words case-insensitively.
This approach brings the running time down to linear in the length of the text.
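The steps above could be sketched as follows. This is only an illustration: countMatches is a made-up name, it uses plain JavaScript rather than the underscore call shown above, and the choice of which characters separate words is an assumption.

```javascript
// Hypothetical helper: total number of tokens in text that appear in words.
function countMatches(words, text) {
  var lookup = Object.create(null); // no inherited keys such as "constructor"
  words.forEach(function (w) { lookup[w.toLowerCase()] = true; });
  // Assumption: anything that is not a letter, digit or apostrophe separates words.
  var tokens = text.toLowerCase().split(/[^a-z0-9']+/);
  var total = 0;
  tokens.forEach(function (t) { if (lookup[t]) total++; });
  return total;
}
console.log(countMatches(['check', 'test'], 'This is your test, check the Test'));
```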

Yes .indexOf gives you the first position of the word in the string. Many methods available to count a word in a string, I'm sharing my crazy version :
function matchesCount(word, str) {
return (' ' + str.replace(/[^A-Za-z]+/gi,' ') + ' ')
.split(' '+word+' ').length - 1;
}
console.log(matchesCount('test', 'A test to test how many test in this'));

Replace string between second set of [ and ]

I am learning regex, and I got a doubt. Let's consider
var s = "YYYN[1-20]N[]NYY";
Now, I want to replace/insert the '1-8' between [ and ] at its second occurrence.
Then output should be
YYYN[1-20]N[1-8]NYY
For that I had tried using replace and passing a function through it as shown below:
var nth = 0;
s = s.replace(/\[([^)]+)\]/g, function(match, i, original) {
nth++;
return (nth === 1) ? "1-8" : match;
});
alert(s); // But it won't work
I think the regex is not matching the string that I am using.
How can I fix it?

Your regex \[([^)]+)\] will not match empty square brackets since + requires at least 1 character other than ). I guess you wanted to write \[[^\]]*\].
Here is a fix for your solution:
var s = "YYYN[1-20]N[]NYY";
var nth = 0;
s = s.replace(/\[[^\]]*\]/g, function (match, i, original) {
nth++;
  return (nth === 2) ? "[1-8]" : match; // replace only the second occurrence
});
alert(s);
Here is another way of doing it:
var s = "YYYN[1-20]N[]NYY";
s = s.replace(/(.*)\[\]/, "$1[1-8]");
alert(s);
The regex (.*)\[\] matches and captures into Group 1 greedily as much text as possible (thus we get the last set of empty []), and then matches empty square brackets. Then we restore the text before [] with the $1 backreference and add our string 1-8 inside a new pair of brackets.

If there are only two occurrences of square brackets, then this will work:
/(.*\[.*?\].*\[).*?(\].*)/
This RegEx has “YYYN[1-20]N[” as the first capturing group and “]NYY” as the second.
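One way to apply that pattern, as a sketch: the variable names are made up, and a replacer function is used instead of a "$1...$2" string so that the digits of 1-8 cannot be misread as part of a group reference.

```javascript
var input = 'YYYN[1-20]N[]NYY';
var twoBrackets = /(.*\[.*?\].*\[).*?(\].*)/;
// before is capturing group 1, after is capturing group 2.
var output = input.replace(twoBrackets, function (match, before, after) {
  return before + '1-8' + after;
});
console.log(output);
```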

I suggest using simple split and join operations:
var s = "YYYN[1-20]N[]NYY";
var arr = s.split(/\[/)
arr[2] = '1-8' + arr[2]
var r = arr.join('[')
//=> YYYN[1-20]N[1-8]NYY

You can use the following regex:
var s = "YYYN[1-20]N[]NYY";
s = s.replace(/([^[]+\[[^\]]*\][^[]+)\[[^\]]*\](.+)/, "$1[1-8]$2");
alert(s);
The first part ([^[]+\[[^\]]*\][^[]+) captures everything up to and including the first bracketed substring plus the text that follows it. \[[^\]]*\] then matches the second pair of brackets, the one you want to replace, even when it is empty, and the last part (.+) captures the rest of your string.

Javascript split at multiple delimiters while keeping delimiters

Is there a better way than what I have (through regex, for instance) to turn
"div#container.blue"
into this
["div", "#container", ".blue"];
Here's what I have...
var arr = [];
function process(h1, h2) {
var first = h1.split("#");
arr.push(first[0]);
var secondarr = first[1].split(".");
secondarr[0] = "#" + secondarr[0];
arr.push(secondarr[0]);
for (i = 1; i< secondarr.length; i++) {
arr.push(secondarr[i] = "." + secondarr[i]);
}
return arr;
}

Why not something like this?
'div#container.blue'.split(/(?=[#.])/);
The lookahead (?=[#.]) only checks that the next character is either # or a literal .; it does not consume anything, which makes it a zero-length match. Because the match is zero-length, split removes nothing.
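For comparison, a short sketch (variable names are illustrative) of the same split with and without the lookahead:

```javascript
var selector = 'div#container.blue';
// Lookahead: split before each # or . and keep the delimiter with its piece.
var keep = selector.split(/(?=[#.])/);
// Plain character class: the delimiter itself is consumed and lost.
var drop = selector.split(/[#.]/);
console.log(keep);
console.log(drop);
```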

As you've probably found, the issue is that split removes the item you're splitting on. You can solve that with regex capturing groups (the parentheses), because split includes captured text in its result:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/);
Now we've got the issue that result contains a lot of falsy values (empty strings and undefined) you don't want. A quick filter fixes that:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/).filter(function(x) {
  return !!x;
});
Appendix A: What the heck is that regex
I'm assuming you're only concerned with # and . as delimiters. That still gives us this monster: /(#[^#.]*)|(\.[^#.]*)/
This means we'll capture either a # or a ., and then all the characters up until the next # or . (remembering that a period is significant in regex, so we need to escape it, unless we're inside the brackets).

I've written an extension of the String type for you. It allows you to choose which delimiters to use, passing them in a string:
String.prototype.splitEx = function(delimiters) {
var parts = [];
var current = '';
for (var i = 0; i < this.length; i++) {
if (delimiters.indexOf(this[i]) < 0) current += this[i];
else {
parts.push(current);
current = this[i];
}
}
parts.push(current);
return parts;
};
var text = 'div#container.blue';
console.log(text.splitEx('#.'));

Count number of words in string using JavaScript

I am trying to count the number of words in a given string using the following code:
var t = document.getElementById('MSO_ContentTable').textContent;
if (t == undefined) {
var total = document.getElementById('MSO_ContentTable').innerText;
} else {
var total = document.getElementById('MSO_ContentTable').textContent;
}
countTotal = cword(total);
function cword(w) {
var count = 0;
var words = w.split(" ");
for (i = 0; i < words.length; i++) {
// inner loop -- do the count
if (words[i] != "") {
count += 1;
}
}
return (count);
}
In that code I am getting data from a div tag and sending it to the cword() function for counting. However, the return value is different in IE and Firefox. Is there any change required in the regular expression? I have checked that both browsers send the same string, so the problem must be inside the cword() function.

[edit 2022, based on comment] Nowadays, one would not extend the native prototype this way. A way to extend the native prototype without the danger of naming conflicts is to use the ES20xx Symbol. Here is an example of a wordcounter using that.
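A minimal sketch of a symbol-keyed word counter, not necessarily the example the answer originally linked to. The symbol name countWords and the \S+ pattern are assumptions made for illustration.

```javascript
// Hypothetical symbol; the description string is only a debugging label.
const countWords = Symbol('countWords');
// A symbol key cannot collide with any string-named method, present or future.
String.prototype[countWords] = function () {
  const words = this.match(/\S+/g); // runs of non-whitespace characters
  return words ? words.length : 0;
};
console.log('this string has five words'[countWords]());
console.log(''[countWords]());
```

Only code that holds a reference to the symbol can call the method, so other scripts on the page are unaffected.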
Old answer: you can use split and add a wordcounter to the String prototype:
if (!String.prototype.countWords) {
String.prototype.countWords = function() {
return this.length && this.split(/\s+\b/).length || 0;
};
}
console.log(`'this string has five words'.countWords() => ${
'this string has five words'.countWords()}`);
console.log(`'this string has five words ... and counting'.countWords() => ${
'this string has five words ... and counting'.countWords()}`);
console.log(`''.countWords() => ${''.countWords()}`);

I would prefer a RegEx only solution:
var str = "your long string with many words.";
var wordCount = (str.match(/\w+/g) || []).length;
alert(wordCount); //6
The regex is
\w+ one or more word characters
/g global - don't stop after the first match
With the g flag, match returns an array of every match, so its length is the word count. match returns null when nothing matches, hence the || [] fallback.

This is the best solution I've found:
function wordCount(str) {
var m = str.match(/[^\s]+/g)
return m ? m.length : 0;
}
This inverts whitespace selection, which is better than \w+ because \w only matches ASCII letters, digits and _ (see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.6)
If you're not careful with whitespace matching you'll count empty strings, strings with leading and trailing whitespace, and all whitespace strings as matches while this solution handles strings like ' ', ' a\t\t!\r\n#$%() d ' correctly (if you define 'correct' as 0 and 4).
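A quick sketch of the difference, using two made-up helpers that differ only in the pattern. The accented sample text is an assumption chosen to show where \w+ splits a word.

```javascript
function countNonSpace(str) {
  var m = str.match(/\S+/g); // \S is equivalent to [^\s]
  return m ? m.length : 0;
}
function countWordChars(str) {
  var m = str.match(/\w+/g);
  return m ? m.length : 0;
}
var sample = 'na\u00efve caf\u00e9'; // two words containing accented letters
console.log(countNonSpace(sample), countWordChars(sample));
console.log(countNonSpace(' '), countNonSpace(' a\t\t!\r\n#$%() d '));
```

\w+ breaks each accented word at the accented letter, so it reports more words than a reader would count.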

You can make a clever use of the replace() method although you are not replacing anything.
var str = "the very long text you have...";
var counter = 0;
// lets loop through the string and count the words
str.replace(/\b\w+\b/g,function (a) { // \b+ is a syntax error: a word boundary cannot be quantified
// for each word found increase the counter value by 1
counter++;
})
alert(counter);
The regex can be improved, for example to exclude HTML tags.

// Count words in a string, or what appears as words :-)
function countWordsString(string) {
  // Trim first so leading/trailing whitespace does not add phantom words
  string = string.trim();
  if (string === '') {
    return 0;
  }
  var counter = 1;
  // Each run of whitespace separates two words
  string.replace(/\s+/g, function () {
    counter++;
  });
  return counter;
}
var numberWords = countWordsString('the very long text you have...');

Javascript regex - split string

Struggling with a regex requirement. I need to split a string into an array wherever it finds a forward slash. But not if the forward slash is preceded by an escape.
Eg, if I have this string:
hello/world
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = world
And if I have this string:
hello/wo\/rld
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = wo/rld
Any ideas?

I wouldn't use split() for this job. It's much easier to match the path components themselves, rather than the delimiters. For example:
var subject = 'hello/wo\\/rld';
var regex = /(?:[^\/\\]+|\\.)+/g;
var matched = null;
while (matched = regex.exec(subject)) {
  console.log(matched[0]);
}
output:
hello
wo\/rld

The following is a little long-winded but will work, and avoids the problem with IE's broken split implementation by not using a regular expression.
function splitPath(str) {
var rawParts = str.split("/"), parts = [];
for (var i = 0, len = rawParts.length, part; i < len; ++i) {
part = "";
while (rawParts[i].slice(-1) == "\\") {
part += rawParts[i++].slice(0, -1) + "/";
}
parts.push(part + rawParts[i]);
}
return parts;
}
var str = "hello/world\\/foo/bar";
alert( splitPath(str).join(",") );

Here's a way adapted from the techniques in this blog post:
var str = "Testing/one\\/two\\/three";
var result = str.replace(/(\\)?\//g, function($0, $1){
return $1 ? '/' : '[****]';
}).split('[****]');
Given:
Testing/one\/two\/three
The result is:
[0]: Testing
[1]: one/two/three
That first uses the simple "fake" lookbehind to replace / with [****] and to replace \/ with /, then splits on the [****] value. (Obviously, replace [****] with anything that won't be in the string.)
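In engines that support real lookbehind (added in ES2018, so not available when this answer was written), the placeholder step can be skipped. This is a sketch of that alternative rather than the technique above; like the original, it does not handle an escaped backslash directly before a slash.

```javascript
var escaped = 'Testing/one\\/two\\/three';
// Split on each / that is not immediately preceded by a backslash...
var pieces = escaped.split(/(?<!\\)\//).map(function (p) {
  // ...then turn the remaining \/ sequences into plain slashes.
  return p.replace(/\\\//g, '/');
});
console.log(pieces);
```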

/*
If you are getting your string from an ajax response or a data base query,
that is, the string has not been interpreted by javascript,
you can match character sequences that either have no slash or have escaped slashes.
If you are defining the string in a script, escape the escapes and strip them after the match.
*/
var s='hello/wor\\/ld';
s=s.match(/(([^\/]*(\\\/)+)([^\/]*)+|([^\/]+))/g) || [s];
alert(s.join('\n'))
s.join('\n').replace(/\\/g,'')
/* returned value: (String)
hello
wor/ld
*/

For short code, you can use reverse to simulate negative lookbehind
function reverse(s){
return s.split('').reverse().join('');
}
var parts = reverse(myString).split(/[/](?!\\(?:\\\\)*(?:[^\\]|$))/g).reverse();
for (var i = parts.length; --i >= 0;) { parts[i] = reverse(parts[i]); }
but to be efficient, it's probably better to split on /[/]/ and then walk the array and rejoin elements that have an escape at the end.

Something like this may take care of it for you.
var str = "/hello/wo\\/rld/";
var split = str.replace(/^\/|\\?\/|\/$/g, function(match) {
if (match.indexOf('\\') == -1) {
return '\x00';
}
return match;
}).split('\x00');
alert(split);
