I'm taking some text and want to split it into an array. My goal is to be able to split it into phrases delimited by stopwords (words ignored by search engines, like 'a' 'the' etc), so that I can then search each individual phrase in my API. So for example: 'The cow's hat was really funny' would result in arr[0] = cow's hat and arr[1] = funny. I have an array of stopwords already but I can't really think of how to actually split by each/any of the words in it, without writing a very slow function to loop through each one.
Use split(). It takes a regular expression. The following is a simple example:
search_string.split(/\b(?:a|the|was|\s)+\b/i);
If you already have the array of stop words, you could use join() to build the regular expression. Try the following:
regex = new RegExp("\\b(?:" + stop_words.join('|') + "|\\s)+\\b", "i");
A working example: http://jsfiddle.net/NEnR8/. NOTE: it may be better to replace these values rather than split on them, because the result contains empty array elements.
This does a case-insensitive .split() on your keywords, surrounded by word boundaries.
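A minimal sketch of the join() approach, written as a reusable function. The names escapeRegExp and splitOnStopWords are hypothetical, and the stop-word list in the usage line is only an assumption for illustration. It escapes regex metacharacters in each stop word, splits on whole-word matches, then trims each piece and drops the empty ones:

```javascript
// Hypothetical helper: escape regex metacharacters so that a stop word
// containing characters like "." is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split text into phrases delimited by stop words.
function splitOnStopWords(text, stopWords) {
  var alternatives = stopWords.map(escapeRegExp).join('|');
  // \b on both sides keeps "a" from matching inside "hat".
  var regex = new RegExp('\\b(?:' + alternatives + ')\\b', 'i');
  return text
    .split(regex)
    .map(function (phrase) { return phrase.trim(); })
    .filter(function (phrase) { return phrase.length > 0; });
}

var phrases = splitOnStopWords("The cow's hat was really funny",
                               ['a', 'the', 'was', 'really']);
console.log(phrases); // ["cow's hat", "funny"]
```

Trimming and filtering after the split avoids the empty array elements mentioned in the note.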
var str = "The cow's hat was really funny";
var arr = str.split(/\ba\b|\bthe\b|\bwas\b/i);
You may end up with some empty items in the Array. To compact it, you could do this:
var len = arr.length;
while( len-- ) {
if( !arr[len] )
arr.splice( len, 1);
}
Quick and dirty way would be to replace the "stop word" strings with some unique characters (e.g. &&&), and then split based on that unique character.
For example.
var the_text = "..............",
stop_words = ['foo', 'bar', 'etc'],
unique_str = '&&&';
for (var i = 0; i < stop_words.length; i += 1) {
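// replace() returns a new string and only handles the first occurrence,
// so use split/join to swap every occurrence and assign the result back
the_text = the_text.split(stop_words[i]).join(unique_str);
}
var phrases = the_text.split(unique_str);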
I'm working through a problem on freecodecamp.com, and I want to see whether my code so far is doing what I think it is doing...
function titleCase(str) {
var wordArr = str.split(' '); // now the sentences is an array of words
for (var i = 0; i < wordArr.length; i++) { //looping through the words now...
charArr = wordArr[i].split(''); //charArr is a 2D array of characters within words?
return charArr[1][1];
}
}
titleCase("a little tea pot"); // this should give me 'i', right?
Again, this is just the beginning of the code. My goal is to capitalize the first letter of each word in the parameter of titleCase();. Perhaps I'm not even going about this right at all.
But... is charArr on line 4 a multidimensional array? Did that create [['a'],['l','i','t','t','l','e'],['t','e','a','p','o','t']]?
In addition to ABR answer (I can't comment yet) :
charArr is a one-dimensional array, if you want it to be a 2d array you need to push the result of wordArr[i].split(''); instead of assigning it.
charArr.push(wordArr[i].split(''));
And don't forget to initialize charArr as an empty array
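Put together, the push-based version could look like the sketch below. The function name toCharMatrix is hypothetical; it only illustrates building the 2D array and is not a solution to the title-case exercise itself:

```javascript
// Hypothetical helper: build an array of character arrays, one per word.
function toCharMatrix(str) {
  var wordArr = str.split(' ');
  var charArr = []; // initialize before pushing
  for (var i = 0; i < wordArr.length; i++) {
    charArr.push(wordArr[i].split('')); // push, don't assign
  }
  return charArr;
}

var matrix = toCharMatrix("a little tea pot");
console.log(matrix[1][1]); // 'i' (second letter of "little")
```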
A few issues:
1. Your return statement will stop the function after one iteration.
2. If one of the words has fewer than 2 letters (like the first one in your example, which is 'a'), you will get an exception at charArr[1][1].
Other than that, it is mostly ok.
It would probably help you to download a tool like firebug and test your code live...
You can do the following:
function titleCase(str) {
var newString = "";
var wordArr = str.split(' ');
for (var i = 0; i < wordArr.length; i++) { //looping through the words now...
var firstLetter = wordArr[i].slice(0,1); // get the first letter
//capitalize the first letter and attach the rest of the word
newString += firstLetter.toUpperCase() + wordArr[i].substring(1) + " ";
}
return newString.trim(); // drop the trailing space added after the last word
}
Also, you need to remove the return statement from your for loop: the first time the loop reaches it, the function ends, so you never get to loop through all the words.
Here you can learn more about string.slice() : http://www.w3schools.com/jsref/jsref_slice_string.asp
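An alternative sketch using map() and join(), which puts a space only between words. The name titleCaseJoined is hypothetical, and the sketch assumes words are separated by single spaces:

```javascript
// Hypothetical variant: capitalize the first letter of each word.
function titleCaseJoined(str) {
  return str.split(' ').map(function (word) {
    // charAt(0) is '' for an empty word, so this is safe for empty input
    return word.charAt(0).toUpperCase() + word.slice(1);
  }).join(' ');
}

console.log(titleCaseJoined("a little tea pot")); // "A Little Tea Pot"
```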
I am trying to remove some spaces from a few dynamically generated strings. Which space I remove depends on the length of the string. The strings change all the time so in order to know how many spaces there are, I iterate over the string and increment a variable every time the iteration encounters a space. I can already remove all of a specific type of character with str.replace(' ',''); where 'str' is the name of my string, but I only need to remove a specific occurrence of a space, not all the spaces. So let's say my string is
var str = "Hello, this is a test.";
How can I remove ONLY the space after the word "is"? (Assuming that the next string will be different so I can't just write str.replace('is ','is'); because the word "is" might not be in the next string).
I checked documentation on .replace, but there are no other parameters that it accepts so I can't tell it just to replace the nth instance of a space.
If you want to go by indexes of the spaces:
var str = 'Hello, this is a test.';
function replace(str, indexes){
return str.split(' ').reduce(function(prev, curr, i){
var separator = ~indexes.indexOf(i) ? '' : ' ';
return prev + separator + curr;
});
}
console.log(replace(str, [2,3]));
http://jsfiddle.net/96Lvpcew/1/
As it is easy for you to get the index of the space (as you are iterating over the string) , you can create a new string without the space by doing:
str = str.substr(0, index) + str.substr(index + 1);
where index is the index of the space you want to remove.
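As a sketch of the index-based idea, the two hypothetical helpers below find the index of the nth space and then rebuild the string without the character at that index. The names indexOfNthSpace and removeCharAt are assumptions for illustration, and n is counted from 1:

```javascript
// Hypothetical helper: index of the nth space (1-based), or -1 if absent.
function indexOfNthSpace(str, n) {
  var index = -1;
  for (var count = 0; count < n; count++) {
    index = str.indexOf(' ', index + 1);
    if (index === -1) return -1;
  }
  return index;
}

// Hypothetical helper: return str without the character at index.
function removeCharAt(str, index) {
  return str.slice(0, index) + str.slice(index + 1);
}

var str = "Hello, this is a test.";
var spaceIndex = indexOfNthSpace(str, 3); // 14, the space after "is"
console.log(removeCharAt(str, spaceIndex)); // "Hello, this isa test."
```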
I came up with this for unknown indices
function removeNthSpace(str, n) {
var spacelessArray = str.split(' ');
return spacelessArray
.slice(0, n - 1) // left prefix part may be '', saves spaces
.concat([spacelessArray.slice(n - 1, n + 1).join('')]) // middle part: the one without the space
.concat(spacelessArray.slice(n + 1)).join(' '); // right part, saves spaces
}
Do you know which space you want to remove because of word count or chars count?
If char count, you can use Rafaels Cardoso's answer.
If word count you can split them with space and join however you want:
var wordArray = str.split(" ");
var newStr = "";
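var wordIndex = 3; // or whatever you want
for (var i = 0; i < wordArray.length; i++) {
newStr += wordArray[i];
// skip the space after the chosen word, and don't append one after the last word
if (i != wordIndex && i < wordArray.length - 1) {
newStr += ' ';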
}
}
I think your best bet is to split the string into an array based on placement of spaces in the string, splice off the space you don't want, and rejoin the array into a string.
Check this out:
var x = "Hello, this is a test.";
var n = 3; // we want to remove the third space
var arr = x.split(/([ ])/); // copy to an array based on space placement
// arr: ["Hello,"," ","this"," ","is"," ","a"," ","test."]
arr.splice(n*2-1,1); // Remove the third space
x = arr.join("");
alert(x); // "Hello, this isa test."
Further Notes
The first thing to note is that str.replace(' ',''); will actually only replace the first instance of a space character. String.replace() also accepts a regular expression as the first parameter, which you'll want to use for more complex replacements.
To actually replace all spaces in the string, you could do str.replace(/ /g,""); and to replace all whitespace (including spaces, tabs, and newlines), you could do str.replace(/\s/g,"");
To fiddle around with different regular expressions and see what they mean, I recommend using http://www.regexr.com
A lot of the functions on the JavaScript String object that seem to take strings as parameters can also take regular expressions, including .split() and .search().
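A short sketch of the differences described in these notes. The variable names are only illustrative:

```javascript
var str = "Hello, this is a test.";

// A string pattern replaces only the first match.
var firstOnly = str.replace(' ', '');       // "Hello,this is a test."

// A regex with the g flag replaces every match.
var allSpaces = str.replace(/ /g, '');      // "Hello,thisisatest."

// \s also covers tabs and newlines.
var noWhitespace = "a\tb\nc d".replace(/\s/g, ''); // "abcd"

// .split() and .search() accept regular expressions too.
var parts = "a1b22c".split(/\d+/);          // ["a", "b", "c"]
var firstDigit = "abc123".search(/\d/);     // 3
```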
I have a string with keywords, separated by commas.
Now I also have a nice RegEx, to filter out all the keywords in that string, that matches a queried-string.
Check out this initial question - RegEx - Extract words that contains a substring, from a comma seperated string
The example below works fine; it has a masterString and a resultString. The latter contains only the keywords that include the word "car".
masterString = "typography,caret,car,align,shopping-cart,adjust,card";
resultString = masterString.match(/[^,]*car[^,]*/g);
console.log(resultString);
Result from the code above;
"caret", "car", "shopping-cart", "card"
But how can I use the RegEx with a variable matching word (the word "car" in this example is static, not variable)?
I think it has something to do with RegExp, but I can't figure it out...
Here's a general solution for use with regexes:
var query = "anything";
// Escape the metacharacters that may be found in the query
// sadly, JS lacks a built-in regex escape function
query = query.replace(/[-\\()\[\]{}^$*+.?|]/g, '\\$&');
var regex = new RegExp("someRegexA" + query + "someRegexB", "g");
As long as someRegexA and someRegexB form a valid regex with a literal in-between, the regex variable will always hold a valid regex.
But, in your particular case, I'd simply do this:
var query = "car";
var items = masterString.split(",");
query = query.toLowerCase();
for (var i = 0; i < items.length; ++i) {
if (items[i].toLowerCase().indexOf(query) >= 0) {
console.log(items[i]);
}
}
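Combining the escaping step with the pattern from the question could look like the sketch below. The function names escapeRegExp and findKeywords are hypothetical, and the "i" flag is an assumption to make the match case-insensitive:

```javascript
// Hypothetical helper: escape regex metacharacters in the query.
function escapeRegExp(text) {
  return text.replace(/[-\\()\[\]{}^$*+.?|]/g, '\\$&');
}

// Hypothetical helper: return the comma-separated keywords containing query.
function findKeywords(masterString, query) {
  var regex = new RegExp('[^,]*' + escapeRegExp(query) + '[^,]*', 'gi');
  // match() returns null when nothing matches, so fall back to [].
  return masterString.match(regex) || [];
}

var masterString = "typography,caret,car,align,shopping-cart,adjust,card";
console.log(findKeywords(masterString, "car"));
// ["caret", "car", "shopping-cart", "card"]
```

Because the query is escaped, a query such as "c.r" is treated as literal text rather than as a pattern.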
How about this one? You only need to replace the / / regex literal with a string, and it works for me. It can find whether your string contains "car", rather than some other similar word.
var query = 'car';
var string = "car,bike,carrot,plane,card";
var strRegEx = '[^,]*'+query+'[,$]*';
string.match(strRegEx);
Answer provided by OP and removed from inside the question
I figured out this quick-and-maybe-very-dirty solution...
var query = 'car';
var string = "car,bike,carrot,plane,card";
var regex = new RegExp("[^,]*|QUERY|[^,]*".replace('|QUERY|',query),'ig');
string.match(regex);
This code outputs the following, though I'm not sure whether it is well crafted:
"car", "carrot", "card"
But I ended up figuring out another, much simpler solution:
var query = "car";
var string = "car,bike,carrot,plane,card";
string.match(new RegExp("[^,]*"+query+"[^,]*",'ig'));
This code outputs the string below;
["car", "carrot", "card"]
My app-search-engine now works perfect :)
I need some help with a regex conundrum pls. I'm still getting to grips with it all - clearly not an expert!
Eg. Say I have a complex string like so:
{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}
First of all I want to split the string into an array by using the pipe, so it is effectively like:
{something:here}{examp.le:!/?foo|bar}BLAH
and
{something/else:here}:{and:here\\}(.)}
But notice that there is a pipe within the curly brackets to ignore... so need to work out the regex expression for this. I was using indexOf originally, but because I now have to take into account pipes being within the curly brackets, it complicates things.
And it isn't over yet! I also then need to split each string into separate parts by what is within the curly brackets and not. So I end up with 2 arrays containing:
Array1
{something:here}
{examp.le:!/?foo|bar}
BLAH
Array2
{something/else:here}
:
{and:here\\}(.)}
I added a double slash before the first closing curly bracket as a way of saying to ignore this one. But cannot figure out the regex to do this.
Can anyone help?
Find all occurrences of "string in braces" or "just string", then iterate through found substrings and split when a pipe is encountered.
str = "{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}"
var m = str.match(/{.+?}|[^{}]+/g)
var r = [[]];
var n = 0;
for(var i = 0; i < m.length; i++) {
var s = m[i];
if(s.charAt(0) == "{" || s.indexOf("|") < 0)
r[n].push(s);
else {
s = s.split("|");
if(s[0].length) r[n].push(s[0]);
r[++n] = [];
if(s[1].length) r[n].push(s[1]);
}
}
This expression will probably handle escaped braces better:
var m = str.match(/{?(\\.|[^{}])+}?/g);
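A self-contained sketch that combines the escape-aware expression with the grouping loop. The function name splitGroups is hypothetical, the braces in the pattern are escaped for readability, and the sketch assumes braces are not nested:

```javascript
// Hypothetical helper: split on pipes outside braces, then split each
// group into "{...}" tokens and the plain text between them.
function splitGroups(str) {
  // A token is an optional "{", then escaped characters or non-brace
  // characters, then an optional "}".
  var tokens = str.match(/\{?(?:\\.|[^{}])+\}?/g) || [];
  var groups = [[]];
  tokens.forEach(function (token) {
    if (token.charAt(0) === '{') {
      groups[groups.length - 1].push(token); // pipes inside braces are kept
      return;
    }
    token.split('|').forEach(function (part, index) {
      if (index > 0) groups.push([]); // each outer pipe starts a new group
      if (part.length) groups[groups.length - 1].push(part);
    });
  });
  return groups;
}

var input = "{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}";
console.log(splitGroups(input));
// [["{something:here}", "{examp.le:!/?foo|bar}", "BLAH"],
//  ["{something/else:here}", ":", "{and:here\\}(.)}"]]
```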
i like to split a string depending on "," character using JavaScript
example
var mystring="1=name1,2=name2,3=name3";
need output like this
1=name1
2=name2
3=name3
var list = mystring.split(',');
Now you have an array with ['1=name1', '2=name2', '3=name3']
If you then want to output it all on separate lines, you can do:
var lines = list.join("\n");
Of course, if that's really the ultimate goal, you could also just replace the commas with newlines:
var lines = mystring.replace(/,/g, "\n");
(Edit: Your original post didn't have your intended output in a code block, so I first thought you were after spaces. The same techniques work either way; use " " instead of "\n" to separate by spaces.)
Just use string.split() like this:
var mystring="1=name1,2=name2,3=name3";
var arr = mystring.split(','); //array of ["1=name1", "2=name2", "3=name3"]
If you then want a string version of the result (unclear from your question), call .join() like this:
var newstring = arr.join(' '); //(though replace would do in this example)
Or loop through, etc:
for(var i = 0; i < arr.length; i++) {
alert(arr[i]);
}
You can play with it a bit here