JavaScript: iterate over a string looking for multiple character sets

Ok, so I know how to do a standard loop to iterate over a string to find a character or a word that matches a single character or word, but in this instance, I have multiple character sets that I am looking for. Some are letters, some have characters (including protected ones). I can't split it into an array of words on space or anything like that because the character sets might not have a space, so wouldn't split. I suspect I'm going to have to do a regex, but I'm not sure how to set it up. This is basically the pseudo code of what I'm trying to do and I'd appreciate any tips on how to move forward. I apologize if this is an easy thing and I'm missing it, I'm still working on my javascript.
Pseudo code:
var string = "This *^! is abdf random&!# text to x*?ysearch for character sets";
var tempSet = [];
// start a typical for loop
for(string.length bla bla...){
// look for one of those four character sets and if it hits one
if(foundSet == "abdf" | "x*?y" | "*^!" | "&!#")
// push that character set to the tempSet array
tempSet.push(foundSet);
// continue searching for the next set until the string is done
console.log(tempSet);
//expected result = ["*^!", "abdf", "&!#", "x*?y"]
and all the sets are in the array in the order in which they appeared in the string
there is obviously more, but that part I can handle. It's this line
if(??? == "abdf" | "x*?y" | "*^!" | "&!#")
that I don't know really how to tackle. I suspect it should be some kind of regex but can you have a regex like that with a | when doing an if statement? I've done them with a | when doing a map/replace but I've never used a regex in a loop. I also don't know how to get it to search multiple characters at a time. Some of the character sets are 3, some are 4 characters long.
I would appreciate any help or if you have a suggestion on how to approach this in an easier way, that would be great.
Thanks!

You can use a regular expression. Just list all your strings as alternatives separated by |. Characters that have special meaning in regular expressions (e.g. *, ?, ^, $) will need to be escaped with \ (you can safely escape any non-alphanumeric characters -- some will be redundant).
var string = "This *^! is abdf random&!# text to x*?ysearch for character sets";
var tempSet = string.match(/abdf|x\*\?y|\*\^!|&!#/g);
console.log(tempSet);
If you need a loop you can call RegExp.prototype.exec() in a loop.
var string = "This *^! is abdf random&!# text to x*?ysearch for character sets";
var regex = /abdf|x\*\?y|\*\^!|&!#/g;
var tempSet = [];
var match;
while ((match = regex.exec(string)) !== null) {
tempSet.push(match[0]);
}
console.log(tempSet);
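If the character sets are only known at run time, the same alternation can be built from an array instead of being typed by hand. This is a minimal sketch, not part of the answer above: the helper names escapeRegExp and findSets are hypothetical, and sorting longer sets first is an extra precaution so that a set which is a prefix of another does not shadow it.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every occurrence of any set, in order of appearance.
function findSets(input, sets) {
  if (sets.length === 0) {
    return []; // an empty pattern would match the empty string everywhere
  }
  var pattern = sets
    .slice() // copy so the caller's array is not reordered
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join("|");
  return input.match(new RegExp(pattern, "g")) || [];
}

var string = "This *^! is abdf random&!# text to x*?ysearch for character sets";
var sets = ["abdf", "x*?y", "*^!", "&!#"];
console.log(findSets(string, sets));
```

Because match() returns null when nothing is found, the `|| []` keeps the return type consistent for callers that loop over the result.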

A bit more of a manual method than Barmar's excellent RegEx, but it was fun to put together and shows the pieces maybe a bit more clearly:
var text = "This *^! is abdf random&!# text to x*?ysearch for character sets",
detect = ["abdf", "x*?y", "*^!", "&!#"],
haystack = '',
found = [];
text.split('').forEach(function(letter){
haystack += letter;
detect.forEach(function(needle){
if (haystack.indexOf(needle) !== -1
&& found.indexOf(needle) === -1) {
found.push(needle);
}
});
});
console.log(found);

I think what you're looking for is the includes() function.
var sample = "This *^! is abdf random&!# text to x*?ysearch for character sets";
var toSearch = ["*^!", "abdf", "&!#", "x*?y"];
var tempSet = [];
for (var i = 0; i < toSearch.length; i++) {
if (sample.includes(toSearch[i])) {
tempSet.push(toSearch[i]);
}
}
console.log(tempSet);
//expected result = ["*^!", "abdf", "&!#", "x*?y"]
This way you can iterate through an entire array of whatever strings you're searching for and push all matching elements to tempSet.
Note: This is case sensitive, so make sure you consider your check accordingly.
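One thing to keep in mind with this approach: the result follows the order of toSearch, not the order in which the sets appear in the string. If the order of appearance matters, the position of each hit can be recorded and sorted. A rough sketch, assuming a hypothetical helper named findInOrder and that each set only needs to be reported once:

```javascript
// Hypothetical helper: list the tokens found in text, ordered by first appearance.
function findInOrder(text, tokens) {
  return tokens
    .map(function (token) {
      return { token: token, index: text.indexOf(token) };
    })
    .filter(function (hit) { return hit.index !== -1; }) // drop tokens that are absent
    .sort(function (a, b) { return a.index - b.index; }) // earliest position first
    .map(function (hit) { return hit.token; });
}

var sample = "This *^! is abdf random&!# text to x*?ysearch for character sets";
// The search list is deliberately out of order here.
console.log(findInOrder(sample, ["x*?y", "&!#", "abdf", "*^!"]));
```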

I would just add this as a comment to Kevin's answer if I was able to, but if you need IE support you can also check searchString.indexOf(searchToken) !== -1.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/includes
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf

Related

Find and replace all strings with a certain length JavaScript/Google Script

I am a JavaScript/GoogleScript Rookie, so please bear with me. I am trying to create a Script in Google Docs that will be able to locate all instances of words having exactly 10 characters and append an element to them which would in turn give me a url.
Example : Here is my link pineapples
I would like to find the 10 character string, being pineapples, and add google.com/ in front of each of the strings that have a length of 10.
Giving me "Here is my link google.com/pineapples."
function myFunction() {
var str = document.getElementById(str.length=10);
var res = str.replace("str.length=10", "br"+"str.length=10");
This seems completely wrong, but all I can come up with for now.
You can make it work by using a Regex and then using a backreference to refer to the matching group.
Regex: (\S{10})
It has three parts:
\S matches anything other than a space, tab or newline.
{10} matches the above character exactly 10 times.
() is the capturing group, which is referred to later in the replacement string as $1.
You can get more information here, which explains the above regex in detail.
You may change it to fit your need.
var stringVal = "Here is my link pineapples";
var stringReplaced = stringVal.replace(/(\S{10})/, "google.com/$1");
console.log(stringReplaced);
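Note that /(\S{10})/ without the g flag replaces only the first match, and it also matches the first 10 characters of any longer run of non-space characters. A hedged variant that handles every word of exactly 10 characters is sketched below; the function name prefixTenCharWords is made up for illustration, and "word" is assumed to mean a run of non-whitespace characters.

```javascript
// Hypothetical helper: prepend prefix to every whitespace-delimited word of exactly 10 characters.
function prefixTenCharWords(text, prefix) {
  // (^|\s)      start of string or one whitespace character before the word
  // (\S{10})    exactly 10 non-whitespace characters
  // (?=\s|$)    followed by whitespace or the end of the string (not consumed)
  return text.replace(/(^|\s)(\S{10})(?=\s|$)/g, function (match, lead, word) {
    return lead + prefix + word;
  });
}

console.log(prefixTenCharWords("Here is my link pineapples and strawberry jam", "google.com/"));
```

A replacement function is used instead of a "$1..." string so that a prefix containing a dollar sign would not be misread as a group reference.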
Here is a possible solution:
Split your string using space as a separator (this will give you an array)
Test the length of each part in a loop
Prepend google.com/ if a part has 10 characters
Join your array and enjoy your transformed string
var str = "Here is my link pineapples",
arr = str.split(' ');
for (var i = 0; i < arr.length; i++) {
if (arr[i].length === 10) {
arr[i] = 'google.com/' + arr[i];
}
}
console.log(arr.join(' '));
Okay so bear with me, but my idea is as follows:
The text that you want to replace, are they all within elements of the same class? If so, you could do something like this (jQuery hope you don't mind)
function myFunction(){
$('.myClass').each(function(){
var innerText = $(this).text();
var substring = innerText.substr(0,9);
$(this).text(substring);
});
}

Regex trying to match characters before and after symbol

I'm trying to match characters before and after a symbol, in a string.
string: budgets-closed
To match the characters before the sign -, I do: ^[a-z]+
And to match the other characters, I try: \-(\w+) but, the problem is that my result is: -closed instead of closed.
Any ideas, how to fix it?
Update
This is the piece of code, where I was trying to apply the regex http://jsfiddle.net/trDFh/1/
I repeat: It's not that I don't want to use split; it's just I was really curious, and wanted to see, how can it be done the regex way. Hacking into things spirit
Update2
Well, using substring is a solution as well: http://jsfiddle.net/trDFh/2/ and it is the one I chose to use, since the if in question is actually an else if in a more complex if syntax, and the chosen solution seems to be the best fit for now.
Use exec():
var result=/([^-]+)-([^-]+)/.exec(string);
result is an array, with result[1] being the first captured string and result[2] being the second captured string.
Live demo: http://jsfiddle.net/Pqntk/
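Since exec() returns null when the string contains no match, it is worth checking the result before indexing into it. A small sketch using the same regex, wrapped in a hypothetical helper named splitAround:

```javascript
// Hypothetical helper: return the text before and after the first "-", or null if there is no match.
function splitAround(text) {
  var result = /([^-]+)-([^-]+)/.exec(text);
  if (result === null) {
    return null; // no "x-y" shaped substring found
  }
  return { before: result[1], after: result[2] };
}

console.log(splitAround("budgets-closed"));
console.log(splitAround("nodash"));
```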
I think you'll have to match that. You can use grouping to get what you need, though.
var str = 'budgets-closed';
var matches = str.match( /([a-z]+)-([a-z]+)/ );
var before = matches[1];
var after = matches[2];
For that specific string, you could also use
var str = 'budgets-closed';
var before = str.match( /^\b[a-z]+/ )[0];
var after = str.match( /\b[a-z]+$/ )[0];
I'm sure there are better ways, but the above methods do work.
If the symbol is specifically -, then this should work:
\b([^-]+)-([^-]+)\b
You match a boundary, any "not -" characters, a - and then more "not -" characters until the next word boundary.
Also, there is no need to escape a hyphen; it only holds special properties when between two other characters inside a character class.
edit: And here is a jsfiddle that demonstrates it does work.

JavaScript: How can I remove any words containing (or directly preceding) capital letters, numbers, or commas, from a string?

I'm trying to write the code so it removes the "bad" words from the string (the text).
The word is "bad" if it has comma or any special sign thereafter. The word is not "bad" if it contains only a to z (small letters).
So, the result I'm trying to achieve is:
<script>
String.prototype.azwords = function() {
return this.replace(/[^a-z]+/g, "0");
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.".azwords();//should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(res);//should alert "good gooood"
</script>
Try this:
return this.replace(/(^|\s+)[a-z]*[^a-z\s]\S*(?!\S)/g, "");
It tries to match a word (that is surrounded by whitespaces / string ends) and contains any (non-whitespace) character but at least one that is not a-z. However, this is quite complicated and unmaintainable. Maybe you should try a more functional approach:
return this.split(/\s+/).filter(function(word) {
return word && !/[^a-z]/.test(word);
}).join(" ");
Okay, first off you probably want to use the word boundary escape \b in your regex. Also, it's a bit tricky if you match the bad words, because a bad word might contain lower case chars, so your current regex will exclude anything which does have lowercase letters.
I'd be tempted to pick out the good words and put them in a new string. It's a much easier regex.
/\b[a-z]+\b/g
NB: I'm not totally sure that it'll work for the first and last words in the string so you might need to account for that as well. http://www.regextester.com/ is exceptionally useful.
EDIT: as you want punctuation after the word to be 'bad', this will actually do what I was suggesting
(^|\s)[a-z]+(\s|$)
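One caveat when this pattern is used with the g flag: each match consumes its trailing whitespace, so the next word no longer has a leading \s to match and every second word in a run of good words can be skipped. A possible workaround, sketched here with a hypothetical function name, is to turn the trailing part into a lookahead so it is checked but not consumed:

```javascript
// Hypothetical helper: keep only the whitespace-delimited words made purely of a-z.
function keepLowercaseWords(text) {
  // (?:^|\s)   start of string or a whitespace character before the word
  // [a-z]+     the word itself
  // (?=\s|$)   must be followed by whitespace or end of string, without consuming it
  var matches = text.match(/(?:^|\s)[a-z]+(?=\s|$)/g) || [];
  return matches
    .map(function (m) { return m.trim(); }) // drop the leading whitespace kept in each match
    .join(" ");
}

console.log(keepLowercaseWords("good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."));
```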
Firstly I wouldn't recommend changing the prototype of String (or of any native object) if you can avoid because you leave yourself open to conflicts with other code that might define the same property in different ways. Much better to put custom methods like this on a namespaced object, though I'm sure some will disagree.
Second, is there any need to use RegEx completely? (Genuine question; not trying to be facetious.)
Here is an example of the function with plain old JS using a little bit of RegEx here and there. Easier to comment, debug, and reuse.
Here is the code:
var azwords = function(str) {
var arr = str.split(/\s+/),
len = arr.length,
i = 0,
res = "";
for (i; i < len; i += 1) {
if (!(arr[i].match(/[^a-z]/))) {
res += (!res) ? arr[i] : " " + arr[i];
}
}
return res;
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."; //should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(azwords(res));//should alert "good gooood";
Try this one:
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
var new_one = res.replace(/\s*\w*[#A-Z0-9,.?\xA1-\xFF]\w*/g,'');
//Output `good gooood`
Description:
\s* # zero-or-more whitespace characters
\w* # zero-or-more word characters (letters, digits, underscore)
[#A-Z0-9,.?\xA1-\xFF] # matches any one of the listed characters
\w* # zero-or-more alphanumeric characters
/g - global (run over all string)
This will find all the words you want /^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g so you could use match.
this.match(/^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g).join(" "); should return the list of valid words.
Note that this took some time to run in a JSFiddle, so it may be more efficient to split and iterate over your list.

Counting items based on multiple separators?

I need to count the number of email addresses that a user inputs. Those addresses could be separated by any of the following:
Comma followed by no space - a#example.com,c#example.com.com
Comma followed by any number of spaces (ie. someone might have a comma follow by 3 spaces or just 1) - a#example.com, c#example.com.com
Only white space - a#example.com c#example.com.com
New line
What's a good way to clean that up and reliably count the addresses?
I assume regular 'ole javascript could handle this, but for what it's worth I am using jQuery.
The simplest way is just replace all commas with whitespaces, then, split your string based on blank spaces. No need for conditions.
Here's a fiddle with an example on that.
var emails = input.split(/[\s,]+/);
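To turn that split into a count, empty pieces need to be dropped, because leading or trailing separators leave empty strings in the array. A minimal sketch, with countAddresses as a made-up name; it only counts separated tokens and does not validate that each one is an email address:

```javascript
// Hypothetical helper: count tokens separated by commas, spaces, or newlines.
function countAddresses(input) {
  return input
    .split(/[\s,]+/) // any run of whitespace and/or commas is one separator
    .filter(function (part) { return part.length > 0; }) // ignore empty leading/trailing pieces
    .length;
}

console.log(countAddresses("a#example.com,c#example.com.com"));
console.log(countAddresses("a#example.com,   c#example.com.com\nd#example.com "));
```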
var str="YOUR_STR",
arr = [];
if( str.indexOf(',') >= 0 ) {
// if comma found then replace all extra space and split with comma
arr = str.replace(/\s/g,'').split(',');
} else {
// if comma not found
arr = str.split(' ');
}
var l = "a#example.com,c#example.com.com a#example.com, c#example.com.com a#example.com c#example.com.com";
var r = l.split(/ |, |,/);
Regular expressions make that fairly easy.
If there is a chance of more than one space, the regex can be changed a bit.
var r = l.split(/ +|, +|,/);

remove all but a specific portion of a string in javascript

I am writing a little app for Sharepoint. I am trying to extract some text from the middle of a field that is returned:
var ows_MetaInfo="1;#Subject:SW|NameOfADocument
vti_parservers:SR|23.0.0.6421
ContentTypeID:SW|0x0101001DB26Cf25E4F31488B7333256A77D2CA
vti_cachedtitle:SR|NameOfADocument
vti_title:SR|ATitleOfADocument
_Author:SW:|TheNameOfOurCompany
_Category:SW|
ContentType:SW|Document
vti_author::SR|mrwienerdog
_Comments:SW|This is very much the string I need extracted
vti_categories:VW|
vtiapprovallevel:SR|
vti_modifiedby:SR|mrwienerdog
vti_assignedto:SR|
Keywords:SW|Project Name
ContentType _Comments"
So......All I want returned is "This is very much the string I need extracted"
Do I need a regex and a string replace? How would you write the regex?
Yes, you can use a regular expression for this (this is the sort of thing they are good for). Assuming you always want the string after the pipe (|) on the line starting with "_Comments:SW|", here's how you can extract it:
var matchresult = ows_MetaInfo.match(/^_Comments:SW\|(.*)$/m);
var comment = (matchresult==null) ? "" : matchresult[1];
Note that the .match() method of the String object returns an array. The first element (index 0) is the entire match. Here the entire match is the whole line, because we anchored it with ^ and $; adding the "m" after the regex makes it a multiline regex, which lets ^ and $ match the start and end of any line within the multi-line input. The rest of the array holds the submatches captured with parentheses. Above we've captured the part of the line that you want, so it will be present in the second item of the array (index 1).
If there is no match ("_Comments:SW|" doesn't appear in ows_MetaInfo), then .match() will return null, which is why we test for that before pulling out the comment.
If you need to adjust the regex for other scenarios, have a look at the Regex docs on Mozilla Dev Network: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
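The same idea can be generalized to pull out any field by name. This is only a sketch under assumptions: the helper name getMetaField is invented, the shortened sampleMeta string stands in for the real field, and the type tag after the colon is assumed to be a run of uppercase letters (SW, SR, VW) as in most lines of the sample data.

```javascript
// Hypothetical helper: return the value of the "key:TAG|value" line, or "" if the key is absent.
function getMetaField(meta, key) {
  // Escape regex metacharacters in the key before embedding it in the pattern.
  var escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "m" lets ^ and $ match at the start and end of each line.
  var re = new RegExp("^" + escapedKey + ":[A-Z]+\\|(.*)$", "m");
  var result = meta.match(re);
  return result === null ? "" : result[1];
}

// Shortened stand-in for ows_MetaInfo, for illustration only.
var sampleMeta = "vti_title:SR|ATitleOfADocument\n" +
  "_Comments:SW|This is very much the string I need extracted\n" +
  "vti_categories:VW|";

console.log(getMetaField(sampleMeta, "_Comments"));
```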
You can use this code:
var match = ows_MetaInfo.match(/_Comments:SW\|([^\n]+)/);
if (match)
document.writeln(match[1]);
I'm far from competent with RegEx, so here is my RegEx-less solution. See comments for further detail.
var extractedText = ExtractText(ows_MetaInfo);
function ExtractText(arg) {
// Use the pipe delimiter to turn the string into an array
var aryValues = arg.split("|");
// The comment text sits between "_Comments:SW|" and the next pipe, so it shares
// an array element with the following key, "vti_categories:VW"
for (var i = 0; i < aryValues.length; i++) {
if (aryValues[i].search("vti_categories:VW") != -1)
return aryValues[i].replace("vti_categories:VW", "");
}
return null;
}
Here's a working fiddle to demonstrate.
