Parse through a string to create an array of substrings - javascript

I am building a mini search engine on my website that can search for words and has filters.
I need to be able to take a long string, and split it up into an array of smaller substrings. The words (with no filter) should go in one string, and then each filter should go in a separate string. The order of words and filters should not matter.
For example:
If my string is:
"hello before: 01/01/17 after: 01/01/2015"
"before: 01/01/17 hello after: 01/01/2015"
I would expect my function to return (in any order):
["hello", "before: 01/01/17", "after: 01/01/2015"]

You could use whitespace and a positive lookahead for splitting.
console.log("hello before: 01/01/17 after: 01/01/2015".split(/\s*(?=before|after)/));

Are there any specific limitations for code size? I mean, this isn't code-golf or anything, so why not just do it the straight-forward way?
First, you can tokenize this with a simple regular expression:
var search_string = "hello before: 01/01/17 after: 01/01/2015";
var regex = /(?:(before|after):\s*)?([^ ]+)/g; // [^ ]+ rather than [^ ]* so exec() never stalls on an empty match
var token = null;
while ((token = regex.exec(search_string)) != null) {
    // token[1] is the filter name, if any; token[2] is the word or the filter value
}
Then you can arrange them into any data structure you want. For example, we can put the filters into a separate object, like so:
var filters = {};
var words = [];
while ((token = regex.exec(search_string)) != null) {
    if (token[1])
        filters[token[1]] = token[2];
    else
        words.push(token[2]);
}
After that, you can manipulate these structures any way you want:
if (filters['before']) words.push('before: ' + filters['before']);
if (filters['after']) words.push('after: ' + filters['after']);
return words;
I'm not sure why you'd want it arranged this way, but this would make things uniform. Alternately, you can use them in a more straightforward way:
var before = Date.parse(filters['before'] || '') || false;
if (before !== false) before = new Date(before);
var after = Date.parse(filters['after'] || '') || false;
if (after !== false) after = new Date(after);
function isDocumentMatchSearch(doc) {
    // doc.date is a hypothetical field holding the document's date as a Date object
    if (before !== false && doc.date > before) return false;
    if (after !== false && doc.date < after) return false;
    for (var i = 0; i < words.length; i++) {
        if (doc.title.indexOf(words[i]) < 0 && doc.text.indexOf(words[i]) < 0) return false;
    }
    return true;
}
Since you didn't give a lot of information on what you're searching through, what data types or storage type it's stored in, etc etc, that's the best I can offer.
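For reference, the tokenizing approach above can be collected into one function. This is only a sketch under a few assumptions: the name parseSearch is made up for illustration, the only filters are before and after, filter values contain no spaces, and plain words are joined into a single string as the question asks.

```javascript
// Hypothetical helper: splits a search string into [words, filters...].
function parseSearch(query) {
    // Optional "before:" / "after:" prefix, then one run of non-space characters.
    var regex = /(?:(before|after):\s*)?([^ ]+)/g;
    var filters = {};
    var words = [];
    var token;
    while ((token = regex.exec(query)) !== null) {
        if (token[1]) {
            filters[token[1]] = token[2]; // filter name -> filter value
        } else {
            words.push(token[2]);         // plain search word
        }
    }
    var result = [];
    if (words.length) result.push(words.join(' '));
    if (filters.before) result.push('before: ' + filters.before);
    if (filters.after) result.push('after: ' + filters.after);
    return result;
}

var parsedA = parseSearch("hello before: 01/01/17 after: 01/01/2015");
var parsedB = parseSearch("before: 01/01/17 hello after: 01/01/2015");
```

Both calls produce the same array, because the position of the word relative to the filters does not matter to the tokenizer.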


Check if element contains one of three string in JavaScript or jQuery

When the DOM is ready, I would like to check if an element on the page contains one of three strings. Currently, I am doing it like this, but this is not working as expected because indexOf is usually used in combination with arrays:
jQuery('li.woocommerce-order-overview__payment-method.method')[0].innerHTML.indexOf('iDEAL' || 'Sofortbanking' || 'Bancontact / MisterCash') > 1
How can I rewrite this in the most effective way to check if the element contains one of the three strings?
You can use an array of needles, and then Array.some to check if the element contains any of them
let needles = ['iDEAL', 'Sofortbanking', 'Bancontact', 'Bancontact / MisterCash'];
let haystack = $('li.woocommerce-order-overview__payment-method.method').html();
let result = needles.some( needle => haystack.includes( needle ) );
Here's my try on it.
var html, salient, i;
html = $("li.woocommerce-order-overview__payment-method.method").html();
salient = [ "iDEAL", "Sofortbanking", "Bancontact / MisterCash" ];
for (i = 0; i < salient.length; i++) {
    if (html.indexOf(salient[i]) !== -1) {
        // it contains string i
    }
}
Or, if you want to use a RegEx, which is not necessarily better:
var html;
html = $("li.woocommerce-order-overview__payment-method.method").html();
if (html.match(/(?:iDEAL|Sofortbanking|Bancontact.*MisterCash)/)) {
    // it contains one of those strings
}
If you want to get the string that is present, use the match function.
If you just want to check whether it is there or not, use test.
The regular expression
will match 1 or more occurrences of those words.
var html = ...
var match = ['iDEAL', 'Sofortbanking', 'Bancontact / MisterCash'];
var found = false;
for (var i = 0; i < match.length; ++i) {
    if (html.indexOf(match[i]) !== -1) {
        found = true;
        break;
    }
}
//use `found`
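The || (OR) operator does not work that way: 'iDEAL' || 'Sofortbanking' || 'Bancontact / MisterCash' evaluates to just 'iDEAL', so only that one string is ever searched for.
I would suggest using the || outside of indexOf(), and comparing against -1, because indexOf returns -1 when nothing is found and 0 for a match at the very start.
Like so:
indexOf('iDEAL') > -1 || indexOf('Sofortbanking') > -1 || indexOf('...') > -1
Have a look at a similar answer here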
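To make the point about || concrete, here is a small DOM-free sketch. The helper name containsAny and the sample HTML string are assumptions for illustration only; on a real page the haystack would come from the element's html().

```javascript
// || returns its first truthy operand, so this is just the first string.
var collapsed = 'iDEAL' || 'Sofortbanking' || 'Bancontact / MisterCash';

// Hypothetical helper: true if haystack contains at least one of the needles.
function containsAny(haystack, needles) {
    return needles.some(function (needle) {
        return haystack.indexOf(needle) !== -1; // -1 means "not found"
    });
}

// Hypothetical sample of what the element's innerHTML might look like.
var sampleHtml = 'Payment method: <strong>Sofortbanking</strong>';
var methods = ['iDEAL', 'Sofortbanking', 'Bancontact / MisterCash'];
var found = containsAny(sampleHtml, methods);
```

Comparing against -1 rather than 1 matters here: a match at index 0 or 1 would otherwise be reported as missing.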

Filter options by reading character length inside for loop

I have a widget (the widget code in the pen linked below is not the actual code, please just pay attention to the filtering function jQuery.fn.doFilterOptions(){..}).
Use case:
I have a non-native selectbox (not a traditional <select>). I need to extend its functionality to accept an onclick event that allows the user to type data into the selectbox. It should filter the available .options by showing or hiding them based on their inner HTML value. If no match is found at any point while looping through the string being entered by the user, the options need to stay hidden.
Right now it works 95% of the way. The only issue is that if an invalid char is found, the loop keeps checking the rest of the user's entries char by char, and if the next char matches any of the options at the same index, it re-displays that .option as valid.
$('.selectbox .selected').on('keyup', function(){
var theseOptions = $(this).parent('.selectbox').find('.option');
var defaultPlaceholder = $(this).data('placeholder');
var filterOptions = (function(curSelectedVal){
if (curSelectedVal === ' ' || curSelectedVal.length === 0 || curSelectedVal === defaultPlaceholder){;
var optionsVal;
var doInputOptionsComparison = (function(){
var invalidOption = false;
for (var letterPos = 0; letterPos < curSelectedVal.length; letterPos++){
var thisOption = $(this);
thisOptionsVal = thisOption.html();
if (curSelectedVal.length > thisOptionsVal.length ){ // If a longer string has been input by the user than exists in the option being iterated over, hide this option
invalidOption = true;
else if ((thisOptionsVal[letterPos].toLowerCase().trim() === curSelectedVal[letterPos].toLowerCase().trim()) && invalidOption === false){ // If the input string matches this option and no invalid options have been found in the letterPos prior to it, show this option;
else { // If the string does not match any option
invalidOptionFound = true;
Here is the demo: try selecting, then typing abz, and you will see the filter working properly.
Now erase that input and type azc. You will see the abc option become available again because the c matches at that same index (user input[i] = optionsHtml[i] = show();), resulting in the undesirable effect described above.
Would this be easier by using regEx to do the filtering?
I managed to use a dynamic regEx filter function and it cut the code down big time! Wow, what a better solution.
$.fn.filterOptionsByUserInput = function(optionSelector){
    var curInput = $(this).html().trim().replace(/ /g, '').toLowerCase();
    var userInputRegEx = new RegExp('^'+curInput+'.*');
    $(optionSelector).each(function(){
        if ($(this).html().toLowerCase().trim().match(userInputRegEx)){
            $(this).show();
        } else {
            $(this).hide();
        }
    });
};
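One thing to watch with a regex built from user input is that characters such as . or ( are treated as pattern syntax. The sketch below shows the same prefix filter on plain strings with the input escaped first. The names escapeRegExp and filterByPrefix are hypothetical, and the DOM show/hide step is left out so the logic can be run on its own.

```javascript
// Hypothetical helper: escape characters that have special meaning in a RegExp.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep only the options that start with the typed input.
function filterByPrefix(options, input) {
    var prefix = new RegExp('^' + escapeRegExp(input.trim().toLowerCase()));
    return options.filter(function (option) {
        return prefix.test(option.trim().toLowerCase());
    });
}

var options = ['abc', 'abd', 'xyz'];
var matchesAb = filterByPrefix(options, 'ab');
var matchesAzc = filterByPrefix(options, 'azc');
var matchesDot = filterByPrefix(options, 'a.c');
```

Because the whole input is anchored at the start of each option, a wrong character anywhere in the input rejects the option, which avoids the per-index comparison problem from the question. An empty input matches every option.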

Javascript if value is in array else in next array

I have found a few posts on here with similar questions but not entirely the same as what I am trying. I am currently using a simple if statement that checks the data the user enters then checks to see if it starts with a number of different values. I am doing this with the following:
var value = string;
var value = value.toLowerCase();
country = "NONE";
county = "NONE";
if (value.indexOf('ba1 ') == 0 || value.indexOf('ba2 ') == 0 || value.indexOf('ba3 ') == 0) { //CHECK AVON (MAINLAND UK) UK.AVON
country = "UK";
county = "UK.AVON";
} else if(value.indexOf('lu') == 0){//CHECK BEDFORDSHIRE (MAINLAND UK) UK.BEDS
country = "UK";
county = "UK.BEDS";
}
I have about 20-30 different if, else statements that are basically checking the post code entered and finding the county associated. However some of these if statements are incredibly long so I would like to store the values inside an array and then in the if statement simply check value.indexOf() for each of the array values.
So in the above example I would have an array as follows for the statement:
var avon = new Array('ba1 ','ba2 ','ba3 ');
then inside the indexOf() use each value
Would this be possible with minimal script or am I going to need to make a function for this to work? I am ideally wanting to keep the array inside the if statement instead of querying for each array value.
You can use the some Array method (though you might need to shim it for legacy environments):
var value = string.toLowerCase(),
    country = "NONE",
    county = "NONE";
if (['ba1 ','ba2 ','ba3 '].some(function(str) {
    return value.slice(0, str.length) === str;
})) {
    country = "UK";
    county = "UK.AVON";
}
(This also uses a more performant starts-with check; see 'How to check if a string "StartsWith" another string?')
For an even shorter condition, you might also resort to regex (anchor and alternation):
if (/^ba[123] /i.test(string)) { … }
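As a sketch of the prefix check on its own, the helper below wraps the some call in a reusable function. The name startsWithAny is an assumption, and it swaps in lastIndexOf(prefix, 0) as the starts-with test, which only ever looks at position 0 instead of scanning the whole string.

```javascript
// Hypothetical helper: true if value begins with any of the given prefixes.
function startsWithAny(value, prefixes) {
    return prefixes.some(function (prefix) {
        // Searching backwards from index 0 can only match at index 0.
        return value.lastIndexOf(prefix, 0) === 0;
    });
}

var avon = ['ba1 ', 'ba2 ', 'ba3 '];
var isAvon = startsWithAny('ba2 3cd', avon);
var isNotAvon = startsWithAny('lu1 1aa', avon);
```

Each county then needs only one array of prefixes and one call, instead of a long chain of indexOf comparisons.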
No, it doesn’t exist, but you can make a function to do just that:
function containsAny(string, substrings) {
    for(var i = 0; i < substrings.length; i++) {
        if(string.indexOf(substrings[i]) !== -1) {
            return true;
        }
    }
    return false;
}
Alternatively, there’s a regular expression:
/ba[123] /.test(value)
My recommendation is to rethink your approach and use regular expressions instead of indexOf.
But if you really need it, you can use the following method:
function checkStart(value, acceptableStarts){
    for (var i=0; i<acceptableStarts.length; i++) {
        if (value.indexOf(acceptableStarts[i]) == 0) {
            return true;
        }
    }
    return false;
}
Your previous usage turns into:
if (checkStart(value, ['ba1', 'ba2 ', 'ba3'])) {
    country = 'UK';
}
Even better, you can generalize this, like so:
var countryPrefixes = {
    'UK' : ['ba1','ba2 ', 'ba3'],
    'FR' : ['fa2','fa2']
};
for (var key in countryPrefixes) {
    if (checkStart(value, countryPrefixes[key])) {
        country = key;
    }
}
I'd forget using hard-coded logic for this, and just use data:
var countyMapping = {
'BA1': 'UK.AVON',
'BA2': 'UK.AVON',
'BA3': 'UK.AVON',
'LU': 'UK.BEDS',
};
Take successive characters off the right hand side of the postcode and do a trivial lookup in the table until you get a match. Four or so lines of code ought to do it:
function getCounty(str) {
    while (str.length) {
        var res = countyMapping[str];
        if (res !== undefined) return res;
        str = str.slice(0, -1);
    }
}
I'd suggest normalising your strings first to ensure that the space between the two halves of the postcode is present and in the right place.
For extra bonus points, get the table out of a database so you don't have to modify your code when Scotland leaves the UK ;-)
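Here is a runnable sketch of that table lookup, assuming the postcode is normalised by trimming and upper-casing before the search. The function name lookupCounty and the null return for unknown postcodes are choices made for this example. Since the longest matching key wins, the table has to list a more specific district wherever a shorter prefix would otherwise give the wrong county.

```javascript
// Table from the answer above; in practice this could be loaded from a database.
var countyMapping = {
    'BA1': 'UK.AVON',
    'BA2': 'UK.AVON',
    'BA3': 'UK.AVON',
    'LU': 'UK.BEDS'
};

// Hypothetical helper: strip characters from the right until a key matches.
function lookupCounty(postcode) {
    var key = postcode.trim().toUpperCase(); // assumed normalisation step
    while (key.length) {
        if (Object.prototype.hasOwnProperty.call(countyMapping, key)) {
            return countyMapping[key];
        }
        key = key.slice(0, -1);
    }
    return null; // no prefix of the postcode is in the table
}

var avonCounty = lookupCounty('ba1 2ab');
var bedsCounty = lookupCounty('LU1 1AA');
var unknownCounty = lookupCounty('ZZ9 9ZZ');
```

Adding a new county is then a data change only; the lookup code stays the same.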

How to get an Array of all words used on a page

So I'm trying to get an array of all the words used in my web page.
Should be easy, right?
The problem I run into is that $("body").text().split(" ") returns an array where the words at the beginning of one element and end of another are joined as one.
<div id="1">Hello
    <div id="2">World</div>
</div>
returns ["HelloWorld"] when I want it to return ["Hello", "World"].
I also tried:
wordArr = [];
function getText(target)
var testArr = $(this).text().split(" ");
for(var i =0; i < testArr.length; i++)
but $(node).children() is truthy for any node in the DOM that exists, so that didn't work.
I'm sure I'm missing something obvious, so I'd appreciate an extra set of eyes.
For what it's worth, I don't need unique words, just every word in the body of the document as an element in the array. I'm trying to use it to generate context and lexical co-occurrence with another set of words, so duplicates just up the contextual importance of a given word.
Thanks in advance for any ideas.
How about something like this?
var res = $('body *').contents().map(function () {
if (this.nodeType == 3 && this.nodeValue.trim() != "")
return this.nodeValue.trim();
}).get().join(" ");
Get the array of words:
var res = $('body *').contents().map(function () {
if (this.nodeType == 3 && this.nodeValue.trim() != "") //check for nodetype text and ignore empty text nodes
return this.nodeValue.trim().split(/\W+/); //split the nodevalue to get words.
}).get(); //get the array of words.
function getText(target) {
    var wordArr = [];
    $('*',target).add(target).each(function(k,v) {
        var words = $('*',v.cloneNode(true)).remove().end().text().split(/(\s+|\n)/);
        wordArr = wordArr.concat(words.filter(function(n){return n.trim()}));
    });
    return wordArr;
}
You can do this:
function getwords(e){
if ( $(this).children().length > 0 ) {
else if($.trim($(this).text())!=""){
The question assumes that words are not internally separated by elements. If you simply create an array of words separated by white space and elements, you will end up with:
being read as
['Fr', 'e', 'd'];
Another thing to consider is punctuation. How do you deal with: "There were three of them: Mark, Sue and Tom. They were un-remarkable. One—the red head—was in the middle." Do you remove all punctuation? Or replace it with white space before trimming? How do you re-join words that are split by markup or characters that might be inter–word or intra–word punctuation? Note that while it is popular to write a dash between words with a space at either side, "correct" punctuation uses an m dash with no spaces.
Not so simple…
Anyhow, an approach that just splits on spaces and elements using recursion and works in any browser in use without any library support is:
function getWords(element) {
    element = element || document.body;
    var node, nodes = element.childNodes;
    var words = [];
    var text, i=0;
    while (node = nodes[i++]) {
        if (node.nodeType == 1) {
            // element node: recurse into its children
            words = words.concat(getWords(node));
        } else if (node.nodeType == 3) {
            // text node: trim, collapse runs of whitespace, then split into words
            text = node.data.replace(/^\s+|\s+$/g,'').replace(/\s+/g,' ');
            words = !text.length? words : words.concat(text.split(/\s/));
        }
    }
    return words;
}
but it does not deal with the issues above.
To avoid script elements, change:
if (node.nodeType == 1) {
to:
if (node.nodeType == 1 && node.tagName.toLowerCase() != 'script') {
Any element that should be avoided can be added to the condition. If a number of element types should be avoided, you can do:
var elementsToAvoid = {script:'script', button:'button'};
if (node.nodeType == 1 && node.tagName && !(node.tagName.toLowerCase() in elementsToAvoid)) {
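Putting the recursive walk and the skip list together might look like the sketch below. The name collectWords and the default skip list are assumptions. The function only reads childNodes, nodeType, nodeValue and tagName, so it can be exercised with plain objects as well as real DOM nodes; the fakeBody tree here is a hand-built stand-in for the nested div example in the question.

```javascript
// Hypothetical helper: collect whitespace-separated words from text nodes,
// skipping any element whose lower-cased tag name is a key in skipTags.
function collectWords(node, skipTags) {
    skipTags = skipTags || { script: true, style: true };
    var words = [];
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === 1) {            // element node
            if (!(child.tagName.toLowerCase() in skipTags)) {
                words = words.concat(collectWords(child, skipTags));
            }
        } else if (child.nodeType === 3) {     // text node
            var text = child.nodeValue.replace(/^\s+|\s+$/g, '');
            if (text.length) words = words.concat(text.split(/\s+/));
        }
    }
    return words;
}

// Plain-object stand-in for: <div>Hello <div>World</div></div> plus a script.
var fakeBody = {
    childNodes: [
        { nodeType: 1, tagName: 'DIV', childNodes: [
            { nodeType: 3, nodeValue: 'Hello\n    ' },
            { nodeType: 1, tagName: 'DIV', childNodes: [
                { nodeType: 3, nodeValue: 'World' }
            ] }
        ] },
        { nodeType: 1, tagName: 'SCRIPT', childNodes: [
            { nodeType: 3, nodeValue: 'var ignored = 1;' }
        ] }
    ]
};
var pageWords = collectWords(fakeBody);
```

In a browser the call would be collectWords(document.body). Like the version above, it does not address punctuation or words split by inline markup.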

Regular expression to get class name with specific substring

I need a regular expression in javascript that will get a string with a specific substring from a list of space delimited strings.
For example, I have;
widget util cookie i18n-username
I want to be able to return only i18n-username.
You could use the following function, which uses a regex to match your string when it is surrounded by a space or by the beginning or end of the string. Be careful to escape any regular expression special characters in the search argument unless you intend them as a pattern, because the argument is passed in as a string and compiled with new RegExp rather than written as a RegExp literal:
var hasClass = function(s, klass) {
    var r = new RegExp("(?:^| )(" + klass + ")(?: |$)")
      , m = (""+s).match(r);
    return (m) ? m[1] : null;
};
hasClass("a b c", "a"); // => "a"
hasClass("a b c", "b"); // => "b"
hasClass("a b c", "x"); // => null
var klasses = "widget util cookie i18n-username";
hasClass(klasses, "username"); // => null
hasClass(klasses, "i18n-username"); // => "i18n-username"
hasClass(klasses, "i18n-\\w+"); // => "i18n-username"
As others have pointed out, you could also simply use a "split" and "indexOf":
var hasClass = function(s, klass) {
    return (""+s).split(" ").indexOf(klass) >= 0;
};
However, note that Array.prototype.indexOf was introduced to JavaScript somewhat recently (in ES5), so for older browsers you might have to implement it yourself.
var hasClass = function(s, klass) {
    var a=(""+s).split(" "), len=a.length, i;
    for (i=0; i<len; i++) {
        if (a[i] == klass) return true;
    }
    return false;
};
Note that the split/indexOf solution is likely faster for most browsers (though not all). This jsPerf benchmark shows which solution is faster for various browsers - notably, Chrome must have a really good regular expression engine!
function getString(subString, string){
    // "\\S" is needed here: inside a string literal, "\S" is just the letter "S"
    return (string.match(new RegExp("\\S*" + subString + "\\S*")) || [null])[0];
}
To Use:
var str = "widget util cookie i18n-username";
getString("user", str); //returns i18n-username
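If a regular expression is not required, the same lookup can be sketched with split and filter. This is a swapped-in alternative, not the regex approach above; the name findTokens is hypothetical, and it returns every matching token rather than only the first.

```javascript
// Hypothetical helper: all space-delimited tokens that contain subString.
function findTokens(list, subString) {
    return list.split(/\s+/).filter(function (token) {
        return token.length > 0 && token.indexOf(subString) !== -1;
    });
}

var classNames = "widget util cookie i18n-username";
var userTokens = findTokens(classNames, "user");
var i18nTokens = findTokens("widget i18n-util cookie i18n-username", "i18n-");
```

Since the substring is compared literally, characters such as . or + in it need no escaping.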
Does this need to be a regex? Would knowing if the string existed be sufficient? For a simple check like this, plain string methods are easier to read and often at least as fast as a regular expression:
var settings = 'widget util cookie i18n-username',
    // using an array in case searching the string is insufficient
    features = settings.split(' ');
if (features.indexOf('i18n-username') !== -1) {
    // do something based on having this feature
}
If whitespace wouldn't cause an issue in searching for a value, you could just search the string directly:
var settings = 'widget util cookie i18n-username';
if (settings.indexOf('i18n-username') !== -1) {
    // do something based on having this value
}
It then becomes easy to make this into a reusable function:
(function() {
    var App = {},
        features = 'widget util cookie i18n-username';
    App.hasFeature = function(feature) {
        return features.indexOf(feature) !== -1;
        // or if you prefer the array:
        // return features.split(' ').indexOf(feature) !== -1;
    };
    window.App = App;
})();
// Here's how you can use it:
App.hasFeature('i18n-username'); // returns true
You now say you need to return all strings that start with another string, and it is possible to do this with a regular expression as well, although I am unsure about how efficient it is:
(function() {
    var App = {},
        features = 'widget util cookie i18n-username'.split(' ');
    // This contains an array of all features starting with 'i18n'
    App.i18nFeatures = features.filter(function(value) {
        return value.indexOf('i18n') === 0;
    });
    window.App = App;
})();
/i18n-\w+/ ought to work. If other substrings in your string can also start with i18n-, or your user names contain characters outside the class [a-zA-Z0-9_], you'll need to adjust the pattern for that.
var str = "widget util cookie i18n-username";
If you need to match more than one string, you can add on the global flag (/g) and loop through the matches.
var str = "widget i18n-util cookie i18n-username";
var matches = str.match(/i18n-\w+/g);
if (matches) {
    for (var i = 0; i < matches.length; i++) {
        alert(matches[i]);
    }
} else {
    alert("phooey, no matches");
}
