How do I split a string with multiple separators in JavaScript?
I'm trying to split on both commas and spaces, but AFAIK JavaScript's split() function only supports one separator.
Pass in a regexp as the parameter:
js> "Hello awesome, world!".split(/[\s,]+/)
Hello,awesome,world!
Edited to add:
You can get the last element by selecting the length of the array minus 1:
>>> bits = "Hello awesome, world!".split(/[\s,]+/)
["Hello", "awesome", "world!"]
>>> bit = bits[bits.length - 1]
"world!"
... and if the pattern doesn't match:
>>> bits = "Hello awesome, world!".split(/foo/)
["Hello awesome, world!"]
>>> bits[bits.length - 1]
"Hello awesome, world!"
You can pass a regex into JavaScript's split() method. For example:
"1,2 3".split(/,| /)
["1", "2", "3"]
Or, if you want to allow multiple separators together to act as one only:
"1, 2, , 3".split(/(?:,| )+/)
["1", "2", "3"]
You have to use the non-capturing (?:) parenthesis, because
otherwise it gets spliced back into the result. Or you can be smart
like Aaron and use a character class.
Examples tested in Safari and Firefox.
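To make the point about capturing parentheses concrete, here is a minimal sketch that runs the same input through three patterns. The variable names are only for illustration.

```javascript
// Same input, three patterns: only the grouping style differs.
const input = "1, 2, , 3";

// Non-capturing group: the separators are discarded.
const withoutSeparators = input.split(/(?:,| )+/);

// Capturing group: the text last captured by each match is spliced into the result.
const withSeparators = input.split(/(,| )+/);

// Character class: behaves like the non-capturing version for this input.
const withCharClass = input.split(/[, ]+/);

console.log(withoutSeparators); // ["1", "2", "3"]
console.log(withSeparators);    // ["1", " ", "2", " ", "3"]
console.log(withCharClass);     // ["1", "2", "3"]
```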
Another simple but effective method is to use split + join repeatedly.
"a=b,c:d".split('=').join(',').split(':').join(',').split(',')
Essentially, a split followed by a join acts like a global replace: each separator is replaced with a comma, and once all of them are replaced, a final split on the comma produces the array.
The result of the above expression is:
['a', 'b', 'c', 'd']
Expanding on this you could also place it in a function:
function splitMulti(str, tokens){
var tempChar = tokens[0]; // We can use the first token as a temporary join character
for(var i = 1; i < tokens.length; i++){
str = str.split(tokens[i]).join(tempChar);
}
str = str.split(tempChar);
return str;
}
Usage:
splitMulti('a=b,c:d', ['=', ',', ':']) // ["a", "b", "c", "d"]
If you use this functionality a lot, it might even be worth wrapping String.prototype.split for convenience. I think my function is fairly safe; the only considerations are the minor overhead of the extra conditionals and the lack of support for the limit argument when an array is passed.
Be sure to include the splitMulti function if you use this approach, as the code below simply wraps it :). Also worth noting: some people frown on extending built-ins (many people do it wrong and conflicts can occur), so if in doubt, speak to someone more senior before using this, or ask on SO :)
var splitOrig = String.prototype.split; // Maintain a reference to inbuilt fn
String.prototype.split = function (){
var separator = arguments[0];
if(Array.isArray(separator) && separator.length > 0){ // Only handle a non-empty array separator; this also stays safe when split() is called with no arguments
return splitMulti(this, separator); // Call splitMulti
}
return splitOrig.apply(this, arguments); // Call original split maintaining context
};
Usage:
var a = "a=b,c:d";
a.split(['=', ',', ':']); // ["a", "b", "c", "d"]
// Test to check that the built-in split still works (although our wrapper wouldn't work if it didn't as it depends on it :P)
a.split('='); // ["a", "b,c:d"]
Enjoy!
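If patching String.prototype feels too risky, one alternative is a standalone helper that builds a regex from the separator array. This is only a sketch: escapeRegExp and splitOnAny are hypothetical names, not part of the answer above, and the escaping step is there so that separators such as "." or "|" are matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters so a separator is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split on any of the given separators
// without touching String.prototype.
function splitOnAny(str, separators) {
  const pattern = new RegExp(separators.map(escapeRegExp).join("|"));
  return str.split(pattern);
}

console.log(splitOnAny("a=b,c:d", ["=", ",", ":"])); // ["a", "b", "c", "d"]
console.log(splitOnAny("1.5|2.5", ["|"]));           // ["1.5", "2.5"]
```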
Let's keep it simple: adding "+" after the character class "[ ]" in your regex means "1 or more".
This means "+" and "{1,}" are equivalent.
var words = text.split(/[ .:;?!~,`"&|()<>{}\[\]\r\n/\\]+/); // note ' and - are kept
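As a quick check of the "+" versus "{1,}" claim, the sketch below splits a small sample string (made up for this example) with both quantifiers and with no quantifier at all.

```javascript
const sampleText = "one,  two;three";

// "+" and "{1,}" both mean "one or more", so a run of separators acts as one.
const withPlus = sampleText.split(/[ ,;]+/);
const withBraces = sampleText.split(/[ ,;]{1,}/);

// Without a quantifier every single separator splits, leaving empty strings.
const withoutQuantifier = sampleText.split(/[ ,;]/);

console.log(withPlus);          // ["one", "two", "three"]
console.log(withBraces);        // ["one", "two", "three"]
console.log(withoutQuantifier); // ["one", "", "", "two", "three"]
```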
Tricky method:
var s = "dasdnk asd, (naks) :d skldma";
var a = s.replace('(',' ').replace(')',' ').replace(',',' ').split(' ').filter(Boolean); // filter(Boolean) drops the empty strings left where two spaces end up adjacent
console.log(a);//["dasdnk", "asd", "naks", ":d", "skldma"]
I'm surprised no one has suggested it yet, but my hacky (and crazy fast) solution was to just chain several 'replace' calls before splitting on a single character.
i.e. to remove a, b, c, d, and e:
let str = 'afgbfgcfgdfgefg'
let array = str.replace('a','d').replace('b','d').replace('c','d').replace('e','d').split('d')
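This can be conveniently generalized for an array of splitters as follows:
function splitByMany( manyArgs, string ) {
manyArgs = manyArgs.slice() // copy, so the caller's array is not emptied
while (manyArgs.length > 1) {
let arg = manyArgs.pop()
string = string.split(arg).join(manyArgs[0]) // replaces every occurrence; replace(arg, ...) with a string only replaces the first
}
return string.split(manyArgs[0])
}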
So, in your case, you could then call
let array = splitByMany([" ", ","], 'My long string containing commas, and spaces, and more commas');
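One caveat with chaining replace calls: when the pattern is a string, replace only substitutes the first occurrence. A small sketch of the difference, using a made-up sample string:

```javascript
const csv = "a,b,c";

// String pattern: only the first comma is replaced.
const firstOnly = csv.replace(",", " ");

// Regex with the g flag, or split + join, replaces every comma.
const viaRegex = csv.replace(/,/g, " ");
const viaSplitJoin = csv.split(",").join(" ");

console.log(firstOnly);    // "a b,c"
console.log(viaRegex);     // "a b c"
console.log(viaSplitJoin); // "a b c"
```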
For those of you who want more customization in their splitting function, I wrote a recursive algorithm that splits a given string with a list of characters to split on. I wrote this before I saw the above post. I hope it helps some frustrated programmers.
splitString = function(string, splitters) {
var list = [string];
for(var i=0, len=splitters.length; i<len; i++) {
traverseList(list, splitters[i], 0);
}
return flatten(list);
}
traverseList = function(list, splitter, index) {
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != list[index].split(splitter)) ? list[index] = list[index].split(splitter) : null;
(list[index].constructor === Array) ? traverseList(list[index], splitter, 0) : null;
(list.constructor === Array) ? traverseList(list, splitter, index+1) : null;
}
}
flatten = function(arr) {
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? flatten(val) : val);
},[]);
}
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
splitString(stringToSplit, splitList);
Example above returns: ["people", "and", "other", "things"]
Note: flatten function was taken from Rosetta Code
You could just lump all the characters you want to use as separators either singularly or collectively into a regular expression and pass them to the split function. For instance you could write:
console.log( "dasdnk asd, (naks) :d skldma".split(/[ \(,\)]+/) );
And the output will be:
["dasdnk", "asd", "naks", ":d", "skldma"]
Here are some cases where regex may help:
\W to match any character other than a word character [a-zA-Z0-9_]. Example:
("Hello World,I-am code").split(/\W+/); // would return [ 'Hello', 'World', 'I', 'am', 'code' ]
\s+ to match one or more whitespace characters
\d to match a digit
If you want to split on only certain characters, say , and -, you can use str.split(/[,-]+/), and so on.
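The cases listed above can be tried side by side. The sample strings other than the first are made up for this sketch.

```javascript
// \W+ : split on runs of non-word characters
const byNonWord = "Hello World,I-am code".split(/\W+/);

// \s+ : split on runs of whitespace (spaces, tabs, newlines)
const byWhitespace = "a  b\tc".split(/\s+/);

// \d+ : split on runs of digits
const byDigits = "a1b22c".split(/\d+/);

// [,-]+ : split on runs of commas and hyphens only
const byCommaOrHyphen = "x,y-z,-w".split(/[,-]+/);

console.log(byNonWord);       // ["Hello", "World", "I", "am", "code"]
console.log(byWhitespace);    // ["a", "b", "c"]
console.log(byDigits);        // ["a", "b", "c"]
console.log(byCommaOrHyphen); // ["x", "y", "z", "w"]
```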
My refactor of @Brian's answer:
var string = 'and this is some kind of information and another text and simple and some egample or red or text';
var separators = ['and', 'or'];
function splitMulti(str, separators){
var tempChar = 't3mp'; // placeholder unlikely to occur in normal text, so splitting on it is safe
//split by regex e.g. \b(or|and)\b
var re = new RegExp('\\b(' + separators.join('|') + ')\\b' , "g");
str = str.replace(re, tempChar).split(tempChar);
// trim & remove empty
return str.map(el => el.trim()).filter(el => el.length > 0);
}
console.log(splitMulti(string, separators))
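For comparison, the same word-boundary idea can be sketched without the temporary placeholder by splitting on the regex directly. This is only an illustration with a made-up sentence; the non-capturing group (?:...) keeps the separator words out of the result.

```javascript
const sentence = "red and green or blue and sandy";

const parts = sentence
  .split(/\b(?:and|or)\b/)          // split on the whole words "and" / "or"
  .map(part => part.trim())         // remove the surrounding spaces
  .filter(part => part.length > 0); // drop empty pieces

// "sandy" stays intact because \b requires a word boundary around "and".
console.log(parts); // ["red", "green", "blue", "sandy"]
```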
Here is a new way to achieve the same in ES6:
function SplitByString(source, splitBy) {
var splitter = splitBy.split('');
splitter.push([source]); //Push initial value
return splitter.reduceRight(function(accumulator, curValue) {
var k = [];
accumulator.forEach(v => k = [...k, ...v.split(curValue)]);
return k;
});
}
var source = "abc,def#hijk*lmn,opq#rst*uvw,xyz";
var splitBy = ",*#";
console.log(SplitByString(source, splitBy));
Please note that this function:
Involves no regex
Returns the split values in the same order as they appear in the source
The result of the above code would be:
["abc", "def", "hijk", "lmn", "opq", "rst", "uvw", "xyz"]
For example, if you have the string 07:05:45PM, you can combine replace and split:
var time = "07:05:45PM";
var hour = time.replace("PM", "").split(":");
Result
[ '07', '05', '45' ]
Here is a classic implementation of such a function. The code works in almost all versions of JavaScript and is reasonably efficient.
It doesn't use regex, which is hard to maintain
It doesn't use new features of JavaScript
It doesn't use multiple .split() .join() invocations, which require more memory
Just pure code:
var text = "Create a function, that will return an array (of string), with the words inside the text";
println(getWords(text));
function getWords(text)
{
let startWord = -1;
let ar = [];
for(let i = 0; i <= text.length; i++)
{
let c = i < text.length ? text[i] : " ";
if (!isSeparator(c) && startWord < 0)
{
startWord = i;
}
if (isSeparator(c) && startWord >= 0)
{
let word = text.substring(startWord, i);
ar.push(word);
startWord = -1;
}
}
return ar;
}
function isSeparator(c)
{
var separators = [" ", "\t", "\n", "\r", ",", ";", ".", "!", "?", "(", ")"];
return separators.includes(c);
}
You can see the code running in playground:
https://codeguppy.com/code.html?IJI0E4OGnkyTZnoszAzf
Splitting URL by .com/ or .net/
url.split(/\.com\/|\.net\//)
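A short usage sketch for the pattern above. The URLs are hypothetical and only there to show what the two pieces look like.

```javascript
const comUrl = "https://example.com/path/to/page"; // hypothetical URL
const netUrl = "https://example.net/index.html";   // hypothetical URL

const pattern = /\.com\/|\.net\//;

const comParts = comUrl.split(pattern);
const netParts = netUrl.split(pattern);

console.log(comParts); // ["https://example", "path/to/page"]
console.log(netParts); // ["https://example", "index.html"]
```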
a = "a=b,c:d"
array = ['=',',',':'];
for(i=0; i< array.length; i++){ a= a.split(array[i]).join(); }
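This returns the string with every separator replaced by a comma (here "a,b,c,d"), because join() with no argument joins with commas. A final a.split(',') then gives the array.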
I ran into this question while looking for a replacement for the C# string.Split() function, which splits a string using the characters in its argument.
In JavaScript you can do the same using map and reduce to iterate over the splitting characters and the intermediate values:
let splitters = [",", ":", ";"]; // or ",:;".split("");
let start= "a,b;c:d";
let values = splitters.reduce((old, c) => old.map(v => v.split(c)).flat(), [start]);
// values is ["a", "b", "c", "d"]
flat() is used to flatten the intermediate results so each iteration works on a list of strings without nested arrays. Each iteration applies split to all of the values in old and then returns the list of intermediate results to be split by the next value in splitters. reduce() is initialized with an array containing the initial string value.
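The map-then-flat step can also be written with flatMap, which does both in one call. A minimal sketch, assuming an environment that supports Array.prototype.flatMap (ES2019):

```javascript
const splitters = [",", ":", ";"];
const start = "a,b;c:d";

// Each pass splits every intermediate piece on one splitter and flattens the result.
const values = splitters.reduce(
  (parts, sep) => parts.flatMap(part => part.split(sep)),
  [start]
);

console.log(values); // ["a", "b", "c", "d"]
```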
I find that one of the main reasons I need this is to split file paths on both / and \. It's a bit of a tricky regex so I'll post it here for reference:
var splitFilePath = filePath.split(/[\/\\]/);
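A small usage sketch of that regex on both path styles. The paths are made up for illustration; note that a leading slash produces an empty first element.

```javascript
const windowsPath = "C:\\Users\\me\\notes.txt"; // hypothetical path
const unixPath = "/home/me/notes.txt";          // hypothetical path

const windowsParts = windowsPath.split(/[\/\\]/);
const unixParts = unixPath.split(/[\/\\]/);

console.log(windowsParts); // ["C:", "Users", "me", "notes.txt"]
console.log(unixParts);    // ["", "home", "me", "notes.txt"]

// The file name is the last element in either case.
console.log(unixParts[unixParts.length - 1]); // "notes.txt"
```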
I think it's easier if you specify what you want to keep, instead of what you want to remove.
For example, if you want only English words, you can use something like this:
text.match(/[a-z'\-]+/gi);
Examples (run snippet):
var R=[/[a-z'\-]+/gi,/[a-z'\-\s]+/gi];
var s=document.getElementById('s');
for(var i=0;i<R.length;i++)
{
var o=document.createElement('option');
o.innerText=R[i]+'';
o.value=i;
s.appendChild(o);
}
var t=document.getElementById('t');
var r=document.getElementById('r');
s.onchange=function()
{
r.innerHTML='';
var x=s.value;
if((x>=0)&&(x<R.length))
x=t.value.match(R[x]);
for(i=0;i<x.length;i++)
{
var li=document.createElement('li');
li.innerText=x[i];
r.appendChild(li);
}
}
<textarea id="t" style="width:70%;height:12em">even, test; spider-man
But saying o'er what I have said before:
My child is yet a stranger in the world;
She hath not seen the change of fourteen years,
Let two more summers wither in their pride,
Ere we may think her ripe to be a bride.
—Shakespeare, William. The Tragedy of Romeo and Juliet</textarea>
<p><select id="s">
<option selected>Select a regular expression</option>
<!-- option value="1">/[a-z'\-]+/gi</option>
<option value="2">/[a-z'\-\s]+/gi</option -->
</select></p>
<ol id="r" style="display:block;width:auto;border:1px inner;overflow:scroll;height:8em;max-height:10em;"></ol>
</div>
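Outside the browser, the same "keep what you want" idea can be sketched in a few lines. The sample sentence is made up; the regex is the first one from the snippet above.

```javascript
const sampleLine = "even, test; spider-man o'er";

// match() with the g flag returns every run of letters, apostrophes and hyphens.
const englishWords = sampleLine.match(/[a-z'\-]+/gi);

console.log(englishWords); // ["even", "test", "spider-man", "o'er"]
```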
I don't know how RegEx performs, but here is an alternative to RegEx that leverages the native Set and works in O(max(str.length, delimiter.length)) time instead. Note that each delimiter must be a single character:
var multiSplit = function(str,delimiter){
if (!(delimiter instanceof Array))
return str.split(delimiter);
if (!delimiter || delimiter.length == 0)
return [str];
var hashSet = new Set(delimiter);
if (hashSet.has(""))
return str.split("");
var lastIndex = 0;
var result = [];
for(var i = 0;i<str.length;i++){
if (hashSet.has(str[i])){
result.push(str.substring(lastIndex,i));
lastIndex = i+1;
}
}
result.push(str.substring(lastIndex));
return result;
}
multiSplit('1,2,3.4.5.6 7 8 9',[',','.',' ']);
// Output: ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
multiSplit('1,2,3.4.5.6 7 8 9',' ');
// Output: ["1,2,3.4.5.6", "7", "8", "9"]
I solved this with reduce and filter. It might not be the most readable solution, or the fastest, and in real life I would probably use Aaron's answer here, but it was fun to write.
[' ','_','-','.',',',':','@'].reduce(
(segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []),
['E-mail Address: user@domain.com, Phone Number: +1-800-555-0011']
).filter(x => x)
Or as a function:
function msplit(str, seps) {
return seps.reduce((segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []
), [str]).filter(x => x);
}
This will output:
['E','mail','Address','user','domain','com','Phone','Number','+1','800','555','0011']
Without the filter at the end you would get empty strings in the array where two different separators are next to each other.
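To see those empty strings, here is a sketch with a variant of the function that skips the final filter. msplitRaw is a hypothetical name used only for this comparison.

```javascript
// Hypothetical variant of msplit without the trailing filter.
function msplitRaw(str, seps) {
  return seps.reduce(
    (segs, sep) => segs.reduce((out, seg) => out.concat(seg.split(sep)), []),
    [str]
  );
}

const raw = msplitRaw("a, b", [",", " "]);
const cleaned = raw.filter(x => x);

console.log(raw);     // ["a", "", "b"]  (the comma and the space are adjacent)
console.log(cleaned); // ["a", "b"]
```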
Not the best way, but it works for splitting with multiple, different separators/delimiters:
html
<button onclick="myFunction()">Split with Multiple and Different seperators/delimiters</button>
<p id="demo"></p>
javascript
<script>
function myFunction() {
var str = "How : are | you doing : today?";
var res = str.split(' | ');
var str2 = '';
var i;
for (i = 0; i < res.length; i++) {
str2 += res[i];
if (i != res.length-1) {
str2 += " : "; // rejoin with the second separator so the final split breaks at both delimiters
}
}
var res2 = str2.split(' : ');
//you can add countless options (with or without space)
document.getElementById("demo").innerHTML = res2;
}
</script>
Starting from @stephen-sweriduk's solution (the most interesting one to me!), I have slightly modified it to make it more generic and reusable:
/**
* Adapted from: http://stackoverflow.com/questions/650022/how-do-i-split-a-string-with-multiple-separators-in-javascript
*/
var StringUtils = {
/**
* Flatten a list of strings
* http://rosettacode.org/wiki/Flatten_a_list
*/
flatten : function(arr) {
var self=this;
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? self.flatten(val) : val);
},[]);
},
/**
* Recursively Traverse a list and apply a function to each item
* @param list array
* @param expression Expression to use in func
* @param func function of (item,expression) to apply expression to item
*
*/
traverseListFunc : function(list, expression, index, func) {
var self=this;
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != func(list[index], expression)) ? list[index] = func(list[index], expression) : null;
(list[index].constructor === Array) ? self.traverseListFunc(list[index], expression, 0, func) : null;
(list.constructor === Array) ? self.traverseListFunc(list, expression, index+1, func) : null;
}
},
/**
* Recursively map function to string
* @param string
* @param expression Expression to apply to func
* @param function of (item, expressions[i])
*/
mapFuncToString : function(string, expressions, func) {
var self=this;
var list = [string];
for(var i=0, len=expressions.length; i<len; i++) {
self.traverseListFunc(list, expressions[i], 0, func);
}
return self.flatten(list);
},
/**
* Split a string
* @param splitters Array of characters to apply the split
*/
splitString : function(string, splitters) {
return this.mapFuncToString(string, splitters, function(item, expression) {
return item.split(expression);
})
},
}
and then
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
var splittedString=StringUtils.splitString(stringToSplit, splitList);
console.log(splitList, stringToSplit, splittedString);
which gives the same result as the original:
[ ' ', '_', '/' ] 'people and_other/things' [ 'people', 'and', 'other', 'things' ]
An easy way to do this is to process each character of the string with each delimiter and build an array of the splits:
splix = function ()
{
u = [].slice.call(arguments); v = u.slice(1); u = u[0]; w = [u]; x = 0;
for (i = 0; i < u.length; ++i)
{
for (j = 0; j < v.length; ++j)
{
if (u.slice(i, i + v[j].length) == v[j])
{
y = w[x].split(v[j]); w[x] = y[0]; w[++x] = y.slice(1).join(v[j]); // keep the whole remainder, so a repeated delimiter does not drop text
};
};
};
return w;
};
console.logg = function ()
{
document.body.innerHTML += "<br>" + [].slice.call(arguments).join();
}
splix = function() {
u = [].slice.call(arguments);
v = u.slice(1);
u = u[0];
w = [u];
x = 0;
console.logg("Processing: <code>" + JSON.stringify(w) + "</code>");
for (i = 0; i < u.length; ++i) {
for (j = 0; j < v.length; ++j) {
console.logg("Processing: <code>[\x22" + u.slice(i, i + v[j].length) + "\x22, \x22" + v[j] + "\x22]</code>");
if (u.slice(i, i + v[j].length) == v[j]) {
y = w[x].split(v[j]);
w[x] = y[0];
w[++x] = y.slice(1).join(v[j]); // keep the whole remainder, so a repeated delimiter does not drop text
console.logg("Currently processed: " + JSON.stringify(w) + "\n");
};
};
};
console.logg("Return: <code>" + JSON.stringify(w) + "</code>");
};
setTimeout(function() {
console.clear();
splix("1.23--4", ".", "--");
}, 250);
@import url("http://fonts.googleapis.com/css?family=Roboto");
body {font: 20px Roboto;}
Usage: splix(string, delimiters...)
Example: splix("1.23--4", ".", "--")
Returns: ["1", "23", "4"]
Check out my simple library on Github
If you really do not want to visit or interact with the repo, here is the working code:
/**
*
* @param {type} input The string input to be split
* @param {type} includeTokensInOutput If true, the tokens are retained in the split output.
* @param {type} tokens The tokens to be employed in splitting the original string.
* @returns {Scanner}
*/
function Scanner(input, includeTokensInOutput, tokens) {
this.input = input;
this.includeTokensInOutput = includeTokensInOutput;
this.tokens = tokens;
}
Scanner.prototype.scan = function () {
var inp = this.input;
var parse = [];
this.tokens.sort(function (a, b) {
return b.length - a.length; // Descending by length, so longer tokens are tried first
});
for (var i = 0; i < inp.length; i++) {
for (var j = 0; j < this.tokens.length; j++) {
var token = this.tokens[j];
var len = token.length;
if (len > 0 && i + len <= inp.length) {
var portion = inp.substring(i, i + len);
if (portion === token) {
if (i !== 0) {//avoid empty spaces
parse[parse.length] = inp.substring(0, i);
}
if (this.includeTokensInOutput) {
parse[parse.length] = token;
}
inp = inp.substring(i + len);
i = -1;
break;
}
}
}
}
if (inp.length > 0) {
parse[parse.length] = inp;
}
return parse;
};
The usage is very straightforward:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", false , new Array('+','-')).scan();
console.log(tokens);
Gives:
['ABC', 'DE', 'GHIJK', 'LMNOP']
And if you wish to include the splitting tokens (+ and -) in the output, set the false to true and voila! it still works.
The usage would now be:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", true , new Array('+','-')).scan();
and
console.log(tokens);
would give:
['ABC', '+', 'DE', '-', 'GHIJK', '+', 'LMNOP']
ENJOY!
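For single-character tokens like + and -, a similar effect can be sketched with the built-in split and a capturing group. This is a plain-regex illustration, not part of the Scanner code above.

```javascript
const expression = "ABC+DE-GHIJK+LMNOP";

// Character class without a capturing group: tokens are dropped.
const withoutTokens = expression.split(/[+-]/);

// Wrapping the class in a capturing group keeps each token in the output.
const withTokens = expression.split(/([+-])/);

console.log(withoutTokens); // ["ABC", "DE", "GHIJK", "LMNOP"]
console.log(withTokens);    // ["ABC", "+", "DE", "-", "GHIJK", "+", "LMNOP"]
```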
I use regexp:
str = 'Write a program that extracts from a given text all palindromes, e.g. "ABBA", "lamal", "exe".';
var strNew = str.match(/\w+/g);
// Output: ["Write", "a", "program", "that", "extracts", "from", "a", "given", "text", "all", "palindromes", "e", "g", "ABBA", "lamal", "exe"]
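One thing to watch with match: when nothing matches, it returns null rather than an empty array. A small sketch of a guard, with a made-up input:

```javascript
const noWordInput = "!!! ???";

// match() with the g flag returns null when there is no match at all.
const unguarded = noWordInput.match(/\w+/g);

// Falling back to an empty array keeps later code (length, loops) safe.
const guarded = noWordInput.match(/\w+/g) || [];

console.log(unguarded);      // null
console.log(guarded.length); // 0
```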
Good evening. I recently became interested in JavaScript and have been taking some online courses, where I ran into the following task. Using a "for" loop, I am trying to find the first repeated letter of a string, with the help of the ".toUpperCase()" function. This worked at first, until I added more characters to the string (in this case "x"), at which point the output became "undefined" instead of "the most repeated word is: X". The string should be handled so that all letters count regardless of whether they are lowercase or capital. Is there another way to approach this task, so that I can move forward? (Sorry for my bad English.)
I am doing this task in JavaScript with the Atom editor.
var word = "SQSQSQSSaaaassssxxxY";
var contendor = [];
var calc = [];
var mycalc = 0;
function repeat() {
for (var i = 0; i < word.length; i++) {
if (contendor.includes(word[i])) {} else {
contendor.push(word[i])
calc.push(0)
}
}
for (var p = 0; p < word.length; p++) {
for (var l = 0; l < contendor.length; l++) {
if (word[p].toUpperCase() == word[l]) {
calc[l] = calc[l] + 1
}
}
}
for (var f = 0; f < calc.length; f++) {
if (calc[f] > mycalc) {
mycalc = calc[f]
}
}
}
repeat()
console.log("The first letter repeated its: " + contendor[mycalc])
I expected the output of the String to be: "X"
but the actual output is: "undefined"
The first error in your script is that you store the wrong value in mycalc:
mycalc = calc[f]
Since you want mycalc to be an index, the above should have been
mycalc = f
Now, you will get a result, but your code is actually going through a lot of effort to find the uppercase character that is repeated most often, not first.
Your comparison should have used toUpperCase on both sides of the comparison, otherwise lower case letters will never match.
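Putting those two fixes together while staying close to the array-based approach from the question, a sketch could look like this. mostRepeatedLetter is a hypothetical name, and the function takes the word as a parameter instead of using globals.

```javascript
// Sketch only: counts letters case-insensitively and returns the most frequent one.
function mostRepeatedLetter(word) {
  var letters = []; // distinct letters, upper-cased
  var counts = [];  // counts[i] is how often letters[i] occurs
  for (var i = 0; i < word.length; i++) {
    var ch = word[i].toUpperCase(); // normalise case before comparing
    var pos = letters.indexOf(ch);
    if (pos === -1) {
      letters.push(ch);
      counts.push(1);
    } else {
      counts[pos] = counts[pos] + 1;
    }
  }
  var best = 0; // index of the highest count seen so far
  for (var f = 1; f < counts.length; f++) {
    if (counts[f] > counts[best]) {
      best = f; // store the index, not the count
    }
  }
  return letters[best];
}

console.log(mostRepeatedLetter("MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY")); // "X"
```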
To get the character that was repeated most often, you could use a Map (to keep track of the counts like you did in calc):
function mostRepeated(str) {
const map = new Map;
let result;
let maxCount = 0;
for (let ch of str) {
ch = ch.toUpperCase();
let count = (map.get(ch) || 0) + 1;
map.set(ch, count);
if (count > maxCount) {
maxCount = count;
result = ch;
}
}
return result;
}
var word = "MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY";
console.log(mostRepeated(word));
Note that it is better to use function parameters and local variables. Declaring your variables as global is not considered best practice.
You could find the letter that occurs the most number of times in a string by:
first creating a map that relates each unique letter, to the number of times it occurs in the string
converting that map to an array of "key/value" entries, and then sorting those entries by the "count value"
returning the "letter key" that has the largest count
One way to express this in JavaScript would be via the following:
function findMaxLetter(word) {
/* Create a map that relates each letter to the number of times it occurs */
const letterCounts = Array.from(word).reduce((map, letter) => {
return { ...map, [letter] : (map[letter] === undefined ? 1 : map[letter] + 1) } // the first occurrence counts as 1
}, {})
/* Sort letters by the number of times they occur, as determined in the letterCounts map */
const letters = Object.entries(letterCounts).sort(([letter0, count0], [letter1, count1]) => {
return count1 - count0
})
.map(([letter]) => letter)
/* Return the letter that occurred the most times */
return letters[0]
}
console.log("The first letter repeated its: " + findMaxLetter("MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY"))
I think this solution is the most detailed for you:
function func( word ){
word = word.toLowerCase();
var i, charCountCache = {};
//store all char counts into an object
for( i = 0; i < word.length; i++){
if( charCountCache[ word[ i ] ] )
charCountCache[ word[ i ] ] = charCountCache[ word[ i ] ] + 1;
else
charCountCache[ word[ i ] ] = 1;
}
//find the max value of char count in cached object
var fieldNames = Object.keys( charCountCache )
, fieldValues = Object.values( charCountCache )
, mostReapeatChar = '', mostReapeatCharCount = 0;
for( i = 0; i < fieldNames.length; i++ ){
if( mostReapeatCharCount < fieldValues[i] ){
mostReapeatCharCount = fieldValues[i];
mostReapeatChar = fieldNames[i];
}
}
console.log('most repeating char: ', mostReapeatChar, ' no of times: ', mostReapeatCharCount )
}
console.log("The first letter repeated its: " + contendor[mycalc])
You tried to print the 14th index of contendor which has only 9 values, that is why your log result was undefined.
You probably wanted to print word[mycalc].
Also if you intended to count x as X, you should have added toUpperCase() to every letter you process/go-through.
This is only a note on the issues in your code; there are better/faster/cleaner solutions to reach the result, which I am sure other answers will provide.
My advice would be to create a hashmap such as
letter => [indexLetter1, indexLetter2].
From that hashmap, you could easily find your first repeated letters.
For that string MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY, hashmap will look like
[
M => [0,5,..],
B => [1, ..],
X => [2, ..],
...
]
now you can find every letter with multiple values in its array, then in those arrays take the one with the lowest value.
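The hashmap idea above can be sketched as follows. firstRepeatedLetter is a hypothetical name, and the sketch assumes letters should be compared case-insensitively, as in the question.

```javascript
// Sketch only: builds letter => [indexes] and picks the repeated letter
// whose first occurrence comes earliest.
function firstRepeatedLetter(word) {
  const positions = new Map(); // letter => [index, index, ...]
  Array.from(word.toUpperCase()).forEach((letter, index) => {
    if (!positions.has(letter)) {
      positions.set(letter, []);
    }
    positions.get(letter).push(index);
  });

  let best = null;
  let bestIndex = Infinity;
  for (const [letter, indexes] of positions) {
    // Only letters with more than one index are repeated.
    if (indexes.length > 1 && indexes[0] < bestIndex) {
      best = letter;
      bestIndex = indexes[0];
    }
  }
  return best; // null when no letter repeats
}

console.log(firstRepeatedLetter("MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY")); // "M"
console.log(firstRepeatedLetter("abc"));                                // null
```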
If you want to get the index of the most repeated letter, you can use Array.from to convert the word into an array, passing a map function to make all letters uppercase.
Get the count of each letter by using reduce and Object.entries.
Use indexOf to get the index of the letter in the array. Please note that indexOf counts from 0.
var word = "MBXAYMZAXmZYxxxxxxxxxxmBxAYMZaXmZY";
var letters = Array.from(word, o => o.toUpperCase());
var [highestLetter, highestCount]= Object.entries(letters.reduce((c, v) => (c[v] = (c[v] || 0) + 1, c), {})).reduce((c, v) => c[1] > v[1] ? c : v);
var index = letters.indexOf(highestLetter);
console.log("Most repeated letter:", highestLetter);
console.log("Count:", highestCount);
console.log("First Index:", index);