Related
How do I split a string with multiple separators in JavaScript?
I'm trying to split on both commas and spaces, but AFAIK JavaScript's split() function only supports one separator.
Pass in a regexp as the parameter:
js> "Hello awesome, world!".split(/[\s,]+/)
Hello,awesome,world!
Edited to add:
You can get the last element by selecting the length of the array minus 1:
>>> bits = "Hello awesome, world!".split(/[\s,]+/)
["Hello", "awesome", "world!"]
>>> bit = bits[bits.length - 1]
"world!"
... and if the pattern doesn't match:
>>> bits = "Hello awesome, world!".split(/foo/)
["Hello awesome, world!"]
>>> bits[bits.length - 1]
"Hello awesome, world!"
You can pass a regex into JavaScript's split() method. For example:
"1,2 3".split(/,| /)
["1", "2", "3"]
Or, if you want to allow multiple separators together to act as one only:
"1, 2, , 3".split(/(?:,| )+/)
["1", "2", "3"]
You have to use the non-capturing (?:) parenthesis, because
otherwise it gets spliced back into the result. Or you can be smart
like Aaron and use a character class.
Examples tested in Safari and Firefox.
Another simple but effective method is to use split + join repeatedly.
"a=b,c:d".split('=').join(',').split(':').join(',').split(',')
Essentially doing a split followed by a join is like a global replace so this replaces each separator with a comma then once all are replaced it does a final split on comma
The result of the above expression is:
['a', 'b', 'c', 'd']
Expanding on this you could also place it in a function:
function splitMulti(str, tokens){
var tempChar = tokens[0]; // We can use the first token as a temporary join character
for(var i = 1; i < tokens.length; i++){
str = str.split(tokens[i]).join(tempChar);
}
str = str.split(tempChar);
return str;
}
Usage:
splitMulti('a=b,c:d', ['=', ',', ':']) // ["a", "b", "c", "d"]
If you use this functionality a lot it might even be worth considering wrapping String.prototype.split for convenience (I think my function is fairly safe - the only consideration is the additional overhead of the conditionals (minor) and the fact that it lacks an implementation of the limit argument if an array is passed).
Be sure to include the splitMulti function if using this approach to the below simply wraps it :). Also worth noting that some people frown on extending built-ins (as many people do it wrong and conflicts can occur) so if in doubt speak to someone more senior before using this or ask on SO :)
var splitOrig = String.prototype.split; // Maintain a reference to inbuilt fn
String.prototype.split = function (){
if(arguments[0].length > 0){
if(Object.prototype.toString.call(arguments[0]) == "[object Array]" ) { // Check if our separator is an array
return splitMulti(this, arguments[0]); // Call splitMulti
}
}
return splitOrig.apply(this, arguments); // Call original split maintaining context
};
Usage:
var a = "a=b,c:d";
a.split(['=', ',', ':']); // ["a", "b", "c", "d"]
// Test to check that the built-in split still works (although our wrapper wouldn't work if it didn't as it depends on it :P)
a.split('='); // ["a", "b,c:d"]
Enjoy!
Lets keep it simple: (add a "[ ]+" to your RegEx means "1 or more")
This means "+" and "{1,}" are the same.
var words = text.split(/[ .:;?!~,`"&|()<>{}\[\]\r\n/\\]+/); // note ' and - are kept
Tricky method:
var s = "dasdnk asd, (naks) :d skldma";
var a = s.replace('(',' ').replace(')',' ').replace(',',' ').split(' ');
console.log(a);//["dasdnk", "asd", "naks", ":d", "skldma"]
I'm suprised no one has suggested it yet, but my hack-ey (and crazy fast) solution was to just append several 'replace' calls before splitting by the same character.
i.e. to remove a, b, c, d, and e:
let str = 'afgbfgcfgdfgefg'
let array = str.replace('a','d').replace('b','d').replace('c','d').replace('e','d').split('d')
this can be conveniently generalized for an array of splitters as follows:
function splitByMany( manyArgs, string ) {
do {
let arg = manyArgs.pop()
string = string.replace(arg, manyArgs[0])
} while (manyArgs.length > 2)
return string.split(manyArgs[0])
}
So, in your case, you could then call
let array = splitByMany([" ", ","], 'My long string containing commas, and spaces, and more commas');
For those of you who want more customization in their splitting function, I wrote a recursive algorithm that splits a given string with a list of characters to split on. I wrote this before I saw the above post. I hope it helps some frustrated programmers.
splitString = function(string, splitters) {
var list = [string];
for(var i=0, len=splitters.length; i<len; i++) {
traverseList(list, splitters[i], 0);
}
return flatten(list);
}
traverseList = function(list, splitter, index) {
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != list[index].split(splitter)) ? list[index] = list[index].split(splitter) : null;
(list[index].constructor === Array) ? traverseList(list[index], splitter, 0) : null;
(list.constructor === Array) ? traverseList(list, splitter, index+1) : null;
}
}
flatten = function(arr) {
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? flatten(val) : val);
},[]);
}
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
splitString(stringToSplit, splitList);
Example above returns: ["people", "and", "other", "things"]
Note: flatten function was taken from Rosetta Code
You could just lump all the characters you want to use as separators either singularly or collectively into a regular expression and pass them to the split function. For instance you could write:
console.log( "dasdnk asd, (naks) :d skldma".split(/[ \(,\)]+/) );
And the output will be:
["dasdnk", "asd", "naks", ":d", "skldma"]
Here are some cases that may help by using Regex:
\W to match any character else word character [a-zA-Z0-9_]. Example:
("Hello World,I-am code").split(/\W+/); // would return [ 'Hello', 'World', 'I', 'am', 'code' ]
\s+ to match One or more spaces
\d to match a digit
if you want to split by some characters only let us say , and - you can use str.split(/[,-]+/)...etc
My refactor of #Brian answer
var string = 'and this is some kind of information and another text and simple and some egample or red or text';
var separators = ['and', 'or'];
function splitMulti(str, separators){
var tempChar = 't3mp'; //prevent short text separator in split down
//split by regex e.g. \b(or|and)\b
var re = new RegExp('\\b(' + separators.join('|') + ')\\b' , "g");
str = str.replace(re, tempChar).split(tempChar);
// trim & remove empty
return str.map(el => el.trim()).filter(el => el.length > 0);
}
console.log(splitMulti(string, separators))
Here is a new way to achieving same in ES6:
function SplitByString(source, splitBy) {
var splitter = splitBy.split('');
splitter.push([source]); //Push initial value
return splitter.reduceRight(function(accumulator, curValue) {
var k = [];
accumulator.forEach(v => k = [...k, ...v.split(curValue)]);
return k;
});
}
var source = "abc,def#hijk*lmn,opq#rst*uvw,xyz";
var splitBy = ",*#";
console.log(SplitByString(source, splitBy));
Please note in this function:
No Regex involved
Returns splitted value in same order as it appears in source
Result of above code would be:
Hi for example if you have split and replace in String 07:05:45PM
var hour = time.replace("PM", "").split(":");
Result
[ '07', '05', '45' ]
I will provide a classic implementation for a such function. The code works in almost all versions of JavaScript and is somehow optimum.
It doesn't uses regex, which is hard to maintain
It doesn't uses new features of JavaScript
It doesn't uses multiple .split() .join() invocation which require more computer memory
Just pure code:
var text = "Create a function, that will return an array (of string), with the words inside the text";
println(getWords(text));
function getWords(text)
{
let startWord = -1;
let ar = [];
for(let i = 0; i <= text.length; i++)
{
let c = i < text.length ? text[i] : " ";
if (!isSeparator(c) && startWord < 0)
{
startWord = i;
}
if (isSeparator(c) && startWord >= 0)
{
let word = text.substring(startWord, i);
ar.push(word);
startWord = -1;
}
}
return ar;
}
function isSeparator(c)
{
var separators = [" ", "\t", "\n", "\r", ",", ";", ".", "!", "?", "(", ")"];
return separators.includes(c);
}
You can see the code running in playground:
https://codeguppy.com/code.html?IJI0E4OGnkyTZnoszAzf
Splitting URL by .com/ or .net/
url.split(/\.com\/|\.net\//)
a = "a=b,c:d"
array = ['=',',',':'];
for(i=0; i< array.length; i++){ a= a.split(array[i]).join(); }
this will return the string without a special charecter.
I ran into this question wile looking for a replacement for the C# string.Split() function which splits a string using the characters in its argument.
In JavaScript you can do the same using map an reduce to iterate over the splitting characters and the intermediate values:
let splitters = [",", ":", ";"]; // or ",:;".split("");
let start= "a,b;c:d";
let values = splitters.reduce((old, c) => old.map(v => v.split(c)).flat(), [start]);
// values is ["a", "b", "c", "d"]
flat() is used to flatten the intermediate results so each iteration works on a list of strings without nested arrays. Each iteration applies split to all of the values in old and then returns the list of intermediate results to be split by the next value in splitters. reduce() is initialized with an array containing the initial string value.
I find that one of the main reasons I need this is to split file paths on both / and \. It's a bit of a tricky regex so I'll post it here for reference:
var splitFilePath = filePath.split(/[\/\\]/);
I think it's easier if you specify what you wanna leave, instead of what you wanna remove.
As if you wanna have only English words, you can use something like this:
text.match(/[a-z'\-]+/gi);
Examples (run snippet):
var R=[/[a-z'\-]+/gi,/[a-z'\-\s]+/gi];
var s=document.getElementById('s');
for(var i=0;i<R.length;i++)
{
var o=document.createElement('option');
o.innerText=R[i]+'';
o.value=i;
s.appendChild(o);
}
var t=document.getElementById('t');
var r=document.getElementById('r');
s.onchange=function()
{
r.innerHTML='';
var x=s.value;
if((x>=0)&&(x<R.length))
x=t.value.match(R[x]);
for(i=0;i<x.length;i++)
{
var li=document.createElement('li');
li.innerText=x[i];
r.appendChild(li);
}
}
<textarea id="t" style="width:70%;height:12em">even, test; spider-man
But saying o'er what I have said before:
My child is yet a stranger in the world;
She hath not seen the change of fourteen years,
Let two more summers wither in their pride,
Ere we may think her ripe to be a bride.
—Shakespeare, William. The Tragedy of Romeo and Juliet</textarea>
<p><select id="s">
<option selected>Select a regular expression</option>
<!-- option value="1">/[a-z'\-]+/gi</option>
<option value="2">/[a-z'\-\s]+/gi</option -->
</select></p>
<ol id="r" style="display:block;width:auto;border:1px inner;overflow:scroll;height:8em;max-height:10em;"></ol>
</div>
I don't know the performance of RegEx, but here is another alternative for RegEx leverages native HashSet and works in O( max(str.length, delimeter.length) ) complexity instead:
var multiSplit = function(str,delimiter){
if (!(delimiter instanceof Array))
return str.split(delimiter);
if (!delimiter || delimiter.length == 0)
return [str];
var hashSet = new Set(delimiter);
if (hashSet.has(""))
return str.split("");
var lastIndex = 0;
var result = [];
for(var i = 0;i<str.length;i++){
if (hashSet.has(str[i])){
result.push(str.substring(lastIndex,i));
lastIndex = i+1;
}
}
result.push(str.substring(lastIndex));
return result;
}
multiSplit('1,2,3.4.5.6 7 8 9',[',','.',' ']);
// Output: ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
multiSplit('1,2,3.4.5.6 7 8 9',' ');
// Output: ["1,2,3.4.5.6", "7", "8", "9"]
I solved this with reduce and filter. It might not be the most readable solution, or the fastest, and in real life I would probably use Aarons answere here, but it was fun to write.
[' ','_','-','.',',',':','#'].reduce(
(segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []),
['E-mail Address: user#domain.com, Phone Number: +1-800-555-0011']
).filter(x => x)
Or as a function:
function msplit(str, seps) {
return seps.reduce((segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []
), [str]).filter(x => x);
}
This will output:
['E','mail','Address','user','domain','com','0','Phone','Number','+1','800','555','0011']
Without the filter at the end you would get empty strings in the array where two different separators are next to each other.
Not the best way but works to Split with Multiple and Different seperators/delimiters
html
<button onclick="myFunction()">Split with Multiple and Different seperators/delimiters</button>
<p id="demo"></p>
javascript
<script>
function myFunction() {
var str = "How : are | you doing : today?";
var res = str.split(' | ');
var str2 = '';
var i;
for (i = 0; i < res.length; i++) {
str2 += res[i];
if (i != res.length-1) {
str2 += ",";
}
}
var res2 = str2.split(' : ');
//you can add countless options (with or without space)
document.getElementById("demo").innerHTML = res2;
}
</script>
Starting from #stephen-sweriduk solution (that was the more interesting to me!), I have slightly modified it to make more generic and reusable:
/**
* Adapted from: http://stackoverflow.com/questions/650022/how-do-i-split-a-string-with-multiple-separators-in-javascript
*/
var StringUtils = {
/**
* Flatten a list of strings
* http://rosettacode.org/wiki/Flatten_a_list
*/
flatten : function(arr) {
var self=this;
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? self.flatten(val) : val);
},[]);
},
/**
* Recursively Traverse a list and apply a function to each item
* #param list array
* #param expression Expression to use in func
* #param func function of (item,expression) to apply expression to item
*
*/
traverseListFunc : function(list, expression, index, func) {
var self=this;
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != func(list[index], expression)) ? list[index] = func(list[index], expression) : null;
(list[index].constructor === Array) ? self.traverseListFunc(list[index], expression, 0, func) : null;
(list.constructor === Array) ? self.traverseListFunc(list, expression, index+1, func) : null;
}
},
/**
* Recursively map function to string
* #param string
* #param expression Expression to apply to func
* #param function of (item, expressions[i])
*/
mapFuncToString : function(string, expressions, func) {
var self=this;
var list = [string];
for(var i=0, len=expressions.length; i<len; i++) {
self.traverseListFunc(list, expressions[i], 0, func);
}
return self.flatten(list);
},
/**
* Split a string
* #param splitters Array of characters to apply the split
*/
splitString : function(string, splitters) {
return this.mapFuncToString(string, splitters, function(item, expression) {
return item.split(expression);
})
},
}
and then
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
var splittedString=StringUtils.splitString(stringToSplit, splitList);
console.log(splitList, stringToSplit, splittedString);
that gives back as the original:
[ ' ', '_', '/' ] 'people and_other/things' [ 'people', 'and', 'other', 'things' ]
An easy way to do this is to process each character of the string with each delimiter and build an array of the splits:
splix = function ()
{
u = [].slice.call(arguments); v = u.slice(1); u = u[0]; w = [u]; x = 0;
for (i = 0; i < u.length; ++i)
{
for (j = 0; j < v.length; ++j)
{
if (u.slice(i, i + v[j].length) == v[j])
{
y = w[x].split(v[j]); w[x] = y[0]; w[++x] = y[1];
};
};
};
return w;
};
console.logg = function ()
{
document.body.innerHTML += "<br>" + [].slice.call(arguments).join();
}
splix = function() {
u = [].slice.call(arguments);
v = u.slice(1);
u = u[0];
w = [u];
x = 0;
console.logg("Processing: <code>" + JSON.stringify(w) + "</code>");
for (i = 0; i < u.length; ++i) {
for (j = 0; j < v.length; ++j) {
console.logg("Processing: <code>[\x22" + u.slice(i, i + v[j].length) + "\x22, \x22" + v[j] + "\x22]</code>");
if (u.slice(i, i + v[j].length) == v[j]) {
y = w[x].split(v[j]);
w[x] = y[0];
w[++x] = y[1];
console.logg("Currently processed: " + JSON.stringify(w) + "\n");
};
};
};
console.logg("Return: <code>" + JSON.stringify(w) + "</code>");
};
setTimeout(function() {
console.clear();
splix("1.23--4", ".", "--");
}, 250);
#import url("http://fonts.googleapis.com/css?family=Roboto");
body {font: 20px Roboto;}
Usage: splix(string, delimiters...)
Example: splix("1.23--4", ".", "--")
Returns: ["1", "23", "4"]
Check out my simple library on Github
If you really do not want to visit or interact with the repo, here is the working code:
/**
*
* #param {type} input The string input to be split
* #param {type} includeTokensInOutput If true, the tokens are retained in the splitted output.
* #param {type} tokens The tokens to be employed in splitting the original string.
* #returns {Scanner}
*/
function Scanner(input, includeTokensInOutput, tokens) {
this.input = input;
this.includeTokensInOutput = includeTokensInOutput;
this.tokens = tokens;
}
Scanner.prototype.scan = function () {
var inp = this.input;
var parse = [];
this.tokens.sort(function (a, b) {
return b.length - a.length; //ASC, For Descending order use: b - a
});
for (var i = 0; i < inp.length; i++) {
for (var j = 0; j < this.tokens.length; j++) {
var token = this.tokens[j];
var len = token.length;
if (len > 0 && i + len <= inp.length) {
var portion = inp.substring(i, i + len);
if (portion === token) {
if (i !== 0) {//avoid empty spaces
parse[parse.length] = inp.substring(0, i);
}
if (this.includeTokensInOutput) {
parse[parse.length] = token;
}
inp = inp.substring(i + len);
i = -1;
break;
}
}
}
}
if (inp.length > 0) {
parse[parse.length] = inp;
}
return parse;
};
The usage is very straightforward:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", false , new Array('+','-')).scan();
console.log(tokens);
Gives:
['ABC', 'DE', 'GHIJK', 'LMNOP']
And if you wish to include the splitting tokens (+ and -) in the output, set the false to true and voila! it still works.
The usage would now be:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", true , new Array('+','-')).scan();
and
console.log(tokens);
would give:
['ABC', '+', 'DE', '-', 'GHIJK', '+', 'LMNOP']
ENJOY!
I use regexp:
str = 'Write a program that extracts from a given text all palindromes, e.g. "ABBA", "lamal", "exe".';
var strNew = str.match(/\w+/g);
// Output: ["Write", "a", "program", "that", "extracts", "from", "a", "given", "text", "all", "palindromes", "e", "g", "ABBA", "lamal", "exe"]
How do I split a string with multiple separators in JavaScript?
I'm trying to split on both commas and spaces, but AFAIK JavaScript's split() function only supports one separator.
Pass in a regexp as the parameter:
js> "Hello awesome, world!".split(/[\s,]+/)
Hello,awesome,world!
Edited to add:
You can get the last element by selecting the length of the array minus 1:
>>> bits = "Hello awesome, world!".split(/[\s,]+/)
["Hello", "awesome", "world!"]
>>> bit = bits[bits.length - 1]
"world!"
... and if the pattern doesn't match:
>>> bits = "Hello awesome, world!".split(/foo/)
["Hello awesome, world!"]
>>> bits[bits.length - 1]
"Hello awesome, world!"
You can pass a regex into JavaScript's split() method. For example:
"1,2 3".split(/,| /)
["1", "2", "3"]
Or, if you want to allow multiple separators together to act as one only:
"1, 2, , 3".split(/(?:,| )+/)
["1", "2", "3"]
You have to use the non-capturing (?:) parenthesis, because
otherwise it gets spliced back into the result. Or you can be smart
like Aaron and use a character class.
Examples tested in Safari and Firefox.
Another simple but effective method is to use split + join repeatedly.
"a=b,c:d".split('=').join(',').split(':').join(',').split(',')
Essentially doing a split followed by a join is like a global replace so this replaces each separator with a comma then once all are replaced it does a final split on comma
The result of the above expression is:
['a', 'b', 'c', 'd']
Expanding on this you could also place it in a function:
function splitMulti(str, tokens){
var tempChar = tokens[0]; // We can use the first token as a temporary join character
for(var i = 1; i < tokens.length; i++){
str = str.split(tokens[i]).join(tempChar);
}
str = str.split(tempChar);
return str;
}
Usage:
splitMulti('a=b,c:d', ['=', ',', ':']) // ["a", "b", "c", "d"]
If you use this functionality a lot it might even be worth considering wrapping String.prototype.split for convenience (I think my function is fairly safe - the only consideration is the additional overhead of the conditionals (minor) and the fact that it lacks an implementation of the limit argument if an array is passed).
Be sure to include the splitMulti function if using this approach to the below simply wraps it :). Also worth noting that some people frown on extending built-ins (as many people do it wrong and conflicts can occur) so if in doubt speak to someone more senior before using this or ask on SO :)
var splitOrig = String.prototype.split; // Maintain a reference to inbuilt fn
String.prototype.split = function (){
if(arguments[0].length > 0){
if(Object.prototype.toString.call(arguments[0]) == "[object Array]" ) { // Check if our separator is an array
return splitMulti(this, arguments[0]); // Call splitMulti
}
}
return splitOrig.apply(this, arguments); // Call original split maintaining context
};
Usage:
var a = "a=b,c:d";
a.split(['=', ',', ':']); // ["a", "b", "c", "d"]
// Test to check that the built-in split still works (although our wrapper wouldn't work if it didn't as it depends on it :P)
a.split('='); // ["a", "b,c:d"]
Enjoy!
Lets keep it simple: (add a "[ ]+" to your RegEx means "1 or more")
This means "+" and "{1,}" are the same.
var words = text.split(/[ .:;?!~,`"&|()<>{}\[\]\r\n/\\]+/); // note ' and - are kept
Tricky method:
var s = "dasdnk asd, (naks) :d skldma";
var a = s.replace('(',' ').replace(')',' ').replace(',',' ').split(' ');
console.log(a);//["dasdnk", "asd", "naks", ":d", "skldma"]
I'm suprised no one has suggested it yet, but my hack-ey (and crazy fast) solution was to just append several 'replace' calls before splitting by the same character.
i.e. to remove a, b, c, d, and e:
let str = 'afgbfgcfgdfgefg'
let array = str.replace('a','d').replace('b','d').replace('c','d').replace('e','d').split('d')
this can be conveniently generalized for an array of splitters as follows:
function splitByMany( manyArgs, string ) {
do {
let arg = manyArgs.pop()
string = string.replace(arg, manyArgs[0])
} while (manyArgs.length > 2)
return string.split(manyArgs[0])
}
So, in your case, you could then call
let array = splitByMany([" ", ","], 'My long string containing commas, and spaces, and more commas');
For those of you who want more customization in their splitting function, I wrote a recursive algorithm that splits a given string with a list of characters to split on. I wrote this before I saw the above post. I hope it helps some frustrated programmers.
splitString = function(string, splitters) {
var list = [string];
for(var i=0, len=splitters.length; i<len; i++) {
traverseList(list, splitters[i], 0);
}
return flatten(list);
}
traverseList = function(list, splitter, index) {
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != list[index].split(splitter)) ? list[index] = list[index].split(splitter) : null;
(list[index].constructor === Array) ? traverseList(list[index], splitter, 0) : null;
(list.constructor === Array) ? traverseList(list, splitter, index+1) : null;
}
}
flatten = function(arr) {
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? flatten(val) : val);
},[]);
}
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
splitString(stringToSplit, splitList);
Example above returns: ["people", "and", "other", "things"]
Note: flatten function was taken from Rosetta Code
You could just lump all the characters you want to use as separators either singularly or collectively into a regular expression and pass them to the split function. For instance you could write:
console.log( "dasdnk asd, (naks) :d skldma".split(/[ \(,\)]+/) );
And the output will be:
["dasdnk", "asd", "naks", ":d", "skldma"]
Here are some cases that may help by using Regex:
\W to match any character else word character [a-zA-Z0-9_]. Example:
("Hello World,I-am code").split(/\W+/); // would return [ 'Hello', 'World', 'I', 'am', 'code' ]
\s+ to match One or more spaces
\d to match a digit
if you want to split by some characters only let us say , and - you can use str.split(/[,-]+/)...etc
My refactor of #Brian answer
var string = 'and this is some kind of information and another text and simple and some egample or red or text';
var separators = ['and', 'or'];
function splitMulti(str, separators){
var tempChar = 't3mp'; //prevent short text separator in split down
//split by regex e.g. \b(or|and)\b
var re = new RegExp('\\b(' + separators.join('|') + ')\\b' , "g");
str = str.replace(re, tempChar).split(tempChar);
// trim & remove empty
return str.map(el => el.trim()).filter(el => el.length > 0);
}
console.log(splitMulti(string, separators))
Here is a new way to achieving same in ES6:
function SplitByString(source, splitBy) {
var splitter = splitBy.split('');
splitter.push([source]); //Push initial value
return splitter.reduceRight(function(accumulator, curValue) {
var k = [];
accumulator.forEach(v => k = [...k, ...v.split(curValue)]);
return k;
});
}
var source = "abc,def#hijk*lmn,opq#rst*uvw,xyz";
var splitBy = ",*#";
console.log(SplitByString(source, splitBy));
Please note in this function:
No Regex involved
Returns splitted value in same order as it appears in source
Result of above code would be:
Hi for example if you have split and replace in String 07:05:45PM
var hour = time.replace("PM", "").split(":");
Result
[ '07', '05', '45' ]
I will provide a classic implementation for a such function. The code works in almost all versions of JavaScript and is somehow optimum.
It doesn't uses regex, which is hard to maintain
It doesn't uses new features of JavaScript
It doesn't uses multiple .split() .join() invocation which require more computer memory
Just pure code:
var text = "Create a function, that will return an array (of string), with the words inside the text";
println(getWords(text));
function getWords(text)
{
let startWord = -1;
let ar = [];
for(let i = 0; i <= text.length; i++)
{
let c = i < text.length ? text[i] : " ";
if (!isSeparator(c) && startWord < 0)
{
startWord = i;
}
if (isSeparator(c) && startWord >= 0)
{
let word = text.substring(startWord, i);
ar.push(word);
startWord = -1;
}
}
return ar;
}
function isSeparator(c)
{
var separators = [" ", "\t", "\n", "\r", ",", ";", ".", "!", "?", "(", ")"];
return separators.includes(c);
}
You can see the code running in playground:
https://codeguppy.com/code.html?IJI0E4OGnkyTZnoszAzf
Splitting URL by .com/ or .net/
url.split(/\.com\/|\.net\//)
a = "a=b,c:d"
array = ['=',',',':'];
for(i=0; i< array.length; i++){ a= a.split(array[i]).join(); }
this will return the string without a special charecter.
I ran into this question wile looking for a replacement for the C# string.Split() function which splits a string using the characters in its argument.
In JavaScript you can do the same using map an reduce to iterate over the splitting characters and the intermediate values:
let splitters = [",", ":", ";"]; // or ",:;".split("");
let start= "a,b;c:d";
let values = splitters.reduce((old, c) => old.map(v => v.split(c)).flat(), [start]);
// values is ["a", "b", "c", "d"]
flat() is used to flatten the intermediate results so each iteration works on a list of strings without nested arrays. Each iteration applies split to all of the values in old and then returns the list of intermediate results to be split by the next value in splitters. reduce() is initialized with an array containing the initial string value.
I find that one of the main reasons I need this is to split file paths on both / and \. It's a bit of a tricky regex so I'll post it here for reference:
var splitFilePath = filePath.split(/[\/\\]/);
I think it's easier if you specify what you wanna leave, instead of what you wanna remove.
As if you wanna have only English words, you can use something like this:
text.match(/[a-z'\-]+/gi);
Examples (run snippet):
var R=[/[a-z'\-]+/gi,/[a-z'\-\s]+/gi];
var s=document.getElementById('s');
for(var i=0;i<R.length;i++)
{
var o=document.createElement('option');
o.innerText=R[i]+'';
o.value=i;
s.appendChild(o);
}
var t=document.getElementById('t');
var r=document.getElementById('r');
s.onchange=function()
{
r.innerHTML='';
var x=s.value;
if((x>=0)&&(x<R.length))
x=t.value.match(R[x]);
for(i=0;i<x.length;i++)
{
var li=document.createElement('li');
li.innerText=x[i];
r.appendChild(li);
}
}
<textarea id="t" style="width:70%;height:12em">even, test; spider-man
But saying o'er what I have said before:
My child is yet a stranger in the world;
She hath not seen the change of fourteen years,
Let two more summers wither in their pride,
Ere we may think her ripe to be a bride.
—Shakespeare, William. The Tragedy of Romeo and Juliet</textarea>
<p><select id="s">
<option selected>Select a regular expression</option>
<!-- option value="1">/[a-z'\-]+/gi</option>
<option value="2">/[a-z'\-\s]+/gi</option -->
</select></p>
<ol id="r" style="display:block;width:auto;border:1px inner;overflow:scroll;height:8em;max-height:10em;"></ol>
</div>
I don't know the performance of RegEx, but here is another alternative for RegEx leverages native HashSet and works in O( max(str.length, delimeter.length) ) complexity instead:
var multiSplit = function(str,delimiter){
if (!(delimiter instanceof Array))
return str.split(delimiter);
if (!delimiter || delimiter.length == 0)
return [str];
var hashSet = new Set(delimiter);
if (hashSet.has(""))
return str.split("");
var lastIndex = 0;
var result = [];
for(var i = 0;i<str.length;i++){
if (hashSet.has(str[i])){
result.push(str.substring(lastIndex,i));
lastIndex = i+1;
}
}
result.push(str.substring(lastIndex));
return result;
}
multiSplit('1,2,3.4.5.6 7 8 9',[',','.',' ']);
// Output: ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
multiSplit('1,2,3.4.5.6 7 8 9',' ');
// Output: ["1,2,3.4.5.6", "7", "8", "9"]
I solved this with reduce and filter. It might not be the most readable solution, or the fastest, and in real life I would probably use Aarons answere here, but it was fun to write.
[' ','_','-','.',',',':','#'].reduce(
(segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []),
['E-mail Address: user#domain.com, Phone Number: +1-800-555-0011']
).filter(x => x)
Or as a function:
function msplit(str, seps) {
return seps.reduce((segs, sep) => segs.reduce(
(out, seg) => out.concat(seg.split(sep)), []
), [str]).filter(x => x);
}
This will output:
['E','mail','Address','user','domain','com','0','Phone','Number','+1','800','555','0011']
Without the filter at the end you would get empty strings in the array where two different separators are next to each other.
Not the best way but works to Split with Multiple and Different seperators/delimiters
html
<button onclick="myFunction()">Split with Multiple and Different seperators/delimiters</button>
<p id="demo"></p>
javascript
<script>
function myFunction() {
var str = "How : are | you doing : today?";
var res = str.split(' | ');
var str2 = '';
var i;
for (i = 0; i < res.length; i++) {
str2 += res[i];
if (i != res.length-1) {
str2 += ",";
}
}
var res2 = str2.split(' : ');
//you can add countless options (with or without space)
document.getElementById("demo").innerHTML = res2;
}
</script>
Starting from #stephen-sweriduk solution (that was the more interesting to me!), I have slightly modified it to make more generic and reusable:
/**
* Adapted from: http://stackoverflow.com/questions/650022/how-do-i-split-a-string-with-multiple-separators-in-javascript
*/
var StringUtils = {
/**
* Flatten a list of strings
* http://rosettacode.org/wiki/Flatten_a_list
*/
flatten : function(arr) {
var self=this;
return arr.reduce(function(acc, val) {
return acc.concat(val.constructor === Array ? self.flatten(val) : val);
},[]);
},
/**
* Recursively Traverse a list and apply a function to each item
* #param list array
* #param expression Expression to use in func
* #param func function of (item,expression) to apply expression to item
*
*/
traverseListFunc : function(list, expression, index, func) {
var self=this;
if(list[index]) {
if((list.constructor !== String) && (list[index].constructor === String))
(list[index] != func(list[index], expression)) ? list[index] = func(list[index], expression) : null;
(list[index].constructor === Array) ? self.traverseListFunc(list[index], expression, 0, func) : null;
(list.constructor === Array) ? self.traverseListFunc(list, expression, index+1, func) : null;
}
},
/**
* Recursively map function to string
* #param string
* #param expression Expression to apply to func
* #param function of (item, expressions[i])
*/
mapFuncToString : function(string, expressions, func) {
var self=this;
var list = [string];
for(var i=0, len=expressions.length; i<len; i++) {
self.traverseListFunc(list, expressions[i], 0, func);
}
return self.flatten(list);
},
/**
* Split a string
* #param splitters Array of characters to apply the split
*/
splitString : function(string, splitters) {
return this.mapFuncToString(string, splitters, function(item, expression) {
return item.split(expression);
})
},
}
and then
var stringToSplit = "people and_other/things";
var splitList = [" ", "_", "/"];
var splittedString=StringUtils.splitString(stringToSplit, splitList);
console.log(splitList, stringToSplit, splittedString);
that gives back as the original:
[ ' ', '_', '/' ] 'people and_other/things' [ 'people', 'and', 'other', 'things' ]
An easy way to do this is to process each character of the string with each delimiter and build an array of the splits:
splix = function ()
{
u = [].slice.call(arguments); v = u.slice(1); u = u[0]; w = [u]; x = 0;
for (i = 0; i < u.length; ++i)
{
for (j = 0; j < v.length; ++j)
{
if (u.slice(i, i + v[j].length) == v[j])
{
y = w[x].split(v[j]); w[x] = y[0]; w[++x] = y[1];
};
};
};
return w;
};
console.logg = function ()
{
document.body.innerHTML += "<br>" + [].slice.call(arguments).join();
}
splix = function() {
u = [].slice.call(arguments);
v = u.slice(1);
u = u[0];
w = [u];
x = 0;
console.logg("Processing: <code>" + JSON.stringify(w) + "</code>");
for (i = 0; i < u.length; ++i) {
for (j = 0; j < v.length; ++j) {
console.logg("Processing: <code>[\x22" + u.slice(i, i + v[j].length) + "\x22, \x22" + v[j] + "\x22]</code>");
if (u.slice(i, i + v[j].length) == v[j]) {
y = w[x].split(v[j]);
w[x] = y[0];
w[++x] = y[1];
console.logg("Currently processed: " + JSON.stringify(w) + "\n");
};
};
};
console.logg("Return: <code>" + JSON.stringify(w) + "</code>");
};
setTimeout(function() {
console.clear();
splix("1.23--4", ".", "--");
}, 250);
#import url("http://fonts.googleapis.com/css?family=Roboto");
body {font: 20px Roboto;}
Usage: splix(string, delimiters...)
Example: splix("1.23--4", ".", "--")
Returns: ["1", "23", "4"]
Check out my simple library on Github
If you really do not want to visit or interact with the repo, here is the working code:
/**
*
* #param {type} input The string input to be split
* #param {type} includeTokensInOutput If true, the tokens are retained in the splitted output.
* #param {type} tokens The tokens to be employed in splitting the original string.
* #returns {Scanner}
*/
function Scanner(input, includeTokensInOutput, tokens) {
this.input = input;
this.includeTokensInOutput = includeTokensInOutput;
this.tokens = tokens;
}
Scanner.prototype.scan = function () {
var inp = this.input;
var parse = [];
this.tokens.sort(function (a, b) {
return b.length - a.length; //ASC, For Descending order use: b - a
});
for (var i = 0; i < inp.length; i++) {
for (var j = 0; j < this.tokens.length; j++) {
var token = this.tokens[j];
var len = token.length;
if (len > 0 && i + len <= inp.length) {
var portion = inp.substring(i, i + len);
if (portion === token) {
if (i !== 0) {//avoid empty spaces
parse[parse.length] = inp.substring(0, i);
}
if (this.includeTokensInOutput) {
parse[parse.length] = token;
}
inp = inp.substring(i + len);
i = -1;
break;
}
}
}
}
if (inp.length > 0) {
parse[parse.length] = inp;
}
return parse;
};
The usage is very straightforward:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", false , new Array('+','-')).scan();
console.log(tokens);
Gives:
['ABC', 'DE', 'GHIJK', 'LMNOP']
And if you wish to include the splitting tokens (+ and -) in the output, set the false to true and voila! it still works.
The usage would now be:
var tokens = new Scanner("ABC+DE-GHIJK+LMNOP", true , new Array('+','-')).scan();
and
console.log(tokens);
would give:
['ABC', '+', 'DE', '-', 'GHIJK', '+', 'LMNOP']
ENJOY!
I use regexp:
str = 'Write a program that extracts from a given text all palindromes, e.g. "ABBA", "lamal", "exe".';
var strNew = str.match(/\w+/g);
// Output: ["Write", "a", "program", "that", "extracts", "from", "a", "given", "text", "all", "palindromes", "e", "g", "ABBA", "lamal", "exe"]
I'm new in StackOverflow and JavaScript, I'm trying to get the first letter that repeats from a string considering both uppercase and lowercase letters and counting and obtaining results using the for statement. The problem is that the form I used is too long Analyzing the situation reaches such a point that maybe you can only use a "For" statement for this exercise, which I get to iterate, but not with a cleaner and reduced code has me completely blocked, this is the reason why I request help to understand and continue with the understanding and use of this sentence. In this case, the result was tested in a JavaScript script inside a function and 3 "For" sentences obtaining quite positive results, but I can not create it in 1 only For (Sorry for my bad english google translate)
I making in HTML with JavasScript
var letter = "SYAHSVCXCyXSssssssyBxAVMZsXhZV";
var contendor = [];
var calc = [];
var mycalc = 0;
letter = letter.toUpperCase()
console.log(letter)
function repeats(){
for (var i = 0; i < letter.length; i++) {
if (contendor.includes(letter[i])) {
}else{
contendor.push(letter[i])
calc.push(0)
}
}
for (var p = 0; p < letter.length; p++) {
for (var l = 0; l < contendor.length; l++) {
if (letter[p] == contendor[l]) {
calc [l]= calc [l]+1
}
}
}
for (var f = 0; f < calc.length; f++) {
if ( calc[f] > calc[mycalc]) {
mycalc = f
}
}
}
repeats()
console.log("The most repeated letter its: " + contendor[mycalc]);
I Expected: A result with concise code
It would probably be a lot more concise to use a regular expression: match a character, then lookahead for more characters until you can match that first character again:
var letter = "SYAHSVCXCyXSssssssyBxAVMZsXhZV";
const firstRepeatedRegex = /(.)(?=.*\1)/;
console.log(letter.match(firstRepeatedRegex)[1]);
Of course, if you aren't sure whether a given string contains a repeated character, check that the match isn't null before trying to extract the character:
const input = 'abcde';
const firstRepeatedRegex = /(.)(?=.*\1)/;
const match = input.match(firstRepeatedRegex);
if (match) {
console.log(match[0]);
} else {
console.log('No repeated characters');
}
You could also turn the input into an array and use .find to find the first character whose lastIndexOf is not the same as the index of the character being iterated over:
const getFirstRepeatedCharacter = (str) => {
const chars = [...str];
const char = chars.find((char, i) => chars.lastIndexOf(char) !== i);
return char || 'No repeated characters';
};
console.log(getFirstRepeatedCharacter('abcde'));
console.log(getFirstRepeatedCharacter('SYAHSVCXCyXSssssssyBxAVMZsXhZV'));
If what you're actually looking for is the character that occurs most often, case-insensitive, use reduce to transform the string into an object indexed by character, whose values are the number of occurrences of that character, then identify the largest value:
const getMostRepeatedCharacter = (str) => {
const charsByCount = [...str.toUpperCase()].reduce((a, char) => {
a[char] = (a[char] || 0) + 1;
return a;
}, {});
const mostRepeatedEntry = Object.entries(charsByCount).reduce((a, b) => a[1] >= b[1] ? a : b);
return mostRepeatedEntry[0];
};
console.log(getMostRepeatedCharacter('abcde'));
console.log(getMostRepeatedCharacter('SYAHSVCXCyXSssssssyBxAVMZsXhZV'));
If the first repeated character is what you want, you can push it into an array and check if the character already exists
function getFirstRepeating( str ){
chars = []
for ( var i = 0; i < str.length; i++){
var char = str.charAt(i);
if ( chars.includes( char ) ){
return char;
} else {
chars.push( char );
}
}
return -1;
}
This will return the first repeating character if it exists, or will return -1.
Working
function getFirstRepeating( str ){
chars = []
for ( var i = 0; i < str.length; i++){
var char = str.charAt(i);
if ( chars.includes( char ) ){
return char;
} else {
chars.push( char );
}
}
return -1;
}
console.log(getFirstRepeating("SYAHSVCXCyXSssssssyBxAVMZsXhZV"))
Have you worked with JavaScript objects yet?
You should look into it.
When you loop through your string
let characters = "hemdhdksksbbd";
let charCount = {};
let max = { count: 0, ch: ""}; // will contain max
// rep letter
//Turn string into an array of letters and for
// each letter create a key in the charcount
// object , set it to 1 (meaning that's the first of
// that letter you've found) and any other time
// you see the letter, increment by 1.
characters.split("").forEach(function(character)
{
if(!charCount[character])
charCount[character] = 1;
else
charCount[character]++;
}
//charCount should now contain letters and
// their counts.
//Get the letters from charCount and find the
// max count
Object.keys(charCount). forEach (function(ch){
if(max.count < charCount[ch])
max = { count: charCount[ch], ch: ch};
}
console.log("most reps is: " , max.ch);
This is a pretty terrible solution. It takes 2 loops (reduce) and doesn't handle ties, but it's short and complicated.
Basically keep turning the results into arrays and use array methods split and reduce to find the answer. The first reduce is wrapped in Object.entries() to turn the object back into an array.
let letter = Object.entries(
"SYAHSVCXCyXSssssssyBxAVMZsXhZV".
toUpperCase().
split('').
reduce((p, c) => {
p[c] = isNaN(++p[c]) ? 1 : p[c];
return p;
}, {})
).
reduce((p, c) => p = c[1] > p[1] ? c : p);
console.log(`The most repeated letter is ${letter[0]}, ${letter[1]} times.`);
Say you have the following string:
FJKAUNOJDCUTCRHBYDLXKEODVBWTYPTSHASQQFCPRMLDXIJMYPVOHBDUGSMBLMVUMMZYHULSUIZIMZTICQORLNTOVKVAMQTKHVRIFMNTSLYGHEHFAHWWATLYAPEXTHEPKJUGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNT
LDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFY
FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQ
XBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGR
AMELUTEPYILBIUOCKKUUBJROQFTXMZRLXBAMHSDTEKRRIKZUFNLGTQAEUINMBPYTWXULQNIIRXHHGQDPENXAJNWXULFBNKBRINUMTRBFWBYVNKNKDFR
I'm trying to find the smallest substring containing the letters ABCDA.
I tried a regex approach.
console.log(str.match(/[A].*?[B].*?[C].*?[D].*?[A]/gm).sort((a, b) => a.length - b.length)[0]);
This works, but it only find strings where ABCDA appear (in that order). Meaning it won't find substring where the letters appear in a order like this: BCDAA
I'm trying to change my regex to account for this. How would I do that without using | and type out all the different cases?
You can't.
Let's consider a special case: Assume the letters you are looking for are A, A, and B. At some point in your regexp there will certainly be a B. However, the parts to the left and to the right of the B are independent of each other, so you cannot refer from one to the other. How many As are matched in the subexpression to the right of the B depends on the number of As being already matched in the left part. This is not possible with regular expressions, so you will have to unfold all the different orders, which can be many!
Another popular example that illustrates the problem is to match opening brackets with closing brackets. It's not possible to write a regular expression asserting that in a given string a sequence of opening brackets is followed by a sequence of closing brackets of the same length. The reason for this is that to count the brackets you would need a stack machine in contrast to a finite state machine but regular expressions are limited to patterns that can be matched using FSMs.
This algorithm doesn't use a regex, but found both solutions as well.
var haystack = 'FJKAUNOJDCUTCRHBYDLXKEODVBWTYPTSHASQQFCPRMLDXIJMYPVOHBDUGSMBLMVUMMZYHULSUIZIMZTICQORLNTOVKVAMQTKHVRIFMNTSLYGHEHFAHWWATLYAPEXTHEPKJUGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNTLDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFYFFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQXBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGRAMELUTEPYILBIUOCKKUUBJROQFTXMZRLXBAMHSDTEKRRIKZUFNLGTQAEUINMBPYTWXULQNIIRXHHGQDPENXAJNWXULFBNKBRINUMTRBFWBYVNKNKDFR';
var needle = 'ABCDA'; // the order of letters doesn't matter
var letters = {};
needle.split('').forEach(function(ch) {
letters[ch] = letters[ch] || 0;
letters[ch]++;
});
var shortestSubstringLength = haystack.length;
var shortestSubstrings = []; // storage for found substrings
var startingPos = 0;
var length;
var currentPos;
var notFound;
var letterKeys = Object.keys(letters); // unique leters
do {
lettersLeft = JSON.parse(JSON.stringify(letters)); // copy letters count object
notFound = false;
posStart = haystack.length;
posEnd = 0;
letterKeys.forEach(function(ch) {
currentPos = startingPos;
while (!notFound && lettersLeft[ch] > 0) {
currentPos = haystack.indexOf(ch, currentPos);
if (currentPos >= 0) {
lettersLeft[ch]--;
posStart = Math.min(currentPos, posStart);
posEnd = Math.max(currentPos, posEnd);
currentPos++;
} else {
notFound = true;
}
}
});
if (!notFound) {
length = posEnd - posStart + 1;
startingPos = posStart + 1; // starting position for next iteration
}
if (!notFound && length === shortestSubstringLength) {
shortestSubstrings.push(haystack.substr(posStart, length));
}
if (!notFound && length < shortestSubstringLength) {
shortestSubstrings = [haystack.substr(posStart, length)];
shortestSubstringLength = length;
}
} while (!notFound);
console.log(shortestSubstrings);
Maybe not as clear as using regex could be (well, for me regex are never really clear :D ) you can use brute force (not so brute)
Create an index of "valid" points of your string (those with the letters you want) and iterate with a double loop over it getting substrings containing at least 5 of those points, checking that they are valid solutions. Maybe not the most efficient way, but easy to implement, to understand, and probably to optimize.
var haystack="UGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNTLDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFYFFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQXBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGR";
var needle="ABCD";
var size=haystack.length;
var candidate_substring="";
var minimal_length=size;
var solutions=new Array();
var points=Array();
for(var i=0;i<size;i++){
if(needle.indexOf(haystack[i])>-1) points.push(i);
}
var limit_i= points.length-4;
var limit_k= points.length;
for (var i=0;i<limit_i;i++){
for(var k=i;k<limit_k;k++){
if(points[k]-points[i]+1<=minimal_length){
candidate_substring=haystack.substr(points[i],points[k]-points[i]+1);
if(is_valid(candidate_substring)){
solutions.push(candidate_substring);
if(candidate_substring.length < minimal_length) minimal_length=candidate_substring.length;
}
}
}
}
document.write('<p>Solution length:'+minimal_length+'<p>');
for(var i=0;i<solutions.length;i++){
if(solutions[i].length<=minimal_length) document.write('<p>Solution:'+solutions[i]+'<p>');
}
function is_valid(candidate_substring){
//verify we've got all characters
for(var j=0;j<candidate_substring.length;j++){
if(candidate_substring.indexOf(needle.charAt(j))<0) return false;
}
//...and verify we have two "A"
if(candidate_substring.indexOf("A")==candidate_substring.lastIndexOf("A")) return false;
return true;
}
Just had this problem in an interview as a coding assignment and came up with another solution, (it's not as optimal as the one above but maybe it's easier to understand).
function MinWindowSubstring(strArr) {
const N = strArr[0];
const K = strArr[1];
const letters = {};
K.split('').forEach( (character) => {
letters[character] = letters[character] ? letters[character] + 1 : 1;
});
let possibleSequencesList = [];
const letterKeys = Object.keys(letters);
for(let i=0; i< N.length; i++) {
const char = N[i];
if (new String(letterKeys).indexOf(char) !== -1) {
// found a character in the string
// update all previus sequences
possibleSequencesList.forEach((seq) => {
if(!seq.sequenceComplete) {
seq[char] = seq[char]-1;
seq.lastIndex = i;
// check if sequence is complete
var sequenceComplete = true;
letterKeys.forEach( (letter) => {
if(seq[letter] > 0) {
sequenceComplete = false;
}
});
seq.sequenceComplete = sequenceComplete
}
})
// create a new sequence starting from it
const newSeq = {
startPoint: i,
lastIndex: i,
sequenceComplete: false,
...letters
}
newSeq[char] = newSeq[char]-1;
possibleSequencesList.push(newSeq);
}
}
// cleanup sequences
let sequencesList = possibleSequencesList.filter(sequence => sequence.sequenceComplete);
let output = [];
let minLength = N.length;
// find the smalles one
sequencesList.forEach( seq => {
if( (seq.lastIndex - seq.startPoint) < minLength) {
minLength = seq.lastIndex - seq.startPoint;
output = N.substring(seq.startPoint, seq.lastIndex + 1);
}
})
return output;
}
Given string in the form:
'"abc",ab(),c(d(),e()),f(g(),zyx),h(123)'
How can I split it to get the below array format:
abc
ab()
c(d(),e())
f(g(),zyx)
h(123)
I have tried normal javascript split, however it doesn't work as desired. Trying Regular Expression but not yet successful.
You can keep track of the parentheses, and add those expressions when the left and right parens equalize.
For example-
function splitNoParen(s){
var left= 0, right= 0, A= [],
M= s.match(/([^()]+)|([()])/g), L= M.length, next, str= '';
for(var i= 0; i<L; i++){
next= M[i];
if(next=== '(')++left;
else if(next=== ')')++right;
if(left!== 0){
str+= next;
if(left=== right){
A[A.length-1]+=str;
left= right= 0;
str= '';
}
}
else A=A.concat(next.match(/([^,]+)/g));
}
return A;
}
var s1= '"abc",ab(),c(d(),e()),f(g(),zyx),h(123)';
splitNoParen(s1).join('\n');
/* returned value: (String)
"abc"
ab()
c(d(),e())
f(g(),zyx)
h(123)
*/
This might be not the best or more refined solution, and also maybe won't fit every single possibility, but based on your example it works:
var data = '"abc",ab(),c(d(),e()),f(g(),zyx),h(123)';
// Create a preResult splitting the commas.
var preResult = data.replace(/"/g, '').split(',');
// Create an empty result.
var result = [];
for (var i = 0; i < preResult.length; i++) {
// Check on every preResult if the number of parentheses match.
// Opening ones...
var opening = preResult[i].match(/\(/g) || 0;
// Closing ones...
var closing = preResult[i].match(/\)/g) || 0;
if (opening != 0 &&
closing != 0 &&
opening.length != closing.length) {
// If the current item contains a different number of opening
// and closing parentheses, merge it with the next adding a
// comma in between.
result.push(preResult[i] + ',' + preResult[i + 1]);
i++;
} else {
// Leave it as it is.
result.push(preResult[i]);
}
}
Demo
For future reference, here's another approach to top-level splitting, using string.replace as a control flow operator:
function psplit(s) {
var depth = 0, seg = 0, rv = [];
s.replace(/[^(),]*([)]*)([(]*)(,)?/g,
function (m, cls, opn, com, off, s) {
depth += opn.length - cls.length;
var newseg = off + m.length;
if (!depth && com) {
rv.push(s.substring(seg, newseg - 1));
seg = newseg;
}
return m;
});
rv.push(s.substring(seg));
return rv;
}
console.log(psplit('abc,ab(),c(d(),e()),f(g(),zyx),h(123)'))
["abc", "ab()", "c(d(),e())", "f(g(),zyx)", "h(123)"]
Getting it to handle quotes as well would not be too complicated, but at some point you need to decide to use a real parser such as jison, and I suspect that would be the point. In any event, there's not enough detail in the question to know what the desired handling of double quotes is.
You can't use .split for this, but instead you'll have to write a small parser like this:
function splitNoParen(s){
let results = [];
let next;
let str = '';
let left = 0, right = 0;
function keepResult() {
results.push(str);
str = '';
}
for(var i = 0; i<s.length; i++) {
switch(s[i]) {
case ',':
if((left === right)) {
keepResult();
left = right = 0;
} else {
str += s[i];
}
break;
case '(':
left++;
str += s[i];
break;
case ')':
right++;
str += s[i];
break;
default:
str += s[i];
}
}
keepResult();
return results;
}
var s1= '"abc",ab(),c(d(),e()),f(g(),zyx),h(123)';
console.log(splitNoParen(s1).join('\n'));
var s2='cats,(my-foo)-bar,baz';
console.log(splitNoParen(s2).join('\n'));
Had a similar issue and existing solutions were hard to generalize. So here's another parser that's a bit more readable and easier to extend to your personal needs. It'll also work with curly braces, brackets, normal braces, and strings of any type. License is MIT.
/**
* This function takes an input string and splits it by the given token, but only if the token is not inside
* braces of any kind, or a string.
* #param {string} input The string to split.
* #param {string} split_by Must be a single character.
* #returns {string[]} An array of split parts without the split_by character.
*/
export function parse_split(input:string, split_by:string = ",") : string[]
{
// Javascript has 3 types of strings
const STRING_TYPES = ["'","`","\""] as const;
// Some symbols can be nested, like braces, and must be counted
const state = {"{":0,"[":0,"(":0};
// Some cannot be nested, like a string, and just flip a flag.
// Additionally, once the string flag has been flipped, it can only be unflipped
// by the same token.
let string_state : (typeof STRING_TYPES)[number] | undefined = undefined
// Nestable symbols come in sets, usually in pairs.
// These sets increase or decrease the state, depending on the symbol.
const pairs : Record<string,[keyof typeof state,number]> = {
"{":["{",1],
"}":["{",-1],
"[":["[",1],
"]":["[",-1],
"(":["(",1],
")":["(",-1]
}
let start = 0;
let results = [];
let length = input.length;
for(let i = 0; i < length; ++i)
{
let char = input[i];
// Backslash escapes the next character. We directly skip 2 characters by incrementing i one extra time.
if(char === "\\")
{
i++;
continue;
}
// If the symbol exists in the single/not nested state object, flip the corresponding state flag.
if(char == string_state)
{
string_state = undefined;
console.log("Closed string ", string_state);
}
// if it's not in a string, but it's a string opener, remember the string type in string_state.
else if(string_state === undefined && STRING_TYPES.includes(char as typeof STRING_TYPES[number]))
{
string_state = char as typeof STRING_TYPES[number];
}
// If it's not in a string, and if it's a paired symbol, increase or decrease the state based on our "pairs" constant.
else if(string_state === undefined && (char in pairs) )
{
let [key,value] = pairs[char];
state[key] += value;
}
// If it's our split symbol...
else if(char === split_by)
{
// ... check whether any flags are active ...
if(Object.entries(state).every(([k,v])=>v == 0) && (string_state === undefined))
{
// ... if not, then this is a valid split.
results.push(input.substring(start,i))
start = i+1;
}
}
}
// Add the last segment if the string didn't end in the split_by symbol, otherwise add an empty string
if(start < input.length)
{
results.push(input.substring(start,input.length))
}
else
results.push("");
return results;
}
With this regex, it makes the job:
const regex = /,(?![^(]*\))/g;
const str = '"abc",ab(),c(d(),e()),f(g(),zyx),h(123)';
const result = str.split(regex);
console.log(result);
Javascript
var str='"abc",ab(),c(d(),e()),f(g(),zyx),h(123)'
str.split('"').toString().split(',').filter(Boolean);
this should work