Splitting an array at only certain places but not others - javascript

I understand the .split() function quite well. But what I can't seem to figure out is how to split in certain places but not in others. Sounds confusing? For example, let's say I use .split(",") on the following string:
div:(li,div),div
Is it possible to split it so that only the commas outside of the parentheses are used as split points?
So the string above with the split method should return:
['div:(li,div)', 'div']
Of course at the moment it is also splitting the first comma inside of the parentheses, returning:
['div:(li', 'div)', 'div']
Is there some way to make this work like I desire?

If your expected strings are not going to become more complicated than this, you don't have to worry about writing code to parse them. Regex will work just fine.
http://jsfiddle.net/dC5HN/1/
var str = "div:(li,div),div:(li,div),div";
var parts = str.split(/,(?=(?:[^\)]|\([^\)]*\))*$)/g);
console.log(parts);
outputs:
["div:(li,div)", "div:(li,div)", "div"]

REGEX is not built for this sort of thing, which is essentially parsing.
When faced with this sort of situation previously, I've first temporarily replaced the parenthesised parts with placeholders, then split, then replaced the placeholders with the original parenthesised parts.
A bit hacky, but it works:
var str = 'div:(li,div),div',
repls = [];
//first strip out parenthesised parts and store in array
str = str.replace(/\([^\)]*\)/g, function($0) {
repls.push($0);
return '*repl'+(repls.length - 1)+'*';
});
//with the parenthesised parts removed, split the string then iteratively
//reinstate the removed parenthesised parts
var pieces = str.split(',').map(function(val, index) {
return val.replace(/\*repl(\d+)\*/, function($0, $1) {
return repls[$1];
});
});
//test
console.log(pieces); //["div:(li,div)","div"]

This function will split on whatever character you specify in splitChar, but ignore that character when it appears inside parentheses:
function customSplit(stringToSplit, splitChar){
var arr = new Array();
var isParenOpen = 0
var curChar;
var curString = "";
for (var i = 0; i < stringToSplit.length; i++) {
curChar = stringToSplit.substr(i, 1);
switch(curChar) {
case "(":
isParenOpen++;
break;
case ")":
if(isParenOpen > 0) isParenOpen--;
break;
case splitChar:
if (isParenOpen < 1) {
arr.push(curString);
curString = "";
continue;
}
}
curString += curChar;
}
if (curString.length > 0) {
arr.push(curString);
}
return arr;
}
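To make the trade-off between the regex answer and the depth-counting answer concrete, here is a minimal sketch that runs both on an input with nested parentheses. The helper name splitTopLevel and the sample string are assumptions made for illustration; the regex is the one from the earlier answer.

```javascript
// Hypothetical helper: split on `sep` only where the parenthesis depth is zero.
function splitTopLevel(input, sep) {
    var parts = [];
    var depth = 0;
    var current = '';
    for (var i = 0; i < input.length; i++) {
        var ch = input.charAt(i);
        if (ch === '(') {
            depth++;
        } else if (ch === ')' && depth > 0) {
            depth--;
        }
        if (ch === sep && depth === 0) {
            parts.push(current);
            current = '';
        } else {
            current += ch;
        }
    }
    parts.push(current); // unlike customSplit, this keeps a trailing empty field
    return parts;
}

var nested = 'a,(b,(c)),d';
var byDepth = splitTopLevel(nested, ',');
var byRegex = nested.split(/,(?=(?:[^\)]|\([^\)]*\))*$)/g);
console.log(byDepth); // ["a", "(b,(c))", "d"]
console.log(byRegex); // ["a,(b,(c))", "d"]
```

The regex never splits inside parentheses, but once groups are nested it also misses the first top-level comma, which is the point at which counting depth becomes the safer choice.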

Related

Using regular expression to split a string

I have a string which I need to separate correctly:
self.view.frame.size.height = 44
I need to get only view, frame, size, and height. And I need to do it with a regular expression.
So far I've tried a lot of variants, none of them are even close to what I want to get. And my code now looks like this:
var testString = 'self.view.frame.size.height = 44'
var re = new RegExp('\\.(.*)\\.', "g")
var array = re.exec(testString);
console.log('Array length is ' + array.length)
for (var i = 0; i < array.length; i++) {
console.log('<' + array[i] + ">");
}
And it doesn't work at all:
Array length is 2
<.view.frame.size.>
<view.frame.size>
I'm new at Javascript, so maybe I want the impossible, let me know.
Thanks.
In Javascript, executing a regexp with the g modifier doesn't return all the matches at once. You have to execute it repeatedly on the same input string, and each one returns the next match.
You also need to change the regexp so it only returns one word at a time. .* is greedy, so it returns the longest possible match, so it was returning all the words between the first and last .. [^.]* will match a sequence of non-dot characters, so it will just return one word. You can't include the second . in the regexp, because that will interfere with the repetition -- each repetition starts searching after the end of the previous match, and there's no beginning . after the ending . of the word. Also, there's no . after height, so the last word won't match it.
EDIT: I've changed the regexp to use \w* instead of [^.]*, because it was grabbing the whole height = 44 string instead of just height.
var testString = 'self.view.frame.size.height = 44';
var re = /\.(\w*)/g;
var array = [];
var result;
while (result = re.exec(testString)) {
array.push(result[1]);
}
console.log('Array length is ' + array.length)
for (var i = 0; i < array.length; i++) {
console.log('<' + array[i] + ">");
}
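The repeated-exec behaviour described above is driven by the regex's lastIndex property. A minimal sketch, using the same test string, that records where each search resumes (the variable names are illustrative):

```javascript
var re = /\.(\w*)/g;
var input = 'self.view.frame.size.height = 44';
var words = [];
var positions = [];
var m;
while ((m = re.exec(input)) !== null) {
    words.push(m[1]);             // the captured word, without the leading dot
    positions.push(re.lastIndex); // index where the next exec() call will start
}
console.log(words);        // ["view", "frame", "size", "height"]
console.log(positions);    // [9, 15, 20, 27]
console.log(re.lastIndex); // 0, reset after the failed final match
```

Because the state lives on the regex object, reusing the same global regex on a different string without resetting lastIndex can skip matches.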
If you're sure that your data will be always in the same format you can use this:
function parse (string) {
return string.split(" = ").shift().split(".").slice(1); // drop the leading "self"
}
In your context, split is a MUCH better option:
var str = "self.view.frame.size.height = 44";
var bits1 = str.split(" ")[0];
var bits2 = bits1.split(".");
bits2.shift(); // get rid of the unwanted self
console.log(bits2);

Javascript split at multiple delimiters while keeping delimiters

Is there a better way than what I have (through regex, for instance) to turn
"div#container.blue"
into this
["div", "#container", ".blue"];
Here's what I have...
var arr = [];
function process(h1, h2) {
var first = h1.split("#");
arr.push(first[0]);
var secondarr = first[1].split(".");
secondarr[0] = "#" + secondarr[0];
arr.push(secondarr[0]);
for (var i = 1; i < secondarr.length; i++) {
arr.push(secondarr[i] = "." + secondarr[i]);
}
return arr;
}
Why not something like this?
'div#container.blue'.split(/(?=[#.])/);
Because it's simply looking for a place where the next character is either # or the literal ., this does not capture anything, which makes it a zero length match. Because it's zero-length match, nothing is removed.
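A small side-by-side sketch of the difference, assuming a selector with one extra class added for illustration:

```javascript
var selector = 'div#container.blue.wide';

// Zero-width lookahead: splits before each # or . and consumes nothing.
var kept = selector.split(/(?=[#.])/);

// Plain character class: the delimiter itself is consumed and lost.
var dropped = selector.split(/[#.]/);

console.log(kept);    // ["div", "#container", ".blue", ".wide"]
console.log(dropped); // ["div", "container", "blue", "wide"]
```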
As you've probably found, the issue is that split removes the item you're splitting on. You can solve that with regex capturing groups (the parentheses), because captured text is included in the result:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/);
Now we've got the issue that result contains a lot of falsy values (undefined and empty strings) you don't want. A quick filter fixes that:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/).filter(function(x) {
return !!x;
});
Appendix A: What the heck is that regex
I'm assuming you're only concerned with # and . as characters. That still gives us this monster: /(#[^#.]*)|(\.[^#.]*)/
This means we'll capture either a # or ., and then all the characters up until the next # or . (remembering that a period is significant in regex, so we need to escape it, unless we're inside the brackets, where it is literal and no | is needed between the characters).
I've written an extension of the String type for you. It allows you to choose which delimiters to use, passing them in a string:
String.prototype.splitEx = function(delimiters) {
var parts = [];
var current = '';
for (var i = 0; i < this.length; i++) {
if (delimiters.indexOf(this[i]) < 0) current += this[i];
else {
parts.push(current);
current = this[i];
}
}
parts.push(current);
return parts;
};
var text = 'div#container.blue';
console.log(text.splitEx('#.'));

Replace all without a regex where can I use the G

So I have the following:
var token = '[token]';
var tokenValue = 'elephant';
var string = 'i have a beautiful [token] and i sold my [token]';
string = string.replace(token, tokenValue);
The above will only replace the first [token] and leave the second one alone.
If I were to use a regex I could write
string = string.replace(/\[token\]/g, tokenValue);
and this would replace every [token].
However, I don't know how to do this without the use of //
I have found split/join satisfactory enough for most of my cases.
A real-life example:
myText.split("\n").join('<br>');
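Applied to the question's own data, the split/join idea might look like the sketch below. The helper name replaceEvery is an assumption for illustration; it relies on split() treating a string argument literally, so the brackets in [token] need no escaping.

```javascript
// Hypothetical helper: replace every literal occurrence of `search`.
function replaceEvery(source, search, replacement) {
    return source.split(search).join(replacement);
}

var sentence = 'i have a beautiful [token] and i sold my [token]';
var replaced = replaceEvery(sentence, '[token]', 'elephant');
console.log(replaced);
// "i have a beautiful elephant and i sold my elephant"

// The replacement may safely contain the search string.
var doubled = replaceEvery('CIC', 'C', 'CC');
console.log(doubled); // "CCICC"
```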
Why not replace the token every time it appears with a do while loop?
var index = 0;
do {
string = string.replace(token, tokenValue);
} while((index = string.indexOf(token, index + 1)) > -1);
string = string.replace(new RegExp("\\[token\\]","g"), tokenValue);
A caution about the accepted answer: the replacement string can contain the search string, in which case the loop never terminates.
Here is a better version:
function replaceSubstring(inSource, inToReplace, inReplaceWith)
{
var outString = [];
var repLen = inToReplace.length;
while (true)
{
var idx = inSource.indexOf(inToReplace);
if (idx == -1)
{
outString.push(inSource);
break;
}
outString.push(inSource.substring(0, idx))
outString.push(inReplaceWith);
inSource = inSource.substring(idx + repLen);
}
return outString.join("");
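}
Older versions of Firefox accepted a non-standard third "flags" argument to replace(). Other engines ignore it and Firefox has since removed it, so do not rely on this:
"[.token.*] nonsense and [.token.*] more nonsense".replace("[.token.*]", "some", "g");
In those Firefox versions it produced:
"some nonsense and some more nonsense"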
I realized that the answer from @TheBestGuest won't work for the following example, as you will end up in an endless loop:
var stringSample= 'CIC';
var index = 0;
do { stringSample = stringSample.replace('C', 'CC'); }
while((index = stringSample.indexOf('C', index + 1)) > -1);
So here is my proposal for a replaceAll method, written in TypeScript:
let matchString = 'CIC';
let searchValueString= 'C';
let replacementString ='CC';
matchString = matchString.split(searchValueString).join(replacementString);
console.log(matchString);
Unfortunately since Javascript's string replace() function doesn't let you start from a particular index, and there is no way to do in-place modifications to strings it is really hard to do this as efficiently as you could in saner languages.
.split().join() isn't a good solution because it involves the creation of a load of strings (although I suspect V8 does some dark magic to optimise this).
Calling replace() in a loop is a terrible solution because replace starts its search from the beginning of the string every time. This is going to lead to O(N^2) behaviour! It also has issues with infinite loops as noted in the answers here.
A regex is probably the best solution if your replacement string is a compile time constant, but if it isn't then you can't really use it. You should absolutely not try and convert an arbitrary string into a regex by escaping things.
One reasonable approach is to build up a new string with the appropriate replacements:
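function replaceAll(input: string, from: string, to: string): string {
    const fromLen = from.length;
    // Guard: an empty search string matches at every position without
    // advancing pos, so the loop below would never terminate.
    if (fromLen === 0) {
        return input;
    }
    let output = "";
    let pos = 0;
    for (;;) {
        const matchPos = input.indexOf(from, pos);
        if (matchPos === -1) {
            // no more matches: append the rest of the input and stop
            output += input.slice(pos);
            break;
        }
        output += input.slice(pos, matchPos);
        output += to;
        pos = matchPos + fromLen;
    }
    return output;
}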
I benchmarked this against all the other solutions (except calling replace() in a loop which is going to be terrible) and it came out slightly faster than a regex, and about twice as fast as split/join.
Edit: This is almost the same method as Stefan Steiger's answer which I totally missed for some reason. However his answer still uses .join() for some reason which makes it 4 times slower than mine.

Cutting a string at nth occurrence of a character

What I want to do is take a string such as "this.those.that" and get a substring to or from the nth occurrence of a character. So, from the start of the string to the 2nd occurrence of . would return "this.those". Likewise, from the 2nd occurrence of . to the end of the string would return "that". Sorry if my question is foggy, it's not that easy to explain. Also, please do not suggest making extra variables, and the result will be in a string and not an array.
You could do it without arrays, but it would take more code and be less readable.
Generally, you want to use only as much code as it takes to get the job done, which also helps readability. If you find this task is becoming a performance issue (benchmark it), then you can decide to start refactoring for performance.
var str = 'this.those.that',
delimiter = '.',
start = 1,
tokens = str.split(delimiter).slice(start),
result = tokens.join(delimiter); // those.that
console.log(result)
// To get the substring BEFORE the nth occurrence
var tokens2 = str.split(delimiter).slice(0, start),
result2 = tokens2.join(delimiter); // this
console.log(result2)
Try this:
"qwe.fs.xczv.xcv.xcv.x".replace(/([^\.]*\.){3}/, ''); // "xcv.xcv.x"
In general, use /([^\.]*\.){n}/, where n is the number of occurrences to remove.
I'm perplexed as to why you want to do things purely with string functions, but I guess you could do something like the following:
//str - the string
//c - the character or string to search for
//n - which occurrence
//fromStart - if true, go from beginning to the occurrence; else go from the occurrence to the end of the string
var cut = function (str, c, n, fromStart) {
    var index = -1;
    // walk forward to the nth occurrence of c
    while (n-- > 0) {
        index = str.indexOf(c, index + 1);
        if (index === -1) {
            // fewer than n occurrences: treat the cut point as the end of the string
            return fromStart ? str : '';
        }
    }
    if (fromStart) {
        return str.substring(0, index);
    } else {
        return str.substring(index + c.length);
    }
};
However, I'd strongly advocate for something like alex's much simpler code.
Just in case somebody needs both "this" and "those.that" in a way as alex described in his comment, here is a modified code:
var str = 'this.those.that',
delimiter = '.',
start = 1,
tokens = str.split(delimiter),
result = [tokens.slice(0, start), tokens.slice(start)].map(function(item) {
return item.join(delimiter);
}); // [ 'this', 'those.that' ]
document.body.innerHTML = result;
If you really want to stick to string methods, then:
// Return the substring of s up to, but not including,
// the nth occurrence of c
function getNth(s, c, n) {
var idx;
var i = 0;
var newS = '';
do {
idx = s.indexOf(c);
newS += s.substring(0, idx);
s = s.substring(idx+1);
} while (++i < n && (newS += c))
return newS;
}
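For the regex approach suggested earlier in this thread, the repetition count can be built at runtime with the RegExp constructor. This is a minimal sketch: the function name dropThroughNth is hypothetical, the delimiter is fixed to a dot, n is assumed to be a non-negative integer, and a ^ anchor is added to make explicit that counting starts at the beginning of the string.

```javascript
// Hypothetical helper: remove everything up to and including the nth dot.
function dropThroughNth(str, n) {
    // Builds e.g. /^(?:[^.]*\.){3}/ for n = 3
    var re = new RegExp('^(?:[^.]*\\.){' + n + '}');
    return str.replace(re, '');
}

console.log(dropThroughNth('qwe.fs.xczv.xcv.xcv.x', 3)); // "xcv.xcv.x"
console.log(dropThroughNth('this.those.that', 2));       // "that"
console.log(dropThroughNth('this.those.that', 5));       // unchanged: fewer than 5 dots
```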

Javascript regex - split string

Struggling with a regex requirement. I need to split a string into an array wherever it finds a forward slash. But not if the forward slash is preceded by an escape.
Eg, if I have this string:
hello/world
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = world
And if I have this string:
hello/wo\/rld
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = wo/rld
Any ideas?
I wouldn't use split() for this job. It's much easier to match the path components themselves, rather than the delimiters. For example:
var subject = 'hello/wo\\/rld';
var regex = /(?:[^\/\\]+|\\.)+/g;
var matched = null;
while (matched = regex.exec(subject)) {
console.log(matched[0]);
}
output:
hello
wo\/rld
The following is a little long-winded but will work, and avoids the problem with IE's broken split implementation by not using a regular expression.
function splitPath(str) {
var rawParts = str.split("/"), parts = [];
for (var i = 0, len = rawParts.length, part; i < len; ++i) {
part = "";
while (rawParts[i].slice(-1) == "\\") {
part += rawParts[i++].slice(0, -1) + "/";
}
parts.push(part + rawParts[i]);
}
return parts;
}
var str = "hello/world\\/foo/bar";
alert( splitPath(str).join(",") );
Here's a way adapted from the techniques in this blog post:
var str = "Testing/one\\/two\\/three";
var result = str.replace(/(\\)?\//g, function($0, $1){
return $1 ? '/' : '[****]';
}).split('[****]');
Given:
Testing/one\/two\/three
The result is:
[0]: Testing
[1]: one/two/three
That first uses the simple "fake" lookbehind to replace / with [****] and to replace \/ with /, then splits on the [****] value. (Obviously, replace [****] with anything that won't be in the string.)
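In engines that support ES2018 lookbehind assertions, the "fake" lookbehind can be replaced by a real one. This is a sketch under that assumption and not part of the original answer; the variable names are illustrative.

```javascript
// Assumes a JavaScript engine with ES2018 lookbehind support.
var path = 'Testing/one\\/two\\/three';
var segments = path
    .split(/(?<!\\)\//) // split on "/" that is not preceded by "\"
    .map(function (segment) {
        return segment.replace(/\\\//g, '/'); // turn "\/" back into "/"
    });
console.log(segments); // ["Testing", "one/two/three"]
```

Like the placeholder technique, this does not treat a doubled backslash before a slash as an escaped backslash followed by a real delimiter.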
/*
If you are getting your string from an ajax response or a data base query,
that is, the string has not been interpreted by javascript,
you can match character sequences that either have no slash or have escaped slashes.
If you are defining the string in a script, escape the escapes and strip them after the match.
*/
var s='hello/wor\\/ld';
s=s.match(/(([^\/]*(\\\/)+)([^\/]*)+|([^\/]+))/g) || [s];
alert(s.join('\n'))
s.join('\n').replace(/\\/g,'')
/* returned value: (String)
hello
wor/ld
*/
For short code, you can use reverse to simulate negative lookbehind
function reverse(s){
return s.split('').reverse().join('');
}
var parts = reverse(myString).split(/[/](?!\\(?:\\\\)*(?:[^\\]|$))/g).reverse();
for (var i = parts.length; --i >= 0;) { parts[i] = reverse(parts[i]); }
but to be efficient, it's probably better to split on /[/]/ and then walk the array and rejoin elements that have an escape at the end.
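A related alternative, which is not the split-and-rejoin approach just described, is a single pass over the characters. The sketch below uses a hypothetical helper named splitUnescaped and assumes that a backslash escapes whatever character follows it, including another backslash.

```javascript
// Hypothetical helper: split on `delimiter` unless it is backslash-escaped.
function splitUnescaped(str, delimiter) {
    var parts = [];
    var current = '';
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (ch === '\\' && i + 1 < str.length) {
            current += str.charAt(i + 1); // keep the escaped character, drop the backslash
            i++;
        } else if (ch === delimiter) {
            parts.push(current);
            current = '';
        } else {
            current += ch; // ordinary character (or a lone trailing backslash)
        }
    }
    parts.push(current);
    return parts;
}

console.log(splitUnescaped('hello/world', '/'));    // ["hello", "world"]
console.log(splitUnescaped('hello/wo\\/rld', '/')); // ["hello", "wo/rld"]
console.log(splitUnescaped('a\\\\/b', '/'));        // ["a\\", "b"], i.e. a\ and b
```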
Something like this may take care of it for you.
var str = "/hello/wo\\/rld/";
var split = str.replace(/^\/|\\?\/|\/$/g, function(match) {
if (match.indexOf('\\') == -1) {
return '\x00';
}
return match;
}).split('\x00');
alert(split);
