JavaScript split at multiple delimiters while keeping delimiters - javascript

Is there a better way than what I have (through regex, for instance) to turn
"div#container.blue"
into this
["div", "#container", ".blue"];
Here's what I have...
var arr = [];
function process(h1, h2) {
    var first = h1.split("#");
    arr.push(first[0]);
    var secondarr = first[1].split(".");
    arr.push("#" + secondarr[0]);
    for (var i = 1; i < secondarr.length; i++) {
        arr.push("." + secondarr[i]);
    }
    return arr;
}

Why not something like this?
'div#container.blue'.split(/(?=[#.])/);
The lookahead (?=[#.]) only asserts that the next character is either # or a literal .; it does not consume anything, which makes it a zero-length match. Because the match is zero-length, nothing is removed from the string.
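If the delimiters are not known in advance, the same lookahead can be built from a string of delimiter characters. This is only a sketch: the helper name splitBefore is made up for illustration, and it assumes every delimiter is a single character.

```javascript
// Hypothetical helper: split *before* any of the given single-character delimiters.
function splitBefore(text, delimiters) {
    // Escape the characters that are special inside a character class: \ ] ^ -
    var escaped = delimiters.replace(/[\\\]^-]/g, '\\$&');
    // A lookahead matches the empty string in front of a delimiter,
    // so split() has nothing to remove.
    return text.split(new RegExp('(?=[' + escaped + '])'));
}

console.log(splitBefore('div#container.blue', '#.')); // ["div", "#container", ".blue"]
```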

As you've probably found, the issue is that split removes the item you're splitting on. You can solve that with regex capturing groups (the parentheses), because split includes captured text in its result:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/);
Now we've got the issue that result contains a lot of falsy values (empty strings and undefined) you don't want. A quick filter fixes that:
var result = 'div#container.blue'.split(/(#[^#.]*)|(\.[^#.]*)/).filter(function(x) {
    return !!x;
});
Appendix A: What the heck is that regex?
I'm assuming you're only concerned with # and . as characters. That still gives us this monster: /(#[^#.]*)|(\.[^#.]*)/
This means we'll capture either a # or a ., and then all the characters up until the next # or . (remembering that a period is significant in regex, so we need to escape it, unless we're inside the brackets, where . is literal and | is not an alternation).
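An alternative that avoids the filter step is to match the tokens instead of splitting on them. This is a sketch rather than part of the answer above, and it still assumes that # and . are the only delimiters.

```javascript
// Each token is an optional leading # or . followed by one or more
// characters that are neither # nor .
var tokens = 'div#container.blue'.match(/[#.]?[^#.]+/g);

console.log(tokens); // ["div", "#container", ".blue"]
```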

I've written an extension of the String type for you. It allows you to choose which delimiters to use, passing them in a string:
String.prototype.splitEx = function(delimiters) {
    var parts = [];
    var current = '';
    for (var i = 0; i < this.length; i++) {
        if (delimiters.indexOf(this[i]) < 0) current += this[i];
        else {
            parts.push(current);
            current = this[i];
        }
    }
    parts.push(current);
    return parts;
};
var text = 'div#container.blue';
console.log(text.splitEx('#.'));

Related

Remove part of string that contains a tag

I have a variable in JavaScript that holds the below value:
<label>AAA</label>
I need just the AAA. I try to replace the characters but it is failing. Would someone please suggest the best approach?
var company="<label>AAA</label>";// I am getting this value from element
var rx = new RegExp("((\\$|)(([1-9]\\d{0,2}(\\,\\d{3})*|([1-9]\\d*))(\\.\\d{2})))|(\\<)*(\\>)");
var arr = rx.exec(company);
var arr1 = company.match(rx);
if (arr[1] != null) {
    var co = arr[1].replace(",", "");
}
Since you say you need only AAA, consider the code below.
It takes the substring that starts one character after the first '>' in company and ends just before the last '<'. However, if company may contain more < or > characters than this, you could go for a regex approach.
var company="<label>AAA</label>";
alert(company.substring(company.indexOf('>')+1, company.lastIndexOf('<')));
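The regex approach mentioned above could look something like the sketch below. The function name stripTags is made up for illustration, and the pattern assumes simple markup where no attribute value contains a > character; it is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: remove anything that looks like <...> from the string.
function stripTags(html) {
    return html.replace(/<[^>]*>/g, '');
}

var company = '<label>AAA</label>';
console.log(stripTags(company)); // "AAA"
```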

Using regular expression to split a string

I have a string which I need to separate correctly:
self.view.frame.size.height = 44
I need to get only view, frame, size, and height. And I need to do it with a regular expression.
So far I've tried a lot of variants, none of them are even close to what I want to get. And my code now looks like this:
var testString = 'self.view.frame.size.height = 44'
var re = new RegExp('\\.(.*)\\.', "g")
var array = re.exec(testString);
console.log('Array length is ' + array.length)
for (var i = 0; i < array.length; i++) {
console.log('<' + array[i] + ">");
}
And it doesn't work at all:
Array length is 2
<.view.frame.size.>
<view.frame.size>
I'm new at Javascript, so maybe I want the impossible, let me know.
Thanks.
In Javascript, executing a regexp with the g modifier doesn't return all the matches at once. You have to execute it repeatedly on the same input string, and each one returns the next match.
You also need to change the regexp so it only returns one word at a time. .* is greedy, so it returns the longest possible match, which is why it was returning everything between the first and last dots. [^.]* matches a sequence of non-dot characters, so it returns just one word. You can't include the second . in the regexp, because that would interfere with the repetition: each repetition starts searching after the end of the previous match, so the ending . of one word would no longer be available as the beginning . of the next. Also, there's no . after height, so the last word wouldn't match.
EDIT: I've changed the regexp to use \w* instead of [^.]*, because it was grabbing the whole height = 44 string instead of just height.
var testString = 'self.view.frame.size.height = 44';
var re = /\.(\w*)/g;
var array = [];
var result;
while (result = re.exec(testString)) {
    array.push(result[1]);
}
console.log('Array length is ' + array.length);
for (var i = 0; i < array.length; i++) {
    console.log('<' + array[i] + ">");
}
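On newer engines the exec loop can be replaced by String.prototype.matchAll, which returns an iterator over all matches when the regexp has the g flag. This is a sketch that assumes ES2020 support; the variable name words is arbitrary.

```javascript
var testString = 'self.view.frame.size.height = 44';

// matchAll yields one match object per occurrence; m[1] is the captured word.
var words = Array.from(testString.matchAll(/\.(\w+)/g), function (m) {
    return m[1];
});

console.log(words); // ["view", "frame", "size", "height"]
```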
If you're sure that your data will always be in the same format, you can use this:
function parse(string) {
    return string.split(" = ").shift().split(".").slice(1);
}
In your context, split is a MUCH better option:
var str = "self.view.frame.size.height = 44";
var bits1 = str.split(" ")[0];
var bits2 = bits1.split(".");
bits2.shift(); // get rid of the unwanted self
console.log(bits2);

Splitting an array at only certain places but not others

I understand the .split() function quite well. But what I can't seem to figure out is how to split in certain places but not in others. Sounds confusing? For example, let's say I use .split(",") on the following string:
div:(li,div),div
Is it possible to split it so that only the commas outside of the parentheses get split?
So the string above with the split method should return:
['div:(li,div)', 'div']
Of course at the moment it is also splitting the first comma inside of the parentheses, returning:
['div:(li', 'div)', 'div']
Is there some way to make this work like I desire?
If your expected strings are not going to become more complicated than this, you don't have to worry about writing code to parse them. Regex will work just fine.
http://jsfiddle.net/dC5HN/1/
var str = "div:(li,div),div:(li,div),div";
var parts = str.split(/,(?=(?:[^\)]|\([^\)]*\))*$)/g);
console.log(parts);
outputs:
["div:(li,div)", "div:(li,div)", "div"]
Regex is not built for this sort of thing, which is essentially parsing.
When faced with this sort of situation previously, I've first temporarily replaced the parenthesised parts with a placeholder, then split, then replaced the placeholders with the original parenthesised parts.
A bit hacky, but it works:
var str = 'div:(li,div),div',
    repls = [];
// first strip out parenthesised parts and store them in an array
str = str.replace(/\([^\)]*\)/g, function($0) {
    repls.push($0);
    return '*repl' + (repls.length - 1) + '*';
});
// with the parenthesised parts removed, split the string, then
// reinstate the removed parenthesised parts
var pieces = str.split(',').map(function(val) {
    return val.replace(/\*repl(\d+)\*/, function($0, $1) {
        return repls[$1];
    });
});
// test
console.log(pieces); // ["div:(li,div)", "div"]
This function will split on whatever you specify in splitChar, but ignore that character when it appears inside parentheses:
function customSplit(stringToSplit, splitChar) {
    var arr = new Array();
    var isParenOpen = 0;
    var curChar;
    var curString = "";
    for (var i = 0; i < stringToSplit.length; i++) {
        curChar = stringToSplit.substr(i, 1);
        switch (curChar) {
            case "(":
                isParenOpen++;
                break;
            case ")":
                if (isParenOpen > 0) isParenOpen--;
                break;
            case splitChar:
                if (isParenOpen < 1) {
                    arr.push(curString);
                    curString = "";
                    continue;
                }
        }
        curString += curChar;
    }
    if (curString.length > 0) {
        arr.push(curString);
    }
    return arr;
}
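One reason to prefer a counter over the regex is nesting: a depth counter keeps working when parentheses are nested. Here is a compact sketch of the same idea; the name splitTopLevel is made up for illustration, and it assumes a single-character separator.

```javascript
// Hypothetical helper: split on `separator` only where the parenthesis depth is zero.
function splitTopLevel(text, separator) {
    var parts = [];
    var depth = 0;   // number of currently open parentheses
    var start = 0;   // index where the current piece begins
    for (var i = 0; i < text.length; i++) {
        var ch = text.charAt(i);
        if (ch === '(') {
            depth++;
        } else if (ch === ')' && depth > 0) {
            depth--;
        } else if (ch === separator && depth === 0) {
            parts.push(text.slice(start, i));
            start = i + 1;
        }
    }
    parts.push(text.slice(start));
    return parts;
}

console.log(splitTopLevel('div:(li,(a,b)),div', ',')); // ["div:(li,(a,b))", "div"]
```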

need a regular expression to search a matching last name

I have a javascript array which holds strings of last names.
I need to loop this array and separate out the last names which match a given string.
var names = ['woods','smith','smike'];
var test = 'smi';
var c = 0;
var result = new Array();
for(var i = 0; i < names.length; i++)
{
if(names[i].match(test))// need regular expression for this
result[c++] = names[i];
}
return result;
A name should match even if the test string lies within the name, so mik should match both 'Mike' and 'Smike'.
Any help is really appreciated!
You can create a regex from a string:
var nameRe = new RegExp("mik", "i");
if(names[i].match(nameRe))
{
result.push(names[i]);
}
Make sure to escape regex metacharacters though, if your string may contain them. For example, ^ and $ may result in a mismatch, and *, ?, ) and others may result in an invalid regex.
More info: regular-expressions.info/javascript
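One common way to do that escaping is to put a backslash in front of every metacharacter before building the RegExp. The helper names escapeRegExp and filterNames below are made up for illustration; treat this as a sketch, not a built-in API.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in `text`.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep the names that contain `test`, ignoring case.
function filterNames(names, test) {
    var re = new RegExp(escapeRegExp(test), 'i');
    return names.filter(function (name) {
        return re.test(name);
    });
}

console.log(filterNames(['woods', 'smith', 'smike'], 'smi')); // ["smith", "smike"]
console.log(filterNames(['a.b', 'axb'], '.'));                // ["a.b"]
```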
You can do this without regex:
if (names[i].toLowerCase().indexOf(test.toLowerCase()) >= 0)
// ...
JavaScript's string .search can also do this. It returns the index of the first match, or -1 when there is none, so compare the result against -1 rather than relying on truthiness (a match at index 0 is falsy). Note that search converts a string argument to a regular expression.
var names = ['woods','smith','smike'];
var test = 'smi';
var result = new Array();
for (var i = 0; i < names.length; i++) {
    if (names[i].toLowerCase().search(test.toLowerCase()) !== -1) {
        result.push(names[i]);
    }
}
return result;
You can do this with one regex.
var r = new RegExp(names.join('|'), "igm");
'woods smith'.match(r);
You don't need regex for this, so I'd recommend using string manipulation instead. It's almost (almost!) always better to use string functions instead of regex when you can: They're usually faster, and it's harder to make a mistake.
for (var i = 0; i < names.length; i++) {
    if (names[i].indexOf(test) > -1) {
        // match, do something with names[i]...
    }
}

Javascript regex - split string

Struggling with a regex requirement. I need to split a string into an array wherever it finds a forward slash. But not if the forward slash is preceded by an escape.
Eg, if I have this string:
hello/world
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = world
And if I have this string:
hello/wo\/rld
I would like it to be split into an array like so:
arrayName[0] = hello
arrayName[1] = wo/rld
Any ideas?
I wouldn't use split() for this job. It's much easier to match the path components themselves, rather than the delimiters. For example:
var subject = 'hello/wo\\/rld';
var regex = /(?:[^\/\\]+|\\.)+/g;
var matched = null;
while (matched = regex.exec(subject)) {
    console.log(matched[0]);
}
output:
hello
wo\/rld
test it at ideone.com
The following is a little long-winded but will work, and avoids the problem with IE's broken split implementation by not using a regular expression.
function splitPath(str) {
    var rawParts = str.split("/"), parts = [];
    for (var i = 0, len = rawParts.length, part; i < len; ++i) {
        part = "";
        while (rawParts[i].slice(-1) == "\\") {
            part += rawParts[i++].slice(0, -1) + "/";
        }
        parts.push(part + rawParts[i]);
    }
    return parts;
}
var str = "hello/world\\/foo/bar";
alert( splitPath(str).join(",") );
Here's a way adapted from the techniques in this blog post:
var str = "Testing/one\\/two\\/three";
var result = str.replace(/(\\)?\//g, function($0, $1){
return $1 ? '/' : '[****]';
}).split('[****]');
Given:
Testing/one\/two\/three
The result is:
[0]: Testing
[1]: one/two/three
That first uses the simple "fake" lookbehind to replace / with [****] and to replace \/ with /, then splits on the [****] value. (Obviously, replace [****] with anything that won't be in the string.)
/*
If you are getting your string from an ajax response or a data base query,
that is, the string has not been interpreted by javascript,
you can match character sequences that either have no slash or have escaped slashes.
If you are defining the string in a script, escape the escapes and strip them after the match.
*/
var s='hello/wor\\/ld';
s=s.match(/(([^\/]*(\\\/)+)([^\/]*)+|([^\/]+))/g) || [s];
alert(s.join('\n'))
s.join('\n').replace(/\\/g,'')
/* returned value: (String)
hello
wor/ld
*/
Here's an example at rubular.com
For short code, you can use reverse to simulate negative lookbehind
function reverse(s){
return s.split('').reverse().join('');
}
var parts = reverse(myString).split(/[/](?!\\(?:\\\\)*(?:[^\\]|$))/g).reverse();
for (var i = parts.length; --i >= 0;) { parts[i] = reverse(parts[i]); }
but to be efficient, it's probably better to split on /[/]/ and then walk the array and rejoin elements that have an escape at the end.
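On engines that support lookbehind assertions (added in ES2018), the reversal trick is not needed, because the split pattern can check the preceding character directly. This is a sketch with a made-up function name, splitUnescaped; it assumes a backslash only ever appears as the escape for a slash and does not handle escaped backslashes.

```javascript
// Hypothetical helper: split on "/" unless it is immediately preceded by "\".
function splitUnescaped(path) {
    return path.split(/(?<!\\)\//).map(function (part) {
        // Turn each escaped slash back into a plain slash.
        return part.replace(/\\\//g, '/');
    });
}

console.log(splitUnescaped('hello/world'));    // ["hello", "world"]
console.log(splitUnescaped('hello/wo\\/rld')); // ["hello", "wo/rld"]
```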
Something like this may take care of it for you.
var str = "/hello/wo\\/rld/";
var split = str.replace(/^\/|\\?\/|\/$/g, function(match) {
if (match.indexOf('\\') == -1) {
return '\x00';
}
return match;
}).split('\x00');
alert(split);
