Complex regex to split up a string - javascript

I need some help with a regex conundrum, please. I'm still getting to grips with it all and am clearly not an expert!
For example, say I have a complex string like so:
{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}
First of all I want to split the string into an array by using the pipe, so it is effectively like:
{something:here}{examp.le:!/?foo|bar}BLAH
and
{something/else:here}:{and:here\\}(.)}
But notice that there is a pipe within the curly brackets that must be ignored, so I need to work out the regex for this. I was using indexOf originally, but having to take into account pipes within the curly brackets complicates things.
And it isn't over yet! I also then need to split each string into separate parts by what is within the curly brackets and not. So I end up with 2 arrays containing:
Array1
{something:here}
{examp.le:!/?foo|bar}
BLAH
Array2
{something/else:here}
:
{and:here\\}(.)}
I added a double backslash before the first closing curly bracket as a way of saying to ignore this one, but I cannot figure out the regex to do this.
Can anyone help?

Find all occurrences of "string in braces" or "just string", then iterate through found substrings and split when a pipe is encountered.
var str = "{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}";
// Tokens are either "{...}" groups or runs of text outside braces
var m = str.match(/{.+?}|[^{}]+/g);
var r = [[]]; // one inner array per pipe-separated part
var n = 0;
for (var i = 0; i < m.length; i++) {
    var s = m[i];
    if (s.charAt(0) == "{" || s.indexOf("|") < 0)
        r[n].push(s);
    else {
        // A pipe outside braces: start a new part
        s = s.split("|");
        if (s[0].length) r[n].push(s[0]);
        r[++n] = [];
        if (s[1].length) r[n].push(s[1]);
    }
}
This expression will probably handle escaped braces better:
var m = str.match(/{?(\\.|[^{}])+}?/g);
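As a minimal sketch of the whole task in one pass (not the answerer's code): the tokenizer below treats a brace group, a pipe outside braces, and a run of plain text as three token kinds, and lets a backslash escape the next character inside a group. The function name splitTemplate and the token pattern are assumptions made for illustration. An opening brace that never closes is silently skipped by this pattern.

```javascript
// Hypothetical helper: split on top-level pipes, then into brace groups and plain text.
function splitTemplate(input) {
  // 1) "{...}" where "\x" escapes x, 2) a pipe outside braces, 3) plain text
  var tokenPattern = /\{(?:\\.|[^\\}])*\}|\||[^{|]+/g;
  var tokens = input.match(tokenPattern) || [];
  var groups = [[]];
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === "|") {
      groups.push([]); // a top-level pipe starts a new array
    } else {
      groups[groups.length - 1].push(tokens[i]);
    }
  }
  return groups;
}

var sample = "{something:here}{examp.le:!/?foo|bar}BLAH|{something/else:here}:{and:here\\}(.)}";
var result = splitTemplate(sample);
console.log(result);
```

Because the pipe inside {examp.le:!/?foo|bar} is consumed as part of the brace group, only the pipe after BLAH produces a new array.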

Related

Retrieving several capturing groups recursively with RegExp

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There might be unlimited #-separated words, but definitely the whole string begins with #
I have written the following regexp, though it matches it, but I cannot get each #-separated word, and what I get is the last recursion and the first (as well as the whole string). How can I get an array of every word in an element separately?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this, with a regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.
You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With'
if (s.substr(0, 1) == "#")
tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]
You could try this regex also,
((?:#|#)\w+)
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
var strSplit = str.split("#");
for(var i = 1; i < strSplit.length; i++) {
strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
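On why the question's pattern only yields "#With": a capturing group that is repeated, such as (\#\w*)+, keeps only the text of its last iteration. A minimal sketch of a RegExp-only alternative is to match one word at a time with the g flag. The variable names below are chosen for illustration.

```javascript
var input = "#someID#tn#company#somethingNew#classing#somethingElse#With";

// With the g flag, match() returns every full match rather than capture groups.
var words = input.match(/#\w+/g) || [];

// The same result with an exec() loop, as in the question's code.
var pattern = /#\w+/g;
var found = [];
var m;
while ((m = pattern.exec(input)) !== null) {
  found.push(m[0]); // m[0] is the whole match, e.g. "#someID"
}

console.log(words);
console.log(found);
```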

Split a string ( {a},{b} ) with RegExp in Javascript

I have a string object that is returned by an API. It looks like this:
{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}
I only need to split at commas that have } and { on either side, and I want to keep the braces as part of the returned result. Doing split("},{") results in the first and last entries having leading and trailing brackets, and when only one element is returned I have to make additional checks to ensure I don't add any extra brackets to the first and last (which is the same as the first) elements.
I hope there is an elegant RegExp to split at a , surrounded by } and {.
You need to use a positive lookahead to match only a comma that is followed by an opening curly brace. I've tested this and it works:
var apiResponse = "{Apple},{\"A tree\"},{Three2},{123},{A bracket {},{Two brackets {}},{}";
var split = apiResponse.split(/,(?=\{)/);
console.log("Split length is "+split.length);
for(var i = 0; i < split.length; ++i) {
    console.log("split["+i+"] is: "+split[i]);
}
The (?=\{) means "must be immediately followed by an opening curly brace".
To read about lookaheads, see this regex tutorial.
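A minimal sketch of the lookahead split wrapped in a function, to show that the single-element case needs no special handling. The name splitBracedList is hypothetical. Note that this only checks the character after the comma, so it would also split at a ",{" that appears inside an element.

```javascript
// Hypothetical helper: split "{a},{b},{c}" at commas that are followed by "{".
function splitBracedList(response) {
  if (response === "") return []; // assumption: an empty response means no items
  return response.split(/,(?=\{)/);
}

var items = splitBracedList('{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}');
console.log(items.length); // 7
console.log(splitBracedList("{Apple}")); // a single element comes back unchanged
```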
var _data = '{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}';
var _items = [];
var re = /(^|,){(.*?)}(?=,{|$)/g;
var m;
while ((m = re.exec(_data)) !== null){
_items.push(m[2]);
}
You can test it out using jsFiddle http://jsfiddle.net/wao20/SgFx7/24/
Regex breakdown:
(^|,)     Start of the string or a comma
{         A literal bracket "{"
(.*?)     Non-greedy match between the two brackets (for more info see http://javascript.info/tutorial/greedy-and-lazy)
}         A literal bracket "}"
(?=,{|$)  Non-consuming lookahead (matches ",{" or the end of the string). Without the lookahead the match would eat the comma and you would end up with only every other item.
Update: Changed regex to address Robin's comments.
/(^|,)\{(.*?)\}(?=,|$)/g to /(^|,){(.*?)}(?=,{|$)/g
This should work for the string as provided - it doesn't account for whitespace between braces and commas, nor does it retain the brace-comma-brace pattern within quotes.
var str = '{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}';
var parts = [];
var nextIndex = function(str) {
return (str.search(/},{/) > -1) ? str.search(/},{/) + 1 : null;
};
while (nextIndex(str)) {
parts.push(str.slice(0, nextIndex(str)));
str = str.slice(nextIndex(str) + 1);
}
parts.push(str); // Final piece
console.log(parts);

How to use string.split( ) for the following string in javascript

I have the following string: (17.591257793993833, 78.88544082641602) in Javascript
How do I use split() for the above string so that I can get the numbers separately.
This is what I have tried (I know its wrong)
var location= "(17.591257793993833, 78.88544082641602)";
var sep= location.split("("" "," "")");
document.getElementById("TextBox1").value= sep[1];
document.getElementById("Textbox2").value=sep[2];
Suggestions please
Use a regular expression; something as simple as the following would work:
// returns an array with two string elements: ["17.591257793993833", "78.88544082641602"]
"(17.591257793993833, 78.88544082641602)".match(/(\d+\.\d+)/g)
You could use a regular expression together with the match function. That would help you a lot.
A possible regexp for you might be:
/\d+\.\d+/g
For more information you can start with wiki: http://en.wikipedia.org/wiki/Regular_expression
Use the regex [0-9]+\.[0-9]+. You can try the regex here.
In javascript you could do
var str = "(17.591257793993833, 78.88544082641602)";
str.match(/(\d+\.\d+)/g);
Check it.
If you want the values as numbers, i.e. typeof x == "number", you have to use a regular expression to get the numbers out and then convert those strings into numbers, i.e.
var numsStrings = location.match(/(\d+\.\d+)/g),
numbers = [],
i, len = numsStrings.length;
for (i = 0; i < len; i++) {
numbers.push(+numsStrings[i]);
}
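The match-then-convert steps above can be sketched as one small function. This version also accepts a leading minus sign and integers without a decimal part, which is an assumption beyond the question's sample. The name parseCoordinates is hypothetical.

```javascript
// Hypothetical helper: pull every number out of "(lat, lng)" and convert to Number.
function parseCoordinates(text) {
  // Optional minus, digits, optional fractional part; the dot is escaped.
  var matches = text.match(/-?\d+(?:\.\d+)?/g) || [];
  return matches.map(Number);
}

var coords = parseCoordinates("(17.591257793993833, 78.88544082641602)");
var lat = coords[0];
var lng = coords[1];
console.log(typeof lat, lat, lng);
```

The two values could then be assigned to the text boxes from the question with document.getElementById(...).value.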

Javascript split function not correct worked with specific regex

I have a problem. I have the string "\,str\,i,ing" and I need to split it on commas that are not preceded by a backslash. For my string the result should be ["\,str\,i", "ing"]. I'm using the following regex:
myString.split("[^\],", 2)
but it doesn't work.
This is a ridiculous way to work around the lack of lookbehind, but it seems to give the correct result.
"\\,str\\,i,ing".split('').reverse().join('').split(/,(?=[^\\])/).map(function(a){
return a.split('').reverse().join('');
}).reverse();
//=> ["\,str\,i", "ing"]
Not sure about your expected output, but you are passing a string, not a regex. Use:
var arr = "\\,str\\,i,ing".split(/[^\\],/, 2); // backslashes must be escaped in the string literal
console.log(arr);
To split using a regex, wrap it in /.../
This is not easily possible in JS engines without lookbehind support (it was only added in ES2018). Even if you use a real regex, it eats the last character:
> "xyz\\,xyz,xyz".split(/[^\\],/, 2)
["xyz\\,xy", "xyz"]
If you don't want the z to be eaten, I'd suggest:
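var str = "....";
var parts = str.split(",").reduce(function(res, part) {
    var l = res.length;
    // Append to the previous item when this is not the first part and either
    // the previous item ended with a backslash (escaped comma) or the limit of 2 items is reached
    if (l && (res[l-1].substr(-1) == "\\" || l >= 2))
        res[l-1] += ","+part;
    else
        res.push(part);
    return res;
}, []);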
Reading between the lines, it looks like you want to split a string by , characters that are not preceded by \ characters.
Lookbehind (and negative lookbehind) would be ideal here, but JavaScript engines that predate ES2018 do not support it. What they do support is lookahead ((?=)) and negative lookahead ((?!)). Make sure to review the documentation.
You can use these as a lookbehind if you reverse the string:
var str,
reverseStr,
arr,
reverseArr;
//don't forget to escape your backslashes
str = '\\,str\\,i,ing';
//reverse your string
reverseStr = str.split('').reverse().join('');
//split the array on `,`s that aren't followed by `\`
reverseArr = reverseStr.split(/,(?!\\)/);
//reverse the reversed array, and reverse each string in the array
arr = reverseArr.reverse().map(function (val) {
return val.split('').reverse().join('');
});
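The answers in this thread predate ES2018, which added lookbehind assertions to JavaScript regular expressions. In a runtime that supports them, a minimal sketch of the same split needs no string reversal. The variable names are chosen for illustration, and the sketch assumes a backslash is only ever used to escape a comma (an escaped backslash before a comma is not handled).

```javascript
// Backslashes are doubled in the literal; the actual string is \,str\,i,ing
var escapedList = "\\,str\\,i,ing";

// (?<!\\) is a negative lookbehind: the comma must not be preceded by a backslash.
var pieces = escapedList.split(/(?<!\\),/);

console.log(pieces.length); // 2
console.log(pieces[0]);     // \,str\,i
console.log(pieces[1]);     // ing
```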
You picked a tough character to match: a backslash preceding a comma is apt to disappear while you pass it around in a string, since '\,'==','...
var s= 'My dog, the one with two \\, blue \\,eyes, is asleep.';
var a= [], M, rx=/(\\?),/g;
while((M= rx.exec(s))!= null){
if(M[1]) continue;
a.push(s.substring(0, rx.lastIndex-1));
s= s.substring(rx.lastIndex);
rx.lastIndex= 0;
};
a.push(s);
/* returned value: (Array)
My dog
the one with two \, blue \,eyes
is asleep.
*/
Find something which will not be present in your original string, say "###". Replace "\\," with it. Split the resulting string by ",". Replace "###" back with "\\,".
Something like this:
<script type="text/javascript">
var s1 = "\\,str\\,i,ing";
var s2 = s1.replace(/\\,/g,"###");
console.log(s2);
var s3 = s2.split(",");
for (var i=0;i<s3.length;i++)
{
s3[i] = s3[i].replace(/###/g,"\\,");
}
console.log(s3);
</script>
See JSFiddle

Splitting a String by an Array of Words in Javascript

I'm taking some text and want to split it into an array. My goal is to be able to split it into phrases delimited by stopwords (words ignored by search engines, like 'a' 'the' etc), so that I can then search each individual phrase in my API. So for example: 'The cow's hat was really funny' would result in arr[0] = cow's hat and arr[1] = funny. I have an array of stopwords already but I can't really think of how to actually split by each/any of the words in it, without writing a very slow function to loop through each one.
Use split(). It takes a regular expression. The following is a simple example:
search_string.split(/\b(?:a|the|was|\s)+\b/i);
If you already have the array of stop words, you could use join() to build the regular expression. Try the following:
regex = new RegExp("\\b(?:" + stop_words.join('|') + "|\\s)+\\b", "i");
A working example: http://jsfiddle.net/NEnR8/. NOTE: it may be better to replace these values rather than split on them, as the result contains empty array elements.
This does a case-insensitive .split() on your keywords, surrounded by word boundaries.
var str = "The cow's hat was really funny";
var arr = str.split(/\ba\b|\bthe\b|\bwas\b/i);
You may end up with some empty items in the Array. To compact it, you could do this:
var len = arr.length;
while( len-- ) {
if( !arr[len] )
arr.splice( len, 1);
}
Quick and dirty way would be to replace the "stop word" strings with some unique characters (e.g. &&&), and then split based on that unique character.
For example.
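var the_text = "..............",
    stop_words = ['foo', 'bar', 'etc'],
    unique_str = '&&&';
for (var i = 0; i < stop_words.length; i += 1) {
    // Strings are immutable, so assign the result; split/join replaces every occurrence
    the_text = the_text.split(stop_words[i]).join(unique_str);
}
var phrases = the_text.split(unique_str);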
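Putting the pieces from these answers together, here is a minimal sketch that builds the pattern from the stop-word array, escapes any regex metacharacters in the words, and drops the empty items. The function name splitOnStopWords and the sample stop-word list (including "really", which the question's expected output implies) are assumptions.

```javascript
// Hypothetical helper: split text into phrases delimited by whole-word stop words.
function splitOnStopWords(text, stopWords) {
  // Escape characters that are special in a regular expression.
  var escaped = stopWords.map(function (word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  // \b on both sides so "a" does not match inside "hat"; "i" ignores case.
  var pattern = new RegExp("\\b(?:" + escaped.join("|") + ")\\b", "i");
  return text.split(pattern)
    .map(function (phrase) { return phrase.trim(); })
    .filter(function (phrase) { return phrase.length > 0; });
}

var phrasesFound = splitOnStopWords("The cow's hat was really funny", ["a", "the", "was", "really"]);
console.log(phrasesFound); // ["cow's hat", "funny"]
```

Building one alternation means the text is scanned once, rather than once per stop word.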
