Javascript skip double pipes in a string - javascript

I have the following string:
var test = "test|2014-07-22 12:13:47||ASD|\|nameOfSomething123\||anothersmt";
var s = test.split('|');
console.log(s);
//outputs
[ 'test',
'2014-07-22 12:13:47',
'',
'ASD',
'',
'nameOfSomething123',
'',
'anothersmt' ]
Because |nameOfSomething123| is itself wrapped in pipes, split('|') produces extra empty strings on either side of it, which is not the result I want.
I would like to split the string but skip the escaped pipes in \|nameOfSomething123\|.
Does anyone know how to solve this?
Thank you.

First, I'm going to assume that your test string actually contains \| sequences. If you were to write the string literal as you've shown, \| would be interpreted as an escape sequence for |. For this script to work as you've shown, you'd need to write test like this:
var test = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";
You can accomplish this pretty easily using match instead of split:
test.match(/(\\\||[^|])+/g);
// outputs
[ "test",
"2014-07-22 12:13:47",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This pattern matches one or more sequences of either \| or any character other than |. Note that the \ and the | need to be escaped to refer to literal \ and | characters. Given your sample input, this should accomplish the goal. (Of course, if the \ can be escaped too, that complicates it a bit.)
If you need to capture empty strings between two pipes like ||, then you can use split around the matched values and filter out the separators. For example:
test.split(/((?:\\\||[^|])*)/g).filter(function(x, i) { return i % 2 });
// outputs
[ "test",
"2014-07-22 12:13:47",
"",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This works because split returns any captured substrings as separate entries in the result array; filter then picks every other element. Note that filter requires ECMAScript 5 or later, so it may not work in older browsers. If this is a problem, see the polyfill option described in the linked documentation.
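If the backslash itself can be escaped, a small character scanner may be easier to reason about than a regex. The following is only a sketch: splitEscaped is a hypothetical helper name, and it assumes that \\ means a literal backslash and \| a literal pipe, and that the escape character should be dropped from the resulting fields.

```javascript
// Hypothetical helper: split on "|" unless it is escaped with a backslash.
// Assumption: "\\" is an escaped backslash and "\|" an escaped pipe.
function splitEscaped(input) {
  var fields = [];
  var current = "";
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    if (ch === "\\" && i + 1 < input.length) {
      // Copy the escaped character verbatim and drop the escape itself.
      current += input.charAt(i + 1);
      i++;
    } else if (ch === "|") {
      // Unescaped pipe: close the current field (possibly empty).
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

var sample = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";
var fields = splitEscaped(sample);
console.log(fields);
// [ 'test', '2014-07-22 12:13:47', '', 'ASD', '|nameOfSomething123|', 'anothersmt' ]
```

Unlike the regex versions, this keeps the empty field between || and returns the name with its pipes already unescaped.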

I don't see why this is a hard problem. If your separator is always |, then the only case when you get an empty string from .split is going to be when you have a double | (or triple or quadruple). As long as the double pipes have no semantic purpose for you, all you need to do is get rid of the empty strings:
function check_for_empty_string(element) {
  // keep only non-empty strings
  return element.length !== 0;
}
s = s.filter(check_for_empty_string);
Now s should contain only non-empty strings and you're done. Array.prototype.filter is a JavaScript built-in that takes a callback to test each element; every element for which the callback returns a truthy value is copied into the new array. Here I've assigned the result back to the old variable for brevity, but .filter returns a new array, so you can keep the old one if you want.
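As a small illustration of that last point, the sketch below uses made-up sample data (the names parts and nonEmpty are purely illustrative) to show that filter leaves the original array untouched:

```javascript
// Illustrative data only: a string with a doubled and a trailing pipe.
var parts = "a||b|".split("|");
var nonEmpty = parts.filter(function (part) {
  return part.length > 0; // a truthy result keeps the element
});

console.log(parts);    // [ 'a', '', 'b', '' ]  (unchanged)
console.log(nonEmpty); // [ 'a', 'b' ]
```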

Related

Using a split with a regex not replacing correctly?

I'm trying to parse rows of text in order to retrieve the 4 version numbers:
v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999
I'm looking to be able to parse a line like this into:
['v.1.7.600.0', 'latest', '9.2.6200.0', '9.2.9999']
At the moment, I have something like this:
var line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999"
var result = line.split(/ (\||-|to) /g)
console.log(result)
I'm not that great at regex, but the pattern matches, so I'm not sure why the delimiters are included in the result.
You are almost there, just use a non-capturing group:
var line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999";
var result = line.split(/\s+(?:\||-|to)\s+/);
console.log(result);
You need a non-capturing group because split() will extract captured values into the resulting array.
Also, it might be more convenient to match one or more whitespace characters with \s+ rather than using a literal space.
Besides, the /g modifier is redundant with split(); splitting on every match is the default behavior.
You also may define a character class for single char delimiters, and write a bit more compact /\s+(?:[|-]|to)\s+/ regex.
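To see the difference side by side, here is a short sketch that runs the question's capturing pattern and the non-capturing variant on the same line (the variable names withCapture and withoutCapture are only for illustration):

```javascript
var line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999";

// Capturing group: each matched delimiter is spliced into the result.
var withCapture = line.split(/ (\||-|to) /);
console.log(withCapture);
// [ 'v.1.7.600.0', '-', 'latest', '|', '9.2.6200.0', 'to', '9.2.9999' ]

// Non-capturing group: delimiters are dropped.
var withoutCapture = line.split(/\s+(?:[|-]|to)\s+/);
console.log(withoutCapture);
// [ 'v.1.7.600.0', 'latest', '9.2.6200.0', '9.2.9999' ]
```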

How to write regex for this javascript string

How can I rewrite the strings below
"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48",...and so on
Like this:
(22.0796251, 82.13914120000004) 36
(22.744108, 77.73696700000005) 48
...and so on
How can I do this using a regex in JavaScript?
My attempt is this:
substring = test.split(',');
where test contains the data to be formatted. But it's wrong.
You should use the ability of split to split on regular expressions and then keep them in the results. To do this, simply put a capturing group in the regexp. In your case, you will "split" on things in double quote marks:
pieces = test.split(/(".*?")/)
//                   ^^^^^^^ capture group
// ['', '"(22.0796251, 82.13914120000004),36"', ', ', '"(22.744108, 77.73696700000005),48"', '']
The question mark is to make sure it doesn't eat up all the characters up through the last quote in the input. It makes the * quantifier "non-greedy".
Now get rid of the junk (empty strings and ", "):
pieces = pieces.filter(function (seg) { return !/^[, ]*$/.test(seg); });
// ['"(22.0796251, 82.13914120000004),36"', '"(22.744108, 77.73696700000005),48"']
Next you can break down each piece with another regexp, as in
arrays = pieces.map(function (piece) { return piece.match(/(\(.*\)),(\d+)/).slice(1); });
// [["(22.0796251, 82.13914120000004)", "36"], ["(22.744108, 77.73696700000005)", "48"]]
The slice is to get rid of the first element of the array returned by match, which is the entire match and we don't need that.
Now print out arrays, split its elements further, or do whatever else you want with it.
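For comparison, the same result can be reached in a single pass with a global regex and an exec loop instead of split, filter and map. This is a sketch of a different technique than the one above, and it assumes test is one string that contains the literal double quotes shown in the question:

```javascript
// Assumption: the input is a single string containing literal double quotes.
var test = '"(22.0796251, 82.13914120000004),36", "(22.744108, 77.73696700000005),48"';

// Group 1: the parenthesised coordinate pair; group 2: the trailing number.
var itemRe = /"(\([^)]*\)),(\d+)"/g;
var lines = [];
var m;
while ((m = itemRe.exec(test)) !== null) {
  lines.push(m[1] + " " + m[2]);
}

console.log(lines.join("\n"));
// (22.0796251, 82.13914120000004) 36
// (22.744108, 77.73696700000005) 48
```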

Javascript regex, determining what group was matched on

I have the following regex in javascript for matching similar to book[n], book[1,2,3,4,5,...,n], book[author="Kristian"] and book[id=n] (n is an arbitrary number):
var opRegex = /\[[0-9]+\]|\[[0-9]+,.*\]|\[[a-zA-Z]+="*.+"*\]/gi;
I can use this in the following way:
// If there is no match in any of the groups hasOp will be null
hasOp = opRegex.exec('books[0]');
/*
Result: ["[0]", index: 5, input: "books[0]"]
*/
As shown above I not only get the value but also the [ and ]. I can avoid this by using groups. So I changed the regex to:
var opRegex = /\[([0-9]+)\]|\[([0-9]+,.*)\]|\[([a-zA-Z]+=".+")\]/gi;
Running the same as above the results will instead be:
["[0]", "0", undefined, undefined, index: 5, input: "books[0]"]
Above I get the groups at index 1, 2 and 3 in the array. For this example the match is in the first group, but if the match is in the second regex group it will be at index 2 of the array.
Can I change my first regex to get the value without the brackets or do I go with the grouped approach and a while loop to get the first defined value?
Anything else I'm missing? Is it greedy?
Let me know if you need more information and I'll be happy to provide it.
I have a few suggestions. First, especially since you are looking for literal brackets, avoid the regex brackets when you can (replace [0-9] with \d, for example). Also, you were allowing multiple quotes with the *, so I changed it to "?. But most importantly, I moved the match for the brackets outside the alternation, since they should be in every alternate match. That way, you have the same group no matter which part matches.
/\[(\d+(,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/gi
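A short sketch of how this might be used follows. readOp is a hypothetical helper name; the inner comma group is made non-capturing here so that group 1 is the only capture, and the g flag is dropped because a global regex keeps state in lastIndex between exec calls.

```javascript
// Variant of the pattern above: inner group is non-capturing, no g flag.
var opRegex = /\[(\d+(?:,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/i;

// Hypothetical helper: returns the bracket contents, or null if none.
function readOp(path) {
  var match = opRegex.exec(path);
  return match ? match[1] : null;
}

console.log(readOp('books[0]'));                // 0
console.log(readOp('book[1,2,3]'));             // 1,2,3
console.log(readOp('book[author="Kristian"]')); // author="Kristian"
console.log(readOp('book'));                    // null
```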

Javascript - regular expression to split string on unescaped character, e.g. | but ignore \|

I read a string from file that I split on | character. For example the string is
1|test pattern|prefix|url|postfix
So split must always give me 5 substrings, which in the above case are
["1", "test pattern", "prefix", "url", "postfix"]
The problem comes in when any of these five substrings contains | character. I would store it as escaped \|
1|test pattern|prefix|url \| title |postfix
Now, you can see that string.split('|') won't give me the desired result. The desired result is
["1", "test pattern", "prefix", "url \| title ", "postfix"]
I have tried some regular expressions but none of these gives desired result.
string.split(/[^\\]\|/) //["", "", "prefi", "$url \| $titl", " postfix"]
It looks like this is only possible with a negative lookbehind, but I could not get one to work.
Another solution:
"1|test pattern|prefix|url \\| title |postfix"
.replace(/([^\\])\|/g, "$1$1|")
.split(/[^\\]\|/);
That said, you'll need to escape your backslash in the initial string with another backslash to make it work:
"1|test pattern|prefix|url \\| title |postfix"
                           ^
Unfortunately, at the time this was written JavaScript did not support lookbehinds (they were added later, in ES2018). Without them I see no easy solution, but the following might be suitable as a workaround:
// use two backslashes in your string!
var string = '1|test pattern|prefix|url \\| title |postfix';
// pick a substitute character that never occurs in the data
var sub = "-";
var parts = string.replace(/\\\|/g, sub).split(/\|/);
// put the escaped pipes back into each field
parts = parts.map(function (part) { return part.split(sub).join("\\|"); });
Alternatively you could use something like this:
string.split(/\|\b/)
However, this might fail in some circumstances when whitespace is involved, because it only splits on a | that is immediately followed by a word character.
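In engines that implement the ES2018 lookbehind assertions, the split can be written directly with a negative lookbehind. The sketch below assumes such an engine; the names record and byLookbehind are illustrative, and the pattern does not handle an escaped backslash in front of a pipe.

```javascript
// Requires an engine with ES2018 lookbehind support.
var record = "1|test pattern|prefix|url \\| title |postfix";

// Split on "|" only when it is not immediately preceded by a backslash.
var byLookbehind = record.split(/(?<!\\)\|/);

console.log(byLookbehind);
// [ '1', 'test pattern', 'prefix', 'url \\| title ', 'postfix' ]
```

The escaped pipe stays in the fourth field as \| and can be unescaped afterwards if needed.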
Instead of using split() you could match all occurrences that you're interested in:
var rx = /([^\\\|]|\\\|?)+/gi, item, items = [];
while (item = rx.exec(str)) {
items.push(item[0]);
}
'foo|bar\\|baz'.match(/(\\\||[^|])+/g)
This finds all sequences of characters that comprise the escaped splitting character or any character that isn't the splitting character.

Split string by HTML entities?

My string contains a lot of HTML entities, like this
&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;
And I want to split it by HTML entities into this:
Hello
everybody
there
Can anybody suggest a way to do this, please? Maybe using a regex?
It looks like you can just split on the &[^;]*; regex. That is, the delimiters are strings that start with &, end with ;, and in between can contain anything but ;.
If you can have multiple delimiters in a row, and you don't want the empty strings between them, just use (&[^;]*;)+ (or in general (delim)+ pattern).
If you can have delimiters at the beginning or end of the string, and you don't want the empty strings caused by them, then just trim them away before you split.
Example
Here's a snippet to demonstrate the above ideas (see also on ideone.com):
var s = "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;";
print (s.split(/&[^;]*;/));
// ,Hello,,everybody,,there,
print (s.split(/(?:&[^;]*;)+/));
// ,Hello,everybody,there,
print (
s.replace(/^(?:&[^;]*;)+/, "")
.replace(/(?:&[^;]*;)+$/, "")
.split(/(?:&[^;]*;)+/)
);
// Hello,everybody,there
var a = str.split(/\&[#a-z0-9]+\;/); should do it, although you'll end up with empty slots in the array when you have two entities next to each other.
split(/&.*?;(?=[^&]|$)/)
and cut the last and first result:
["", "Hello", "everybody", "there", ""]
>> "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;".split(/(?:&[^;]+;)+/)
['', 'Hello', 'everybody', 'there', '']
The regex is: /(?:&[^;]+;)+/
Matches entities as & followed by 1+ non-; characters, followed by a ;. Then matches at least one of those (or more) as the split delimiter. The (?:expression) non-capturing syntax is used so that the delimiters captured don't get put into the result array (split() puts capture groups into the result array if they appear in the pattern).
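Putting these ideas together, the empty strings at both ends can also be dropped with filter instead of trimming the input first. This is a minimal sketch, assuming the input uses entities such as &quot; and &nbsp; as in the sample; the names html and words are illustrative.

```javascript
// Sample input with entities at both ends and adjacent entities in the middle.
var html = "&quot;Hello&nbsp;&lt;everybody&gt;&nbsp;there&quot;";

// Split on runs of one or more entities, then discard empty pieces.
var words = html.split(/(?:&[^;]+;)+/).filter(function (piece) {
  return piece !== "";
});

console.log(words);
// [ 'Hello', 'everybody', 'there' ]
```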
