Using a split with a regex not replacing correctly? - javascript

I'm trying to parse rows of text in order to retrieve the 4 version numbers:
v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999
I'm looking to be able to parse a line like this into:
['v.1.7.600.0', 'latest', '9.2.6200.0', '9.2.9999']
At the moment, I have something like this:
var line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999"
var result = line.split(/ (\||-|to) /g)
console.log(result)
I'm not that great at regex, but it matches, so I'm not sure why it includes the delimiters in the result.

You are almost there, just use a non-capturing group:
var line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999";
var result = line.split(/\s+(?:\||-|to)\s+/);
console.log(result);
You need a non-capturing group because split() will extract captured values into the resulting array.
Also, it might be more convenient to match one or more whitespace characters with \s+ rather than using a literal space.
Besides, the /g modifier is redundant with split(): splitting on every match is the default behavior.
You may also define a character class for the single-character delimiters and write a slightly more compact regex: /\s+(?:[|-]|to)\s+/
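As a minimal sketch of the difference, the snippet below runs the question's capturing pattern and the non-capturing pattern from this answer side by side on the same line. The variable names withCapture and withoutCapture are only illustrative.

```javascript
const line = "v.1.7.600.0 - latest | 9.2.6200.0 to 9.2.9999";

// Capturing group: split() splices each captured delimiter into the result.
const withCapture = line.split(/ (\||-|to) /);
console.log(withCapture);
// [ 'v.1.7.600.0', '-', 'latest', '|', '9.2.6200.0', 'to', '9.2.9999' ]

// Non-capturing group: only the pieces between the delimiters are returned.
const withoutCapture = line.split(/\s+(?:[|-]|to)\s+/);
console.log(withoutCapture);
// [ 'v.1.7.600.0', 'latest', '9.2.6200.0', '9.2.9999' ]
```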

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.

You can split on positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
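If the identifier and its value are needed separately afterwards, one possible follow-up step is sketched below. The pairs array and the ai / value property names are hypothetical, chosen only for this illustration, and the sketch assumes every part starts with a "(digits)" group as in the sample string.

```javascript
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// Split before every "(digits)" group. The lookahead consumes no characters,
// so each delimiter stays attached to the value that follows it.
const parts = text.split(/(?=\(\d+\))/);

// Hypothetical follow-up: separate each identifier from its value.
const pairs = parts.map((part) => {
  const m = part.match(/^\((\d+)\)(.*)$/);
  return { ai: m[1], value: m[2] };
});

console.log(parts.length); // 5
console.log(pairs[1]);     // { ai: '240', value: 'A' }
```

Because the result is an array in input order, this does not depend on the order of the identifiers in the string.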

Instead of split, use match to create the array. The pattern 1) finds digits in parentheses, followed by a run of digits, uppercase letters, or hyphens, and 2) groups that whole match.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));

Regexp group not excluding dots

Let's say I have the following string: div.classOneA.classOneB#idOne
I'm trying to write a regexp that extracts the classes (classOneA, classOneB) from it. I was able to do this, but only with a lookbehind assertion.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I don't really understand why my solution isn't working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
I thought that the grouping would solve my problem, but every matched item still contains the dot.

There isn't a good way in Javascript to both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
  output.push(matches[1]);
}
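In engines that support ES2020, String.prototype.matchAll offers another way to get the capture groups of every match without writing the loop by hand. This is only a sketch of that alternative, assuming such an engine is available; it is not part of the answer above.

```javascript
const input = "div.classOneA.classOneB#idOne";

// matchAll() requires the g flag and yields one match array per occurrence,
// each including its capture groups. Array.from maps every match to group 1.
const classes = Array.from(input.matchAll(/\.([^.#]+)/g), (m) => m[1]);

console.log(classes); // [ 'classOneA', 'classOneB' ]
```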

This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if the (...) pairs worked just like (?:...) ones).
You can see it here. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches, but only the whole match (item 0) for each.
There are many solutions:
1. Use lookbehind assertions as you pointed out yourself.
2. Fix each result later adding .map(x=>x.replace(/^\./, ""))
3. Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
4. Use .replace() + callback instead of .match() in order to be able to access capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).

If you want to keep your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)

If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you must use lookbehind.

This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were only added to JavaScript in ES2018, so they are not available in older engines.

I'm not an expert on using regex - particularly in JavaScript - but after some research on MDN I've figured out why your attempt wasn't working, and how to fix it.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
  result.push(grp[1]);
}
console.log(result)

javascript regular expression ignore condition between the text

I have a text, for example, as below:
"head1>data1,data2,data3|head2>data1,data2,data3|head3>data3,data4,data5"
Now I want to replace ">data1..|" with "|"
I am using this: ".replace(/>\S+\||>\S+$/g,"|");"
But this is not helping as it gives me data as below:
"head1|head3|" instead of "head1|head2|head3|"
I am unable to find the right method.

You can use
>\S+?(?:\||$)
See the regex demo
The point is to make \S+ lazy, and to shorten the pattern we can place the >\S+? before the alternation group.
Pattern details:
>\S+? - a literal > followed with 1+ non-whitespace symbols but as few as possible up to
(?:\||$) - a literal | or the end of string.
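To see the effect of the lazy quantifier, here is a small sketch that applies both the question's greedy pattern and the lazy pattern to the sample text. The variable names greedy and lazy are only illustrative.

```javascript
const str = "head1>data1,data2,data3|head2>data1,data2,data3|head3>data3,data4,data5";

// Greedy \S+ runs through the first | and swallows head2 along with it.
const greedy = str.replace(/>\S+\||>\S+$/g, "|");
console.log(greedy); // "head1|head3|"

// Lazy \S+? stops at the first | (or at the end of the string).
const lazy = str.replace(/>\S+?(?:\||$)/g, "|");
console.log(lazy); // "head1|head2|head3|"
```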

A simple approach that I tried:
var str = "head1>data1,data2,data3|head2>data1,data2,data3|head3>data3,data4,data5";
console.log(str.replace(/>[a-z0-9,]+/g,"|").replace(/\|+/g, "|"));
>[a-z0-9,]+ selects >data1,data2,data3, and the second replace then collapses multiple | into a single |.

You can use:
>[a-z0-9,]+\|
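and then replace this with a single | every time. Note that this pattern requires a trailing |, so it will not match the last segment (>data3,data4,data5), which runs to the end of the string.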

Javascript skip double pipes in a string

I have the following string:
var test = "test|2014-07-22 12:13:47||ASD|\|nameOfSomething123\||anothersmt";
var s = test.split('|');
console.log(s);
//outputs
[ 'test',
'2014-07-22 12:13:47',
'',
'ASD',
'',
'nameOfSomething123',
'',
'anothersmt' ]
Because \|nameOfSomething123\| also contains pipes, the result of split('|') is not good: I need to get rid of the empty entries on either side of nameOfSomething123.
I would like to split it, but skipping \|nameOfSomething123\|
Does anyone know how to solve it?
Thank you.

First, I'm going to assume that your test string actually contains \| sequences. If you were to write the string literal as you've shown, \| would be interpreted as an escape sequence for |. For this script to work as you've shown, you'd need to write test like this:
var test = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";
You can accomplish this pretty easily using match instead of split:
test.match(/(\\\||[^|])+/g);
// outputs
[ "test",
"2014-07-22 12:13:47",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This pattern matches one or more sequences of either \| or any character other than |. Note that the \ and the | need to be escaped to refer to literal \ and | characters. Given your sample input, this should accomplish the goal. (Of course, if the \ can be escaped too, that complicates it a bit.)
If you need to capture empty strings between two pipes like ||, then you can use split around the matched values and filter out the separators. For example:
test.split(/((?:\\\||[^|])*)/g).filter(function(x, i) { return i % 2 });
// outputs
[ "test",
"2014-07-22 12:13:47",
"",
"ASD",
"\|nameOfSomething123\|",
"anothersmt" ]
This works because split will return any captured substrings as a separate entry in the result array. Then filter just picks every other element from the result. Note that filter requires ECMAScript 5.1 or later, so it may not work in older browsers. If this is a problem, see the polyfill option described in the linked documentation.
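The split-and-filter idea can be sketched end to end as follows, assuming the string really contains \| sequences as discussed above. The final unescaping step and the names fields and unescaped are hypothetical additions for illustration, not part of the answer.

```javascript
const test = "test|2014-07-22 12:13:47||ASD|\\|nameOfSomething123\\||anothersmt";

// The captured fields land at the odd indices of the split result;
// the even indices hold the | separators and the leading/trailing "".
const fields = test
  .split(/((?:\\\||[^|])*)/)
  .filter((_, i) => i % 2 === 1);

// Hypothetical extra step: turn each escaped \| back into a plain |.
const unescaped = fields.map((f) => f.replace(/\\\|/g, "|"));

console.log(fields.length); // 6
console.log(unescaped[4]);  // "|nameOfSomething123|"
```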

I don't see why this is a hard problem. If your separator is always |, then the only case when you get an empty string from .split is going to be when you have a double | (or triple or quadruple). As long as the double pipes have no semantic purpose for you, all you need to do is get rid of the empty strings:
function check_for_empty_string(element){
  // Keep only the elements that are not the empty string
  return element.length !== 0;
}
s = s.filter(check_for_empty_string);
Now s should only contain non-empty strings and you're done. Array.filter is a JavaScript built-in that takes a callback that checks an element. Every element for which the callback returns a truthy value passes through the filter and into the new array. Here I've assigned the result back to the old variable, for brevity, but .filter returns a new array so you can keep the old one if you want.
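A minimal sketch of that behaviour, using the array printed in the question as input; the name nonEmpty is only illustrative.

```javascript
// The result of test.split('|') as printed in the question.
const s = ["test", "2014-07-22 12:13:47", "", "ASD", "", "nameOfSomething123", "", "anothersmt"];

// filter() keeps every element for which the callback returns a truthy value.
const nonEmpty = s.filter((element) => element.length !== 0);

console.log(nonEmpty);
// [ 'test', '2014-07-22 12:13:47', 'ASD', 'nameOfSomething123', 'anothersmt' ]
console.log(s.length); // 8, the original array is left untouched
```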

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a regular expression in JavaScript to trim the string to only the characters after the last slash in the URL?
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'

Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);
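Since substr is treated as a legacy method in current JavaScript references, the same idea can also be sketched with slice. The name lastSegment is only illustrative, and the second call shows what happens when the input contains no slash at all.

```javascript
const id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// lastIndexOf('/') finds the final slash; slice() takes everything after it.
const lastSegment = id.slice(id.lastIndexOf('/') + 1);
console.log(lastSegment); // "nabb80191e23b7d9"

// With no slash, lastIndexOf returns -1, so slice(0) returns the whole string.
const noSlash = 'nabb80191e23b7d9';
console.log(noSlash.slice(noSlash.lastIndexOf('/') + 1)); // "nabb80191e23b7d9"
```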

A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);

Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()

This is easy to understand: (?!.*/).+
Let me explain.
First, let's match everything up to and including a slash; that's the part we don't want:
.*/ matches everything until the last slash
Then we wrap it in a negative lookahead, (?!...), to say "I don't want this, discard it":
(?!.*/) fails at any position that still has a slash at or after it
Now we can happily take whatever comes after the part we don't want with this:
.+
You may need to escape the / so it becomes:
(?!.*\/).+
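A short sketch of this pattern applied to the ID from the question. The second input, "a/b/", is a made-up example showing that nothing is matched when the string ends in a slash.

```javascript
const id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
const re = /(?!.*\/).+/;

// The lookahead rejects every start position that still has a slash at or
// after it, so the match can only begin right after the last slash.
console.log(id.match(re)[0]); // "nabb80191e23b7d9"

// With a trailing slash nothing is left for .+ to match, so match() is null.
console.log('a/b/'.match(re)); // null
```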

this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"

This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9

I don't know JS, so this uses the other examples (and a guess):
id = id.match(/[^\/]*$/)[0]; // [0] is needed: match() returns an array, not a string

Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"
