Matching a line protocol string with regex in JavaScript - javascript

I'm trying to match a string like this: test0,id=28084 type=high,18765003 138456387
And I'm using this regex:
const str = `test0,id=28084 type=high,18765003 138456387`;
console.log(str.match(/\s*([a-zA-Z0-9\-_.]+)\s*,\s*([a-zA-Z0-9\-_.]+\s*=\s*[a-zA-Z0-9\-_.]+\s*)*\s*,\s*([0-9.]+)\s+([0-9]+)\s*/))
But I am not getting the id part. Just type=high.
Any help is appreciated.
Edit: I see that a repeated capture group only keeps its last match. What I did not state in the question is that I need to support a dynamic number of fields at that point in the string. I'm wondering if there's some other way to accomplish this.

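You can capture all the key-value pairs in a single group and then split them:
const str = `test0,id=28084 type=high,18765003 138456387`;
const matches = str.match(/\s*([a-zA-Z0-9\-_.]+)\s*,\s*((?:[a-zA-Z0-9\-_.]+\s*=\s*[a-zA-Z0-9\-_.]+\s*)*)\s*,\s*([0-9.]+)\s+([0-9]+)\s*/);
// Replace the combined capture at index 2 with its individual key=value pairs.
matches.splice(2, 1, ...matches[2].trim().split(/\s+/));
console.log(matches);
Notice that the only change to your regex is the second group, from (...)* to ((?:...)*). A repeated capture group keeps only its last iteration, so the repetition goes inside a non-capturing group and the outer group captures the whole run of pairs.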
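If an object is more convenient than a flat array, the same idea can be sketched in two passes: match the whole line once, then walk the captured run of pairs with a global regex. This is only a sketch; parseLine and the property names name, fields, value and timestamp are hypothetical labels, since the question does not say what the last two numbers mean. The class [\w.-] is equivalent to [a-zA-Z0-9\-_.].

```javascript
// Hypothetical helper: parseLine and the property names are illustrative only.
function parseLine(line) {
  const lineRe = /^\s*([\w.-]+)\s*,\s*((?:[\w.-]+\s*=\s*[\w.-]+\s*)*),\s*([0-9.]+)\s+([0-9]+)\s*$/;
  const m = line.match(lineRe);
  if (m === null) return null; // the line does not have the expected shape

  const fields = {};
  // Second pass: a global regex visits every key=value pair inside group 2.
  for (const [, key, val] of m[2].matchAll(/([\w.-]+)\s*=\s*([\w.-]+)/g)) {
    fields[key] = val;
  }
  return { name: m[1], fields, value: m[3], timestamp: m[4] };
}

console.log(parseLine("test0,id=28084 type=high,18765003 138456387"));
```

All captured values stay strings; convert them with Number() where that is needed.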

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
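You can split at every position that is followed by a parenthesised number, using a zero-length lookahead assertion. Because the lookahead consumes no characters, the delimiter stays attached to the value after it: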
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
Instead of split, use match with the g flag to build the array. The pattern matches 1) digits in parentheses, followed by 2) a run of digits, uppercase letters, or hyphens, and returns each such pair as one array element.
(PS. I often find that a site like Regex101 really helps for testing expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
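If the identifier and its value are needed separately, a possible follow-up is to capture them in two groups and iterate with matchAll. This is a sketch that assumes values never contain "(", which holds for the sample string but is not guaranteed by the question; the names udi and pairs are illustrative.

```javascript
const udi = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// Group 1 is the identifier inside the parentheses,
// group 2 is everything up to the next "(" (or the end of the string).
const pairs = Array.from(
  udi.matchAll(/\((\d+)\)([^(]*)/g),
  ([, ai, value]) => [ai, value]
);

console.log(pairs);
```

Keeping the result as an array of pairs preserves the input order and does not depend on which identifiers appear.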

Javascript regex to find a string and extract it from whole string

I have a Javascript array of string that contains urls like:
http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD
I want to extract the "first" query parameter values from these URLs.
That is, I need the values DSPN47ZTE1BGMR, RTR22414242144, KOSDFASEWQESAS, POLUJYUSD41234, 54SADFD14242RD
Because I am not good with regex, I couldn't find a way to extract these values from the array. Any help will be appreciated.
Instead of using regex, why not just create a URL object out of the string and extract the parameters natively?
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.searchParams.get("first")); // -> "54SADFD14242RD"
If you don't know the name of the first parameter, you can still search the query string of the URL object manually.
let url = new URL("http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD");
console.log(url.search.match(/\?[^=&]+=([^&]*)/)[1]); // -> "54SADFD14242RD"
Index 1 of the result is the first capture group, here the value of the first parameter (index 0 is the whole matched string). Note that .match returns null when nothing matches, so the code above throws an error if the URL has no parameters.
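Since the question starts from an array of URLs, the URL approach can be applied to every element with map. This is a minimal sketch; getParam is a hypothetical helper name, and returning null for unparsable strings is an assumption about the desired behaviour.

```javascript
const urls = [
  "http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD",
  "http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF",
  "http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA",
];

// Hypothetical helper: returns the parameter value, or null when the
// parameter is missing or the string is not a valid absolute URL.
function getParam(urlString, name) {
  try {
    return new URL(urlString).searchParams.get(name);
  } catch {
    return null; // new URL() throws a TypeError on invalid input
  }
}

const firstValues = urls.map((u) => getParam(u, "first"));
console.log(firstValues);
```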
Does it have to use regex? Would something like the following work:
var x = 'http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD';
x.split('?first=')[1].split('&second')[0];
Try this regex:
first=([^&]*)
Capture the contents of Group 1
Explanation:
first= - matches first=
([^&]*) - matches 0+ occurrences of any character that is not a & and stores it in Group 1
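A sketch of how this pattern could be applied to the array from the question. The names links, firstPattern and firsts are illustrative, and the leading [?&] is a small assumption added here so that a parameter such as notfirst= is not matched by accident.

```javascript
const links = [
  "http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD",
  "http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD",
];

// No g flag, so exec() is stateless and always searches from the start.
const firstPattern = /[?&]first=([^&]*)/;

const firsts = links.map((link) => {
  const m = firstPattern.exec(link);
  return m ? m[1] : null; // m[1] is Group 1; null when the link has no "first"
});

console.log(firsts);
```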
You can use
(?<=\?first=)[^&]+
(?<=\?first=) - positive lookbehind that requires ?first= immediately before the match
[^&]+ - matches one or more characters up to the next & (a lazy +? here would stop after a single character)
Without a positive lookbehind, you can do it like this:
let str = `http://www.example.com.tr/?first=DSPN47ZTE1BGMR&second=NECEFT8RYD
http://www.example.com.tr/?first=RTR22414242144&second=YUUSADASFF
http://www.example.com.tr/?first=KOSDFASEWQESAS&second=VERERQWWFA
http://www.example.com.tr/?first=POLUJYUSD41234&second=13F241DASD
http://www.example.com.tr/?first=54SADFD14242RD&second=TYY42412DD`
let op = str.match(/\?first=([^&]+)/g).map(e=> e.split('=')[1])
console.log(op)

How do you repeat a pattern and extract the contents in Javascript regex

I am trying to do something fairly simple however I have not worked extensively with Regex before.
I am trying to extract some strings out of another string.
I have the string 'value/:id/:foo/:bar'
I would like to extract each string after the colon and before slash eg:
let s = 'value/:id/:foo/:bar';
let r = new RegExp(/MAGIC HERE/);
// result r.exec(s)
I have been trying for an hour or so on this website: https://regex101.com/ but can only get as close as this:
:([a-z]+)
I also tried playing with these examples but couldn't get anywhere:
Regex match everything after question mark?
How do you access the matched groups in a JavaScript regular expression?
I want to be able to extract these parameters infinitely if possible.
My intended result is to get an array of each of the parameters.
group 1 - id
group 2 - foo
group 3 - bar
Please consider explaining the regex that solves this; I want to understand how groups are formed in a regex.
'value/:id/:foo/:bar'.match(/:[a-z]+/g)
Returns
[":id", ":foo", ":bar"]
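To get the names without the colon, and to see how groups work, a capture group can be combined with matchAll. A pair of parentheses forms a group, and groups are numbered by the position of their opening parenthesis. With the g flag, match() returns only the full matches and drops the groups, whereas matchAll() yields one match object per occurrence with the groups intact. The names route and params below are illustrative.

```javascript
const route = "value/:id/:foo/:bar";

// ":" is matched but not captured; ([a-z]+) is group 1 of every match.
const params = Array.from(route.matchAll(/:([a-z]+)/g), (m) => m[1]);

console.log(params);
```

The pattern has a single group that is reused for every occurrence, so this works for any number of parameters.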
try this:
let reg=/:(.*?)\/|:(.*?)$/g;
let reg2=/:(.*?)\/|:(.*?)$/;
let str='value/:id/:foo/:bar';
let result=str.match(reg).map(v=>v.match(reg2)[1]?v.match(reg2)[1]:v.match(reg2)[2]);
console.log(result);
"/:"
This approach doesn't try to match the text between the separators; it splits on the separator '/:' itself.
I hope this helps...
let s = 'value/:id/:foo/:bar';
s = s.split("\/\:").splice(1).map((current, index) => `group ${index+1}: - ${current}`);
console.log(s);

How to extract a particular text from url in JavaScript

I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a short and sweet way (surely there is) to get just childrens-day, without the leading ~ and the trailing /?
Thanks
You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside one string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that you passed a string rather than a regex literal. String.prototype.match converts a non-RegExp argument with new RegExp(), which is why it still produced a result.
Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.
Yes, it is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap what you need in a parenthesised group. No other modification to your code is needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
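You could just do a string replace on the matched text. Note that replace returns a new string rather than modifying the original:
var matched = url.match(/~.+\//)[0]; // "~childrens-day/"
var result = matched.replace('~', '').replace('/', ''); // "childrens-day"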
http://www.w3schools.com/jsref/jsref_replace.asp
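As an alternative to the regex answers above, the path can be split into segments after parsing the URL. This is only a sketch: it assumes the wanted text is always the path segment that starts with "~", and the names pageUrl, tildeSegment and slug are illustrative.

```javascript
const pageUrl = new URL("http://www.somedotcom.com/all/~childrens-day/pr?sid=all");

// pathname excludes the query string, so "?sid=all" cannot interfere.
const tildeSegment = pageUrl.pathname
  .split("/")
  .find((segment) => segment.startsWith("~"));

// Drop the leading "~"; null when no such segment exists.
const slug = tildeSegment ? tildeSegment.slice(1) : null;

console.log(slug);
```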

Extract text from HTML with Javascript regex

I am trying to parse a webpage and to get the number reference after <li>YM#. For example I need to get 1234-234234 in a variable from the HTML that contains
<li>YM# 1234-234234 </li>
Many thanks for your help someone!
Rich
Currently, your regex only matches if there is a single digit before the dash and a single digit after it. This version accepts one or more digits in each place, and \s* allows the space that follows YM# in your HTML:
/YM#\s*[0-9]+-[0-9]+/g
Then you also need to capture the number, so we wrap it in a capture group:
/YM#\s*([0-9]+-[0-9]+)/g
To read the capture group, use RegExp.exec instead of String.match (with the g flag, String.match returns only the full matches, not the groups):
var regex = /YM#\s*([0-9]+-[0-9]+)/g;
var match = regex.exec(text); // null if text contains no YM# number
var id = match[1];
// match[0]: match of the entire regex
// match[1], match[2], ...: one entry per capture group
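A self-contained sketch of the capture-group approach, run against the sample markup from the question. It assumes optional whitespace may follow YM#, as in that sample; the names html, ymMatch and ymId are illustrative.

```javascript
const html = "<ul><li>YM# 1234-234234 </li></ul>";

// \s* tolerates the space between "YM#" and the number; group 1 is the number.
const ymMatch = /YM#\s*([0-9]+-[0-9]+)/.exec(html);

// exec() returns null when nothing matches, so guard before indexing.
const ymId = ymMatch ? ymMatch[1] : null;

console.log(ymId);
```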
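(?<=<li>YM#\s)([\d-]+)
http://regexr.com?30ng5
This matches the number that directly follows "<li>YM# ". The lookbehind (?<=...) needs an engine that supports it (ES2018 or later in JavaScript).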
Try this:
(<li>[^#<>]*?# *)([\d\-]+)\b
and get the result in $2.
