How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.

You can split on the positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
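If the goal is to look values up by their application identifier, the split parts can be turned into a Map. This is only a sketch: the names fields, ai and value are mine, not part of the question, and it assumes every part starts with a "(digits)" prefix and that values contain no line breaks.

```javascript
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// The lookahead matches the empty position before each "(digits)" group,
// so nothing is consumed and the delimiter stays attached to its value.
const parts = text.split(/(?=\(\d+\))/);

// Hypothetical follow-up step: separate each part into identifier and value.
const fields = new Map(parts.map(part => {
  const [, ai, value] = part.match(/^\((\d+)\)(.*)$/);
  return [ai, value];
}));

console.log(fields.get("17")); // "2022-04-01"
```

Because the Map is keyed by identifier, the order of the identifiers in the input does not matter.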

Instead of split, use match to create the array. The pattern looks for 1) digits in parentheses, followed by 2) a run of digits, uppercase letters, or hyphens, and captures the whole thing as one group.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
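Note that the character class above only covers digits, uppercase letters and hyphens. As a hedged variation, assuming values never contain an opening parenthesis, you could match "anything up to the next (" and use matchAll to get the identifier and value separately. The names tokens, ai and value are illustrative only.

```javascript
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';

// Group 1: the digits inside the parentheses.
// Group 2: everything up to (but not including) the next "(".
const tokens = Array.from(
  str.matchAll(/\((\d+)\)([^(]+)/g),
  m => ({ ai: m[1], value: m[2] })
);

console.log(tokens[1]); // { ai: "240", value: "A" }
```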

Related

Regexp to explode url

I have a URL string like "home/products/product_name_1/details/some_options"
and I want to parse it into an array with a regexp: ["home", "products","product","details","some"]
So the rule is: split on slashes, but if a word contains underscores, take only the part that comes before the first underscore.
JavaScript equivalent for this regex is
str.split("/").map(item => item.indexOf("_") > -1 ? item.split("_")[0] : item)
Please help!

you can use this pattern
(?<!\w)[^/_]+
results
['home', 'products', 'product', 'details', 'some']
python code
import re
url = "home/products/product_name_1/details/some_options"
re.findall(r'(?<!\w)[^/_]+', url)
['home', 'products', 'product', 'details', 'some']

Try this:
input = ["home/products/product_name_1/details/some_options",
"company/products/cars_all/details/black_color",
"public/places/1_cities/disctricts/1234_something"]
let pattern = /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi
input.forEach(el => {
let matches = el.matchAll(pattern)
for (const match of matches) {
console.log(match[1]);
}
})
Remove \d from the regex pattern if you don't want digits in the URL.
I have used matchAll here. matchAll returns an iterator; use it to get each match object, in which the first element is the full match and the second element (index 1) is the required group.
/([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi
/
([a-zA-Z\d]*) capture group to match letters and digits
(?:\/|_.*?(?:\/|$)) non capture group to match '/' or '_' and everything till another '/' or end of the line is found
/gmi
You can test this regex here: https://regex101.com/r/B5Bo74/1

You can use:
\b[^\W_]+
\b A word boundary to prevent a partial match
[^\W_]+ Match 1+ word characters except for _
See a regex demo.
const s = "home/products/product_name_1/details/some_options";
const regex = /\b[^\W_]+/g;
console.log(s.match(regex));
If there has to be a leading / or the start of the string before the match, you can use an alternation (?:^|\/) and use a capture group for the values that you want to keep:
const s = "home/products/product_name_1/details/some_options";
const regex = /(?:^|\/)([^\W_]+)/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));

Given input:
string "home/products/product_name_1/details/some_options"
Expected output:
array ["home", "products", "product", "details", "some"]
Note: ignore/exclude name, 1, options (because word occurs after 1st underscore).
Task:
split URI by slash into a set of path-segments (words)
(if the path-segment or word contains underscores) remove the part after first underscore
Regex to match
With a regex \/|_\w+ you could match the URL-path separator (slash) and excluded word-part (every word after an underscore).
Then use this regex
either as a separator to split the string into its parts (excluding the regex matches): e.g. in JS split(/\/|_\w+/)
or as a search pattern in replace to prepare a string that can be easily split: e.g. in JS replaceAll(/\/|_\w+/g, ',') to obtain a CSV row which can be easily split by comma with split(',')
Beware: The regular-expression itself (flavor) and functions to apply it depend on your environment/regex-engine and script-/programming-language.
Regex applied in Javascript
split by regex
For example in Javascript use url.split(/\/|_\w*/) where:
/pattern/: everything inside the slashes is the regex-pattern
\/: an escaped slash (URL-path-separator)
|: the alternate junction, interpreted as boolean OR
_\w*: zero or more (*) word-characters (w, i.e. letter from alphabet, numeric digit or underscore) following an underscore
See also:
Use of capture groups in String.split()
However, this returns also empty strings (as empty split-off second parts inside underscore-containing path-segments). We can remove the empty strings with a filter where predicate s => s returns true if the string is non-empty.
Demo to solve your task:
const url = "home/products/product_name_1/details/some_options";
let firstWordsInSegments = url.split(/\/|_\w*/).filter(s => s);
console.log(firstWordsInSegments);
const urlDuplicate = "home/products/product_name_1/details/some_options/_/home";
console.log(urlDuplicate.split(/\/|_\w*/).filter(s => s)); // contains duplicates in output array
replace into CSV, then split and exclude (map,replace,filter)
The CSV containing path-segments can be split by comma and resulting parts (path-segments) can be filtered or replaced to exclude unwanted sub-parts.
using:
replaceAll to transform to CSV or remove empty strings. Note: global flag required when calling replaceAll with regex
map to remove unwanted parts after underscore
filter(s => s) to filter out empty strings
const url = "home/products/product_name_1/details/some_options";
// step by step
let pathSegments = url.split('/');
console.log('pathSegments:', pathSegments);
let firstWordsInSegments = pathSegments.map(s => s.replaceAll(/_\w*/g,''));
console.log(firstWordsInSegments);
// replace to obtain CSV and then split
let csv = "home/products/product_name_1/details/some_options/_/home".replaceAll(/\/|_\w+/g, ',');
console.log('csv:', csv);
let parts = csv.split(',');
console.log('parts:', parts); // contains empty parts
let nonEmptyParts = parts.filter(s => s);
console.log('nonEmptyParts:', nonEmptyParts); // filtered out empty parts
Bonus Tip
Try your regex online (e.g. regex101 or regexplanet). See the demo on regex101.

You could split the url with this regex
(_\w*)+|(\/)
This matches the /, _name_1 and _options.
BUT depending on what you are trying to do, or which language you use, there are far better options for this.

You can try a pattern like \/([^\/_]+){1,} (assuming that the path starts with '/' and the components are separated by '/'); depending on language you might get an array or iterator that will give the components.
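In JavaScript that idea could look roughly like the sketch below. I dropped the {1,} quantifier since the + inside the group already does the work, and I added a leading slash to the sample path because the pattern requires one; both are my assumptions, not part of the question.

```javascript
// Assumption: the path starts with "/". Without it, the first
// component ("home") would not be matched by this pattern.
const path = "/home/products/product_name_1/details/some_options";

// Capture the characters after each slash, stopping at the next "/" or "_".
const words = Array.from(path.matchAll(/\/([^\/_]+)/g), m => m[1]);

console.log(words); // ["home", "products", "product", "details", "some"]
```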

Try ^[[:alpha:]]+|(?<=\/)[[:alpha:]]+, or ^[a-zA-Z]+|(?<=\/)[a-zA-Z]+ if [[:alpha:]] is not supported. It matches one or more letters at the beginning of the string or after a slash, up to the first non-letter character.
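JavaScript has no [[:alpha:]] class, so only the [a-zA-Z] variant applies there, and the lookbehind needs an engine with ES2018 lookbehind support. A minimal sketch under those assumptions:

```javascript
const url = "home/products/product_name_1/details/some_options";

// Letters at the very start, or letters directly preceded by a slash.
// Segments that start with a digit would be skipped by this pattern.
const words = url.match(/^[a-zA-Z]+|(?<=\/)[a-zA-Z]+/g);

console.log(words); // ["home", "products", "product", "details", "some"]
```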

How to match bold markdown if it isn't preceded with a backslash?

I'm looking to match bolded markdown. Here are some examples:
qwer *asdf* zxcv matches *asdf*
qwer*asdf*zxcv matches *asdf*
qwer \*asdf* zxcv does not match
*qwer* asdf zxcv matches *qwer*
A negative look behind like this (?<!\\)\*(.*)\* works.
Except there is no browser support in Firefox, so I cannot use it.
Similarly, I can get very close with (^|[^\\])\*(.*)\*
The issue is that there are two capture groups, and I need the index of the second capture group, and Javascript only returns the index of the first capture group. I can bandaid it in this case by just adding 1, but in other cases this hack will not work.
My reasoning for doing this is that I'm trying to replace a small subset of Markdown with React components. As an example, I'm trying to convert this string:
qwer *asdf* zxcv *123*
Into this array:
[ "qwer ", <strong>asdf</strong>, " zxcv ", <strong>123</strong> ]
Where the second and fourth elements are created via JSX and included as array elements.

You will also need to take into account that when a backslash occurs before an asterisk, it may be one that is itself escaped by a backslash, and in that case the asterisk should be considered the start of bold markup. Except if that one is also preceded by a backslash,...etc.
So I would suggest this regular expression:
((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*
If the purpose is to replace these with tags, like <strong> ... </strong>, then just use JavaScript's replace as follows:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*/g;
let res = s.replace(regex, "$1<strong>$2</strong>");
console.log(res);
If the bolded words should be converted to a React component and stored in an array with the other pieces of plain text, then you could use split and map:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
let res = s.split(regex).map((s, i) =>
i%3 === 2 ? React.createElement("strong", {}, s) : s
);
Since there are two capture groups in the "delimiter" for the split call, one having the preceding character(s) and the second the word itself, every third item in the split result is a word to be bolded, hence the i%3 expression.
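One detail of this split: the first capture group (the character before the asterisk, plus any escaped pairs) comes back as its own array item, so it has to be glued back onto the preceding plain text to get the array shape from the question. Here is a framework-free sketch of that step. The function name splitBold and the { bold: ... } placeholder objects are my own; in a React app you would create a strong element at that point instead.

```javascript
function splitBold(text) {
  const regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
  const result = [];
  let plain = "";
  text.split(regex).forEach((piece, i) => {
    if (i % 3 === 2) {
      // Every third item is the text between the asterisks.
      if (plain) result.push(plain);
      plain = "";
      result.push({ bold: piece });
    } else {
      // Plain text (i % 3 === 0) or the prefix captured by group 1 (i % 3 === 1).
      plain += piece;
    }
  });
  if (plain) result.push(plain);
  return result;
}

console.log(splitBold("qwer *asdf* zxcv *123*"));
// ["qwer ", { bold: "asdf" }, " zxcv ", { bold: "123" }]
```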

This should do the trick:
/(?:^|[^\\])(\*[^*]+[^\\]\*)/
The only capturing group there is the string surrounded by *'s.

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example "hello :smile: :sleeping:" from this string I need to find the strings which are starting and ending with the : characters. I tried the expression below, but it didn't work:
^:.*\:$

My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
return text.replace(emojiPattern, function (tag, emotion) {
// The emotion will be the captured word between your tags,
// so either "smile" or "sleeping" in your example
//
// In this function you would take that emotion and return
// whatever you want based on the input parameter and the
// whole tag would be replaced
//
// As an example, let's say you had a bunch of GIF images
// for the different emotions:
return '<img src="/img/emoji/' + emotion + '.gif" />';
});
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
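As a rough sketch of how the hyphen-aware pattern behaves, here is a variant that only replaces tags it knows about and leaves everything else untouched. The lookup table and its placeholder strings are made up for illustration, and I used a non-capturing group for the hyphen part so the callback only receives the one capture.

```javascript
// Hypothetical lookup table; the entries are placeholders, not real assets.
const emojiMap = new Map([
  ["smile", "[SMILE]"],
  ["big-smile", "[BIG-SMILE]"],
]);

const emojiPattern = /:(\w+(?:-\w+)*):/g;

function replaceKnownEmoji(text) {
  // Unknown names are returned unchanged, so the original tag stays in place.
  return text.replace(emojiPattern, (tag, name) =>
    emojiMap.has(name) ? emojiMap.get(name) : tag
  );
}

console.log(replaceKnownEmoji("hello :smile: :big-smile: :-bad: :nope:"));
// "hello [SMILE] [BIG-SMILE] :-bad: :nope:"
```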

The ^ and $ are anchors (start and end respectively). These cause your regex to explicitly match an entire string which starts with : has anything between it and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have regex like this /:([a-z0-9]+):/gmi
This matches a string beginning with :, followed by one or more alphanumeric characters, and ending in :, with the modifiers g (global), m (multi-line) and i (case-insensitive, for things like :FacePalm:).
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
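Keep in mind that match with the g flag returns the full matches, colons included, and ignores the capture group. If you want only the names, one option is matchAll, sketched here (the variable name names is mine):

```javascript
const mytext = 'Hello :smile: and jolly :wave:';

// matchAll yields one match object per hit; index 1 is the capture group.
const names = Array.from(mytext.matchAll(/:([a-z0-9]+):/gi), m => m[1]);

console.log(names); // ["smile", "wave"]
```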

REGEX - after bracket get data until end bracket

I have a string like the following:
SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)
Question
I am trying to make a regex to get just the information between the round brackets, for example the end string would look like:
BI1 BI17 BI1234
I have found this example on stackoverflow which will get the first value BI1, but will ignore the rest after.
Get text between two rounded brackets
this is the REGEX I created from the above link: /\(([^)]+)\)/g but it includes the brackets, I want to remove these.
I am using this website to attempt to solve this query which has a testing window to see if the regex entered works:
http://www.regexr.com
Additional Information
there can be any amount of numbers also, which is why I have given 3 different examples.
this is a continuous string, not on separate lines
thanks for any help on this matter.

While this isn't possible using just regexes, you can do it with string#split and the following regex:
\).*?\(|^.*?\(|\).*?$
Yielding code that looks a bit like this:
function getBracketed(str) {
return str.split(/\).*?\(|^.*?\(|\).*?$/).filter(Boolean);
}
(You need to filter out the empty strings that'll appear at the beginning and end if you do it this way - hence the extra operation).
Regex demo on Regex101
Code demo on Repl.it

If you need to keep all inside parentheses and remove everything else, you might use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var result = str.replace(/.*?\(([^()]*)\)/g, " $1").trim();
console.log(result);
If you need to get only the BI+digits pattern inside parentheses, use
/.*?\((BI\d+)\)/g
Details:
.*? - match any 0+ chars other than linebreak symbols
\( - match a (
(BI\d+) - Group 1 capturing BI + 1 or more digits (\d+) (or [^()]* - zero or more chars other than ( and ))
\) - a closing ).
To get all the values as array (say, for later joining), use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var re = /\((BI\d+)\)/g;
var res = str.match(re).map(function(s) { return s.substring(1, s.length - 1); });
console.log(res);
console.log(res.join(" "));
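In environments that have String.prototype.matchAll, the capture group can be read directly, which avoids trimming the brackets afterwards. A minimal sketch, using the more general [^()]+ instead of BI\d+ (the variable name inside is mine):

```javascript
const str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";

// Group 1 holds the text between "(" and ")" for each match.
const inside = Array.from(str.matchAll(/\(([^()]+)\)/g), m => m[1]);

console.log(inside.join(" ")); // "BI1 BI17 BI1234"
```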

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a Regular Expression in javascript to trim the string to only the characters after the last slash in the URL.
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'

Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);
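Since substr is considered a legacy method, the same thing can be written with slice. The sketch below also tolerates one trailing slash, which is an extra assumption on my part; the function name lastSegment is hypothetical.

```javascript
function lastSegment(url) {
  // Drop a single trailing slash, if present, before looking for the last one.
  const trimmed = url.endsWith('/') ? url.slice(0, -1) : url;
  // lastIndexOf returns -1 when there is no slash, so slice(0) keeps the whole string.
  return trimmed.slice(trimmed.lastIndexOf('/') + 1);
}

var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
console.log(lastSegment(id)); // "nabb80191e23b7d9"
```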

A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);

Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()

This is easy to understand: (?!.*/).+
Let me explain.
First, let's match everything that has a slash at the end.
That's the part we don't want:
.*/ matches everything up to the last slash.
Then we wrap it in a negative lookahead, (?!...), to say "I don't want this, discard it":
(?!.*/) is the negative lookahead.
Now we can take whatever comes after the part we don't want with this:
.+
You may need to escape the / so it becomes:
(?!.*\/).+
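A quick sketch of the escaped form in JavaScript, assuming the string contains no line breaks (the variable name afterLastSlash is mine). The lookahead fails at every position that still has a slash somewhere ahead of it, so the first position where .+ is allowed to start is right after the last slash.

```javascript
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// (?!.*\/) rejects any starting position that is followed by a slash.
var afterLastSlash = id.match(/(?!.*\/).+/)[0];

console.log(afterLastSlash); // "nabb80191e23b7d9"
```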

this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"

This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9

Don't know JS, using others examples (and a guess) -
id = id.match(/[^\/]*$/); // [0] optional ?

Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"
