How can I get a specific part of a URL using RegEx?


I am trying to get a part of a file download link using RegEx (or other methods). Below is the link I am trying to parse; the part I am trying to select is the version number, 1.7.0.13.
https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip
I have looked around and thought about trying Named Capture Groups, however I couldn't figure it out. I would like to be able to do this in JavaScript/Node.js, even if it requires a module 👻.

You can use Node.js built-in modules to simplify the match: URL and path to isolate the filename, followed by a simple regexp.
const { URL } = require('url')
const path = require('path')
const test = new URL(
  'https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip'
)
/*
test.pathname = '/bin-linux/bedrock-server-1.7.0.13.zip'
path.parse(test.pathname) = { root: '/',
  dir: '/bin-linux',
  base: 'bedrock-server-1.7.0.13.zip',
  ext: '.zip',
  name: 'bedrock-server-1.7.0.13' }
*/
// Take the filename without its extension, then match the trailing digits and dots
const match = path.parse(test.pathname)
  .name
  .match(/[0-9.]*$/)
// match = [ '1.7.0.13', index: 15, input: 'bedrock-server-1.7.0.13' ]

You could use the below regex:
[\d.]+(?=\.\w+$)
This matches a run of digits and dots that is followed by a file extension. You could also make it stricter, so that only digit groups separated by single dots match:
\d+(?:\.\d+)*(?=\.\w+$)
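A minimal sketch of how the stricter pattern might be used from JavaScript. The helper name versionFromUrl is hypothetical and not part of the answer:

```javascript
// Hypothetical helper: returns the dotted version that precedes the file
// extension, or null when the URL contains no such number.
function versionFromUrl(url) {
  // \d+(?:\.\d+)* matches the dotted number;
  // (?=\.\w+$) requires an extension to follow without including it in the match.
  const match = url.match(/\d+(?:\.\d+)*(?=\.\w+$)/)
  return match ? match[0] : null
}

console.log(
  versionFromUrl('https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip')
) // 1.7.0.13
```

Because the lookahead consumes nothing, the result at index 0 is already the bare version and no capture group is needed.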

I'd stick with this:
-(\d+(?:\.\d+)*)(?:\.\w+)$
It matches a dash before the number
The parentheses make a capture group
Inside it, \d+ matches one or more digits
(?: ... ) makes a group without capturing it
Inside this group, \.\d+ matches a dot followed by one or more digits
Thanks to *, this group repeats zero or more times
After that, (?:\.\w+)$ makes a non-capturing group that matches the extension at the end of the string
So, basically, this format would allow you to capture all the numbers that are after the dash and before the extension, be it 1, 1.7, 1.7.0, 1.7.0.13, 1.7.0.13.5 etc. On the match array, at index [0] you will have the entire regex match, and on [1] you will have your captured group, the number you're looking for.
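As a sketch of reading that capture group, here is the same pattern with the group given a name, since the question mentions named capture groups. The helper name extractVersion and the group name version are illustrative choices, and named groups assume a reasonably recent runtime (Node.js 10 or later):

```javascript
// Hypothetical helper; the group name "version" is an illustrative choice.
function extractVersion(url) {
  const match = url.match(/-(?<version>\d+(?:\.\d+)*)(?:\.\w+)$/)
  if (!match) return null
  return {
    whole: match[0],              // entire regex match, dash and extension included
    byIndex: match[1],            // the first capture group, read by position
    byName: match.groups.version  // the same group, read by name
  }
}

const result = extractVersion(
  'https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip'
)
console.log(result)
// { whole: '-1.7.0.13.zip', byIndex: '1.7.0.13', byName: '1.7.0.13' }
```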

Perhaps a regular expression like this is what you need?
var url = 'https://minecraft.azureedge.net/bin-linux9.9.9/bedrock-server-1.7.0.13.zip'
var match = url.match(/(\d+[.\d+]*)(?=\.\w+$)/gi)
console.log( match )
The pattern /(\d+[.\d+]*)(?=\.\w+$)/gi asks for a substring that:
first contains one or more digit characters, i.e. \d+
is immediately followed by zero or more characters from the class [.\d+], i.e. dots, digits, or literal plus signs (inside square brackets, + is not a quantifier)
and finally, (?=\.\w+$) requires a file extension like .zip to follow immediately after our matched string
For more information on special characters like + and *, see this documentation. Hope that helps!

Related

Regex specific number inside quote

I am new to regex and have this cdn url that returns text and I want to use javascript to match and extract the version number. I can match the latestVersion but I am not sure how to get the value inside of it.
Example text:
...oldVersion:"1.2.0",stagingVersion:"1.2.1",latestVersion:"1.3.0",authVersion:"2.2.2"...
I tried the following to match latestVersion:"1.3.0", but without success:
const regex = /\blatestVersion:"*"\b/
stringIneed = text.match(regex)
And I only need 1.3.0 not including the string latestVersion:

There are many ways of doing it. This is one:
const text='...oldVersion:"1.2.0",stagingVersion:"1.2.1",latestVersion:"1.3.0",authVersion:"2.2.2"...';
console.log(text.match(/latestVersion:"(.*?)"/)?.[1])
The .*? is a "non-greedy" wildcard that will match as few as possible characters in order to make the whole regexp match. For this reason it will stop matching before the ".

Try adding a capture group () to match certain strings in the Regex.
/\blatestVersion:"([0-9.]+)"/
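A short sketch of how that capture group could be read, using the sample text from the question. The variable names are illustrative:

```javascript
const text = '...oldVersion:"1.2.0",stagingVersion:"1.2.1",latestVersion:"1.3.0",authVersion:"2.2.2"...'

// match() returns null when nothing matches, so guard before reading the group.
const latestMatch = text.match(/\blatestVersion:"([0-9.]+)"/)

// Index 0 holds the whole match; index 1 holds only what the parentheses captured.
const latestVersion = latestMatch ? latestMatch[1] : null

console.log(latestVersion) // 1.3.0
```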

You could use a lookbehind, or a capturing group like this:
const str = '...oldVersion:"1.2.0",stagingVersion:"1.2.1",latestVersion:"1.3.0",authVersion:"2.2.2"...'
console.log(
str.match(/(?<=latestVersion:")[^"]+/)?.[0]
)
console.log(
str.match(/latestVersion:"([^"]+)"/)?.[1]
)

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.

You can split at every position that is followed by a parenthesised element, using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
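If the application identifier and its value are needed separately, the split result could be post-processed as in the sketch below. The helper name parseUdi and the property names ai and value are hypothetical, and this is not a complete GS1 parser:

```javascript
// Hypothetical helper: splits a UDI string into { ai, value } pairs.
function parseUdi(text) {
  return text
    .split(/(?=\(\d+\))/)   // split in front of every "(digits)" group
    .filter(Boolean)        // drop empty strings, e.g. for empty input
    .map(token => {
      // Separate the identifier inside the parentheses from what follows it.
      const m = token.match(/^\((\d+)\)(.*)$/)
      return m ? { ai: m[1], value: m[2] } : { ai: null, value: token }
    })
}

const tokens = parseUdi('(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888')
console.log(tokens[1]) // { ai: '240', value: 'A' }
```

Because the split does not depend on which identifiers appear, the order of the application identifiers in the input does not matter.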

Instead of split, use match with the global flag to build the array. The pattern looks for 1) digits in parentheses, followed by 2) one or more digits, uppercase letters, or hyphens; the outer parentheses group the whole expression.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));

How to search a string for 1st occurrence of ":/" and then search all other occurences of the found substring inclusive ":/"?

A little explanation:
I have a string like the following (from running the command-line program kpsewhich -all etex.src):
c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc:/texlive/2019/texmf-dist/tex/plain/etex/etex.src
This string consists of 2 or more concatenated file paths, which are to be separated again.
Dynamic search pattern: c:/
The files are always on the same volume, here c, but the volume name has to be determined.
Is it possible to do something like this with a RegExp?
I could split the string according to the actual filename etex.src, but is the other approach possible?
Update:
The RegExp as follows
(.+?:[\/\\]+)(?:(?!\1).)*
meets my requirements even better. How to disassemble a string with CaptureGroup?

I'm guessing that maybe this expression would be somewhat close to what you might want to design:
c:\/.*?(?=c:\/|$)

I'm not entirely sure what you want this RegExp to retrieve but if you want to get the array of file paths then you can do it with /(?<!^)[^:]+/g regex:
// in node.js
const str = 'c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc:/texlive/2019/texmf-dist/tex/plain/etex/etex.src'
const paths = str.match(/(?<!^)[^:]+/g)
// [
// "/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc",
// "/texlive/2019/texmf-dist/tex/plain/etex/etex.src"
// ]
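This RegExp matches each run of characters that contains no : and does not start at the beginning of the string, which drops the leading volume name. Note that the drive letter of every following path stays attached to the end of the preceding match, as the trailing c in etex.srcc above shows.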
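Since the question asks for the volume name to be determined from the string itself, one possible sketch is to read the prefix up to the first :/ and then split in front of every occurrence of it. The helper names escapeRegExp and splitConcatenatedPaths are hypothetical, and the sketch assumes that the string starts with the volume name and that the prefix never occurs inside a path:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: separates paths that were concatenated without a delimiter.
function splitConcatenatedPaths(str) {
  const idx = str.indexOf(':/')
  if (idx === -1) return [str]          // no volume marker found, nothing to split

  const prefix = str.slice(0, idx + 2)  // e.g. 'c:/'
  // Zero-length lookahead: split in front of each prefix, keeping it in the result.
  return str.split(new RegExp('(?=' + escapeRegExp(prefix) + ')'))
}

const joined = 'c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc:/texlive/2019/texmf-dist/tex/plain/etex/etex.src'
console.log(splitConcatenatedPaths(joined))
// [
//   'c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.src',
//   'c:/texlive/2019/texmf-dist/tex/plain/etex/etex.src'
// ]
```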

Regex - conditional match for hyphened appendices

I'm dealing with 8 character jobnames that must follow convention, but I want to allow additional characters if appended with a hyphen.
I have come up with this:
\w{2}YYY\w{3}(?(-).*|\b)
Which matches correctly:
XXYYY001 >> match
XXYYY001-TEST >> match
XXYYY001123 >> no match
This seems cumbersome however, so I just wanna know the most efficient expression.
EDIT: Thanks Wiktor, your answer worked.
And to take it one step further: If I wanted to use a variable for YYY?

Like this.
Explanation:
^ matches the beginning of the string
\w{2}YYY\w{3} is the part you wrote; it matches the main pattern
(\-.*) matches a dash followed by anything (including nothing; see the fourth test string)
? means the preceding group can occur zero or one times
$ matches the end of the string, so any other trailing characters are rejected
const pattern = /^\w{2}YYY\w{3}(\-.*)?$/;
const strings = [
'XXYYY001',
'XXYYY001XXXTEST',
'XXYYY001-TEST',
'XXYYY003-',
'FARFXXYYY003',
'FARFXXYYY003-TEST'
];
strings.forEach(string => {
let conforms = pattern.test(string);
console.log(string,conforms);
});
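For the follow-up about using a variable in place of YYY, one possible sketch is to build the pattern with the RegExp constructor. The helper names escapeRegExp and buildJobPattern are hypothetical; note that backslashes must be doubled inside a string:

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the variable part is always matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: same shape as /^\w{2}YYY\w{3}(\-.*)?$/ with a variable middle part.
function buildJobPattern(code) {
  return new RegExp('^\\w{2}' + escapeRegExp(code) + '\\w{3}(-.*)?$')
}

const jobPattern = buildJobPattern('YYY')
console.log(jobPattern.test('XXYYY001'))      // true
console.log(jobPattern.test('XXYYY001-TEST')) // true
console.log(jobPattern.test('XXYYY001123'))   // false
```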

Regex to match filename at end of URL

Having this text:
http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg
And other texts like this where the last 1 can be any other number and the last 44 can be any other number as well, I need a regex that will match /1t44.jpg.
Everything I've tried so far (/.+?\.([^\.]+)$) matches from the first slash (//img.oo.com.au/prod/CRWWBGFWG/1t44.jpg).
I'm using JavaScript, so whatever works on RegexPal should do.

Here's a simple Regex that will match everything after the last /:
/[^/]*$

If you want to match a filename with a very specific file extension, you can use something like this:
/\/\dt\d\d\.jpg$/
This matches:
a slash
followed by a digit
followed by the letter 't'
followed by two digits
followed by '.jpg' at the end of the string
Or, if you really just want the filename (whatever is after the last slash with any file extension), then you can use this:
/\/[^\/]+$/
This matches:
a slash
followed by one or more non-slash characters
at the end of the string
In your sample string of http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg, both of these will match /1t44.jpg. The first is obviously much more restrictive since it requires a specific format of the filename. The second matches any filename.
Other choices. In node.js development, you can use the path module and use path.parse() to break a path up into all of its various components.
And, there are various libraries written for the browser that will break up a path into its components too.
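A minimal sketch of the Node.js route mentioned above, assuming a runtime where URL is available as a global (Node.js 10 or later). path.posix is used so that the forward slashes of the URL path are handled the same way on every platform:

```javascript
const path = require('path')

// Let the URL parser strip the protocol and host first.
const { pathname } = new URL('http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg')

// path.posix.parse breaks the remaining path into its components.
const parsed = path.posix.parse(pathname)

console.log(parsed.base) // 1t44.jpg
console.log(parsed.name) // 1t44
console.log(parsed.ext)  // .jpg
```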

As Johnsyweb says, a regular expression isn't really needed here. AFAIK the fastest way to do this is with lastIndexOf and substr (slice behaves the same way here and is preferable in new code, since substr is deprecated).
str.substr(str.lastIndexOf('/'));

Of course you don't have to use a regular expression to split a string and pop the last part:
var str="http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var parts = str.split("/");
document.write(parts.pop() + "<br />");

Based on Scott's answer, try this (JavaScript):
var url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var path = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$1" );
var lastElement = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$2" );
The same pattern also works on Windows and *nix file paths, to extract either the directory or the file name:
c:\Program Files\test.js => c:\Program Files
c:\Program Files\test.js => \test.js

This is for Java on a Linux machine. It grabs the last part of a file path, so that it can be used for making a file lock.
// String to be scanned to find the pattern.
String pattern = ".*?([^/.]+)*$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher match = r.matcher("/usr/local/java/bin/keystore");
/**
* After a successful find(), two groups are available:
* group(0): /usr/local/java/bin/keystore
* group(1): keystore
*/
String fileLock = "";
if (match.find()) {
fileLock = match.group(1) + ".lock";
}
A little different than the original question, I know. But I hope this helps others who were stuck with the same problem I had.
