Need to extract part of a url using Regex [duplicate]

This question already has answers here:
Template literal inside of the RegEx
(2 answers)
Closed 2 years ago.
I would like to extract a substring from an s3 URL using Regex rather than with string manipulation functions.
My requirement is to retrieve dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19 out of a URL s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19
However, I have not been able to put together a regular expression that gives me what I want.
I would like the regex to have the form below, but I know that I am missing something in the RegExp line.
const url = 's3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19';
const patternMatches = url.match(new RegExp(s3://${s3bucket}/${dynamodbtablename}/([a-f\d-]+)));
const migrationDataFileS3Key = patternMatches[indexOfResultingArrayWithDesiredSubstring]
I was able to come up with the expression below to retrieve the UUID/GUID and have had to concatenate it with ${s3bucket} to form the S3 bucket key. However, I am not happy with this solution. I require the above.
const url = 's3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19';
const patternMatches = url.match(/([a-f\d-]+)/g);
const migrationDataFileS3Key = massiveTableItem + '/' + patternMatches[patternMatches.length - 1];
Thank you very much for your help.

You may not need a regular expression: split the URL on / and take the element you need from that. Like:
{
console.log(`s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19`
.split(`/`) // split on forward slash
.slice(-2) // take the last 2 elements from the resulting array
.join(`/`) // join them back together with a forward slash
);
// alternatively
console.log(`s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19`
.match(/([\w\-])+/g)
.slice(-2)
.join(`/`)
);
// or (use capture groups)
const {groups: {root, hashpath}} =
/(?<root>s3:\/\/s3bucket\/)(?<hashpath>[\w\-\/]+)/
.exec(`s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19`);
console.log(hashpath);
// or (just substring from known index)
const url = `s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19`;
console.log(url.slice(url.indexOf(`/`, 5) + 1)) // search from index 5 to skip the slashes in s3://
}

You can use capture groups, like this:
var str = "s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19";
var myRegexp = /s3:\/\/s3bucket\/(.*)/;
var match = myRegexp.exec(str);
console.log(match[1]);
// returns 'dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19'

I eventually arrived at a solution that was closest to the format I asked for in my question. I got there by combining the solutions of #sudhir-bastakoti and #wiktor-stribiżew, as neither answer addressed my question completely on its own.
I am grateful to everyone who answered my question, including #kooiinc. I tried the last option in his answer and it worked. However, I wanted the answer in a certain format.
const s3bucket = 's3bucket';
const url = 's3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19';
const migrationDataFileS3Key = url.match(new RegExp(String.raw`s3://${s3bucket}/(.*)`))[1];
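One caveat with interpolating a variable into a RegExp: any regex metacharacter in the bucket name (bucket names may contain dots, for instance) is interpreted as pattern syntax. A minimal sketch of a more defensive variant follows; escapeRegExp and extractS3Key are hypothetical helper names introduced only for this illustration, not part of the original solution.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper: returns everything after "s3://<bucket>/", or null.
function extractS3Key(url, bucket) {
  const pattern = new RegExp(`^s3://${escapeRegExp(bucket)}/(.+)$`);
  const match = url.match(pattern);
  return match ? match[1] : null; // null when the URL does not start with this bucket
}

const url = 's3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19';
console.log(extractS3Key(url, 's3bucket'));
```

Returning null instead of indexing into the match directly avoids a TypeError when the URL does not match.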

Related

How can I add two hyphens in an RegEx expression?

I have a value that I will want to add two hyphens.
For example, if I receive:
FN322KN
I want to transform it to:
FN-322-KN
I am trying to use this solution (Mask javascript variable value) and I'm stuck here:
CODE:
var value = 'FN322KN';
var formatted = value.replace(/^(.{2})(.{5}).*/, '$1-$2');
RESULT KO:
'FN-322KN'
Can someone please tell me how I can add the second "-" ?
UPDATE!!
Both Mark Baijens' and Buttered_Toast's answers are correct. I have one more question, though. What if the value arrives as FN-322KN or F-N322-KN, i.e. out of format? In that case, it adds a hyphen where one already exists, producing "--".
Thanks!

Assuming you always want the hyphens after the first 2 characters and after the first 5 characters you can change the regex easily to 3 groups.
var value = 'FN322KN';
var formatted = value.replace(/^(.{2})(.{3})(.{2}).*/, '$1-$2-$3');
console.log(formatted);
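For the follow-up case where the value may already contain hyphens in the wrong places, one possible approach is to strip all existing hyphens first and then apply the same three-group replacement. This is only a sketch; formatCode is a hypothetical name, and it assumes the value has seven characters once hyphens are removed.

```javascript
// Hypothetical helper: normalizes the value, then inserts the two hyphens.
function formatCode(value) {
  const compact = value.replace(/-/g, ''); // drop any hyphens already present
  // Same grouping as above: 2 characters, 3 characters, 2 characters.
  return compact.replace(/^(.{2})(.{3})(.{2}).*$/, '$1-$2-$3');
}

console.log(formatCode('FN322KN'));
console.log(formatCode('FN-322KN'));
console.log(formatCode('F-N322-KN'));
```

If the compacted value is shorter than seven characters, the pattern does not match and the value is returned without hyphens.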

Going by the provided content you have, you could try this
(?=(.{5}|.{2})$)
https://regex101.com/r/JZVivU/1
const regex = /(?=(.{5}|.{2})$)/gm;
// Alternative syntax using RegExp constructor
// const regex = new RegExp('(?=(.{5}|.{2})$)', 'gm')
const str = `FN322KN`;
const subst = `-`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);

Regex or substring operation to strip out a URL from a keyword onwards [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I'm struggling to figure out the best way to strip out all the content in a URL from a specific keyword onwards (including the keyword), using either regex or a substring operation. So if I have an example dynamic URL http://example.com/category/subcat/filter/size/1/ - I would like to strip out the /filter/size/1 element of the URL and leave me with the remaining URL as a separate string. Grateful for any pointers. I should clarify that the number of arguments after the filter keyword isn't fixed and could be more than in my example and the number of category arguments prior to the filter keyword isn't fixed either

To be a little safer you could use the URL object to handle most of the parsing and then
just sanitize the pathname.
const filteredUrl = 'http://example.com/category/subcat/filter/test?param1&param2=test';
console.log(unfilterUrl(filteredUrl));
function unfilterUrl(urlString) {
const url = new URL(urlString);
url.pathname = url.pathname.replace(/(?<=\/)filter(\/|$).*/i, '');
return url.toString();
}

You can tweak this a little based on your needs; for instance, filter might not be present in the URL at all. Assuming it is present, consider the following regular expression.
/(.*)\/filter\/(.*)/g
The first captured group (available as $1) is the portion of the string before the filter keyword, and the second captured group ($2) contains all the filters that follow the keyword.
Have a look at the example I tried on regextester.com.
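As a rough illustration of how the two groups could be read in JavaScript, here is a small sketch. The g flag is dropped here because String.prototype.match with g returns only the full matches, without the capture groups; splitAtFilter is a hypothetical name.

```javascript
// Same pattern as above, without the g flag so the capture groups are available.
const filterPattern = /(.*)\/filter\/(.*)/;

// Hypothetical helper: returns the part before the keyword and the filters after it.
function splitAtFilter(url) {
  const match = filterPattern.exec(url);
  if (!match) {
    return { base: url, filters: null }; // keyword not present in the URL
  }
  return { base: match[1], filters: match[2] };
}

console.log(splitAtFilter('http://example.com/category/subcat/filter/size/1/'));
```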

Use the split() function.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
const url = 'http://example.com/category/subcat/filter/size/1/';
console.log(url.split('/filter')[0]);

Split
The simplest solution that occurs to me is the following:
const url = 'http://example.com/category/subcat/filter/size/1/';
const [base, filter] = url.split('/filter/');
// where:
// base == 'http://example.com/category/subcat'
// filter == 'size/1/'
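If you expect more than one occurrence of '/filter/', be aware that the limit parameter of String.split() only caps how many pieces are returned: url.split('/filter/', 2) keeps the first two pieces and discards anything after a second '/filter/'.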
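To split only at the first occurrence while keeping everything after it, one option is indexOf plus slice. A minimal sketch, with splitAtFirstFilter as a hypothetical helper name:

```javascript
// Hypothetical helper: splits at the first occurrence of the keyword only.
function splitAtFirstFilter(url, keyword = '/filter/') {
  const index = url.indexOf(keyword);
  if (index === -1) {
    return [url, '']; // keyword not found: the whole URL is the base
  }
  // Everything before the keyword, and everything after it (later keywords included).
  return [url.slice(0, index), url.slice(index + keyword.length)];
}

const [base, filter] = splitAtFirstFilter('http://example.com/a/filter/size/1/filter/color/red');
console.log(base);
console.log(filter);
```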
RegExp
The approach above assumes that everything after the filter keyword is part of the filter. If you need more granularity, you can use a regex that stops at the '?', for example. The following removes 'filter/anything/that/follows' when it immediately follows a /, up to but not including the first query string separator ?.
const filterRegex = /(?<=\/)filter(\/|$)[^?]*/i;
function parseURL(url) {
const match = url.match(filterRegex);
if (!match) { return [url, null, null]; } // no filter segment found
const stripped = url.replace(filterRegex, '');
return [url, stripped, match[0]];
}
const [full, stripped, filter] = parseURL('http://example.com/category/subcat/filter/size/1/?query=string');
// where:
// stripped == 'http://example.com/category/subcat/?query=string'
// filter == 'filter/size/1/'

I'm sadly not able to post the full answer here, as it's telling me 'it looks like spam'. I created a gist with the original answer. In it I talk about the details of String.prototype.match and of JS/ES regex in general, including named capture groups and pitfalls, and include a link to a great regex tool: regex101. I'm not posting the link here for fear of triggering the filter again. But back to the topic:
In short, a simple regex can be used to split and format it (using filter as the keyword):
/^(.*)(\/filter\/.*)$/
or with named groups:
/^(?<main>.*)(?<stripped>\/filter\/.*)$/
(note that the forward slashes need to be escaped in a regex literal)
Using String.prototype.match with that regex will return an array of the matches: index 1 will be the first capture group (so everything before the keyword), index 2 will be everything after that (including the keyword).
Again, all the details can be found in the gist
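The named-group variant could be used roughly as follows. This is a sketch rather than the content of the gist; splitByKeyword is a hypothetical name.

```javascript
// Named-group pattern from the answer above.
const keywordPattern = /^(?<main>.*)(?<stripped>\/filter\/.*)$/;

// Hypothetical helper: returns both named groups, or null when there is no match.
function splitByKeyword(url) {
  const match = url.match(keywordPattern);
  if (!match) {
    return null; // the keyword is not present
  }
  const { main, stripped } = match.groups;
  return { main, stripped };
}

console.log(splitByKeyword('http://example.com/category/subcat/filter/size/1/'));
```

Because the first .* is greedy, a URL containing '/filter/' more than once is split at the last occurrence.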

Replace characters of a string matched by regex

I need to find the domain names of all valid URLs in an HTML page and replace them with another domain name, but within each domain name I need to do a second replacement. For example, if the URL https://www.example.com/path/to/somewhere appears in the HTML page, I need to eventually transform it into something like www-example-com.another.domain/path/to/somewhere.
I can do the first match and replace with the following code:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1.another.domain");
but I have no idea how to do the second match and replace to replace the . into -. I wonder if there is any efficient way to finish this task. I tried to do something like the following but it does not work:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1".replace(/'.'/g, '-') + ".another.domain");

OK, I think I know what you're looking for. I'll explain what the code is doing.
You need 2 capture groups: the one before and the one after the first / of the path.
You take the first capture group and convert each . to -
You then append .another.domain as a string, followed by the 2nd capture group.
const address1 = 'https://www.example.com/path/to/somewhere';
const newDomain = "another.domain"
const pattern = /(https?:\/\/[^:\/\n\"\'?]+)(\/.*)/;
const matches = pattern.exec(address1);
const converted = matches[1].replace(/\./g, "-") + `.${newDomain}${matches[2]}`;
console.log(converted);

You can use the function version of String.prototype.replace() to have some more control over the specific replacements.
For example...
const txt = 'URL is https://www.example.com/path/to/somewhere'
const newTxt = txt.replace(/(https?:\/\/)([\w.]+)/g, (_, scheme, domain) =>
`${scheme}${domain.replace(/\./g, '-')}.another.domain`)
console.log(newTxt)
Here, scheme is the first capture group (https?:\/\/) and domain is the second ([\w.]+).
If you need a fancier domain matcher (as per your question), just substitute that part of the regex.
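As one possible "fancier" matcher, the sketch below swaps in a negated character class similar to the one in the question, so that hosts containing hyphens are also covered. The exact character class and the rewriteDomains name are assumptions made for this illustration.

```javascript
// Hypothetical helper: rewrites every http(s) host found in the text.
function rewriteDomains(text, newDomain = 'another.domain') {
  // Group 1: the scheme. Group 2: the host, ending at whitespace, a port, a path, a quote, ? or #.
  const urlPattern = /(https?:\/\/)([^\s:\/"'?#]+)/g;
  return text.replace(urlPattern, (_, scheme, host) =>
    `${scheme}${host.replace(/\./g, '-')}.${newDomain}`);
}

console.log(rewriteDomains('URL is https://www.example.com/path/to/somewhere'));
```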

URL last part regular expression

I need to get the last part of my URL with html on the end. So if I have this url http://step/build/index.html I need to get only index.html. I need to do this with javascript
let address = http://step/build/index.html;
let result = address.match(/html/i);
I tried this, but it doesn't work for me, maybe I make some mistakes.
How do I get the last segment of URL using regular expressions
Could someone help me and explain it in details?
Thank you.

You can extract the .html filename part using this /[^/]+\.html/i RegEx.
See the code below.
const regex = /[^/]+\.html/i;
let address = "http://step/build/index.html";
let result = address.match(regex);
console.log(result);
The same RegEx will also match the filename in case the URL has additional parameters.
const regex = /[^/]+\.html/i;
let address = "http://step/build/index.html?name=value";
let result = address.match(regex);
console.log(result);

You could split it on slashes and then fetch the last item:
let address = "http://step/build/index.html";
let result = address.split("/").pop();
console.log(result)

Here's a non-regex approach. Should be more reliable/appropriate at the job, depending on whether you'll need other URL-specific parts:
// Note the ?foo=bar part, that URL.pathname will ignore below
let address = 'http://step/build/index.html?foo=bar';
let url = new URL(address);
// Last part of the path
console.log(url.pathname.split('/').pop());
// Query string
console.log(url.search);
// Whole data
console.log(url);

You could use split which returns an array to split on a forward slash and then use pop which removes the last element from the array and returns that:
let address = "http://step/build/index.html".split('/').pop();
console.log(address);
If you have querystring parameters which could for example start with ? or #, you might use split again and get the first item from the array:
let address2 = "http://step/build/index.html?id=1&cat=2"
.split('/')
.pop()
.split(/[?#]/)[0];
console.log(address2);

Regex - Match a string between second occurrence of characters

I have a string of text that looks something like this:
?q=search&something=that&this=example/
In that example, I need to grab that. I'm using the following regex:
var re = new RegExp("\&(.*?)\&");
Reading index 1 of the match gives me:
something=that - but it needs to be only that
I tried:
var re = new RegExp("\=(.*?)\&");
But that gives me everything from the first equals sign, so:
search&something=that
Is the output when it just needs to be:
that
I need to somehow target the second occurrences of 2 characters and grab whats in between them. How best do I go about this?

You can use
/something=([^&]+)/
and take the first group, see the JavaScript example:
let url = '?q=search&something=that&this=example/';
let regex = /something=([^&]+)/
let match = regex.exec(url);
console.log(match[1]);

split seems more suited to your case:
"?q=search&something=that&this=example/".split("&")[1].split("=")[1]
Then you could also implement a simple method to extract any wanted value :
function getValue(query, index) {
const pair = query.split("&")[index];
// return the value part, or undefined when there is no pair at that index
if (pair) return pair.split("=")[1];
}
getValue("?q=search&something=that&this=example/", 1);
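If the value should be looked up by parameter name rather than by position, the built-in URLSearchParams class may be a simpler fit than either split or a regex. A minimal sketch, with getQueryValue as a hypothetical wrapper name:

```javascript
// Hypothetical wrapper: looks up a query parameter by name.
function getQueryValue(query, name) {
  // URLSearchParams ignores a leading "?" and returns null for missing names.
  return new URLSearchParams(query).get(name);
}

console.log(getQueryValue('?q=search&something=that&this=example/', 'something'));
```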
