Replace characters of a string matched by regex

I need to find the domain names of all valid URLs in an HTML page and replace them with another domain name, but within each original domain name I need to do a second replacement. For example, if the URL https://www.example.com/path/to/somewhere appears in the page, I need to eventually transform it into something like www-example-com.another.domain/path/to/somewhere.
I can do the first match and replace with the following code:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1.another.domain");
but I have no idea how to do the second match and replace, which turns each . into -. Is there an efficient way to do this? I tried the following, but it does not work:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1".replace(/'.'/g, '-') + ".another.domain");

OK, I think I know what you're looking for. I'll explain what the code is doing.
You need 2 capture groups: the one before and the one after the first / of the path.
You take the first capture group and convert each . to -.
You then append .another.domain as a string, followed by the 2nd capture group.
const address1 = 'https://www.example.com/path/to/somewhere';
const newDomain = "another.domain"
const pattern = /(https?:\/\/[^:\/\n\"\'?]+)(\/.*)/;
const matches = pattern.exec(address1);
const converted = matches[1].replace(/\./g, "-") + `.${newDomain}${matches[2]}`;
console.log(converted);

You can use the function version of String.prototype.replace() to have some more control over the specific replacements.
For example...
const txt = 'URL is https://www.example.com/path/to/somewhere'
const newTxt = txt.replace(/(https?:\/\/)([\w.]+)/g, (_, scheme, domain) =>
`${scheme}${domain.replace(/\./g, '-')}.another.domain`)
console.log(newTxt)
Here, scheme is the first capture group (https?:\/\/) and domain is the second ([\w.]+).
If you need a fancier domain matcher (as per your question), just substitute that part of the regex.
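As a sketch of that substitution, the snippet below plugs the host character class from the question into the callback form. It also shows why the attempt in the question fails: "$1".replace(...) runs on the literal two-character string "$1" before String.prototype.replace() is ever called, so the captured text is never seen. The name rewriteDomains and its newDomain parameter are made up for illustration, and the sketch assumes the scheme should be kept, as in the example above.

```javascript
// Hypothetical helper: rewrites every http(s) host in `html` so that its dots
// become hyphens, then appends `newDomain` to it.
function rewriteDomains(html, newDomain) {
  // Group 1 is the scheme; group 2 is the host, using the question's
  // character class (everything up to : / newline " ' or ?).
  const urlPattern = /(https?:\/\/)([^:\/\n"'?]+)/g;
  return html.replace(urlPattern, (_, scheme, host) =>
    `${scheme}${host.replace(/\./g, '-')}.${newDomain}`);
}

const page = '<a href="https://www.example.com/path/to/somewhere">link</a>';
console.log(rewriteDomains(page, 'another.domain'));
// <a href="https://www-example-com.another.domain/path/to/somewhere">link</a>
```

Doing the inner replacement inside the callback means the dots are rewritten only within the matched host, never in the path or the surrounding HTML.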

Related

How can I make a regex to filter a URL pattern so that it terminates after 8 char wildcard and then directory slash?

I need to write a regex for a Google Tag Manager trigger. I want to be able to trigger the tag on a particular URL pattern: pre-store/cvsfds/
and not pick up another URL which is reserved for another event: pre-store/cvsfds/ekyc_2/ekyc_2/
I am writing the following expression pre-store\/(.+?\/) but it still matches the other URL pattern with the two more levels of subdirectory. I tried using the $ symbol to terminate at the second slash (after the 8 digit code), but it doesn't appear to work for me: https://regex101.com/r/U3WBB1/1
How can I make the regex just target pre-store/cvsfds/ only?

For regex, you may use /[\/]*[^\/]+[\/]([^\/]+)/ or you may use split('/') as well.
//regex
let str = 'pre-store/cvsfds/ekyc_2/ekyc_2/';
let matches = str.match(/[\/]*[^\/]+[\/]([^\/]+)/);
document.write(`regex method:<br> ${matches[0]}/`);
//split method
let splitStr = 'pre-store/cvsfds/ekyc_2/ekyc_2/';
let splitMatches = splitStr.split('/');
let splitLink = `${splitMatches[0]}/${splitMatches[1]}/`;
document.write(`<br><br>split method:<br> ${splitLink}`);
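If the goal is a pattern that accepts pre-store/cvsfds/ but rejects deeper paths, one possible approach is to anchor both ends and forbid slashes inside the wildcard segment. This is a sketch in JavaScript; it assumes the value being tested is just the path with no query string, and whether a Google Tag Manager trigger treats the pattern identically is not verified here. The name isPreStoreLanding is made up for illustration.

```javascript
// [^\/]+ cannot cross a slash, and $ requires the string to end right after
// the second slash, so deeper paths are rejected. The optional leading slash
// is an assumption about how the path is reported.
const preStoreOnly = /^\/?pre-store\/[^\/]+\/$/;

function isPreStoreLanding(path) {
  return preStoreOnly.test(path);
}

console.log(isPreStoreLanding('pre-store/cvsfds/'));               // true
console.log(isPreStoreLanding('pre-store/cvsfds/ekyc_2/ekyc_2/')); // false
```

The original pre-store\/(.+?\/) matches both paths because nothing anchors the end of the pattern, so a match on the first two segments is enough.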

How to avoid capturing groups if the captured match is empty?

I would like to prepend the word "custom" to a list of host-names whose subdomains can be separated by some separator.
Examples:
news.google.com -> custom.news.google.com
news/google/com -> custom/news/google/com
dev.maps.yahoo.fr -> custom.dev.maps.yahoo.fr
dev/maps/yahoo/fr -> custom/dev/maps/yahoo/fr
These strings appear inside a document with more content, so I am trying to solve this problem using regular expressions and JavaScript's string replace function.
The list of hostnames and separators is predefined and known in advance. For the sake of this example, I only showed 2 hostnames (news.google.com and dev.maps.yahoo.fr) and 2 separators (. and /), but there are more.
The separator within a single string will always be the same, i.e. there won't be cases like dev/maps.yahoo/fr.
I want to be consistent and use the correct separator when prepending "custom".
I built this long regular expression:
const myRegex = /news\.google\.com|news\/google\/com|dev\.maps\.yahoo\.fr|dev\/maps\/yahoo\/fr/
(For readability purposes, this is the expression:
/news\.google\.com/
OR
/news\/google\/com/
OR
/dev\.maps\.yahoo\.fr/
OR
/dev\/maps\/yahoo\/fr/
)
(Note: It is important to emphasize that the list of hostnames is predefined and well known in advance, that's why I am 'hardcoding' the hostnames and not using tokens such as \w+ or \S+. For example, I might want to replace news.google.com, but leave news2.google.com intact).
However, I am not sure how to capture the separator (whether ., /, or any other separator). I tried using capture groups like this:
const myRegex = /news(\.)google\.com|news(\/)google\/com|dev(\.)maps\.yahoo\.fr|dev(\/)maps\/yahoo\/fr/
However, this creates 4 capture groups even though there is only one separator (and this is just a simple example). 3 of the capture groups will be empty, and one will contain the separator. How can I know which capture group it is?
Ideally, I would like something like this:
const myString = 'I navigated to news.google.com'; // For example
const myCustomString = myString.replace(
myRegex,
(match, <SEPARATOR_WRONG>) => `custom${SEPARATOR_WRONG}${match}`,
);
console.log(myCustomString);
// will log 'I navigated to custom.news.google.com'
Is there a way to skip captured groups if they are empty?

Use \1 to refer to the separator captured in the first (\.|\/) group so we don't have to keep writing it over and over.
const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;
const re = /\w+(\.|\/)(\w+\1)?(google|yahoo)\1\w+/g;
console.log(text.replace(re, (url, separator) => `custom${separator}${url}`));
Here's an alternate solution given the new requirement described in the comments:
const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;
const re = /(news|dev)(\.|\/)(google|maps)\2(com|yahoo)(\2fr)?/g;
console.log(text.replace(re, (url, prefix, separator) => `custom${separator}${url}`));
Yet another alternate solution:
const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;
const re = /news(\.)google\.com|news(\/)google\/com|dev(\.)maps\.yahoo\.fr|dev(\/)maps\/yahoo\/fr/g;
console.log(text.replace(re, url => 'custom' + url.match(/\.|\//)[0] + url));

One solution that I believe is acceptable to you is to add separator-finding logic over the capture groups in the callback:
const myCustomString = myString.replace(
myRegex,
(match, ...rest) => {
const sep = rest.slice(0, -2) // the last two args are the match offset and the whole input string
.find(group => !!group) // the first truthy capture group contains the separator
return `custom${sep}${match}`},
);
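Because the hostnames and separators are known in advance, another option is to generate the alternation from data and look the separator up from the matched text, so no capture groups are needed at all. The following is only a sketch: hostLabels, separators, escapeRegExp and prependCustom are names invented here, not part of the question.

```javascript
// Assumed inputs: each hostname as a list of labels, plus the allowed separators.
const hostLabels = [['news', 'google', 'com'], ['dev', 'maps', 'yahoo', 'fr']];
const separators = ['.', '/'];

// Escape characters that are special inside a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Map every spelling of every hostname to the separator it uses.
const separatorByHost = new Map();
for (const labels of hostLabels) {
  for (const sep of separators) {
    separatorByHost.set(labels.join(sep), sep);
  }
}

// One alternation over all known spellings.
const hostRegex = new RegExp(
  [...separatorByHost.keys()].map(escapeRegExp).join('|'), 'g');

function prependCustom(text) {
  return text.replace(hostRegex, (match) =>
    `custom${separatorByHost.get(match)}${match}`);
}

console.log(prependCustom('I navigated to dev/maps/yahoo/fr'));
// I navigated to custom/dev/maps/yahoo/fr
```

Adding a hostname or a separator then only means extending one of the two arrays; the regular expression and the lookup table follow automatically.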

RegEx for matching YouTube embed ID

I'm in non-modern JavaScript and I have a string defined as follows:
"//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0"
I want to pull out just the DmYK479EpQc but I don't know the length. I do know that I want what is after the / and before the ?
Are there some simple lines of JavaScript that would solve this?

Use the URL object?
console.log(
(new URL("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0", location.href)).pathname
.split('/')
.pop());
Why? Because I can likely make up a URL that defeats the regex (though for youtube it's probably unlikely)

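This expression might help you do so, and it might be faster:
(d\/)([A-Za-z0-9_-]+)(\?)
Note that the character class is written as [A-Za-z0-9_-] rather than [A-z0-9]: the range A-z also covers the punctuation characters between Z and a in ASCII ([, \, ], ^, _ and `), and it excludes the hyphen, which YouTube video IDs may contain.
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;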
const str = `//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0`;
const subst = `$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Performance Test
This JavaScript snippet shows the performance of that expression using a simple 1-million times for loop.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i > 0; i--) {
const string = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
var match = string.replace(regex, "$3");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

How about a non-regex way?
console.log("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0".split('/').pop().split('?')[0]);

I'm not going to give a piece of code because this is a relatively simple algorithm, and easy to implement.
Please note that those links have this format (correct me if I'm wrong):
https:// or http://
www.youtube.com/
embed/
Video ID (DmYK479EpQc in this case)
?parameters (note that they ALWAYS start with the character ?)
You want the ID of the video, so you can split the string into those sections, and if you store those sections in one array you can be sure that the ID is at index 3 (the 4th element).
One example of what that array would look like:
['https://', 'www.youtube.com', 'embed', 'DmYK479EpQc', '?vq=hd720&rel=0']
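The splitting idea above can be sketched as follows. Because the string in the question has no scheme (it starts with //), this version looks for the embed section and takes the one after it instead of relying on a fixed position. The function name getEmbedId is hypothetical, and the syntax is kept to older JavaScript since the question mentions a non-modern environment.

```javascript
// Hypothetical helper: returns the path section that follows "embed",
// or null when the URL does not have that shape.
function getEmbedId(url) {
  var withoutQuery = url.split('?')[0];                   // drop "?vq=hd720&rel=0"
  var sections = withoutQuery.split('/').filter(Boolean); // drop empty pieces
  var embedIndex = sections.indexOf('embed');
  if (embedIndex === -1 || embedIndex + 1 >= sections.length) {
    return null;
  }
  return sections[embedIndex + 1];
}

console.log(getEmbedId('//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0'));
// DmYK479EpQc
```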

One option uses a regex replacement:
var url = "//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0";
var path = url.replace(/.*\/([^?]+).*/, "$1");
console.log(path);
The above regex pattern says to:
.* match and consume everything up to and
/ including the last path separator
([^?]+) then match and capture any number of non ? characters
.* then consume the rest of the input
Then, we just replace with the first capture group, which corresponds to the text after the final path separator, but before the start of the query string, should the URL have one.

You can use this regex, which breaks down as follows:
(.*) match and consume everything up to
(d\/) the d/ at the end of embed/
([A-Za-z0-9_-]+) then match and capture the ID: letters, digits, _ and -
(\?) then match the ? that starts the query string
(.*) then consume the rest of the input
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
const position = '$3';
let result = ytUrl.replace(regex, position);
console.log('YouTube ID: ', result);
This regex splits the string into sections, and the YouTube ID is in the third capture group.
Another solution is to use split, which splits a string into an array of substrings.
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
let result = ytUrl.split('/').pop().split('?').shift()
console.log('YouTube ID: ', result);
In this sample, we split the URL using / as the separator, then take the last element of the array with the pop method. Finally, we split again using ? as the separator and take the first element of the array with the shift method.

Regex - Match a string between second occurrence of characters

I have a string of text that looks something like this:
?q=search&something=that&this=example/
In that example, I need to grab that. I'm using the following regex:
var re = new RegExp("\&(.*?)\&");
Taking the first capture group of the match gives me:
something=that (but it needs to be only that)
I tried:
var re = new RegExp("\=(.*?)\&");
But that gives me everything from the first equals sign, so the output is:
search&something=that
when it just needs to be:
that
I need to somehow target the second occurrence of the 2 characters and grab what's in between them. How best do I go about this?

You can use
/something=([^&]+)/
and take the first group, see the JavaScript example:
let url = '?q=search&something=that&this=example/';
let regex = /something=([^&]+)/
let match = regex.exec(url);
console.log(match[1]);

split seems more suited to your case:
"?q=search&something=that&this=example/".split("&")[1].split("=")[1]
Then you could also implement a simple function to extract any wanted value:
function getValue(query, index) {
const pair = query.split("&")[index];
// return the part after "=", or undefined when there is no pair at that index
if (pair) return pair.split("=")[1];
}
getValue("?q=search&something=that&this=example/", 1);
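If the value should be found by its key rather than by its position, a lookup by name is less fragile when the parameter order changes. The sketch below assumes the runtime provides the standard URLSearchParams class (current browsers and Node.js do); getParam is a name made up for this example.

```javascript
// Hypothetical helper: looks a parameter up by name instead of by position.
function getParam(query, name) {
  // URLSearchParams ignores a leading "?" and returns null for a missing key.
  return new URLSearchParams(query).get(name);
}

console.log(getParam('?q=search&something=that&this=example/', 'something'));
// that
```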

Removing last part of URL based on

I need to remove any occurrence of a product number that may appear in URLs, using JavaScript/jQuery.
URL looks like this:
http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884
The final part of the URL always starts with 2 digits followed by -, so I was thinking a regex might do the job. I need everything after the last / removed.
It must also work when the product occurs higher or lower in the hierarchy, i.e.: http://www.mysite.com/section1/section2/01-012-15_1571884
So far I have tried different solutions with location.pathname and splits, but I am stuck on how to handle differences in product hierarchy and handling the arrays.

var x = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
console.log(x.substring(0, x.lastIndexOf('/')));

Use lastIndexOf to find the last occurrence of "/" and then remove the rest of the path using substring.

var url = 'http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884';
var parts = url.split('/');
parts.pop();
url = parts.join('/');
http://jsfiddle.net/YXe6L/

var a = 'http://www.mysite.com/section1/section2/01-012-15_1571884',
result = a.replace(/\d{1,2}-\d{1,3}-\d{1,2}_\d+$/, '');
JSFiddle: http://jsfiddle.net/2TVBk/2/
This is a very nice online regex tester to test your regexes with: http://regexpal.com/

Here is an approach that will properly handle a situation where there is no product ID as you requested. http://jsfiddle.net/84GVe/
var url1 = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
var url2 = "http://www.mysite.com/section1/section2/section3/section4";
function removeID(url) {
//look for a / followed by _, - or 0-9 characters,
//and use $ to ensure it is the end of the string
var reg = /\/[-\d_]+$/;
if(reg.test(url))
{
url = url.substr(0,url.lastIndexOf('/'));
}
return url;
}
console.log( removeID(url1) );
console.log( removeID(url2) );
