RegEx for matching YouTube embed ID - javascript

I'm in non-modern JavaScript and I have a string defined as follows:
"//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0"
I want to pull out just the DmYK479EpQc but I don't know the length. I do know that I want what is after the / and before the ?
Are there a few simple lines of JavaScript that would solve this?

Use the URL object?
console.log(
(new URL("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0", location.href)).pathname
.split('/')
.pop());
Why? Because I can likely make up a URL that defeats the regex (though for YouTube that's probably unlikely).
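A minimal sketch of the same idea wrapped in a function. The name getEmbedId and the fallback base URL are assumptions made here for illustration; outside a browser there is no location.href, so an explicit base is passed to resolve the protocol-relative string.

```javascript
// Hypothetical helper: returns the last path segment of an embed URL.
// The base is only used to resolve protocol-relative input such as "//www.youtube.com/...".
function getEmbedId(embedUrl, base) {
  var parsed = new URL(embedUrl, base || 'https://www.youtube.com/');
  // pathname excludes the query string, so "?vq=hd720&rel=0" is already gone.
  var segments = parsed.pathname.split('/');
  return segments.pop();
}

console.log(getEmbedId('//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0'));
```

Note that a path ending in / would yield an empty string, since the last segment is then empty.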

This expression might help you to do so, and it might be faster:
(d\/)([A-Za-z0-9_-]+)(\?)
Here d\/ matches the end of embed/, the second group captures the ID, and \? matches the start of the query string. The character class is spelled out because [A-z] also matches the punctuation characters between Z and a, and YouTube IDs can contain - and _. The snippet below applies the expression:
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
const str = `//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0`;
const subst = `$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Performance Test
This JavaScript snippet measures the run time of that expression using a simple for loop of 1 million iterations.
const repeat = 1000000;
const start = Date.now();
let match;
// Count down from repeat to 1 so the body runs exactly `repeat` times.
for (let i = repeat; i > 0; i--) {
  const string = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
  const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
  match = string.replace(regex, "$3");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

How about a non-regex way?
console.log("//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0".split('/').pop().split('?')[0]);

I'm not going to give a piece of code because this is a relatively simple algorithm and easy to implement.
Please note that those links have this format (correct me if I'm wrong):
https:// or http://
www.youtube.com/
embed/
Video ID (DmYK479EpQc in this case)
?parameters (note that they ALWAYS start with the character ?)
You want the ID of the video, so you can split the string into those sections; if you store the sections in one array, the ID is always at index 3 (the fourth element).
One example of what that array would look like:
['https://', 'www.youtube.com', 'embed', 'DmYK479EpQc', '?vq=hd720&rel=0']
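The sectioning described above can be sketched as follows. This is only one possible implementation: the helper name splitEmbedUrl is hypothetical, and it uses a single regex rather than split('/'), because a plain split on / would break 'https://' into 'https:' and an empty string.

```javascript
// Hypothetical helper: splits an embed URL into the sections listed above.
function splitEmbedUrl(url) {
  // scheme (optional "http:" or "https:" plus "//"), host, "embed", ID, optional "?..."
  var match = /^((?:https?:)?\/\/)([^\/]+)\/(embed)\/([^\/?]+)(\?.*)?$/.exec(url);
  // Drop the full match at index 0; a missing query string becomes ''.
  return match ? [match[1], match[2], match[3], match[4], match[5] || ''] : null;
}

var sections = splitEmbedUrl('https://www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0');
console.log(sections);
console.log(sections[3]); // the video ID sits at index 3
```

The function returns null when the input does not follow the assumed format, so callers can tell a failed parse from an empty ID.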

One option uses a regex replacement:
var url = "//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0";
var path = url.replace(/.*\/([^?]+).*/, "$1");
console.log(path);
The above regex pattern says to:
.* match and consume everything up to and
/ including the last path separator
([^?]+) then match and capture any number of non ? characters
.* then consume the rest of the input
Then, we just replace with the first capture group, which corresponds to the text after the final path separator, but before the start of the query string, should the URL have one.
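One caveat with the replace form is that a string the pattern does not match comes back unchanged. A small variation, sketched here with a hypothetical function name, uses exec and the same capture group so that a failed match is reported as null instead:

```javascript
// Hypothetical helper: same capture group as above, but via exec instead of replace.
function lastPathSegment(url) {
  // Greedy .* runs to the last "/", then the group captures up to the first "?".
  var match = /.*\/([^?]+)/.exec(url);
  return match ? match[1] : null;
}

console.log(lastPathSegment('//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0'));
console.log(lastPathSegment('no-slash-here'));
```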

You can use this regex:
(.*)(d\/) match and consume everything up to and including the d/ at the end of embed/
([A-Za-z0-9_-]+) then match and capture the ID: any run of letters, digits, _ and -
(\?)(.*) then consume the ? and the rest of the input
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
const regex = /(.*)(d\/)([A-Za-z0-9_-]+)(\?)(.*)/gm;
const position = '$3';
let result = ytUrl.replace(regex, position);
console.log('YouTube ID: ', result);
This regex splits the string into sections, and the YouTube ID is in the third capture group.
Another solution is to use split, which splits a string into an array of substrings.
const ytUrl = '//www.youtube.com/embed/DmYK479EpQc?vq=hd720&rel=0';
let result = ytUrl.split('/').pop().split('?').shift();
console.log('YouTube ID: ', result);
In this sample, we split the URL using / as the separator and take the last element of the array with the pop method. Finally, we split again using ? as the separator and take the first element of the array with the shift method.

Related

Replace characters of a string matched by regex

I need to find the domain names of all valid URLs in an HTML page and replace them with another domain name, but within the original domain name I need to do a second replacement. For example, if the URL https://www.example.com/path/to/somewhere appears in the page, I need to eventually transform it into something like www-example-com.another.domain/path/to/somewhere.
I can do the first match and replace with the following code:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1.another.domain");
but I have no idea how to do the second match and replace, which should turn each . into -. I wonder if there is an efficient way to do this. I tried something like the following, but it does not work:
const regex = new RegExp('(https?:\/\/([^:\/\n\"\'?]+))', 'g');
txt = txt.replace(regex, "$1".replace(/'.'/g, '-') + ".another.domain");
Ok - I think I know what you're looking for. I'll explain what it's doing.
You have 2 capture groups: the part before the first / of the path (scheme and host), and the path from that / onward.
You take the first capture group and convert each . to -
You append .another.domain as a string, and then append the 2nd capture group after it
const address1 = 'https://www.example.com/path/to/somewhere';
const newDomain = "another.domain"
const pattern = /(https?:\/\/[^:\/\n\"\'?]+)(\/.*)/;
const matches = pattern.exec(address1);
const converted = matches[1].replace(/\./g, "-") + `.${newDomain}${matches[2]}`;
console.log(converted);
You can use the function version of String.prototype.replace() to have some more control over the specific replacements.
For example...
const txt = 'URL is https://www.example.com/path/to/somewhere'
const newTxt = txt.replace(/(https?:\/\/)([\w.]+)/g, (_, scheme, domain) =>
`${scheme}${domain.replace(/\./g, '-')}.another.domain`)
console.log(newTxt)
Here, scheme is the first capture group (https?:\/\/) and domain is the second ([\w.]+).
If you need a fancier domain matcher (as per your question), just substitute that part of the regex.
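As a sketch of that substitution, the version below plugs the host character class from the question into the callback form. The function name rewriteDomains and the sample markup are made up for illustration.

```javascript
// Hypothetical helper: rewrites every http(s) host in txt.
// The host character class [^:\/\n"'?]+ is the one used in the question.
function rewriteDomains(txt, newDomain) {
  return txt.replace(/(https?:\/\/)([^:\/\n"'?]+)/g, function (_, scheme, domain) {
    // Second replacement: dots inside the matched host become hyphens.
    return scheme + domain.replace(/\./g, '-') + '.' + newDomain;
  });
}

var html = '<a href="https://www.example.com/path/to/somewhere">link</a>';
console.log(rewriteDomains(html, 'another.domain'));
```

Because the callback runs once per match, the inner replace only ever sees the host, so dots in the path are left alone.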

JS: Remove all text to the left of certain last character with Regex

I'm trying to use a regex pattern to remove all the text that comes before the last occurrence of a certain character.
Example:
rom.com/run/login.php
Becomes:
login.php
How would I go about doing this in JavaScript? I'm new to regular expressions.
To get everything after the last slash, use [^\/]+$
const str = "rom.com/run/login.php";
console.log(str.match(/[^/]+$/)[0]);
You can get the result you need by searching for a literal string (just one character in fact) so there's no need to employ regular expressions which will cost you performance.
You can split the input into chunks separated by / and get the last chunk:
var input = 'rom.com/run/login.php';
var result = input.split('/').pop();
Or find the position of the last occurrence of / in the input, and get the remainder of the string that follows that position:
var input = 'rom.com/run/login.php';
var result = input.substring(input.lastIndexOf('/') + 1);
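A short sketch of how the lastIndexOf version behaves on inputs beyond the one in the question. The function name afterLastSlash and the extra sample strings are assumptions for illustration.

```javascript
// Hypothetical helper wrapping the lastIndexOf approach.
function afterLastSlash(input) {
  // lastIndexOf returns -1 when there is no "/", so -1 + 1 = 0 keeps the whole string.
  return input.substring(input.lastIndexOf('/') + 1);
}

console.log(afterLastSlash('rom.com/run/login.php')); // text after the last slash
console.log(afterLastSlash('login.php'));             // no slash: input returned as-is
console.log(afterLastSlash('rom.com/run/'));          // trailing slash: empty string
```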
One approach is a regex replacement:
var path = "rom.com/run/login.php";
var output = path.replace(/^.*\//, "");
console.log(output);
The regex pattern ^.*/ is greedy, and will consume everything up to (and including) the last path separator. Then, we replace this match with empty string, to effectively remove it.
You could do it with Regex like this:
var url = 'rom.com/run/login.php'
var page = url.match('^.*/(.*)')[1]
console.log(page)
Or you could do it without Regex like this:
var url = 'rom.com/run/login.php'
var split = url.split('/')
var page = split[split.length-1]
console.log(page)

Split image path in javascript from first slash

I have the following path :
/data/2/444/test.text
or (without a slash at the start of the path)
data/2/444/test.text
I would like to return in JS the following result :
"/2/444/test.text"
I tried the following, but I only managed to get the base name:
new String(str).substring(str.lastIndexOf('/') + 1);
You can use a simple regex to remove the first directory in the path.
str.replace(/^\/?[^/]+\//, '/')
^\/? matches an optional slash at the beginning of the string.
[^/]+\/ matches any non-slash characters up to and including the next slash.
const input = ['/data/2/444/test.text', 'data/2/444/test.text', 'file.txt'];
const output = input.map(str => str.replace(/^\/?[^/]+\//, '/'))
console.log(output);
If you only want to replace /data from the beginning you can use:
^\/?data\/
lastIndexOf finds the last occurrence of a string within a string. When you use substring(x) on a string y, it will return the characters of y starting at x. So using lastIndexOf in this use case isn't what you want. You can achieve what you want by using indexOf (finding the first occurrence of a string within a string).
To account for the different formats of your input string (i.e. /data and data), you can just test for that:
function getPathWithoutData(str) {
var strWithoutSlash = str[0] === '/' ? str.substring(1) : str;
return strWithoutSlash.substring(strWithoutSlash.indexOf('/'));
}
You can easily do it without regexes and using slice and indexOf:
const getPath=path=>path.slice(path.indexOf('/',path[0]==="/"?1:0)-path.length);
console.log(getPath('/data/2/444/test.text'));
console.log(getPath('data/2/444/test.text'));
This checks if the first char is a / or not, and adjusts the indexOf accordingly to match either the second or first /. Also note how the subtraction gives a negative value, which gets the intended characters from the end, up to the /.
Of course, you can still do it with substring, as you were attempting, but with indexOf instead of lastIndexOf, because you want the 1st or 2nd / and not the last one:
const getPath=path=>path.substring(path.indexOf('/',path[0]==="/"?1:0),path.length);
console.log(getPath('/data/2/444/test.text'));
console.log(getPath('data/2/444/test.text'));
It's worth mentioning that these may not be as robust as a regex, but are simple enough, and may fit your needs, depending on how the data can vary.
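To see how the variants compare, here is a small sketch that runs the regex version and an indexOf version over the same inputs. The function names and the two extra sample paths without a directory are assumptions added for illustration.

```javascript
// Regex variant: drop the first directory, keeping a leading slash on the result.
function viaRegex(str) {
  return str.replace(/^\/?[^/]+\//, '/');
}

// indexOf variant: start searching after an optional leading slash.
function viaIndexOf(str) {
  var from = str[0] === '/' ? 1 : 0;
  // indexOf returns -1 when no further "/" exists; substring treats that as 0.
  return str.substring(str.indexOf('/', from));
}

var samples = ['/data/2/444/test.text', 'data/2/444/test.text', 'file.txt', '/file.txt'];
samples.forEach(function (sample) {
  console.log(sample, '->', viaRegex(sample), '|', viaIndexOf(sample));
});
```

On these inputs both variants agree, including the cases where there is no directory to strip and the path is returned unchanged.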
You could use String.prototype.split() and pass it a regex.
const paths = [
"/data/path/one",
"data/path/two"
];
const modifyPath = p => {
const [fallback, newPath] = p.split(/\/?data/);
return newPath || fallback;
}
console.log(paths.map(modifyPath));
Or you could use String.prototype.replace()
const paths = [
"/data/path/one",
"data/path/two",
];
const modifyPath = p => {
return p.replace(/\/?data(.*)/, '$1');
}
console.log(paths.map(modifyPath));

Removing last part of URL based on

I need to remove any occurrence of a product number that may occur in URLs, using javascript/jquery.
URL looks like this:
http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884
The final part of the URL is always formatted with 2 digits followed by -, so I was thinking a regex might do the job? I need everything after the last / removed.
It must also work when the product occurs higher or lower in the hierarchy, i.e.: http://www.mysite.com/section1/section2/01-012-15_1571884
So far I have tried different solutions with location.pathname and splits, but I am stuck on how to handle differences in product hierarchy and handling the arrays.
var x = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
console.log(x.substring(0, x.lastIndexOf('/')));
Use lastIndexOf to find the last occurrence of "/" and then remove the rest of the path using substring.
var url = 'http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884';
var parts = url.split('/');
parts.pop();
url = parts.join('/');
http://jsfiddle.net/YXe6L/
var a = 'http://www.mysite.com/section1/section2/01-012-15_1571884',
// Pass the regex to replace directly; wrapping it in match() breaks when nothing matches.
result = a.replace(/\d{1,2}-\d{1,3}-\d{1,2}_\d+[^\d]*/g, '');
JSFiddle: http://jsfiddle.net/2TVBk/2/
This is a very nice online regex tester to test your regexes with: http://regexpal.com/
Here is an approach that will properly handle a situation where there is no product ID as you requested. http://jsfiddle.net/84GVe/
var url1 = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
var url2 = "http://www.mysite.com/section1/section2/section3/section4";
function removeID(url) {
//look for a / followed by _, - or 0-9 characters,
//and use $ to ensure it is the end of the string
var reg = /\/[-\d_]+$/;
if(reg.test(url))
{
url = url.substr(0,url.lastIndexOf('/'));
}
return url;
}
console.log( removeID(url1) );
console.log( removeID(url2) );
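The question states that the final part always starts with 2 digits followed by -. A stricter sketch built on that assumption does the test and the removal in one replace call; the function name removeProductSegment is hypothetical.

```javascript
// Hypothetical helper: strips the last path segment only when it starts
// with two digits and a hyphen, as described in the question.
function removeProductSegment(url) {
  // \/\d{2}- anchors on the product format; [^\/]*$ takes the rest of that segment.
  return url.replace(/\/\d{2}-[^\/]*$/, '');
}

console.log(removeProductSegment('http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884'));
console.log(removeProductSegment('http://www.mysite.com/section1/section2/01-012-15_1571884'));
console.log(removeProductSegment('http://www.mysite.com/section1/section2/section3/section4'));
```

URLs whose last segment does not match the product format pass through unchanged, whatever their depth in the hierarchy.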

Finding image url via using Regex

Is there a working regex to find an image URL?
Example :
var reg = /^url\(|url\(".*"\)|\)$/;
var string = 'url("http://domain.com/randompath/random4509324041123213.jpg")';
var string2 = 'url(http://domain.com/randompath/random4509324041123213.jpg)';
console.log(string.match(reg));
console.log(string2.match(reg));
I tried this regex, but it fails.
The input will look like the strings above; I just want the image URL between url(" ") or url( ).
I just want to get output like http://domain.com/randompath/random4509324041123213.jpg
http://jsbin.com/ahewaq/1/edit
I'd simply use this expression:
/url.*\("?([^")]+)/
This returns an array, where the first index (0) contains the entire match, the second will be the url itself, like so:
'url("http://domain.com/randompath/random4509324041123213.jpg")'.match(/url.*\("?([^")]+)/)[1];
//returns "http://domain.com/randompath/random4509324041123213.jpg"
//or without the quotes, same return, same expression
'url(http://domain.com/randompath/random4509324041123213.jpg)'.match(/url.*\("?([^")]+)/)[1];
If there is a chance that either single or double quotes are used, you can simply replace each " in the pattern with ['"], in this case:
/url.*\(["']?([^"')]+)/
Try this regexp:
var regex = /\burl\(\"?(.*?)\"?\)/;
var match = regex.exec(string);
console.log(match[1]);
The URL is captured in the first subgroup.
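A sketch that extends this pattern to cover double quotes, single quotes, and no quotes in one function. The name extractCssUrl is made up here, and the backreference \1 is an addition that requires the closing quote to match the opening one.

```javascript
// Hypothetical helper: returns the address inside a CSS url(...) value, or null.
function extractCssUrl(value) {
  // Group 1 is the optional opening quote; \1 demands the same character (or nothing) at the end.
  var match = /\burl\(\s*(["']?)(.*?)\1\s*\)/.exec(value);
  return match ? match[2] : null;
}

console.log(extractCssUrl('url("http://domain.com/randompath/random4509324041123213.jpg")'));
console.log(extractCssUrl("url('http://domain.com/randompath/random4509324041123213.jpg')"));
console.log(extractCssUrl('url(http://domain.com/randompath/random4509324041123213.jpg)'));
```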
If the string will always be consistent, one option would be simply to remove the first 5 characters url(" and the last two "):
var string = 'url("http://domain.com/randompath/random4509324041123213.jpg")';
// Remove last two characters
string = string.substr(0, string.length - 2);
// Remove first five characters
string = string.substr(5, string.length);
Benefit of this approach: You can edit it yourself, without asking StackOverflow to do it for you. RegEx is great, but if you don't know it, peppering your code with it makes for a frustrating refactor.
