Trying to use .split with two different page URL structures

My intention
Pull out the language code from my two types of URL strings.
My question
How do I split two different URL structures? I have two URL structures, listed as examples below the code.
My problem
I can't figure out how to split the two variables, either separately or together in one line with cc =..., using custom JavaScript in Google Tag Manager.
Code
function() {
cc = {{Page Path}}.split("/")[1].toLowerCase();
cc = {{virtualURL}}.split("/#/")[1].toLowerCase();
if(cc.length == 2) {
cc = cc;
} else {
cc = 'other';
}
return cc;
}
Example of {{Page Path}} - https://www.example.com/en/.....
Example of {{virtualURL}} - https://www.booking.example.com/#/en/........
Note
In both examples I want to be able to pull out en successfully.

Any solution here is likely to be fragile: you could have https://example.com/xy/ where xy isn't meant to be a language code.
But allowing for that, and allowing only two-character language codes:
var rexGetLang = /\/([a-z]{2})\//;
function getLang(url) {
var match = rexGetLang.exec(url);
return match ? match[1] : "other";
}
console.log(getLang("https://www.example.com/en/....."));
console.log(getLang("https://www.booking.example.com/#/en/........"));
Or if you want to allow for en-GB and such:
var rexGetLang = /\/([a-z]{2}(?:-[A-Z]{2})?)\//;
function getLang(url) {
var match = rexGetLang.exec(url);
return match ? match[1] : "other";
}
console.log(getLang("https://www.example.com/en/....."));
console.log(getLang("https://www.booking.example.com/#/en/........"));
console.log(getLang("https://www.booking.example.com/........"));
console.log(getLang("https://www.example.com/en-GB/....."));
console.log(getLang("https://www.booking.example.com/#/en-US/........"));
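To blunt the fragility mentioned at the top, one option is to accept a match only when it is a language the site actually serves. This is a minimal sketch, assuming a made-up KNOWN_LANGS list and a hypothetical getKnownLang helper; neither comes from the question.

```javascript
// Hypothetical allow-list: replace with the languages your site really uses.
var KNOWN_LANGS = new Set(["en", "de", "fr"]);

// Same two-letter pattern as above, but the result must also be in the allow-list.
function getKnownLang(url) {
  var match = /\/([a-z]{2})\//.exec(url);
  return match && KNOWN_LANGS.has(match[1]) ? match[1] : "other";
}

console.log(getKnownLang("https://www.example.com/en/....."));            // "en"
console.log(getKnownLang("https://www.booking.example.com/#/en/........")); // "en"
console.log(getKnownLang("https://example.com/xy/"));                      // "other"
```

With the allow-list in place, a two-letter path segment such as /xy/ falls through to "other" instead of being reported as a language.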

We can extract the language code simply by splitting the URL on /. Let's see what we get when we split the two example URLs:
https://www.example.com/en/ - ["https:", "", "www.example.com", "en", ""]
https://www.booking.example.com/#/en/ - ["https:", "", "www.booking.example.com", "#", "en", ""]
In the examples above, the language code sits at index 3 (first example) or at index 4 (second example), which an if condition can handle. Let's see how:
let url = 'https://www.booking.example.com/#/en/';
let urlTokens = url.split('/');
let languageCode = urlTokens[3] === '#' ? urlTokens[4] : urlTokens[3];
console.log(languageCode);

// Web API for handling URL https://developer.mozilla.org/en-US/docs/Web/API/URL
const url = new URL('https://www.example.com/en/website');
url.hostname; // 'www.example.com'
url.port; // ''
url.search; // ''
url.pathname; // '/en/website'
url.protocol; // 'https:'
// RegEx to see if /en/ exists https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
new RegExp(/\/en\//).test(url.pathname) // true
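Building on the URL API, here is a rough sketch that reads the first route segment from either the pathname or the hash, so one function covers both URL structures. The helper name getLangFromUrl and the 'other' fallback (borrowed from the question's code) are assumptions for illustration.

```javascript
// Hypothetical helper: returns a two-letter code from "/en/..." or "/#/en/...".
function getLangFromUrl(href) {
  const url = new URL(href);
  // Hash-routed pages keep the route in url.hash ("#/en/..."),
  // regular pages keep it in url.pathname ("/en/...").
  const route = url.hash.startsWith('#/') ? url.hash.slice(1) : url.pathname;
  const first = route.split('/')[1] || '';
  return /^[a-z]{2}$/i.test(first) ? first.toLowerCase() : 'other';
}

console.log(getLangFromUrl('https://www.example.com/en/website'));        // 'en'
console.log(getLangFromUrl('https://www.booking.example.com/#/en/rooms')); // 'en'
console.log(getLangFromUrl('https://www.example.com/products/'));         // 'other'
```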

function getLanguage(url) {
    var rgx = /^https:\/\/[^\/]+\/(?:#\/)?([a-z]+)/;
    var match = url.match(rgx);
    // match is null when the URL doesn't fit the pattern
    return match ? match[1] : null;
}
var url = 'https://www.booking.example.com/#/en/';
var language = getLanguage(url);

Related

Removing subdomain in a string in TypeScript

I have a string in TypeScript which is subdomain.domain.com I want to create a new string that is just the domain on its own, so for example subdomain.domain.com would become domain.com
Note: The 'subdomain' part of the URL could be different sizes so it could be 'subdomain.domain.com' or it might be 'sub.domain.com' so I can't do this on character size. The domain might also be different so it could be 'subdomain.domain.com' or it could be 'subdomain.new-domain.com'.
So basically I need to just remove up to and including the first '.' - hope that all makes sense.

var domain = 'mail.testing.praveent.com';
var domainCharacters = domain.split('').reverse();
var domainReversed = '', dotCount = 0;
do {
if (domainCharacters[0] === '.') {
dotCount++;
if (dotCount == 2) break;
}
domainReversed += domainCharacters[0];
domainCharacters.splice(0, 1);
} while (dotCount < 2 && domainCharacters.length > 0);
var domainWithoutSubdomain = domainReversed.split('').reverse().join('');
This strips the subdomains from a domain and leaves only the root domain name.

You can split it by . and get only the last 2 elements and turn it back into a string again.
function strip(url: string) {
const fragments = url.split('.');
const last = fragments.pop();
try {
// if its a valid url with a protocol (http/https)
const instance = new URL(url);
return `${instance.protocol}//${fragments.pop()}.${last}`;
} catch (_) {
return `${fragments.pop()}.${last}`;
}
}
strip('https://subdomain.example.com') // https://example.com
strip('subdomain.example.com') // example.com
strip('https://subdomain.another-subdomain.example.com') // https://example.com
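If the goal is literally what the question states (remove everything up to and including the first '.'), a shorter sketch is possible. The function name dropFirstLabel is made up for illustration, and the code is plain JavaScript that is also valid TypeScript once you annotate the parameter as a string.

```javascript
// Hypothetical helper: drops the first dot-separated label of a host name.
function dropFirstLabel(host) {
  const dot = host.indexOf('.');
  // No dot at all: nothing to strip, return the input unchanged.
  return dot === -1 ? host : host.slice(dot + 1);
}

console.log(dropFirstLabel('subdomain.domain.com')); // 'domain.com'
console.log(dropFirstLabel('sub.new-domain.com'));   // 'new-domain.com'
```

Note that this removes exactly one label, so 'a.b.domain.com' becomes 'b.domain.com', and a bare 'domain.com' becomes 'com'. It fits only when the input always has exactly one subdomain, as in the question.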

Javascript: Extracting local part and domain of an email

I have an email in the following format:
joe+12312313#aDomain.com
First, I need to make sure the email's domain equals to aDomain.com
Next, I need to extract everything before the + sign
It would be nice if I can get a the following object:
var objParts = {
localPart: null,
domain: null,
};
console.log(objParts.localPart) // joe
console.log(objParts.domain) // aDomain.com
I know we have to use Regex. I am new to JS.

var email = "joe+12312313#aDomain.com";
var objParts = CreateEmailParts(email);
console.log(objParts.localPart);
console.log(objParts.domain);
function CreateEmailParts(email)
{
if(email)
{
var objParts = {
domain: email.split('#')[1], // caution: hoping to have the domain follow # always
localPart: email.split('+')[0], // caution: assumes the local part always precedes the first +
};
return objParts;
}
}
https://jsfiddle.net/pdkvx82d/

It doesn't seem like you have to validate the generic email format here.
You can just split on the main points, # and +, and extract the data:
const email = 'joe+12312313#aDomain.com'
const domain = email.split('#').pop() // split on '#' and get the last item
const local = email.split('+').shift() // split on '+' and get the first item
console.log(domain);
console.log(local);

Use simple split:
var str = "joe+12312313#aDomain.com";
var parts = str.split("#");
var objParts = {
localPart: parts[0].split('+')[0],
domain: parts[1],
};
console.log(objParts);
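The question also asks to first make sure the domain equals aDomain.com, which the splits above don't check. Here is a hedged sketch that adds that check. The helper name checkedEmailParts, the case-insensitive comparison, and returning null on a mismatch are my assumptions; the separator is the one used in the sample address.

```javascript
// Hypothetical helper: returns { localPart, domain } or null when the domain doesn't match.
function checkedEmailParts(email, expectedDomain) {
  var sep = email.lastIndexOf('#'); // separator as written in the sample address
  if (sep === -1) return null;
  var domain = email.slice(sep + 1);
  // Assumption: domains are compared case-insensitively.
  if (domain.toLowerCase() !== expectedDomain.toLowerCase()) return null;
  return {
    localPart: email.slice(0, sep).split('+')[0], // everything before the first +
    domain: domain
  };
}

var parts = checkedEmailParts('joe+12312313#aDomain.com', 'aDomain.com');
console.log(parts.localPart); // joe
console.log(parts.domain);    // aDomain.com
```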

RegEx to extract parameters from url hash in JavaScript

My urls will look like:
http://example.com/whatever#page?x=1&locale=hu&y=2
http://example.com/whatever#page?x=1&locale=hu
http://example.com/whatever#page?locale=hu
http://example.com/whatever#page?locale=
http://example.com/whatever#page?x=1
http://example.com/whatever#page
http://example.com/whatever
I'd like to get the locale parameter or empty string if it's not set.
I'm trying something like:
locale = location.hash.replace(/.*(?:[?&]locale=([^&]*))?.*/, "$2");
But my problem is that I couldn't find the right RegExp that works for all cases (both when there's locale= in the hash and when there isn't)

Here's a piece of code that will extract it from the hash and avoid it anywhere else in the URL:
function getLocaleFromHash(url) {
var match = url.match(/#.*[?&]locale=([^&]+)(&|$)/);
return(match ? match[1] : "");
}
And, you can see it work on all your test cases here: http://jsfiddle.net/jfriend00/p37Mx/
If you want to be able to look for any parm in the hash, you would use this:
function getParmFromHash(url, parm) {
var re = new RegExp("#.*[?&]" + parm + "=([^&]+)(&|$)");
var match = url.match(re);
return(match ? match[1] : "");
}
See it work here: http://jsfiddle.net/jfriend00/6kgUk/
A more generic function that fetches all the parameters in a URL would look like this. It is meant for normal URLs, where the hash comes after the query and the parameters are in the query string. It's a bit more code because it does more: it fetches all the parameters into an object where you can look up any parameter by its key, and it URL-decodes them all too:
function getParmsFromURL(url) {
var parms = {}, pieces, parts, i;
var hash = url.lastIndexOf("#");
if (hash !== -1) {
// remove hash value
url = url.slice(0, hash);
}
var question = url.lastIndexOf("?");
if (question !== -1) {
url = url.slice(question + 1);
pieces = url.split("&");
for (i = 0; i < pieces.length; i++) {
parts = pieces[i].split("=");
if (parts.length < 2) {
parts.push("");
}
parms[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1]);
}
}
return parms;
}
For a special version that handles parameters in a hash value and after a ? in the hash value like in the OP's question (which isn't the typical case), one could use this:
function getParmsFromURLHash(url) {
var parms = {}, pieces, parts, i;
var hash = url.lastIndexOf("#");
if (hash !== -1) {
// isolate just the hash value
url = url.slice(hash + 1);
}
var question = url.indexOf("?");
if (question !== -1) {
url = url.slice(question + 1);
pieces = url.split("&");
for (i = 0; i < pieces.length; i++) {
parts = pieces[i].split("=");
if (parts.length < 2) {
parts.push("");
}
parms[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1]);
}
}
return parms;
}
Working demo: http://jsfiddle.net/jfriend00/v8cd5/
And then, if you wanted the locale option, you'd just do this:
var parms = getParmsFromURLHash(url);
var locale = parms["locale"];
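In environments that provide URLSearchParams (modern browsers and Node), the manual splitting can be delegated to it. This is a sketch under that assumption, with a made-up function name localeFromHash; it returns an empty string whenever the parameter is missing or empty, as the question requests.

```javascript
// Hypothetical helper: reads "locale" from the part of the hash after its "?".
function localeFromHash(url) {
  var hashStart = url.indexOf('#');
  if (hashStart === -1) return '';
  var hash = url.slice(hashStart + 1);
  var question = hash.indexOf('?');
  if (question === -1) return '';
  // URLSearchParams handles the & splitting and the URL decoding.
  return new URLSearchParams(hash.slice(question + 1)).get('locale') || '';
}

console.log(localeFromHash('http://example.com/whatever#page?x=1&locale=hu&y=2')); // 'hu'
console.log(localeFromHash('http://example.com/whatever#page?locale='));           // ''
console.log(localeFromHash('http://example.com/whatever'));                        // ''
```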

locale = location.hash.match( /[?&]locale=([^&]*)?/ );
locale = ( locale == null ? "" : locale[1] || "" );
Will do the trick. I don't think the .* are needed, because you do not specify a start or an end of the string.
I tested this regular expression on all your examples and they all worked correctly :)
Edit: sorry, it was invalid in some cases. It is now correct in all cases.

If you really want to do it in one regex:
locale = location.hash.match(/([?&]locale=|^((?![?&]locale=).)+$)([^&]*)/)[3];
It works against all of your examples, though I imagine it's horribly inefficient.

How do I get the YouTube video ID from a URL?

I want to get the v=id from YouTube’s URL with JavaScript (no jQuery, pure JavaScript).
Example YouTube URL formats
http://www.youtube.com/watch?v=u8nQa1cJyX8&a=GxdCwVVULXctT2lYDEPllDR0LRTutYfW
http://www.youtube.com/watch?v=u8nQa1cJyX8
Or any other YouTube format that contains a video ID in the URL.
Result from these formats
u8nQa1cJyX8

I enhanced the regex provided by "jeffreypriebe" so that it also handles the kind of YouTube URL that videos have when you reach them through a channel.
This is the function I put together.
<script type="text/javascript">
function youtube_parser(url){
var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#&?]*).*/;
var match = url.match(regExp);
return (match&&match[7].length==11)? match[7] : false;
}
</script>
These are the types of URLs supported
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
Can be found in [http://web.archive.org/web/20160926134334/]
http://lasnv.net/foro/839/Javascript_parsear_URL_de_YouTube

I simplified Lasnv's answer a bit.
It also fixes the bug that WebDeb describes.
Here it is:
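// Wrapped in a function so that return is valid; the function name is illustrative.
function getYouTubeId(url) {
    var regExp = /^.*(youtu\.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^#\&\?]*).*/;
    var match = url.match(regExp);
    if (match && match[2].length == 11) {
        return match[2];
    } else {
        return null; // error: no 11-character ID found
    }
}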
Here is a regexer link to play with:
http://regexr.com/3dnqv

You don't need to use a regular expression for this.
var video_id = window.location.search.split('v=')[1];
var ampersandPosition = video_id.indexOf('&');
if(ampersandPosition != -1) {
video_id = video_id.substring(0, ampersandPosition);
}
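Along the same no-regex line, the URL API can read the v parameter directly and doesn't care where it sits in the query string. This is only a sketch for watch-style URLs with a protocol; the function name videoIdFromWatchUrl is hypothetical, and short youtu.be or embed links are not covered.

```javascript
// Hypothetical helper: returns the "v" query parameter, or null when it is absent.
function videoIdFromWatchUrl(href) {
  return new URL(href).searchParams.get('v');
}

console.log(videoIdFromWatchUrl('http://www.youtube.com/watch?v=u8nQa1cJyX8&a=GxdCwVVULXctT2lYDEPllDR0LRTutYfW')); // 'u8nQa1cJyX8'
console.log(videoIdFromWatchUrl('http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ'));            // 'dQw4w9WgXcQ'
```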

None of these worked on the kitchen sink as of 1/1/2015, notably URLs without the http/https protocol and URLs on the youtube-nocookie domain. So here's a modified version that works on all these various YouTube URL variants:
// Just the regex. Output is in [1].
/^.*(?:(?:youtu\.be\/|v\/|vi\/|u\/\w\/|embed\/|shorts\/)|(?:(?:watch)?\?v(?:i)?=|\&v(?:i)?=))([^#\&\?]*).*/
// For testing.
var urls = [
'https://youtube.com/shorts/dQw4w9WgXcQ?feature=share',
'//www.youtube-nocookie.com/embed/up_lNV-yoK4?rel=0',
'http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo',
'http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel',
'http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I',
'http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY',
'http://youtu.be/6dwqZw0j_jY',
'http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be',
'http://youtu.be/afa-5HQHiAs',
'http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0',
'http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel',
'http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I',
'http://www.youtube.com/embed/nas1rJpm7wY?rel=0',
'http://www.youtube.com/watch?v=peFZbP64dsU',
'http://youtube.com/v/dQw4w9WgXcQ?feature=youtube_gdata_player',
'http://youtube.com/vi/dQw4w9WgXcQ?feature=youtube_gdata_player',
'http://youtube.com/?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/?vi=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?vi=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtu.be/dQw4w9WgXcQ?feature=youtube_gdata_player'
];
var i, r, rx = /^.*(?:(?:youtu\.be\/|v\/|vi\/|u\/\w\/|embed\/|shorts\/)|(?:(?:watch)?\?v(?:i)?=|\&v(?:i)?=))([^#\&\?]*).*/;
for (i = 0; i < urls.length; ++i) {
r = urls[i].match(rx);
console.log(r[1]);
}

The best solution I found (from 2019-2021) is this:
function YouTubeGetID(url){
url = url.split(/(vi\/|v=|\/v\/|youtu\.be\/|\/embed\/)/);
return (url[2] !== undefined) ? url[2].split(/[^0-9a-z_\-]/i)[0] : url[0];
}
I found it here.
/*
* Tested URLs:
var url = 'http://youtube.googleapis.com/v/4e_kz79tjb8?version=3';
url = 'https://www.youtube.com/watch?feature=g-vrec&v=Y1xs_xPb46M';
url = 'http://www.youtube.com/watch?feature=player_embedded&v=Ab25nviakcw#';
url = 'http://youtu.be/Ab25nviakcw';
url = 'http://www.youtube.com/watch?v=Ab25nviakcw';
url = '<iframe width="420" height="315" src="http://www.youtube.com/embed/Ab25nviakcw" frameborder="0" allowfullscreen></iframe>';
url = '<object width="420" height="315"><param name="movie" value="http://www.youtube-nocookie.com/v/Ab25nviakcw?version=3&hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube-nocookie.com/v/Ab25nviakcw?version=3&hl=en_US" type="application/x-shockwave-flash" width="420" height="315" allowscriptaccess="always" allowfullscreen="true"></embed></object>';
url = 'http://i1.ytimg.com/vi/Ab25nviakcw/default.jpg';
url = 'https://www.youtube.com/watch?v=BGL22PTIOAM&feature=g-all-xit';
url = 'BGL22PTIOAM';
*/

/^.*(youtu.be\/|v\/|e\/|u\/\w+\/|embed\/|v=)([^#\&\?]*).*/
Tested on:
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/KdwsulMb8EQ
http://youtu.be/dQw4w9WgXcQ
http://www.youtube.com/embed/dQw4w9WgXcQ
http://www.youtube.com/v/dQw4w9WgXcQ
http://www.youtube.com/e/dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/?v=dQw4w9WgXcQ
http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
http://www.youtube.com/?feature=player_embedded&v=dQw4w9WgXcQ
http://www.youtube.com/user/IngridMichaelsonVEVO#p/u/11/KdwsulMb8EQ
http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0
Inspired by this other answer.

Given that YouTube has a variety of URL styles, I think Regex is a better solution. Here is my Regex:
^.*(youtu.be\/|v\/|embed\/|watch\?|youtube.com\/user\/[^#]*#([^\/]*?\/)*)\??v?=?([^#\&\?]*).*
Group 3 has your YouTube ID
Sample YouTube URLs (currently, including "legacy embed URL style") - the above Regex works on all of them:
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
Hat tip to Lasnv

tl;dr.
Matches all URL examples on this question and then some.
let re = /(https?:\/\/)?(((m|www)\.)?(youtube(-nocookie)?|youtube.googleapis)\.com.*(v\/|v=|vi=|vi\/|e\/|embed\/|user\/.*\/u\/\d+\/)|youtu\.be\/)([_0-9a-z-]+)/i;
let id = "https://www.youtube.com/watch?v=l-gQLqv9f4o".match(re)[8];
The ID will always be in match group 8.
Live examples of all the URLs I grabbed from the answers to this question:
https://regexr.com/3u0d4
Full explanation:
As many answers/comments have brought up, there are many formats for youtube video URLs. Even multiple TLDs where they can appear to be "hosted".
You can look at the full list of variations I checked against by following the regexr link above.
Let's break down the RegExp.
^ Anchors the match to the start of the string.
(https?:\/\/)? Optional protocol, http:// or https://. The ? makes the preceding item optional, so the s and then the entire group (anything enclosed in a set of parentheses) are optional.
Ok, this next part is the meat of it. Basically we have two options, the various versions of [optional-subdomain].youtube.com/...[id] and the link shortened youtu.be/[id] version.
( // Start a group which will match everything after the protocol and up to just before the video id.
((m|www)\.)? // Optional subdomain, this supports looking for 'm' or 'www'.
(youtube(-nocookie)?|youtube.googleapis) // There are three domains where youtube videos can be accessed. This matches them.
\.com // The .com at the end of the domain.
.* // Match anything
(v\/|v=|vi=|vi\/|e\/|embed\/|user\/.*\/u\/\d+\/) // These are all the things that can come right before the video id. The | character means OR so the first one in the "list" matches.
| // There is one more domain where you can get to youtube, it's the link shortening url which is just followed by the video id. This OR separates all the stuff in this group and the link shortening url.
youtu\.be\/ // The link shortening domain
) // End of group
Finally we have the group to select the video ID. At least one character that is a number, letter, underscore, or dash.
([_0-9a-z-]+)
You can find out much more detail about each part of the regex by heading over the regexr link and seeing how each part of the expression matches with the text in the url.
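Since only the last group is of interest, one variation is to make every other group non-capturing, so the ID lands in group 1 and doesn't depend on counting parentheses. This is a sketch of that idea, not the original expression: the helper name youTubeIdOf is made up, and I also escaped the dot in youtube.googleapis.

```javascript
// Same structure as the expression above, with (?:...) for every group except the ID.
const idOnlyRe = /(?:https?:\/\/)?(?:(?:(?:m|www)\.)?(?:youtube(?:-nocookie)?|youtube\.googleapis)\.com.*(?:v\/|v=|vi=|vi\/|e\/|embed\/|user\/.*\/u\/\d+\/)|youtu\.be\/)([_0-9a-z-]+)/i;

// Hypothetical helper: returns the ID, or null when the URL doesn't match.
function youTubeIdOf(url) {
  const match = url.match(idOnlyRe);
  return match ? match[1] : null;
}

console.log(youTubeIdOf('https://www.youtube.com/watch?v=l-gQLqv9f4o'));    // 'l-gQLqv9f4o'
console.log(youTubeIdOf('http://youtu.be/0zM3nApSvMg'));                    // '0zM3nApSvMg'
console.log(youTubeIdOf('http://www.youtube.com/embed/0zM3nApSvMg?rel=0')); // '0zM3nApSvMg'
```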

I created a function that tests a user's input for YouTube, SoundCloud or Vimeo embed IDs, to be able to create a more continuous design with embedded media. This function detects and returns an object with two properties: "type" and "id". Type can be either "youtube", "vimeo" or "soundcloud" and the "id" property is the unique media id.
On the site I use a textarea dump, where the user can paste in any type of link or embed code, including the iFrame-embedding of both vimeo and youtube.
function testUrlForMedia(pastedData) {
var success = false;
var media = {};
if (pastedData.match('http://(www.)?youtube|youtu\.be')) {
if (pastedData.match('embed')) { youtube_id = pastedData.split(/embed\//)[1].split('"')[0]; }
else { youtube_id = pastedData.split(/v\/|v=|youtu\.be\//)[1].split(/[?&]/)[0]; }
media.type = "youtube";
media.id = youtube_id;
success = true;
}
else if (pastedData.match('http://(player.)?vimeo\.com')) {
vimeo_id = pastedData.split(/video\/|http:\/\/vimeo\.com\//)[1].split(/[?&]/)[0];
media.type = "vimeo";
media.id = vimeo_id;
success = true;
}
else if (pastedData.match('http://player\.soundcloud\.com')) {
soundcloud_url = unescape(pastedData.split(/value="/)[1].split(/["]/)[0]);
soundcloud_id = soundcloud_url.split(/tracks\//)[1].split(/[&"]/)[0];
media.type = "soundcloud";
media.id = soundcloud_id;
success = true;
}
if (success) { return media; }
else { alert("No valid media id detected"); }
return false;
}

Late to the game here, but I've mashed up two excellent responses from mantish and j-w. First, the modified regex:
const youtube_regex = /^.*(youtu\.be\/|vi?\/|u\/\w\/|embed\/|\?vi?=|\&vi?=)([^#\&\?]*).*/
Here's the test code (I've added mantish's original test cases to j-w's nastier ones):
var urls = [
'http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index',
'http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o',
'http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0',
'http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s',
'http://www.youtube.com/embed/0zM3nApSvMg?rel=0',
'http://www.youtube.com/watch?v=0zM3nApSvMg',
'http://youtu.be/0zM3nApSvMg',
'//www.youtube-nocookie.com/embed/up_lNV-yoK4?rel=0',
'http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo',
'http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel',
'http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I',
'http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY',
'http://youtu.be/6dwqZw0j_jY',
'http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be',
'http://youtu.be/afa-5HQHiAs',
'http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0',
'http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel',
'http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I',
'http://www.youtube.com/embed/nas1rJpm7wY?rel=0',
'http://www.youtube.com/watch?v=peFZbP64dsU',
'http://youtube.com/v/dQw4w9WgXcQ?feature=youtube_gdata_player',
'http://youtube.com/vi/dQw4w9WgXcQ?feature=youtube_gdata_player',
'http://youtube.com/?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/?vi=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?vi=dQw4w9WgXcQ&feature=youtube_gdata_player',
'http://youtu.be/dQw4w9WgXcQ?feature=youtube_gdata_player'
];
var failures = 0;
urls.forEach(url => {
const parsed = url.match(youtube_regex);
if (parsed && parsed[2]) {
console.log(parsed[2]);
} else {
failures++;
console.error(url, parsed);
}
});
if (failures) {
console.error(failures, 'failed');
}
Experimental version to handle the m.youtube urls mentioned in comments:
const youtube_regex = /^.*((m\.)?youtu\.be\/|vi?\/|u\/\w\/|embed\/|\?vi?=|\&vi?=)([^#\&\?]*).*/
It requires parsed[2] to be changed to parsed[3] in two places in the tests (which it then passes with m.youtube urls added to the tests). Let me know if you see problems.

This regex matches embed, share and link URLs.
const youTubeIdFromLink = (url) => url.match(/(?:https?:\/\/)?(?:www\.|m\.)?youtu(?:be)?\.(?:com|be)(?:\/watch\/?\?v=|\/embed\/|\/)([^\s&\?\/\#]+)/)[1];
console.log(youTubeIdFromLink('https://youtu.be/You-Tube_ID?rel=0&hl=en')); //You-Tube_ID
console.log(youTubeIdFromLink('https://www.youtube.com/embed/You-Tube_ID?rel=0&hl=en')); //You-Tube_ID
console.log(youTubeIdFromLink('https://m.youtube.com/watch?v=You-Tube_ID&rel=0&hl=en')); //You-Tube_ID

Since YouTube video IDs are 11 characters long, we can simply take a substring after splitting the URL on v=.
Then we don't depend on the ampersand at the end.
var sampleUrl = "http://www.youtube.com/watch?v=JcjoGn6FLwI&asdasd";
var video_id = sampleUrl.split("v=")[1].substring(0, 11)
Nice and simple :)
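One caveat is that the one-liner throws when the URL has no v= at all, and it happily returns shorter strings. Here is a guarded sketch of the same idea; the helper name idAfterV and the character class used for validation are assumptions based on the 11-character IDs seen in this thread.

```javascript
// Hypothetical helper: same split-and-substring idea, with two sanity checks.
function idAfterV(url) {
  var parts = url.split("v=");
  if (parts.length < 2) return null; // no "v=" in the URL
  var id = parts[1].substring(0, 11);
  // Assumption: an ID is exactly 11 characters of letters, digits, "_" or "-".
  return /^[0-9A-Za-z_-]{11}$/.test(id) ? id : null;
}

console.log(idAfterV("http://www.youtube.com/watch?v=JcjoGn6FLwI&asdasd")); // "JcjoGn6FLwI"
console.log(idAfterV("http://youtu.be/0zM3nApSvMg"));                       // null
```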

I have a regex that supports commonly used URLs, including YouTube Shorts.
Regex Pattern:
(youtu.*be.*)\/(watch\?v=|embed\/|v|shorts|)(.*?((?=[&#?])|$))
Javascript Return Method:
function getId(url) {
let regex = /(youtu.*be.*)\/(watch\?v=|embed\/|v|shorts|)(.*?((?=[&#?])|$))/gm;
return regex.exec(url)[3];
}
Types of URLs supported:
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
https://youtube.com/shorts/0dPkkQeRwTI?feature=share
https://youtube.com/shorts/0dPkkQeRwTI
With Test:
https://regex101.com/r/5JhmpW/1

I have summed up all the suggestions and here is the universal and short answer to this question:
if(url.match('http://(www.)?youtube|youtu\.be')){
youtube_id=url.split(/v\/|v=|youtu\.be\//)[1].split(/[?&]/)[0];
}

Java Code: (Works for all the URLs:
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://youtube.googleapis.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0"
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
http://www.youtube.com/watch?v=0zM3nApSvMg/
http://www.youtube.com/watch?feature=player_detailpage&v=8UVNT4wvIGY
)
String url = "http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index";
String regExp = "/.*(?:youtu.be\\/|v\\/|u/\\w/|embed\\/|watch\\?.*&?v=)";
Pattern compiledPattern = Pattern.compile(regExp);
Matcher matcher = compiledPattern.matcher(url);
if(matcher.find()){
int start = matcher.end();
System.out.println("ID : " + url.substring(start, start+11));
}
For DailyMotion:
String url = "http://www.dailymotion.com/video/x4xvnz_the-funny-crash-compilation_fun";
String regExp = "/video/([^_]+)/?";
Pattern compiledPattern = Pattern.compile(regExp);
Matcher matcher = compiledPattern.matcher(url);
if(matcher.find()){
String match = matcher.group();
System.out.println("ID : " + match.substring(match.lastIndexOf("/")+1));
}

Slightly stricter version:
^https?://(?:www\.)?youtu(?:\.be|be\.com)/(?:\S+/)?(?:[^\s/]*(?:\?|&)vi?=)?([^#?&]+)
Tested on:
http://www.youtube.com/user/dreamtheater#p/u/1/oTJRivZTMLs
https://youtu.be/oTJRivZTMLs?list=PLToa5JuFMsXTNkrLJbRlB--76IAOjRM9b
http://www.youtube.com/watch?v=oTJRivZTMLs&feature=youtu.be
https://youtu.be/oTJRivZTMLs
http://youtu.be/oTJRivZTMLs&feature=channel
http://www.youtube.com/ytscreeningroom?v=oTJRivZTMLs
http://www.youtube.com/embed/oTJRivZTMLs?rel=0
http://youtube.com/v/oTJRivZTMLs&feature=channel
http://youtube.com/v/oTJRivZTMLs&feature=channel
http://youtube.com/vi/oTJRivZTMLs&feature=channel
http://youtube.com/?v=oTJRivZTMLs&feature=channel
http://youtube.com/?feature=channel&v=oTJRivZTMLs
http://youtube.com/?vi=oTJRivZTMLs&feature=channel
http://youtube.com/watch?v=oTJRivZTMLs&feature=channel
http://youtube.com/watch?vi=oTJRivZTMLs&feature=channel

You can use the following code to get the YouTube video ID from a URL:
url = "https://www.youtube.com/watch?v=qeMFqkcPYcg"
VID_REGEX = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})/
alert(url.match(VID_REGEX)[1]);

This can get video id from any type of youtube links
var url= 'http://youtu.be/0zM3nApSvMg';
var urlsplit= url.split(/^.*(youtu.be\/|v\/|embed\/|watch\?|youtube.com\/user\/[^#]*#([^\/]*?\/)*)\??v?=?([^#\&\?]*).*/);
console.log(urlsplit[3]);

A slightly changed version from the one mantish posted:
var regExp = /^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^#\&\?]{11,11}).*/;
var match = url.match(regExp);
if (match) if (match.length >= 2) return match[2];
// error
This assumes the code is always 11 characters.
I'm using this in ActionScript, not sure if {11,11} is supported in Javascript. Also added support for &v=.... (just in case)

This definitely requires regex:
Copy into a browser console (the regex literal also works in Ruby IRB if you drop the var keywords):
var url = "http://www.youtube.com/watch?v=NLqASIXrVbY"
var VID_REGEX = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})/
url.match(VID_REGEX)[1]
See for all test cases: https://gist.github.com/blairanderson/b264a15a8faaac9c6318

One more:
var id = url.match(/(^|=|\/)([0-9A-Za-z_-]{11})(\/|&|$|\?|#)/)[2]
It works with any URL shown in this thread.
It won't work if YouTube adds some other parameter with 11 base64 characters. Until then, it is the easy way.

I made a small function to extract the video id out of a Youtube url which can be seen below.
var videoId = function(url) {
var match = url.match(/v=([0-9a-z_-]{1,20})/i);
return (match ? match['1'] : false);
};
console.log(videoId('https://www.youtube.com/watch?v=dQw4w9WgXcQ'));
console.log(videoId('https://www.youtube.com/watch?t=17s&v=dQw4w9WgXcQ'));
console.log(videoId('https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=17s'));
This function will extract the video id even if there are multiple parameters in the url.

In case someone needs a Kotlin function for this, hoping it saves some time:
import java.util.regex.Pattern

fun extractYTId(ytUrl: String?): String? {
    if (ytUrl == null) return null
    val pattern = Pattern.compile(
        // The dot and the question mark are escaped so they match literally
        "^https?://.*(?:youtu\\.be/|v/|u/\\w/|embed/|watch\\?v=)([^#&?]*).*$",
        Pattern.CASE_INSENSITIVE
    )
    val matcher = pattern.matcher(ytUrl)
    return if (matcher.matches()) matcher.group(1) else null
}

Here's a ruby version of this:
def youtube_id(url)
# Handles various YouTube URLs (youtube.com, youtube-nocookie.com, youtu.be), as well as embed links and urls with various parameters
regex = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|vi|e(?:mbed)?)\/|\S*?[?&]v=|\S*?[?&]vi=)|youtu\.be\/)([a-zA-Z0-9_-]{11})/
match = regex.match(url)
if match && !match[1].nil?
match[1]
else
nil
end
end
To test the method:
example_urls = [
'www.youtube-nocookie.com/embed/dQw4-9W_XcQ?rel=0',
'http://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=channel',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=osPknwzXEas&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ',
'http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/dQw4-9W_XcQ',
'http://youtu.be/dQw4-9W_XcQ',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtu.be',
'http://youtu.be/dQw4-9W_XcQ',
'http://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ?rel=0',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=dQw4-9W_XcQ&feature=sub',
'http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ',
'http://www.youtube.com/embed/dQw4-9W_XcQ?rel=0',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ',
'http://youtube.com/v/dQw4-9W_XcQ?feature=youtube_gdata_player',
'http://youtube.com/vi/dQw4-9W_XcQ?feature=youtube_gdata_player',
'http://youtube.com/?v=dQw4-9W_XcQ&feature=youtube_gdata_player',
'http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player',
'http://youtube.com/?vi=dQw4-9W_XcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player',
'http://youtube.com/watch?vi=dQw4-9W_XcQ&feature=youtube_gdata_player',
'http://youtu.be/dQw4-9W_XcQ?feature=youtube_gdata_player'
]
# Test each one
example_urls.each do |url|
raise 'Test failed!' unless youtube_id(url) == 'dQw4-9W_XcQ'
end
To see this code and run the tests in an online repl you can also go here:
https://repl.it/#TomChapin/youtubeid

I liked Surya's answer, but there is one case where it doesn't work. This pattern:
String regExp = "/.*(?:youtu.be\\/|v\\/|u/\\w/|embed\\/|watch\\?.*&?v=)";
fails for youtu.be/i4fjHzCXg6c and www.youtu.be/i4fjHzCXg6c, because it requires a "/" somewhere before the matched prefix. The updated version makes that leading slash optional:
String regExp = "/?.*(?:youtu.be\\/|v\\/|u/\\w/|embed\\/|watch\\?.*&?v=)";
With this change it works for all cases.

Try this one -
function getYouTubeIdFromURL($url)
{
$pattern = '/(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/ ]{11})/i';
preg_match($pattern, $url, $matches);
return isset($matches[1]) ? $matches[1] : false;
}

Chris Nolet's cleaner version of Lasnv's answer is very good, but I recently found that if you are searching for a YouTube link inside a larger text and some random text follows the URL, the regexp matches far more than needed. An improved version of Chris Nolet's answer:
/^.*(?:youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=)([^#\&\?]{11,11}).*/
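To see how the pattern above behaves on a URL embedded in a sentence, here is a minimal sketch. The helper name findYouTubeId and the sample sentence are made up for illustration; only the regular expression comes from the answer.

```javascript
// Pattern copied from the answer above; {11,11} is equivalent to {11}.
const idPattern = /^.*(?:youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=)([^#\&\?]{11,11}).*/;

// Hypothetical helper: returns the 11 captured characters, or null if nothing matches.
function findYouTubeId(text) {
  const match = text.match(idPattern);
  return match ? match[1] : null;
}

// Made-up sample text with words after the URL.
const sample = "Watch https://youtu.be/oTJRivZTMLs then tell me what you think";
console.log(findYouTubeId(sample));         // oTJRivZTMLs
console.log(findYouTubeId("no link here")); // null
```

Note that the character class [^#&?] still accepts spaces and slashes, so the capture is only as reliable as the 11 characters that follow the matched prefix.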

function parser(url){
var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\/)|(\?v=|\&v=))([^#\&\?]*).*/;
var match = url.match(regExp);
if (match && match[8].length==11){
alert('OK');
}else{
alert('BAD');
}
}
For testing:
https://www.youtube.com/embed/vDoO_bNw7fc - attention first symbol «v» in «vDoO_bNw7fc»
http://www.youtube.com/user/dreamtheater#p/u/1/oTJRivZTMLs
https://youtu.be/oTJRivZTMLs?list=PLToa5JuFMsXTNkrLJbRlB--76IAOjRM9b
http://www.youtube.com/watch?v=oTJRivZTMLs&feature=youtu.be
https://youtu.be/oTJRivZTMLs
http://youtu.be/oTJRivZTMLs&feature=channel
http://www.youtube.com/ytscreeningroom?v=oTJRivZTMLs
http://www.youtube.com/embed/oTJRivZTMLs?rel=0
http://youtube.com/v/oTJRivZTMLs&feature=channel
http://youtube.com/v/oTJRivZTMLs&feature=channel
http://youtube.com/vi/oTJRivZTMLs&feature=channel
http://youtube.com/?v=oTJRivZTMLs&feature=channel
http://youtube.com/?feature=channel&v=oTJRivZTMLs
http://youtube.com/?vi=oTJRivZTMLs&feature=channel
http://youtube.com/watch?v=oTJRivZTMLs&feature=channel
http://youtube.com/watch?vi=oTJRivZTMLs&feature=channel

I wrote a function for that below:
function getYoutubeUrlId (url) {
  const urlObject = new URL(url);
  const urlOrigin = urlObject.origin;
  const urlPath = urlObject.pathname;
  // Short links: the ID is the whole path after the leading slash.
  if (urlOrigin.includes('youtu.be')) {
    return urlPath.slice(1);
  }
  if (urlPath.startsWith('/embed/')) {
    // For example, "/embed/wCCSEol8oSc" returns "wCCSEol8oSc".
    return urlPath.slice(7);
  }
  // Regular watch URLs carry the ID in the "v" query parameter.
  return urlObject.searchParams.get('v');
}
https://gist.github.com/semihkeskindev/8a4339c27203c5fabaf2824308c7868f

Python3 version:
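import re
def get_youtube_id(url):
    # Raw string so the backslashes reach the regex engine unchanged;
    # [\w-] also accepts the hyphens that can appear in video IDs.
    match = re.match(r'^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))?\?v?=?(?P<id>[\w-]*).*', url)
    return match.group('id') if match else None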
If you are looking to include it in a shell/bash/zsh/fish script, here's how to do it:
echo -n "$YOUTUBE_URL" | python -c "import re; import sys; m = re.match('^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))?\?v?=?(?P<id>\w*).*', sys.stdin.read()); sys.stdout.write(m.group('id'))"
Example:
echo -n "https://www.youtube.com/watch/?v=APYVWYHS654" | python -c "import re; import sys; m = re.match('^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))?\?v?=?(?P<id>\w*).*', sys.stdin.read()); sys.stdout.write(m.group('id'))"
APYVWYHS654

Get subdomain and load it to url with greasemonkey

I have the URL http://somesubdomain.domain.com (the subdomain may vary; the domain is always the same). I need to take the subdomain and reload the page with something like domain.com/some/path/here/somesubdomain using Greasemonkey (or open a new window with the URL domain.com/some/path/here/somesubdomain; either is fine).

var full = window.location.host
//window.location.host is subdomain.domain.com
var parts = full.split('.')
var sub = parts[0]
var domain = parts[1]
var type = parts[2]
//sub is 'subdomain', domain is 'domain', type is 'com'
var newUrl = 'http://' + domain + '.' + type + '/your/other/path/' + sub
window.open(newUrl);

The answer provided by Derek will work in the most common cases, but it will not work for subdomains that contain a dot ("xxx.xxx") or for hosts such as "host.co.uk". Also, window.location.host includes the port number when one is present, and that case is not handled: http://www.w3schools.com/jsref/prop_loc_host.asp
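The difference between host and hostname mentioned above can be seen with the standard URL class. This is a minimal sketch; the address sub.domain.com:8080 is made up purely for illustration.

```javascript
// Hypothetical address used only for illustration.
const example = new URL("http://sub.domain.com:8080/some/path");

// host keeps the port, hostname drops it.
const hostWithPort = example.host;      // "sub.domain.com:8080"
const hostnameOnly = example.hostname;  // "sub.domain.com"

// Splitting host on "." leaves the port glued to the last label.
const lastLabelFromHost = hostWithPort.split(".").pop();      // "com:8080"
const lastLabelFromHostname = hostnameOnly.split(".").pop();  // "com"

console.log(hostWithPort, hostnameOnly, lastLabelFromHost, lastLabelFromHostname);
```

Using hostname instead of host is therefore the safer starting point whenever the result is split on dots.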
To be honest I do not see a perfect solution for this problem.
Personally, I've created a method for host name splitting which I use very often because it covers a larger number of host names.
This method splits the hostname into {domain: "", type: "", subdomain: ""}
function splitHostname() {
  var result = {};
  // Regex literal, so the escaped dots stay literal dots
  // (inside a plain string, '\.' collapses to '.' and matches any character).
  var regexParse = /([a-z\-0-9]{2,63})\.([a-z\.]{2,5})$/;
  var urlParts = regexParse.exec(window.location.hostname);
  if (!urlParts) return null; // e.g. "localhost" or an IP address
  result.domain = urlParts[1];
  result.type = urlParts[2];
  result.subdomain = window.location.hostname.replace(result.domain + '.' + result.type, '').slice(0, -1);
  return result;
}
console.log(splitHostname());
This method only returns the subdomain as a string:
function getSubdomain(hostname) {
  // Regex literal, so the escaped dots stay literal dots.
  var regexParse = /[a-z\-0-9]{2,63}\.[a-z\.]{2,5}$/;
  var urlParts = regexParse.exec(hostname);
  if (!urlParts) return ''; // no recognizable domain, e.g. "localhost"
  return hostname.replace(urlParts[0], '').slice(0, -1);
}
console.log(getSubdomain(window.location.hostname));
// for use in node with express: getSubdomain(req.hostname)
These two methods will work for most common domains (including co.uk)
NOTE: the slice at the end of sub domains is to remove the extra dot.
I hope this solves your problem.

The solutions provided here work some of the time, or even most of the time, but not everywhere. To the best of my knowledge, the best way to find the full subdomain of any domain (and remember, sometimes subdomains have periods in them too! You can have sub-subdomains, etc) is to use the Public Suffix List, which is maintained by Mozilla.
The part of the URL that isn't in the Public Suffix List is the subdomain plus the domain itself, joined by a dot. Once you remove the public suffix, you can remove the domain and have just the subdomain left by removing the last segment between the dots.
Let's look at a complicated example. Say you're testing sub.sub.example.pvt.k12.ma.us. pvt.k12.ma.us is a public suffix, believe it or not! So if you used the Public Suffix List, you'd be able to quickly turn that into sub.sub.example by removing the known suffix. Then you could go from sub.sub.example to just sub.sub after stripping off the last portion of the remaining pieces, which was the domain. sub.sub is your subdomain.
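The suffix-stripping steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SAMPLE_SUFFIXES is a tiny hand-picked subset standing in for the real Public Suffix List, splitByPublicSuffix is a hypothetical helper name, and the list's wildcard and exception rules are ignored.

```javascript
// Tiny illustrative subset of public suffixes; a real implementation would
// load the full Public Suffix List instead of hard-coding entries.
const SAMPLE_SUFFIXES = new Set(["com", "co.uk", "pvt.k12.ma.us"]);

// Hypothetical helper: returns { subdomain, domain, suffix }, or null when
// no suffix from the set matches.
function splitByPublicSuffix(hostname, suffixes = SAMPLE_SUFFIXES) {
  const labels = hostname.toLowerCase().split(".");
  // Try the longest candidate suffix first, always leaving at least one
  // label in front of it to act as the domain.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (suffixes.has(suffix)) {
      return {
        subdomain: labels.slice(0, i - 1).join("."), // everything before the domain
        domain: labels[i - 1],                       // the label just before the suffix
        suffix: suffix,
      };
    }
  }
  return null;
}

console.log(splitByPublicSuffix("sub.sub.example.pvt.k12.ma.us"));
// { subdomain: 'sub.sub', domain: 'example', suffix: 'pvt.k12.ma.us' }
```

Checking the longest candidate first matters: "co.uk" must win over a shorter suffix such as "uk" when both are present in the list.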

This could work in most cases, except for the one that @jlbang mentions:
const split=location.host.split(".");
let subdomain="";
let domain="";
if(split.length==1){//localHost
domain=split[0];
}else if(split.length==2){//sub.localHost or example.com
if(split[1].includes("localhost")){//sub.localHost
domain=split[1];
subdomain=split[0];
}else{//example.com
domain=split.join(".");
}
}else{//sub2.sub.localHost or sub2.sub.example.com or sub.example.com or example.com.ec sub.example.com.ec or ... etc
const last=split[split.length-1];
const lastLast=split[split.length-2];
if(last.includes("localhost")){//sub2.sub.localHost
domain=last;
subdomain=split.slice(0,split.length-1).join(".");
}else if(last.length==2 && lastLast.length<=3){//example.com.ec or sub.example.com.ec
domain=split.slice(split.length-3,split.length).join(".");
if(split.length>3){//sub.example.com.ec
subdomain=split.slice(0,split.length-3).join(".");
}
}else{//sub2.sub.example.com
domain=split.slice(split.length-2,split.length).join(".");
subdomain=split.slice(0,split.length-2).join(".");
}
}
const newUrl = 'http://example.com/some/path/here/' + subdomain

I adapted Vlad's solution to modern TypeScript:
const splitHostname = (
hostname: string
): { domain: string; type: string; subdomain: string } | undefined => {
const urlParts = /([a-z\-0-9]{2,63})\.([a-z.]{2,5})$/.exec(hostname);
if (!urlParts) return;
const [, domain, type] = urlParts;
const subdomain = hostname.replace(`${domain}.${type}`, "").slice(0, -1);
return {
domain,
type,
subdomain,
};
};

get a subdomain from URL
function getSubdomain(url) {
url = url.replace( "https://www.","");
url = url.replace( "http://www.","");
url = url.replace( "https://","");
url = url.replace("http://", "");
var temp = url.split("/");
if (temp.length > 0) {
var temp2 = temp[0].split(".");
if (temp2.length > 2) {
return temp2[0];
}
else {
return "";
}
}
return "";
}


Trying to use .split with two different page URL structures - javascript

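function getLanguage(url) {
  // Matches the first path segment, with or without a leading "#/"
  var rgx = /^https:\/\/[^\/]+\/(?:#\/)?([a-z]+)/;
  var match = url.match(rgx);
  // Fall back to 'other' when the URL has no language segment
  return match ? match[1] : 'other';
}
var url = 'https://www.booking.example.com/#/en/';
var language = getLanguage(url);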
