Checking for a specific URL regex - javascript

I need to check for a specific URL pattern. I'm not sure what the best approach is, but the case seems simple enough that a regex should be the preferred solution. I just need to check that the exact strings #, shares and assets are in the appropriate slots, for example:
http://some-domain.com/#/shares/a454-rte3-445f-4543/assets
Everything in the URL can be variable (protocol, domain, port, share id) except the exact strings I'm looking for and the slots (slash positions) at which they appear.
Thanks for your help!

You can use
/^https?:\/\/some-domain\.com\/#\/shares\/[^/]+\/assets/i
let url = `http://some-domain.com/#/shares/a454-rte3-445f-4543/assets`
let matched = /^https?:\/\/some-domain\.com\/#\/shares\/[^/]+\/assets/i.test(url)
console.log(matched)
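Since the question says the protocol, domain and port can all vary, here is a minimal sketch of a variant that does not hard-code the host. The names shareAssetsPattern and matchesShareAssets are made up for illustration, and the pattern assumes any scheme and any non-empty host[:port] are acceptable. Like the pattern above, it does not anchor the end of the string.

```javascript
// Hypothetical names; assumes any scheme and any host[:port] should be accepted.
const shareAssetsPattern = /^[a-z][a-z\d+.-]*:\/\/[^\/]+\/#\/shares\/[^\/]+\/assets/i;

function matchesShareAssets(url) {
  // No "g" flag on the pattern, so test() carries no state between calls.
  return shareAssetsPattern.test(url);
}

console.log(matchesShareAssets('http://some-domain.com/#/shares/a454-rte3-445f-4543/assets')); // true
console.log(matchesShareAssets('https://localhost:8080/#/shares/abc/assets')); // true
console.log(matchesShareAssets('http://some-domain.com/shares/abc/assets')); // false
```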

Decided to avoid regex and do it this way instead.
const urlParts = window.location.href.split('/');
if (urlParts[3] === '#' && urlParts[4] === 'shares' && urlParts[6] === 'assets') {
// code goes here...
}
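The same slot check can probably be written against the parsed URL rather than the raw string, so the indexes do not depend on the protocol and host part. This is only a sketch: isShareAssetsUrl is a hypothetical helper name, and it assumes the built-in URL constructor is available (modern browsers or Node.js).

```javascript
// Hypothetical helper; assumes the global URL constructor exists.
function isShareAssetsUrl(href) {
  let parsed;
  try {
    parsed = new URL(href);
  } catch (e) {
    return false; // not an absolute URL
  }
  // The "#" must directly follow the host, so the path itself is just "/".
  if (parsed.pathname !== '/') return false;
  // "#/shares/<id>/assets" splits into ["#", "shares", "<id>", "assets"].
  const parts = parsed.hash.split('/');
  return parts[0] === '#' &&
    parts[1] === 'shares' &&
    Boolean(parts[2]) &&
    parts[3] === 'assets';
}

console.log(isShareAssetsUrl('http://some-domain.com/#/shares/a454-rte3-445f-4543/assets')); // true
console.log(isShareAssetsUrl('http://some-domain.com/#/other/a454/assets')); // false
```

In a page you would pass window.location.href to it.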

Related

How to redirect url based on only one part of the path?

Let me explain what I mean:
I want to redirect from https://example.net/category/83745/my-first-post to https://myredirect.net/my-first-post but without considering /category/numbers/
For the moment I work with this:
if(window.location.pathname == '/category/83745/my-first-post')
{
window.location.href="https://myredirect.net/my-first-post";
}
And it is working fine but as I described I need to remove /category/numbers/ because they could be different and only consider this part /my-first-post for the redirection.
Thanks in advance.
If you just want to ignore the first two parts dynamically and only care about the last part of the URL, do the following:
var stringContains = function (str, partial) {
  return str.indexOf(partial) > -1;
};
var url = '/category/83745/my-first-post';
if (stringContains(url, "/category")) {
  // Split the path on "/" and keep only the last segment.
  var parts = url.split("/");
  window.location.href = "https://myredirect.net/" + parts[parts.length - 1];
}
You can use String's methods lastIndexOf and slice:
var path = window.location.pathname;
window.location.href = "https://myredirect.net" + path.slice(path.lastIndexOf('/'));
Use a regex. Something like:
if(window.location.pathname.match(/\/category\/\d+\/my\-first\-post$/))
{
window.location.href="https://myredirect.net/my-first-post";
}
You can run a regular expression match on the pathname
if(window.location.pathname.match(/my-first-post$/)) {
window.location.href='/my-first-post';
}
More on regexes: https://www.regular-expressions.info/
Another good tool for building and testing regexes: https://regex101.com/
Edit:
Here is an example of how to use a regex according to the more fleshed-out specs from Chris G:
let pathmatch = window.location.pathname.match(/([^\/]+)$/g);
window.location.href = '/' + pathmatch[0];
Thus, a regex can be used to grab any pattern and use it later.
If you need to make sure the pathname contains category and/or numbers, that is easily added to the pattern. This one simply disregards anything before the last forward slash (/).
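To illustrate adding the category and number check to the pattern, here is a rough sketch. redirectTarget is a hypothetical helper name, and it assumes the path always has the shape /category/<digits>/<slug>. It only computes the target, so the redirect itself stays a one-liner.

```javascript
// Hypothetical helper: returns the redirect target, or null if the path
// does not look like /category/<digits>/<slug>.
function redirectTarget(pathname) {
  const m = /^\/category\/\d+\/([^\/]+)\/?$/.exec(pathname);
  return m ? 'https://myredirect.net/' + m[1] : null;
}

console.log(redirectTarget('/category/83745/my-first-post')); // "https://myredirect.net/my-first-post"
console.log(redirectTarget('/about')); // null
```

In the page you would then do something like: var target = redirectTarget(window.location.pathname); if (target) window.location.href = target;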

Check Array Entries with Regex

I have an array with one or more entries. Each one is a string (a list of URLs in open tabs, obtained via the Firefox SDK). I want to check whether a specific URL is already open in one of the tabs (nothing special so far).
My problem is that the URL in the tab list can have four different forms. For example:
Url I want to find in the tablist:
https://cmsr-author.de/cf#/content/test/de.html
But the url can also look like this:
https://cmsr-author.de/content/test/de.html
https://cmsr-author.de/test/de.html
https://cmsr-author.de/cf#/test/de.html
Of course the last part of the URL (after /test/...) is always something different. If I can't find one of the four URLs in the tab list, I want to call some other action.
My solution so far is to build an if-chain:
if (res !== url1) {
if (res !== url2) {
if ...
But I thought there must be a more elegant way. Maybe via regex? I already have a capture to catch the first part (which stays the same: https://cmsr-author.ws...) with its four forms, but I don't know how to implement this properly.
var urls = ["https://cmsr-author.de/content/test/de.html","https://cmsr-author.de/test/de.html","https://cmsr-author.de/cf#/test/de.html"]
var filtered = urls.filter(function(url)
{
return url.indexOf("cf#") > -1 && url.endsWith("/test/de.html")
})
var contains = filtered.length > 0
console.log(contains)
If you want to use regex you can do this by using groups for the middle part, which is explained in detail here: http://www.regular-expressions.info/refcapture.html
In practice, your regex would look something like this:
https:\/\/cmsr-author\.de\/(content|...|...)\/de\.html
Where ... must be replaced by the middle parts of the url which differ.
Note that | means "or" and is used to provide multiple possibilities within the group. The characters / and . must be escaped since they have special roles in a regex literal.
I hope that helps!
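As a concrete sketch of the group idea applied to the four forms from the question: the names tabUrlPattern and isOpenInTabs are made up here, and the pattern assumes the only variations are an optional cf#/ prefix and an optional content/ segment before /test/.

```javascript
// Hypothetical names; assumes "cf#/" and "content/" are the only optional parts.
const tabUrlPattern = /^https:\/\/cmsr-author\.de\/(?:cf#\/)?(?:content\/)?test\/.+$/;

function isOpenInTabs(tabUrls) {
  // some() stops at the first tab URL that matches any of the four forms.
  return tabUrls.some(function (url) {
    return tabUrlPattern.test(url);
  });
}

const tabs = [
  'https://example.com/',
  'https://cmsr-author.de/cf#/test/de.html'
];
console.log(isOpenInTabs(tabs)); // true
console.log(isOpenInTabs(['https://cmsr-author.de/other/de.html'])); // false
```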
My English is not good and I may not fully understand what you mean, but as I understand it you need a regular expression that matches only the first URL. If I am wrong, please let me know.
I hope that helps!
var reg = /^https:\/\/cmsr\-author\.de\/cf#\/(?:\w+\/)+test\/de\.html$/i;
var str1 = "https://cmsr-author.de/cf#/content/test/de.html";
var str2 = "https://cmsr-author.de/content/test/de.html";
var str3 = "https://cmsr-author.de/test/de.html";
var str4 = "https://cmsr-author.de/cf#/test/de.html";
console.log(reg.test(str1));
console.log(reg.test(str2));
console.log(reg.test(str3));
console.log(reg.test(str4));

how to get the base url using regex in javascript

My app is going to run in multiple environments, and I need to get a common value (the base URL of my app) that works across all of them.
From window.location, how do I get a certain part from the start?
Example:
http://xxxxx.yyyy.xxxxx.com:14567/yx/someother/foldername/index.html
How can I get only:
http://xxxxx.yyyy.xxxxx.com:14567/yx/
My attempt:
var base = \w([yx]/)
This only selects yx/. How do I get the part in front of it as well?
Thanks in advance.
If 'someother' is known to be the root of your site, then replace
\w([yx]/)
with
(.*\/)someother\/
(note that the / characters are escaped here) which gives a first match of:
http://xxxxx.yyyy.xxxxx.com:14567/yx/
However, a regular expression may not be the best way of doing this; see if there's any way you can pass the base URL in by another manner, for example from the code running behind the page.
If you don't mind disregarding the trailing slash, you can do it without a regex:
var url = 'http://xxxxx.yyyy.xxxxx.com:14567/yx/someother/foldername/index.html';
url.split('/', 4).join('/');
//-> "http://xxxxx.yyyy.xxxxx.com:14567/yx"
If you want the trailing slash, it's easy to append with + '/'.
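If the built-in URL constructor is available, the same result can likely be had without counting slashes by hand. This is a sketch only: baseUrl is a hypothetical name, and it assumes the base is always the origin plus the first path segment.

```javascript
// Hypothetical helper; assumes the base is "origin + first path segment + /".
function baseUrl(href) {
  const u = new URL(href);
  // pathname "/yx/someother/..." splits into ["", "yx", "someother", ...].
  const first = u.pathname.split('/')[1];
  return u.origin + '/' + (first ? first + '/' : '');
}

console.log(baseUrl('http://xxxxx.yyyy.xxxxx.com:14567/yx/someother/foldername/index.html'));
// "http://xxxxx.yyyy.xxxxx.com:14567/yx/"
```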
Please try the following regexp:
http\:\/\/[\w\.]+\:\d+\/\w+\/
This one should do pretty well
http:\/\/[\w\.]+\:\d+\/\w+\/
Perhaps something like this?
Javascript
function myBase(url, baseString) {
if (url && baseString) {
var array = url.split(new RegExp("\\b" + baseString + "\\b"));
if (array.length === 2) {
return array[0] + baseString + "/";
}
}
return null;
}
var testUrl = "http://xxxxx.yyyy.xxxxx.com:14567/yx/someother/foldername/index.html",
testBase = "yx";
console.log(myBase(testUrl, testBase));
Output
http://xxxxx.yyyy.xxxxx.com:14567/yx/

What is the best way to parse a URL with JavaScript? [duplicate]

If there is one thing I just can't get my head around, it's regex.
So after a lot of searching I finally found this one that suits my needs:
function get_domain_name()
{
aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
domain_name_parts = aaaa.match(/:\/\/(.[^/]+)/)[1].split('.');
if(domain_name_parts.length >= 3){
domain_name_parts[0] = '';
}
var domain = domain_name_parts.join('.');
if(domain.indexOf('.') == 0)
alert("1"+ domain.substr(1));
else
alert("2"+ domain);
}
It basically gives me back the domain name. Is there any way I can also get all the stuff after the domain name? In this case it would be /blah/sdgsdgsdgs from the aaaa variable.
EDIT (2020): In modern browsers, you can use the built-in URL Web API.
https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
var url = new URL("http://www.somesite.se/blah/sdgsdgsdgs");
var pathname = url.pathname; // returns /blah/sdgsdgsdgs
Instead of relying on a potentially unreliable* regex, you should instead use the built-in URL parser that the JavaScript DOM API provides:
var url = document.createElement('a');
url.href = "http://www.example.com/some/path?name=value#anchor";
That's all you need to do to parse the URL. Everything else is just accessing the parsed values:
url.protocol; //(http:)
url.hostname; //(www.example.com)
url.pathname; //(/some/path)
url.search; // (?name=value)
url.hash; //(#anchor)
In this case, if you're looking for /blah/sdgsdgsdgs, you'd access it with url.pathname
Basically, you're just creating a link (technically, anchor element) in JavaScript, and then you can make calls to the parsed pieces directly. (Since you're not adding it to the DOM, it doesn't add any invisible links anywhere.) It's accessed in the same way that values on the location object are.
(Inspired by this wonderful answer.)
EDIT: An important note: it appears that Internet Explorer has a bug where it omits the leading slash on the pathname attribute on objects like this. You could normalize it by doing something like:
url.pathname = url.pathname.replace(/(^\/?)/,"/");
Note:
*: I say "potentially unreliable", since it can be tempting to try to build or find an all-encompassing URL parser, but there are many, many conditions, edge cases and forgiving parsing techniques that might not be considered or properly supported; browsers are probably best at implementing (since parsing URLs is critical to their proper operation) this logic, so we should keep it simple and leave it to them.
The RFC (see appendix B) provides a regular expression to parse the URI parts:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
where
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
Example:
function parse_url(url) {
var pattern = RegExp("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
var matches = url.match(pattern);
return {
scheme: matches[2],
authority: matches[4],
path: matches[5],
query: matches[7],
fragment: matches[9]
};
}
console.log(parse_url("http://www.somesite.se/blah/sdgsdgsdgs"));
gives
Object
authority: "www.somesite.se"
fragment: undefined
path: "/blah/sdgsdgsdgs"
query: undefined
scheme: "http"
Please note that this solution is not the best. I made this just to match the requirements of the OP. I personally would suggest looking into the other answers.
The following regexp will give you back the domain and the rest, :\/\/(.[^\/]+)(.*), for example:
www.google.com
/goosomething
I suggest studying the RegExp documentation here: http://www.regular-expressions.info/reference.html
Using your function:
function get_domain_name()
{
aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
var matches = aaaa.match(/:\/\/(?:www\.)?(.[^/]+)(.*)/);
alert(matches[1]);
alert(matches[2]);
}
You just need to modify your regex a bit. For example:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.somesite.se", "/blah/sdgsdgsdgs"]
Here is the same example, but modified so that it will split out the "www." part. I think the regular expression should be written so that the match works whether or not you have the "www." part. So check this out:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.", "somesite.se", "/blah/sdgsdgsdgs"]
Now check out the same regular expression but with a url that does not start with "www.":
var bbbb="http://somesite.se/blah/sdgsdgsdgs";
var m = bbbb.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
Now your match looks like this:
["http://somesite.se/blah/sdgsdgsdgs", undefined, "somesite.se", "/blah/sdgsdgsdgs"]
So as you can see it will do the right thing in both cases.
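For comparison, the same split (domain without "www.", plus everything after it) can be sketched with the built-in URL parser instead of a hand-written pattern. splitUrl is a hypothetical name, and the sketch assumes the global URL constructor is available.

```javascript
// Hypothetical helper; assumes the global URL constructor exists.
function splitUrl(href) {
  const u = new URL(href);
  return {
    // Drop a leading "www." if present, otherwise keep the hostname as-is.
    domain: u.hostname.replace(/^www\./, ''),
    // Everything after the host: path, query string and fragment.
    rest: u.pathname + u.search + u.hash
  };
}

console.log(splitUrl('http://www.somesite.se/blah/sdgsdgsdgs'));
// { domain: "somesite.se", rest: "/blah/sdgsdgsdgs" }
console.log(splitUrl('http://somesite.se/blah/sese'));
// { domain: "somesite.se", rest: "/blah/sese" }
```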
There is a nice jQuery plugin for parsing URLs: Purl.
All the regex stuff is hidden inside, and you get something like:
> url = $.url("http://markdown.com/awesome/language/markdown.html?show=all#top");
> url.attr('source');
"http://markdown.com/awesome/language/markdown.html?show=all#top"
> url.attr('protocol');
"http"
> url.attr('host');
"markdown.com"
> url.attr('relative');
"/awesome/language/markdown.html?show=all#top"
> url.attr('path');
"/awesome/language/markdown.html"
> url.attr('directory');
"/awesome/language/"
> url.attr('file');
"markdown.html"
> url.attr('query');
"show=all"
> url.attr('fragment');
"top"
Browsers have come a long way since this question was first asked. You can now use the native URL interface to accomplish this:
const url = new URL('http://www.somesite.se/blah/sdgsdgsdgs')
console.log(url.host) // "www.somesite.se"
console.log(url.href) // "http://www.somesite.se/blah/sdgsdgsdgs"
console.log(url.origin) // "http://www.somesite.se"
console.log(url.pathname) // "/blah/sdgsdgsdgs"
console.log(url.protocol) // "http:"
// etc.
Be aware that IE does not support this API. But, you can easily polyfill it with polyfill.io:
<script crossorigin="anonymous" src="https://polyfill.io/v3/polyfill.min.js?flags=gated&features=URL"></script>

