Say I have a URL for an article:
http://domain.com/blog/articles/title-here/
And it has about 5 pages, so as you go through each page, you get:
http://domain.com/blog/articles/title-here/ OR http://domain.com/blog/articles/title-here/1
http://domain.com/blog/articles/title-here/2
http://domain.com/blog/articles/title-here/3
http://domain.com/blog/articles/title-here/4
http://domain.com/blog/articles/title-here/5
I know that the following code will get the full current URL (aka including the page #):
var u = window.location.href;
But is there a way to limit it so that the page # is NOT a part of the variable "u"?
Perhaps there's a regex or something I should add in there..? (I'm fairly new to javascript, so not sure how to apply this?)
var u = window.location.href.match(/.*[/][^\d]*/)[0]
Would that work for you?
Edit
I changed it... again :P
NOTE: Regex is a more complicated version of Joseph's and still suffers from the same bug. Will undelete when I fix it.
Joseph's answer is good, but it has a minor bug: it drops the last part of the URL when the URL has no trailing slash and no page number, like:
http://domain.com/blog/articles/title-here
You can use this instead:
var u = window.location.href.match(/(.*)(\/\d*)/)[1]
How the regex works:
/ # delimiter
(.*) # greedily match anything and put it in capture group 1
(\/ # match the forward slash
\d*) # match zero or more digits
/ # delimiter
var l = window.location;
l.href.replace(l.pathname, l.pathname.split(/\/[0-9]+$/)[0]);
try it in the console at this URL
Regex would do it. But in this case, you could just turn the URL into an array, strip off the last item, and re-serialize it.
var a = 'http://domain.com/blog/articles/title-here/2'.split('/');
a.splice(-1, 1); // drop the last segment (the page number)
var u = a.join('/'); // "http://domain.com/blog/articles/title-here"
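The approaches above differ in how they treat a URL that has no page number. A minimal sketch that only removes the last segment when it is purely numeric could look like the following. The helper name `stripPageNumber` is hypothetical, and the sketch assumes the URL carries no query string or fragment and that no article title consists only of digits.

```javascript
// Hypothetical helper: removes a trailing "/<digits>" page segment, and any
// bare trailing slash, from a URL string. Assumes no query string or fragment.
function stripPageNumber(href) {
  // "/" followed by optional digits (and an optional closing slash) at the end
  return href.replace(/\/(\d+\/?)?$/, '');
}

var base = 'http://domain.com/blog/articles/title-here';
console.log(stripPageNumber(base + '/2') === base); // page number removed
console.log(stripPageNumber(base + '/') === base);  // trailing slash removed
console.log(stripPageNumber(base) === base);        // already clean, unchanged
```

In a browser you would pass `window.location.href` to the helper.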
Related
I have a regex to extract all hash URLs from HTML content.
/(\#)([^\s]+")/g
The HTML content looks like this:
Some text <a href="#some-hash1">some link</a>some content <a href="#some-hash2">some link1</a>
Expected is
#some-hash1, #some-hash2
But the current regex returns this (the trailing double quote comes along with the hash):
#some-hash1", #some-hash2"
I don't understand why the double quotes are included. Any suggestions would be very helpful.
I wouldn't use regex for this because it's overkill and because you can simply loop through the anchors pulling the value of their hrefs...
var anchors = document.querySelectorAll('a');
var hrefs = [];
anchors.forEach(function(e){
hrefs.push(e.getAttribute('href'));
});
console.log(hrefs);
Use a positive lookahead, so the double quote is checked but not included in the match:
/(\#)([^\s]+(?="))/g
DEMO
var z = 'Some text <a href="#some-hash1">some link</a>some content <a href="#some-hash2">some link1</a>';
console.log( z.match(/(\#)([^\s]+(?="))/g) );
I am assuming that you are looking at the content of $2 for your result.
If so, the problem is the " inside the second capture group. Changing /(\#)([^\s]+")/g to /(\#)([^\s]+)"/g gives the correct result.
I suggest joining the capture groups. Then /(\#[^\s]+)"/g will return $1=>#some-hash1, #some-hash2
Since $1 will always just return #, I suppose you trim it off elsewhere in your program, so perhaps you should use /\#([^\s]+)"/g which will return some-hash1, some-hash2 without the #
Just move double quote out the brackets:
(\#)([^\s]+)"
See how it works: https://regex101.com/r/fmrDyu/1
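To tie the answers above together, here is a small sketch that keeps the closing quote outside the capture group and collects only the captured part. The sample HTML string is an assumption reconstructed from the question, and `String.prototype.matchAll` needs a reasonably modern engine.

```javascript
// Assumed sample input: two anchors whose href attributes are hash links.
var html = 'Some text <a href="#some-hash1">some link</a>' +
           'some content <a href="#some-hash2">some link1</a>';

// The quotes are matched but sit outside the capture group,
// so m[1] holds only the hash itself.
var hashes = Array.from(html.matchAll(/href="(#[^"\s]+)"/g), function (m) {
  return m[1];
});

console.log(hashes.join(', '));
```

Anchoring on `href="` is a design choice here: it avoids picking up a stray `#` that appears in the visible text rather than in an attribute.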
I would like to match a path in a URL, but ignore the query string.
The regex should include an optional trailing slash before the querystring.
Example urls that should give a valid match:
/path/?a=123&b=123
/path?a=123&b=123
So the string '/path' should match either of the above urls.
I have tried the following regex: (/path[^?]+).*
But this will only match urls like the first example above: /path/?a=123&b=123
Any idea how I would get it to match the second example, without the trailing slash, as well?
Regex is a requirement.
No need for regexp:
url.split("?")[0];
If you really need it, then try this:
\/path\?*.*
EDIT Actually the most precise regexp should be:
^(\/path)(\/?\?{0}|\/?\?{1}.*)$
because you want to match either /path or /path/ or /path?something or /path/?something and nothing else. Note that ? means "at most one" while \? means a question mark.
BTW: What kind of routing library does not handle query strings?? I suggest using something else.
http://jsfiddle.net/bJcX3/
var re = /(\/?[^?]*?)\?.*/;
var p1 = "/path/to/something/?a=123&b=123";
var p2 = "/path/to/something/else?a=123&b=123";
var p1_matches = p1.match(re);
var p2_matches = p2.match(re);
document.write(p1_matches[1] + "<br>");
document.write(p2_matches[1] + "<br>");
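As a compact variant of the "most precise" idea above, the sketch below accepts `/path` with an optional trailing slash and an optional query string, and nothing else. The names `pathRe` and `matchesPath` are hypothetical, and `/path` stands in for whatever route you need.

```javascript
// "/path", then an optional "/", then optionally "?" plus anything, then end.
var pathRe = /^\/path\/?(?:\?.*)?$/;

function matchesPath(url) {
  return pathRe.test(url);
}

console.log(matchesPath('/path/?a=123&b=123')); // true
console.log(matchesPath('/path?a=123&b=123'));  // true
console.log(matchesPath('/pathology'));         // false
```

The `^` and `$` anchors are what stop longer paths such as `/pathology` or `/path/more` from matching.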
I found many solutions, but none was useful for me.
Let's say, as an example, I want to find URLs that start with www. and end with a space or ?. In this case, I really mean it ends in a ?, not that it's necessarily a CGI-related URL.
I'm trying to use the regex
var r = /(^|[\s\?])(www\..+?(?=([\s]|\?|($))))/g;
My sample use: http://jsfiddle.net/DKNat/2/
How can I use \? in a regex to prevent the end of the URL containing / before ??
http://jsfiddle.net/DKNat/11/
I can't solve the last problem: a dot at the end of the URL.
Can anybody help?
Try this in your fiddle:
var r = /(^|\??)(www\.[^\?]+)/g;
I updated your fiddle here:
http://jsfiddle.net/DKNat/3/
Update:
I see what you are trying to do now. Unfortunately, both your strings are essentially the same, apart from the /, so unless you want your regex to make the assumption that a ? anywhere after a slash denotes a CGI call, then there isn't much you can do. But you could try this:
var r = /(^|\??)(www\.[^\?]+\/[^\/]+\?[^\?]+|www\.[^\?]+)/g;
Updated fiddle:
http://jsfiddle.net/DKNat/5/
Update 2: After determining the requirements, this is the final RegExp I added to fiddle 10:
var r = /(^|[\?\s])(www\.[^\? ]+\/[^\/ ]*\?[^\? ]+|www\.[^\? ]+)/g;
I'm trying to get the first part of a hash from a URL (the part between the # and a /, a ?, or the end of the string).
So far I've come up with this:
r = /#(.*)[\?|\/|$]/
// OK
r.exec('http://localhost/item.html#hash/sub')
["#hash/", "hash"]
// OK
r.exec('http://localhost/item.html#hash?sub')
["#hash?", "hash"]
// WAT?
r.exec('http://localhost/item.html#hash')
null
I was expecting to receive "hash"
I tracked down the problem to
r2 = /#(.*)[$]/
r2.exec('http://localhost/item.html#hash')
null
any idea what could be wrong?
r = /#(.*)[\?|\/|$]/
When $ appears in [] (a character class), it's the literal "$" character, not the end of input/line. In fact, your [\?|\/|$] part is equivalent to just [?/$|], which matches the 4 specific characters (including pipe).
Use this instead (JSFiddle)
r = /#(.+?)(\?|\/|$)/
You aren't supposed to write [$] (within a character class) unless you want to match the $ literally and not the end of line.
/#(.*)$/
Code:
var regex = /\#(.*)$/;
regex.exec('http://localhost/item.html#hash');
Output:
["#hash", "hash"]
Your regex: /#(.*)[\?|\/|$]/
//<problem>-----^ ^-----<problem>
| operator won't work within [], but within ()
$ will be treated literally within []
.* will match as much as possible. .*? will be non-greedy
On making the above changes,
you end up with /#(.*?)(\?|\/|$)/
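The points above can be checked directly. This is a small sketch; the variable names and the helper `firstHashPart` are made up for illustration.

```javascript
var url = 'http://localhost/item.html#hash';

// Inside a character class, $ is a literal dollar sign, so this needs an
// actual "$" after the hash and fails on the plain URL.
var literalDollar = /#(.*)[$]/;
console.log(literalDollar.exec(url)); // null

// Alternation in a group: stop at "?", "/", or the real end of the string.
// The lazy .*? stops at the first terminator instead of the last one.
var endAnchor = /#(.*?)(?:[?\/]|$)/;

function firstHashPart(u) {
  var m = endAnchor.exec(u);
  return m ? m[1] : null;
}

console.log(firstHashPart(url));                                   // "hash"
console.log(firstHashPart('http://localhost/item.html#hash/sub')); // "hash"
console.log(firstHashPart('http://localhost/item.html#hash?sub')); // "hash"
```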
I use http://regexpal.com/ to test my regular expressions.
Your problem here is that your regular expression requires a /. So it doesn't work with http://localhost/item.html#hash but it works with http://localhost/item.html#hash/
Try this one :
r = /#([^\?|\/|$]*)/
You can't use the $ end-of-string marker in a character class. You're probably better off just matching characters that aren't / or ?, like this:
/#([^\?\/]*)/
Why Regex? Do it like this (nearly no regex):
var a = document.createElement('a');
a.href = 'http://localhost/item.html#hash/foo?bar';
console.log(a.hash.split(/[\/\?]/)[0]); // #hash
Just for the sake, if it is node.js you are working with:
var hash = require('url').parse('http://localhost/item.html#hash').hash;
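If you would rather not depend on either a DOM element or Node's `url` module, the standard `URL` constructor is available in current browsers and in Node. A rough sketch, with `firstHashSegment` as a hypothetical helper name and assuming the input is an absolute URL (a relative one makes `new URL` throw):

```javascript
// Hypothetical helper: returns the part of the fragment before the first
// "/" or "?", without the leading "#". Returns "" when there is no fragment.
function firstHashSegment(href) {
  var hash = new URL(href).hash; // e.g. "#hash/foo?bar"
  return hash.slice(1).split(/[\/?]/)[0];
}

console.log(firstHashSegment('http://localhost/item.html#hash/foo?bar')); // "hash"
console.log(firstHashSegment('http://localhost/item.html#hash'));         // "hash"
```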
I found this regular expression that seems to work
r = /#([^\/\?]*)/
r.exec('http://localhost/item.html#hash/sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash?sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash')
["#hash", "hash"]
Anyway, I still don't get why the original one isn't working
I have this URL...
http://www.google.com/local/add/analytics?hl=en-US&gl=US
And I want to check these URLs to see if they matches above URL...
www.google.com/local/add*
www.google.com/local/add/*
http://www.google.com/local/add*
http://www.google.com/local/add/*
https://www.google.com/local/add*
https://www.google.com/local/add/*
You can see the input URL is also a regex having * so what regex that I can use to match a list of URLs with a regex to see if the url exists? Currently I am doing this...
var isAllowed = (url.indexOf(newURL) === 0);
Which is definitely not efficient.
It's not the cleanest regex I've ever written, but I think it should work.
var url = "http://www.google.com/local/add/analytics?hl=en-US&gl=US";
var reg = /((https|http|)(\:\/\/|)www\.google\.com\/local\/add(\/|)).*/; // dot in "google.com" escaped so it only matches a literal "."
console.log(reg.test(url));
this will return true for all of these cases
www.google.com/local/add*
www.google.com/local/add/*
http://www.google.com/local/add*
http://www.google.com/local/add/*
https://www.google.com/local/add*
https://www.google.com/local/add/*
it should look for (http or https or nothing) then (:// or nothing) then www.google.com/local/add then (/ or nothing) then anything.
the one case it will also return true that I will leave for you is the case (http|https)www.google.com/local/add(/|)*
var reg = new RegExp("(https?://)?(www\\.)?google\\.com/local/add/?"), // dots escaped (doubled backslash inside a string)
URL = "http://www.google.com/local/add/analytics?hl=en-US&gl=US";
console.log(reg.test(URL));
I've used the ? a lot, which means, whatever character precedes the question mark may or may not be matched.
https? means the s may or may not be there. (www.)? means that the www. may be absent entirely. You hopefully get how it works now.
Demo
Learn how to use Regular Expressions
As far as I understand you, you want something like this:
Convert the input URL to a regex. E.g.:
var input = "http://www.google.com/local/add*";
var reg_url = input.replace(/\./g, "\\.").replace(/\*/g, ".*"); // escape dots first, then expand * (the other order would escape the dot in ".*")
you might need to escape some more characters, see here
And check if it matches:
var url = "http://www.google.com/local/add/analytics?hl=en-US&gl=US";
var isAllowed = url.search(reg_url) >= 0;
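Putting the conversion idea into a reusable form, the sketch below escapes every regex metacharacter except `*`, expands `*` to `.*`, and anchors the match at the start of the URL. The function names are hypothetical, and it assumes the scheme (`http://` or `https://`) should be ignored on both sides, which is one reading of the pattern list in the question.

```javascript
// Assumption: the scheme does not matter, so drop it from pattern and URL.
function stripScheme(s) {
  return s.replace(/^https?:\/\//, '');
}

// Hypothetical helper: true when `url` matches the wildcard `pattern`.
function isAllowedUrl(url, pattern) {
  var source = stripScheme(pattern)
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape everything except *
    .replace(/\*/g, '.*');                 // * means "any characters"
  return new RegExp('^' + source).test(stripScheme(url));
}

var target = 'http://www.google.com/local/add/analytics?hl=en-US&gl=US';
console.log(isAllowedUrl(target, 'www.google.com/local/add*'));           // true
console.log(isAllowedUrl(target, 'https://www.google.com/local/add/*'));  // true
console.log(isAllowedUrl(target, 'www.google.com/other*'));               // false
```

If the scheme should be enforced when the pattern includes one, remove the `stripScheme` calls and handle scheme-less patterns separately.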