Regex to get a specific part of a URL - javascript

Say I've got a URL:
http://www.example.com/foo/bar/yes_no_no.html
I'm looking for a regex that can extract all the characters of the filename up to the first underscore. So in the above URL I want to extract "yes".
So far I managed to get the whole "yes_no_no" bit using:
/([^\/]+)(?=\.\w+$)/, but I can't seem to get it to only match "yes".

Try this:
/\/([^_\/]+)_?[^\/]*\.[a-z]+$/
Example:
console.log('http://www.example.com/foo/bar/yes_no_no.html'.match(/\/([^_\/]+)_?[^\/]*\.[a-z]+$/)[1])
console.log('http://www.example.com/foo/bar/yes.html'.match(/\/([^_\/]+)_?[^\/]*\.[a-z]+$/)[1])
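Calling [1] directly on the result of match() throws when the URL does not match, so it may be worth wrapping the pattern in a small helper. This is only a sketch: the name firstFilenamePart is made up for illustration, and it assumes the URL ends in a file name with a lowercase alphabetic extension, as in the question.

```javascript
// Hypothetical helper (the name is illustrative, not from the answer above).
// Returns the part of the file name before the first underscore,
// or null when the URL does not end in "<name>.<extension>".
function firstFilenamePart(url) {
  const match = url.match(/\/([^_\/]+)_?[^\/]*\.[a-z]+$/);
  return match ? match[1] : null;
}

console.log(firstFilenamePart('http://www.example.com/foo/bar/yes_no_no.html')); // "yes"
console.log(firstFilenamePart('http://www.example.com/foo/bar/yes.html'));       // "yes"
console.log(firstFilenamePart('http://www.example.com/foo/bar/'));               // null
```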

Related

Regex to get the ID at the end of a URL without /

I'm trying to write a GreaseMonkey script and I'm stuck on a regex to get the ID of an article from its URL.
I have URLs like https://www.blabla.com/poney-2000-poneys-dance-bla,272317
and I want to use the ID at the end, after the comma: 272317
I tried the regex (,([\d]+)) to avoid picking up digits in the rest of the URL, and it gives me ,272317, but I want it without the comma at the beginning.
Do you have an idea how I can improve my regex?
Thanks a lot :)
You just need to remove the outer parentheses, since you don't want to capture the comma:
,(\d+)$
An alternative approach without regex:
var url = 'https://www.blabla.com/poney-2000-poneys-dance-bla,272317';
url.substring(url.lastIndexOf(',') + 1);
The regex would be (\d+)$ to look up the digits at the end, but it doesn't take the comma into account.
Try this: (?!,)([0-9]+){5}. It should be fine as long as the ID has at least 5 digits.
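To compare the anchored regex with the substring approach, here is a minimal sketch. The helper name articleId and the second, comma-free URL are assumptions made for illustration. One difference to be aware of: when the URL contains no comma, lastIndexOf returns -1, so the substring variant returns the whole string, while the regex variant returns null.

```javascript
// Hypothetical helper: returns the digits after a comma at the very end
// of the URL, or null when the URL does not end in ",<digits>".
function articleId(url) {
  const match = url.match(/,(\d+)$/);
  return match ? match[1] : null;
}

const articleUrl = 'https://www.blabla.com/poney-2000-poneys-dance-bla,272317';

console.log(articleId(articleUrl));                                  // "272317"
console.log(articleId('https://www.blabla.com/no-id'));              // null
console.log(articleUrl.substring(articleUrl.lastIndexOf(',') + 1));  // "272317"
```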

Match everything after word using only regex

I'm trying to set up a ShareX custom engine, and after the upload I'm given the full URL, for instance http://foo.com/HF139hR, and I can work on that string with regex before copying it to the clipboard. What I want to do is get only the last part of the URL, HF139hR, so I can put it into another URL, say http://foo.com/?viewer=HF139hR. So far I was using the expression \w+$ to grab it, but sometimes I get an upload error, and the expression then grabs the last word of the error message and passes it to ?viewer=.
Doing my research I found \bfoo.com\/\K\S+, which is exactly what I want, but unfortunately \K is not supported in JavaScript.
\bfoo.com/\K\S+
\bfoo.com\/(\S+)
You can use a similar pattern with a capturing group and read group 1 (the first capture).
You can use this regex: /\/(\w+)(\?+.*)*$/ and read the capturing group between the parentheses. It skips any trailing part that starts with ?, such as ?viewer=$1$ in the example. You can try it here:
var url="http://foo.com/HF139hR?=viewer=$1$";
var reg=/\/(\w+)(\?+.*)*$/;
alert(url.match(reg)[1]);
If you use only url="http://foo.com/HF139hR", it matches the same thing.
And you can take a look at this Regex DEMO where you can see the match information.
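A minimal sketch of the capturing-group approach, anchored on the host name so that plain error text produces no match. The helper name viewerUrl and the sample error message are made up for illustration, and the dot in foo.com is escaped here so that it only matches a literal dot.

```javascript
// Hypothetical helper: builds the viewer link from an upload response,
// or returns null when the response is not a foo.com link
// (for example, an error message).
function viewerUrl(response) {
  const match = response.match(/\bfoo\.com\/(\S+)/);
  return match ? 'http://foo.com/?viewer=' + match[1] : null;
}

console.log(viewerUrl('http://foo.com/HF139hR'));   // "http://foo.com/?viewer=HF139hR"
console.log(viewerUrl('Upload failed: try again')); // null
```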

Replace everything after last character in URL

I have the following code which replaces the current URL using JavaScript:
window.location.replace(window.location.href.replace(/\/?$/, '#/view-0'));
However if I have a URL like:
domain.com/#/test or domain.com/#/
It will append the #/view-0 to the current hash. What I want is to replace EVERYTHING after the last part of the URL, including any query strings or hashes.
I presume my regex doesn't handle that. How can I amend it to be more aggressive?
The following syntax may help:
location.href.replace(/[?#].*$/, '#/view')
It will replace everything after (and together with) ? or # in the string with #/view.
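Note that this pattern only changes the string when a ? or # is present. A sketch that also covers URLs without either is to strip the query string and hash first, then the optional trailing slash, and append the new hash afterwards. The function name withHash is hypothetical, and it works on a plain string so it can be tried outside the browser.

```javascript
// Hypothetical helper: drops any query string or hash, drops an optional
// trailing slash (as the original snippet does), then appends newHash.
function withHash(href, newHash) {
  return href
    .replace(/[?#].*$/, '') // remove everything from the first ? or #
    .replace(/\/?$/, '')    // remove a trailing slash, if any
    + newHash;
}

console.log(withHash('domain.com/#/test', '#/view-0')); // "domain.com#/view-0"
console.log(withHash('domain.com/?a=1', '#/view-0'));   // "domain.com#/view-0"
console.log(withHash('domain.com/', '#/view-0'));       // "domain.com#/view-0"
```

In the browser this could be used as window.location.replace(withHash(window.location.href, '#/view-0')).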
(^[^\/]*?\/)(?:.*)
Use this, and replace with \1 followed by your string.
See demo.
http://regex101.com/r/sA7pZ0/28

Is there a better way to match part of a URL's pathname in JavaScript?

I'm using window.location.pathname to get the path of the current page. Pathnames that I'll get will be similar to:
H-Foo-Bar-cu-s/9.htm
Some-Thing-sb-s/22297.htm
Foo-Boo-or-Bar-cu-s/553.htm
Random-32-Ness-Can-be-Fun-cu-s/4617.htm
Chicken-Fried-264-Seaturtles-for-Pennies-cu-s/3971.htm
Asymetrical-Banana-Party-p/asy-banana-p.htm
Basically, I'm trying to match the page-type suffixes to the page names that come before the last forward slash.
I got it to work but was wondering if there was a better way to do this:
http://regex101.com/r/dX6rZ9
(-[^-]{2}-[^-]|-[^-])(?=\/.*)
You could try something like this:
var result = yourPath.replace(/^.+(-[^-].+-[^-])\/.+$/, '$1');
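As a cross-check against the sample paths, here is a sketch that keeps the question's two alternatives but excludes the slash from the character classes and uses match() instead of replace(). The helper name pageTypeSuffix is made up for illustration. The replace-based pattern above appears to return -Party-p rather than -p for the last sample path, because its inner .+ needs at least one character between the two hyphenated parts.

```javascript
// Hypothetical helper: returns the page-type suffix that sits directly
// before a slash ("-cu-s", "-sb-s", "-p", ...), or null if none is found.
function pageTypeSuffix(pathname) {
  const match = pathname.match(/(-[^-\/]{2}-[^-\/]|-[^-\/])(?=\/)/);
  return match ? match[1] : null;
}

console.log(pageTypeSuffix('H-Foo-Bar-cu-s/9.htm'));                        // "-cu-s"
console.log(pageTypeSuffix('Random-32-Ness-Can-be-Fun-cu-s/4617.htm'));     // "-cu-s"
console.log(pageTypeSuffix('Asymetrical-Banana-Party-p/asy-banana-p.htm')); // "-p"

// The replace-based pattern from the answer, for comparison:
console.log('Asymetrical-Banana-Party-p/asy-banana-p.htm'
  .replace(/^.+(-[^-].+-[^-])\/.+$/, '$1'));                                // "-Party-p"
```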

How to identify all URLs that contain a (domain) substring?

If I am correct, the following code will only match a URL that is exactly as presented.
However, what would it look like if you wanted to identify subdomains as well as urls that contain various different query strings - in other words, any address that contains this domain:
var url = /test.com/;
if (window.location.href.match(url)) {
    alert("match!");
}
If you want this regex to match "test.com" literally, you need to escape the ".", which means "any character" in regex syntax, and both of the "/".
Escaped: \/test\.com\/
Take a look here for more info.
No, your pattern will actually match on all strings containing test.com.
The regular expression /test.com/ says to match test[ANY CHARACTER]com anywhere in the string.
It is better to use example.com for example links, so I replaced test with example.
Some example matches could be
http://example.com
http://examplexcom.xyz
http://example!com.xyz
http://example.com?q=123
http://sub.example.com
http://fooexample.com
http://example.com/asdf/123
http://stackoverflow.com/?site=example.com
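To make the difference concrete, the sketch below first shows the unescaped and escaped dot, then checks the host name instead of the whole address by parsing it with the standard URL constructor. The helper name isOnDomain is hypothetical, and it assumes an absolute URL is passed in.

```javascript
// Hypothetical helper: true when href is on `domain` or one of its subdomains.
function isOnDomain(href, domain) {
  let host;
  try {
    host = new URL(href).hostname;
  } catch (e) {
    return false; // not an absolute URL
  }
  return host === domain || host.endsWith('.' + domain);
}

console.log(/example.com/.test('http://examplexcom.xyz'));  // true: "." matches any character
console.log(/example\.com/.test('http://examplexcom.xyz')); // false: "\." matches only a dot

console.log(isOnDomain('http://sub.example.com/asdf?q=123', 'example.com'));          // true
console.log(isOnDomain('http://fooexample.com', 'example.com'));                      // false
console.log(isOnDomain('http://stackoverflow.com/?site=example.com', 'example.com')); // false
```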
I think you need to use /g. /g enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one:
var url = /test.com/g;
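For a simple "does the address contain this domain" check, the g flag is not required, and with test() it can even be surprising, because a global regex keeps its position in lastIndex between calls. A small sketch, where the sample address is made up for illustration:

```javascript
const sampleHref = 'http://sub.test.com/page?x=1'; // sample address, an assumption

const plainRe = /test\.com/;
const globalRe = /test\.com/g;

// Without the g flag every call starts searching at index 0.
const plainResults = [plainRe.test(sampleHref), plainRe.test(sampleHref)];

// With the g flag the second call resumes after the first match,
// finds nothing more, returns false and resets lastIndex to 0.
const globalResults = [globalRe.test(sampleHref), globalRe.test(sampleHref)];

console.log(plainResults);  // [ true, true ]
console.log(globalResults); // [ true, false ]
```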
If you want to test whether a URL is valid, this is the one I use. It is fairly complex, because it also handles numeric domains and a few other peculiarities:
var urlMatcher = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?#)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/;
It takes care of parameters, anchors, etc. Please don't ask me to explain the details.
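As an alternative to a hand-written pattern, the standard URL constructor can do the parsing. This is a different technique from the regex above, not a translation of it: it requires a scheme, whereas the regex also accepts addresses without one. The helper name isHttpUrl is made up for illustration.

```javascript
// Hypothetical helper: true when `value` parses as an absolute http(s) URL.
function isHttpUrl(value) {
  try {
    const parsed = new URL(value);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch (e) {
    return false; // the URL constructor throws a TypeError on unparsable input
  }
}

console.log(isHttpUrl('http://example.com/asdf/123?q=1#top')); // true
console.log(isHttpUrl('not a url'));                           // false
console.log(isHttpUrl('mailto:someone@example.com'));          // false
```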
