Match between occurrence of last slash in URL and hash - javascript

I'm trying to match between two characters, specifically the last part of a url but only between the last slash (/) and a hash (#).
http://example.com/path/to/thing#name
The match should return
thing
I'm partly there now. I can either get the whole string of the last part of the url, or everything before hash (#) but not both.
/([^\/]*?.*(?=#))/
Please see my regex101 for testing.

You are close but still overthinking it. This suffices:
/[^\/#]+(?=#|$)/
This matches a sequence of characters that are neither / nor #. You don't need to add parentheses to capture it as a separate group, because the match itself is already the part you want. The final lookahead (?=#|$) makes it stop at either an intervening # or the end of the URL.
See regex101.
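As a quick check of the pattern above, here is a minimal sketch in JavaScript. The helper name lastSegment is hypothetical and only wraps String.prototype.match for illustration.

```javascript
// Pattern from the answer above: one or more characters that are neither
// "/" nor "#", followed by "#" or the end of the string.
const LAST_SEGMENT = /[^\/#]+(?=#|$)/;

// Hypothetical helper: returns the matched segment, or null when nothing matches.
function lastSegment(url) {
  const match = url.match(LAST_SEGMENT);
  return match ? match[0] : null;
}

console.log(lastSegment("http://example.com/path/to/thing#name")); // thing
console.log(lastSegment("http://example.com/path/to/thing"));      // thing
```

Because the regex has no g flag, match returns the first (leftmost) match only, which here is the segment right before the hash.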

Related

Optional regex string/pattern sections with and without non-capturing groups

Here's what I'm trying to do:
http://i.imgur.com/Xqrf8Wn.png
Simply take a URL with 3 groups. $1 is not so important; $2 and $3 are, but $2 is totally optional, including (obviously) the corresponding slash when present, which is all I am trying to make optional. I get that it can (or should?) be in a non-capturing group, but does it HAVE to be? What I've seen so far seems to indicate it does not HAVE to be. If possible, I'd really like someone to explain it so I can fully understand it, rather than just being handed one working answer to copy.
Here are the regex strings I've tried; at best they currently match only the second URL, the one with the optional part present:
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(\/)?(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)?(.*)?\/required\/B([A-Z0-9]{9}).*
Here are the two URLs from which I want to capture groups 2 and 3, with 1 and 2 being optional, but $2 being the problem. I've tried all the strings above and have yet to get a match when the optional part is NOT present, and I believe it must be due to the slashes?
https://blah.com/required/B7BG0Z0GU1A
https://blah.com/optional/required/B7BG0Z0GU1A
Making a part of the pattern optional is as simple as adding ?, and your last two attempts both work: https://regex101.com/r/RIKvYY/1
Your mistake is that your test is wrong - you are using ^ which matches the beginning of the string. You need to add the /m flag (multiline) to make it match the beginning of each line. This is the reason your patterns never match the second line...
Note that you're allowing two slashes (//required, for example). You can solve it by joining the first slash and the optional part to the same capturing group (of course, as long as you are using .* you can still match multiple slashes):
https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*
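To see the effect of the multiline flag together with the combined optional group, here is a small sketch. It uses the last pattern above with a leading ^ added, and assumes the two sample URLs are tested as one string separated by a newline (as in a regex101 test box). The variable names are illustrative only.

```javascript
// The two sample URLs as a single multi-line test string.
const input =
  "https://blah.com/required/B7BG0Z0GU1A\n" +
  "https://blah.com/optional/required/B7BG0Z0GU1A";

// Same body twice; only the flags differ.
const withMultiline =
  /^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*/gm;
const withoutMultiline =
  /^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*/g;

// With /m, ^ matches at the start of every line, so both URLs match.
const results = [...input.matchAll(withMultiline)].map((m) => ({
  optional: m[2], // undefined when the optional part is absent
  id: m[3],
}));
console.log(results.length); // 2

// Without /m, ^ matches only at the very start of the string.
const unanchoredCount = [...input.matchAll(withoutMultiline)].length;
console.log(unanchoredCount); // 1
```

Note that an optional group that does not participate in the match yields undefined in JavaScript, not an empty string.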

regex - how to select all double slashes except followed by colon

I need some help with RegEx. It may be basic stuff, but I cannot find the correct way to do it. Please help!
So, here's my question:
I have a list of URLs, that are invalid because of double slash, like this:
http://website.com//wp-content/folder/file.jpg, to fix it I need to remove all double slashes except the first one followed by colon (http://), so fixed URL is this: http://website.com/wp-content/folder/file.jpg.
I need to do it with RegExp.
Variant 1
url.replace(/\/\//g,'/'); // => http:/website.com/wp-content/folder/file.jpg
This will replace all double slashes (//), including the first one, which is not correct.
example here:
https://regex101.com/r/NhCVMz/2
You may use
url = url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2")
See the regex demo
Note: a ^ anchor at the beginning of the pattern might be used if the strings are entire URLs.
This pattern will match and capture http:// or https:// and restore it in the resulting string with the $1 backreference. All other runs of 2 or more / will be matched by (\/){2,}, and only 1 occurrence will be put back into the resulting string, since the capturing group does not include the quantifier.
Find (^|[^:])/{2,}
Replace $1/
delimited: /(^|[^:])\/{2,}/
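Both suggestions above can be compared side by side with a short sketch. The g flag on the second pattern is an addition here so that every run of slashes is handled, and the variable names are illustrative only.

```javascript
const broken = "http://website.com//wp-content/folder/file.jpg";

// Approach 1: capture the scheme and put it back via $1; collapse any other
// run of slashes to the single slash captured in $2.
const viaAlternation = broken.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2");

// Approach 2: only collapse slash runs that are not preceded by a colon,
// keeping the preceding character via $1.
const viaPrecedingChar = broken.replace(/(^|[^:])\/{2,}/g, "$1/");

console.log(viaAlternation);   // http://website.com/wp-content/folder/file.jpg
console.log(viaPrecedingChar); // http://website.com/wp-content/folder/file.jpg
```

In the first approach, whichever group did not take part in a given match is substituted as an empty string, which is why "$1$2" works for both alternatives.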

Match part of string not preceded by a space

How can I match a substring only if it is not preceded by a space?
In the string below, I want to match only the first and third lines and not the second, which begins with a space. In this case the line also needs to start with a #
#match
#not match
#match
https://regex101.com/r/VE3Q8z/1
The negative lookahead (?! ) doesn't seem to affect anything. Maybe what I'm looking for is a negative look-behind, but I haven't found any examples (that make sense to me) of how to do so in JavaScript.
You could achieve it with anchors:
^(?! )(#+)(.*)
See the aforementioned link to your own demo: https://regex101.com/r/VE3Q8z/2
Just use an anchor to verify that the string starts with a "#". And then add the "global" and "multiline" flags to it
/^#+(.*)/gm
https://regex101.com/r/koOXUB/1
This is your regex:
^#(.*)
The ^# part matches any string that begins with #. You can modify the last part to match only letters or digits.
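The anchor-based answers above can be tried out with a short sketch. It assumes the sample text has a leading space on its second line, which is what the question describes; the variable names are illustrative.

```javascript
// Sample input; the second line is assumed to start with a space.
const text = "#match\n #not match\n#match";

// ^ with the m flag anchors at the start of each line, so a line that
// begins with a space can never satisfy ^#.
const pattern = /^#+(.*)/gm;

// Collect the text captured after the leading "#" characters.
const captured = [...text.matchAll(pattern)].map((m) => m[1]);
console.log(captured); // [ 'match', 'match' ]
```

Since the anchor already requires # as the first character of the line, no lookbehind is needed for this case.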

URL Pattern Matching issue, .+ matches all after

I am matching up stored URLs to the current URL and having a little bit of an issue - the regex works fine when being matched against the URL itself, but for some reason all sub-directories match too (when I want a direct match only of course).
Say the user stores www.facebook.com, this should match both http://www.facebook.com and https://www.facebook.com and it does
The problem is it is also matching sub-directories such as https://www.facebook.com/events/upcoming etc.
The regex for example:
/.+:\/\/www\.facebook\.com/
Matches the following:
https://www.facebook.com/events/upcoming
When it should just be matching
http://www.facebook.com/
https://www.facebook.com/
How can I fix this seemingly broken regex?
If you're being really specific about what you want to match, why not reflect that in your RegExp?
/^https?:\/\/(?:(?:www|m)\.)?facebook\.com\/?$/
http or https
www., m. or no subdomain
facebook.com
Demo
Edit: now includes an optional trailing slash.
Put an end marker $, like:
/.+:\/\/www\.facebook\.com\/$/
but really should have a start marker ^ too, like:
/^https?:\/\/www\.facebook\.com\/$/
Also, if you're matching the current domain, you may as well just match location.host rather than location.href.
Try adding a $ at the end of your regex. It's the symbol for end of string.
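A brief sketch of how the fully anchored pattern from the first answer behaves on the URLs from the question. The m.facebook.com entry is an extra illustrative input, and the constant name is hypothetical.

```javascript
// Anchored at both ends: scheme, optional www. or m. subdomain,
// facebook.com, optional trailing slash, then end of string.
const FACEBOOK_ROOT = /^https?:\/\/(?:(?:www|m)\.)?facebook\.com\/?$/;

const urls = [
  "http://www.facebook.com",
  "https://www.facebook.com/",
  "https://m.facebook.com",
  "https://www.facebook.com/events/upcoming",
];

for (const url of urls) {
  // Only the last URL prints false, because $ rejects anything after the host.
  console.log(url, FACEBOOK_ROOT.test(url));
}
```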

Match index.html or empty url end with regex

I'm trying to check for a url ending of either:
http://www.site.com/html/
OR
http://www.site.com/html/index.html
So far I have this (with numerous attempts at moving the $ and /'s) but can't seem to get it to work.
window.location.pathname.match(/index.html/|/^$z/))
You could try this:
window.location.pathname.match(/\/$|index\.html/)
Will match the last / of the pathname, and also index.html
The first part of the regex, "\/$", escapes the forward slash, and the $ matches the end of the string. So the way I read it is "the last character is a forward slash".
The second part of the regex, "index\.html", matches index.html, but you have to escape the period because an unescaped "." matches any character.
Here's a regular expression cheatsheet: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
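The pattern from this answer can be exercised outside the browser by passing the pathname in as a plain string. This is only a sketch: the function names are hypothetical, and the stricter end-anchored variant is an assumption about what may be wanted, not part of the original answer.

```javascript
// Pattern from the answer: pathname ends with "/" or contains "index.html".
const LOOSE = /\/$|index\.html/;

// Assumed stricter variant: the pathname must END with "/" or "/index.html".
const STRICT = /\/(?:index\.html)?$/;

// Hypothetical helpers taking the pathname as an argument, so they can be
// called with window.location.pathname in a browser.
function isIndexLoose(pathname) {
  return LOOSE.test(pathname);
}
function isIndexStrict(pathname) {
  return STRICT.test(pathname);
}

console.log(isIndexLoose("/html/"));                  // true
console.log(isIndexLoose("/html/index.html"));        // true
console.log(isIndexLoose("/html/about.html"));        // false
console.log(isIndexLoose("/html/index.html/extra"));  // true
console.log(isIndexStrict("/html/index.html/extra")); // false
```

The last two lines show the difference: without an end anchor on the second alternative, index.html is accepted anywhere in the path.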
