Matching Multiple Level Deep URL

I have been trying to write a regex in JavaScript that matches URLs exactly 2 levels deep, in the following format:
https://myurl.com/x/y/z
https://anotherurl.com/ab9/zx/qs
I have tried a couple of relevant regexes suggested in other answers and tried modifying them for my own purposes, but to no avail: Regex to match 2 level deep URL and not 3 level deep (Google Analytics), https://mixedanalytics.com/blog/regex-match-number-subdirectories-url/
Could someone shed some light? Please pardon my lack of knowledge in regex; I am just starting out.

I think you want a regular expression like this:
/^(?:https|http)\:\/\/[^\/]+\/[^\/]+\/[^\/]+\/[^\/]+$/
The explanation:
^ : The start of the string.
(?: ... ) : A non-capturing group.
https|http : Matches both https and http.
\:\/\/ : Matches ://, which appears after the scheme (https or http).
[^\/]+ : Matches anything except /; the plus means one or more occurrences (letters or symbols).
\/ : Matches / symbol.
$ : The end of the string.
The rest of the regex repeats the pieces described above. If you don't understand the explanation, open this link, which describes it better than I can. Note that I didn't write a 2-level-deep pattern, because your examples aren't 2 levels deep: they're 3 levels deep (/x/y/z). If you really want URLs exactly 2 levels deep, regardless of your examples, use this instead:
/^(?:https|http)\:\/\/[^\/]+\/[^\/]+\/[^\/]+$/
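A quick way to check both patterns is to run them against the question's examples. This is a sketch; the extra 2- and 4-level URLs are made up for illustration. Because [^\/]+ needs at least one character, a URL with a trailing slash (for example https://myurl.com/x/y/) is rejected by both patterns.

```javascript
// The two patterns from the answer above, written with "/" delimiters.
const threeLevels = /^(?:https|http):\/\/[^\/]+\/[^\/]+\/[^\/]+\/[^\/]+$/;
const twoLevels = /^(?:https|http):\/\/[^\/]+\/[^\/]+\/[^\/]+$/;

const samples = [
  "https://myurl.com/x/y/z",          // 3 levels (from the question)
  "https://anotherurl.com/ab9/zx/qs", // 3 levels (from the question)
  "https://myurl.com/x/y",            // 2 levels (made up)
  "https://myurl.com/x/y/z/w",        // 4 levels (made up)
];

// Print whether each sample matches the 3-level and the 2-level pattern.
for (const url of samples) {
  console.log(url, "3 levels:", threeLevels.test(url), "2 levels:", twoLevels.test(url));
}
```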

Related

Regex: Replace last segment of url

I'm trying to figure out the correct regex to replace the last segment of a URL with a modified version of that very last segment. (I know there are similar threads out there, but none seemed to help.)
Example:
https://www.test.com/one/two/three/mypost/
--->
one/two/three?id=mypost
https://www.test.com/one/mypost/
--->
one?id=mypost
Now I am stuck here:
https://regex101.com/r/9GqYaU/1
I can get the last segment in capturing group 2 but how would I replace it?
I think I will have to do something like this:
const url = 'https://www.test.com/one/two/three/mypost/'
const regex = /(http[s]?:\/\/)([^\/]+\/)*(?=\/$|$)/
// myUrlWithoutTheLastSegmentAndWithoutHTTPS is a placeholder: it is the part I don't know how to get
const path = url.replace(regex, `${myUrlWithoutTheLastSegmentAndWithoutHTTPS}?id=$2`)
return path
But I have no idea how to get the URL without the last segment. Currently I only have access to the whole string, group 1 (which is useless in this case) and group 2, but not the string without group 2.
I would be very glad for any help here. Sometimes I just lack the knowledge of what is possible with regex and how to achieve it.
Thank you in advance.
Cheers

You could use the URL class to extract the pathname, and substring() to remove the leading '/'.
Then you could put the last part of the pathname in a group and use it as the reference $1 in the replacement.
const url = new URL('https://www.test.com/one/two/three/mypost/').pathname.substring(1)
console.log(url.replace(/\/([^/]*)\/$/, '?id=$1'))
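The same idea wrapped in a reusable function might look like this. It is a sketch under one assumption not in the answer: the trailing slash may or may not be present, so the regex uses \/? where the answer has \/ before $. The name lastSegmentToQuery is hypothetical.

```javascript
// Hypothetical helper built on the URL + replace() approach above.
function lastSegmentToQuery(href) {
  // pathname is e.g. "/one/two/three/mypost/"; substring(1) drops the leading "/"
  const path = new URL(href).pathname.substring(1);
  // Capture the last segment; "/mypost/" or "/mypost" at the end becomes "?id=mypost"
  return path.replace(/\/([^/]*)\/?$/, "?id=$1");
}

console.log(lastSegmentToQuery("https://www.test.com/one/two/three/mypost/")); // one/two/three?id=mypost
console.log(lastSegmentToQuery("https://www.test.com/one/mypost/"));           // one?id=mypost
console.log(lastSegmentToQuery("https://www.test.com/one/mypost"));            // one?id=mypost
```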

I came across your question yesterday and agree with going down the route of parsing the URL. Once you get there, you could even use JavaScript array methods, which I prefer to string methods, like:
pathname.split("/").filter(p => p.length).pop()
This would separate each folder, ignore any with no length (i.e. handle a trailing slash) and return the last one (mypost).
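Put together with the question's expected output, the array-based version could be sketched as follows; the function name is hypothetical. Filtering out empty strings handles both the leading slash and the trailing slash of the pathname.

```javascript
// Hypothetical helper: split the pathname, drop empty pieces, pop the last segment.
function lastSegmentToQueryWithArrays(href) {
  const parts = new URL(href).pathname.split("/").filter(p => p.length);
  const last = parts.pop(); // e.g. "mypost"
  return `${parts.join("/")}?id=${last}`;
}

console.log(lastSegmentToQueryWithArrays("https://www.test.com/one/two/three/mypost/")); // one/two/three?id=mypost
console.log(lastSegmentToQueryWithArrays("https://www.test.com/one/mypost/"));           // one?id=mypost
```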
Anyway, I am also learning regex, so when I find a question like this I sometimes try to work out the answer myself, since the best way of learning is doing. It took 24 hours 😂, but I came up with this:
/(https?:\/\/).+?([a-z-]*)\/?$/gm
(https?:\/\/) you know what this does. One small correction: you don't need the square brackets. The question mark matches 0 or 1 of the preceding character, and since we're only matching s, this just works. If you wanted to match s or z, you would use [sz]?.
.+? this is the cool one that I think I'll use in future now that I've found it. The question mark here has a different meaning: it makes .+ (one or more of any character) non-greedy, or lazy. That means it matches as few characters as possible, expanding only until the rest of the pattern (the next rule) can match. Which is...
([a-z-]*) any number of lowercase letters or hyphens. You may want to extend this to include digits and uppercase letters, e.g. [A-Za-z0-9-]*.
\/? Optional slash
$ all this must apply at the end of the string.
Here is a demo
https://regex101.com/r/mQNkIS/1
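To use that regex in code rather than on regex101, one option is match() and reading capture group 2. This sketch drops the g and m flags, because with g, String.prototype.match returns only the whole matches and not the capture groups. The helper name lastSegment is hypothetical.

```javascript
// The answer's regex without the g and m flags, so match() returns capture groups.
const lastSegmentRe = /(https?:\/\/).+?([a-z-]*)\/?$/;

// Hypothetical helper: returns capture group 2 (the last segment), or null.
function lastSegment(url) {
  const m = url.match(lastSegmentRe);
  return m ? m[2] : null;
}

console.log(lastSegment("https://www.test.com/one/two/three/mypost/")); // mypost
console.log(lastSegment("https://www.test.com/one/mypost/"));           // mypost
```

A segment containing a digit, such as post2, comes back as an empty string, because [a-z-]* cannot match the 2 and the lazy .+? simply absorbs the whole segment. That is why widening the character class, as suggested above, is worth doing.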

javascript regexp to match path depth

I've been struggling for the last hour to get this regexp to work, but I cannot seem to crack it.
It must be a regexp, and I cannot use split() etc., as it is part of a bigger regexp that searches for numerous other strings using .test().
(public\/css.*[!\/]?)
public/css/somefile.css
public/css/somepath/somefile.css
public/css/somepath/anotherpath/somefile.css
Here I am trying to look for a path starting with public/css followed by any characters except another forward slash.
So "public/css/somefile.css" should match, but the other 2 should not.
A better solution may be to somehow specify the number of levels to match after the prefix using something like
(public\/css\/{1,2}.*)
but I can't seem to figure that out either, some help with this would be appreciated.
edit
No idea why this question has been downvoted twice. I have clearly stated the requirement with sample code and test cases, and also attempted to solve the issue, so why is it being downvoted?

You can use this regex:
/^(public\/css\/[^\/]*?)$/gm
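^ : Starts with
[^/] : Any character except /
*? : Zero or more of them, as few as possible (lazy); right before $ this behaves the same as *
$ : Ends with
g : Global flag (note that with .test(), a global regex remembers lastIndex between calls)
m : Multi-line flag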

Something like this?
/public\/css\/[^\/]+$/
This will match
public/css/[Any characters except for /]$
$ matches the end of the string.
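Both answers can be checked against the question's three test paths. The sketch below drops the g and m flags (each path is tested on its own, and a global regex would carry lastIndex from one .test() call to the next). It also includes a hypothetical helper, cssPathRegex, that follows the question's idea of specifying how many directory levels are allowed after public/css/; it is an illustration, not something either answer proposed.

```javascript
const paths = [
  "public/css/somefile.css",
  "public/css/somepath/somefile.css",
  "public/css/somepath/anotherpath/somefile.css",
];

// The two answers, without g/m flags.
const answer1 = /^(public\/css\/[^\/]*?)$/;
const answer2 = /public\/css\/[^\/]+$/;

// Hypothetical: allow 0..maxDirs directory segments (each ending in "/") before the file name.
function cssPathRegex(maxDirs) {
  return new RegExp(`^public\\/css\\/(?:[^/]+\\/){0,${maxDirs}}[^/]+$`);
}

for (const p of paths) {
  console.log(p, answer1.test(p), answer2.test(p), cssPathRegex(1).test(p));
}
```

With maxDirs set to 0 the helper behaves like the answers and accepts only files directly under public/css/.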

RegEx to get ALL Strings between two Strings

I seem to have a love/hate relationship with RegEx in that I love how incredibly powerful it is, but at the same time, I don't quite understand all of the nuances of it yet.
I've got a rather lengthy JSON feed that I need to parse, capturing ALL of the matches between two specific strings. I've included a link to a regex101.com example with a few of the JSON results.
regex101.com Example
I'm trying to match every string between each /content/usergenerated and /jcr:content
...
I guess what I should really be trying to match is a string that starts with /content/webAppName/en/home and ends before /jcr:content
The path that I care about will always start with /content/webAppName/en/home

You have to use a positive lookahead, (?=...), which matches only if the current position is followed by the given pattern, without consuming that pattern:
https://regex101.com/r/fU1iD1/4

Just wrap the two things you're looking to remove in parentheses, and then remove them from the output. So...
(\/content\/usergenerated)(.*)(\/jcr\:content)
replaced by
$2
which is everything in the middle of those two.
edit: Sorry, didn't look at your example :) - there was a deleted answer that said to add the g modifier, which looks like it works.
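In JavaScript, collecting every string between the two markers could be sketched with matchAll(), which requires the g flag. The sample feed below is made up, since the real JSON is only linked. One change from the answer: the middle group is lazy, (.*?), so that a single line containing several paths yields one match per path instead of one greedy match spanning from the first marker to the last.

```javascript
// Made-up sample; the real feed is not shown in the question.
const feed = '{"a":"/content/usergenerated/content/webAppName/en/home/foo/jcr:content",' +
  '"b":"/content/usergenerated/content/webAppName/en/home/bar-baz/jcr:content"}';

// Lazy (.*?) stops at the first following /jcr:content.
const between = /\/content\/usergenerated(.*?)\/jcr:content/g;

// m[1] is the text between the two markers.
const results = [...feed.matchAll(between)].map(m => m[1]);
console.log(results);
```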

/\/content\/usergenerated\/content\/webAppName\/en\/home([a-zA-Z\/-]+)\/jcr:content/
This should work. It matches 3 out of 4; I don't know why it doesn't match one of them. You could use exec() in a loop until it returns null, and read match[1] from each result, which holds the data for the first and only capture group.
All the best.
PS: I used gmi as the flags for the regex.
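The exec() loop described in this answer could look like the sketch below. The two input lines are hypothetical. With the g flag, each exec() call resumes from the regex's lastIndex and returns null once there are no more matches.

```javascript
const re = /\/content\/usergenerated\/content\/webAppName\/en\/home([a-zA-Z\/-]+)\/jcr:content/gmi;

// Hypothetical input: one path per line.
const text = [
  "/content/usergenerated/content/webAppName/en/home/news/jcr:content",
  "/content/usergenerated/content/webAppName/en/home/about-us/jcr:content",
].join("\n");

const captured = [];
let match;
while ((match = re.exec(text)) !== null) {
  captured.push(match[1]); // first (and only) capture group
}
console.log(captured); // the two captured paths: /news and /about-us
```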

Capturing optional part of URL with RegExp

While writing an API service for my site, I realized that String.split() won't cut it for much longer, and decided to try my luck with regular expressions. I have almost got it, but I can't figure out the last bit. Here is what I want to do:
The URL represents a function call:
/api/SECTION/FUNCTION/[PARAMS]
This last part, including the slash, is optional. Some functions display a JSON reply without having to receive any arguments. Example: /api/sounds/getAllSoundpacks prints a list of available sound packs. Though, /api/sounds/getPack/8Bit prints the detailed information.
Here is the expression I have tried:
req.url.match(/\/(.*)\/(.*)\/?(.*)/);
What am I missing to make the last part optional - or capture it in whole?

This will capture everything after FUNCTION/ in your URL, independent of the appearance of any further / after FUNCTION/:
FUNCTION\/(.+)$
The RegExp will not match if there is no part after FUNCTION.
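FUNCTION in this answer is a placeholder for the actual function name. A minimal sketch, substituting getPack from the question (the helper name is hypothetical):

```javascript
// "getPack" stands in for the FUNCTION placeholder.
const afterFunction = /getPack\/(.+)$/;

function paramsAfterGetPack(url) {
  const m = url.match(afterFunction);
  return m ? m[1] : null; // null when nothing follows "getPack/"
}

console.log(paramsAfterGetPack("/api/sounds/getPack/8Bit"));       // 8Bit
console.log(paramsAfterGetPack("/api/sounds/getPack/8Bit/extra")); // 8Bit/extra
console.log(paramsAfterGetPack("/api/sounds/getPack"));            // null
```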

This regex should work by making the last slash and the part after it optional:
/^\/[^/]*\/[^/]*(?:\/.*)?$/
This matches all of these strings:
/api/SECTION/FUNCTION/abc
/api/SECTION
/api/SECTION/
/api/SECTION/FUNCTION
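Note that this pattern only validates; its group (?:...) is non-capturing, so match() gives back just the whole string. A quick check against the listed strings:

```javascript
const optionalTail = /^\/[^/]*\/[^/]*(?:\/.*)?$/;

const samples = ["/api/SECTION/FUNCTION/abc", "/api/SECTION", "/api/SECTION/", "/api/SECTION/FUNCTION"];
for (const s of samples) {
  console.log(s, optionalTail.test(s)); // true for all four
}

// No capturing groups: the match array holds only the full match.
console.log("/api/SECTION/FUNCTION/abc".match(optionalTail).length); // 1
```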

Your pattern /(.*)/(.*)/?(.*) was almost correct; it's just a bit too short. It allows 2 or 3 slashes, but you want to accept anything with 3 or 4 slashes. If you want to capture the last (optional) slash AND any text after it as a whole, you need to put a capturing group around that section and make it optional. The segments before it should use [^/]+ rather than .*, because a greedy .* can swallow the extra slash and leave the optional group empty:
/^\/[^/]+\/[^/]+\/[^/]+(\/.+)?$/
should do the trick.
Demo. (The pattern looks different because multiline mode is enabled, but it still works. It's also a little "better" because it won't match garbage like "///".)
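If the goal is to get SECTION, FUNCTION and the optional PARAMS out in one step, a variation using named capture groups (available since ES2018) could look like this. It is a sketch that assumes every route starts with /api/ and treats everything after the function name's slash as PARAMS; apiRoute and parseApiUrl are hypothetical names, and plain strings stand in for req.url.

```javascript
// /api/SECTION/FUNCTION with an optional /PARAMS tail.
const apiRoute = /^\/api\/(?<section>[^/]+)\/(?<fn>[^/]+)(?:\/(?<params>.*))?$/;

function parseApiUrl(url) {
  const m = url.match(apiRoute);
  // params is undefined when the optional part is absent
  return m ? { ...m.groups } : null;
}

console.log(parseApiUrl("/api/sounds/getAllSoundpacks"));
console.log(parseApiUrl("/api/sounds/getPack/8Bit"));
```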

Regex to convert URL to Links

I 'borrowed' a regex from this website: http://daringfireball.net/2010/07/improved_regex_for_matching_urls. It is almost complete, but I also want it to match example.com.
I know that Stack Overflow is not doyourhomework.com, but I spent a long time thinking about this without results. Here is a fiddle to test: http://jsfiddle.net/BGnMm/25/; you can see at the end that example.com is not a link.
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
var allurl="http:foo.com/blah_blah http://foo.com/blah_blah/ (Something like http://foo.com/blah_blah) http://foo.com/blah_blah_(wikipedia) http://foo.com/more_(than)_one_(parens) (Something like http://foo.com/blah_blah_(wikipedia)) http://foo.com/blah_(wikipedia)#cite-1 http://foo.com/blah_(wikipedia)_blah#cite-1 http://foo.com/unicode_(✪)_in_parens http://foo.com/(something)?after=parens http://foo.com/blah_blah. http://foo.com/blah_blah/. <http://foo.com/blah_blah> <http://foo.com/blah_blah/> http://foo.com/blah_blah, http://www.extinguishedscholar.com/wpglob/?p=364. http://✪df.ws/1234 rdar://1234 rdar:/1234 x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E message://%3c330e7f840905021726r6a4ba78dkf1fd71420c1bf6ff@mail.gmail.com%3e http://➡.ws/䨹 www.c.ws/䨹 <tag>http://example.com</tag> Just a www.example.com link. http://example.com/something?with,commas,in,url, but not at end What about <mailto:gruber@daringfireball.net?subject=TEST> (including brokets). mailto:name@example.com bit.ly/foo “is.gd/foo/” WWW.EXAMPLE.COM http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752 http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55)) http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:@field(NUMBER+@band(thc+5a46634)) 6:00p filename.txt http://example.com/quotes-are-“part” ✪df.ws/1234 example.com example.com/";
document.write(allurl.replace(reg,"<a href='$1' >$1</a><br />"));

Add an alternation operator (|) after the {2,4}\/, i.e.
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/|)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
There's something you should understand about this. The first non-captured group, (?: … ), looks for "indicators" of URLs. One indicator, for example, is www (followed by up to 3 digits). You, however, are asking for a way to identify URLs without any indicator at all. So what we've done above is add a clause, "or an empty match," as a "valid" indicator. The consequence is that your regular expression is now less selective: all sorts of strings, not only example.com but also filename.txt, 3.141593 and omg...really, are identified as URLs! Your only other (readily available) option is to be more selective about suffixes, e.g. require specific suffixes (com|org|net), but that takes away from the generality of the original regex, which doesn't specify any suffixes at all.
In other words, you are probably faced with a limitation of logic, not a limitation of regex-writing skills or the regex language itself.
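The "be more selective about suffixes" option could be sketched as a separate pass that only links bare domains ending in an allow-listed suffix. This is an illustration under stated assumptions, not a replacement for the Daring Fireball regex: the suffix list (com|org|net), the function name linkifyBareDomains, and the choice to prefix links with http:// are all hypothetical.

```javascript
// Bare domain with an allow-listed suffix, optionally followed by a path.
const bareDomain = /\b(?:[a-z0-9-]+\.)+(?:com|org|net)\b(?:\/[^\s<>]*)?/gi;

function linkifyBareDomains(text) {
  // $& is the whole match; http:// is an assumed default scheme.
  return text.replace(bareDomain, "<a href='http://$&'>$&</a>");
}

console.log(linkifyBareDomains("see example.com and filename.txt"));
```

filename.txt and 3.141593 are left alone here, but so is bit.ly/foo, which is exactly the loss of generality described above.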

Please check if
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|(?:www\d{0,3}[.])|[a-z0-9.\-]+[.][a-z]{2,4}\/{0,1})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))*(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
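suits your needs. The slash after the domain suffix may now appear zero or one time (\/{0,1}), and the group that follows it may repeat zero or more times (* instead of +). Sorry about my first answer; I didn't notice the test strings.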
