Get substring between substring and first occurrence of another string - javascript

I have URL pathnames that look similar to this: /service-area/i-need-this/but-not-this/. The /service-area/ part never changes, and the rest of the path is dynamic.
I need to get the part of the URL saying i-need-this.
Here was my attempt:
location.pathname.match(new RegExp('/service-area/' + "(.*)" + '/'));
The goal was to get everything between /service-area/ and the next /, but it actually matches up to the last occurrence of /, not the first occurrence. So the output from this is actually i-need-this/but-not-this.
I'm not so good with regex, is there a way it can be tweaked to get the desired result?

You need a lazy regex rather than a greedy one - so (.*?) instead of (.*). See also: What do 'lazy' and 'greedy' mean in the context of regular expressions?
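A minimal sketch of the difference, using the sample path from the question as a string literal (the variable names are illustrative and not part of the original post):

```javascript
// Sample pathname from the question; in a browser this would be location.pathname.
const pathname = '/service-area/i-need-this/but-not-this/';

// Greedy: (.*) grabs as much as possible, so it runs to the LAST slash.
const greedy = pathname.match(/\/service-area\/(.*)\//);

// Lazy: (.*?) grabs as little as possible, so it stops at the FIRST slash.
const lazy = pathname.match(/\/service-area\/(.*?)\//);

console.log(greedy[1]); // "i-need-this/but-not-this"
console.log(lazy[1]);   // "i-need-this"
```

In both cases the wanted text is in capture group 1, not in element 0, which holds the whole match including /service-area/.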

You can do this without a regex too using replace and split:
var path = '/service-area/i-need-this/but-not-this/';
var res = path.replace('/service-area/', '').split('/')[0];
console.log(res);

Related

How to extract a particular text from url in JavaScript

I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a short and sweet way (there surely is) to get the text without the ~ and the /, i.e. just childrens-day?
Thanks
You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside one string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
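To illustrate why the non-greedy form matters with several URLs, here is a small sketch. The two-URL input string is a hypothetical example, and String.prototype.matchAll assumes an ES2020-capable runtime:

```javascript
// Hypothetical input: two URLs of the same shape inside one string.
const text = 'http://www.somedotcom.com/all/~childrens-day/pr and ' +
             'http://www.somedotcom.com/all/~mothers-day/pr';

// Greedy: [^~]+ runs up to the last slash before the next "~".
const greedy = text.match(/~([^~]+)\//)[1];

// Lazy with the g flag: each match stops at the first slash after its "~".
const lazy = Array.from(text.matchAll(/~([^~]+?)\//g), m => m[1]);

console.log(greedy); // "childrens-day/pr and http://www.somedotcom.com/all"
console.log(lazy);   // ["childrens-day", "mothers-day"]
```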
Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that your regular expression wasn't actually properly delimited either, not sure how you got the result you did.
Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.
Yes, it is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap what you need in a parenthesized group. No other changes to your code are needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
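You could just do a string replace on the matched text. Note that replace() returns a new string rather than modifying the original, and with a string pattern it only replaces the first occurrence.
var matched = url.match('~.+\/')[0]; // "~childrens-day/"
var result = matched.replace('~', '').replace('/', ''); // "childrens-day"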
http://www.w3schools.com/jsref/jsref_replace.asp

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a Regular Expression in javascript to trim the string to only the characters after the last slash in the URL.
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'
Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);
A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);
Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()
This one is easy to understand: (?!.*/).+
Let me explain.
First, match everything up to and including a slash;
that's the part we don't want:
.*/ matches everything up to the last slash
Then wrap it in a negative lookahead, (?!...), to say "there must be no slash from this point on":
(?!.*/) is the negative lookahead
Now we can take whatever comes after the part we don't want with:
.+
In a JavaScript regex literal you need to escape the /, so it becomes:
(?!.*\/).+
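A short sketch of the lookahead pattern applied to the ID from the question (the variable name lastSegment is illustrative):

```javascript
const id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// (?!.*\/) fails at every position that still has a slash somewhere after it,
// so the first position where .+ can start is just past the last slash.
const lastSegment = id.match(/(?!.*\/).+/)[0];

console.log(lastSegment); // "nabb80191e23b7d9"
```

One caveat, as far as I can tell: if the string ends with a slash, nothing is left for .+ to match and match() returns null, so the [0] access should be guarded in that case.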
this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"
This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9
I don't know JS, so this is a guess based on the other examples:
id = id.match(/[^\/]*$/)[0]; // [0] is needed: match() returns an array
Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"

Match Url path without query string

I would like to match a path in a URL, but ignoring the query string.
The regex should include an optional trailing slash before the querystring.
Example urls that should give a valid match:
/path/?a=123&b=123
/path?a=123&b=123
So the string '/path' should match either of the above urls.
I have tried the following regex: (/path[^?]+).*
But this will only match urls like the first example above: /path/?a=123&b=123
Any idea how I would go about getting it to match the second example, without the trailing slash, as well?
Regex is a requirement.
No need for regexp:
url.split("?")[0];
If you really need it, then try this:
\/path\?*.*
EDIT Actually the most precise regexp should be:
^(\/path)(\/?\?{0}|\/?\?{1}.*)$
because you want to match either /path or /path/ or /path?something or /path/?something and nothing else. Note that ? means "at most one" while \? means a question mark.
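A quick check of that pattern against the four accepted forms and two rejected ones. The sample inputs, including /pathology and /path/more, are hypothetical and only there to show the "nothing else" part:

```javascript
// The pattern from the answer above; "/path" is the example route.
const pathOnly = /^(\/path)(\/?\?{0}|\/?\?{1}.*)$/;

// Hypothetical sample inputs, not from the original question.
const samples = ['/path', '/path/', '/path?a=123&b=123', '/path/?a=123&b=123',
                 '/pathology', '/path/more'];

const results = samples.map(s => pathOnly.test(s));
console.log(results); // [true, true, true, true, false, false]
```

Since \?{0} matches the empty string and \?{1} is the same as \?, the second group could arguably be written as (\/?|\/?\?.*) with the same behavior.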
BTW: What kind of routing library does not handle query strings?? I suggest using something else.
http://jsfiddle.net/bJcX3/
var re = /(\/?[^?]*?)\?.*/;
var p1 = "/path/to/something/?a=123&b=123";
var p2 = "/path/to/something/else?a=123&b=123";
var p1_matches = p1.match(re);
var p2_matches = p2.match(re);
document.write(p1_matches[1] + "<br>");
document.write(p2_matches[1] + "<br>");

what's wrong with this regular expression? getting the hash part of an url

I'm trying to get the first part of a hash from a URL (the part between the # and a /, a ?, or the end of the string).
So far I have come up with this:
r = /#(.*)[\?|\/|$]/
// OK
r.exec('http://localhost/item.html#hash/sub')
["#hash/", "hash"]
// OK
r.exec('http://localhost/item.html#hash?sub')
["#hash?", "hash"]
// WAT?
r.exec('http://localhost/item.html#hash')
null
I was expecting to receive "hash".
I tracked down the problem to
r2 = /#(.*)[$]/
r2.exec('http://localhost/item.html#hash')
null
any idea what could be wrong?
r = /#(.*)[\?|\/|$]/
When $ appears in [] (a character class), it's the literal "$" character, not the end of input/line. In fact, your [\?|\/|$] part is equivalent to just [?/$|], which matches the 4 specific characters (including the pipe).
Use this instead:
r = /#(.+?)(\?|\/|$)/
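A small sketch comparing the two patterns. The inputs ending in $ and | are hypothetical and only demonstrate that the character class treats those symbols literally:

```javascript
const broken = /#(.*)[\?|\/|$]/;   // regex from the question
const fixed  = /#(.+?)(\?|\/|$)/;  // alternation group instead of a class

// Inside [], "$" and "|" are ordinary characters, so these hypothetical
// inputs match even though that was never the intent.
const dollarHit = broken.exec('item.html#hash$')[1];
const pipeHit   = broken.exec('item.html#hash|')[1];

// With nothing after the hash, the class has no character to consume.
const brokenPlain = broken.exec('http://localhost/item.html#hash');

// Outside a class, "$" is the end-of-input anchor again.
const fixedPlain = fixed.exec('http://localhost/item.html#hash')[1];
const fixedSub   = fixed.exec('http://localhost/item.html#hash/sub')[1];

console.log(dollarHit, pipeHit);   // "hash" "hash"
console.log(brokenPlain);          // null
console.log(fixedPlain, fixedSub); // "hash" "hash"
```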
You aren't supposed to write [$] (within a character class) unless you want to match the $ literally and not the end of line.
/#(.*)$/
Code:
var regex = /\#(.*)$/;
regex.exec('http://localhost/item.html#hash');
Output:
["#hash", "hash"]
Your regex: /#(.*)[\?|\/|$]/
//<problem>-----^ ^-----<problem>
| operator won't work within [], but within ()
$ will be treated literally within []
.* will match as much as possible. .*? will be non-greedy
On making the above changes,
you end up with /#(.*?)(\?|\/|$)/
I use http://regexpal.com/ to test my regular expressions.
Your problem here is that your regular expression requires a character such as / after the hash. So it doesn't work with http://localhost/item.html#hash but it does work with http://localhost/item.html#hash/
Try this one :
r = /#([^\?|\/|$]*)/
You can't use the $ end-of-string marker in a character class. You're probably better off just matching characters that aren't / or ?, like this:
/#([^\?\/]*)/
Why Regex? Do it like this (nearly no regex):
var a = document.createElement('a');
a.href = 'http://localhost/item.html#hash/foo?bar';
console.log(a.hash.split(/[\/\?]/)[0]); // #hash
Just for the sake of completeness, if it is node.js you are working with:
var hash = require('url').parse('http://localhost/item.html#hash').hash;
I found this regular expression that seems to work
r = /#([^\/\?]*)/
r.exec('http://localhost/item.html#hash/sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash?sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash')
["#hash", "hash"]
Anyway, I still don't get why the original one isn't working

Javascript regex expression to replace multiple strings?

I have a string like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscore and the percentage stuff?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact output should be "My happy dog is cool!", so I would like to get rid of the initial address, remove the underscores and percent-escapes, and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | serves as an "or" operator, and (%\d{2,}) matches a % sign followed by two or more digits.
The parentheses around each side of the | create capture groups. They are not strictly required here, because replace() substitutes the whole match with the ' ' (single-space) string passed as its second argument.
References:
lastIndexOf().
replace().
substring().
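As a variation on the substring-plus-replace approach above, here is a hedged sketch that aims for the exact output the question asks for. It assumes every percent-escape should become a space (as in the asker's expected result) and matches two hex digits rather than \d{2,}, since escapes such as %2F contain letters; the variable names are illustrative:

```javascript
const str = 'http://something.org/dom/My_happy_dog_%28is%29cool!';

// Keep only what follows the last slash.
const lastSegment = str.substring(str.lastIndexOf('/') + 1);

const title = lastSegment
  .replace(/_|%[0-9A-Fa-f]{2}/g, ' ') // underscores and %XX escapes become spaces
  .replace(/\s+/g, ' ')               // "_%28" produced two spaces; merge them
  .trim();

console.log(title); // "My happy dog is cool!"
```

If the decoded characters should be kept instead (here, the parentheses), decodeURIComponent on the last segment followed by the underscore replacement would be the alternative.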
You can use unescape to decode the percent-escapes (unescape is deprecated; decodeURIComponent is the modern equivalent):
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
Test:
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");
