Match everything after word using only regex - javascript

I'm trying to set up a ShareX custom engine. After the upload I'm given the full URL, for instance http://foo.com/HF139hR, and I can process that string with a regex before copying it to the clipboard. What I want is only the last part of the URL, HF139hR, so I can put it into another URL, say http://foo.com/?viewer=HF139hR. So far I have been using the expression \w+$ to grab it, but sometimes the upload fails with an error, and then the expression grabs the last word of the error message and passes it to ?viewer=.
Doing my research I found \bfoo.com\/\K\S+, which is exactly what I want, but unfortunately \K is not supported in JavaScript.

Instead of
\bfoo.com/\K\S+
you can use a similar pattern with a capturing group:
\bfoo\.com\/(\S+)
and take group 1 (capture 1) from the match.
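A minimal sketch of the capture-group approach in JavaScript. The helper name `viewerUrl` and the sample error string are made up for illustration; the pattern is the capture-group one from the answer above, with the literal dot escaped.

```javascript
// Hypothetical helper: build the viewer URL from the upload response,
// or return null when the response is not a foo.com link (e.g. an error message).
function viewerUrl(response) {
  // Group 1 holds everything after "foo.com/" up to the next whitespace.
  const match = /\bfoo\.com\/(\S+)/.exec(response);
  return match ? "http://foo.com/?viewer=" + match[1] : null;
}

console.log(viewerUrl("http://foo.com/HF139hR"));        // http://foo.com/?viewer=HF139hR
console.log(viewerUrl("Upload failed: file too large")); // null
```

In engines that support ES2018 lookbehind, `(?<=\bfoo\.com\/)\S+` should return the same text as the whole match, which is the closest equivalent to `\K`.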

You can use this regex: /\/(\w+)(\?+.*)*$/ and read the first capturing group. It also ignores any trailing part that starts with ?, as in the example ?viewer=$1$. You can try it here:
var url="http://foo.com/HF139hR?=viewer=$1$";
var reg=/\/(\w+)(\?+.*)*$/;
alert(url.match(reg)[1]);
And if you use only the url="http://foo.com/HF139hR" as a url it will also match the same thing.
And you can take a look at this Regex DEMO where you can see the match information.

Related

Regex to get a specific part of an URL

Say I've got an URL:
http://www.example.com/foo/bar/yes_no_no.html
I'm looking for a regex that can extract all the characters of the filename up to the first underscore. So in the above URL I want to extract "yes".
So far I managed to get the whole "yes_no_no" bit using:
/([^\/]+)(?=\.\w+$)/, but I can't seem to get it to only match "yes".
Try this:
/\/([^_\/]+)_?[^\/]*\.[a-z]+$/
Example:
console.log('http://www.example.com/foo/bar/yes_no_no.html'.match(/\/([^_\/]+)_?[^\/]*\.[a-z]+$/)[1])
console.log('http://www.example.com/foo/bar/yes.html'.match(/\/([^_\/]+)_?[^\/]*\.[a-z]+$/)[1])

Capturing optional part of URL with RegExp

While writing an API service for my site, I realized that String.split() won't do it much longer, and decided to try my luck with regular expressions. I have almost done it but I can't find the last bit. Here is what I want to do:
The URL represents a function call:
/api/SECTION/FUNCTION/[PARAMS]
This last part, including the slash, is optional. Some functions display a JSON reply without having to receive any arguments. Example: /api/sounds/getAllSoundpacks prints a list of available sound packs. Though, /api/sounds/getPack/8Bit prints the detailed information.
Here is the expression I have tried:
req.url.match(/\/(.*)\/(.*)\/?(.*)/);
What am I missing to make the last part optional - or capture it in whole?
This will capture everything after FUNCTION/ in your URL, independent of the appearance of any further / after FUNCTION/:
FUNCTION\/(.+)$
The RegExp will not match if there is no part after FUNCTION.
This regex should work by making last slash and part after optional:
/^\/[^/]*\/[^/]*(?:\/.*)?$/
This matches all of these strings:
/api/SECTION/FUNCTION/abc
/api/SECTION
/api/SECTION/
/api/SECTION/FUNCTION
Your pattern /(.*)/(.*)/?(.*) was almost correct, it's just a bit too short - it allows 2 or 3 slashes, but you want to accept anything with 3 or 4 slashes. And if you want to capture the last (optional) slash AND any text behind it as a whole, you simply need to create a group around that section and make it optional:
/.*/.*/.*(?:/.+)?
should do the trick.
Demo. (The pattern looks different because multiline mode is enabled, but it still works. It's also a little "better" because it won't match garbage like "///".)
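To capture the three parts rather than only test for a match, a sketch along these lines may help. The function name `parseApiUrl` and the returned property names are hypothetical; the pattern is a variant of the ones above that adds capturing groups and uses `[^/]` instead of `.` so a group cannot swallow a slash.

```javascript
// Hypothetical parser for /api/SECTION/FUNCTION/[PARAMS]
function parseApiUrl(url) {
  // The (?:\/(.*))? tail is optional; when present, PARAMS may itself contain slashes.
  const m = /^\/api\/([^/]+)\/([^/]+)(?:\/(.*))?$/.exec(url);
  if (!m) return null;
  // m[3] is undefined when the optional tail is absent.
  return { section: m[1], fn: m[2], params: m[3] };
}

console.log(parseApiUrl("/api/sounds/getAllSoundpacks"));
console.log(parseApiUrl("/api/sounds/getPack/8Bit"));
```

The non-capturing group `(?:...)` keeps the slash out of the captured parameters while still making slash and parameters optional as one unit.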

trouble using string.replace with regex

Given a regex like this:
http://rubular.com/r/ai1LFT5jvK
I want to use string.replace to replace "subdir" with a string of my choosing.
Doing myStr.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/,otherStr)
only returns the same string, as shown here: http://jsfiddle.net/nLmbV/
If you view the Rubular, it appears to capture what I want it to capture, but on the Fiddle, it doesn't replace it.
I'd like to know why this happens, and what I'm doing wrong. A correct regex or a correct implementation of the replace call would be nice, but most of all, I want to understand what I'm doing wrong so that I can avoid it in the future.
EDIT
I've updated the fiddle to change my regex from:
/^.*\/\/.*\.net\/.*\/(.*)\/.*\z/
to
/^.*\/\/.*\.net\/.*\/(.*)\/.*$/
And according to the fiddle, it just returns hello instead of https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
It's that little \z in your regex.
You probably forgot to replace it with a $ sign. JavaScript uses ^ and $ as anchors, while Ruby uses \A and \z.
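A small check of why the original pattern never matched. In a JavaScript regex without the `u` flag, `\z` is not an anchor; it is read as a literal "z". The sample strings are made up for illustration.

```javascript
// \z behaves like a plain "z" here, so the pattern demands a trailing "z".
const endsWithZEscape = /\.PNG\z/;
const endsWithDollar = /\.PNG$/;

const rubyStyleOnPng = endsWithZEscape.test("Slide1_v2.PNG");   // false
const rubyStyleOnPngZ = endsWithZEscape.test("Slide1_v2.PNGz"); // true
const dollarOnPng = endsWithDollar.test("Slide1_v2.PNG");       // true

console.log(rubyStyleOnPng, rubyStyleOnPngZ, dollarOnPng);
```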
To answer your edit:
The match is always replaced as a whole. You'll want to group both the left side and the right side of the to-be-replaced part and reinsert it in the replacement:
url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/,"$1hello$2")
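A quick comparison of the two replace calls. The input URL is an assumption: the placeholder host from the question with a `subdir` segment in the position to be replaced.

```javascript
// Assumed input, modelled on the URL in the question.
const url = "https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG";

// The whole match (here the entire string) is replaced, so only "hello" remains.
const wholeMatch = url.replace(/^.*\/\/.*\.net\/.*\/(.*)\/.*$/, "hello");

// Capture the text on both sides and put it back with $1 and $2.
const keepSides = url.replace(/^(.*\/\/.*\.net\/.*\/).*(\/.*)$/, "$1hello$2");

console.log(wholeMatch); // hello
console.log(keepSides);  // https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
```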
Before I get marked down, I know the question asks about regexp. The reason for this answer is that URLs are nearly impossible to process reliably with a regexp without writing fiendishly complex regexps. It can be done, but it makes your head hurt!
If you are doing this in a browser, you can use an A tag in your script to make things much simpler. The A tag knows how to parse them into pieces, and it lets you modify the pieces independently, so you only need to deal with the pathname:
//make a temporary a tag
var a = document.createElement('a');
//set the href property to the url you want to process
a.href = "scheme://host.domain/path/to/the/file?querystring";
//grab the path part of the url, and chop up into an array of directories
var dirs = a.pathname.split('/');
//set 2nd dir name - array is ['','path','to','the','file']
dirs[2]='hello';
//put the path back together
a.pathname = dirs.join('/');
a.href now contains the URL you want.
More lines, but also more hair left when you come back to change the code later.
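Where no DOM is available, the standard `URL` class exposes the same parsed pieces. This is a sketch of the same idea rather than part of the original answer; the input URL is the placeholder from the question with an assumed `subdir` segment.

```javascript
// Parse the URL once, then edit only the path.
const parsed = new URL("https://xxxxxxxxxxx.cloudfront.net/dir/subdir/Slide1_v2.PNG");

// pathname is "/dir/subdir/Slide1_v2.PNG", so the array is
// ['', 'dir', 'subdir', 'Slide1_v2.PNG'].
const dirs = parsed.pathname.split("/");
dirs[2] = "hello";

// Writing pathname back updates href as well.
parsed.pathname = dirs.join("/");
console.log(parsed.href); // https://xxxxxxxxxxx.cloudfront.net/dir/hello/Slide1_v2.PNG
```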

Getting http URL using regex when there are multiple http URLs

I want to extract different .swf files from different sites for a project. Different sites use different source methods so I can't use src= or data= in my regex.
I'm able to match the file name with /[\w-]+.swf/g, but when I try to match the full path starting with http, using http(.*?).swf, the match begins at an earlier http in the page (the first one in the code) rather than the one that belongs to the .swf path. Also I can't use src= or data= etc.; it must be only the link.
Basically, is there a way to limit the match to the first http found when searching backwards?
If anyone cares to take a look then here's the code: http://pastebin.com/kT20UqqJ .
And here's a good place to test regex: http://regex.larsolavtorvik.com/
Try the following one:
var regex = /http:[\.\/\w-%]+\.swf/g
You need to escape the . before swf, otherwise it matches an arbitrary character, and the / because it is the regex delimiter.
You can see the working Example here.
If you have url encoded characters (like white space) you would have also a % in your url.
Here is an example which will work in this case: /http:[\./\w%-]+\.swf/g
Here is a tool where you can test the regex: http://regexpal.com/
And one where you can check its performance: http://regexter.com/
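To see why the character class helps, here is a small comparison. The HTML fragment is invented for illustration; the second pattern is the one suggested above (with the dot left unescaped inside the class, where it is already literal).

```javascript
// Hypothetical page fragment with an ordinary link before the .swf URL.
const html =
  '<a href="http://example.com/page">home</a>' +
  '<embed src="http://cdn.example.com/files/my%20game-1.swf">';

// Lazy dot: starts at the first "http" and runs on until the first ".swf".
const lazy = html.match(/http(.*?)\.swf/g);

// Character class: quotes, spaces and angle brackets are not allowed,
// so a match cannot cross from one attribute into the next.
const strict = html.match(/http:[.\/\w%-]+\.swf/g);

console.log(lazy);   // one long match beginning at http://example.com/page
console.log(strict); // [ 'http://cdn.example.com/files/my%20game-1.swf' ]
```

A lazy quantifier only limits where the match ends, not where it starts, which is why restricting the allowed characters is the more reliable fix here.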

Can someone tell me the purpose of the second capture group in the jQuery rts regular expression?

In Jeff Roberson's jQuery Regular Expressions Review he proposes changing the rts regular expression in jQuery's ajax.js from /(\?|&)_=.*?(&|$)/ to /([?&])_=[^&\r\n]*(&?)/. In both versions, what is the purpose of the second capture group? The code does a replacement of the current random timestamp with a new random timestamp:
var ts = jQuery.now();
// try replacing _= if it is there
var ret = s.url.replace(rts, "$1_=" + ts + "$2");
Doesn't it only replace what it matches? I am thinking this does the same:
var ret = s.url.replace(/([?&])_=[^&\r\n]*/, "$1_=" + ts);
Can someone explain the purpose of the second capture group?
It's to pick up the next delimiter in the query string on the URL, so that it still works properly as a query string. Thus if the url is
http://foo.bar/what/ever?blah=blah&_=12345&zebra=banana
then the second group picks up the "&" before "zebra".
That's an awesome blog post by the way and everybody should read it.
edit — now that I think about it, I'm not sure why it's necessary to bother with replacing that second delimiter. In the "fixed" expression, that greedy * will pick up the whole parameter value and stop at the delimiter (or the end of the string) anyway.
I think you're right. It was needed in the original because matching the ampersand or end-of-string was how the .*? knew when to stop. In Jeff's version that's no longer necessary.
As the author of the article I can't tell you the reason for the second capture group. My intent with the article was to take existing regexes and simply make them more efficient - i.e. they should all match the same text - just do it faster. Unfortunately I did not have time to delve deeply into the code to see exactly how each and every one of them was being used. I assumed that the capture group for this one was there for a reason so I did not mess with it.
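The point made in the answers can be checked directly. In this sketch `ts` is a fixed stand-in for `jQuery.now()` so the results are predictable, and the URL is the example from the answer above.

```javascript
const url = "http://foo.bar/what/ever?blah=blah&_=12345&zebra=banana";
const ts = 99999; // stand-in for jQuery.now()

// Proposed pattern, with and without the second group: same result.
const withGroup = url.replace(/([?&])_=[^&\r\n]*(&?)/, "$1_=" + ts + "$2");
const withoutGroup = url.replace(/([?&])_=[^&\r\n]*/, "$1_=" + ts);

// Original lazy pattern: the trailing (&|$) is what forces .*? to consume the old value.
const lazyWithGroup = url.replace(/(\?|&)_=.*?(&|$)/, "$1_=" + ts + "$2");
// Without it, .*? matches nothing and the old value is left behind.
const lazyWithoutGroup = url.replace(/(\?|&)_=.*?/, "$1_=" + ts);

console.log(withGroup);        // ...?blah=blah&_=99999&zebra=banana
console.log(withoutGroup);     // same as above
console.log(lazyWithGroup);    // same as above
console.log(lazyWithoutGroup); // ...?blah=blah&_=9999912345&zebra=banana
```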
