Regex to match filename at end of URL - javascript

Having this text:
http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg
I need a regex that will match /1t44.jpg. Other strings have the same shape, except that the 1 and the 44 can be any other numbers.
Everything I've tried so far (/.+?\.([^\.]+)$) matches from the first slash (//img.oo.com.au/prod/CRWWBGFWG/1t44.jpg).
I'm using JavaScript, so whatever works on RegexPal should do.

Here's a simple regex that will match the last / and everything after it:
/[^/]*$

If you want to match a filename with a very specific file extension, you can use something like this:
/\/\dt\d\d\.jpg$/
This matches:
a slash
followed by a digit
followed by the letter 't'
followed by two digits
followed by '.jpg' at the end of the string
Or, if you really just want the filename (whatever is after the last slash with any file extension), then you can use this:
/\/[^\/]+$/
This matches:
a slash
followed by one or more non-slash characters
at the end of the string
In your sample string of http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg, both of these will match /1t44.jpg. The first is obviously much more restrictive since it requires a specific format of the filename. The second matches any filename.
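A minimal sketch comparing the two patterns on the sample URL. The third regex, which puts the non-slash run in a capture group so the filename comes back without its leading slash, is an added variation for illustration:

```javascript
const url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";

// Strict: slash, digit, "t", two digits, ".jpg" at the end of the string
const strict = /\/\dt\d\d\.jpg$/;
// Loose: slash followed by one or more non-slash characters at the end
const loose = /\/[^\/]+$/;
// Same as loose, but captures the filename without its leading slash
const captured = /\/([^\/]+)$/;

console.log(url.match(strict)[0]);   // "/1t44.jpg"
console.log(url.match(loose)[0]);    // "/1t44.jpg"
console.log(url.match(captured)[1]); // "1t44.jpg"
```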
Other options: in Node.js development, you can use the path module's path.parse() to break a path up into all of its various components.
And, there are various libraries written for the browser that will break up a path into its components too.
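For a URL like the one in the question, a minimal Node.js sketch of the path.parse() route might look like this. It assumes a Node version where URL is a global (Node 10 or later), first takes the pathname from the URL, and uses path.posix so that / is treated as the separator on every platform:

```javascript
const path = require("path");

const url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
// The URL class separates the scheme and host from the pathname
const { pathname } = new URL(url);
// path.posix always splits on "/", even when Node runs on Windows
const parts = path.posix.parse(pathname);

console.log(parts.base); // 1t44.jpg
console.log(parts.name); // 1t44
console.log(parts.ext);  // .jpg
console.log(parts.dir);  // /prod/CRWWBGFWG
```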

As Johnsyweb says, a regular expression isn't really needed here. AFAIK the fastest way to do this is with lastIndexOf and slice (substr also works, but it is a legacy method):
str.slice(str.lastIndexOf('/'));
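A minimal sketch of the lastIndexOf approach as two hypothetical helpers, one keeping the slash and one dropping it by adding 1 to the index. It uses slice(), since substr() is a legacy method:

```javascript
// Hypothetical helper: everything from the last "/" onward, slash included
function lastSegmentWithSlash(str) {
  return str.slice(str.lastIndexOf("/"));
}

// Hypothetical helper: everything after the last "/"
function lastSegment(str) {
  // lastIndexOf returns -1 when there is no "/", so -1 + 1 = 0 keeps the whole string
  return str.slice(str.lastIndexOf("/") + 1);
}

const url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
console.log(lastSegmentWithSlash(url)); // /1t44.jpg
console.log(lastSegment(url));          // 1t44.jpg
```

If the string contains no /, lastIndexOf returns -1: lastSegment then returns the whole string, while lastSegmentWithSlash returns only the last character (slice(-1)), so check for -1 first if that case can occur.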

Of course you don't have to use a regular expression to split a string and pop the last part:
var str="http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var parts = str.split("/");
document.write(parts.pop() + "<br />");

Based on Scott's answer, try this (JavaScript):
var url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var path = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$1" );
var lastElement = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$2" );
This also works for Windows and *nix file paths, to extract the file name or the directory path:
c:\Program Files\test.js => c:\Program Files
c:\Program Files\test.js => \test.js
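The two replace() calls can be folded into one match() that returns both parts. A minimal sketch using a hypothetical splitLast helper and the same [\\\/] class, so both / and \ count as separators:

```javascript
// Hypothetical helper: split a path or URL at its last "/" or "\"
function splitLast(p) {
  // The greedy (.*) pushes the split point to the last separator
  const m = p.match(/^(.*)([\\\/][^\\\/]*)$/);
  // No separator at all: treat the whole input as the last element
  return m ? { dir: m[1], last: m[2] } : { dir: "", last: p };
}

const web = splitLast("http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg");
console.log(web.dir);  // http://img.oo.com.au/prod/CRWWBGFWG
console.log(web.last); // /1t44.jpg

const win = splitLast("c:\\Program Files\\test.js");
console.log(win.dir);  // c:\Program Files
console.log(win.last); // \test.js
```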

This is for Java on a Linux machine. It grabs the last part of a file path, so that it can be used for making a file lock.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Regex: greedily skip everything up to the last "/", then capture the final segment.
String pattern = ".*/([^/]+)$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create a Matcher over the path to be scanned.
Matcher match = r.matcher("/usr/local/java/bin/keystore");
/**
* Once find() succeeds, there are two groups:
* #0 /usr/local/java/bin/keystore
* #1 keystore
*/
String fileLock = "";
if (match.find()) {
fileLock = match.group(1) + ".lock";
}
A little different from the original question, I know. But I hope this helps others who were stuck with the same problem I had.

Related

How to search a string for 1st occurrence of ":/" and then search all other occurrences of the found substring inclusive ":/"?

A little explanation:
I have a string like this (from the command-line program execution kpsewhich -all etex.src):
c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc:/texlive/2019/texmf-dist/tex/plain/etex/etex.src
This string consists of 2 or more concatenated file paths, which are to be separated again.
Dynamic search pattern: c:/
The files are always on the same volume, here c, but the volume name has to be determined.
Is it possible to do something like this with a RegExp?
I could split the string according to the actual filename etex.src, but is the other approach possible?
Update:
The following RegExp
(.+?:[\/\\]+)(?:(?!\1).)*
meets my requirements even better. How can I split the string using the capture group?
My guess is that this expression would be close to what you want:
c:\/.*?(?=c:\/|$)
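Since the pattern from the question's update already captures the volume, a global match can split the string without hard-coding c:. A minimal sketch, assuming every path starts with a prefix like c:/, that the same prefix never appears inside a path, and that String.prototype.matchAll (ES2020) is available:

```javascript
const str =
  "c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.src" +
  "c:/texlive/2019/texmf-dist/tex/plain/etex/etex.src";

// (.+?:[\/\\]+) captures the volume prefix, e.g. "c:/"
// (?:(?!\1).)* then consumes characters until that same prefix appears again
const re = /(.+?:[\/\\]+)(?:(?!\1).)*/g;

// matchAll yields one match per path: m[0] is the path, m[1] its volume prefix
const paths = [...str.matchAll(re)].map((m) => m[0]);
console.log(paths);
```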
I'm not entirely sure what you want this RegExp to retrieve, but if you want an array of file paths, you can get it with the /(?<!^)[^:]+/g regex:
// in node.js
const str = 'c:/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc:/texlive/2019/texmf-dist/tex/plain/etex/etex.src'
const paths = str.match(/(?<!^)[^:]+/g)
// [
// "/texlive/2019/texmf-dist/tex/luatex/hyph-utf8/etex.srcc",
// "/texlive/2019/texmf-dist/tex/plain/etex/etex.src"
// ]
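This RegExp matches runs of characters that don't include : and that don't start at the beginning of the string (which excludes the leading c volume, or any other volume name). Note that the volume letter of each following path ends up attached to the end of the previous one (etex.srcc above), so the results still need some cleanup.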

Remove slashes from Start and end of the URL

I have a URL like:
var folderPath = 'files/New folder';
Here are the inputs I want to guard against. For example, a user might try:
../../.././../../././../files/New folder
OR
../../.././../../././../files/New folder/../../././../.././
OR
./files/New folder/
Basically, I need to extract New folder from the URL, so I need the URL cleaned up.
WHAT I HAVE TRIED:
I tried the following, but it only removes the leading '../' and './' segments from the start of the URL.
var cleaned = folderPath.replace(/^.+\.\//, '');
EXPECTED OUTPUT:
files/New folder
If someone can provide a function that cleans the URL, that would be very helpful.
How about a filter?
var oneSlash = (str) => str.split("/").filter(
word => word.match(/\w+/)
).join("/")
console.log(oneSlash(" ../../.././../../././../files/New folder"))
console.log(oneSlash("///../..///files/New folder///../"))
// this imaginary useless path ends up like the others
console.log(oneSlash("files/////New folder/"))
The idea here is to first use the regex to take the match out of the input string. That match still includes extra / characters, which you also want to remove, so the callback function strips them with a second replace on the matched group.
Using replace twice could probably still be improved; I am trying to improve it a bit more.
function replaceDots(input){
return input.replace(/^[./]+([^.]+)\/?.*/g, function(match,group){
return group.replace(/(.*?)\/*$/, "$1")
})
}
console.log(replaceDots(`../../.././../../././../files/New folder`))
console.log(replaceDots(`files/New folder`))
console.log(replaceDots(`../../.././../../././../files/New folder/../../././../.././`))
console.log(replaceDots(`///../..///files/New folder///../`))
You can use this regex to remove all unwanted text in your path:
\/?\.\.?\/|\/{2,}|\/\s*$
\/?\.\.?\/ removes all segments of the form ../, ./ or /../; \/{2,} removes every run of two or more slashes; and \/\s*$ removes a trailing slash (along with any whitespace after it).
Demo
console.log('../../.././../../././../files/New folder'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('../../.././../../././../files/New folder/../../././../.././'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('./files/New folder/'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('///../..///files/New folder///../'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
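The question asked for a function, so here is a minimal sketch that wraps the regex from this answer in one; cleanPath is a hypothetical name, and the inputs are the ones used above:

```javascript
// Hypothetical helper built on the regex from this answer:
//   \/?\.\.?\/  removes "../", "./" and "/../" segments
//   \/{2,}      removes runs of two or more slashes
//   \/\s*$      removes a trailing slash (and any whitespace after it)
function cleanPath(p) {
  return p.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g, "");
}

const inputs = [
  "../../.././../../././../files/New folder",
  "../../.././../../././../files/New folder/../../././../.././",
  "./files/New folder/",
  "///../..///files/New folder///../",
];

// Each of these prints "files/New folder"
inputs.forEach((p) => console.log(cleanPath(p)));
```

Because \/{2,} deletes a run of slashes outright instead of collapsing it to a single /, an input such as files//New folder comes out as filesNew folder. If that kind of input is possible, running replace(/\/{2,}/g, "/") first is one way to handle it.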
To remove every run of two or more dots and slashes that ends in a /, plus a trailing /:
var folderPaths = [
"../../.././../../././../files/New folder",
"../../.././../../././../files/New folder/../../././../.././",
"./files/New folder/"
];
var re = new RegExp('(?:[./]+)/|/$', 'g');
folderPaths.forEach(e => console.log(e.replace(re, "")));

How can I get a specific part of a URL using RegEx?

I am trying to extract part of a file download URL using RegEx (or other methods). Below is the link I am trying to parse; the part I want to select is the version number, 1.7.0.13.
https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip
I have looked around and thought about trying Named Capture Groups, but I couldn't figure them out. I would like to be able to do this in JavaScript/Node.js, even if it requires a module 👻.
You can use Node.js's built-in url and path modules to isolate the filename, and then finish with a simple regexp.
const { URL } = require('url')
const path = require('path')
const test = new URL(
'https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip'
)
/*
test.pathname = '/bin-linux/bedrock-server-1.7.0.13.zip'
path.parse(test.pathname) = { root: '/',
dir: '/bin-linux',
base: 'bedrock-server-1.7.0.13.zip',
ext: '.zip',
name: 'bedrock-server-1.7.0.13' }
match = [ '1.7.0.13', index: 15, input: 'bedrock-server-1.7.0.13' ]
*/
const match = path.parse(test.pathname)
.name
.match(/[0-9.]*$/)
You could use the following regex:
[\d.]+(?=\.\w+$)
This matches dots and digits that are immediately followed by a file extension at the end of the string. You could also make it more accurate:
\d+(?:\.\d+)*(?=\.\w+$)
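A quick sketch running both lookahead patterns against the URL from the question:

```javascript
const url = "https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip";

// Any run of digits and dots immediately followed by ".ext" at the end
const loose = /[\d.]+(?=\.\w+$)/;
// Stricter: digits, optionally followed by ".digits" groups
const strict = /\d+(?:\.\d+)*(?=\.\w+$)/;

console.log(url.match(loose)[0]);  // 1.7.0.13
console.log(url.match(strict)[0]); // 1.7.0.13
```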
I'd stick with this:
-(\d+(?:\.\d+)*)(?:\.\w+)$
It matches a dash before any numbers
The parentheses create a capture group
Then, \d+ matches one or more digits
?: makes a group without capturing it
Inside this group, \.\d+ matches a dot followed by one or more digits
That group can repeat zero or more times, thanks to *
After that, (?:\.\w+)$ is a non-capturing group that matches the extension at the end of the string
So, basically, this format would allow you to capture all the numbers that are after the dash and before the extension, be it 1, 1.7, 1.7.0, 1.7.0.13, 1.7.0.13.5 etc. On the match array, at index [0] you will have the entire regex match, and on [1] you will have your captured group, the number you're looking for.
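The question mentions named capture groups; JavaScript has supported them since ES2018, so the same pattern can expose the version by name. A minimal sketch, where the group name version is an arbitrary choice:

```javascript
const url = "https://minecraft.azureedge.net/bin-linux/bedrock-server-1.7.0.13.zip";

// Same structure as above, with the capture group named "version"
const re = /-(?<version>\d+(?:\.\d+)*)(?:\.\w+)$/;
const match = url.match(re);

console.log(match[0]);             // -1.7.0.13.zip  (entire match)
console.log(match[1]);             // 1.7.0.13       (group by index)
console.log(match.groups.version); // 1.7.0.13       (group by name)
```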
Perhaps a regular expression like this is what you need?
var url = 'https://minecraft.azureedge.net/bin-linux9.9.9/bedrock-server-1.7.0.13.zip'
var match = url.match(/(\d+[.\d+]*)(?=\.\w+$)/gi)
console.log( match )
This pattern, /(\d+[.\d+]*)(?=\.\w+$)/gi, works by asking for a substring that:
first contains one or more digit characters, i.e. \d+
is then followed by any number of digits and dots, i.e. [.\d+]* (inside the brackets + is a literal plus sign, so the class would also accept +)
and finally, via (?=\.\w+$), is immediately followed by a file extension like .zip at the end of the string
For more information on special characters like + and *, see this documentation. Hope that helps!

Split string of folder path

If I have a file path such as:
var/www/parent/folder
How would I go about removing the last folder to return:
var/www/parent
The folders could have any names, I'm quite happy using regex.
Thanks in advance.
Use split, slice and join:
"var/www/parent/folder".split( '/' ).slice( 0, -1 ).join( '/' );
Use the following regular expression to match the last directory part, and replace it with an empty string.
/\/[^\/]+$/
'var/www/parent/folder'.replace(/\/[^\/]+$/, '')
// => "var/www/parent"
UPDATE
If the path ends with /, the above expression will not match it. To remove the last part of such a path, use the following pattern instead (it allows an optional trailing /):
'var/www/parent/folder/'.replace(/\/[^\/]+\/?$/, '')
// => "var/www/parent"
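A short sketch comparing the split/slice/join approach with the regex that tolerates a trailing slash; parentOf and viaSplit are hypothetical names. With a trailing slash, split produces an empty last element, so slice(0, -1) drops only that empty string:

```javascript
// Hypothetical helper based on the regex answer: drop the last segment,
// tolerating one optional trailing slash
function parentOf(p) {
  return p.replace(/\/[^\/]+\/?$/, "");
}

// Hypothetical helper based on the split/slice/join answer
const viaSplit = (p) => p.split("/").slice(0, -1).join("/");

console.log(parentOf("var/www/parent/folder"));  // var/www/parent
console.log(parentOf("var/www/parent/folder/")); // var/www/parent
console.log(viaSplit("var/www/parent/folder"));  // var/www/parent
console.log(viaSplit("var/www/parent/folder/")); // var/www/parent/folder
```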
There is no specialized version of split per se, but you can split on path.sep like so:
import path from 'path';
filePath.split(path.sep);
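Note that path.sep is \ on Windows and / on POSIX systems, so splitting on it only works for paths written in the host platform's format. For the original goal of dropping the last folder, the path module's dirname() is another option; a minimal sketch using the POSIX variant so / is the separator on every platform:

```javascript
const path = require("path");

// path.posix.sep is always "/", whatever OS Node is running on
const segments = "var/www/parent/folder".split(path.posix.sep);
console.log(segments); // the four segments: var, www, parent, folder

// dirname() drops the last segment and ignores a trailing slash
console.log(path.posix.dirname("var/www/parent/folder"));  // var/www/parent
console.log(path.posix.dirname("var/www/parent/folder/")); // var/www/parent
```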
If it's always the last folder you want to get rid of, the easiest method would be to use substring() and lastIndexOf():
var parentFolder = folder.substring(0, folder.lastIndexOf('/'));

JavaScript Regex to match a URL in a field of text

How can I set up my regex to test whether a URL is contained in a block of text in JavaScript? I can't quite figure out the pattern to use to accomplish this.
var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?" );
var txtfield = $('#msg').val() /*this is a textarea*/
if ( urlpattern.test(txtfield) ){
//do something about it
}
EDIT:
So the pattern I have now works in regex testers for what I need it to do, but Chrome throws an error:
"Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,#?^=%&:/~+#]*[w-#?^=%&/~+#])?/: Range out of order in character class"
for the following code:
var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?' );
Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, another way to take away their special meaning is to put them at the beginning or the end of the class definition.
In addition, \+ and \# in a character class are indeed interpreted as + and # respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
I would recommend the following regex for your purposes:
(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
This can be specified in JavaScript either by passing it into the RegExp constructor (like you did in your example), in which case every backslash has to be doubled inside the string literal:
var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&/~+#-])?")
or by directly specifying a regex literal, using the // quoting method:
var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/
The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and it saves you from escaping forward slashes, at the cost of doubling every backslash. I am fairly certain that the // quoting method is more efficient, and it is often more readable. Both work.
I tested your original and this modification using Chrome, both on JSFiddle and on RegexLib.com, using the client-side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!
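A minimal sketch showing that the literal and constructor forms agree once the backslashes in the string are doubled, and what happens when they are not. The sample text is made up for illustration:

```javascript
const text = "Docs at https://example.com/path?q=1 and ftp://files.example.org.";

// Regex literal: forward slashes are escaped, backslashes are written once
const literal = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/g;

// RegExp constructor: every regex backslash is doubled inside the string
const fromString = new RegExp(
  "(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&/~+#-])?",
  "g"
);

// Single backslashes are swallowed by the string literal: "\w" becomes "w"
const broken = new RegExp("(http|ftp|https)://[\w-]+(\.[\w-]+)+");

console.log(text.match(literal));    // the two URLs: https://example.com/path?q=1 and ftp://files.example.org
console.log(text.match(fromString)); // the same two URLs
console.log(broken.test(text));      // false: "[w-]+" only matches "w" and "-"
```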
Edit
As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:
(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
#------changed----here-------------^
<End Edit>
Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!
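To see the effect of the one-character change from + to * in the edit, a small sketch testing both variants against a single-word host (http://localhost/test is just a sample):

```javascript
// Original: (\.[\w-]+)+ requires at least one dot in the host
const dotted = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/;
// Edited: (\.[\w-]+)* also allows hosts with no dot, such as localhost
const anyHost = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/;

const text = "see http://localhost/test now";
console.log(dotted.test(text));      // false
console.log(text.match(anyHost)[0]); // http://localhost/test
```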
Complete Multi URL Pattern.
UPDATED: Nov. 2020, April & June 2021 (Thanks commenters)
Matches all URIs and URLs in a string!
It also extracts the protocol, domain, path, query and hash:
([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)
https://regex101.com/r/jO8bC4/56
Example JS code with output - every URL is turned into a 5-part array of its 'parts' (protocol, host, path, query, and hash)
var re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/mig;
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
console.log(m);
}
Will give you the following:
["https://www.facebook.com",
"https://",
"www.facebook.com",
"",
"",
""
]
["https://github.com/justsml?tab=activity#top",
"https://",
"github.com",
"/justsml",
"tab=activity",
"top"
]
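On engines with String.prototype.matchAll (ES2020), the exec loop, including the lastIndex guard, can be replaced by iterating the matches directly. A minimal sketch with the same regex and string, naming the five parts; the object keys are arbitrary:

```javascript
const re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/gim;
const str = "Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)";

// matchAll requires the g flag and steps past empty matches itself,
// so no manual lastIndex bookkeeping is needed
const urls = [...str.matchAll(re)].map(([full, protocol, host, path, query, hash]) => ({
  full,
  protocol,
  host,
  path,
  query,
  hash,
}));

console.log(urls);
```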
You have to escape the backslash when you are using new RegExp.
Also you can put the dash - at the end of character class to avoid escaping it.
An HTML entity such as &amp; inside a character class means & or a or m or p or ;. You only need & and ;, since a, m and p are already matched by \w.
So, your regex becomes:
var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&;/~+#-])?' );
Try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
I've cleaned up your regex:
var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+([a-z0-9\\-\\.,#\\?^=%&;:/~\\+#]*[a-z0-9\\-#\\?^=%&;/~\\+#])?', 'i');
Tested and works just fine ;)
Try this general regex for many URL formats:
/(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+#{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%#\.\w]+)?(#[\w]+)?)?/g
The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes using a backslash, and because the pattern is passed to new RegExp as a string, that backslash must itself be doubled (\\-).
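A small sketch reproducing the error (the exact message text varies by engine). The [\w\-_] fragment comes from the question; the compiles helper is hypothetical:

```javascript
// Hypothetical helper: true if the string is a valid regex source
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// "\w\-_" inside a string literal becomes "w-_": a range from "w" down to "_"
console.log(compiles("[\w\-_]+"));   // false
// Doubled backslashes survive the string literal, so the regex sees "\w\-_"
console.log(compiles("[\\w\\-_]+")); // true
// Moving the dash to the end of the class also avoids the range
console.log(compiles("[\\w_-]+"));   // true
```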
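Try this, it worked for me:
/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/
It is simple and easy to understand. Note that it is anchored with ^ and $ and requires www., so it checks whether a whole string is a URL rather than finding a URL inside a block of text.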
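A quick sketch checking the pattern against a few made-up inputs, two that it accepts and two that it rejects:

```javascript
const re = /^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/;

console.log(re.test("http://www.example.com"));       // true
console.log(re.test("www.example.co.uk"));            // true
console.log(re.test("http://example.com"));           // false: no "www."
console.log(re.test("visit http://www.example.com")); // false: anchored with ^ and $
```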
