Regular expression to remove a file's extension - javascript

I am in need of a regular expression that can remove the extension of a filename, returning only the name of the file.
Here are some examples of inputs and outputs:
myfile.png -> myfile
myfile.png.jpg -> myfile.png
I can obviously do this manually (ie removing everything from the last dot) but I'm sure that there is a regular expression that can do this by itself.
Just for the record, I am doing this in JavaScript

Just for completeness: How could this be achieved without Regular Expressions?
var input = 'myfile.png';
var output = input.substr(0, input.lastIndexOf('.')) || input;
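The || input handles the case where lastIndexOf() returns -1 (no period in the name): substr(0, -1) then yields an empty string, so the expression falls back to input. As you can see, it's still a one-liner.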
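The same idea can be wrapped in a small helper. This is only a sketch: the function name stripExtension is made up for illustration, and it uses slice() in place of substr() with an explicit check instead of the || fallback.

```javascript
// Hypothetical helper: returns the file name without its last extension.
function stripExtension(filename) {
  const lastDot = filename.lastIndexOf('.');
  // lastDot is -1 when there is no period and 0 for names like ".htaccess";
  // in both cases the name is returned unchanged.
  return lastDot <= 0 ? filename : filename.slice(0, lastDot);
}

console.log(stripExtension('myfile.png'));     // myfile
console.log(stripExtension('myfile.png.jpg')); // myfile.png
```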

/(.*)\.[^.]+$/
The result will be in the first capture group. However, it's probably more efficient to just find the position of the rightmost period and take everything before it, without using a regex.
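A minimal sketch of reading that capture group, assuming a fallback to the original name when nothing matches (the helper name baseName is hypothetical):

```javascript
// Pattern from the answer above; group 1 holds everything before the last period.
const EXT_PATTERN = /(.*)\.[^.]+$/;

// Hypothetical helper: returns group 1, or the input itself if there is no match.
function baseName(filename) {
  const m = filename.match(EXT_PATTERN);
  return m ? m[1] : filename;
}

console.log(baseName('myfile.png.jpg')); // myfile.png
```

Note that a name such as ".htaccess" does match this pattern, so the helper returns an empty string for it.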

The regular expression to match the pattern is:
/\.[^.]*$/
It finds a period character (\.), followed by 0 or more characters that are not periods ([^.]*), followed by the end of the string ($).
console.log(
"aaa.bbb.ccc".replace(/\.[^.]*$/,'')
)

/^(.+)(\.[^ .]+)?$/
Test cases where this works and others fail:
".htaccess" (leading period)
"file" (no file extension)
"send to mrs." (no extension, but ends in abbr.)
"version 1.2 of project" (no extension, yet still contains a period)
The common thread above is, of course, "malformed" file extensions. But you always have to think about those corner cases. :P
Test cases where this fails:
"version 1.2" (no file extension, but "appears" to have one)
"name.tar.gz" (if you view this as a "compound extension" and wanted it split into "name" and ".tar.gz")
How to handle these is problematic and best decided on a project-specific basis.

/^(.+)(\.[^ .]+)?$/
The above pattern is wrong: the first group will always include the extension too. This follows from how the JavaScript regex engine works. The (\.[^ .]+) token is optional and (.+) is greedy, so the engine successfully matches the entire string with (.+) alone.
http://cl.ly/image/3G1I3h3M2Q0M
Here's my tested regexp solution.
The pattern will match filenameNoExt with/without extension in the path, respecting both slash and backslash separators
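var path = "c:\\some.path/subfolder/file.ext" // the backslash must be escaped inside a string literal
var m = path.match(/([^:\\/]*?)(?:\.([^ :\\/.]*))?$/)
var fileName = (m === null) ? "" : m[1] // group 1: file name without extension
var fileExt = (m === null) ? "" : (m[2] || "") // group 2: extension, undefined when there is none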
dissection of the above pattern:
([^:\\/]*?) // match any character, except slashes and colon, 0-or-more times,
// make the token non-greedy so that the regex engine
// will try to match the next token (the file extension)
// capture the file name token to subpattern \1
(?:\. // match the '.' but don't capture it
([^ :\\/.]*) // match file extension
// ensure that the last element of the path is matched by prohibiting slashes
// capture the file extension token to subpattern \2
)?$ // the whole file extension is optional
http://cl.ly/image/3t3N413g3K09
http://www.gethifi.com/tools/regex
This covers all the cases mentioned by #RogerPate, and handles full paths too.
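To try the pattern on a few inputs, it can be wrapped as below. This is a sketch only: the helper name splitFileName and the returned object shape are assumptions made for illustration.

```javascript
// Pattern from the answer above: group 1 is the name, group 2 the extension.
const PATH_PATTERN = /([^:\\/]*?)(?:\.([^ :\\/.]*))?$/;

// Hypothetical helper returning both parts as strings.
function splitFileName(path) {
  const m = path.match(PATH_PATTERN);
  return {
    name: m ? m[1] : '',
    ext: m && m[2] !== undefined ? m[2] : '', // group 2 is undefined without an extension
  };
}

console.log(splitFileName('c:\\some.path/subfolder/file.ext')); // { name: 'file', ext: 'ext' }
console.log(splitFileName('name.tar.gz'));                      // { name: 'name.tar', ext: 'gz' }
```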

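Another no-regex way of doing it (the "opposite" of #Rahul's version, not using pop() to remove the last segment).
It doesn't need to refer to the variable twice, so it's easier to inline. Note that join() needs '.' as its separator (the default is a comma), and that a name without any period yields an empty string:
filename.split('.').slice(0,-1).join('.')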

This will do it as well :)
'myfile.png.jpg'.split('.').reverse().slice(1).reverse().join('.');
I'd stick to the regexp though... =P

return filename.split('.').pop();
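This avoids regular expressions. Note, however, that pop() returns the last dot-separated segment, so as written this yields the extension rather than the file name.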

In JavaScript you can call the replace() method, which replaces based on a regular expression.
This regular expression matches from the beginning of the line to the end; its capture group holds everything before the last period, so replacing the match with that group removes the last period and everything after it.
/^(.*)\..*$/
The how of implementing the replace can be found in this Stackoverflow question.
Javascript regex question
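As a rough sketch of that replace call (the helper name removeExtension is invented here), the capture group is referenced as $1 in the replacement string:

```javascript
// Hypothetical helper: replaces the whole match with capture group 1,
// i.e. everything before the last period.
function removeExtension(filename) {
  return filename.replace(/^(.*)\..*$/, '$1');
}

console.log(removeExtension('myfile.png.jpg')); // myfile.png
```

A name without a period does not match the pattern, so replace() returns it unchanged.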

Related

RegExp works in JS and PHP but not in Java

I have a regexp to extract an id and a label out of an HTML source code. It can be found HERE.
As you can see, it works fine and it's fast, but when I try this regexp in Java with the same source code it (1) takes forever and (2) only matches one string (everything from the first a to the last a is one match).
I tried it with the Multiline flag on and off but no difference. I don't understand how a regexp can work everywhere but in java. Any ideas?
private static final String COURSE_REGEX = "<a class=\"list-group-item list-group-item-action \" href=\"https:\\/\\/moodle-hs-ulm\\.de\\/course\\/view\\.php\\?id=([0-9]*)\"(?:.*\\s){7}<span class=\"media-body \">([^<]*)<\\/span>";
Pattern pattern = Pattern.compile(COURSE_REGEX, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(sourceCode);
List<String> courses = new ArrayList<>();
while(matcher.find() && matcher.groupCount() == 2){
courses.add(matcher.group(1) + "(" + matcher.group(2) + ")");
}
Your regex is running into catastrophic backtracking because of the gargantuan number of possible permutations the subexpression (?:.*\s){7} needs to check (because the . can also match spaces). Java aborts the match attempt after a certain number of steps (not sure how many, certainly > 1.000.000). PHP or JS may not be so cautious.
If you simplify that part of your regex to .*?, you do get the matches:
"(?s)<a class=\"list-group-item list-group-item-action \" href=\"https://moodle-hs-ulm\\.de/course/view\\.php\\?id=([0-9]*)\".*?<span class=\"media-body \">([^<]*)</span>"
Note that you need the DOTALL flag ((?s), so . may match a newline) instead of the MULTILINE flag which changes the behavior of ^ and $ anchors (none of which your regex is using).
Also note that you don't need to escape slashes in a Java regex.
This solution is not very robust because .*? is rather unspecific. I suppose your previous attempt of (?:.*\\s){7} may have been designed to match no more than 7 lines of text? In that case, you could use (?:(?!</a>).)* instead to ensure that you don't cross over into the next <a> tag. That's one of the dangers of parsing HTML with regex :)
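The guarded pattern can be tried out quickly. The sketch below is in JavaScript rather than Java, and it runs against made-up markup: the example.test URL, the class names and the course labels are all invented for illustration. [\s\S] stands in for a dot with DOTALL enabled.

```javascript
// Invented sample markup with two course links.
const html = [
  '<a class="item" href="https://example.test/course/view.php?id=12">',
  '  <div>',
  '    <span class="media-body ">Algebra</span>',
  '  </div>',
  '</a>',
  '<a class="item" href="https://example.test/course/view.php?id=34">',
  '  <span class="media-body ">Databases</span>',
  '</a>',
].join('\n');

// (?:(?!<\/a>)[\s\S])*? consumes any character, newlines included,
// but refuses to step over a closing </a>.
const courseRe = /<a class="item" href="https:\/\/example\.test\/course\/view\.php\?id=([0-9]+)"(?:(?!<\/a>)[\s\S])*?<span class="media-body ">([^<]*)<\/span>/g;

const courses = [];
for (const m of html.matchAll(courseRe)) {
  courses.push(m[1] + '(' + m[2] + ')');
}
console.log(courses); // [ '12(Algebra)', '34(Databases)' ]
```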
Finally, greetings from a staff member of the faculty of Informatics at your university :)

How do I match URLs with regular expressions?

We want to check if a URL matches mail.google.com or mail.yahoo.com (also a subdomain of them is accepted) but not a URL which contains this string after a question mark. We also want the strings "mail.google.com" and "mail.yahoo.com" to come before the third slash of the URL, for example https://mail.google.com/ is accepted, https://www.facebook.com/mail.google.com/ is not accepted, and https://www.facebook.com/?mail=https://mail.google.com/ is also not accepted. https://mail.google.com.au/ is also not accepted. Is it possible to do it with regular expressions?
var possibleURLs = /^[^\?]*(mail\.google\.com|mail\.yahoo\.com)\//gi;
var url;
// assign a value to var url.
if (url.match(possibleURLs) !== null) {
// Do something...
}
Currently this will match both https://mail.google.com/ and https://www.facebook.com/mail.google.com/ , but we don't want to match https://www.facebook.com/mail.google.com/.
Edit: I want to match any protocol (any string which doesn't contain "?" and "/") followed by a slash "/" twice (the string and the slash can both be twice), then any string which doesn't contain "?" and "/" (if it's not empty, it must end with a dot "."), and then (mail\.google\.com|mail\.yahoo\.com)\/. Case insensitive.
Not being funny - but why must it be a regular expression?
Is there a reason why you couldn't simplify the process using URL (or webkitURL in Chrome and Safari)? The URL constructor simply takes a string and then exposes properties for each part of the URL. Whether it supports all the host types that you want to support, I don't know.
Granted, you might still need a regex after that (although really you'd just be checking that the hostname ends with either yahoo.com or google.com), but you would just be running it against the hostname of the URL object rather than the whole URI.
The API is not ubiquitous, but seems reasonably well supported and, anyway, if this is client-side validation then I hope you're checking it on the server, too, because sidestepping javascript validation is easy.
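A minimal sketch of that approach, assuming an environment where the URL constructor is available; the names ALLOWED_HOSTS and isMailUrl are hypothetical. It compares the hostname only, so paths and query strings cannot produce a false match.

```javascript
// Hosts accepted by the question; subdomains of them are accepted too.
const ALLOWED_HOSTS = ['mail.google.com', 'mail.yahoo.com'];

// Hypothetical helper: true when the URL's hostname is an allowed host
// or a subdomain of one.
function isMailUrl(urlString) {
  let host;
  try {
    host = new URL(urlString).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return ALLOWED_HOSTS.some(function (allowed) {
    return host === allowed || host.endsWith('.' + allowed);
  });
}

console.log(isMailUrl('https://mail.google.com/'));                  // true
console.log(isMailUrl('https://www.facebook.com/mail.google.com/')); // false
```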
How about
^[a-z]+:\/\/([^.\/]+\.)*mail\.(google|yahoo)\.com\/
Regex Example Link
^ Anchors the regex at the start of the string
[a-z]+ Matches the protocol. If you want a specific set of protocols, then (https?|ftp) may do the work
([^.\/]+\.)* matches the subdomain part
^([-a-z]+://|^cid:|^//)([^/\?]+\.)?mail\.(google|yahoo)\.com/
Should do the trick
The first ^ means "match beginning of line", the second negates the allowed characters, thus making a slash / not allowed.
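Nb. In a regex literal you still have to escape the slashes. Alternatively, pass the pattern as a string to new RegExp(string), doubling every backslash so that it survives the string literal:
new RegExp('^([-a-z]+://|^cid:|^//)([^/?]+\\.)?mail\\.(google|yahoo)\\.com/')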
OK, I found that it works with:
var possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//gi;
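As a quick check, the pattern can be run against the URLs from the question. In this sketch the g flag is dropped, because a global regex keeps state in lastIndex between test() calls.

```javascript
// Same pattern as above, without the g flag so that test() is stateless.
const possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//i;

const samples = [
  'https://mail.google.com/',
  'https://www.facebook.com/mail.google.com/',
  'https://www.facebook.com/?mail=https://mail.google.com/',
  'https://mail.google.com.au/',
];

// Only the first sample is expected to be accepted.
for (const url of samples) {
  console.log(url, possibleURLs.test(url));
}
```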

Why doesn't this jQuery code work?

The URL is: http://example.com/download/
var pathname = window.location.pathname;
if(pathname=='download/'){
$("#subnav-content div:first").hide();
$("#subnav-content div:second").show();
}
Why doesn't the above jQuery code work? When the URL is http://example.com/download/, I want to show the second div.
PS: Does this check affect the site performance?
You need the leading slash.
'/download/'
If you expect query string parameters you may try a regular expression to just match the download portion of the url: the following matches /download/.
if (window.location.pathname.match(/^\/download\//i))
Regarding the jquery, there is no :second, you need to use :eq(1)
var pathname = window.location.pathname;
if(pathname=='/download/'){
$("#subnav-content div:first").hide();
$("#subnav-content div:eq(1)").show();
}
Response to comments
I'm putting my comment here because the formatting is horrible in the comments. The regular expression for matching download can be summed up as follows:
/ - start of regular expression matching syntax
^ - means start matching at the very start of the string
\/ - means match the literal string '/', which is a special character which must be escaped
download - match the literal string 'download'
\/ - again means match the literal string '/'
/ - end of the matching syntax
i - regular expression options, i means ignore case
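The pieces listed above can be exercised without a browser by passing the pathname in as a plain string. This is a sketch; the function name isDownloadPath is invented for illustration.

```javascript
// Hypothetical helper: true when the path starts with /download/ (any letter case).
function isDownloadPath(pathname) {
  return /^\/download\//i.test(pathname);
}

console.log(isDownloadPath('/download/')); // true
console.log(isDownloadPath('download/'));  // false, the leading slash is missing
```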
It was not clear to me what your other note was asking for.
Second is not a selector. You want:
$("#subnav-content div:nth-child(2)").show();
Try using
$("#subnav-content div:eq(0)")
$("#subnav-content div:eq(1)")
Also, you need to bind the code to an Event that will get fired when the Document is ready(load, or onDOMReady where supported) otherwise the divs might not exist in memory yet.
PS: Does this check affect the site performance?
Every line of code has an effect on site performance, though not necessarily a visible one.

Javascript regex to find a base URL

I'm going mad with this regex in JS:
var patt1=/^http(s)?:\/\/[a-z0-9-]+(.[a-z0-9-]+)*?(:[0-9]+)?(\/)?$/i;
If I give it an input string like "http://www.eitb.com/servicios/concursos/516522/", this regex is supposed to return NULL, because there is a "folder" after the base URL. It works in PHP, but not in JavaScript, as in this script:
<script type="text/javascript">
var str="http://www.eitb.com/servicios/concursos/516522/";
var patt1=/^http(s)?:\/\/[a-z0-9-]+(.[a-z0-9-]+)*?(:[0-9]+)?(\/)?$/i;
document.write(str.match(patt1));
</script>
It returns
http://www.eitb.com/servicios/concursos/516522/,,/516522,,/
The question is: why is it not working, and how can I make it work?
The idea is to implement this regex in another function to get NULL when the URL passed is not in the correct format:
http://www.eitb.com/ -> Correct
http://www.eitb.com/something -> Incorrect
Thanks
I'm no JavaScript pro, but I'm accustomed to Perl regexps, so I'll give it a try: the . in the middle of the regexp needs to be escaped, because unescaped it can match a / and jinx the whole regexp.
Try this way:
var patt1=/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*?(:[0-9]+)?(\/)?$/i;
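The effect of escaping the dot can be seen by running both versions against the URLs from the question. The variable names below are chosen only for this sketch.

```javascript
// Original pattern: the unescaped . also matches "/", so folders slip through.
const unescaped = /^http(s)?:\/\/[a-z0-9-]+(.[a-z0-9-]+)*?(:[0-9]+)?(\/)?$/i;
// Corrected pattern: \. only matches a literal period.
const escaped = /^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*?(:[0-9]+)?(\/)?$/i;

const withFolder = 'http://www.eitb.com/servicios/concursos/516522/';

console.log(unescaped.test(withFolder));           // true (the unwanted match)
console.log(escaped.test(withFolder));             // false
console.log(escaped.test('http://www.eitb.com/')); // true
```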
Considering you have a properly formatted URL this simple RegExp should do the trick every time.
var patt1=/^https?:\/\/[^\/]+/i;
Here's the breakdown...
Starting with the first position (denoted by ^)
Look for http
http can be followed by s (denoted by the ? which means 0 or 1 of the character or set before it)
Then look for :// after the http or https (denoted by :\/\/)
Next match any number of characters except for / (denoted by [^\/]+ - the + means 1 or more)
Case insensitive (denoted by i)
NOTE: this will also pick up ports http://example.com:80 - to get rid of the :80 (or a colon followed by any port number) simply add a : to the negated character class [^\/:] for example.
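A small sketch of both variants, with and without the port. The helper name baseUrl and its keepPort parameter are assumptions made for illustration; note that this extracts the base URL rather than rejecting URLs that have a path.

```javascript
// Hypothetical helper: returns the scheme and host of a URL, or null if the
// string does not start with http:// or https://.
function baseUrl(url, keepPort) {
  // Adding ":" to the negated class stops the match before any port number.
  const pattern = keepPort === false ? /^https?:\/\/[^\/:]+/i : /^https?:\/\/[^\/]+/i;
  const m = url.match(pattern);
  return m ? m[0] : null;
}

console.log(baseUrl('http://www.eitb.com/servicios/concursos/516522/')); // http://www.eitb.com
console.log(baseUrl('http://example.com:80/path', false));               // http://example.com
```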

JS/Jquery, Match not finding the PNG = match('/gif|jpg|jpeg|png/')

I have the following code which I use to match fancybox possible elements:
$('a.grouped_elements').each(function(){
var elem = $(this);
// Convert everything to lower case to match smart
if(elem.attr('href').toLowerCase().match('/gif|jpg|jpeg|png/') != null) {
elem.fancybox();
}
});
It works great with JPGs but it isn't matching PNGs for some reason. Anyone see a bug with the code?
Thanks
A couple of things.
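match() expects a RegExp object. If you pass a string, it is converted with new RegExp(string), so the slashes inside the quotes become literal characters of the pattern rather than delimiters. In the question's pattern, jpg therefore still matches, while png would need a trailing slash.
"gif".match('/gif|png|jpg/'); // null
Without the quotes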
"gif".match(/gif|png|jpg/); // ["gif"]
Also, you would want to check these at the end of a filename, instead of anywhere in the string.
"isthisagif.nope".match(/(gif|png|jpg|jpeg)/); // ["gif", "gif"]
Only searching at the end of string with $ suffix
"isthisagif.nope".match(/(gif|png|jpg|jpeg)$/); // null
No need to make href lowercase, just do a case insensitive search /i.
Look for a dot before the image extension as an additional check.
And some tests. With a string argument, only links containing jpg, jpeg, /gif or png/ would produce a match. What browser are you on?
I guess the fact that it'll match anywhere in the string (it would match "http://www.giftshop.com/" for instance) could be considered a bug. I'd use
/\.(gif|jpe?g|png)$/i
You are passing a string to the match() function rather than a regular expression. In JavaScript, strings are delimited with single quotes, and regular expressions are delimited with forward slashes. If you use both, you have a string, not a regex.
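The difference can be reproduced directly. In the sketch below (variable names are illustrative), the string is turned into a pattern whose alternatives are "/gif", "jpg", "jpeg" and "png/", which explains why JPG links matched and PNG links did not.

```javascript
const asString = '/gif|jpg|jpeg|png/'; // slashes become part of the pattern
const asRegex = /gif|jpg|jpeg|png/;    // slashes are only delimiters

const jpgWithString = 'photo.jpg'.match(asString); // matches the bare "jpg" alternative
const pngWithString = 'photo.png'.match(asString); // null: "png/" is not in the string
const pngWithRegex = 'photo.png'.match(asRegex);   // matches "png"

console.log(jpgWithString !== null, pngWithString !== null, pngWithRegex !== null);
// true false true
```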
This worked perfectly for me: /.+\.(gif|png|jpe?g)$/i
.+ -> any string
\. -> followed by a point.
(gif|png|jpe?g) -> and then followed by any of these extensions. jpeg may or may not have the letter e.
$ -> then the end of the string is expected
/i -> case insensitive mode: matches both sflkj.JPG and lkjfsl.jpg
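Put together, the check could look like the sketch below. The helper name isImageLink is hypothetical; in the question's code it would take the place of the match() call on the href.

```javascript
// Hypothetical helper: true when the string ends in .gif, .png, .jpg or .jpeg,
// in any letter case, with at least one character before the period.
function isImageLink(href) {
  return /.+\.(gif|png|jpe?g)$/i.test(href);
}

console.log(isImageLink('sflkj.JPG'));                // true
console.log(isImageLink('http://www.giftshop.com/')); // false
```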
