Remove slashes from Start and end of the URL - javascript

I have a path like this:
var folderPath = 'files/New folder';
Here are the inputs I want to guard against. For example, a user might try:
../../.././../../././../files/New folder
OR
../../.././../../././../files/New folder/../../././../.././
OR
./files/New folder/
Basically, I need to extract the New folder part from the URL, so I need the URL cleaned up.
WHAT I HAVE TRIED:
I tried the following, but it only removes the repeated '../' and './' from the start of the URL.
var cleaned = folderPath.replace(/^.+\.\//, '');
EXPECTED OUTPUT:
files/New folder
A function that cleans the URL like this would be very helpful.

How about a filter?
var oneSlash = (str) => str.split("/").filter(
word => word.match(/\w+/)
).join("/")
console.log(oneSlash(" ../../.././../../././../files/New folder"))
console.log(oneSlash("///../..///files/New folder///../"))
// this imaginary useless path ends up like the others
console.log(oneSlash("files/////New folder/"))
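A variation on the same split/filter idea, as a minimal sketch: rather than keeping only segments that contain a word character, it drops exactly the empty, "." and ".." segments, so folder names made only of characters that \w does not cover (a name like "---", for instance) survive. The name cleanPath is hypothetical, and the sketch assumes the input has no stray whitespace around the dots.

```javascript
// Hypothetical helper: keep every path segment except "", "." and "..".
// Note that ".." is simply discarded here, not resolved against a parent.
function cleanPath(input) {
  return input
    .split("/")
    .filter(segment => segment !== "" && segment !== "." && segment !== "..")
    .join("/");
}

console.log(cleanPath("../../.././../../././../files/New folder"));
console.log(cleanPath("../../.././../../././../files/New folder/../../././../.././"));
console.log(cleanPath("./files/New folder/"));
```

All three calls should print files/New folder, which is the output the question asks for.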

The idea is to first use a regex to pull the wanted part out of the input string. That match can still include extra trailing slashes, which you also want removed, so the callback runs a second replace on the captured group to strip them.
Using replace twice can probably still be improved; I am trying to tidy it up further.
function replaceDots(input){
return input.replace(/^[./]+([^.]+)\/?.*/g, function(match,group){
return group.replace(/(.*?)\/*$/, "$1")
})
}
console.log(replaceDots(`../../.././../../././../files/New folder`))
console.log(replaceDots(`files/New folder`))
console.log(replaceDots(`../../.././../../././../files/New folder/../../././../.././`))
console.log(replaceDots(`///../..///files/New folder///../`))

You can use this regex to remove all unwanted text in your path,
\/?\.\.?\/|\/{2,}|\/\s*$
\/?\.\.?\/ removes every ../, ./, /../ or /./ sequence, \/{2,} removes every run of two or more /, and \/\s*$ removes a trailing slash (plus any whitespace after it) at the end of the path.
Demo
console.log('../../.././../../././../files/New folder'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('../../.././../../././../files/New folder/../../././../.././'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('./files/New folder/'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));
console.log('///../..///files/New folder///../'.replace(/\/?\.\.?\/|\/{2,}|\/\s*$/g,''));

To remove every run of . and / characters that ends in /, plus a trailing /:
var folderPaths = [
"../../.././../../././../files/New folder",
"../../.././../../././../files/New folder/../../././../.././",
"./files/New folder/"
];
var re = new RegExp('(?:[./]+)/|/$', 'g');
folderPaths.forEach(e => console.log(e.replace(re, "")));

Related

Split string of folder path

If I have a file path such as:
var/www/parent/folder
How would I go about removing the last folder to return:
var/www/parent
The folders could have any names, I'm quite happy using regex.
Thanks in advance.
Use split, slice and join:
"var/www/parent/folder".split( '/' ).slice( 0, -1 ).join( '/' );
Use the following regular expression to match the last directory part, and replace it with empty string.
/\/[^\/]+$/
'var/www/parent/folder'.replace(/\/[^\/]+$/, '')
// => "var/www/parent"
UPDATE
If the path ends with /, the above expression will not match it. To remove the last part of such a path, use the following pattern (which matches an optional trailing /):
'var/www/parent/folder/'.replace(/\/[^\/]+\/?$/, '')
// => "var/www/parent"
This is no specialized version of split per se, but you can split by the path.sep like so:
import path from 'path';
filePath.split(path.sep);
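In Node.js, the same path module also has dirname, which returns everything before the last segment and ignores a trailing separator. A minimal sketch, assuming a Node.js environment and forward-slash paths (hence path.posix); the wrapper name parentFolder is hypothetical:

```javascript
const path = require("path");

// Hypothetical wrapper around path.posix.dirname, which always treats "/"
// as the separator regardless of the operating system.
function parentFolder(folderPath) {
  return path.posix.dirname(folderPath);
}

console.log(parentFolder("var/www/parent/folder"));  // var/www/parent
console.log(parentFolder("var/www/parent/folder/")); // var/www/parent
console.log(parentFolder("folder"));                 // .
```

Note the last case: when there is no parent, dirname returns "." rather than an empty string.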
If it's always the last folder you want to get rid of, the easiest method would be to use substr() and lastIndexOf():
var parentFolder = folder.substr(0, folder.lastIndexOf('/'));

Match Url path without query string

I would like to match a path in a URL while ignoring the query string.
The regex should include an optional trailing slash before the querystring.
Example urls that should give a valid match:
/path/?a=123&b=123
/path?a=123&b=123
So the string '/path' should match either of the above urls.
I have tried the following regex: (/path[^?]+).*
But this will only match urls like the first example above: /path/?a=123&b=123
Any idea how I would get it to match the second example, without the trailing slash, as well?
Regex is a requirement.
No need for regexp:
url.split("?")[0];
If you really need it, then try this:
\/path\?*.*
EDIT Actually the most precise regexp should be:
^(\/path)(\/?\?{0}|\/?\?{1}.*)$
because you want to match either /path or /path/ or /path?something or /path/?something and nothing else. Note that ? means "at most one" while \? means a question mark.
BTW: What kind of routing library does not handle query strings?? I suggest using something else.
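The four accepted shapes listed above (/path, /path/, /path?something, /path/?something) can also be written with one optional slash and one optional query part. This is a sketch of an equivalent pattern, not the answerer's own; the names pathOnly and samples are only for illustration:

```javascript
// "/path", an optional trailing slash, then an optional "?..." query string.
const pathOnly = /^\/path\/?(?:\?.*)?$/;

const samples = [
  "/path",
  "/path/",
  "/path?a=123&b=123",
  "/path/?a=123&b=123",
  "/pathology",      // should be rejected
  "/path/sub?a=1"    // should be rejected
];

for (const sample of samples) {
  console.log(sample, pathOnly.test(sample));
}
```

The first four samples should log true and the last two false.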
http://jsfiddle.net/bJcX3/
var re = /(\/?[^?]*?)\?.*/;
var p1 = "/path/to/something/?a=123&b=123";
var p2 = "/path/to/something/else?a=123&b=123";
var p1_matches = p1.match(re);
var p2_matches = p2.match(re);
document.write(p1_matches[1] + "<br>");
document.write(p2_matches[1] + "<br>");
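If a regex turns out not to be a hard requirement, the built-in URL class can separate the path from the query string. A rough sketch: the function name pathWithoutQuery and the placeholder base URL are assumptions made for illustration, and the base is only there so that relative inputs such as /path?a=1 can be parsed.

```javascript
// Hypothetical helper: return the path with the query string and any
// trailing slashes removed. The base URL is a throwaway placeholder.
function pathWithoutQuery(url) {
  const { pathname } = new URL(url, "http://placeholder.invalid");
  const trimmed = pathname.replace(/\/+$/, "");
  // Keep "/" for the root path instead of returning an empty string.
  return trimmed === "" ? "/" : trimmed;
}

console.log(pathWithoutQuery("/path/?a=123&b=123")); // /path
console.log(pathWithoutQuery("/path?a=123&b=123"));  // /path
```

Keep in mind that the URL parser normalizes the path, so characters such as spaces come back percent-encoded.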

Removing last part of URL based on

I need to remove any occurrence of a product number that may occur in URLs, using javascript/jquery.
URL looks like this:
http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884
The final part of the URL always starts with 2 digits followed by -, so I was thinking a regex might do the job. I need everything after the last / removed.
It must also work when the product occurs higher or lower in the hierarchy, i.e.: http://www.mysite.com/section1/section2/01-012-15_1571884
So far I have tried different solutions with location.pathname and splits, but I am stuck on how to handle differences in product hierarchy and handling the arrays.
DEMO
var x = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
console.log(x.substr(0,x.lastIndexOf('/')));
Use lastIndexOf to find the last occurrence of "/", then drop the rest of the path with substr.
var url = 'http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884';
var parts = url.split('/');
parts.pop();
url = parts.join('/');
http://jsfiddle.net/YXe6L/
var a = 'http://www.mysite.com/section1/section2/01-012-15_1571884',
result = a.replace(/\d{1,2}-\d{1,3}-\d{1,2}_\d+[^\d]*/g, '');
JSFiddle: http://jsfiddle.net/2TVBk/2/
This is a very nice online regex tester to test your regexes with: http://regexpal.com/
Here is an approach that will properly handle a situation where there is no product ID as you requested. http://jsfiddle.net/84GVe/
var url1 = "http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884";
var url2 = "http://www.mysite.com/section1/section2/section3/section4";
function removeID(url) {
//look for a / followed by _, - or 0-9 characters,
//and use $ to ensure it is the end of the string
var reg = /\/[-\d_]+$/;
if(reg.test(url))
{
url = url.substr(0,url.lastIndexOf('/'));
}
return url;
}
console.log( removeID(url1) );
console.log( removeID(url2) );
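A variant of the same idea that leans on the rule stated in the question (the product part starts with two digits followed by -) instead of allowing any mix of digits, - and _. This is only a sketch; the name stripProductId is hypothetical:

```javascript
// Hypothetical helper: drop the last segment only when it starts with
// two digits followed by "-", as the question describes product numbers.
function stripProductId(url) {
  const cut = url.lastIndexOf("/");
  const lastSegment = url.slice(cut + 1);
  return /^\d{2}-/.test(lastSegment) ? url.slice(0, cut) : url;
}

console.log(stripProductId("http://www.mysite.com/section1/section2/section3/section4/01-012-15_1571884"));
console.log(stripProductId("http://www.mysite.com/section1/section2/01-012-15_1571884"));
console.log(stripProductId("http://www.mysite.com/section1/section2/section3/section4"));
```

The first two calls drop the product segment at either depth; the third URL has no product number and is returned unchanged.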

Regex to match filename at end of URL

Having this text:
http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg
And other texts like this where the last 1 can be any other number and the last 44 can be any other number as well, I need a regex that will match /1t44.jpg.
Everything I've tried so far (/.+?\.([^\.]+)$) matches from the first slash (//img.oo.com.au/prod/CRWWBGFWG/1t44.jpg).
I'm using JavaScript, so whatever works on RegexPal should do.
Here's a simple Regex that will match everything after the last /:
/[^/]*$
If you want to match a filename with a very specific file extension, you can use something like this:
/\/\dt\d\d\.jpg$/
This matches:
a slash
followed by a digit
followed by the letter 't'
followed by two digits
followed by '.jpg' at the end of the string
Or, if you really just want the filename (whatever is after the last slash with any file extension), then you can use this:
/\/[^\/]+$/
This matches:
a slash
followed by one or more non-slash characters
at the end of the string
In your sample string of http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg, both of these will match /1t44.jpg. The first is obviously much more restrictive since it requires a specific format of the filename. The second matches any filename.
Other choices. In node.js development, you can use the path module and use path.parse() to break a path up into all of its various components.
And, there are various libraries written for the browser that will break up a path into its components too.
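The path.parse() route mentioned above might look like the following in Node.js. A sketch under assumptions: the helper name fileParts is hypothetical, the URL class is used first to isolate the path portion, and path.posix is chosen because URL paths always use forward slashes.

```javascript
const path = require("path");

// Hypothetical helper: isolate the URL's path, then let path.posix.parse
// split it into dir, base, name and ext.
function fileParts(url) {
  const { pathname } = new URL(url);
  return path.posix.parse(pathname);
}

const parts = fileParts("http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg");
console.log(parts.base); // 1t44.jpg
console.log(parts.name); // 1t44
console.log(parts.ext);  // .jpg
console.log(parts.dir);  // /prod/CRWWBGFWG
```

Unlike the regexes, this returns the file name without the leading slash, so "/" + parts.base gives the /1t44.jpg form asked for in the question.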
As Johnsyweb says, a regular expression isn't really needed here. AFAIK the fastest way to do this is with lastIndexOf and substr.
str.substr(str.lastIndexOf('/'));
Of course you don't have to use a regular expression to split a string and pop the last part:
var str="http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var parts = str.split("/");
document.write(parts.pop() + "<br />");
Based on answer of Scott, try this: (JavaScript)
var url = "http://img.oo.com.au/prod/CRWWBGFWG/1t44.jpg";
var path = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$1" );
var lastElement = url.replace(/(.*)([\\\/][^\\\/]*$)/, "$2" );
This can be also matched for Windows/Nix file path, to extract file name or file path :
c:\Program Files\test.js => c:\Program Files
c:\Program Files\test.js => \test.js
This is for Java on a Linux machine. It grabs the last part of a file path, so that it can be used for making a file lock.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Pattern whose capture group grabs the last part of the path.
String pattern = ".*?([^/.]+)*$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher match = r.matcher("/usr/local/java/bin/keystore");
/**
* After a successful find() there are two groups
* #0 /usr/local/java/bin/keystore (the whole match)
* #1 keystore (the capture group)
*/
String fileLock = "";
if (match.find()) {
fileLock = match.group(1) + ".lock";
}
A little different than the original question, I know. But I hope this helps others who were stuck with the same problem I had.

Match filename in <link href="/path/?????/?????/????.css"

Need to get the css file name from a link tag that is in a specific folder.
<link href="/assets/49f0ugdf8g/sub/style.css" -> style.css
Currently have
match(/<link .*?href="\/assets\/(.*?\.css)"/i)
Which returns the path minus "/assets/".
Can it be extended to remove the rest of the path and just return the file name.
It would be simpler not to use regex and to use the native JS String.split function:
var link = document.getElementsByTagName('link')[0]; // or whatever JS to get your link element
var filename = link.href.split('/').pop(); // split the string by the / character and get the last part
Sure:
match(/<link .*?href="\/assets\/([^/]+\.css)"/i)
//                               ^^^^^ change here
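I dropped the ? because you probably want the capture group to be greedy. I also used + rather than * on the assumption that there will always be at least one character in the "file" name. Note that [^/]+ cannot cross a /, so this only matches a file that sits directly under /assets/.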
First, be warned.
Try this: "\/assets\/(?:[^\/">]+\/)*([^\/">]*?\.css)".
The (?:...) is a non-capturing group.
Try:
match(/<link .*?href="\/assets\/(?:[^\/]*\/)*(.*?\.css)"/i)
var link = '<link href="/assets/49f0ugdf8g/sub/style.css">';
var match = link.match(/<link .*?href="(.*)">/i);
var filename = match[1].split('/').pop();
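Putting the non-capturing-group idea from the answers above into a runnable form, here is a sketch that skips any number of sub-folders under /assets/ and captures only the file name. The function name cssFileName is hypothetical, and the pattern assumes a double-quoted href on a single link tag.

```javascript
// Hypothetical helper: return the .css file name from a <link> tag whose
// href lives under /assets/, or null when the tag does not match.
function cssFileName(html) {
  // (?:[^"\/]+\/)* skips intermediate folders; the capture group is the file.
  const match = html.match(/<link\b[^>]*?href="\/assets\/(?:[^"\/]+\/)*([^"\/]+\.css)"/i);
  return match ? match[1] : null;
}

console.log(cssFileName('<link href="/assets/49f0ugdf8g/sub/style.css">'));   // style.css
console.log(cssFileName('<link rel="stylesheet" href="/assets/main.css">')); // main.css
console.log(cssFileName('<link href="/other/style.css">'));                   // null
```

Returning null for a non-match avoids the TypeError that indexing a failed match would throw.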
