Regex Help - Match any URL Parameter & Value not in List

Thank you for looking at this!
I am trying to build some Regex that works in JavaScript that will match ALL URL parameters and their values that are not in my predefined list. Example:
Raw URL:
/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234
My List of Known Parameters:
((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi
Resulting (Cleaned up) URL:
/folder/index.html?knownParamA=1234&knownParamB=1234
Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.
Thank you so much for the help!!!
Solution Based on Feedback Below:
pageURL = window.location.pathname + window.location.search;
knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
cleanPageURL = pageURL.replace(urlCleanerRegexStep1, "").replace(urlCleanerRegexStep2, '?$1');

Negative searches are tricky, and require zero-width lookaheads.
This will find the unknown parameters and strip them out of the URL: (Update 2: This doesn't keep unknown parameters that start with known parameters any more.)
step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:
clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
You can chain these together, of course:
clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
replace(/[?&]([^=]+=[^&]*)/, '?$1');
Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.
pageURL = window.location.pathname + window.location.search;
knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
To help you interpret these regexes:
[?&] = either ? or &
(...) = captured group
(?!...) = not followed by a match for this group
(?:...) = uncaptured group
(?=...) = followed by a match for this group
= = a literal = character
[^=] = any character other than =
+ = one or more times
[^&] = any character other than &
* = zero or more times
Outside the regex body,
The g flag means 'all matches' (as opposed to only the first)
The i flag means 'case-insensitive'
In the replacement string, $1 means 'captured group 1'
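The two steps can be wrapped in a small helper. This is only a sketch and not part of the original answer: the names cleanUrl and escapeRegExp are made up for illustration, and it assumes every parameter has the form name=value.

```javascript
// Hypothetical helper: escape characters that are special inside a regex,
// so a parameter name such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: keep only the parameters listed in knownParams.
function cleanUrl(url, knownParams) {
  const known = knownParams.map(escapeRegExp).join('|');
  // Step 1: strip every name=value pair whose name is not in the list.
  const step1 = new RegExp('[?&](?!(?:' + known + ')(?==))[^=]+=[^&]*', 'gi');
  // Step 2: make sure the first remaining parameter is introduced by "?".
  const step2 = /[?&]([^=]+=[^&]*)/;
  return url.replace(step1, '').replace(step2, '?$1');
}

const raw = '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234';
console.log(cleanUrl(raw, ['knownParamA', 'knownParamB']));
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
console.log(cleanUrl('/p?x=1&knownParamA=2', ['knownParamA']));
// "/p?knownParamA=2"
```

The second call shows why step 2 is needed: once the first parameter is stripped, the remaining one would otherwise start with & instead of ?.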

Related

Concatenate / simplify RegExp

I've this working RegExp in my JavaScript file:
var reA = new RegExp(urlValueToRemove);
var reB = new RegExp('(,&)');
var reC = new RegExp('(,,)');
var reD = new RegExp('(=,)');
var reE = new RegExp('(,$)');
window.history.pushState(null, null, decodeURIComponent(window.location.search).replace(reA, '').replace(reB, '&').replace(reC, ',').replace(reD, '=').replace(reE, ''));
Is it possible to concatenate / simplify this so that I don't need to call replace 5 times?
I've asked this in the Code Review community, but nobody is available there, so I think I would have to wait days.
Example
When I have this URL here:
http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S
When I now want to remove Green from the URL, I can pass Green to the first regex, reA, and it gets removed from the URL:
http://localhost.com/?color=Red,Blue&size=X,L,M,S

You can use the capture group to indicate what should be kept, and join the two cases with a |: one case needs to keep the character that precedes the word (like =), the other what follows the word (like &):
function removeWord(url, text) {
const re = new RegExp(`,${text}(&|,|$)|(=)${text},`, 'g');
return url.replace(re, '$1$2');
}
const url = "http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S"
console.log(removeWord(url, "Green"));
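To see how the two alternatives cover the different positions a value can occupy, here is a small sketch that runs the same function on the first, middle, and last value of a list. It assumes the value is a plain word with no regex metacharacters; the variable base is just a name chosen for this example.

```javascript
// removeWord as given in the answer above, repeated so the snippet runs on its own.
function removeWord(url, text) {
  const re = new RegExp(`,${text}(&|,|$)|(=)${text},`, 'g');
  return url.replace(re, '$1$2');
}

const base = 'http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S';

// First value in its list: matched by "(=)Red," and replaced with "=".
console.log(removeWord(base, 'Red'));
// "http://localhost.com/?color=Blue,Green&size=X,L,M,S"

// Middle value: matched by ",Blue(,)" and replaced with ",".
console.log(removeWord(base, 'Blue'));
// "http://localhost.com/?color=Red,Green&size=X,L,M,S"

// Last value in the URL: matched by ",S($)" and replaced with "".
console.log(removeWord(base, 'S'));
// "http://localhost.com/?color=Red,Blue,Green&size=X,L,M"

// A value that is alone in its list matches neither alternative,
// so the URL comes back unchanged.
console.log(removeWord('http://localhost.com/?color=Red&size=X', 'Red'));
// "http://localhost.com/?color=Red&size=X"
```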

How to split a word for getting a specific value in Javascript or Jquery? [duplicate]

How do I get the last segment of a url? I have the following script which displays the full url of the anchor tag clicked:
$(".tag_name_goes_here").live('click', function(event)
{
event.preventDefault();
alert($(this).attr("href"));
});
If the url is
http://mywebsite/folder/file
how do I only get it to display the "file" part of the url in the alert box?

You can also use the lastIndexOf() function to locate the last occurrence of the / character in your URL, then the substring() function to return the substring starting from that location:
console.log(this.href.substring(this.href.lastIndexOf('/') + 1));
That way, you'll avoid creating an array containing all your URL segments, as split() does.

var parts = 'http://mywebsite/folder/file'.split('/');
var lastSegment = parts.pop() || parts.pop(); // handle potential trailing slash
console.log(lastSegment);

window.location.pathname.split("/").pop()

The other answers may work if the path is simple, consisting only of simple path elements. But when it contains query params as well, they break.
Better to use the URL object for a more robust solution. It gives you a parsed interpretation of the URL you pass in:
Input:
const href = 'https://stackoverflow.com/boo?q=foo&s=bar'
const segments = new URL(href).pathname.split('/');
const last = segments.pop() || segments.pop(); // Handle potential trailing slash
console.log(last);
Output: 'boo'
This works in all common browsers. Only our dying IE doesn't support it (and won't). For IE there are polyfills available, though (if you care at all).

Just another solution with regex.
var href = location.href;
console.log(href.match(/([^\/]*)\/*$/)[1]);

JavaScript strings have a split function that can help you:
const url = "http://mywebsite/folder/file";
const array = url.split('/');
const lastsegment = array[array.length-1];

Shortest way how to get URL Last Segment with split(), filter() and pop()
function getLastUrlSegment(url) {
return new URL(url).pathname.split('/').filter(Boolean).pop();
}
console.log(getLastUrlSegment(window.location.href));
console.log(getLastUrlSegment('https://x.com/boo'));
console.log(getLastUrlSegment('https://x.com/boo/'));
console.log(getLastUrlSegment('https://x.com/boo?q=foo&s=bar=aaa'));
console.log(getLastUrlSegment('https://x.com/boo?q=foo#this'));
console.log(getLastUrlSegment('https://x.com/last segment with spaces'));
Works for me.
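For reference, this is what the function returns for the sample URLs when run outside a browser (so without the window.location.href call). Treat it as a sketch: the only addition is decodeURIComponent, which is needed because the URL object percent-encodes spaces in the path.

```javascript
// Same function as above, repeated so the snippet runs on its own.
function getLastUrlSegment(url) {
  return new URL(url).pathname.split('/').filter(Boolean).pop();
}

console.log(getLastUrlSegment('https://x.com/boo'));                 // "boo"
console.log(getLastUrlSegment('https://x.com/boo/'));                // "boo"
console.log(getLastUrlSegment('https://x.com/boo?q=foo&s=bar=aaa')); // "boo"
console.log(getLastUrlSegment('https://x.com/boo?q=foo#this'));      // "boo"

// Spaces in the path come back percent-encoded; decode them if needed.
const encoded = getLastUrlSegment('https://x.com/last segment with spaces');
console.log(encoded);                     // "last%20segment%20with%20spaces"
console.log(decodeURIComponent(encoded)); // "last segment with spaces"

// A URL without any path segment yields undefined.
console.log(getLastUrlSegment('https://x.com/')); // undefined
```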

Or you could use a regular expression:
alert(href.replace(/.*\//, ''));

var urlChunks = 'mywebsite/folder/file'.split('/');
alert(urlChunks[urlChunks.length - 1]);

Returns the last segment, regardless of trailing slashes:
var val = 'http://mywebsite/folder/file//'.split('/').filter(Boolean).pop();
console.log(val);

I know, it is too late, but for others:
I highly recommend the PURL jQuery plugin. The motivation for PURL is that a URL can also be segmented after '#' (for example, angular.js links), i.e. a URL could look like
http://test.com/#/about/us/
or
http://test.com/#sky=blue&grass=green
And with PURL you can easily decide which segment you want to get (segment/fsegment).
For "classic" last segment you could write:
var url = $.url('http://test.com/dir/index.html?key=value');
var lastSegment = url.segment().pop(); // index.html

Get the Last Segment using RegEx
str.replace(/.*\/(\w+)\/?$/, '$1');
$1 refers to the first capturing group. The (\w+) in the regex creates that group, and the whole string is then replaced with what it captured.
let str = 'http://mywebsite/folder/file';
let lastSegment = str.replace(/.*\/(\w+)\/?$/, '$1');
console.log(lastSegment);

Also,
var url = $(this).attr("href");
var part = url.substring(url.lastIndexOf('/') + 1);

Building on Frédéric's answer using only javascript:
var url = document.URL
window.alert(url.substr(url.lastIndexOf('/') + 1));

If you aren't worried about generating the extra elements using the split then filter could handle the issue you mention of the trailing slash (Assuming you have browser support for filter).
url.split('/').filter(function (s) { return !!s }).pop()

window.alert(this.pathname.substr(this.pathname.lastIndexOf('/') + 1));
Use the native pathname property because it's simplest and has already been parsed and resolved by the browser. $(this).attr("href") can return values like ../.. which would not give you the correct result.
If you need to keep the search and hash (e.g. foo?bar#baz from http://quux.com/path/to/foo?bar#baz) use this:
window.alert(this.pathname.substr(this.pathname.lastIndexOf('/') + 1) + this.search + this.hash);

To get the last segment of your current window:
window.location.href.substr(window.location.href.lastIndexOf('/') +1)

You can first remove the / at the end, if there is one, and then get the last part of the URL:
let locationLastPart = window.location.pathname
if (locationLastPart.substring(locationLastPart.length-1) == "/") {
locationLastPart = locationLastPart.substring(0, locationLastPart.length-1);
}
locationLastPart = locationLastPart.substr(locationLastPart.lastIndexOf('/') + 1);

var pathname = window.location.pathname; // Returns path only
var url = window.location.href; // Returns full URL
Copied from this answer

// Store original location in loc like: http://test.com/one/ (ending slash)
var loc = location.href;
// If the last char is a slash, trim it; otherwise drop everything after the last slash
loc = loc.lastIndexOf('/') == (loc.length -1) ? loc.substring(0,loc.length-1) : loc.substring(0,loc.lastIndexOf('/'));
var targetValue = loc.substring(loc.lastIndexOf('/') + 1);
targetValue = one
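If your URL looks like:
http://test.com/one/
or
http://test.com/one/index.htm
Then loc ends up looking like:
http://test.com/one
For http://test.com/one (no trailing slash and no file name), the same line trims "one" instead, and loc becomes http://test.com.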
Now, since you want the last item, run the next step to load the value (targetValue) you originally wanted.
var targetValue = loc.substr(loc.lastIndexOf('/') + 1);
// Store original location in loc like: http://test.com/one/ (ending slash)
let loc = "http://test.com/one/index.htm";
console.log("starting loc value = " + loc);
// If the last char is a slash, trim it; otherwise drop everything after the last slash
loc = loc.lastIndexOf('/') == (loc.length -1) ? loc.substring(0,loc.length-1) : loc.substring(0,loc.lastIndexOf('/'));
let targetValue = loc.substring(loc.lastIndexOf('/') + 1);
console.log("targetValue = " + targetValue);
console.log("loc = " + loc);

Updated raddevus answer :
var loc = window.location.href;
loc = loc.lastIndexOf('/') == loc.length - 1 ? loc.substr(0, loc.length - 1) : loc.substr(0, loc.length + 1);
var targetValue = loc.substr(loc.lastIndexOf('/') + 1);
Prints last path of url as string :
test.com/path-name = path-name
test.com/path-name/ = path-name

I am using regex and split:
var last_path = location.href.match(/.*\/(.*[\w])/)[1].split("#")[0].split("?")[0]
It ignores a #fragment, a ?query, or a / at the end of the URL, which happens a lot. Example:
https://cardsrealm.com/profile/cardsRealm -> Returns cardsRealm
https://cardsrealm.com/profile/cardsRealm#hello -> Returns cardsRealm
https://cardsrealm.com/profile/cardsRealm?hello -> Returns cardsRealm
https://cardsrealm.com/profile/cardsRealm/ -> Returns cardsRealm

I don't really know if regex is the right way to solve this issue as it can really affect efficiency of your code, but the below regex will help you fetch the last segment and it will still give you the last segment even if the URL is followed by an empty /. The regex that I came up with is:
[^\/]+[\/]?$

I know this is old, but if you want to get this from a URL you could use:
var path = document.location.pathname.replace(/\/$/, '');
var lastSegment = path.substring(path.lastIndexOf('/') + 1);
document.location.pathname gets the pathname from the current URL.
replace(/\/$/, '') drops a trailing /, so a / that is the last character of the URL is not counted.
lastIndexOf gets the index of the last occurrence of the given string, in our case /. It takes a literal string, not a regex, so searching for '/.' would only find a slash followed by an actual dot.
substring cuts the string from that index to the end.

if the url is http://localhost/madukaonline/shop.php?shop=79
console.log(location.search); will bring ?shop=79
so the simplest way is to use location.search

You can do this with simple paths (w/o query strings etc.).
Granted probably overly complex and probably not performant, but I wanted to use reduce for the fun of it.
"/foo/bar/"
.split(path.sep)
.filter(x => x !== "")
.reduce((_, part, i, arr) => {
if (i == arr.length - 1) return part;
}, "");
Split the string on path separators.
Filter out empty string path parts (this could happen with trailing slash in path).
Reduce the array of path parts to the last one.

Adding to the great answer by Sebastian Barth.
If href is a variable whose value you don't control, new URL(href) throws a TypeError when it is not a valid URL, so to be on the safe side wrap it in try/catch:
try{
const segments = new URL(href).pathname.split('/');
const last = segments.pop() || segments.pop(); // Handle potential trailing slash
console.log(last);
}catch (error){
//Uups, href wasn't a valid URL (empty string or malformed URL)
console.log('TypeError ->',error);
}
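If you would rather get a value back than log the error, the same idea can be packaged as a function. This is a sketch under my own assumptions: the name lastSegmentOrNull and the choice of null as the "nothing found" value are not from the answers above.

```javascript
// Hypothetical helper: returns the last path segment, or null when href
// cannot be parsed as an absolute URL or the path has no segment at all.
function lastSegmentOrNull(href) {
  let parsed;
  try {
    parsed = new URL(href); // throws a TypeError for '' or malformed input
  } catch (error) {
    return null;
  }
  // filter(Boolean) drops the empty strings produced by leading/trailing slashes.
  const segments = parsed.pathname.split('/').filter(Boolean);
  return segments.length > 0 ? segments[segments.length - 1] : null;
}

console.log(lastSegmentOrNull('https://stackoverflow.com/boo?q=foo&s=bar')); // "boo"
console.log(lastSegmentOrNull('https://stackoverflow.com/boo/'));            // "boo"
console.log(lastSegmentOrNull(''));                                          // null
console.log(lastSegmentOrNull('not a url'));                                 // null
```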

I believe it's safer to remove the trailing slash ('/') before taking the substring, because otherwise I got an empty string in my scenario.
window.alert((window.location.pathname).replace(/\/$/, "").substr((window.location.pathname.replace(/\/$/, "")).lastIndexOf('/') + 1));

Best way to get the URL's last segment, also handling a trailing slash (/) and replacing hyphens (-) with spaces
jQuery(document).ready(function(){
var path = window.location.pathname;
var parts = path.split('/');
var lastSegment = parts.pop() || parts.pop(); // handle potential trailing slash
lastSegment = lastSegment.replace(/-/g, ' '); // replace every hyphen, not just the first two
jQuery('.archive .filters').before('<div class="product_heading"><h3>Best '+lastSegment+' Deals </h3></div>');
});

A way to avoid query params
const urlString = "https://stackoverflow.com/last-segment?param=123"
const url = new URL(urlString);
url.search = '';
const lastSegment = url.pathname.split('/').pop();
console.log(lastSegment)

How to obtain URL parameter using jquery in case insensitive way

Is there a way to obtain a URL parameter in a case insensitive way using jquery?
Essentially, I'm looking to do something like $.url('?someparameter');, where it would match either http://www.test.com?someparameter=ABC or
http://www.test.com?SOMEparAMeter=ABC

You should try toLowerCase. This function converts any string to lowercase.

Use a regular expression where you set the case-insensitive flag.
Regular Expressions -- scroll down to "Advanced Searching With Flags"
Please take a look at: How can I get query string values in JavaScript?
The line to adapt to your needs is as follows:
var regex = new RegExp("[\\?&]" + name + "=([^&#]*)", "i");
//"i" for case-insensitive
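Wrapped into a function, that line could look roughly like this. It is a sketch, not the code from the linked question: the name getParamIgnoreCase is invented here, the URL is passed in instead of being read from window.location, and the value is returned still URL-encoded.

```javascript
// Hypothetical helper: look up a query parameter by name, ignoring case.
// Returns the raw (still URL-encoded) value, or null when it is absent.
function getParamIgnoreCase(url, name) {
  // Escape regex metacharacters so the name is matched literally.
  const safeName = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // "i" makes the name match case-insensitively; the value is captured as-is.
  const regex = new RegExp('[?&]' + safeName + '=([^&#]*)', 'i');
  const match = regex.exec(url);
  return match ? match[1] : null;
}

console.log(getParamIgnoreCase('http://www.test.com?SOMEparAMeter=ABC', 'someparameter')); // "ABC"
console.log(getParamIgnoreCase('http://www.test.com?someparameter=ABC', 'SomeParameter')); // "ABC"
console.log(getParamIgnoreCase('http://www.test.com?other=1', 'someparameter'));           // null
```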

This doesn't use jQuery, just JavaScript, but it addresses the question in general.
The problem with uppercasing the entire URL is that you may be keying off the value to look up an HTML element.
Why URL.searchParams offers no case-insensitive key lookup, I do not know, but it does not.
Below is a function I wrote that will find a key and return a value.
I am just barely literate in regex, so I am sure there is a better regex that can pull the value out and omit trailing key/value pairs.
function getParm_CI(parm) {
var str = window.location.href;
var rgx = new RegExp('\\b' + parm + '=.*\\b', 'gi');
//this gets an array of matches
var aMatches = str.match(rgx);
if (aMatches == null) return;
var parmVal = aMatches[0].substring(parm.length + 1);
//we shouldnt, but make sure there are not trailing parms
var idx = parmVal.indexOf('&');
//alert('amp:' + idx);
if (idx > -1) parmVal = parmVal.substring(0, idx);
return parmVal;
}
usage would be like this
var topic = getParm_CI('SOMEparAMeter');
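An alternative that avoids the regex altogether is to walk the entries of URL.searchParams and compare lowercased keys. This is a sketch, assuming an environment with the URL object; findParam is a name invented for this example, and the URL is passed in rather than read from window.location.

```javascript
// Hypothetical helper: case-insensitive lookup on top of URLSearchParams.
// Returns the decoded value of the first matching key, or undefined.
function findParam(url, name) {
  const wanted = name.toLowerCase();
  // searchParams is iterable and yields [key, value] pairs in URL order.
  for (const [key, value] of new URL(url).searchParams) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

console.log(findParam('http://www.test.com/?SOMEparAMeter=ABC&x=1', 'someparameter')); // "ABC"
console.log(findParam('http://www.test.com/?x=1', 'someparameter'));                   // undefined
```

Only the key is lowercased for the comparison, so the value keeps its original case.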

Javascript RegExp match & Multiple backreferences

I'm having trouble trying to use multiple back references in a JavaScript match. So far I've got:
function newIlluminate() {
var string = "the time is a quarter to two";
var param = "time";
var re = new RegExp("(" + param + ")", "i");
var test = new RegExp("(time)(quarter)(the)", "i");
var matches = string.match(test);
$("#debug").text(matches[1]);
}
newIlluminate();
#Debug when matching the Regex 're' prints 'time' which is the value of param.
I've seen match examples where multiple back references are used by wrapping the match in parenthesis however my match for (time)(quarter)... is returning null.
Where am I going wrong? Any help would be greatly appreciated!

Your regex is literally looking for timequarterthe and splitting the match (if it finds one) into the three backreferences.
I think you mean this:
var test = /time|quarter|the/ig;

Your regex test simply doesn't match the string (as it does not contain the substring timequarterthe). I guess you want alternation:
var test = /time|quarter|the/ig; // does not even need a capturing group
var matches = string.match(test);
$("#debug").text(matches!=null ? matches.join(", ") : "did not match");
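The difference between the two patterns is easy to check without jQuery. A small sketch using the string from the question (the variable names are just for this example):

```javascript
const text = 'the time is a quarter to two';

// Three groups in a row only match the literal sequence "timequarterthe".
const sequence = /(time)(quarter)(the)/i;
console.log(text.match(sequence)); // null

// Alternation with the g flag returns every match, in order of appearance.
const alternation = /time|quarter|the/ig;
console.log(text.match(alternation)); // [ 'the', 'time', 'quarter' ]

// Without g, match() returns only the first match plus its capture groups.
const firstOnly = text.match(/(time|quarter|the)/i);
console.log(firstOnly[0], firstOnly[1], firstOnly.index); // the the 0
```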

How to obtain index of subpattern in JavaScript regexp?

I wrote a regular expression in JavaScript for searching searchedUrl in a string:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = input.match(regex);
console.log(match); // return an array
Output:
["url( test )", index: 5, input: "1234 url( test ) 5678"]
Now I would like to obtain the position of searchedUrl (in the example above, the position of test in 1234 url( test ) 5678).
How can I do that?

As far as I can tell, it isn't possible to get the offset of a sub-match automatically; you have to do the calculation yourself using either the lastIndex of the RegExp or the index property of the match object returned by exec(). Depending on which you use, you'll have to either add or subtract the lengths of the groups leading up to your sub-match. This does mean you have to group the first or last part of the regular expression, up to the pattern you wish to locate.
lastIndex only seems to come into play when using the g (global) flag, and it records the index just after the entire match. So if you wish to use lastIndex, you'll need to work backwards from the end of your pattern.
For more information on the exec() method, see here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
The following succinctly shows the solution in operation:
var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
alert( m.index + m[1].length ); // will give the position of 123
update
This would apply to your issue using the following:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);
Then to get the submatch offset you can use:
match.index + match[1].length
match[1] now contains url( plus the whitespace that follows it, thanks to the bracket grouping, which is what lets us work out the internal offset.
update 2
Obviously things are a little more complicated if you have patterns in the RegExp, that you wish to group, before the actual pattern you want to locate. This is just a simple act of adding together each group length.
var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );
To get the offset position of it depends on your perspective you would use:
m.index + m[1].length + m[2].length + m[3].length;
Obviously if you know the RegExp has portions that never change length, you can replace those with hard coded numeric values. However, it's probably best to keep the above .length checks, just in case you — or someone else — ever changes what your expression matches.
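The adding-up can be put in a small loop. A sketch only: groupOffset is a name invented here, and it assumes the groups sit back to back from the start of the match, are not nested, and all took part in the match.

```javascript
// Hypothetical helper: offset of capture group n within the input string,
// computed by adding the lengths of groups 1..n-1 to match.index.
function groupOffset(match, n) {
  let offset = match.index;
  for (let i = 1; i < n; i++) {
    offset += match[i].length;
  }
  return offset;
}

const s = '~- [This may or may not be random|it depends on your perspective] -~';
const r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
const m = r.exec(s);

// Group 4 is "it depends on your perspective"; compare against indexOf.
console.log(groupOffset(m, 4) === s.indexOf('it depends on your perspective')); // true
// Group 2 starts right after the opening bracket.
console.log(groupOffset(m, 2) === s.indexOf('This')); // true
```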

JS doesn't have a direct way to get the index of a subpattern/capturing group. But you can work around that with some tricks. For example:
var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');
var m = re.exec(input);
if(m){
var index = m.index + m[1].length;
console.log("url found at " + index);
}

You can add the 'd' flag to the regex in order to generate indices for substring matches.
const input = '1234 url( test ) 5678';
const searchedUrl = 'test';
const regexpStr = "url\\(\\s*("+searchedUrl+")\\s*\\)";
const regex = new RegExp(regexpStr , 'id');
const match = regex.exec(input).indices[1]
console.log(match); // [10, 14] for the input string as written above

You don't need the index.
This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.
But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)
I had a feeling something was missing; that's why I asked you about it.
As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:
'1234 url( <b>test</b> ) 5678'
(Let me know if you meant something else by "highlight".)
You can use character indexes to do that, however there is a much easier way using the regular expression itself.
Getting the index
But since you asked, if you did need the index, you could get it with code like this:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)";
var regex = new RegExp( regexpStr , 'i' );
var match = input.match( regex );
var start = match[1].length;
This is a bit simpler than the code in the other answers, but any of them would work equally well. This approach works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index.
Slicing and dicing
Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:
// Wrap url in <b></b> tag by slicing and pasting strings
var output =
input.slice( 0, start ) +
'<b>' + url + '</b>' +
input.slice( start + url.length );
console.log( output );
That will certainly work, but it is really doing things the hard way.
Also, I left out some error handling code. What if there is no matching URL? match will be null and match[1] will throw. But instead of worrying about that, let's see how we can do it without any character indexing at all.
The easy way
Let the regular expression do the work for you. Here's the whole thing:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))";
var regex = new RegExp( regexpStr , 'i' );
var output = input.replace( regex, "$1<b>$2</b>$3" );
console.log( output );
This code has three groups in the regular expression, one to capture the URL itself, with groups before and after the URL to capture the other matching text so we don't lose it. Then a simple .replace() and you're done!
You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
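One thing the snippet above glosses over is that url is concatenated straight into the pattern, so a value like a.png would have its dot treated as "any character". A possible sketch that escapes it first; highlightUrl and escapeRegExp are names made up for this example, not part of the answer.

```javascript
// Hypothetical helper: escape regex metacharacters so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap the URL inside url( ... ) in <b></b> tags.
// Returns the input unchanged when the URL is not found.
function highlightUrl(input, url) {
  const regex = new RegExp('(url\\(\\s*)(' + escapeRegExp(url) + ')(\\s*\\))', 'i');
  return input.replace(regex, '$1<b>$2</b>$3');
}

console.log(highlightUrl('1234 url( test ) 5678', 'test'));
// "1234 url( <b>test</b> ) 5678"
console.log(highlightUrl('1234 url( a.png ) 5678', 'a.png'));
// "1234 url( <b>a.png</b> ) 5678"
console.log(highlightUrl('1234 url( axpng ) 5678', 'a.png'));
// "1234 url( axpng ) 5678" (unchanged: the dot is matched literally)
```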
