Is there something like glob but for URLs, in JavaScript?

I need to match URLs against string patterns but I want to avoid RegExp to keep the patterns simple and readable.
I'd like to use patterns like http://*.example.org/*, which should be equivalent to /^http:\/\/.*\.example\.org\/.*$/ as a RegExp. That RegExp also illustrates why I want something more readable.
Basically, I'd like glob-like patterns that work for URLs. The problem is that normal glob implementations treat / as a delimiter. That means http://foo.example.org/bar/bla wouldn't match my simple pattern.
So, an implementation of glob that can ignore slashes would be great. Is there such a thing or something similar?

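You can start with a function like this for glob-like behavior:
function glob(pattern, input) {
  // Escape RegExp metacharacters except *, turn each * into .*, and anchor
  // with ^...$ so the pattern has to match the whole input, not just part of it.
  var re = new RegExp("^" + pattern.replace(/([.?+^$[\]\\(){}|\/-])/g, "\\$1").replace(/\*/g, ".*") + "$");
  return re.test(input);
}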
Then call it like this:
glob('http://*.example.org/*', 'http://foo.example.org/bar/bla'); // returns true
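If the same pattern is tested against many URLs, it may be worth compiling it once and reusing the RegExp. Below is a minimal sketch of that idea; `globToRegExp` is a hypothetical name, and it assumes `*` is the only wildcard, so a `?` in a query string stays literal. The escaping character class is the common `escapeRegExp` idiom from MDN.

```javascript
// Hypothetical helper: compile a URL glob (only "*" is special) into an anchored RegExp.
function globToRegExp(pattern) {
  // Escape every RegExp metacharacter (including "*")...
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // ...then turn each escaped "\*" back into ".*" (any run of characters, "/" included).
  return new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$");
}

const urlPattern = globToRegExp("http://*.example.org/*");
console.log(urlPattern.test("http://foo.example.org/bar/bla"));               // true
console.log(urlPattern.test("http://example.org/"));                          // false: no subdomain
console.log(urlPattern.test("https://evil.com/?r=http://foo.example.org/x")); // false: must match from the start
```

The last case is why the ^ and $ anchors matter: without them, the pattern would happily match a URL embedded inside a longer string.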

I solved the problem by writing a library for it:
https://github.com/lnwdr/calmcard
This matches arbitrary strings with simple wildcards.

Related

RegExp in Javascript to find all parenthesis constructions

I have expressions like this: 27+3/(12-5)+9-(2*(12-10)+(7-6))
I need to get all the parenthesized groups into an array like this:
[(12-5),(2*(12-10)+(7-6)),(12-10),(7-6)]
or something shaped like that. Is there an easy way to write a RegExp for this? Something like:
const myExprStr = '27+3/(12-5)+9-(2*(12-10)+(7-6))';
const neededParenthesisArray = [...[], ...myExprStr.matchAll([some magic regexp])];
So, finally, the question is: can someone share the RegExp I need, or point me to documentation that explains how to build it?
This assumes, as you said, that there won't be much nesting. You can see how quickly things blow up. The reason is the theoretical limit of regular expressions: they describe regular languages, which are easy to parse but cannot count, and matching nested parentheses requires counting. If you have at most three levels, you can use the trick below; if you need to go deeper, you are better off with a proper parser.
Single level (no nesting inside):
\(([^()\n]*)\)
Two levels:
\(([^()\n]*\([^()\n]*\)[^()\n]*)+\)
Three levels:
\(([^()\n]*\(([^()\n]*\([^()\n]*\)[^()\n]*)+\))+[^()\n]*\)
https://regex101.com/r/n8SVYH/1
https://regex101.com/r/n8SVYH/2
https://regex101.com/r/n8SVYH/3
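Following the advice to use a parser once nesting gets deeper, here is a minimal stack-based sketch instead of a regex. `findParenGroups` is a hypothetical name; it assumes round brackets are the only thing to balance and throws on unbalanced input. Sorting by opening position reproduces the order asked for in the question (outer groups before the groups they contain).

```javascript
// Extract every balanced (...) group at any nesting depth using a stack.
function findParenGroups(expr) {
  const stack = [];  // indices of "(" that are still open
  const groups = []; // [start, end] index pairs of closed groups
  for (let i = 0; i < expr.length; i++) {
    if (expr[i] === "(") {
      stack.push(i);
    } else if (expr[i] === ")") {
      if (stack.length === 0) throw new Error(`Unmatched ")" at index ${i}`);
      groups.push([stack.pop(), i]);
    }
  }
  if (stack.length > 0) {
    throw new Error(`Unmatched "(" at index ${stack[stack.length - 1]}`);
  }
  // Groups are recorded when they close (innermost first); sort by opening index instead.
  groups.sort((a, b) => a[0] - b[0]);
  return groups.map(([start, end]) => expr.slice(start, end + 1));
}

console.log(findParenGroups("27+3/(12-5)+9-(2*(12-10)+(7-6))"));
// → ["(12-5)", "(2*(12-10)+(7-6))", "(12-10)", "(7-6)"]
```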

Split string into array between two characters?

I'm trying to split a string in a somewhat irregular way. I tried regular expressions, but I don't know regex well enough to come up with something like this.
Basically, I have a string that looks like this:
var links = "<a>4</a><b><c><d><e><f>";
I want to use JavaScript's .split() method to get them in an array like this:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Obviously I need to split on the >< characters, but if I do this:
links.split("><");
the string is split where I want, but I lose the > and < signs. For example:
["<a>4</a", "b", "c", "d", "e", "f>"]
This is not a good solution for me.
So my question is: is it possible, with regex or something else, to get the array result I described?
A quick and clean method would be to use match instead of split:
var matches = links.match(/<.+?>(?=<|$)/g);
For your string, it gives:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Be careful: using regular expressions on a non-regular language like HTML is risky. It's often fine for simple cases, but it's limited and may surprise you, for example if you apply it to nested elements.
A quick-and-dirty method is to insert a delimiter between > and < with a regex replace, and then split on that delimiter:
var links = "<a>4</a><b><c><d><e><f>";
links.replace(/></g, '>|<').split('|');
Result:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Just make sure the delimiter you chose doesn't occur in the string itself.
(You can use longer strings as delimiters, like: "|-|".)
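Since the question asks specifically about .split(), another option (not one of the answers given here) is to split on a zero-width position using lookarounds: the separator matches the empty spot between > and <, so no characters are removed and no placeholder delimiter is needed. This is a sketch that assumes an engine with lookbehind support (ES2018, i.e. current browsers and Node.js 10+); the same caution about HTML applies.

```javascript
const links = "<a>4</a><b><c><d><e><f>";

// (?<=>) : the previous character is ">"
// (?=<)  : the next character is "<"
// Both are zero-width, so split() consumes nothing and keeps every character.
const parts = links.split(/(?<=>)(?=<)/);

console.log(parts);
// → ["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
```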

Regex: if a substring exists, then match another part of the string

I am using the YUI3 library and am using a filter to match and replace parts of a URL.
Because filter is not very flexible, I can only provide a regular expression for searching and a string for replacing the matches:
filter: {
searchExp : "-min\\.js",
replaceStr: "-debug.js"
}
In my case, I have a URL that looks like this:
http://site.com/assets/js?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css
I would like to match /assets/js if there are .css files. If the parameters contain a CSS file, then it will always only contain CSS files.
So far, I have written a small regex to check for the presence of .css at the very end:
.*\.css$
However, now, if we have a match, I would like to return /assets/js as the match. Is this something that is doable with regex?
Personally, I would rather this be done with a simple function and a simple if/else, but due to the limitations (I can only use regex), I need to find a regex solution to this.
This is a bit hacked together, but should do the job:
var t = new RegExp( "/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$" )
document.write( "http://site.com/assets/js?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css".replace( t, "/newthing/$1" ) );
Essentially, it searches for /assets/js, followed by any characters, followed by .css at the very end. If the whole thing matches, it replaces it with the new text and appends the text captured by the first pair of parentheses. Everything before /assets isn't part of the match, so it doesn't need to be included in the replacement.
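To check the search/replace pair outside YUI, here is a minimal sketch that mimics a filter with `String.prototype.replace`. The `applyFilter` helper is hypothetical, and the assumption that YUI builds a RegExp from `searchExp` and passes `replaceStr` to `replace` is not confirmed here; check the YUI Loader documentation for the exact behavior. The `.js` URL is a made-up example for the non-matching case.

```javascript
// Hypothetical stand-in for a YUI-style filter (assumption: YUI does something equivalent).
function applyFilter(url, filter) {
  return url.replace(new RegExp(filter.searchExp), filter.replaceStr);
}

// The strings from this answer, exactly as they would appear in the filter config.
const cssFilter = {
  searchExp: "/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$",
  replaceStr: "/newthing/$1"
};

const cssUrl = "http://site.com/assets/js?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css";
const jsUrl = "http://site.com/assets/js?yui-3.9.0/node/node-min.js"; // made-up example

// CSS-only combo URL: "/assets/js" is rewritten, the query string is kept via $1.
console.log(applyFilter(cssUrl, cssFilter));
// → "http://site.com/newthing/?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css"

// URL not ending in .css: no match, so it is returned unchanged.
console.log(applyFilter(jsUrl, cssFilter) === jsUrl); // true
```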
I imagine your library uses replace internally, so those strings should work. Specifically,
"/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$"
"/newthing/$1"
I'm not quite sure what you want to do with the results, but this allows you to change the folder and add suffixes (as well as check for the presence of both tokens in the first place). To add a suffix change the replacement to this:
"/assets/js$1-mysuffix"

JavaScript regular expression for matching URL path components

What JavaScript regular expression should I use to match individual components of a URL path? By path, I mean the path of the resource on the server, e.g. if the URL is 'http://example.com/directory/resource?start=0', the path is '/directory/resource'. By path components, I mean the /-separated parts of the path.
Let's say we have the URL 'http://example.com/component1/component2'. I would like to match 'component1' and 'component2' with a grouped regular expression for each, so each component can be extracted, i.e. something like 'http://example.com/($component-regex)/($component-regex)', where $component-regex is the regular expression we need to devise. In this example, there would be two matched groups: 'component1' and 'component2'.
Please come up with a regex that's considered safe by JSLint :) For example, it considers [^/]+ insecure.
You don't need a regex for this:
var components = url.split(/[?#]/)[0].split("/").slice(3);
Okay, you do need a regex to split on one of two possible characters, but you could do it without any regex with this:
var components = url.split("#")[0].split("?")[0].split("/").slice(3);
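In current browsers and Node.js (10+), the built-in WHATWG `URL` class can strip the query string and fragment for you, with no regex at all. A minimal sketch; `pathComponents` is a hypothetical helper name. Unlike the split-based answers, it drops empty segments, so a trailing slash does not produce a trailing "" entry. Note that `pathname` is percent-encoded, so apply `decodeURIComponent` to each segment if you need the decoded text.

```javascript
// Split a URL's path into its "/"-separated components using the URL API.
function pathComponents(url) {
  // pathname excludes the scheme, host, query string (?...) and fragment (#...).
  return new URL(url).pathname
    .split("/")
    .filter((segment) => segment !== ""); // drop empties from leading/trailing or doubled "/"
}

console.log(pathComponents("http://example.com/directory/resource?start=0"));
// → ["directory", "resource"]
console.log(pathComponents("http://example.com/component1/component2"));
// → ["component1", "component2"]
```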

What's wrong with this regular expression to find URLs?

I'm writing some JavaScript to extract a URL from a Google search URL, like this:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get back looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!
Remove the parentheses from the regex unless you need to process the .org separately (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;
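What you're actually getting is an array of two results: the first is the whole match, and the second is the group you defined with the parentheses, (.org). When that array is converted to a string (for example by alert or string concatenation), its elements are joined with commas, which is where the trailing ,.org comes from.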
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
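A short sketch that reproduces the output from the question and shows the fix, using the Google URL from the question. It uses the + and escaped-dot version of the pattern, since the original [\w\d] without + would only capture a single character before .org.

```javascript
const theURL = "http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8";

// With a capturing group, exec returns [wholeMatch, group1].
const result = /[\w\d]+(\.org)/i.exec(theURL);
console.log(result[0]);      // "thisisthepartiwanttofind.org" (whole match)
console.log(result[1]);      // ".org" (capture group 1)
console.log(String(result)); // "thisisthepartiwanttofind.org,.org" (array joined with commas)

// Without a group there is only element 0, the whole match.
const found = /[\w\d]+\.org/i.exec(theURL);
console.log(found[0]);       // "thisisthepartiwanttofind.org"
```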
You should escape the . (dot) in the (.org) group; otherwise it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the URL in your example, you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$
