Partial Regexp Match in Javascript - javascript

I have a long regex that is generated to match URLs like
/^\/([^\/.?]+)(?:\/([^\/.?]+)(?:\/([^\/.?]+)(?:\.([^\/.?]+))?)?)?$/
Would match:
/foo/bar/1.html
as ['foo', 'bar', '1', 'html']
In Javascript I would like to get the parts that match as the user types the url (like a typeahead). For example if they typed:
/foo
It would tell me that /foo was matched but that the whole regexp hasn't been satisfied. Ruby can return an array containing only the partially matched elements, like ['foo', nil, nil, nil]. Is this possible, or easy to do, in JavaScript?

@minitech basically gave half the answer: put a ? after each group, and the regex will then match even when those groups are missing. Once you can do that, just check the groups in the regex result to see which parts have been matched and which haven't.
For example:
/^\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\.([^\/.?]+))?)?)?$/.exec('/ab/c')
Would return:
["/ab/c", "ab", "c", undefined, undefined]
By checking and seeing that the fourth value returned is undefined, you could then figure out which chunks were/were not entered.
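To make the group check concrete, here is a minimal sketch. The helper name matchTypedUrl is made up for illustration, and mapping undefined to null is just one way to mimic the Ruby-style result from the question.

```javascript
// Optional-group version of the URL pattern from the answer above.
var partialUrlPattern = /^\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\.([^\/.?]+))?)?)?$/;

// Hypothetical helper: returns the four captured parts (null where nothing
// has been typed yet), or null if the input cannot be the start of a valid URL.
function matchTypedUrl(input) {
  var result = partialUrlPattern.exec(input);
  if (result === null) {
    return null;
  }
  // Drop the whole-match entry and normalise undefined to null.
  return result.slice(1).map(function (part) {
    return part === undefined ? null : part;
  });
}

console.log(matchTypedUrl('/foo'));            // [ 'foo', null, null, null ]
console.log(matchTypedUrl('/foo/bar/1.html')); // [ 'foo', 'bar', '1', 'html' ]
console.log(matchTypedUrl('foo'));             // null
```

The first null in the returned array tells you how far the user has typed.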
As a side note, if you're going to be working with lots of regexes like this, you can easily lose your sanity just trying to keep track of which group is which. For this reason I strongly recommend using "named group" regular expressions. These are otherwise normal regular expressions that you can create if you use the XRegExp library (http://xregexp.com/), like so:
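// Group names are illustrative; the pattern is passed as a string, so the dot's backslash is doubled.
var urlPattern = XRegExp('^/(?<fooPart>[^/.?]+)?(?:/(?<barPart>[^/.?]+)?(?:/(?<idPart>[^/.?]+)?(?:\\.(?<extPart>[^/.?]+))?)?)?$');
var result = XRegExp.exec('/ab/c', urlPattern);
var fooPart = result.fooPart; // newer XRegExp releases may expose this as result.groups.fooPart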
That library also has other handy features, such as comments, that can similarly help keep regular expressions under control. If you're only using this one regex it's probably overkill, but if you're doing lots of JS regexp work I can't recommend that library enough.
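If you can target an ES2018 or later engine, named groups are also available natively, without a library. This is a different technique from the XRegExp one described above, shown only as a sketch; the group names are made up for illustration.

```javascript
// Native named capture groups (ES2018+). The group names are illustrative.
var namedUrlPattern = /^\/(?<fooPart>[^\/.?]+)?(?:\/(?<barPart>[^\/.?]+)?(?:\/(?<idPart>[^\/.?]+)?(?:\.(?<extPart>[^\/.?]+))?)?)?$/;

var namedResult = namedUrlPattern.exec('/ab/c');

// Named groups live on the .groups property of the match array.
console.log(namedResult.groups.fooPart); // 'ab'
console.log(namedResult.groups.barPart); // 'c'
console.log(namedResult.groups.idPart);  // undefined (not typed yet)
```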

Related

regex101 Conditional statement is always else

I am using http://www.regexr.com/ and https://regex101.com/ to learn regex
regex101's Quick reference shows
Conditional statement: (?(...)|)
If the given pattern matches, matches the pattern before the vertical bar. Otherwise, matches the pattern after the vertical bar.
I can't get it to work at all
/(^(?!no)if|else)/gm
no if else
yes if else
It looks like it's broken:
/(?:(yes)true|false)/g
yes true false
I need to return a single match using string.match so that I stay compatible with a third party. I don't have the option to do anything with the results myself, so I can't run multiple regexes or filter the results with JavaScript. What I would like to achieve is a regex that asks:
if the sentence starts with the word 'name' or the sentence contains '.classX' then return nothing, else return '.classA'
returning either [""] or [".classA "]
Is this possible at all, or am I completely wasting my time?
JavaScript does not support the full spectrum of regular expression features, which is why your conditionals are not working.
Take a look at the Mozilla docs for a complete list of supported features.
While regex is indeed very powerful, I'd recommend that you delegate the conditionals to JavaScript.
So for the example at the end of your post, you would just create a pattern that matches "name" at the start or ".classX" anywhere, and if match() returns a result (rather than null), you return "" in JavaScript. Otherwise, return ".classA".
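A minimal sketch of that delegation, assuming you do get to run your own code around the match. The function name pickClass is made up, and the \b after "name" is my reading of "starts with the word 'name'".

```javascript
// Hypothetical helper: the regex only detects the "blocked" cases,
// and plain JavaScript does the if/else.
function pickClass(sentence) {
  // Matches "name" as the first word, or ".classX" anywhere in the sentence.
  var blocked = /^name\b|\.classX/.test(sentence);
  return blocked ? '' : '.classA';
}

console.log(pickClass('name of the game'));   // ''
console.log(pickClass('some .classX thing')); // ''
console.log(pickClass('anything else'));      // '.classA'
```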

Split string into array between two characters?

So I'm trying to split a string in a somewhat irregular way. I tried to do something with regular expressions, but I don't know regex well enough to come up with something like that.
Basically I have a string that looks like this:
var links = "<a>4</a><b><c><d><e><f>";
And I want to use the .split() JS method to receive them in an array like so:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
So it's obvious that I need to split them on the >< characters, but if I do this:
links.split("><");
They are split the way I want, but I lose the > and < signs. Example:
["<a>4</a", "b", "c", "d", "e", "f>"]
This is not a good solution for me.
So basically my question is: is it possible to get the array I described using a regex or something else?
A quick and clean method would be to use match instead of split :
var matches = links.match(/<.+?>(?=<|$)/g)
For your string it gives
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Be careful: using regular expressions on a non-regular language like HTML is dangerous. It's often OK for simple things, but it's limited and you may encounter surprises, for example if you apply it to nested elements.
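As an illustration of the kind of surprise meant here, this sketch runs the same pattern on a made-up nested input. The outer element is no longer kept together with its content.

```javascript
var tagPattern = /<.+?>(?=<|$)/g;

// The flat string from the question works as expected.
var flat = "<a>4</a><b><c><d><e><f>".match(tagPattern);
console.log(flat); // [ '<a>4</a>', '<b>', '<c>', '<d>', '<e>', '<f>' ]

// A nested input (made up for this example) gets cut at the first "><".
var nested = "<a><b>x</b></a>".match(tagPattern);
console.log(nested); // [ '<a>', '<b>x</b>', '</a>' ]
```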
A quick'n dirty method would be to add a delimiter between the ><, with a regex replace, and then to split on that delimiter:
var links = "<a>4</a><b><c><d><e><f>";
links.replace(/></g, '>|<').split('|');
Result:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Just make sure the delimiter you choose doesn't occur in the string itself.
(You can use longer strings as delimiters, like: "|-|".)
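If you would rather stay with .split() and avoid a temporary delimiter, one more option is to split on the empty position between > and < using lookaround. This is only a sketch and it assumes an engine with lookbehind support (ES2018 or later); it is not one of the approaches given above.

```javascript
var links = "<a>4</a><b><c><d><e><f>";

// Split at every position that has ">" right before it and "<" right after it.
// Nothing is consumed, so both characters stay in the pieces.
var parts = links.split(/(?<=>)(?=<)/);

console.log(parts); // [ '<a>4</a>', '<b>', '<c>', '<d>', '<e>', '<f>' ]
```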

Regex if substring exist then match another part of the string

I am using the YUI3 library and am using a filter to match and replace parts of a URL.
Because the filter is not very flexible, I am only able to provide a regular expression for searching and a string for replacing the matches:
filter: {
searchExp : "-min\\.js",
replaceStr: "-debug.js"
}
In my case, I have a URL that looks like this:
http://site.com/assets/js?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css
I would like to match /assets/js if there are .css files. If the parameters contain a CSS file, then it will always only contain CSS files.
So far, I have written a small regex to check for the presence of .css at the very end:
.*\.css$
However, now, if we have a match, I would like to return /assets/js as the match. Is this something that is doable with regex?
Personally, I would rather this be done with a simple function and a simple if/else, but due to the limitations (I can only use regex), I need to find a regex solution to this.
This is a bit hacked together, but should do the job:
var t = new RegExp( "/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$" )
document.write( "http://site.com/assets/js?yui-3.9.0/widget-base/assets/skins/sam/widget-base.css&yui-3.9.0/cssbutton/cssbutton-min.css".replace( t, "/newthing/$1" ) );
Essentially it searches for /assets/js, followed by any characters, followed by .css at the end of the string. If the whole thing matches, it will replace it with the new text and include the captured text (from the first set of brackets) after it. Everything before /assets isn't part of the match, so it is left untouched and doesn't need to be included in the replacement.
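Here is a small sketch of how that pattern behaves on a CSS URL versus a JS URL. Both URLs are shortened, made-up variants of the one in the question, and it uses console.log instead of document.write so it also runs outside a browser.

```javascript
var cssOnly = new RegExp("/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$");

// Shortened example URLs (assumptions, modelled on the one in the question).
var cssUrl = "http://site.com/assets/js?yui-3.9.0/cssbutton/cssbutton-min.css";
var jsUrl = "http://site.com/assets/js?yui-3.9.0/widget-base/widget-base-min.js";

// Ends in .css: /assets/js is replaced and the rest is re-inserted via $1.
var cssResult = cssUrl.replace(cssOnly, "/newthing/$1");
console.log(cssResult); // http://site.com/newthing/?yui-3.9.0/cssbutton/cssbutton-min.css

// Does not end in .css: no match, so the string comes back unchanged.
var jsResult = jsUrl.replace(cssOnly, "/newthing/$1");
console.log(jsResult === jsUrl); // true
```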
I imagine your library uses replace internally, so those strings should work. Specifically,
"/assets/js(([^\\.]*\\.)*[^\\.]*\\.css)$"
"/newthing/$1"
I'm not quite sure what you want to do with the results, but this allows you to change the folder and add suffixes (as well as check for the presence of both tokens in the first place). To add a suffix change the replacement to this:
"/assets/js$1-mysuffix"

UnderscoreJS Interpolation Regex that supports both mustache and original?

I regret going with mustache-style underscore template interpolation because it conflicts with my Django templates.
I'd like to start using the default interpolation settings from now on without breaking existing code.
Can I get _ to respect two interpolation regexes without explicitly toggling between them?
The mustache regex: /\{\{(.+?)\}\}/g
I've tried matching the original + mustache to no success.
/(?:\{\{(.+?)\}\})|(?:\<\%\=(.+?)\%\>)/g
My shoddy regex skills prevent me from figuring out whether this is possible or not.
If you look at the _.template implementation, you'll see the root of your problem:
_.template = function(text, data, settings) {
//...
// Combine delimiters into one regular expression via alternation.
var matcher = new RegExp([
(settings.escape || noMatch).source,
(settings.interpolate || noMatch).source,
(settings.evaluate || noMatch).source
].join('|') + '|$', 'g');
//...
text.replace(matcher, function(match, escape, interpolate, evaluate, offset) {
So _.template expects each of the three template delimiter expressions to contain exactly one capture group; the noMatch placeholder is just /(.)^/ so it won't match anything but it still contains the requisite capture group. Your attempt contains two capture groups as indicated:
/(?:\{\{(.+?)\}\})|(?:\<\%\=(.+?)\%\>)/g
//      ^^^^^               ^^^^^
The second <%=...%> group is behind your troubles.
You can probably get away with this:
/(?:\{\{|<%=)(.+?)(?:%>|\}\})/g
But that will see things like <%= pancakes}} and {{pancakes %> as template expressions. I don't think you'll have to worry about things like that though.
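To check the single-capture-group behaviour without loading Underscore, this sketch runs the combined pattern through a plain replace and collects what the one group captures. The template text is made up. With Underscore itself you would presumably assign the pattern to _.templateSettings.interpolate.

```javascript
// One capture group shared by both delimiter styles.
var combined = /(?:\{\{|<%=)(.+?)(?:%>|\}\})/g;

// Made-up template that mixes both styles.
var template = 'Hello {{ name }}, you have <%= count %> messages';

var found = [];
template.replace(combined, function (match, expression) {
  // "expression" is always the first (and only) capture group.
  found.push(expression.trim());
  return match;
});

console.log(found); // [ 'name', 'count' ]
```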
That said, you should be able to automatically update your templates to your preferred style with some pretty simple regex work. Just send all your templates through your favorite tool's version of:
s/\{\{(.+?)\}\}/<%= $1 %>/g
In JavaScript you'd have:
// read your template into old_school
new_school = old_school.replace(/\{\{(.+?)\}\}/g, '<%= $1 %>');
// replace your template with the content of new_school
Then you wouldn't have to worry about the funky regex above or having two sets of delimiters.
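A slightly fuller sketch of that conversion, wrapped in a function so it can be run over each template. The function name is made up, and trimming the expression is my own addition to avoid doubled spaces in the output.

```javascript
// Hypothetical helper: rewrites {{ expr }} to <%= expr %>.
function toDefaultDelimiters(templateText) {
  return templateText.replace(/\{\{(.+?)\}\}/g, function (match, expression) {
    // Trim so "{{ name }}" and "{{name}}" produce the same output.
    return '<%= ' + expression.trim() + ' %>';
  });
}

console.log(toDefaultDelimiters('<p>{{ name }} / {{count}}</p>'));
// <p><%= name %> / <%= count %></p>
```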

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get returned looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!
Remove the parentheses in the regex if you do not need to process the .org separately (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;
What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
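This also explains where the comma comes from: when the result array is turned into a string, its elements are joined with commas. A short sketch using the URL from the question:

```javascript
var theURL = 'http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8';

var found = /[\w\d]+(\.org)/i.exec(theURL);

// Converting the whole array to a string joins [whole match, group 1] with a comma.
console.log(String(found)); // thisisthepartiwanttofind.org,.org

// Index 0 is just the whole match.
console.log(found[0]); // thisisthepartiwanttofind.org
```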
You should escape the . (dot) in the (.org) group, or it will match any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$
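Written as a JavaScript regex literal, the forward slashes in that second pattern need escaping. A rough sketch, with the WhatEverYouWant placeholder left out; note that the last group makes a path mandatory, so a bare domain does not match.

```javascript
// The stricter pattern as a JS literal (slashes escaped, placeholder TLD dropped).
var strictUrl = /^https?:\/\/([0-9a-zA-Z_\-]+\.)+(com|org|net)(\/[0-9a-zA-Z_\-?=&.]+)$/;

var googleUrl = 'http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8';

console.log(strictUrl.test(googleUrl));            // true
console.log(strictUrl.test('http://example.org')); // false (no path after the domain)
```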
