Why isn't this regex matching the expected way? - javascript

I'm trying to get rid of the slash character if it exists at the end of my string. I used the following expression, intending to match any characters other than a slash at the end of the line.
var str = "http://hazaa.com/blopp/";
str.match("[^/$]+", "g");
For some reason (surely logical and explainable, but not graspable to me on my own), I get the string split into three parts, as follows:
["http:", "hazaa.com", "blopp"]
What am I assuming wrongly?
How do I resolve it?

In str.match("[^/$]+", "g");, why put the dollar sign inside the brackets? Inside a character class, $ is just a literal dollar sign, not an end-of-string anchor. It belongs outside, namely str.match("[^/]+$");.
To remove all trailing slashes, you can use str.replace(/\/+$/, ""). (If you want to remove only the last trailing slash, drop the + from the replace() regex.)
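A small sketch contrasting the two placements of $ and the replace() call above; the sample strings are made up for illustration:

```javascript
// Inside a character class, "$" is a literal dollar sign, not an anchor,
// so the string is split at both "/" and "$".
const inside = "price$10/".match(/[^/$]+/g);

// Outside the class, "$" anchors the match to the end of the string.
const outside = "http://hazaa.com/blopp".match(/[^/]+$/);

// Removing one or more trailing slashes with replace():
const stripped = "http://hazaa.com/blopp//".replace(/\/+$/, "");

console.log(inside);     // ["price", "10"]
console.log(outside[0]); // "blopp"
console.log(stripped);   // "http://hazaa.com/blopp"
```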
Update:
One more way that doesn't use replace:
function stripEndingSlashes(str) {
var matched = str.match("(.*[^/]+)/*$");
return matched ? matched[1] : "";
}

The regexp matches "everything except a slash". That is why match() returns the parts of the string between slashes.
You can resolve it with the replace() function:
var str = "http://hazaa.com/blopp/";
//replace the last slash with an empty string to remove it
str = str.replace(/\/$/,'');
A regexp literal is always delimited by / characters, so here the regexp is:
\/ : a single slash character. To prevent JavaScript from interpreting the slash as the end of the regexp literal, it must be escaped with a backslash.
$ : this means the end of the string
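The escaping rule only applies to regex literals. As a sketch, the same trailing-slash pattern built with the RegExp constructor needs no backslash before the slash, because the pattern is then an ordinary string:

```javascript
// In a regex literal, "/" ends the literal, so a slash in the pattern is written "\/".
const literal = /\/$/;

// With the RegExp constructor the pattern is a plain string, so "/" needs no escape.
const fromString = new RegExp("/$");

const url = "http://hazaa.com/blopp/";
const viaLiteral = url.replace(literal, "");
const viaConstructor = url.replace(fromString, "");

console.log(viaLiteral);     // "http://hazaa.com/blopp"
console.log(viaConstructor); // "http://hazaa.com/blopp"
```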

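Your current regex matches the portion of the string up to the first / or literal $ character (inside [...], $ is not an anchor). The second argument is not standard: String.prototype.match takes a single argument, so standards-compliant engines ignore it (older versions of Firefox accepted it as a non-standard flags argument).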
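To see where a three-element result comes from, here is a sketch that puts the g flag on the regex literal itself instead of passing it as a second argument:

```javascript
const str = "http://hazaa.com/blopp/";

// Without the g flag, match() returns only the first match (plus capture groups).
const first = str.match(/[^/]+/);

// With the g flag on the regex, match() returns every match.
const all = str.match(/[^/]+/g);

console.log(first[0]); // "http:"
console.log(all);      // ["http:", "hazaa.com", "blopp"]
```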
To remove the trailing slash, use the String.replace function:
var str = "http://hazaa.com/blopp/";
str = str.replace(/\/$/, "");
console.log(str);
// "http://hazaa.com/blopp"
If you need to check whether a string ends with a slash, use the String.match method like this:
var str = "http://hazaa.com/blopp/";
var match = str.match(/\/$/);
console.log(match);
// null if string does not end with /
// ["/"] if string ends with a /
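If only a yes/no answer is needed, a boolean is often more convenient than the match array. A sketch of two options, RegExp.prototype.test and the ES2015 String.prototype.endsWith:

```javascript
const str = "http://hazaa.com/blopp/";

// test() returns true or false instead of an array or null.
const viaTest = /\/$/.test(str);

// endsWith() does the same check without a regex.
const viaEndsWith = str.endsWith("/");
const noSlash = "http://hazaa.com/blopp".endsWith("/");

console.log(viaTest, viaEndsWith, noSlash); // true true false
```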
If you need to grab everything except any trailing / characters, use this:
var r = /(.+?)\/*$/;
console.log("http://hazaa.com/blopp//".match(r)); // ["http://hazaa.com/blopp//", "http://hazaa.com/blopp"]
console.log("http://hazaa.com/blopp/".match(r)); // ["http://hazaa.com/blopp/", "http://hazaa.com/blopp"]
console.log("http://hazaa.com/bloppA".match(r)); // ["http://hazaa.com/bloppA", "http://hazaa.com/bloppA"]
The second element (index 1) of the returned array contains the desired portion of the URL. The regex works as follows:
(.+?) lazily (non-greedily) matches and captures one or more characters
\/*$ matches optional trailing slash(es) at the end of the string
The first part of the regex is intentionally lazy. If it were greedy, it would take the longest match for which the whole regex still matches, consuming the trailing / characters in the process. When lazy, it takes the shortest match for which the whole regex still matches.
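A minimal side-by-side of the greedy and lazy versions on the same input shows the difference described above:

```javascript
const url = "http://hazaa.com/blopp//";

// Greedy: (.+) grabs the whole string, and \/* can then match zero slashes before $.
const greedy = url.match(/(.+)\/*$/)[1];

// Lazy: (.+?) stops as soon as the rest of the pattern (\/*$) can match.
const lazy = url.match(/(.+?)\/*$/)[1];

console.log(greedy); // "http://hazaa.com/blopp//"
console.log(lazy);   // "http://hazaa.com/blopp"
```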

Why use a regex? Just check whether the last character of the string is a slash and then slice it off, like this:
if (str.slice(-1) === '/') {
str = str.slice(0, -1);
}

Related

Javascript regular expression to extract characters from mid string with optional end character

I would like to extract characters from mid string with optional end character. If the optional end character is not found, extract until end of string. The first characters are S= and the last optional character is &.
Example #1:
"rilaS=testingabc"
should extract:
"testingabc"
Example #2:
"rilaS=testing123&thistest"
should extract:
"testing123"
This is what I have so far (Javascript):
var Str = "rilaS=testing123&thistest";
var tmpStr = Str.match("S=(.*)[\&]{0,1}");
var newStr = tmpStr[1];
alert(newStr);
But it does not detect that the match should end at the ampersand (if there is one). Thanks in advance.
Answer (By ggorlen)
var Str = "rilaS=testing123&thistest";
var tmpStr = Str.match("S=([^&]*)");
var newStr = tmpStr[1];
alert(newStr);
You may use /S=([^&]*)/ to grab everything after S= up to the first & or the end of the string:
["rilaS=testingabc", "rilaS=testing123&thistest"].forEach(s =>
console.log(s.match(/S=([^&]*)/)[1])
);
Just in case you are wondering why your original regex didn't work: the (.*) pattern is greedy, meaning it will happily slurp up anything, including &, and leave nothing for later items to match. Because the & is optional ({0,1}), the regex still succeeds without it. This is why you want "not &": [^&]* matches up to, but not including, the &.
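A sketch comparing the question's greedy pattern, the negated-class fix, and a lazy alternative that stops at "& or end of string" (the lazy variant is an extra option, not part of the answer above):

```javascript
const s = "rilaS=testing123&thistest";

// Original idea: .* is greedy and the & is optional, so nothing stops .* early.
const greedy = s.match(/S=(.*)&?/)[1];

// Negated class: [^&]* cannot cross an &, so it stops right before it.
const negated = s.match(/S=([^&]*)/)[1];

// Alternative: a lazy .*? followed by "& or end of string".
const lazy = s.match(/S=(.*?)(?:&|$)/)[1];

// Without any &, the negated class runs to the end of the string.
const noAmp = "rilaS=testingabc".match(/S=([^&]*)/)[1];

console.log(greedy);  // "testing123&thistest"
console.log(negated); // "testing123"
console.log(lazy);    // "testing123"
console.log(noAmp);   // "testingabc"
```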

regex replace only a part of the match

I want to replace only a part of the string of a regex pattern match. I found this answer but I don't get it...
How do I use substitution?
Example of what I want: keep the first slug digit, only replace others
/09/small_image/09x/ > /09/thumbnail/
1st: unknown digit
2nd: "small_image"
3rd: unknown digit + "x"
Here is what I have so far:
var regexPattern = /\/\d\/small\_image\/\d*x/;
var regexPattern = /\/\d\/(small\_image\/\d*x)$1/; ??
var result = regexPattern.test(str);
if (result) {
str = str.replace(regexPattern, 'thumbnail');
}
var input = "/09/small_image/09x/";
var output = input.replace(/(\/\d+\/)small_image\/\d*x/, "$1thumbnail");
console.log(output);
Explanation:
Put the part you want to keep in parentheses, then refer to that as $1 in the replacement string - don't put $1 in your regex. So (\/\d+\/) means to match a forward slash followed by one or more digits, followed by another forward slash.
(Note that you don't need to escape underscores in a regex.)
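String.prototype.replace also accepts a function instead of a replacement string. As a sketch, the same substitution written with a callback, where the capture group arrives as an argument (the parameter names are arbitrary):

```javascript
const input = "/09/small_image/09x/";

// The callback receives the whole match followed by each capture group;
// its return value replaces the matched text.
const output = input.replace(/(\/\d+\/)small_image\/\d*x/, (match, prefix) => prefix + "thumbnail");

console.log(output); // "/09/thumbnail/"
```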
Go with
var regexPattern = /(\/\d+\/)small\_image\/\d*x/;
and
str = str.replace(regexPattern, '$1thumbnail');
First, you were missing the +. Because 09 is two digits, the regexp needs to match one or more digits (\d alone matches exactly one). This is done with \d+.
Second, everything you match is replaced. To keep the /09/ part, remember it by putting it in parentheses (...) in the regexp, then reference it in the replacement via $1.
You can also create more groups and reference them as $2, $3, and so on.
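As an illustration of a second group, here is a sketch that also keeps the digits before the x. The output format "/09/thumbnail/09/" is hypothetical, chosen only to show $2 in use:

```javascript
const input = "/09/small_image/09x/";

// $1 is the leading "/09/", $2 is the digits captured before "x".
const output = input.replace(/(\/\d+\/)small_image\/(\d*)x/, "$1thumbnail/$2");

console.log(output); // "/09/thumbnail/09/"
```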

regex preceded by two or more special characters

I am stuck creating a regex such that, if the word is preceded or followed by more than one special character on either side, the regex exec method returns null. Only if the word is wrapped in exactly one bracket on each side should exec return a result. Below is the regular expression I have come up with.
If the string is like "(test)", then regex.exec should return a match; for other combinations such as "((test))", "((test)", or "(test))" it should return null. The code below does not return null when it should. Please suggest a fix.
var w1 = "\(test\)";
alert(new RegExp('(^|[' + '\(\)' + '])(' + w1 + ')(?=[' + '\(\)' + ']|$)', 'g').exec("this is ((test))"))
If you have a list of words and want to filter them, you can do the following.
string.split(' ').filter(function(word) {
return !(/^[!##$%^&*()]{2,}.+/).test(word) && !(/[!##$%^&*()]{2,}$/).test(word);
});
The split() function splits a string at a space character and returns an array of words, which we can then filter.
To keep the valid words, we test two regular expressions: one checks whether the word starts with two or more special characters, the other whether it ends with them. A word is kept only if neither matches.
RegEx Breakdown
^ - Anchors the match at the start of the string
[] - Matches a single character from the set inside the brackets
!##$%^&*() - These are the special characters I used. Replace them with the ones you want.
{2,} - Matches 2 or more of the preceding token (here, the character class)
.+ - Matches 1 or more of any character
$ - Anchors the match at the end of the string
To use the exec method instead, do this:
!(/^[!##$%^&*()]{2,}.+/).exec(string) && !(/[!##$%^&*()]{2,}$/).exec(string)
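A self-contained sketch of the filter, run on the question's examples. It assumes the same special-character set (with the duplicate # dropped) and keeps a word only when neither test matches, which is why the two checks are combined with &&:

```javascript
// The set of "special" characters is an assumption; adjust as needed.
const startsWithTwo = /^[!#$%^&*()]{2,}/;
const endsWithTwo = /[!#$%^&*()]{2,}$/;

// Keep a word only if it neither starts nor ends with two or more special characters.
function isValidWord(word) {
  return !startsWithTwo.test(word) && !endsWithTwo.test(word);
}

const words = "this is (test) ((test)) ((test) (test))".split(" ");
const kept = words.filter(isValidWord);

console.log(kept); // ["this", "is", "(test)"]
```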
If I understand correctly, you are looking for any string that contains (test) anywhere in it, wrapped in exactly one pair of parentheses, right?
In that case, what you probably need is the following:
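var regExp = /.*[^(]\(test\)[^)].*/;
alert(regExp.exec("this is ((test))")); // → null
alert(regExp.exec("this is (test))" )); // → null
alert(regExp.exec("this is ((test)" )); // → null
alert(regExp.exec("this is (test) ...")); // → ["this is (test) ..."]
Explanation:
.* matches any character (except newline) between zero and unlimited times, as many times as possible.
[^(] matches a single character other than the literal character (, so there cannot be a second opening parenthesis right before (test).
[^)] matches a single character other than the literal character ), so there cannot be a second closing parenthesis right after (test).
This makes sure your test string appears in the given string wrapped in exactly one parenthesis on each side. Note that it also requires at least one other character before and after (test).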
You can use the following regex:
(^|[^(])(\(test\))(?!\))
Replace matches with $1<span style="new">$2</span>.
The regex features an alternation group (^|[^(]) that matches either the start of the string ^ or any character other than (. This alternation is a workaround for the lack of lookbehind support in older JS regex engines (lookbehind assertions were only added in ES2018).
Then, (\(test\)) matches and captures (test). Note that the round brackets are escaped; otherwise they would be treated as capturing group delimiters.
The (?!\)) is a lookahead that makes sure there is no literal ) right after test). Lookaheads are fully supported by the JS regex engine.
A JS snippet:
var re = /(^|[^(])(\(test\))(?!\))/gi;
var str = 'this is (test)\nthis is ((test))\nthis is ((test)\nthis is (test))\nthis is ((test\nthis is test))';
var subst = '$1<span style="new">$2</span>';
var result = str.replace(re, subst);
alert(result);
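On an engine with ES2018 lookbehind support (an assumption; older engines reject this syntax), the alternation workaround can be replaced by a negative lookbehind. A sketch, using $& to reinsert the whole match:

```javascript
// (?<!\() rejects a "(" right before the match; (?!\)) rejects a ")" right after it.
const re = /(?<!\()\(test\)(?!\))/g;

const lines = ["this is (test)", "this is ((test))", "this is ((test)", "this is (test))"];
const results = lines.map(line => line.replace(re, '<span style="new">$&</span>'));

console.log(results);
```

Only the first line is changed; the three double-parenthesis variants come back untouched.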

Javascript reg exp not right

Here is a string, str = '.js("aaa").js("bbb").js("ccc")'. I want to write a regular expression that returns an array like this:
[aaa, bbb, ccc];
My regular expression is:
var jsReg = /.js\(['"](.*)['"]\)/g;
var jsAssets = [];
var js;
while ((js = jsReg.exec(find)) !== null) {
jsAssets.push(js[1]);
}
But the jsAssets result is
[""aaa").js("bbb").js("ccc""]
What's wrong with this regular expression?
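Use the lazy version of .* (note the ? after it):
/\.js\(['"](.*?)['"]\)/g
It would also be better to escape the first dot, as in the pattern above; an unescaped . matches any character.
This matches as few characters as possible before the next closing quote and parenthesis.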
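A runnable sketch of the lazy pattern in the question's exec loop, plus the ES2020 matchAll equivalent. It assumes the input lives in str (the question's loop reads from find, which is not defined in the snippet):

```javascript
const str = '.js("aaa").js("bbb").js("ccc")';
const jsReg = /\.js\(['"](.*?)['"]\)/g;

// exec() loop: the g flag makes each call resume from jsReg.lastIndex.
const viaExec = [];
let m;
while ((m = jsReg.exec(str)) !== null) {
  viaExec.push(m[1]);
}

// matchAll() returns an iterator of match arrays for a global regex.
const viaMatchAll = Array.from(str.matchAll(jsReg), match => match[1]);

console.log(viaExec);     // ["aaa", "bbb", "ccc"]
console.log(viaMatchAll); // ["aaa", "bbb", "ccc"]
```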
If you want to allow escaped quotes, use something like this:
/\.js\(['"]((?:\\['"]|[^"])+)['"]\)/g
I believe it can be done in a one-liner with replace() and match() calls:
var str = '.js("aaa").js("bbb").js("ccc")';
str.replace(/[^(]*\("([^"]*)"\)[^(]*/g, '$1,').match(/[^,]+/g);
//=> ["aaa", "bbb", "ccc"]
The problem is that you are using .*, which matches any character, including the quotes and parentheses you want to stop at. You'll have to be more specific about what you are trying to capture.
If it will only ever be word characters you could use \w which matches any word character. This includes [a-zA-Z0-9_]: uppercase, lowercase, numbers and an underscore.
So your regex would look something like this :
var jsReg = /js\(['"](\w*)['"]\)/g;
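One caveat worth checking with \w: it only covers [A-Za-z0-9_], so a name containing "-" or "." is silently skipped rather than reported. A sketch with a made-up input (uses ES2020 matchAll):

```javascript
const jsReg = /js\(['"](\w*)['"]\)/g;

// "jquery-min" contains "-", which \w does not match, so that call is skipped.
const str = '.js("aaa").js("jquery-min").js("ccc")';
const names = Array.from(str.matchAll(jsReg), m => m[1]);

console.log(names); // ["aaa", "ccc"]
```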
In
/.js\(['"](.*)['"]\)/g
the .* matches as much as possible and does not stop at the first closing quote, so given your example input the ['"](.*)['"] part matches
"aaa").js("bbb").js("ccc"
which is why you get a single, long capture.
Try
/\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/
To break this down,
\. matches a literal dot
\.js\( matches the literal string ".js("
( starts to capture the string.
[^\\']|\\. matches a character other than a quote or backslash, or a backslash followed by any character except a line terminator.
(?:[^\\']|\\.)* matches the body of a string
'(?:[^\\']|\\.)*' matches a single-quoted string (the double-quoted alternative works the same way with ")
(...|...) captures a single quoted or double quoted string
)\) closes the capturing group and matches a literal close parenthesis
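A sketch exercising the pattern (with the g flag added for matchAll) on an input containing escaped quotes; String.raw keeps the backslashes in the sample literally. Group 1 includes the surrounding quotes:

```javascript
const re = /\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/g;

// The second and third calls contain a backslash-escaped quote inside the string.
const src = String.raw`.js("aaa").js('b\'b').js("c\"c")`;
const found = Array.from(src.matchAll(re), m => m[1]);

console.log(found); // ['"aaa"', "'b\\'b'", '"c\\"c"']
```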
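The second thing to check is your loop. A while loop over exec() depends on the g modifier: a global regex stores its position in lastIndex, so each call continues after the previous match and exec() returns null once there are no more matches.
Without g, exec() returns the first match every time and the loop never ends, so keep the g modifier (add it to the pattern above if you use it in the loop).
Also make sure the loop searches the right variable: your loop reads from find rather than str.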
Try this one - http://jsfiddle.net/UDYAq/
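var str = '.js("aaa").js("bbb").js("ccc")';
var regex = /\.js\(\"(.*?)\"\){1,}/gi;
// match() with the g flag returns every full match, or null if there is none
var result = str.match(regex) || [];
// replace each full match with the text between its quotes
for (var i = 0; i < result.length; i++) {
result[i] = result[i].match(/\"(.*?)\"/i)[1];
}
console.log(result);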
To make sure the captured text is surrounded by the same kind of quote on both sides, use a backreference:
/\.js\((['"])(.*?)\1\)/g
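A sketch with a deliberately mismatched input (made up for illustration) shows what the backreference changes: \1 must repeat whichever quote group 1 captured, so the capture is in group 2.

```javascript
const str = `.js("a').js('b')`;

// Any quote may close the string, so "a' is accepted.
const loose = /\.js\(['"](.*?)['"]\)/g;
// \1 must be the same quote character that opened the string.
const strict = /\.js\((['"])(.*?)\1\)/g;

const viaLoose = Array.from(str.matchAll(loose), m => m[1]);
const viaStrict = Array.from(str.matchAll(strict), m => m[2]);

console.log(viaLoose);  // ["a", "b"]
console.log(viaStrict); // ["b"]
```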

JavaScript regexp not matching

I am having a difficult time getting a seemingly simple regexp to work. I am trying to grab the last occurrence of word characters between square brackets in a string. My code:
pattern = /\[(\w+)\]/g;
var text = "item[gemstones_attributes][0][shape]";
if (pattern.test(text)) {
alert(RegExp.lastMatch);
}
The above code is outputting "gemstones_attributes", when I want it to output "shape". Why is this regexp not working, or is there something wrong with my approach to getting the last match? I'm sure that I am making an obvious mistake; regular expressions have never been my strong suit.
Edit:
There are cases in which the string will not terminate with a right-bracket.
You can greedily match as much as possible before your pattern, so that your group captures only the last match:
var pattern = /.*\[(\w+)\]/g;
var text = "item[gemstones_attributes][0][shape]";
var match = pattern.exec(text);
if (match != null) alert(match[1]);
RegExp.lastMatch gives the most recent match made by a regular expression; it isn't the last match in the text.
Regular expressions are matched from left to right, so your regexp matches at the first '[' it sees and captures the word inside it. When you then read lastMatch, it gives you that match. What you need is to first match everything you can with .* (which is greedy) and then your pattern.
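A sketch of the greedy-prefix approach next to an alternative that collects every bracketed word (ES2020 matchAll) and takes the final one:

```javascript
const text = "item[gemstones_attributes][0][shape]";

// Greedy .* first consumes as much as possible, so the group captures the LAST [word].
const viaGreedy = text.match(/.*\[(\w+)\]/)[1];

// Alternative: collect all captures and take the final one (undefined if there are none).
const words = Array.from(text.matchAll(/\[(\w+)\]/g), m => m[1]);
const viaMatchAll = words[words.length - 1];

console.log(viaGreedy);   // "shape"
console.log(viaMatchAll); // "shape"
```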
I think your problem is in your regex, not in your .lastMatch line.
Your regex returns just the first bracketed match, not all matches. You can try adding some groups to your regular expression; normally you should then get all matches.
krikit
Use match() instead of test()
if (text.match(pattern))
test() only checks whether there is a match inside a string. It succeeds after the first occurrence, so there is no need for further parsing.
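A sketch of what each call returns with the question's global pattern: match() gives every full match, while test() on a global regex is stateful and records where its match ended in lastIndex:

```javascript
const pattern = /\[(\w+)\]/g;
const text = "item[gemstones_attributes][0][shape]";

// With the g flag, match() returns every full match (brackets included)
// and leaves pattern.lastIndex at 0 when it is done.
const all = text.match(pattern);

// test() stops at the first match and stores the position right after it.
const firstTest = pattern.test(text);
const afterFirst = pattern.lastIndex;

console.log(all);                    // ["[gemstones_attributes]", "[0]", "[shape]"]
console.log(firstTest, afterFirst);  // true 26
```

Because of that stored position, calling test() again on the same global regex resumes from index 26 rather than from the start.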
