Javascript reg exp not right - javascript

Here is a string str = '.js("aaa").js("bbb").js("ccc")', I want to write a regular expression to return an Array like this:
[aaa, bbb, ccc];
My regular expression is:
var jsReg = /.js\(['"](.*)['"]\)/g;
var jsAssets = [];
var js;
while ((js = jsReg.exec(str)) !== null) {
  jsAssets.push(js[1]);
}
But the jsAssets result is
[""aaa").js("bbb").js("ccc""]
What's wrong with this regular expression?

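Use the lazy version of .*, which is .*? (note the added ?):
/\.js\(['"](.*?)['"]\)/g
The lazy quantifier matches the fewest characters possible up to the next quote, instead of running on to the last one.
It is also better to escape the first dot, so that it matches a literal dot rather than any character.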
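To see the difference between the two quantifiers, here is a minimal sketch that runs a greedy and a lazy pattern over the string from the question. The helper name collectGroups is made up for this illustration.

```javascript
// Hypothetical helper: collect capture group 1 from every match of a global regex.
function collectGroups(regex, input) {
  var found = [];
  var m;
  regex.lastIndex = 0; // always start from the beginning of the input
  while ((m = regex.exec(input)) !== null) {
    found.push(m[1]);
  }
  return found;
}

var str = '.js("aaa").js("bbb").js("ccc")';

// Greedy: (.*) runs on to the last quote it can reach, so there is one long match.
var greedy = collectGroups(/\.js\(['"](.*)['"]\)/g, str);

// Lazy: (.*?) stops at the first quote that lets the rest of the pattern match.
var lazy = collectGroups(/\.js\(['"](.*?)['"]\)/g, str);

console.log(greedy); // one element: aaa").js("bbb").js("ccc
console.log(lazy);   // three elements: aaa, bbb, ccc
```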
If you want to allow escaped quotes, use something like this:
/\.js\(['"]((?:\\['"]|[^"])+)['"]\)/g

I believe it can be done in a one-liner with replace and match method calls:
var str = '.js("aaa").js("bbb").js("ccc")';
str.replace(/[^(]*\("([^"]*)"\)[^(]*/g, '$1,').match(/[^,]+/g);
//=> ["aaa", "bbb", "ccc"]

The problem is that you are using .*, which is greedy and matches any character, including the quotes and parentheses you want to stop at. You'll have to be a bit more specific about what you are trying to capture.
If it will only ever be word characters, you could use \w, which matches any word character, i.e. [a-zA-Z0-9_]: uppercase and lowercase letters, digits and the underscore.
So your regex would look something like this:
var jsReg = /js\(['"](\w*)['"]\)/g;

In
/.js\(['"](.*)['"]\)/g
the (.*) matches as much as possible and captures it as group 1, so given your example input group 1 ends up as
aaa").js("bbb").js("ccc
Try
/\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/g
To break this down,
\. matches a literal dot
\.js\( matches the literal string ".js("
( starts to capture the string.
[^\\']|\\. matches a character other than a quote or backslash, or an escaped character other than a line terminator.
(?:[^\\']|\\.)* matches the body of a string
'(?:[^\\']|\\.)*' matches a single quoted string
(...|...) captures a single quoted or double quoted string, including its quotes
)\) closes the capturing group and matches a literal close parenthesis
Your loop itself is fine. Calling exec repeatedly on a regex with the g modifier is how you walk through successive matches, because each call resumes where the previous match ended.
Keep the g modifier: without it, exec returns the first match every time and the loop never ends.
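For reference, here is a small sketch of how exec behaves in a loop like the one in the question, with and without the g flag. A global regex stores the position just past its last match in lastIndex and resumes from there on the next call. The variable names and the shortened input are only for illustration.

```javascript
var input = '.js("aaa").js("bbb")';

// With g: the regex keeps its position in lastIndex between calls.
var globalRe = /\.js\("([^"]*)"\)/g;
var values = [];
var positions = [];
var m;
while ((m = globalRe.exec(input)) !== null) {
  values.push(m[1]);
  positions.push(globalRe.lastIndex); // index just past the current match
}

// Without g: every call starts at index 0 and finds the same first match,
// so a while loop like the one above would never terminate.
var plainRe = /\.js\("([^"]*)"\)/;
var first = plainRe.exec(input)[1];
var again = plainRe.exec(input)[1];

console.log(values, positions, first, again);
```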

Try this one - http://jsfiddle.net/UDYAq/
var str = '.js("aaa").js("bbb").js("ccc")';
var regex = /\.js\("(.*?)"\)/g;
// With the g flag, match() returns the full matches, e.g. '.js("aaa")'
var result = str.match(regex) || [];
for (var i = 0; i < result.length; i++) {
  // Pull the quoted part out of each full match
  result[i] = result[i].match(/"(.*?)"/)[1];
}
console.log(result);

To be sure that matched characters are surrounded by the same quotes:
/\.js\((['"])(.*?)\1\)/g
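A short sketch of the backreference version in use. Because group 1 now holds the opening quote, the captured text moves to group 2. The input string, including the deliberately mismatched last call, is invented for this example.

```javascript
// Input: .js('a"b').js("ccc").js('bad")
var input = '.js(\'a"b\').js("ccc").js(\'bad")';

// Group 1 = opening quote, group 2 = contents, \1 = the same quote again.
var sameQuote = /\.js\((['"])(.*?)\1\)/g;

var names = [];
var m;
while ((m = sameQuote.exec(input)) !== null) {
  names.push(m[2]); // group 2, not group 1
}

// The last call opens with ' and closes with ", so it is skipped.
console.log(names);
```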

Related

Splitting string by regular expression

I have the following code snippet:
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(/[^\,]+/);
alert(colors3); // ["", ",", ",", ",", ""]
I don't understand what's going on here. As far as I understand, the regular expression will match any commas at the beginning of a string, and it matches one or more of them. What happens when we provide this regular expression as the argument to split? Surely, if we just tried to match the regex against colorText, we'd get no match, because the starting character is not a comma. So how does the regex provided to split lead to an array of commas with an empty string on each side?
Why do you need a regex when you can simply do split(',') ?
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(',');
console.log(colors3);
If you want to select everything but the comma then maybe using match is a better idea.
var colorText = ",red,blue,green,yellow";
var colors3 = colorText.match(/[^\,]+/g);
console.log(colors3);
As explained in the MDN web docs for [^xyz]:
A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets.
Your regex /[^\,]+/ will match any sequence of characters that doesn't include any comma.
So your regex will match these sequences in colorText:
red
blue
green
yellow
and the split function will split colorText at those sequences.
However, if you want to split your string at each comma, use this:
colors = colorText.split(',');
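The three behaviours discussed here can be compared side by side. This is only a sketch using the string from the question; the variable names are arbitrary.

```javascript
var colorText = "red,blue,green,yellow";

// Separator = runs of non-comma characters, so the words are cut away
// and only the text between them (the commas) is left.
var byWords = colorText.split(/[^,]+/);

// Separator = a comma, so the words are what remains.
var byComma = colorText.split(",");

// match with the g flag returns the matched words directly.
var matched = colorText.match(/[^,]+/g);

console.log(byWords); // ["", ",", ",", ",", ""]
console.log(byComma); // ["red", "blue", "green", "yellow"]
console.log(matched); // ["red", "blue", "green", "yellow"]
```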
If you would like to avoid empty items when splitting, you could use String#match instead of String#split with a regular expression that matches all characters except commas.
var regex = /[^,]+/g;
console.log(",red,blue,green,yellow,".match(regex));
console.log("red,blue,green,yellow".match(regex));
So, my goal was not to separate the words in the string by comma. I found this code in a book and wanted to understand it. The mistake I made was thinking that ^ matches the beginning of a string, while inside square brackets it means "anything but". Now I understand that the regular expression matches one or more characters that are not commas, so split() treats each word as a separator and returns what lies between the words: the commas. The first and last elements are empty strings because that is what sits to the left of the first word and to the right of the last word.
You have to remove that caret ^ from var colors3 = colorText.split(/[^\,]+/); so that it works well:
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(/[\,]+/);
console.log(colors3);

regex replace only a part of the match

I want to replace only a part of the string of a regex pattern match. I found this answer but I don't get it...
How do I use substitution?
Example of what I want: keep the first slug digit, only replace others
/09/small_image/09x/ > /09/thumbnail/
1st: unknown digit
2nd: "small_image"
3rd: unknown digit + "x"
Here is what I have so far:
var regexPattern = /\/\d\/small\_image\/\d*x/;
var regexPattern = /\/\d\/(small\_image\/\d*x)$1/; ??
var result = regexPattern.test(str);
if (result) {
  str = str.replace(regexPattern, 'thumbnail');
}
var input = "/09/small_image/09x/";
var output = input.replace(/(\/\d+\/)small_image\/\d*x/, "$1thumbnail");
console.log(output);
Explanation:
Put the part you want to keep in parentheses, then refer to that as $1 in the replacement string - don't put $1 in your regex. So (\/\d+\/) means to match a forward slash followed by one or more digits, followed by another forward slash.
(Note that you don't need to escape underscores in a regex.)
Go with
var regexPattern = /(\/\d+\/)small\_image\/\d*x/;
and
str = str.replace(regexPattern, '$1thumbnail');
First, you were missing the +. Because 09 is two digits, the regexp needs to match one or more digits (\d alone would match exactly one). This is accomplished by \d+.
Second, everything you match is removed by the replacement. To get the /09/ part back afterwards, you have to remember it by wrapping it in parentheses (...) in the regexp and then reference it in the replacement via $1.
One could as well create other groups and reference them by $2,$3 ...
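As a sketch of using more than one group, the pattern below also captures the digits before the x and reuses them as $2. The second target format, /09/thumbnail/09/, is invented here purely to show $2 and is not something the question asked for.

```javascript
var input = "/09/small_image/09x/";

// Group 1: slash, digits, slash. Group 2: the digits in front of "x".
var pattern = /(\/\d+\/)small_image\/(\d*)x/;

// Keep only group 1, as in the answers above.
var keepFirst = input.replace(pattern, "$1thumbnail");

// Keep both groups (hypothetical target format).
var keepBoth = input.replace(pattern, "$1thumbnail/$2");

console.log(keepFirst); // "/09/thumbnail/"
console.log(keepBoth);  // "/09/thumbnail/09/"
```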

regular expression to match special characters between delimiters

I have a basic string and would like to get only specific characters between the brackets.
Base string: This is a test string [more or less]
A regex to capture all r's and e's works just fine:
(r|e)
=> This is a test string [more or less]
Now I want to combine my regex with the following one so that it gives only the r's and e's between the brackets, but unfortunately this doesn't work:
\[(r|e)\]
Expected result should be : more or less
can someone explain?
Edit: the problem is very similar to this one: Regular Expression to find a string included between two characters while EXCLUDING the delimiters
but with the difference that I don't want to get the whole string between the brackets.
Follow up problem
base string = 'this is a link:/en/test/äpfel/öhr[MyLink_with_äöü] BREAK äöü is now allowed'
I need a regex for finding the non-ascii characters äöü in order to replace them but only in the link:...] substring which starts with the word link: and ends with a ] char.
The result string will look like this:
result string = 'this is a link:/en/test/apfel/ohr[MyLink_with_aou] BREAK äöü is now allowed again'
The regex /[äöü]+(?=[^\]\[]*])/g from the solution in the comments only delivers the äöü chars between the two brackets.
I know that there is a lookahead with a character list in the regex, but I wonder why this one does not work:
/link:([äöü]+(?=[^\]\[]*])/
thanks
You can use the following solution: match all between link: and ], and replace your characters only inside the matched substrings inside a replace callback method:
var hashmap = {"ä":"a", "ö":"o", "ü":"u"};
var s = 'this is a link:/en/test/äpfel/öhr[MyLink_with_äöü] BREAK äöü is now allowed';
var res = s.replace(/\blink:[^\]]*/g, function(m) { // m = link:/en/test/äpfel/öhr[MyLink_with_äöü (the closing ] is not part of the match)
  return m.replace(/[äöü]/g, function(n) { // n = ä, then ö, then ü
    return hashmap[n]; // each time replaced with the hashmap value
  });
});
console.log(res);
Pattern details:
\b - a leading word boundary
link: - whole word link with a : after it
[^\]]* - zero or more chars other than ] (a [^...] is a negated character class that matches any char/char range(s) but the ones defined inside it).
Also, see Efficiently replace all accented characters in a string?

regex preceded by two or more special characters

I am stuck creating a regex such that, if the word is preceded or followed by more than one special character on either side, the regex exec method returns null. Only if the word is wrapped in exactly one bracket on each side should exec give a result. Below is the regular expression I have come up with.
Only for a string like "(test)" should regex.exec return a value; for other combinations such as "((test))", "((test)" or "(test))" it should return null. The code below does not return null when it should. Please suggest a fix.
var w1 = "\(test\)";
alert(new RegExp('(^|[' + '\(\)' + '])(' + w1 + ')(?=[' + '\(\)' + ']|$)', 'g').exec("this is ((test))"))
If you have a list of words and want to filter them, you can do the following.
string.split(' ').filter(function(word) {
  // keep the word only if it neither starts nor ends with 2+ special characters
  return !(/^[!##$%^&*()]{2,}.+/).test(word) && !(/[!##$%^&*()]{2,}$/).test(word);
});
The split() function splits a string at each space character and returns an array of words, which we can then filter.
To keep the valid words, we test two regular expressions to see whether the word starts or ends with 2 or more special characters, and keep it only if neither matches.
RegEx Breakdown
^ - The match must start at the beginning of the string
[] - A single character from the set inside the brackets
!##$%^&*() - These are the special characters I used. Replace them with the ones you want.
{2,} - Matches 2 or more of the preceding character set
.+ - Matches 1 or more of any character
$ - The match must end at the end of the string
To use the exec function this way, do this:
!(/^[!##$%^&*()]{2,}.+/).exec(string) && !(/[!##$%^&*()]{2,}$/).exec(string)
If I understand correctly, you are looking for any string which contains (test) anywhere in it, wrapped in exactly one pair of parentheses, right?
In that case, what you probably need is the following:
var regExp = /.*[^(]\(test\)[^)].*/;
alert(regExp.exec("this is ((test))")); // → null
alert(regExp.exec("this is (test))" )); // → null
alert(regExp.exec("this is ((test)" )); // → null
alert(regExp.exec("this is (test) ...")); // → ["this is (test) ..."]
Explanation:
.* matches any character (except newline) between zero and unlimited times, as many times as possible.
[^(] matches a single character that is not a literal (
[^)] matches a single character that is not a literal )
This makes sure your test string is in the given string, but only ever wrapped with one parenthesis on each side. Note that both classes consume a character, so (test) must have at least one character before and after it.
You can use the following regex:
(^|[^(])(\(test\))(?!\))
See regex demo here, replace with $1<span style="new">$2</span>.
The regex features an alternation group (^|[^(]) that matches either the start of the string ^ or any character other than (. This alternation is a workaround for JS regex engines that do not support look-behinds (they were only added in ES2018).
Then, (\(test\)) matches and captures (test). Note the round brackets are escaped. If they were not, they would be treated as capturing group delimiters.
The (?!\)) is a look-ahead that makes sure there is no literal ) right after test). Look-aheads are fully supported by the JS regex engine.
A JS snippet:
var re = /(^|[^(])(\(test\))(?!\))/gi;
var str = 'this is (test)\nthis is ((test))\nthis is ((test)\nthis is (test))\nthis is ((test\nthis is test))';
var subst = '$1<span style="new">$2</span>';
var result = str.replace(re, subst);
alert(result);
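To check the four cases from the question, here is a sketch that runs the pattern (^|[^(])(\(test\))(?!\)) without the g flag. The second pattern is an assumed alternative for engines that support look-behind assertions (ES2018 and later); it is not part of the answers in this thread.

```javascript
var cases = [
  "this is (test)",
  "this is ((test))",
  "this is ((test)",
  "this is (test))"
];

// Group 1 is the character before "(test)" (or the empty string at the start),
// group 2 is "(test)" itself.
var withGroup = /(^|[^(])(\(test\))(?!\))/;

// Assumed alternative: requires look-behind support in the engine.
var withLookbehind = /(?<!\()\(test\)(?!\))/;

var results = cases.map(function (s) {
  var m = withGroup.exec(s);
  return {
    input: s,
    grouped: m ? m[2] : null,
    lookbehind: withLookbehind.test(s)
  };
});

// Only the first case matches; the other three give null / false.
console.log(results);
```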

Why isn't this regex matching the expected way?

I'm trying to get rid of the slash character in case it exists at the end of my string. I used the following expression, intending to match any character not being slash at the end of the line.
var str = "http://hazaa.com/blopp/";
str.match("[^/$]+", "g");
For some reason (surely logical and explainable, but not something I can grasp on my own), I get the string split into three strings, as follows.
["http:", "hazaa.com", "blopp"]
What am I assuming wrongly?
How to resolve it?
In str.match("[^/$]+", "g");, why put the dollar sign inside the brackets? Inside a character class, $ is a literal dollar sign rather than the end-of-string anchor. It's supposed to be outside, namely, str.match("[^/]+$", "g");.
To remove all trailing slashes, you can use str.replace(/\/+$/, ""). (If you'd like to remove ONLY the last trailing slash, remove the + in the replace's regex.)
Update:
One more way that doesn't use replace:
function stripEndingSlashes(str) {
  var matched = str.match("(.*[^/]+)/*$");
  return matched ? matched[1] : "";
}
The regexp is choosing "everything except slash". That is why match() returns the parts of the string between slashes.
You can resolve it with the replace() function:
var str = "http://hazaa.com/blopp/";
//replace the last slash with an empty string to remove it
str = str.replace(/\/$/,'');
A regexp literal is delimited by / characters. So here the regexp consists of:
\/ : this means a single slash character. To prevent JavaScript from interpreting your slash as the end of the regexp, it needs to be 'escaped' with a backslash.
$ : this means the end of the string
Your current regex matches a run of characters that stops at the first / or $ it encounters. The second parameter is ignored; there is no second parameter for String.match.
To remove the trailing slash, use the String.replace function:
var str = "http://hazaa.com/blopp/";
str = str.replace(/\/$/, "");
console.log(str);
// "http://hazaa.com/blopp"
If you need to check whether a string ends with a slash, use the String.match method like this:
var str = "http://hazaa.com/blopp/";
var match = str.match(/\/$/);
console.log(match);
// null if string does not end with /
// ["/"] if string ends with a /
If you need to grab everything except the trailing slash(es), use this:
var r = /(.+?)\/*$/;
console.log("http://hazaa.com/blopp//".match(r)); // ["http://hazaa.com/blopp//", "http://hazaa.com/blopp"]
console.log("http://hazaa.com/blopp/".match(r)); // ["http://hazaa.com/blopp/", "http://hazaa.com/blopp"]
console.log("http://hazaa.com/bloppA".match(r)); // ["http://hazaa.com/bloppA", "http://hazaa.com/bloppA"]
The second element of the returned array (index 1) contains the desired portion of the URL. The regex works as follows:
(.+?) un-greedy match (and capture) of any characters
\/*$ matches optional trailing slash(es)
The first portion of the regex is intentionally un-greedy. If it were greedy, it would attempt to find the biggest match as long as the whole regex matches (consuming the trailing / in the process). When un-greedy, it finds the smallest match as long as the whole regex matches.
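The effect of the un-greedy quantifier can be seen by running both variants on the same input. This is a minimal sketch; the helper name stripTrailingSlashes is made up here.

```javascript
var lazy = /(.+?)\/*$/;
var greedy = /(.+)\/*$/;
var url = "http://hazaa.com/blopp//";

// Un-greedy: the group stops as early as possible, leaving the slashes to \/*.
var lazyGroup = url.match(lazy)[1];

// Greedy: the group swallows the trailing slashes and \/* matches nothing.
var greedyGroup = url.match(greedy)[1];

// Hypothetical helper built on the un-greedy pattern.
function stripTrailingSlashes(str) {
  var m = str.match(lazy);
  return m ? m[1] : str; // no match (e.g. empty string): return input unchanged
}

console.log(lazyGroup);   // "http://hazaa.com/blopp"
console.log(greedyGroup); // "http://hazaa.com/blopp//"
console.log(stripTrailingSlashes("http://hazaa.com/bloppA"));
```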
Why use regex? Just check if the last symbol of the string is a slash and then slice. Like this:
if (str.slice(-1) === '/') {
  str = str.slice(0, -1);
}
