Splitting string by regular expression - javascript

I have the following code snippet:
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(/[^\,]+/);
alert(colors3); // ["", ",", ",", ",", ""]
I don't understand what's going on here. As far as I understand, the regular expression matches commas at the beginning of a string, one or more of them. What happens when we pass this regular expression as the argument to split? Surely, if we just tried to match the regex against colorText, we'd get no match, because the first character is not a comma. So how does passing this regex to split produce an array of commas with an empty string at each end?

Why do you need a regex when you can simply do split(',') ?
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(',');
console.log(colors3);
If you want to select everything but the comma then maybe using match is a better idea.
var colorText = ",red,blue,green,yellow";
var colors3 = colorText.match(/[^\,]+/g);
console.log(colors3);

As explained in the MDN web docs for [^xyz]:
A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets.
Your regex /[^\,]+/ will match any sequence of characters that doesn't include any comma.
So your regex will match these sequences in colorText:
red
blue
green
yellow
and the split function will split colorText at those sequences.
However, if you want to split your string at each comma, use this:
colors = colorText.split(',');
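To make the behaviour concrete, here is a small sketch (the variable names are only illustrative) showing that split treats every run of non-comma characters as a separator, so only the text between those runs survives:

```javascript
const colorText = "red,blue,green,yellow";

// What the regex matches: each run of characters that are not commas
const separators = colorText.match(/[^,]+/g);

// split() cuts the string at those matches and keeps what lies between them
const pieces = colorText.split(/[^,]+/);

// Splitting on the comma itself gives the words
const words = colorText.split(",");

console.log(separators); // ["red", "blue", "green", "yellow"]
console.log(pieces);     // ["", ",", ",", ",", ""]
console.log(words);      // ["red", "blue", "green", "yellow"]
```

The empty strings at both ends are the (empty) text before the first match and after the last match.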

If you want to avoid empty items when splitting, you could use String#match instead of String#split with a regular expression that matches all characters except commas.
var regex = /[^,]+/g;
console.log(",red,blue,green,yellow,".match(regex));
console.log("red,blue,green,yellow".match(regex));

So, my goal was not to separate the words in the string by comma. I found this code in a book and wanted to understand it. The mistake I made was thinking that ^ matched the beginning of the string, while inside square brackets it means "anything but". Now I understand that the regular expression matches one or more characters that are not a comma, and that split() treats each such match as a separator, so only the text between the matches (the commas) ends up in the array. The first and last elements are empty strings because that is what lies to the left of the first word and to the right of the last word.

You have to remove that caret ^ from var colors3 = colorText.split(/[^\,]+/); so that it works well:
var colorText = "red,blue,green,yellow";
var colors3 = colorText.split(/[\,]+/);
console.log(colors3);

Related

How to match regular expression In Javascript

I have the string [FBWS-1] comes first than [FBWS-2]
In this string, I want to find all occurrences of [FBWS-NUMBER].
I tried this:
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/^([[A-Z]-[0-9]])$/.test(term));
I want to get all the NUMBERS where the [FBWS-NUMBER] string is matched.
But no success. I'm new to regular expressions.
Can anyone help me, please?
Note that ^([[A-Z]-[0-9]])$ matches the start of the string (^), a [ or an uppercase ASCII letter (with [[A-Z]), a -, an ASCII digit, and a ] char at the end of the string. So, basically, it only matches strings like [-2] or Z-3].
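A quick check of that reading of the original pattern (the sample strings are made up for illustration):

```javascript
// The pattern from the question, unchanged
const broken = /^([[A-Z]-[0-9]])$/;

const results = {
  bracketDash: broken.test("[-2]"),    // "[" comes from the class [[A-Z]
  letterDash: broken.test("Z-3]"),     // "Z" comes from the class [[A-Z]
  realInput: broken.test("[FBWS-1]"),  // fails: "F" is not "-"
  fullSentence: broken.test("[FBWS-1] comes first than [FBWS-2]")
};

console.log(results);
// { bracketDash: true, letterDash: true, realInput: false, fullSentence: false }
```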
You may use
/\[[A-Z]+-[0-9]+]/g
NOTE If you need to "hardcode" FBWS (to only match values like FBWS-123 and not ABC-3456), use it instead of [A-Z]+ in the pattern, /\[FBWS-[0-9]+]/g.
Details
\[ - a [ char
[A-Z]+ - one or more (due to + quantifier) uppercase ASCII letters
- - a hyphen
[0-9]+ - one or more (due to + quantifier) ASCII digits
] - a ] char.
The /g modifier used with String#match() returns all found matches.
JS demo:
var term = "[FBWS-1] comes first than [FBWS-2]";
console.log(term.match(/\[[A-Z]+-[0-9]+]/g));
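Since the question asks for the numbers themselves, one possible sketch is to wrap the digits in a capturing group and read it from each match. This assumes an environment that supports String#matchAll (ES2020); the group placement is an addition, not part of the answer above.

```javascript
const term = "[FBWS-1] comes first than [FBWS-2]";

// Same pattern as above, with the digits captured in group 1
const pattern = /\[[A-Z]+-([0-9]+)\]/g;

// matchAll yields one match object per occurrence; m[1] is the captured number
const numbers = Array.from(term.matchAll(pattern), (m) => Number(m[1]));

console.log(numbers); // [1, 2]
```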
You can use:
\[\w+-\d+\]
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/\[\w+-\d+\]/.test(term));
There are several reasons why your existing regex doesn't work:
You are trying to match the beginning and end of your string when you actually want what is in between, so don't use ^ and $.
You are only matching a single letter with [A-Z]; add the + quantifier to match one or more.
You can replace [A-Z] and [0-9] with the shorthands \w and \d, but note that \w is broader: it also matches lowercase letters, digits and underscores. The shorthands need no surrounding character-class brackets, while the literal [ and ] in the text must be escaped as \[ and \].
Note that your code only returns true or false (you are using test); it's unclear whether that is what you want. You may want to use match with the global modifier (/g) instead of test to get a collection of matches.
Here is an example using string.match(reg) to get all matched strings:
var term = "[FBWS-1] comes first than [FBWS-2]";
var reg1 = /\[[A-Z]+-[0-9]+\]/g;
var reg2 = /\[FBWS-[0-9]+\]/g;
var arr1 = term.match(reg1);
var arr2 = term.match(reg2);
console.log(arr1);
console.log(arr2);
Your regular expression /^([[A-Z]-[0-9]])$/ is wrong.
Give this regex a try: /\[FBWS-\d+\]/g
Remove the g if you only want to find the first match, as g will find all matches.
Edit: Someone mentioned that you want ["any combination"-"number"]; if that's what you're looking for, then this should work: /\[[A-Z]+-\d+\]/

Javascript reg exp not right

Here is a string str = '.js("aaa").js("bbb").js("ccc")', I want to write a regular expression to return an Array like this:
[aaa, bbb, ccc];
My regular expression is:
var jsReg = /.js\(['"](.*)['"]\)/g;
var jsAssets = [];
var js;
while ((js = jsReg.exec(str)) !== null) {
jsAssets.push(js[1]);
}
But the jsAssets result is
[""aaa").js("bbb").js("ccc""]
What's wrong with this regular expression?
Use the lazy version of .* (note the ? added right after .*):
/\.js\(['"](.*?)['"]\)/g
It is also better to escape the first dot, as done here.
The lazy quantifier matches the fewest characters possible, up to the next quote.
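The difference between the greedy and the lazy quantifier can be seen side by side in this sketch; captureAll is a hypothetical helper that simply wraps the exec loop from the question.

```javascript
const str = '.js("aaa").js("bbb").js("ccc")';

// Hypothetical helper: collect capture group 1 of every match
function captureAll(regex, input) {
  const out = [];
  let m;
  // exec() with the g flag resumes from regex.lastIndex on each call
  while ((m = regex.exec(input)) !== null) {
    out.push(m[1]);
  }
  return out;
}

const greedy = captureAll(/\.js\(['"](.*)['"]\)/g, str);
const lazy = captureAll(/\.js\(['"](.*?)['"]\)/g, str);

console.log(greedy); // one element: aaa").js("bbb").js("ccc
console.log(lazy);   // ["aaa", "bbb", "ccc"]
```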
If you want to allow escaped quotes, use something like this:
/\.js\(['"]((?:\\['"]|[^"])+)['"]\)/g
I believe it can be done in one-liner with replace and match method calls:
var str = '.js("aaa").js("bbb").js("ccc")';
str.replace(/[^(]*\("([^"]*)"\)[^(]*/g, '$1,').match(/[^,]+/g);
//=> ["aaa", "bbb", "ccc"]
The problem is that you are using .*, which greedily matches any characters, including quotes and parentheses. You'll have to be a bit more specific about what you are trying to capture.
If it will only ever be word characters you could use \w, which matches any word character, that is [a-zA-Z0-9_]: uppercase letters, lowercase letters, digits and the underscore.
So your regex would look something like this :
var jsReg = /js\(['"](\w*)['"]\)/g;
In
/.js\(['"](.*)['"]\)/g
the (.*) matches as much as possible, so group 1 does not stop at the first closing quote and captures
aaa").js("bbb").js("ccc
given your example input.
Try
/\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/g
To break this down,
\. matches a literal dot
\.js\( matches the literal string ".js("
( starts to capture the string.
[^\\']|\\. matches a character other than a quote or backslash, or an escaped non-line-terminator character.
(?:[^\\']|\\.)* matches the body of a string
'(?:[^\\']|\\.)*' matches a single quoted string
(...|...) captures a single quoted or double quoted string, including its quotes
)\) closes the capturing group and matches a literal close parenthesis
Your loop itself is fine.
Calling exec repeatedly on a regex with the g modifier is the usual way to walk through all matches, because each call resumes from lastIndex.
Keep the g modifier: without it, exec would return the first match on every call and the loop would never end.
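As a rough sketch of the quoted-string approach, the version below uses negated character classes in both alternatives and keeps the g flag so that the exec loop advances through the input. The sample input and the slice call that strips the surrounding quotes are assumptions added for illustration.

```javascript
// Matches .js('...') or .js("..."), allowing backslash-escaped characters inside
const quoted = /\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/g;

// String.raw keeps the backslash literally: .js("a\"b").js('c')
const src = String.raw`.js("a\"b").js('c')`;

const assets = [];
let m;
while ((m = quoted.exec(src)) !== null) {
  // Group 1 includes the surrounding quotes, so strip one char from each end
  assets.push(m[1].slice(1, -1));
}

console.log(assets); // two entries: a\"b (escape kept as written) and c
```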
Try this one - http://jsfiddle.net/UDYAq/
var str = '.js("aaa").js("bbb").js("ccc")';
var regex = /\.js\("(.*?)"\)/g;
// With the g flag, match() returns the full matches, or null if there are none
var result = str.match(regex) || [];
for (var i = 0; i < result.length; i++) {
// Pull the quoted part out of each full match
result[i] = result[i].match(/"(.*?)"/)[1];
}
console.log(result);
To be sure that matched characters are surrounded by the same quotes:
/\.js\((['"])(.*?)\1\)/g
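With the backreference, the opening quote lands in group 1 and the content moves to group 2. A short sketch, using a made-up input with mixed quote styles and assuming String#matchAll is available:

```javascript
// Mixed quote styles, including a single quote inside a double-quoted value
const input = `.js("aaa").js('bbb').js("c'c")`;

// Group 1 is the opening quote; \1 requires the same character to close
const sameQuote = /\.js\((['"])(.*?)\1\)/g;

// The content is now in group 2
const values = Array.from(input.matchAll(sameQuote), (m) => m[2]);

console.log(values); // ["aaa", "bbb", "c'c"]
```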

Regex with dynamic length

I have a string with 2 or 3 words:
'apple grape lemon'
'apple grape'
I need to get the first character of every word.
My regex:
/^(\w).*?\ (\w).*?\ ?(\w?).*?$/
For every string, this regex only gets the first character of the first 2 words.
How can I fix it?
You cannot do this with one regex (unless you are using .NET). But you can use a regex that matches one first character of a word, then get all the matches, and join them together:
var firstLetters = '';
var match = str.match(/\b\w/g)
if (match)
firstLetters = match.join('');
Of course if you just want to get the letters on their own, there is no need for the join, since the match will simply be an array containing all those letters.
You should note that \w matches not only letters, but digits and underscores, too.
If you work with javascript, you don't need to regex the hell out of a simple problem.
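To get the first letter of each word, just do this:
var aString = 'apple bee plant';
var anArray = aString.split(' ');
// Iterate by index: for...in on an array yields the indices, not the words
for (var i = 0; i < anArray.length; i++) {
var firstLetter = anArray[i].charAt(0);
}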
Regular expressions describe regular languages, and a repeated capturing group only keeps its last iteration, so you cannot collect this kind of repetition with a single match. What you want is to cut the string into individual tokens (which can be done with a regular expression that matches the separator) and then apply a regular expression to each token. To get the first char of each word it is faster to use a substring operation instead of a regular expression.
The problem with your regex is that the .*? after the second word eats up all the following content as everything afterwards is optional. This could be solved, but I personally think it makes things more complicated than required.
The most simple way would be:
firstLetters = (m = str.match(/\b\w/g))? m.join('') : '';
In regexp, "words" don't mean only letters. In JavaScript \w is equivalent to [A-Za-z0-9_]. So if you want only letters in your result, you can use [A-Za-z].
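Putting the pieces from these answers together, a letters-only version might look like the sketch below; firstLetters is a hypothetical function name, and restricting the match to ASCII letters is an assumption about what "letters" should mean here.

```javascript
// Returns the first letter of every word, or "" when nothing matches
function firstLetters(str) {
  // \b is a word boundary; [A-Za-z] restricts the match to ASCII letters
  const m = str.match(/\b[A-Za-z]/g);
  return m ? m.join("") : "";
}

console.log(firstLetters("apple grape lemon")); // "agl"
console.log(firstLetters("apple grape"));       // "ag"
console.log(firstLetters(""));                  // ""
```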

JavaScript regexp not matching

I am having a difficult time getting a seemingly simple Regexp to work. I am trying to grab the last occurrence of word characters between square brackets in a string. My code:
pattern = /\[(\w+)\]/g;
var text = "item[gemstones_attributes][0][shape]";
if (pattern.test(text)) {
alert(RegExp.lastMatch);
}
The above code is outputting "gemstones_attributes", when I want it to output "shape". Why is this regexp not working, or is there something wrong with my approach to getting the last match? I'm sure that I am making an obvious mistake - regular expressions have never been my string suit.
Edit:
There are cases in which the string will not terminate with a right-bracket.
You can greedily match as much as possible before your pattern which will result in your group matching only the last match:
pattern = /.*\[(\w+)\]/g;
var text = "item[gemstones_attributes][0][shape]";
var match = pattern.exec(text);
if (match != null) alert(match[1]);
RegExp.lastMatch gives the text matched by the most recently executed regular expression. It isn't the last match in the text.
Regular expressions parse left to right and are greedy. So your regexp matches the first '[' it sees and grabs the words between it. When you call lastMatch it gives you the last pattern matched. What you need is to match everything you can first .* and then your pattern.
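Two ways of getting the last bracketed word are sketched below: collecting every match and taking the final one, or using the greedy .* prefix described above. The first variant assumes String#matchAll is available; the variable names are illustrative.

```javascript
const text = "item[gemstones_attributes][0][shape]";

// Option 1: collect every bracketed word, then take the last one
const all = Array.from(text.matchAll(/\[(\w+)\]/g), (m) => m[1]);
const lastOfAll = all.length > 0 ? all[all.length - 1] : null;

// Option 2: let a greedy .* consume as much as possible first, so the
// group can only land on the final bracket pair
const greedyMatch = /.*\[(\w+)\]/.exec(text);
const lastViaGreedy = greedyMatch ? greedyMatch[1] : null;

console.log(all);           // ["gemstones_attributes", "0", "shape"]
console.log(lastOfAll);     // "shape"
console.log(lastViaGreedy); // "shape"
```

Neither variant depends on the string ending with a right bracket.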
I think your problem is in your regex, not in the line that uses .lastMatch.
Your regex returns only the first match of your square brackets, not all matches. You can try collecting all matches (for example with match() and the g flag) and then take the last one.
Use match() instead of test()
if (text.match(pattern))
test() checks for a match inside a string. It succeeds at the first occurrence, so it does no further parsing.

Javascript: String replace problem

I've got a string which contains q="AWORD" and I want to replace q="AWORD" with q="THEWORD". However, I don't know what AWORD is. Is it possible to combine a string and a regex to allow me to replace the parameter without knowing its value? This is what I've got thus far...
globalparam.replace('q="/+./"', 'q="AWORD"');
What you have is just a string, not a regular expression. I think this is what you want:
globalparam.replace(/q=".+?"/, 'q="THEWORD"');
I don't know where you got the idea that you have to "combine" a string and a regular expression, but a regex does not need to consist of wildcards only. A regex is a pattern that can contain wildcards but otherwise tries to match the exact characters given.
The expression shown above works as follows:
q=": Match the characters q, = and ".
.+?": Match any character (.) up to (and including) the next ". There must be at least one character (+) and the match is non-greedy (?), meaning it tries to match as few characters as possible. Otherwise, if you used .+", it would match all characters up to the last quotation mark in the string.
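The effect of the non-greedy quantifier shows up as soon as the string contains a second quoted value. In this sketch the sample value of globalparam is made up for illustration:

```javascript
// Hypothetical input with another quoted parameter after q
const globalparam = 'a=1&q="AWORD"&b="x"';

// Non-greedy: stops at the first closing quote after q="
const lazyResult = globalparam.replace(/q=".+?"/, 'q="THEWORD"');

// Greedy: runs on to the last quotation mark in the string
const greedyResult = globalparam.replace(/q=".+"/, 'q="THEWORD"');

console.log(lazyResult);   // a=1&q="THEWORD"&b="x"
console.log(greedyResult); // a=1&q="THEWORD"
```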
Felix's answer will give you the solution, but if you actually want to construct a regular expression using a string you can do it this way:
var fullstring = 'q="AWORD"';
var sampleStrToFind = 'AWORD';
var mat = 'q="'+sampleStrToFind+'"';
var re = new RegExp(mat);
var newstr = fullstring.replace(re,'q="THEWORD"');
alert(newstr);
mat = the pattern string you are building, by combining strings or whatever is needed.
re = the RegExp object; if you want flags such as g (global) or i (case-insensitive), pass them as the second argument to the constructor.
The last line is string.replace(RegExp, replacement);
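One caveat when building a pattern from a string: characters such as . or ? in the word would be interpreted as regex syntax. A common workaround is to escape them first. The escapeRegExp helper below is a hypothetical function written for this sketch, not a built-in, and the sample strings are made up.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const wordToFind = "A.WORD?";
const fullstring = 'q="A.WORD?" and q="AXWORD"';

// Flags such as "g" go in the second constructor argument
const re = new RegExp('q="' + escapeRegExp(wordToFind) + '"', "g");
const newstr = fullstring.replace(re, 'q="THEWORD"');

console.log(newstr); // q="THEWORD" and q="AXWORD"
```

Without the escaping step, the unescaped dot would match any character and the ? would act as a quantifier, so the pattern would no longer match the literal word.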
