Using regex, extract values from a string in javascript - javascript

I need to extract values from a string using a regex (for performance reasons).
The cases might be as follows:
RED,100
RED,"100"
RED,"100,"
RED,"100\"ABC\"200"
The resulting separated [label, value] array should be:
['RED','100']
['RED','100']
['RED','100,']
['RED','100"ABC"200']
I looked into existing solutions, and even a popular library just splits the entire string to get the values;
e.g. 'RED,100'.split(/,/) might just do the thing.
But I was trying to write a regex that splits on a comma only if that comma is not enclosed in a quoted value.
This might not be standard CSV behaviour, but it is very easy for an end user to enter values:
enter label,value. Do whatever you like inside the value if it is surrounded by quotes. If you want the value to contain quotes, escape them with a backslash.
Any help is appreciated.

You can use this regex that takes care of escaped quotes in string:
/"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g
RegEx Explanation:
": Match a literal opening quote
[^"\\]*: Match 0 or more of any character that is not \ and not a quote
(?:\\.[^"\\]*)*: Followed by escaped character and another non-quote, non-\. Match 0 or more of this combination to get through all escaped characters
": Match closing quote
|: OR (alternation)
[^,"]+: Match 1+ of non-quote, non-comma string
const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g;
const arr = [`RED,100`, `RED,"100"`, `RED,"100,"`,
  `RED,"100\\"ABC\\"200"`];
let m;
for (var i = 0; i < arr.length; i++) {
  var str = arr[i];
  var result = [];
  while ((m = regex.exec(str)) !== null) {
    result.push(m[0]);
  }
  console.log("Input:", str, ":: Result =>", result);
}
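The regex above returns quoted tokens with their quotes and backslashes still in place, while the question asks for the bare values. A minimal sketch of a post-processing step follows; the helper names `unquote` and `extractFields` are hypothetical and not part of the original answer, and the sketch assumes a backslash inside quotes always escapes the single character after it.

```javascript
// Same pattern as in the answer: a quoted token (with escapes) or a run of non-comma, non-quote characters.
const fieldPattern = /"[^"\\]*(?:\\.[^"\\]*)*"|[^,"]+/g;

// Hypothetical helper: drop the surrounding quotes and turn \x back into x.
function unquote(token) {
  if (token.length >= 2 && token.startsWith('"') && token.endsWith('"')) {
    return token.slice(1, -1).replace(/\\(.)/g, '$1');
  }
  return token;
}

// Hypothetical helper: tokenize one line and clean up each token.
function extractFields(line) {
  // String#match with the g flag returns every match (or null when there is none).
  return (line.match(fieldPattern) || []).map(unquote);
}

console.log(extractFields('RED,"100\\"ABC\\"200"')); // [ 'RED', '100"ABC"200' ]
```

Using `String#match` with the `g` flag avoids the manual `exec` loop and its `lastIndex` bookkeeping.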

You could use String#match and take only the captured groups.
var array = ['RED,100', 'RED,"100"', 'RED,"100,"', 'RED,"100\\"ABC\\"200"'];
// Note: the captured value keeps its surrounding quotes and backslashes.
console.log(array.map(s => s.match(/^([^,]+),(.*)$/).slice(1)));

Related

How to use regex expression to split str based off multiple characters in expression?

Sorry if the wording is bad. I'm trying to find out how to pass a delimiter that is multiple characters long into my dynamic regular expression.
The regex in my else branch works when a single character is passed in, so I'm trying to do the same thing with multiple characters in the if branch.
const delimiter = str.slice(0, str.indexOf('\n'));
const strLength = delimiter.length;
if (delimiter[0] === '[' && delimiter.charAt(strLength - 1) === ']') {
  const customDelimiter = delimiter.slice(delimiter.indexOf(delimiter[1]), delimiter.indexOf(delimiter.charAt(strLength - 1)));
  console.log(customDelimiter) // => '***'
  const regex = new RegExp(`,|\\n|\\${customDelimiter}`,'g');
  return strArr = str.split(regex).filter(Boolean);
} else {
  const firstChar = str.slice(0, 1); // => '*'
  const regex = new RegExp(`,|\\n|\\${firstChar}`,'g');
  return strArr = str.split(regex).filter(Boolean);
}
So for example I want this string:
'[*]\n11***22***33' to equal 66 b/c it should split it into an array of [11, 22, 33] using the '*' delimiter. I get an error message saying: "SyntaxError: Invalid regular expression: /,|\n|***/: Nothing to repeat".
When you use * as the delimiter, the regex becomes ,|\n|\*, which is a valid regex.
It matches a ',', a '\n' or a '*' character.
For your string [***]\n11***22***33, it matches the newline and each individual * character.
But when you use *** as the delimiter, the regex becomes ,|\n|\***, which is invalid. Only the first * is escaped, so two unescaped * follow it. In a regex, * means 0 or more of the preceding pattern, and you cannot have two of them in a row.
This is a special case because * has a special meaning in regex.
If you had used a delimiter made of characters without a special meaning in regex, it would have worked.
A simpler solution would be to use the JavaScript split function to get the desired result.
You could first split the string on \n.
let splitStr = str.split('\n');
// This would return ["[***]", "11***22***33"]
and then split the element at index 1 of splitStr using the delimiter.
splitStr[1].split('***');
// splitStr[1].split(customDelimiter)
// This would return ["11", "22", "33"]
With this approach you would not need an if/else statement to distinguish single-character delimiters from multi-character ones.
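If the dynamic regex is kept, every special character in the delimiter has to be escaped, not only the first one. The sketch below shows one way to do that; `escapeRegExp` and `splitNumbers` are hypothetical helper names, and the code assumes the input always starts with a delimiter line followed by a newline.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: read the delimiter from the first line, then split the rest.
function splitNumbers(str) {
  const newline = str.indexOf('\n');
  const header = str.slice(0, newline);
  const body = str.slice(newline + 1);
  // "[***]" means a custom delimiter "***"; otherwise the header itself is the delimiter.
  const delimiter = header.startsWith('[') && header.endsWith(']')
    ? header.slice(1, -1)
    : header;
  const regex = new RegExp(`,|\\n|${escapeRegExp(delimiter)}`, 'g');
  return body.split(regex).filter(Boolean);
}

console.log(splitNumbers('[***]\n11***22***33')); // [ '11', '22', '33' ]
```

Because the delimiter is escaped as a whole, the same code path handles `*`, `***`, and delimiters that contain no special characters at all.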

Replace matching elements in array using regular expressions: invalid character

var input = [paul, Paula, george];
var newReg = \paula?\i
for(var text in input) {
if (newReg.test(text) == true) {
input[input.indexOf(text)] = george
}
}
console.log(input)
I don't know what's wrong with my code. It should change paul and Paula to george, but when I run it, it says there's an illegal character.
The backslash (\) is an escape character in JavaScript (along with a lot of other C-like languages). This means that when JavaScript encounters a backslash, it tries to escape the following character. For instance, \n is a newline character (rather than a backslash followed by the letter n).
So that is what is causing your error: you need to replace \paula?\i with /paula?/i, i.e. use / instead of \ as the delimiter of the regex literal.
You should also wrap the strings in quotes (").
You also need to index the array correctly: in a for...in loop the loop variable is just the index of the word, not the word itself.
var input = ["paul", "Paula", "george"];
var newReg = /paula?/i;
for (var val in input) {
  // val is the index, so input[val] is the word itself
  if (newReg.test(input[val])) {
    input[val] = "george";
  }
}
console.log(input);
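As an alternative to looping over indices, the same replacement can be written with `Array#map`, which passes each element (not its index) to the callback. This is only a sketch of that variant; the variable names `names`, `pattern` and `replaced` are made up for the example.

```javascript
const names = ["paul", "Paula", "george"];
const pattern = /paula?/i; // no g flag, so test() keeps no state between calls

// Build a new array: every name that matches the pattern becomes "george".
const replaced = names.map((name) => (pattern.test(name) ? "george" : name));

console.log(replaced); // [ 'george', 'george', 'george' ]
```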

Split a string ( {a},{b} ) with RegExp in Javascript

I have a string object that is returned by an API. It looks like this:
{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}
I only need to split at commas that have } and { on either side, and I want to keep those braces as part of the returned result. Doing split("},{") results in the first and last entries keeping their outer bracket while the others lose theirs, and when only one element is returned, I have to make additional checks to ensure I don't add any extra brackets to the first and last (which is the same as the first) elements.
I hope there is an elegant RegExp to split at a , that is surrounded by } and {.
You need to use a positive lookahead to match only a comma which is followed by curly braces. I've tested this and it works:
var apiResponse = "{Apple},{\"A tree\"},{Three2},{123},{A bracket {},{Two brackets {}},{}";
var split = apiResponse.split(/,(?=\{)/);
console.log("Split length is " + split.length);
for (var i = 0; i < split.length; ++i) {
  console.log("split[" + i + "] is: " + split[i]);
}
The (?=\{) means "must be immediately followed by an opening curly brace".
To read about lookaheads, see this regex tutorial.
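The lookahead alone only checks what follows the comma. In engines that implement ES2018 lookbehind assertions (an assumption about the runtime, not something the original answer relied on), the preceding } can be checked as well. A minimal sketch:

```javascript
const apiResponse = '{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}';

// Split on a comma that is preceded by "}" and followed by "{".
// Neither assertion consumes characters, so the braces stay in the pieces.
const parts = apiResponse.split(/(?<=\}),(?=\{)/);

console.log(parts.length); // 7
console.log(parts[4]);     // {A bracket {}
```

A response with a single element, such as `{Only}`, contains no matching comma and comes back as a one-element array with its braces intact.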
var _data = '{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}';
var _items = [];
var re = /(^|,){(.*?)}(?=,{|$)/g;
var m;
while ((m = re.exec(_data)) !== null) {
  _items.push(m[2]);
}
You can test it out using jsFiddle http://jsfiddle.net/wao20/SgFx7/24/
Regex breakdown:
(^|,) Start of the string or a comma
{ A literal bracket "{"
(.*?) Non-greedy match between the two brackets (for more info see http://javascript.info/tutorial/greedy-and-lazy)
} A literal bracket "}"
(?=,{|$) Non-consuming lookahead (matches ",{" or the end of the string). Without the lookahead the match would eat up the comma and you would end up with only every other item.
Update: Changed regex to address Robin's comments.
/(^|,)\{(.*?)\}(?=,|$)/g to /(^|,){(.*?)}(?=,{|$)/g
This should work for the string as provided - it doesn't account for whitespace between braces and commas, nor does it retain the brace-comma-brace pattern within quotes.
var str = '{Apple},{"A tree"},{Three2},{123},{A bracket {},{Two brackets {}},{}';
var parts = [];
var nextIndex = function(str) {
  return (str.search(/},{/) > -1) ? str.search(/},{/) + 1 : null;
};
while (nextIndex(str)) {
  parts.push(str.slice(0, nextIndex(str)));
  str = str.slice(nextIndex(str) + 1);
}
parts.push(str); // Final piece
console.log(parts);

How to get the characters preceded by "add_"

I have the strings "add_dinner", "add_meeting", "add_fuel_surcharge" and I want to get the characters that are preceded by "add_" (dinner, meeting, fuel_surcharge).
[^a][^d]{2}[^_]\w+
I have tried this one, but it only works for "add_dinner"
[^add_]\w+
This one works for "add_fuel_surcharge", but takes "inner" from "add_dinner"
Help me to understand please.
Use capturing groups:
/^add_(\w+)$/
Check the returned array to see the result.
JavaScript did not support lookbehind assertions before ES2018, so the portable approach is to use a capturing group:
var myregexp = /add_(\w+)/;
var match = myregexp.exec(subject);
var result = null;
if (match != null) {
  result = match[1];
}
[^add_] is a character class that matches a single character except a, d or _. When applied to add_dinner, the first character it matches is i, and \w+ then matches nner.
The [^...] construct matches any single character except the ones listed. So [^add_] matches any single character other than "a", "d" or "_".
If you want to retrieve the bit after the _ you can do this:
/add_(\w+)/
where the parentheses "capture" the part of the expression inside them. So to get the actual text from a string:
var s = "add_meeting";
var result = s.match(/add_(\w+)/)[1];
This assumes the string matches, so that you can directly take the second element of the returned array, which is the "meeting" part matched by (\w+).
If there's a possibility that you'll be testing a string that won't match, you need to check that the result of match() is not null.
(Or, possibly easier to understand: result = "add_meeting".split("_")[1]; note that this only returns the whole remainder when it contains no further underscores, so "add_fuel_surcharge" would give just "fuel".)
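The capturing-group approach together with the null check can be sketched as a small function. `stripAddPrefix` is a hypothetical name chosen for this example, and the sketch assumes the whole string must consist of "add_" followed by word characters.

```javascript
// Hypothetical helper: return the part after "add_", or null when the string does not match.
function stripAddPrefix(text) {
  const match = /^add_(\w+)$/.exec(text);
  return match === null ? null : match[1];
}

console.log(stripAddPrefix("add_dinner"));         // dinner
console.log(stripAddPrefix("add_fuel_surcharge")); // fuel_surcharge
console.log(stripAddPrefix("remove_dinner"));      // null
```

Since `\w` also matches the underscore, names with several underscores are captured in full.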
You can also take everything after the first _ with a plain JavaScript for loop:
var str = ['add_dinner', 'add_meeting', 'add_fuel_surcharge'];
var filterString = [];
for (var i = 0; i < str.length; i++) {
  if (str[i].indexOf("_") > -1) {
    filterString.push(str[i].substring(str[i].indexOf("_") + 1, str[i].length));
  }
}
alert(filterString.join(", "));

Javascript split function not correct worked with specific regex

I have a problem. I have a string, "\,str\,i,ing", and I need to split it on each comma that is not preceded by a backslash. For my string the result should be ["\,str\,i", "ing"]. I'm using the following regex
myString.split("[^\],", 2)
but it doesn't work.
Well, this is a ridiculous way to work around the lack of lookbehind, but it seems to get the correct result.
"\\,str\\,i,ing".split('').reverse().join('').split(/,(?=[^\\])/).map(function(a){
  return a.split('').reverse().join('');
}).reverse();
//=> ["\,str\,i", "ing"]
I'm not sure about your expected output, but you are passing a string, not a regex. Use:
var arr = "\\,str\\,i,ing".split(/[^\\],/, 2);
console.log(arr);
To split using a regex, wrap the pattern in /.../
This is not easily possible in older JavaScript engines, because lookbehind was only added in ES2018. Even if you used a real regex, it would eat the last character before the comma:
> "xyz\\,xyz,xyz".split(/[^\\],/, 2)
["xyz\\,xy", "xyz"]
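If you don't want the z to be eaten, I'd suggest:
var str = "....";
return str.split(",").reduce(function(res, part) {
  var l = res.length;
  // Append to the previous piece when this is not the first part and either
  // the previous piece ended with a backslash (escaped comma) or the limit of 2 pieces is reached.
  if (l && (res[l-1].slice(-1) == "\\" || l >= 2))
    res[l-1] += "," + part;
  else
    res.push(part);
  return res; // the accumulator must be returned for the next iteration
}, []);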
Reading between the lines, it looks like you want to split a string by , characters that are not preceded by \ characters.
It would be really great if every JavaScript engine had a regular expression lookbehind (and negative lookbehind) pattern, but these were only added in ES2018 and older engines lack them. What every engine does have is a lookahead ((?=...)) and negative lookahead ((?!...)) pattern. Make sure to review the documentation.
You can use these as a lookbehind if you reverse the string:
var str,
    reverseStr,
    arr,
    reverseArr;
//don't forget to escape your backslashes
str = '\\,str\\,i,ing';
//reverse your string
reverseStr = str.split('').reverse().join('');
//split the array on `,`s that aren't followed by `\`
reverseArr = reverseStr.split(/,(?!\\)/);
//reverse the reversed array, and reverse each string in the array
arr = reverseArr.reverse().map(function (val) {
  return val.split('').reverse().join('');
});
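For comparison, engines that implement the ES2018 negative lookbehind can express the same split directly, without reversing anything. This is a sketch that assumes such an engine; `splitOnUnescapedCommas` is a hypothetical name, and the code does not treat a doubled backslash before a comma as an escaped backslash.

```javascript
// Hypothetical helper: split on commas that are not immediately preceded by a backslash.
function splitOnUnescapedCommas(text) {
  // (?<!\\) is a negative lookbehind: it asserts the previous character is not "\".
  return text.split(/(?<!\\),/);
}

// The string literal '\\,str\\,i,ing' holds the characters \,str\,i,ing
console.log(splitOnUnescapedCommas('\\,str\\,i,ing')); // [ '\\,str\\,i', 'ing' ]
```

The lookbehind consumes nothing, so no character before the comma is lost.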
You picked a tough character to match: a backslash preceding a comma is apt to disappear while you pass it around in a string literal, since '\,'==','...
var s= 'My dog, the one with two \\, blue \\,eyes, is asleep.';
var a= [], M, rx=/(\\?),/g;
while((M= rx.exec(s))!= null){
  if(M[1]) continue;
  a.push(s.substring(0, rx.lastIndex-1));
  s= s.substring(rx.lastIndex);
  rx.lastIndex= 0;
};
a.push(s);
/* returned value: (Array)
My dog
the one with two \, blue \,eyes
is asleep.
*/
Find something which will not be present in your original string, say "###". Replace "\\," with it. Split the resulting string by ",". Replace "###" back with "\\,".
Something like this:
<script type="text/javascript">
var s1 = "\\,str\\,i,ing";
var s2 = s1.replace(/\\,/g, "###");
console.log(s2);
var s3 = s2.split(",");
for (var i = 0; i < s3.length; i++) {
  s3[i] = s3[i].replace(/###/g, "\\,");
}
console.log(s3);
</script>
