Regexp, wrap each CSV field in double quotes - javascript

Using a regular expression, I can't find a way to wrap each field of a CSV text in double quotes.
The problem is that some fields may already be double-quoted.
Example:
Country;Product Family;Product SKU;Commercial Status
Germany;Aprobil;"Apro&'bil_1_5 mL";Actively Marketed
Should be
"Country";"Product Family";"Product SKU";"Commercial Status"
"Germany";"Aprobil";"Apro&'bil_1_5 mL";"Actively Marketed"
Basically, I have trouble combining these two logical cases in one regular expression...
Thanks in advance!

I think you will need to do two replacements. The first regex looks like this:
/([\w ]+[^;\n]*|\"[^\"]*\")/g
The regex will either match:
One or more word characters or spaces, followed by any number of characters other than a semicolon (;) or a newline.
A double quote, followed by any number of characters other than a double quote, ending with a double quote.
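You then replace each match with "$1" (in a JavaScript replacement string, $1 refers to the first capturing group; \1 only works inside the pattern itself).
Finally, you replace every pair of double quotes with a single one, which undoes the extra quotes added around fields that were already quoted.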
In JavaScript this is:
var test = 'Country;Product Family;Product SKU;Commercial Status\n'
  + 'Germany;Aprobil;"Apro&\'bil_1_5 mL";Actively Marketed\n'; // the ' inside the field must be escaped
var regex = /([\w ]+[^;\n]*|\"[^\"]*\")/g;
test = test.replace(regex, '"$1"'); // wrap each match in double quotes ($1 = the captured field)
test = test.replace(/""/g, '"'); // collapse the doubled quotes around already-quoted fields
Now you should have what you want.
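If you would rather do it in a single pass, a replacer function can decide per match whether to add quotes. This is a sketch of an alternative, not the answer's method; `quoteFields` is a hypothetical name, and it assumes fields contain no newlines and no escaped ("") quotes, and it leaves empty fields empty.

```javascript
// One-pass alternative (sketch): wrap each ;-separated field in double quotes
// unless it is already a quoted "..." field.
// Assumptions: no newlines inside fields, no "" escapes, empty fields stay empty.
function quoteFields(csv) {
  // Try the quoted form first so a quoted field containing ; is kept whole;
  // otherwise take a run of characters up to the next ; or newline.
  return csv.replace(/"[^"]*"|[^;\n]+/g, (field) =>
    field.startsWith('"') ? field : '"' + field + '"'
  );
}

const csv = 'Country;Product Family;Product SKU;Commercial Status\n'
  + 'Germany;Aprobil;"Apro&\'bil_1_5 mL";Actively Marketed\n';
console.log(quoteFields(csv));
```

Because it never rewrites existing quotes, it also skips the second "" to " pass, which would collapse every pair of double quotes anywhere in the text, including an empty quoted field.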

Related

Backslash bug in JavaScript

I have a string that contains tricky backslash characters.
Below is my initial code, which shows literally what I am trying to achieve, but it is not working. I have to remove the \" sequences, and I think that is where the bug is.
var current = csvArray[0][i].Replace("\"", "");
I have tried the variation below but it is still not working.
var current = csvArray[0][i].Replace('\"', '');
It is currently throwing an Uncaught TypeError: csvArray[0][i].Replace is not a function
Is there a way for Javascript to take my string ("\"") literally like in C#? Kindly help me investigate. Thanks!
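First, JavaScript is case-sensitive: the string method is replace, with a lowercase r. Strings have no Replace method, which is exactly what the TypeError is telling you. Second, if the sequence you want to match is a single backslash character followed by a quotation mark, then you need to escape the backslash itself because backslashes have special meaning in string literals. You then need to separately escape the quotation mark with its own backslash: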
.replace("\\\"", "")
I believe that would also be true in C#.
Or you can simplify it by using single quotes around the string so that only the backslash needs to be escaped:
.replace('\\"', '')
If the first argument to .replace() is a string, however, it will only replace the first occurrence. To do a global replace you have to use a regular expression with the g flag, noting that backslashes need to be escaped in regular expressions too:
.replace(/\\"/g, '')
I'm not going to set up a demo array to exactly match your code, but here's a simple demo showing that a lone backslash or quote in the input string is not replaced, while every backslash-quote combination is:
var input = 'Some\\ test" \\" text \\" for demo \\"'
var output = input.replace(/\\"/g, '')
console.log(input)
console.log(output)
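To make the string-versus-regex difference concrete, this sketch runs the same input through a string pattern, a global regex, and String.prototype.replaceAll. replaceAll was standardized in ES2021, so that line assumes a reasonably recent engine.

```javascript
// Same demo input: a lone backslash, a lone quote, and three backslash-quote pairs.
const text = 'Some\\ test" \\" text \\" for demo \\"';

// String pattern: matched literally, but only the FIRST occurrence is replaced.
const firstOnly = text.replace('\\"', '');

// Regex with the g flag: every backslash-quote pair is removed.
const viaRegex = text.replace(/\\"/g, '');

// replaceAll with a plain string: also removes every occurrence, no regex escaping needed.
const viaReplaceAll = text.replaceAll('\\"', '');

console.log(firstOnly);
console.log(viaRegex);
console.log(viaReplaceAll);
```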

Regex to include quotes in match between quotes (and new lines)

I'm trying to find strings enclosed in single quotes with this regex: /'+(.*?)'+,?/g
The problem is that single quotes are allowed inside the string as long as they are escaped with a second quote (it''s, you''ve, I''m, and so on), and a string may even end with an escaped quote, giving three quotes in a row: '''.
My regex breaks if there are any single quotes inside, and it ends up skipping quotes at the beginning and end of the match.
It seems to work perfectly as long as nobody adds quotes inside the string, but unfortunately that is not how the real world works.
How can I make my regex include the quotes in the match?
Try this regex:
'(?:''|[^'])*'
Explanation: an opening single quote, followed by any number of either an escaped quote pair ('') or a non-quote character, followed by a closing single quote. Since [^'] also matches newlines, a match can span several lines.
https://regex101.com/r/R4sd47/1
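In JavaScript, a sketch of using that pattern to pull out the strings and undo the '' escaping might look like this; extractQuoted and the sample text are illustrative, not from the question.

```javascript
// Single-quoted strings where an embedded quote is escaped by doubling it ('').
const QUOTED = /'(?:''|[^'])*'/g;

function extractQuoted(text) {
  // match() with a g regex returns every full match, or null if there are none.
  const matches = text.match(QUOTED) || [];
  // Drop the outer quotes, then turn each escaped '' back into a single '.
  return matches.map((m) => m.slice(1, -1).replace(/''/g, "'"));
}

// Hypothetical sample: a doubled quote mid-string, a string ending in ''',
// and a string spanning a newline.
const sample = "x = 'it''s', y = 'foo''', z = 'a\nb'";
console.log(sample.match(QUOTED));
console.log(extractQuoted(sample));
```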

JS Regex: Double slash splitting along with other characters

I have a split statement in my JavaScript that splits on spaces and semicolons, but I want to split on double slashes as well. I cannot figure out how to include a double slash along with the space and semicolon.
line = lines[i].split(/[\s;]+/);
Any help is greatly appreciated.
Assuming that by "double slashes" you mean a double forward slash (//), you want something like the following:
line = lines[i].split(/[\s;]+|\/{2}/);
Note that the double-slash alternative is moved out of the square brackets: inside a character class, "{", "2", and "}" would be interpreted as literal characters rather than as a quantifier.
The other answers will not behave properly when a double slash or semicolon is surrounded by spaces: they generate empty strings in the output. This regexp handles that case:
/(?:\s|;|\/\/)+/
In other words, split on any sequence composed of spaces, semi-colons, or double slashes.
var re = /(?:\s|;|\/\/)+/;
var input = "Some stuff; more stuff // last stuff";
console.log(input.split(re));
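A quick way to see the empty strings mentioned above is to split the same input with both patterns. The third variant, dropping empty strings with .filter(Boolean), is an extra alternative and not part of either answer.

```javascript
const input = "Some stuff; more stuff // last stuff";

// First answer: the // alternative sits outside the character class.
const bracketed = /[\s;]+|\/{2}/;
// Second answer: one repeated group, so " // " is consumed as a single separator.
const combined = /(?:\s|;|\/\/)+/;

const a = input.split(bracketed); // contains two empty strings around "//"
const b = input.split(combined);
// Alternative: split with the first pattern, then drop the empty strings.
const c = input.split(bracketed).filter(Boolean);

console.log(a);
console.log(b);
console.log(c);
```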

Matching items in a comma-delimited list which aren't surrounded by single or double quotes

I want to match each item of text in a comma-delimited list. For this, the following regular expression works great:
/[^,]+/g
(Regex101 demo).
The problem is that I want to ignore any commas contained within single or double quotes, and I'm unsure how to extend the above pattern to do that.
Here's an example string:
abcd, efgh, ij"k,l", mnop, 'q,rs't
I want to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):
abcd
efgh
ij"k,l"
mnop
'q,rs't
Or the four commas that separate them:
abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
How can I do this?
Three relevant questions exist, but none of them cater for both ' and " in JavaScript:
Regex for splitting a string using space when not surrounded by single or double quotes - Java solution, doesn't appear to work in JavaScript.
A regex to match a comma that isn't surrounded by quotes - Only matches on "
Alternative to regex: match all instances not inside quotes - Only matches on "
Okay, so your matching groups can contain:
Runs of characters that are not commas or quotes
A matching pair of "
A matching pair of '
So this should work:
/((?:[^,"']+|"[^"]*"|'[^']*')+)/g
RegEx101 Demo
As a nice bonus, you can put single quotes inside double-quoted sections, and vice versa. However, you'll probably need a state machine to handle escaped double quotes inside double-quoted strings (e.g. "aa\"aa").
Unfortunately, it matches the leading space as well, so you'll have to trim the matches.
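A minimal sketch of that match-then-trim step; splitOutsideQuotes is a hypothetical helper name, and the pattern is the one from this answer.

```javascript
// Pattern from the answer above: runs of non-comma/non-quote characters,
// or complete "..." / '...' sections, repeated.
const FIELD = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

// Hypothetical helper: match the fields, then trim the leading spaces.
function splitOutsideQuotes(line) {
  return (line.match(FIELD) || []).map((field) => field.trim());
}

console.log(splitOutsideQuotes(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
```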
Using a double lookahead to assert that the matched comma is outside quotes:
/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
(?=(([^"]*"){2})*[^"]*$) asserts that there is an even number of double quotes ahead of the matched comma.
(?=(([^']*'){2})*[^']*$) does the same assertion for single quotes.
PS: This doesn't handle unbalanced, nested, or escaped quotes.
RegEx Demo
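If you use this pattern with split(), one detail matters: split() splices the text of any capturing groups in the separator into its result array, so this sketch rewrites the groups as non-capturing (?:...). The logic of the two lookaheads is unchanged. Since the lookaheads run at every candidate position and scan to the end of the string, this can get slow on very long lines.

```javascript
// Same double lookahead, but with non-capturing groups so split() does not
// insert the group captures into its output.
const COMMA_OUTSIDE_QUOTES =
  /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

const line = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;
// A comma is a separator only if an even number of each quote type follows it.
console.log(line.split(COMMA_OUTSIDE_QUOTES));
```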
Try this in JavaScript:
(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+
Demo
With named groups for readability (JavaScript supports (?<name>...) only since ES2018; for older engines, remove the ?<name> parts):
(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)
Demo
Explanation:
(?<double_quotes>"[^"\n]*") matches a double quote, any characters except a double quote or newline, then a closing double quote = (1) (double-quoted)
(?<single_quotes>'[^'\n]*') matches a single quote, any characters except a single quote or newline, then a closing single quote = (2) (single-quoted)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*')) matches (1) or (2) = (3)
[^,"'\n]* matches any run of characters other than a comma, ", ', or newline = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*) matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+ matches (3)(w) repeated one or more times = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+) matches (w)(3w+) = (4) (has quotes)
[^,\n]+ matches the remaining case = (5) (simple)
So in the end we have (4)|(5) (has quotes or simple).
Input:
abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""
Output:
MATCH 1
simple [0-4] `abcd`
MATCH 2
simple [5-9] `efgh`
MATCH 3
simple [10-15] ` ijkl`
MATCH 4
simple [16-20] `abcd`
MATCH 5
simple [21-26] ` efgh`
MATCH 6
has_quotes [27-35] ` ij"k,l"`
double_quotes [30-35] `"k,l"`
MATCH 7
simple [36-41] ` mnop`
MATCH 8
has_quotes [42-50] ` 'q,rs't`
single_quotes [43-49] `'q,rs'`
MATCH 9
has_quotes [51-59] `'q, rs't`
single_quotes [51-58] `'q, rs'`
MATCH 10
has_quotes [60-74] `"'q,rs't, ij"k`
double_quotes [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes [75-79] ` l""`
double_quotes [77-79] `""`
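In a modern JavaScript engine the named-group version can be used directly with matchAll. This is a sketch; classifyFields is a hypothetical helper, and the .trim() is an addition to drop the leading spaces visible in the output above.

```javascript
// Named-group version of the pattern above (named groups need ES2018+).
const FIELD_RE = /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

function classifyFields(text) {
  const fields = [];
  for (const m of text.matchAll(FIELD_RE)) {
    // Only one top-level alternative participates; the other group is undefined.
    const kind = m.groups.has_quotes !== undefined ? "has_quotes" : "simple";
    fields.push({ kind, value: m[0].trim() });
  }
  return fields;
}

console.log(classifyFields(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
```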

Javascript regexp non greedy search for quotes

I have the following text:
{{field.text || 'Čeština' | l10n}}
Regexp:
/((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g
I am trying to replace the strings before l10n with modified strings. My regexp works fine except in this situation, where it eats the ' from the setLocale function.
Here is an interactive regex tester with my expression: https://regex101.com/r/vX5tJ6/3
My question is: why does it eat the ' from setLocale when there is no | after it (as the regexp requires)?
Maybe this is what you're looking for:
(['"])([^'"]*)\1\s*\|\s*l10n
https://regex101.com/r/lV8wV7/1
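It matches anything in single or double quotes that is followed by | l10n, with optional whitespace around the |.
Your regex matched a single or double quote, followed by any characters, non-greedily, then the same quote again. However, non-greedy only keeps .*? as short as possible from a given starting point; the engine still reports the leftmost position where a match can start. So the match started at an earlier quote (the one from setLocale), and .*? expanded across the other quotes until it reached a quote followed by | l10n, without violating the rest of the pattern.
The main difference in the pattern above is that [^'"]* does not allow any quote inside the match, so it cannot span across other quoted strings.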
If you need to allow double quotes enclosed in single quotes or single quotes in double quotes, you can try the following:
(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n
https://regex101.com/r/mL8gA6/1
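The question's full sample text did not survive, so this sketch uses a hypothetical template, {{setLocale('cs') || 'Čeština' | l10n}}, to show the leftmost-match effect: with the original pattern the captured text starts right after the quote inside setLocale(...), while both suggested patterns capture only Čeština.

```javascript
// Hypothetical template: the setLocale('cs') call is an assumption.
const text = "{{setLocale('cs') || 'Čeština' | l10n}}";

const original = /((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g; // from the question
const fixed = /(['"])([^'"]*)\1\s*\|\s*l10n/g;                  // no quotes allowed inside
const allowNested = /(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n/g;  // other quote type allowed inside

// matchAll requires the g flag; pick the group that holds the string contents.
const fromOriginal = [...text.matchAll(original)].map((m) => m[3]);
const fromFixed = [...text.matchAll(fixed)].map((m) => m[2]);
const fromAllowNested = [...text.matchAll(allowNested)].map((m) => m[2] ?? m[4]);

console.log(fromOriginal); // lazy .*? spans from the quote in setLocale('cs')
console.log(fromFixed);
console.log(fromAllowNested);
```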
