Regex to include quotes in match between quotes (and new lines) - javascript

I'm trying to find strings enclosed in single quotes with this regex : /'+(.*?)'+,?/g
The problem is that single quotes are allowed inside the string as long as they are escaped with a second quote: 'it''s, you''ve, I''m... and so on, ends with one more single quote '''.
My regex breaks if there are any amount of single quotes inside and ends up skipping quotes in the beginning and end of the match if there are any.
It seems to work perfectly as long as nobody adds any quotes inside the string. But this is not how the real world works unfortunately.
How can I make my regex include the quotes in the match?

Try this regex:
'(?:''|[^'])*'
Explanation: a single quote, followed by (two quotes OR a non-quote character) repeated as necessary, followed by a closing single quote.
https://regex101.com/r/R4sd47/1
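As a quick check of the pattern above, here is a minimal sketch in JavaScript. The input string and variable names are made up for illustration. Because [^'] also matches line breaks, a quoted value that spans several lines is matched as well.

```javascript
// Match single-quoted strings in which an inner quote is escaped by doubling it.
const quoted = /'(?:''|[^'])*'/g;

// Hypothetical input: two quoted values separated by a comma.
const input = "'it''s', 'you''ve'";

// Each match includes its opening and closing quote.
const matches = input.match(quoted);
console.log(matches); // [ "'it''s'", "'you''ve'" ]

// Optional post-processing: drop the outer quotes and collapse doubled quotes.
const values = matches.map(m => m.slice(1, -1).replace(/''/g, "'"));
console.log(values); // [ "it's", "you've" ]
```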

Related

Remove all non-ASCII characters from a string except smart quotes

I have this regex that removes all non-ascii characters from a string including all smart quotes:
str.replace(/[\u{0080}-\u{FFFF}]/gu,"");
But I need to keep the Smart quotes
The regex for removing Smart single quotes is: [\u2018\u2019\u201A\u201B\u2032\u2035] and for Smart double quotes is: [\u201C\u201D\u201E\u201F\u2033\u2036].
I need a combined regex that removes all non-ASCII ([\u{0080}-\u{FFFF}]) except smart quotes ([\u2018\u2019\u201A\u201B\u2032\u2035] or [\u201C\u201D\u201E\u201F\u2033\u2036]).
Note that you need to use the \u{XXXX} notation in the regex with the u modifier. To build the regex, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:
/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
Note that some chars in the Unicode table go one after another, so we may shorten the pattern using ranges:
/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
Instead of matching the non-ascii, match the ascii + the characters you need, and negate the expression. Example:
str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
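To compare the two approaches side by side, here is a small sketch. The sample string and variable names are assumptions made for illustration; both patterns are the ones quoted in the answers.

```javascript
// Approach 1: negative lookahead that protects smart quotes,
// followed by the generic non-ASCII range.
const lookahead = /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Approach 2: negated class listing everything that should be kept.
const negated = /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu;

// Hypothetical sample: smart double quotes, an accented letter,
// an en dash and a smart apostrophe.
const sample = "\u201CCaf\u00E9\u201D \u2013 it\u2019s";

const cleanedA = sample.replace(lookahead, "");
const cleanedB = sample.replace(negated, "");

// Both drop the accented letter and the en dash but keep the smart quotes.
console.log(cleanedA === cleanedB); // true
console.log(cleanedA === "\u201CCaf\u201D  it\u2019s"); // true
```

One difference to be aware of: with the u flag the negated class also removes characters above U+FFFF (emoji, for example), while the lookahead version only touches the range up to U+FFFF.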

Regexp, wrap each CSV field in double quotes

Using a regular expression, I can't find a solution to wrap each field from a csv text into double quotes.
The issue is that there could be already double-quoted fields.
Example:
Country;Product Family;Product SKU;Commercial Status
Germany;Aprobil;"Apro&'bil_1_5 mL";Actively Marketed
Should be
"Country";"Product Family";"Product SKU";"Commercial Status"
"Germany";"Aprobil";"Apro&'bil_1_5 mL";"Actively Marketed"
Basically, I can't work out how to combine the two logical parts (quoted and unquoted fields) in a single regular expression...
Thanks in advance!
You will need to do 2 replacements, I think. The first regex looks like this:
/([\w ]+[^;\n]*|\"[^\"]*\")/g
The regex will either match:
Any Word character or Space, 1 or more times, followed by any char not being semi colon ';' or newline, any number of times.
A double quote followed by any characters not being double quote, any number of times, ending with a double quote.
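You then replace each match with "\1", that is, the first capture group wrapped in double quotes (JavaScript writes this backreference as $1 in the replacement string).
Finally you replace 2 double quotes with a single one.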
In JavaScript this is:
var test = 'Country;Product Family;Product SKU;Commercial Status\n'
  + 'Germany;Aprobil;"Apro&\'bil_1_5 mL";Actively Marketed\n'; // inner single quote must be escaped
var regex = /([\w ]+[^;\n]*|"[^"]*")/g;
test = test.replace(regex, '"$1"'); // wrap in double quotes ($1 is the captured field)
test = test.replace(/""/g, '"'); // replace 2 quotes with one
Now you should have what you want.
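As an alternative to the two-step replacement, a single pass with a replacer function can leave already-quoted fields alone. This is only a sketch of a different technique, not the answer's method; the names csv, field and wrapped are made up, and empty fields are not handled.

```javascript
// Sample data from the question.
const csv = 'Country;Product Family;Product SKU;Commercial Status\n' +
  'Germany;Aprobil;"Apro&\'bil_1_5 mL";Actively Marketed\n';

// Match either an already quoted field, or a run of characters
// up to the next separator or line break.
const field = /"[^"]*"|[^;\n]+/g;

// Quoted fields are returned unchanged; everything else gets wrapped.
const wrapped = csv.replace(field, m => (m.startsWith('"') ? m : `"${m}"`));

console.log(wrapped);
// "Country";"Product Family";"Product SKU";"Commercial Status"
// "Germany";"Aprobil";"Apro&'bil_1_5 mL";"Actively Marketed"
```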

Regex to Match Quoted String Containing Escaped Quotes

I'm trying to write a regular expression that will match a double quoted string - with the possibility that an escaped double quote may reside inside this string.
My current attempt at this regular expression can be found here:
^"([^"]|\\")*"
Where I am attempting to run this against the following value:
"sdfs\"dasf"
The regular expression stops at the second double quote, not at the third one as intended. However, if I add a $ at the end of the regular expression, it parses correctly. Unfortunately, I cannot use the end of string symbol ($) in my code implementation.
It seems that the capturing group is not greedy enough and allows the second double quote to go to the end of the regular expression.
Any ideas what would cause this behaviour or how to remedy this?
This should do the trick:
"(?!\\").+"

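For the escaped-quote question above, the pattern from the question stops early because [^"] is tried first and consumes the backslash on its own, which leaves the escaped quote free to act as the closing quote. A commonly used alternative (a sketch, not the pattern given in the answer) excludes the backslash from the plain-character branch and consumes every backslash together with the character after it. The variable names and the trailing text in the sample are made up for illustration.

```javascript
// Pattern from the question: the backslash is matched by [^"],
// so the following quote ends the match too early.
const original = /^"([^"]|\\")*"/;

// Alternative: plain characters exclude the backslash, and a backslash
// is always consumed as a pair with the next character.
const escapedAware = /^"(?:[^"\\]|\\.)*"/;

// String.raw keeps the backslash literally: "sdfs\"dasf" trailing
const value = String.raw`"sdfs\"dasf" trailing`;

const early = original.exec(value)[0];
const full = escapedAware.exec(value)[0];

console.log(early); // "sdfs\"
console.log(full);  // "sdfs\"dasf"
```

No $ anchor is needed, so text after the closing quote does not affect the match.
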
Javascript regexp non greedy search for quotes

I have following text:
{{field.text || 'Čeština' | l10n}}
Regexp:
/((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g
And I am trying to replace strings before l10n with modified strings. My regexp is working fine except for this situation, where it eats ' from setLocale function.
Here is interactive regex tester with my expression - https://regex101.com/r/vX5tJ6/3
Question is, why is it eating the ' from setLocale when there is no | after (as specified in regexp)?
Maybe this is what you're looking for:
(['"])([^'"]*)\1\s*\|\s*l10n
https://regex101.com/r/lV8wV7/1
It looks for anything in single or double quotes followed by | l10n with optional spaces.
Your regex was matching a single or double quote, followed by any characters, non-greedily, then another matching quote. However, it was able to non-greedily match the enclosing quotes (so not just the last satisfying quote it encountered) without violating the rest of the pattern.
The main difference in the above pattern is that it won't allow enclosing quotes.
If you need to allow double quotes enclosed in single quotes or single quotes in double quotes, you can try the following:
(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n
https://regex101.com/r/mL8gA6/1
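A short sketch of how the first suggested pattern behaves in a replacement. The setLocale line is a hypothetical input, since the original snippet is not shown in the question, and wrapping the text in brackets is just a stand-in for whatever modification is needed.

```javascript
// First pattern from the answer: the quoted text may not contain quotes itself.
const l10n = /(['"])([^'"]*)\1\s*\|\s*l10n/g;

// Template from the question.
const template = "{{field.text || 'Čeština' | l10n}}";

// Stand-in modification: wrap the quoted text in brackets.
const replaced = template.replace(
  l10n,
  (match, quote, text) => `${quote}[${text}]${quote} | l10n`
);
console.log(replaced); // {{field.text || '[Čeština]' | l10n}}

// Hypothetical input with an unrelated quoted argument before the filter.
const source = "setLocale('cs'); {{ 'Ahoj' | l10n }}";
const found = [...source.matchAll(l10n)].map(m => m[2]);
console.log(found); // [ 'Ahoj' ]
```

Because [^'"]* cannot cross a quote, a match can no longer start at the quote in setLocale('cs') and run on to a later | l10n.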

What is this "/\,$/"?

I tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns the index of the coords string. What should I search for online so I can learn more?
/\,$/ finds the comma character (,) at the end of a string (denoted by the $) and replaces it with empty (""). You sometimes see this in regex code aiming to clean up excerpts of text.
It's a regular expression to remove a trailing comma.
That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available in most languages and in some tools. They are a very useful thing to learn.
That's a regular expression that finds a comma at the end of a string. That code removes the comma.
// defines a JavaScript regular expression, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but in this case, it's not necessary. An example where it would be necessary would be to remove trailing periods. If you tried to do that with /.$/ the period here has a different meaning; it is used as a wildcard to match [almost] any character (aside from line terminators). So in this case to match on "." (period character) you would have to escape the wildcard (/\.$/).
When $ is placed at the end of the pattern, it means only look at the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not after help in help, me,), only at the end (trailing). It can also speed up the regular expression search. If you wanted to match on characters only at the beginning of the string, you would start off the pattern with a caret (^), for instance /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +; trailing commas,, becomes trailing commas,
With the +; no trailing comma,, becomes no trailing comma
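The cases described above can be tried out directly. A minimal sketch, with made-up variable names and sample strings:

```javascript
// Without +: only one trailing comma is removed.
const one = "trailing commas,,".replace(/,$/, "");
console.log(one); // "trailing commas,"

// With +: all trailing commas are removed.
const all = "no trailing comma,,".replace(/,+$/, "");
console.log(all); // "no trailing comma"

// $ anchors to the end, so a comma in the middle is left alone.
const middle = "help, me".replace(/,$/, "");
console.log(middle); // "help, me"

// An unescaped period is a wildcard and removes any last character...
const wildcard = "done!".replace(/.$/, "");
console.log(wildcard); // "done"

// ...while an escaped period only removes a literal trailing period.
const literal = "done!".replace(/\.$/, "");
console.log(literal); // "done!"
```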
