JavaScript regexp non-greedy search for quotes

I have the following text:
{{field.text || 'Čeština' | l10n}}
Regexp:
/((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g
I am trying to replace the strings before l10n with modified strings. My regexp works fine except in this situation, where it eats the ' from the setLocale function.
Here is an interactive regex tester with my expression: https://regex101.com/r/vX5tJ6/3
The question is: why does it eat the ' from setLocale when there is no | after it (as the regexp requires)?

Maybe this is what you're looking for:
(['"])([^'"]*)\1\s*\|\s*l10n
https://regex101.com/r/lV8wV7/1
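It looks for any quote-free text enclosed in single or double quotes, followed by | l10n with optional whitespace.
Your regex was matching a single or double quote, followed by any characters (non-greedily), then another matching quote. However, non-greedy only keeps the match as short as possible from the position where it starts; it does not move the start. The engine reports the leftmost position from which the whole pattern can succeed, so the match can begin at an earlier quote (such as the one in setLocale) and let .*? run across other quotes until it reaches a quote that is followed by | l10n.
The main difference in the pattern above is that [^'"]* does not allow any quote characters between the opening and closing quotes, so a match can no longer span several quoted strings.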
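To see the difference, here is a minimal JavaScript sketch that runs both patterns. The input string is hypothetical (the question's full text containing setLocale is not shown), and uppercasing only stands in for whatever modification you actually need:

```javascript
// Pattern from the question: the lazy .*? may still cross other quotes.
const originalPattern = /((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g;
// Pattern from the answer: no quote characters allowed between the quotes.
const fixedPattern = /(['"])([^'"]*)\1\s*\|\s*l10n/g;

// Hypothetical input: a quoted argument NOT followed by "| l10n", then one that is.
const text = "{{setLocale('cs') || 'Čeština' | l10n}}";

// Group 3 of the original pattern: the match starts at the quote in setLocale('cs').
const originalCaptures = [...text.matchAll(originalPattern)].map(m => m[3]);
// Group 2 of the fixed pattern: only the string directly before "| l10n".
const fixedCaptures = [...text.matchAll(fixedPattern)].map(m => m[2]);

// Replace only the captured string, keeping its quote style (uppercasing is a placeholder).
const replaced = text.replace(fixedPattern, (match, quote, content) =>
  `${quote}${content.toUpperCase()}${quote} | l10n`);

console.log(JSON.stringify(originalCaptures)); // ["cs') || 'Čeština"]
console.log(JSON.stringify(fixedCaptures));    // ["Čeština"]
console.log(replaced);                         // {{setLocale('cs') || 'ČEŠTINA' | l10n}}
```

Note that the callback writes the separator back as " | l10n", so any other spacing around the | is normalized.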
If you need to allow double quotes enclosed in single quotes or single quotes in double quotes, you can try the following:
(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n
https://regex101.com/r/mL8gA6/1
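A short JavaScript sketch of how the alternation pattern might be consumed: only one branch participates in each match, so the text ends up in either group 2 or group 4. The sample input is hypothetical:

```javascript
// Pattern from the answer: a single-quoted string (groups 1-2) OR a
// double-quoted string (groups 3-4), then "| l10n" with optional whitespace.
const mixedQuotes = /(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n/g;

// Hypothetical input: each string contains the *other* kind of quote.
const mixedSample = `{{ "It's" | l10n }} {{ 'Say "hi"' | l10n }}`;

// A non-participating group is undefined, so fall back from group 2 to group 4.
const contents = [...mixedSample.matchAll(mixedQuotes)].map(m => m[2] ?? m[4]);

console.log(JSON.stringify(contents)); // ["It's","Say \"hi\""]
```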

Related

Regex to escape an apostrophe (single quote) in a word wrapped in single quotes

I am looking for a pattern that can find apostrophes that are inside single quotes. For example the text
Foo 'can't' bar 'don't'
I want to find and replace the apostrophes in can't and don't, but I don't want to find the single quotes around the words.
I have tried something like
(.*)'(.*)'(.*)'
and applying the replacement to the second capture group, but this pattern doesn't work for text that has two words with apostrophes.
Edit: to clarify, the text could contain single-quoted words with no apostrophes inside them, which should be preserved as is. For example:
'foo' 'can't' bar 'don't'
I am still looking only for apostrophes, so the single quotes around foo should not match.
I believe you need to require "word" characters to appear before and after a ' symbol, and it can be done with a word boundary:
\b'\b
See the regex demo
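To match the quote only when it is between letters, use one of these, depending on the regex flavor:
(?<=\p{L})'(?=\p{L})   # Unicode properties; in JavaScript add the u flag: /(?<=\p{L})'(?=\p{L})/gu
(?<=[[:alpha:]])'(?=[[:alpha:]])   # POSIX bracket expressions (e.g. PCRE); not supported in JavaScript
(?U)(?<=\p{Alpha})'(?=\p{Alpha}) # Java, double the backslashes in the string literal
Or ASCII only:
(?<=[a-zA-Z])'(?=[a-zA-Z])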
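A small JavaScript check of the word-boundary and letter-lookaround patterns. The sample adds a hypothetical French word, jusqu'à, to show one difference: in JavaScript \b is built on \w, which is ASCII-only, so it misses an apostrophe next to à, while \p{L} with the u flag does not. The backslash replacement is only an example of "escaping":

```javascript
// The question's sample plus a hypothetical word with a non-ASCII letter (à).
const quotedWords = "'foo' 'can't' bar 'jusqu'\u00E0'";

// Word-boundary pattern: \b uses \w, which is [A-Za-z0-9_] in JavaScript.
const wordBoundary = /\b'\b/g;
// Lookaround pattern: \p{L} needs the u flag; lookbehind needs ES2018+.
const betweenLetters = /(?<=\p{L})'(?=\p{L})/gu;

// Index of every apostrophe a pattern finds.
const positions = re => [...quotedWords.matchAll(re)].map(m => m.index);

console.log(positions(wordBoundary));   // [ 10 ]
console.log(positions(betweenLetters)); // [ 10, 24 ]

// Example replacement: prefix each inner apostrophe with a backslash.
console.log(quotedWords.replace(betweenLetters, "\\'")); // 'foo' 'can\'t' bar 'jusqu\'à'
```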
You can use the following regular expression:
'[^']+'\s|'[^']+(')[^' ]+'
It returns 3 matches, and if capture group 1 participated in a match, it holds the apostrophe inside the word:
'foo'
'can't'
'don't'
demo
How it works:
'[^']+'\s
' match an apostrophe
[^']+ followed by at least one character that isn't an apostrophe
' followed by an apostrophe
\s followed by a whitespace character
| or
'[^']+(')[^' ]+'
' match an apostrophe
[^']+ followed by at least one character that isn't an apostrophe
(') followed by an apostrophe, and capture it in capture group 1
[^' ]+ followed by at least one character that is not an apostrophe or a space
' followed by an apostrophe
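One possible way to use this pattern in JavaScript, sketched with a replace callback: when group 1 is undefined the first branch matched and the word is kept, otherwise the captured apostrophe is the first ' after the opening quote. The backslash escape is only an example replacement:

```javascript
// Pattern from the answer: a quoted word followed by whitespace, OR a quoted
// word with one inner apostrophe captured in group 1.
const quotedWord = /'[^']+'\s|'[^']+(')[^' ]+'/g;
const input = "'foo' 'can't' bar 'don't'";

// Group 1 is undefined when the first alternative matched.
const summary = [...input.matchAll(quotedWord)].map(m => [m[0], m[1] !== undefined]);
console.log(JSON.stringify(summary));
// [["'foo' ",false],["'can't'",true],["'don't'",true]]

// Escape only the inner apostrophe.
const escaped = input.replace(quotedWord, (match, inner) => {
  if (inner === undefined) return match; // plain quoted word: keep as is
  const i = match.indexOf("'", 1);       // the first ' after the opening quote is group 1
  return match.slice(0, i) + "\\'" + match.slice(i + 1);
});
console.log(escaped); // 'foo' 'can\'t' bar 'don\'t'
```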

Remove all non-ASCII characters from a string except smart quotes

I have this regex that removes all non-ASCII characters from a string, including all smart quotes:
str.replace(/[\u{0080}-\u{FFFF}]/gu,"");
But I need to keep the smart quotes.
The regex for removing smart single quotes is [\u2018\u2019\u201A\u201B\u2032\u2035], and for smart double quotes it is [\u201C\u201D\u201E\u201F\u2033\u2036].
I need a combined regex that removes all non-ASCII characters ([\u{0080}-\u{FFFF}]) except smart quotes ([\u2018\u2019\u201A\u201B\u2032\u2035] or [\u201C\u201D\u201E\u201F\u2033\u2036]).
Note that the \u{XXXX} notation only works in a regex with the u modifier. To build the regex you need, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:
/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
See the regex demo
Some of these characters are consecutive in the Unicode table, so the pattern can be shortened with ranges:
/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
See this demo.
Instead of matching the non-ASCII characters, match the ASCII range plus the characters you want to keep, and negate the class. Example:
str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
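A quick JavaScript comparison of the two approaches on a hypothetical input. One difference worth knowing: with the u flag, the range [\u{0080}-\u{FFFF}] stops at U+FFFF, so the lookahead version leaves characters above it (such as emoji) alone, while the negated class removes everything that is not ASCII or a listed quote:

```javascript
// Hypothetical input: smart quotes, an en dash (U+2013), ï (U+00EF) and an emoji (U+1F600).
const smartInput = "Hello \u201Cworld\u201D \u2013 it\u2019s na\u00EFve \u{1F600}";

// Lookahead approach (range version): exclude the smart quotes from the
// generic non-ASCII range. \u{...} requires the u flag.
const viaLookahead = smartInput.replace(
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu, "");

// Negated-class approach: anything that is not ASCII or a listed quote is removed.
const viaNegatedClass = smartInput.replace(
  /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu, "");

console.log(viaLookahead);    // Hello “world”  it’s nave 😀
console.log(viaNegatedClass); // Hello “world”  it’s nave
```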

Regex to include quotes in match between quotes (and new lines)

I'm trying to find strings enclosed in single quotes with this regex: /'+(.*?)'+,?/g
The problem is that single quotes are allowed inside the string as long as they are escaped with a second quote, for example: 'it''s, you''ve, I''m... and so on, ends with one more single quote '''.
My regex breaks if there are any single quotes inside, and it ends up skipping the quotes at the beginning and end of the match if there are any.
It seems to work perfectly as long as nobody adds any quotes inside the string. But this is not how the real world works unfortunately.
How can I make my regex include the quotes in the match?
Try this regex:
'(?:''|[^'])*'
Explanation: a single quote, followed by (two quotes OR a non-quote character) repeated as often as necessary, followed by a closing single quote.
https://regex101.com/r/R4sd47/1
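A minimal JavaScript sketch applying the pattern to a hypothetical input that contains doubled quotes, a newline (which [^'] also matches), and a string holding only a quote. The unescaping step is an assumption about what you want to do with the matches:

```javascript
// Pattern from the answer: ' then any mix of '' (escaped quote) or a
// non-quote character, then the closing '.
const quotedLiteral = /'(?:''|[^'])*'/g;

// Hypothetical input.
const source = "a = 'it''s, you''ve', b = 'two\nlines', c = ''''";

const literals = [...source.matchAll(quotedLiteral)].map(m => m[0]);
// Strip the outer quotes and turn each '' back into a single '.
const values = literals.map(s => s.slice(1, -1).replace(/''/g, "'"));

console.log(JSON.stringify(literals)); // ["'it''s, you''ve'","'two\nlines'","''''"]
console.log(JSON.stringify(values));   // ["it's, you've","two\nlines","'"]
```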

Regex to Match Quoted String Containing Escaped Quotes

I'm trying to write a regular expression that will match a double quoted string - with the possibility that an escaped double quote may reside inside this string.
My current attempt at this regular expression can be found here:
^"([^"]|\\")*"
Where I am attempting to run this against the following value:
"sdfs\"dasf"
The regular expression stops at the second double quote, not at the third one as intended. However, if I add a $ at the end of the regular expression, it parses correctly. Unfortunately, I cannot use the end-of-string symbol ($) in my code implementation.
It seems that the group is not greedy enough and lets the second double quote be matched by the closing " of the pattern.
Any ideas what would cause this behaviour or how to remedy this?
This should do the trick:
"(?!\\").+"
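A JavaScript sketch of what happens with each pattern. In the question's pattern, [^"] also matches the backslash, so the first alternative wins and the match ends at the escaped quote without ever needing \\". The answer's .+ is greedy, so it runs to the last quote on the line. The third pattern is a common alternative that does not appear in this thread: it keeps the backslash out of the plain-character branch and consumes a backslash plus the next character as one unit. The input strings are hypothetical:

```javascript
// Question's pattern: stops at the escaped quote.
const askedPattern = /^"([^"]|\\")*"/;
// Answer's pattern: greedy .+ runs to the last quote on the line.
const answeredPattern = /"(?!\\").+"/;
// Common alternative (not from this thread): backslash + any char is one unit.
const escapeAware = /"(?:[^"\\]|\\.)*"/g;

const one = String.raw`"sdfs\"dasf"`;
const two = String.raw`"a\"b" and "c"`;

console.log(askedPattern.exec(one)[0]);    // "sdfs\"
console.log(answeredPattern.exec(two)[0]); // "a\"b" and "c"
console.log([...two.matchAll(escapeAware)].map(m => m[0]).join(" | ")); // "a\"b" | "c"
```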

Regex to match '\' (backslash) and replace it with '\\'?

I am converting a normal string into LaTeX format, so I created code to match the \ single backslash and replace it with a \\ double backslash. For why I need it, refer to this link. I tried the code below:
function test(){
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex_form = tex.replace("/[\\\/\\\\\.\\\\]/g", "\\");
document.getElementById('demo').innerHTML=tex_form;//nothing get
}
test();
<p id="demo"></p>
I'm not getting any output, but it does match in this link.
I want to replace each \ with \\.
There are these issues:
The string literal has no backslashes;
The regular expression is not a regular expression;
The class in the intended regular expression cannot match sequences, only single characters;
The replacement would not add backslashes, only replace with them.
Here are the details on each point:
1. How to Encode Backslashes in String Literals
Your tex variable has no backslashes. This is because a backslash in a string literal is not taken as a literal backslash, but as an escape for interpreting the character that follows it.
When you have "$$\left...", then the \l means "literal l", and so the content of your variable will be:
$$left...
As an l does not need to be escaped, the backslash is completely unnecessary, and these two assignments result in the same string value:
var tex="$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$";
var tex="$$left[ x=left({{11}over{2}}+{{sqrt{3271}}over{2,3^{{{3}over{2} $$";
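To bring the point home, this also represents the same value. (Only characters without a special escape meaning are escaped here: \b, \f, \n, \r, \t, \v, \x, \u and digits start real escape sequences, so \t, for example, would produce a tab rather than t.)
var tex="\$\$\l\eft\[\ x\=\l\eft\(\{\{11\}\ov\er\{2\}\}\+\{\{\s\qrt\{3271\}\}\ov\er\{2\,3\^\{\{\{3\}\ov\er\{2\}\ \$\$";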
If you really want to have literal backslashes in your content (which I understand you do, as this is about LaTeX), then you need to escape each of those backslashes... with a backslash:
var tex="$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";
Now the content of your tex variable will be this string:
$$\left[ x=\left({{11}\over{2}}+{{\sqrt{3271}}\over{2\,3^{{{3}\over{2} $$
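A tiny JavaScript check of point 1 on the first few characters of the formula. String.raw is an alternative not mentioned in this answer: the tagged template keeps backslashes verbatim, so the value matches the escaped literal:

```javascript
// An unknown escape such as \l is consumed when the literal is parsed.
const lost = "$$\left[";
// Escaping the backslash keeps it in the value.
const kept = "$$\\left[";
// String.raw keeps backslashes as written.
const raw = String.raw`$$\left[`;

console.log(lost);         // $$left[
console.log(kept);         // $$\left[
console.log(raw === kept); // true
```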
2. How to Code Regular Expression Literals
You are passing a string literal as the first argument of replace, while you really intend to pass a regular expression literal. When replace receives a string, it searches for that exact substring, which does not occur in tex, so nothing is replaced. You should leave out the quotes: the / characters are the delimiters of a regular expression literal, not quotes:
/[\\\/\\\\\.\\\\]/g
This should not be wrapped in quotes. JavaScript understands the / delimiters as denoting a regular expression literal, including the optional modifiers at the end (like g here).
3. Classes are sets of single characters
This regular expression has unnecessary characters. The class [...] should list all individual characters you want to match. Currently you have these characters (after resolving the escapes):
\
/
\
\
.
\
\
It is overkill to have the backslash represented 5 times. Also, in JavaScript the forward slash and dot do not need to be escaped when occurring in a class. So the above regular expression is equivalent to this one:
/[\\/.]/g
Maybe this is, or is not, what you intended to match. To match several sequences of characters, you could use the | operator. This is just an example:
/\\\\|\\\/|\\\./g
... but I don't think you need this.
4. How to actually prefix with backslashes
It seems strange to me that you would want to replace a dot or forward slash with a backslash. Probably you want to prefix those with a backslash. In that case, make a capture group (with parentheses) and refer to it with $1 in the replacement:
tex.replace(/([\\/.])/g, "\\$1");
Note again, that in the replacement string there is only one literal backslash, as the first one is an escape (see point 1 above).
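Putting the points together, a minimal JavaScript sketch. The variable names are illustrative; the second replacement is a simpler option, assuming the only goal is to double every backslash as the question asks:

```javascript
// Point 1: escape each backslash in the source so the value really contains "\".
const texSource = "$$\\left[ x=\\left({{11}\\over{2}}+{{\\sqrt{3271}}\\over{2\\,3^{{{3}\\over{2} $$";

// Points 2-4: regex literal (no quotes), a class of single characters,
// and "$1" in the replacement to prefix each matched character with a backslash.
const prefixed = texSource.replace(/([\\/.])/g, "\\$1");

// Only doubling backslashes: "\\\\" in source is two backslashes in the replacement.
const doubled = texSource.replace(/\\/g, "\\\\");

// The LaTeX sample has no "/" or ".", so both give the same result here.
console.log(prefixed === doubled);                    // true
console.log(texSource.split("\\").length - 1);        // 7
console.log(doubled.split("\\").length - 1);          // 14
console.log("a/b.c\\d".replace(/([\\/.])/g, "\\$1")); // a\/b\.c\\d
```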
why the i need it
As the question you link to says, the \ character has special meaning inside a JavaScript string literal. It represents an escape sequence.
Not getting any output data.But the match in this link
The escape sequence is processed when the string literal is parsed by the JavaScript compiler.
By the time you apply your regular expression, they have been consumed. The backslash characters only exist in your source code, not in your data.
If you want to put a backslash character in your string, then you need to write the escape sequence for it (\\) in the source code. You can't add them back in with JavaScript afterwards.
Not sure if I understood the problem, but try this code:
var tex_form = tex.replace(/(\\)/g, "\\\\");
You need to use '(' ')' instead of '['']' to get a match for output.
