Remove all non-ASCII characters from a string except smart quotes - javascript

I have this regex that removes all non-ASCII characters from a string, including all smart quotes:
str.replace(/[\u{0080}-\u{FFFF}]/gu,"");
But I need to keep the smart quotes.
The regex for matching smart single quotes is: [\u2018\u2019\u201A\u201B\u2032\u2035] and for smart double quotes is: [\u201C\u201D\u201E\u201F\u2033\u2036].
I need a combined regex that removes all non-ASCII characters ([\u{0080}-\u{FFFF}]) except smart quotes ([\u2018\u2019\u201A\u201B\u2032\u2035] or [\u201C\u201D\u201E\u201F\u2033\u2036]).

Note that you need to use the \u{XXXX} notation in the regex together with the u modifier. To build the regex you need, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:
/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
See the regex demo
Note that some chars in the Unicode table go one after another, so we may shorten the pattern using ranges:
/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
See this demo.

Instead of matching the non-ascii, match the ascii + the characters you need, and negate the expression. Example:
str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
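A minimal sketch comparing the two approaches above. The function names and the sample string are made up for illustration; only the two patterns come from the answers. One difference worth knowing: with the u flag, [\u{0080}-\u{FFFF}] does not cover characters above U+FFFF (such as emoji), so the lookahead version leaves them in place, while the negated class removes them.

```javascript
// Lookahead version: remove U+0080..U+FFFF unless the character is a smart quote.
function stripNonAsciiLookahead(str) {
  return str.replace(
    /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu,
    ""
  );
}

// Negated-class version: remove everything that is neither ASCII nor a smart quote.
function stripNonAsciiNegated(str) {
  return str.replace(
    /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,
    ""
  );
}

// Sample input written with escapes so the code points are unambiguous:
// smart double quotes around "cafe" with an accented e, smart single quotes
// around "naive" with a diaeresis, then an em dash.
const sample = "\u201Ccaf\u00E9\u201D \u2018na\u00EFve\u2019 \u2014 ok";

console.log(stripNonAsciiLookahead(sample));
console.log(stripNonAsciiNegated(sample));

// An emoji (U+1F600) lies outside U+0080..U+FFFF:
console.log(stripNonAsciiLookahead("a\u{1F600}b")); // emoji is kept
console.log(stripNonAsciiNegated("a\u{1F600}b"));   // emoji is removed
```

If characters above U+FFFF should be removed by the lookahead version too, extending the range to \u{10FFFF} would be one option.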

Related

RegExp lookbehind assertion alternative

Given the string below, how would you split this into an array containing only the double quoted strings (ignoring nested quoted strings) without using a lookbehind assertion?
source string: 1|2|3|"A"|"B|C"|"\"D\"|\"E\""
target array:
[
'"A"',
'"B|C"',
'"\"D\"|\"E\""'
]
Basically, I'm trying to find an alternative to /(?<!\\)".*?(?<!\\)"/g since Firefox currently doesn't support lookbehinds. The solution doesn't have to use regular expressions, but it should be reasonably efficient.
Just find all the quoted text /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g
Don't need split for this.
https://regex101.com/r/r5SJsR/1
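As a quick check of the pattern above, here is a small sketch that runs it against the source string from the question. String.raw is used only so the backslashes in the sample survive as written; the variable names are illustrative.

```javascript
// The source string from the question, with literal backslashes preserved.
const source = String.raw`1|2|3|"A"|"B|C"|"\"D\"|\"E\""`;

// A quote, then any run of characters that are neither quote nor backslash,
// optionally followed by (backslash + any character + another such run),
// repeated, then the closing quote. No lookbehind is involved.
const quoted = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g;

// match() with the g flag returns all matches (or null if there are none).
const result = source.match(quoted) || [];

console.log(result);
// [ '"A"', '"B|C"', '"\\"D\\"|\\"E\\""' ]
```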
Formatted
"
[^"\\]* # Double quoted text
(?: \\ [\S\s] [^"\\]* )*
"
How about the simple regex /"[^\\"]+"|"\S*"/g.
The first two sets ("A" and "B|C") are covered by "[^\\"]+" - anything that is not a backslash or a quotation mark wrapped inside a set of quotation marks
A pipe (|) separates the two alternatives
The third set ("\"D\"|\"E\"") is simply covered by "\S*" - anything non-whitespace wrapped inside a set of quotation marks
This returns the same results as your initial regex, has no lookbehinds and can be seen working on Regex101 here.

Javascript regexp non greedy search for quotes

I have the following text:
{{field.text || 'Čeština' | l10n}}
Regexp:
/((?!l10n))*?(['"])(.*?)\2[\s]*?\|[\s]*?l10n/g
I am trying to replace the strings before l10n with modified strings. My regexp works fine except for one situation, where it eats the ' from the setLocale function.
Here is an interactive regex tester with my expression - https://regex101.com/r/vX5tJ6/3
The question is: why is it eating the ' from setLocale when there is no | after it (as specified in the regexp)?
Maybe this is what you're looking for:
(['"])([^'"]*)\1\s*\|\s*l10n
https://regex101.com/r/lV8wV7/1
It looks for anything in single or double quotes followed by | l10n with optional spaces.
Your regex was matching a single or double quote, followed by any characters, non-greedily, then another matching quote. However, it was able to non-greedily match the enclosing quotes (so not just the last satisfying quote it encountered) without violating the rest of the pattern.
The main difference in the above pattern is that it won't allow enclosing quotes.
If you need to allow double quotes enclosed in single quotes or single quotes in double quotes, you can try the following:
(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n
https://regex101.com/r/mL8gA6/1
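A minimal sketch of how the last pattern could be used to rewrite the quoted string in front of l10n. The translateLiterals helper and the bracket-wrapping callback are hypothetical stand-ins for whatever modification is actually needed; note that this sketch normalizes the spacing around the pipe to " | ".

```javascript
// Either '...' with no single quotes inside, or "..." with no double quotes
// inside, followed by optional whitespace, a pipe, optional whitespace, l10n.
const pattern = /(?:(')([^']*)'|(")([^"]*)")\s*\|\s*l10n/g;

// Hypothetical helper: replaces the quoted text with translate(text),
// keeping the same kind of quote that was matched.
function translateLiterals(text, translate) {
  return text.replace(pattern, (match, sq, sText, dq, dText) => {
    const quote = sq || dq;              // whichever alternative matched
    const inner = sq ? sText : dText;    // the text between the quotes
    return `${quote}${translate(inner)}${quote} | l10n`;
  });
}

const template = "{{field.text || 'Čeština' | l10n}}";
const wrapped = translateLiterals(template, (s) => "[" + s + "]");

console.log(wrapped);
// {{field.text || '[Čeština]' | l10n}}
```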

what is the difference between PHP regex and javascript regex

I am working with a regex, /\[([^]\s]+).([^]]+)\]/g, which works great in PHP for [http://sdgdssd.com fghdfhdhhd]
but when I use the same regex in JavaScript, it does not match this input string.
My input is [http://sdgdssd.com fghdfhdhhd]
In JavaScript regex, you must always escape the ] inside a character class:
\[([^\]\s]+).([^\]]+)\]
See the regex demo
In your regex, JS parsed [^] as any character including a newline, and the closing ] of each character class as a literal ].
In this regard, JS regex engine deviates from the POSIX standard where smart placement is used to match [ and ] symbols with bracketed expressions like [^][].
The ] character is treated as a literal character if it is the first character after ^: [^]abc].
In JS and Ruby, it does not work like that:
You can include an unescaped closing bracket by placing it right after the opening bracket, or right after the negating caret. []x] matches a closing bracket or an x. [^]x] matches any character that is not a closing bracket or an x. This does not work in JavaScript, which treats [] as an empty character class that always fails to match, and [^] as a negated empty character class that matches any single character. Ruby treats empty character classes as an error. So both JavaScript and Ruby require closing brackets to be escaped with a backslash to include them as literals in a character class.
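The behaviour described above can be checked directly in JavaScript. This is a small sketch using the input from the question; the variable names are illustrative.

```javascript
const input = "[http://sdgdssd.com fghdfhdhhd]";

// The PHP-style pattern: in JavaScript, [^] is "any character", so this
// reads as: "[" + any char + whitespace + one or more "]" + ...
const unescaped = /\[([^]\s]+).([^]]+)\]/;

// The same pattern with ] escaped inside the character classes.
const escaped = /\[([^\]\s]+).([^\]]+)\]/;

console.log(unescaped.test(input)); // false

const m = escaped.exec(input);
console.log(m[1]); // http://sdgdssd.com
console.log(m[2]); // fghdfhdhhd

// The two special cases mentioned in the quote:
console.log(/[^]/.test("\n")); // true: negated empty class matches any character
console.log(/[]/.test("a"));   // false: empty class never matches
```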
Related:
(?1) regex subroutine used to shorten a PCRE pattern conversion - REGEX from PHP to JS
I would like to add a small note about translating a PHP preg_replace regex into a JavaScript .replace regex:
<?php echo preg_replace("/([^0-9\,\.\-])/i", "", "-1 220 025.47 $"); ?>
Result : "-1220025.47"
With PHP, you have to put quotes "..." around the regex (delimiters included), and commas separate the pattern, the replacement and the subject. preg_replace replaces every match by default, so there is no /g flag. The parentheses only create a capturing group, which is not needed here.
<script>"-1 220 025.47 $".replace(/[^0-9\,\.\-]/ig,"") </script>
Result : "-1220025.47"
With JavaScript, there are no quotes around the regex literal, a comma separates the regex from the replacement, and you have to add the g flag to replace every match, in addition to the i flag (hence /ig).
I hope this will be useful to someone!
Note that the "\," can be dropped for numbers formatted like "1,000.00 $" (English style), so that the thousands separators are removed as well:
<script>"-1,220,025.47 $".replace(/[^0-9\.\-]/ig,"")</script>
<?php echo preg_replace("/([^0-9\.\-])/i", "", "-1,220,025.47 $"); ?>
Result : "-1220025.47"

Reg Expression to ignore limited special characters javascript

I am using the jQuery Validation plugin in my implementation and need a regular expression that rejects special characters like , and &.
Is there a regular expression for this? If these special characters appear anywhere in the string, it should find them and raise an error.
You can use regular expressions like this:
[\,\&]
You can add as many characters as you want to this class.
try it out yourself on this site:
http://www.regexr.com/
/[,&]/g
matches , and &.
Demo: https://regex101.com/r/gY0mC3/2#javascript
If you want to search for every special character except letters, numbers and the underscore, use
/\W/g
Demo: https://regex101.com/r/gY0mC3/5#javascript
If you need to include spaces (e.g. a name) use
/[^\w\s]/g
Demo: https://regex101.com/r/gY0mC3/4#javascript
The brackets [] define custom regex classes.
To match a character for only those characters, you can do [\,\&].
To match all except that, you can add a ^, such as [^\,\&].
To match any non-word character, you can use \W (any character not a-z, A-Z, 0-9, or _).
To include an underscore, you can do [\W_].
Keep in mind that whitespaces are represented by \s and that depending on your environment, you may need to escape (add an additional backslash to) your backslashes.
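A minimal sketch of how these classes could back a validation check. The function names are hypothetical, and wiring them into the validator plugin is not shown. The patterns are deliberately written without the g flag: RegExp.prototype.test with a global regex keeps state in lastIndex between calls, which can make repeated checks give surprising results.

```javascript
// Rejects only the two characters named in the question.
const FORBIDDEN = /[,&]/;

// Rejects anything that is not a word character (a-z, A-Z, 0-9, _) or whitespace.
const NOT_WORD_OR_SPACE = /[^\w\s]/;

// Hypothetical helpers: return true when the value contains a forbidden character.
function hasCommaOrAmpersand(value) {
  return FORBIDDEN.test(value);
}

function hasSpecialCharacter(value) {
  return NOT_WORD_OR_SPACE.test(value);
}

console.log(hasCommaOrAmpersand("Tom & Jerry")); // true
console.log(hasCommaOrAmpersand("plain text"));  // false
console.log(hasSpecialCharacter("John Smith"));  // false
console.log(hasSpecialCharacter("John_Smith!")); // true
```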

What is this "/\,$/"?

I tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns the index of something in the coords string. What should I search for online so I can learn more?
/\,$/ finds the comma character (,) at the end of a string (denoted by the $) and replaces it with empty (""). You sometimes see this in regex code aiming to clean up excerpts of text.
It's a regular expression to remove a trailing comma.
That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available on most languages and in some tools. It is a very useful thing to learn.
That's a regular expression that finds a comma at the end of a string. That code removes the comma.
The two slashes (/ ... /) delimit a JavaScript regular expression literal, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but here it's not necessary. An example where it would be necessary is removing trailing periods. If you tried to do that with /.$/, the period would have a different meaning: it is a wildcard that matches [almost] any character (everything except line terminators). So to match a literal "." (period character) you have to escape the wildcard (/\.$/).
When $ is placed at the end of the pattern, it means only look at the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not the one after help in help, me,), only at the end (trailing). It may also speed up the regular expression search. If you wanted to match characters only at the beginning of the string, you would start the pattern with a caret (^); for instance, /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +; trailing commas,, becomes trailing commas,
With the +; no trailing comma,, becomes no trailing comma
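The points above can be tried out with a few one-liners. This is just an illustrative sketch; the sample strings are made up apart from the "trailing commas,," case.

```javascript
// One trailing comma is removed per call without the +.
const one = "trailing commas,,".replace(/,$/, "");
console.log(one);  // trailing commas,

// With the +, the whole run of trailing commas is removed.
const many = "trailing commas,,".replace(/,+$/, "");
console.log(many); // trailing commas

// A comma in the middle of the string is not touched, because of the $ anchor.
const middle = "help, me".replace(/,$/, "");
console.log(middle); // help, me

// An escaped period matches only a literal "." at the end...
const period = "end.".replace(/\.$/, "");
console.log(period); // end

// ...while an unescaped period removes whatever the last character is.
const wildcard = "end!".replace(/.$/, "");
console.log(wildcard); // end

// The caret anchors at the start instead.
const leading = ",start".replace(/^,/, "");
console.log(leading); // start
```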
