Why doesn't [.\n]+ match the string 'a\nb'? - javascript

Here is my js Regex test.
'AAa\nbBB'.match(/AA[.\n]+BB/);//failed match
I thought [.\n]+ could match any characters. Am I wrong?

The dot matches a literal dot inside a character class.
Use 'AAa\nbBB'.match(/AA[\s\S]*BB/); instead.
In most regex flavors, you can set the s flag to allow the dot to match newlines (for a regex like /AA.*BB/s). JavaScript only gained that flag in ES2018, so in older engines you need to use [\s\S] to match any character.
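A minimal sketch comparing the patterns on the string from the question (the variable names are only illustrative):

```javascript
const input = 'AAa\nbBB';

// Inside a character class the dot is literal, so [.\n] only accepts "." or "\n".
const literalDotClass = /AA[.\n]+BB/.test(input);

// Outside a class, a plain dot (no flags) stops at the line break.
const plainDot = /AA.*BB/.test(input);

// [\s\S] is "whitespace or non-whitespace", i.e. any character at all.
const anyCharClass = /AA[\s\S]*BB/.test(input);

console.log(literalDotClass, plainDot, anyCharClass); // false false true
```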

Related

RegEx from Javascript to MySQL

I have this JavaScript regular expression that checks whether a URI is valid (RFC 3986):
/^(https?):\/\/((?:[a-z0-9.-]|%[0-9A-F]{2}){3,})(?::(\d+))?((?:\/(?:[a-z0-9-._~!$&'()*+,;=:#]|%[0-9A-F]{2})*)*)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:\/?#]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:\/?#]|%[0-9A-F]{2})*))?$/i
Now I need to convert it for use in a MySQL query, using REGEXP.
E.g.:
SELECT *
FROM table_name t
WHERE t.uri REGEXP '....'
Could you help me?
You need to
Double all ' chars inside '...' string literals
Replace all (?: with ( as legacy MySQL versions use a POSIX-compliant regex engine that does not support non-capturing groups
Certainly remove the first / and last /i since the pattern is passed as a string in MySQL, not as a regex literal, and the pattern is case insensitive by default, no need to add i anywhere (or add A-Z manually in case some global settings are overridden)
Replace all \/ with / just to keep the regex clean
Replace \d with [0-9] (again, POSIX is not aware of shorthand character classes, although you may also use POSIX character classes, e.g. [[:digit:]] to match any digit)
Most likely, replace \? with \\?, or just use [?] to match a literal ? symbol
Always use a literal hyphen at the end or start of a character class (bracket expression in POSIX regex).
Use
WHERE t.uri REGEXP '^https?://(([A-Za-z0-9.-]|%[0-9A-Fa-f]{2}){3,})(:[0-9]+)?((/([A-Za-z0-9._~!$&''()*+,;=:#-]|%[0-9A-Fa-f]{2})*)*)([?](([A-Za-z0-9._~!$&''()*+,;=:/?#-]|%[0-9a-fA-F]{2})*))?(#(([A-Za-z0-9._~!$&''()*+,;=:/?#-]|%[0-9A-Fa-f]{2})*))?$'
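The mechanical steps from the list can be sketched in JavaScript. This is a rough illustration, not part of the original answer: toMysqlPattern is a hypothetical helper, the sample regex is a shortened stand-in for the full URI pattern, and the rewrites are deliberately naive (they do not move hyphens inside character classes or handle \d inside a class).

```javascript
// Hypothetical helper: applies the simple textual rewrites to a regex source.
function toMysqlPattern(regex) {
  return regex.source
    .replace(/\(\?:/g, '(')    // non-capturing groups -> plain groups
    .replace(/\\\//g, '/')     // \/ -> /
    .replace(/\\d/g, '[0-9]')  // \d -> [0-9]
    .replace(/\\\?/g, '[?]')   // \? -> [?]
    .replace(/'/g, "''");      // double single quotes for the SQL string literal
}

// Shortened sample pattern (an assumption, not the pattern from the question).
const sample = /^(https?):\/\/(?:[a-z]+)(?::(\d+))?(?:\?(.*))?$/i;
const converted = toMysqlPattern(sample);

console.log(converted); // ^(https?)://([a-z]+)(:([0-9]+))?([?](.*))?$
```

The result still needs a manual review before it goes into a query, for example to add A-Z ranges or reposition hyphens as described above.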

RegEx pattern to not allow special character except underscore

I have a special requirement, where I need to achieve the following:
No special character is allowed except _ in the middle of the string.
The string should not start or end with _, . or a numeric value.
An underscore should not be allowed before or after any numeric value.
I am able to achieve most of it, but my RegEx pattern also allows other special characters.
How can I modify the RegEx pattern below so that it does not allow any special character apart from an underscore, and that only in the middle of the string?
^[^0-9._]*[a-zA-Z0-9_]*[^0-9._]$
What you might do is use negative lookaheads to assert your requirements:
^(?![0-9._])(?!.*[0-9._]$)(?!.*\d_)(?!.*_\d)[a-zA-Z0-9_]+$
Explanation
^ Assert the start of the string
(?![0-9._]) Negative lookahead to assert that the string does not start with [0-9._]
(?!.*[0-9._]$) Negative lookahead to assert that the string does not end with [0-9._]
(?!.*\d_) Negative lookahead to assert that the string does not contain a digit followed by an underscore
(?!.*_\d) Negative lookahead to assert that the string does not contain an underscore followed by a digit
[a-zA-Z0-9_]+ Match what is specified in the character class one or more times. You can add to the character class what you would allow to match, for example also add a .
$ Assert the end of the string
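A small check of the lookahead pattern against a few hand-picked inputs (the sample strings are my own, chosen to hit each rule once):

```javascript
const strictName = /^(?![0-9._])(?!.*[0-9._]$)(?!.*\d_)(?!.*_\d)[a-zA-Z0-9_]+$/;

const samples = [
  'abc_def', // underscore between letters: allowed
  'ab1cd',   // digit in the middle, no underscore next to it: allowed
  '_abc',    // starts with underscore
  'abc_',    // ends with underscore
  '1abc',    // starts with a digit
  'abc1',    // ends with a digit
  'ab1_cd',  // digit followed by underscore
  'ab_1cd',  // underscore followed by digit
  'ab$cd',   // other special character
];

const verdicts = samples.map((s) => strictName.test(s));
samples.forEach((s, i) => console.log(s, verdicts[i]));
// only the first two samples pass
```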
Keep it simple. Only allow underscore and alphanumeric regex:
/^[a-zA-Z0-9_]+$/
Javascript es6 implementation (works for React):
const re = /^[a-zA-Z0-9_]+$/;
re.test(variable_to_test);
Your opening and closing sections, [^0-9._], say: match ANY character other than those.
So you need to change them to list what you do want to match.
/^[A-Z][A-Z0-9_]*[A-Z]$/i
And since you now said one character is valid:
/^[A-Z]([A-Z0-9_]*[A-Z])?$/i
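A quick sketch of what this last pattern accepts. Note that, as far as I can tell, it does not enforce the rule about underscores next to digits, so the final sample passes:

```javascript
const lettersAtEnds = /^[A-Z]([A-Z0-9_]*[A-Z])?$/i;

const checks = {
  a: lettersAtEnds.test('a'),       // single letter is valid
  a_b: lettersAtEnds.test('a_b'),   // underscore in the middle
  a1b: lettersAtEnds.test('a1b'),   // digit in the middle
  _ab: lettersAtEnds.test('_ab'),   // leading underscore is rejected
  ab1: lettersAtEnds.test('ab1'),   // trailing digit is rejected
  'a.b': lettersAtEnds.test('a.b'), // dot is rejected
  a1_b: lettersAtEnds.test('a1_b'), // digit next to underscore still passes
};

console.log(checks);
```

If the digit/underscore rule matters, the lookaheads (?!.*\d_)(?!.*_\d) from the other answer can be placed right after the ^.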

Regular Expression in JS: \\. does not match \n

I am getting a string containing newlines (\n), tabs (\t) and lowercase letters [a-z]. It is possible to split it by matching /\n|\t/. AFAIK the dot represents the wildcard.
Therefore I was wondering, why /\n|\t/ doesn't match the same things as /\\./
var text = 'test1 \ntest2';
text.split(/\n/) //['test1 ', 'test2']
text.split(/\./) //['test1 \ntest2']
text.split(/\\./) //['test1 \ntest2']
Shouldn't the \\. match the \n (newline)?
Let me try and answer all the points:
AFAIK the dot represents the wildcard.
No, in regex, we do not use the term "wildcard". It is a special regex (meta)character. A dot in JavaScript regex matches any character but a newline.
I was wondering, why /\n|\t/ doesn't match the same things as /\\./
Because /\n|\t/ matches 1 symbol, either a newline or tab, while the regex /\\./ matches a literal \ and a character other than a newline.
The \n and \t are escape sequences. That means the \ is not a literal backslash; together with the following symbol, it forms a single code unit, a character that cannot be written otherwise. Indeed, how can we write a line break on paper with a pen? No way!
See more about JavaScript character escape sequences here.
Now,
text.split(/\n/) //['test1 ', 'test2']
True, your input string contains a line break, thus, you get two elements in the resulting array
text.split(/\./) //['test1 \ntest2']
No match was found because \. matches a literal dot. A dot that is escaped (that has a literal \ before it) in the regex stops being a special regex metacharacter, and just matches its literal representation. Your string has no dot, thus, no matches.
text.split(/\\./) //['test1 \ntest2']
Again, no match is found, as /\\./ looks for a literal \ followed by any character but a newline.
A hint: test your expressions at regex101.com; it will tell you what your regex can match on the right.
Here, with regex, you have a literal notation (/.../). In literal notation, the backslash goes to the regex engine as written, thus, you do not have to escape it twice. If you used constructor notation (i.e. RegExp(....)), the pattern would pass through a string literal first, so you would have to use double escaping. E.g.
var re = /\\./; // is equal to
var re = new RegExp("\\\\.");
See more about constructor and literal notations at MDN RegExp help page.
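To see the two notations side by side, here is a short sketch (the test strings are made up for illustration):

```javascript
const fromLiteral = /\\./;
const fromConstructor = new RegExp('\\\\.');

// Both describe "a literal backslash followed by any non-newline character".
const sameSource = fromLiteral.source === fromConstructor.source;

// This string really contains a backslash: a, \, b, c.
const withBackslash = 'a\\bc';
const splitOnBackslash = withBackslash.split(fromLiteral);

// This string contains a line break, but no backslash character.
const withNewline = 'test1 \ntest2';
const splitNoMatch = withNewline.split(fromLiteral);
const splitOnNewline = withNewline.split(/\n/);

console.log(sameSource, splitOnBackslash, splitNoMatch.length, splitOnNewline);
```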
\n gets evaluated to a newline character, so you're matching against a line break, not a backslash followed by n. If you do a quick console.log('\n'); you can see the output of that.

Reg Expression to ignore limited special characters javascript

I am using the jQuery validator plugin in my implementation and need a regular expression that excludes special characters like , and &.
Is there a regular expression for this? Also, if these special characters appear anywhere in the string, it should find them and throw an error.
You can use regular expressions like this:
[\,\&]
You can add as many characters as you want to this.
Try it out yourself on this site:
http://www.regexr.com/
/[,&]/g
matches , and &.
Demo: https://regex101.com/r/gY0mC3/2#javascript
If you want to search for every special character except letters, numbers and the underscore, use
/\W/g
Demo: https://regex101.com/r/gY0mC3/5#javascript
If you need to include spaces (e.g. a name) use
/[^\w\s]/g
Demo: https://regex101.com/r/gY0mC3/4#javascript
The brackets [] define custom character classes.
To match only those characters, you can do [\,\&].
To match everything except those, you can add a ^, such as [^\,\&].
To match any non-word character, you can use \W (any character not a-z, A-Z, 0-9, or _).
To include an underscore, you can do [\W_].
Keep in mind that whitespaces are represented by \s and that depending on your environment, you may need to escape (add an additional backslash to) your backslashes.
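A sketch of how these classes behave as plain validation checks, independent of any plugin. The function names are hypothetical. One detail worth knowing: a regex with the g flag keeps state between test() calls, so for yes/no validation it is usually simpler to leave g off.

```javascript
// Hypothetical helpers: each returns true when a forbidden character is present.
const hasCommaOrAmp = (value) => /[,&]/.test(value);
const hasNonWord = (value) => /\W/.test(value);
const hasSpecialExceptSpace = (value) => /[^\w\s]/.test(value);

console.log(hasCommaOrAmp('Tom & Jerry'));         // true
console.log(hasCommaOrAmp('Tom and Jerry'));       // false
console.log(hasNonWord('Tom Jerry'));              // true, the space counts as non-word
console.log(hasSpecialExceptSpace('Tom Jerry'));   // false
console.log(hasSpecialExceptSpace('Tom, Jerry'));  // true

// With the g flag, test() resumes from lastIndex on the next call.
const globalRe = /[,&]/g;
const firstCall = globalRe.test('a,b');  // true, lastIndex is now 2
const secondCall = globalRe.test('a,b'); // false, the search resumed after the comma
console.log(firstCall, secondCall);
```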

To the last tag (already in a string) RegEx

I do not know what I am doing wrong. I have this string that I want to replace
<?xml version="1.0" encoding="utf-8" ?>
<Sections>
<Section>
I am using regex to replace everything including <Section>, and leave the rest untouched.
arrayValues[index].replace("/[([.,\n,\s])*<Section>]/", "---");
What is wrong with my regex? Doesn't this mean replace every character, including newlines and spaces, up to and including <Section> with ---?
First of all, you need to remove the quotes around your regex—if they're there, the argument won't be processed as a regex. JavaScript will see it as a string (because it is a string) and try to match it literally.
Now that that's taken care of, we can simplify your regex a bit:
arrayValues[index].replace(/[\s\S]*?<Section>/, "---");
[\s\S] gets around the lack of an s flag in JavaScript engines older than ES2018 (a handy option supported by most languages that enables . to match newlines). \s does match newlines (even without an s flag specified), so the character class [\s\S] tells the regex engine to match:
\s - a whitespace character, which could be a newline
OR
\S - a non-whitespace character
So you can think of [\s\S] as matching . (any character except a newline) or the literal \n (a newline). See Javascript regex multiline flag doesn't work for more.
? is used to make the initial [\s\S]* match non-greedy, so the regex engine will stop once it hits the first occurrence of <Section>.
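The difference between the lazy and greedy forms only shows up when <Section> occurs more than once. A minimal sketch, using a made-up input with two <Section> tags:

```javascript
const xml = '<?xml version="1.0" encoding="utf-8" ?>\n<Sections>\n<Section>\nfirst\n<Section>\nsecond';

// Lazy: stops at the first <Section> (note that <Sections> does not match).
const lazyResult = xml.replace(/[\s\S]*?<Section>/, '---');

// Greedy: runs to the last <Section> in the string.
const greedyResult = xml.replace(/[\s\S]*<Section>/, '---');

console.log(JSON.stringify(lazyResult));   // "---\nfirst\n<Section>\nsecond"
console.log(JSON.stringify(greedyResult)); // "---\nsecond"
```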
arrayValues[index].replace("/[([.,\n,\s])*<Section>]/", "---");
What is wrong with my regex?
It's not a regex, it's a string literal. replace() searches for a string pattern literally, so yours is looked up as-is, slashes included. Use a regex literal instead:
arrayValues[index].replace(/[\S\s]*<Section>/, "---");
Also, you have too many unnecessary characters in it. The [] around the whole thing build a character class, which is not what you want. The capturing group () just wraps a character class which can be repeated itself. And a dot . inside a character class does match a literal dot, instead of all characters.
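A short sketch of the string-versus-regex difference in replace (the input is a made-up fragment):

```javascript
const fragment = '<Sections>\n<Section>\nbody';

// A string first argument is searched for as literal text, slashes and all,
// so nothing is found and the input comes back unchanged.
const quotedPattern = fragment.replace('/[\\s\\S]*<Section>/', '---');

// A regex literal is actually interpreted as a pattern.
const regexPattern = fragment.replace(/[\s\S]*<Section>/, '---');

// The same distinction on a tiny example: "." as text versus "." as a metacharacter.
const dotAsText = 'a.b'.replace('.', '-');
const dotAsRegex = 'a.b'.replace(/./, '-');

console.log(quotedPattern === fragment, JSON.stringify(regexPattern), dotAsText, dotAsRegex);
```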
