XRegExp to replace Unicode characters in IE - javascript

I developed a javascript function to clean a range of Unicode characters. For example, "ñeóñú a1.txt" => "neonu a1.txt". For this, I used a regular expression:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
name = name.replace (patternA,'');
But it does not work properly in IE. If my research is correct, IE does not detect Unicode in the same way. I'm trying to make an equivalent function using the library XRegExp (http://xregexp.com/), which is compatible with all browsers, but I don't know how to write the Unicode pattern so XRegExp works in IE.
One of the failed attempts:
XRegExp.replace(name,'\\u0300-\\u036F','');
How can I build this pattern?

The value provided as the XRegExp.replace method's second argument should be a regular expression object, not a string. The regex can be built by the XRegExp or the native RegExp constructor. Thus, the following two lines are equivalent:
name = name.replace(/[\u0300-\u036F]/g, '');
// Is equivalent to:
name = XRegExp.replace(name, /[\u0300-\u036F]/g, '');
The following line you wrote, however, is not valid:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
Instead, it should be:
var patternA = new RegExp ("[\\u0300-\\u036F]", "g");
I don't know if that is the source of your problem, but perhaps. For the record, IE's Unicode support is as good or better than other browsers.
XRegExp can let you identify your block by name, rather than using magic numbers. XRegExp('[\\u0300-\\u036F]') and XRegExp('\\p{InCombiningDiacriticalMarks}') are exactly equivalent. However, the marks in that block are a small subset of all combining marks. You might actually want to match something like XRegExp('\\p{M}'). However, note that simply removing marks like you're doing is not a safe way to remove diacritics. Generally, what you're trying to do is a bad idea and should be avoided, since it will often lead to wrong or unintelligible results.
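For what it's worth, one reason the [\u0300-\u036F] range can appear to do nothing is that it only matches combining marks that are separate code points. A precomposed "ñ" (U+00F1) contains no such mark until the string is decomposed. Below is a minimal sketch, assuming an environment that provides String.prototype.normalize (ES2015, so not older IE without a polyfill). stripCombiningMarks is a hypothetical helper name, and the caveat above about mark stripping being lossy still applies.

```javascript
// Hypothetical helper: decompose first (NFD), then drop the combining marks
// in the U+0300-U+036F block.
function stripCombiningMarks(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036F]/g, "");
}

// Precomposed input: the range alone finds nothing to remove.
const precomposed = "\u00F1e\u00F3\u00F1\u00FA a1.txt"; // "ñeóñú a1.txt"
const untouched = precomposed.replace(/[\u0300-\u036F]/g, "");

console.log(untouched === precomposed);        // true: nothing was removed
console.log(stripCombiningMarks(precomposed)); // "neonu a1.txt"
```

Characters that have no decomposition (for example "ø" or "ß") pass through unchanged, which is part of why this approach is unreliable for general transliteration.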

Related

Get only one character case insensitive in a globally case sensitive RegExp in JavaScript [duplicate]

I am using Node.js to build an application in which I need to process certain strings, and I have used the JS "RegExp" object for this purpose.
I want only a part of my string in the regex to be case-insensitive:
var key = '(?i)c(?-i)ustomParam';
var find = '\{(\\b' + key +'\\b:?.*?)\}';
var regex = new RegExp(find,"g");
But it breaks with the following error:
SyntaxError: Invalid regular expression: /{(\b(?i)c(?-i)ustomParam\b:?.*?)}/
I will get the key from an external source such as Redis, and the string to be matched from another external source. I want the first letter to be case-insensitive and the rest of the letters to be case-sensitive.
When I get the key from the external source, I will insert (?i) before the first letter and (?-i) after it.
I even tried this just for starters, but that didn't work either:
var key ='customParam';
var find = '(?i)\{(\\b' + key +'\\b:?.*?)\}(?-i)';
var regex = new RegExp(find,"g");
I know I can use the "i" flag instead of the above, but that's not my use case. I did it just to check.
JavaScript built-in RegExp does not support inline modifiers, like (?im), let alone inline modifier groups that can be placed anywhere inside the pattern (like (?i:....)).
Even XRegExp cannot offer this functionality: you can only apply (?i) to the whole pattern, by declaring it at the beginning of the pattern.
In XRegExp, you can define the regex ONLY as
var regex = XRegExp('(?i)\\{(\\b' + key +'\\b:?.*?)\\}', 'g');
As of May 27, 2020, neither JavaScript's native RegExp nor XRegExp supports inline modifier groups (i.e. (?i:...)), and XRegExp does not allow a modifier to be placed anywhere but the start of the pattern.
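A possible workaround that needs no inline modifiers is to expand the first letter of the key into a two-character class, e.g. [cC]ustomParam, and keep the rest literal. The following is only a sketch under assumptions: escapeRegExp and buildKeyRegex are hypothetical helper names, and the surrounding \{(\b ... \b:?.*?)\} shape is copied from the question.

```javascript
// Hypothetical helper: escape regex metacharacters in untrusted text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: first letter matches in either case, the rest is case-sensitive.
function buildKeyRegex(key) {
  const first = key.charAt(0);
  const rest = key.slice(1);
  const lower = first.toLowerCase();
  const upper = first.toUpperCase();
  // Only build a class when the character actually has two cases.
  const firstPart = lower === upper ? escapeRegExp(first) : "[" + lower + upper + "]";
  return new RegExp("\\{(\\b" + firstPart + escapeRegExp(rest) + "\\b:?.*?)\\}", "g");
}

console.log("{customParam: 1}".match(buildKeyRegex("customParam")) !== null); // true
console.log("{CustomParam: 1}".match(buildKeyRegex("customParam")) !== null); // true
console.log("{customparam: 1}".match(buildKeyRegex("customParam")) !== null); // false
```

Escaping the key matters here because it comes from an external source and might contain characters that are special in a regex.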

How to make this youtube videoId parsing RegExp work in JS?

I'm trying to use this great RegEx presented here for grabbing a video id from any youtube type url:
parse youtube video id using preg_match
// getting our youtube url from an input field.
var yt_url = $('#yt_url').val();
var regexp = new RegExp('%(?:youtube(?:-nocookie)?\\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\\.be/)([^"&?/ ]{11})%','i');
var videoId = yt_url.match( regexp ) ;
console.log('vid: '+videoId);
My console is always giving me a null videoId though. Am I incorrectly escaping something in my regexp var? I added a second backslash to escape the single backslashes already.
Scratching my head?
% are delimiters for the PHP you got the link from; JavaScript does not expect delimiters when using new RegExp(). The doubled backslashes (\\.) are correct inside a string literal and should stay, because a single \. in a string collapses to a plain dot that matches any character. Try:
var regexp = new RegExp('(?:youtube(?:-nocookie)?\\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\\.be/)([^"&?/ ]{11})','i');
Also, you can create a regular expression literally by using Javascript's /.../ delimiters, but then you'll need to escape all of your /s:
var regexp = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/ ]{11})/i;
Documentation
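As a rough usage sketch: match() returns an array (or null), so the 11-character id is the first capture group rather than the return value itself. extractVideoId is a hypothetical helper name and the URLs are only sample inputs.

```javascript
// Literal form of the pattern, with the PHP "%" delimiters removed.
const youtubeIdPattern = /(?:youtube(?:-nocookie)?\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/ ]{11})/i;

// Hypothetical helper: return the captured id, or null when nothing matches.
function extractVideoId(url) {
  const match = url.match(youtubeIdPattern);
  return match ? match[1] : null;
}

console.log(extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ")); // "dQw4w9WgXcQ"
console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));                // "dQw4w9WgXcQ"
console.log(extractVideoId("https://example.com/"));                        // null
```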
Update:
A quick update to address the comment on efficiency for literal expressions (/ab+c/) vs. constructors (new RegExp("ab+c")). The documentation says:
Regular expression literals provide compilation of the regular expression when the script is loaded. When the regular expression will remain constant, use this for better performance.
And:
Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
Since your expression will always be static, I would say creating it literally (the second example) would be slightly faster since it is compiled when loaded (however, don't confuse this into thinking it won't be creating a RegExp object). This small difference is confirmed with a quick benchmark test.
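To make the literal/constructor difference concrete, here is a small sketch (the ab+c\.txt pattern is made up purely for illustration). The two forms produce equivalent regexes as long as backslashes are doubled in the string version.

```javascript
// Same pattern written both ways.
const fromLiteral = /ab+c\.txt/;
const fromConstructor = new RegExp("ab+c\\.txt"); // backslash doubled inside the string

// A single backslash in the string is dropped, leaving a dot that matches anything.
const tooLoose = new RegExp("ab+c\.txt");

console.log(fromLiteral.test("abbc.txt"), fromConstructor.test("abbc.txt")); // true true
console.log(fromLiteral.test("abbcXtxt"), fromConstructor.test("abbcXtxt")); // false false
console.log(tooLoose.test("abbcXtxt"));                                      // true
```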

How to add special characters like & > in XML file using JavaScript

I am generating XML using Javascript. It works fine if there are no special characters in the XML. Otherwise, it will generate this message: "invalid xml".
I tried to replace some special characters, like:
xmlData=xmlData.replaceAll(">","&gt;");
xmlData=xmlData.replaceAll("&","&amp;");
//but it doesn't work.
For example:
<category label='ARR Builders & Developers'>
Thanks.
Consider generating the XML using DOM methods. For example:
var c = document.createElement("category");
c.setAttribute("label", "ARR Builders & Developers");
var s = new XMLSerializer().serializeToString(c);
s; // => "<category label=\"ARR Builders &amp; Developers\"></category>"
This strategy should avoid the XML entity escaping problems you mention but might have some cross-browser issues.
This will do the replacement in JavaScript:
xml = xml.replace(/</g, "&lt;");
xml = xml.replace(/>/g, "&gt;");
This uses regular expression literals to replace all less than and greater than symbols with their escaped equivalent.
JavaScript comes with a powerful replace() method for string objects.
In general - and basic - terms, it works this way:
var myString = yourString.replace([regular expression or simple string], [replacement string]);
The first argument to .replace() method is the portion of the original string that you wish to replace. It can be represented by either a plain string object (even literal) or a regular expression.
The regular expression is obviously the most powerful way to select a substring.
The second argument is the string object (even literal) that you want to provide as a replacement.
In your case, the replacement operation should look as follows:
xmlData=xmlData.replace(/&/g,"&amp;");
xmlData=xmlData.replace(/>/g,"&gt;");
//this time it should work.
Notice that the ampersand is replaced first. If you replaced it later, you would corrupt the entities produced by the earlier replacements: "&gt;" would become "&amp;gt;".
In addition, pay attention to the regex 'g' flag, as with it the replacement will take place all throughout your text, not only on the first match.
I used regular expressions here; a plain string also works as the first argument, but then only the first occurrence is replaced.
You can find a complete reference for String.replace() here.
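Putting the pieces together, a minimal sketch of an escaping helper might look like this. escapeXml is a hypothetical name, and the set of five predefined XML entities covered here is my assumption about what the generated document needs.

```javascript
// Hypothetical helper: escape the five predefined XML entities.
// The ampersand must be handled first, otherwise the "&" introduced by the
// later replacements would itself be escaped again.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const label = escapeXml("ARR Builders & Developers");
console.log("<category label='" + label + "'>"); // <category label='ARR Builders &amp; Developers'>

// Wrong order, for comparison: ">" first, "&" second double-escapes the entity.
console.log("a > b".replace(/>/g, "&gt;").replace(/&/g, "&amp;")); // "a &amp;gt; b"
```

Apply the helper to attribute values and text content only, not to the whole generated document, or the markup itself gets escaped as well.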

Matching regular expression string in Javascript

Does anyone know how to find regular expression string from javascript code?
e.g.
var pattern = /some regular expression/;
Is it possible to do with a regular expression :) ?
If I got your question right, and you need a regular expression which would find all the regular expressions in a JavaScript program, then I don't think it is possible. A regular expression in JavaScript does not have to use the // syntax, it can be defined as a string. Even a full-blown JavaScript parser would not be smart enough to detect a regular expression here, for instance:
var re = "abcde";
var regexClass = function() { return RegExp; }
var regex = new regexClass()(re);
So I would give up this idea unless you want to cover only a few very basic cases.
You want a regex to match a regex? Crazy. This might cover the simplest cases.
new RegExp("\/.+\/")
However, I peeked into the JavaScript TextMate bundle, and it has two regexes for finding the start and end of a regex.
begin = '(?<=[=(:]|^|return)\s*(/)(?![/*+{}?])'
end = '(/)[igm]*';
You could probably use these as inspiration toward your goal.
Thanks for the answers. I have also found that it is a nearly impossible task, but here is my regex, which parses source code just fine:
this.mainPattern = new RegExp(//single line comment
"(?://.*$)|"+
//multiline comment
"(/\\*.*?($|\\*/))"+
//single or double quote strings
"|(?:(?:\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(?:'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'))"+
//regular expression literal in javascript code
"|(?:(?:[/].+[/])[img]?[\\s]?(?=[;]|[,]|[)]))"+
//brackets
"|([{]|[(]|[\[])|([}]|[)]|[\\]])", 'g');

Why is my RegExp construction not accepted by JavaScript?

I'm using a RegExp to validate some user input on an ASP.NET web page. It's meant to enforce the construction of a password (i.e. between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1). This RegExp works fine in my tester (Expresso) and in my C# code.
This is how it looks:
(?-i)^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])
(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$
(Line break added for formatting)
However, when I run the code it lives in in IE6 or IE7 (haven't tried other browsers as this is an internal app and we're a Microsoft shop), I get a runtime error saying 'Syntax error in regular expression'. That's it - no further information in the error message aside from the line number.
What is it about this that JavaScript doesn't like?
Well, there are two ways of defining a Regex in Javascript:
a. Through a Regexp object constructor:
var re = new RegExp("pattern","flags");
re.test(myTestString);
b. Using a regular expression literal:
var re = /pattern/flags;
You should also note that JS does not support some of the tenets of Regular Expressions. For a non-comprehensive list of features unsupported in JS, check out the regular-expressions.info site.
Specifically speaking, you appear to be setting some flags on the expression (for example, the case insensitive flag). I would suggest that you use the /i flag (as indicated by the syntax above) instead of using (?-i)
That would make your Regex as follows (Positive Lookahead appears to be supported):
/^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$/i;
For a very good article on the subject, check out Regular Expressions in JavaScript.
Edit (after Howard's comment)
If you are simply assigning this Regex pattern to a RegularExpressionValidator control, then you will not have the ability to set Regex options (such as ignore case). Also, you will not be able to use the Regex literal syntax supported by Javascript. Therefore, the only option that remains is to make your pattern intrinsically case insensitive. For example, [a-h] would have to be written as [A-Ha-h]. This would make your Regex quite long-winded, I'm sorry to say.
Here is a solution to this problem, though I cannot vouch for its legitimacy. Some other options that come to mind may be to turn off client-side validation altogether and validate exclusively on the server. This will give you access to the full Regex flavour implemented by the System.Text.RegularExpressions.Regex object. Alternatively, use a CustomValidator and create your own JS function which applies the Regex match using the patterns that I (and others) have suggested.
I'm not familiar with C#'s regular expression syntax, but is this (at the start)
(?-i)
meant to turn the case insensitivity pattern modifier on? If so, that's your problem. Javascript doesn't support specifying the pattern modifiers in the expression. There's two ways to do this in javascript
var re = /pattern/i
var re = new RegExp('pattern','i');
Give one of those a try, and your expression should be happy.
As Cerberus mentions, (?-i) is not supported in JavaScript regexps. So, you need to get rid of that and use /i. Something to keep in mind is that there is no standard for regular expression syntax; it is different in each language, so testing in something that uses the .NET regular expression engine is not a valid test of how it will work in JavaScript. Instead, try and look for a reference on JavaScript regular expressions, such as this one.
Your match that looks for 8-20 characters is also invalid. This will ensure that there are at least 8 characters, but it does not limit the string to 20, since the character class with the kleene-closure (* operator) at the end can match as many characters as provided. What you want instead is to replace the * at the end with the {8,20}, and eliminate it from the beginning.
var re = /^(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,20}$/i;
On the other hand, I'm not really sure why you would want to restrict the length of passwords, unless there's a hard database limit (which there shouldn't be, since you shouldn't be storing passwords in plain text in the database, but instead hashing them down to something fixed size using a secure hash algorithm with a salt). And as mentioned, I don't see a reason to be so restrictive on the set of characters you allow. I'd recommend something more like this:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%])[a-zA-Z0-9##!$%]{8,}$/i;
Also, why would you forbid 1, 0, L and O from your passwords (and it looks like you're trying to forbid I as well, which you forgot to mention)? This will make it very hard for people to construct good passwords, and since you never see a password as you type it, there's no reason to worry about letters which look confusingly similar. If you want to have a more permissive regexp:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%]).{8,}$/i;
Are you enclosing the regexp in / / characters?
var regexp = /[]/;
return regexp.test();
(?-i)
Doesn't exist in JS Regexp. Flags can be specified as “new RegExp('pattern', 'i')”, or literal syntax “/pattern/i”.
(?=
Exists in modern implementations of JS Regexp, but is dangerously buggy in IE. Lookahead assertions should be avoided in JS for this reason.
between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1.
Do you have to do this in RegExp, and do you have to put all the conditions in one RegExp? Because those are easy conditions to match using multiple RegExps, or even simple string matching:
if (
s.length<8 || s.length>20 ||
s==s.toLowerCase() || s==s.toUpperCase() ||
s.indexOf('0')!=-1 || s.indexOf('1')!=-1 ||
s.toLowerCase().indexOf('l')!=-1 || s.toLowerCase().indexOf('o')!=-1 ||
(s.indexOf('#')==-1 && s.indexOf('!')==-1 && s.indexOf('$')==-1 && s.indexOf('%')==-1)
)
alert('Bad password!');
(These are really cruel and unhelpful password rules if meant for end-users BTW!)
I would use this regular expression:
/(?=[^2-9]*[2-9])(?=[^a-hj-km-np-z]*[a-hj-km-np-z])(?=[^A-HJ-KM-NP-Z]*[A-HJ-KM-NP-Z])(?=[^##!$%]*[##!$%])^[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,}$/
The [^a-z]*[a-z] will make sure that the match is made as early as possible instead of expanding the .* and doing backtracking.
(?-i) is supposed to turn case-insensitivity off. Everybody seems to be assuming you're trying to turn it on, but that would be (?i). Anyway, you don't want it to be case-insensitive, since you need to ensure that there are both uppercase and lowercase letters. Since case-sensitive matching is the default, prefacing a regex with (?-i) is pointless even in those flavors (like .NET) that support inline modifiers.
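Pulling the points from the answers together (no inline modifier, no i flag, an explicit {8,20} length, no lookahead), a sketch of the check as several small case-sensitive regexes could look like this. isValidPassword is a hypothetical name, and the special-character set #!$% is read off the question as shown, so adjust it if the real rule differs.

```javascript
// Hypothetical helper: each rule is its own small, case-sensitive regex.
function isValidPassword(password) {
  return (
    // Only allowed characters, 8 to 20 of them (excludes 0, 1, i, l, o, I, L, O).
    /^[2-9a-hj-km-np-zA-HJ-KM-NP-Z#!$%]{8,20}$/.test(password) &&
    /[2-9]/.test(password) &&           // at least one digit
    /[a-hj-km-np-z]/.test(password) &&  // at least one lower-case letter
    /[A-HJ-KM-NP-Z]/.test(password) &&  // at least one upper-case letter
    /[#!$%]/.test(password)             // at least one special character
  );
}

console.log(isValidPassword("Abcdefg2#")); // true
console.log(isValidPassword("abcdefg2#")); // false: no upper-case letter
console.log(isValidPassword("Abcdefg1#")); // false: contains 1
console.log(isValidPassword("Ab2#"));      // false: too short
```

Splitting the rules this way also makes it easy to report which rule failed, which a single combined pattern cannot do.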
