Regex Java vs Regex JavaScript

I have a web application and an Android app, and I want to validate the same input in both.
I created this regex in Java:
private static final String NAME_REGEX = "^[\\w ]+$";
if (!Pattern.matches(NAME_REGEX, name)) {
mNameView.setError(getString(R.string.error_field_noname));
focusView = mNameView;
cancel = true;
}
In JavaScript I want to test the same so I used:
var re = /^[\w ]+$/;
if (!re.test(company)) {
...
}
Everything works fine, except that the Java version accepts characters such as ä, ö, ü, ó, á (...) and the JavaScript version does not.
Where does the difference between the two snippets above come from?
In the end, the most important thing is that both versions (JavaScript and Java) behave exactly the same.
Goal:
Get a regex for JavaScript that behaves exactly like the Java one (^[\\w ]+$).

Please use the following regular expression:
var re = /^[äöüß]*$/;
This regular expression allows these characters.
If you also want to allow special characters and alphanumerics, use this one:
var re = /^[A-Za-z0-9!@#$%^&*äöüß()_]*$/;

Try this: /^[\wäöüß ]+$/i
Please note the i modifier for "case insensitive"; without it the pattern will not match ÄÖÜ.
The two languages use different engines to evaluate the regex, and Java supports Unicode better than JavaScript does.
See : https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines#Part_2
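If the target browsers support ES2018 Unicode property escapes, a closer match to the Java behavior can be sketched without listing individual letters. The snippet below is only an approximation. It assumes that \w on the Java/Android side accepts Unicode letters, combining marks, decimal digits and connector punctuation such as the underscore, whereas \w in JavaScript is limited to [A-Za-z0-9_]. NAME_REGEX_UNICODE and isValidName are hypothetical names.

```javascript
// Approximation of a Unicode-aware \w plus the space character.
// The "u" flag is required for \p{...} property escapes (ES2018).
var NAME_REGEX_UNICODE = /^[\p{Alphabetic}\p{M}\p{Nd}\p{Pc} ]+$/u;

// Hypothetical helper: true if the whole name consists of allowed characters.
function isValidName(name) {
  return NAME_REGEX_UNICODE.test(name);
}

console.log(isValidName("M\u00FCller GmbH")); // true ("Müller GmbH")
console.log(isValidName("Jos\u00E9_1"));      // true ("José_1")
console.log(isValidName("a!b"));              // false
```

Whether this matches the Java side exactly depends on the engine and flags used there, so both sides should be tested against the same sample inputs.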

So the state of the art is that one must use a library to get the same results in JavaScript as in Java.
As that isn't a real solution for me, I simply use this in JavaScript:
var re = /^[A-Za-z0-9_öäüÖÄÜß ]+$/;
and this in Java:
private static final String NAME_REGEX = "^[A-Za-z0-9_öäüÖÄÜß ]+$";
This seems to behave exactly the same in both environments.
Thanks for the help!

Related

How to convert a .NET URL regex to a Javascript URL regex?

In .NET I am using the [Url(UrlOptions.DisallowProtocol)] data annotation attribute, which validates a URL against a regex (the https/http scheme and www are not mandatory).
The regex behind this attribute looks like this:
string const regex = new RegExp('^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$');
I want to convert it to a JavaScript validation, but I am facing lots of difficulties because the expression is so long.
Is there any tool or any easy way to convert this regex to work in JS?
The regular expression seems to work in JavaScript to some extent without any modification - see regex101 demo. Just remember to escape all backslashes (\\ in place of \) and single quotes (\' instead of ') if defining it in a single-quoted JavaScript string:
var jsRegex = new RegExp('^((https?|ftp):\\/\\/)?(((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:)*#)?(((\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5]))|([a-zA-Z][\\-a-zA-Z0-9]*)|((([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.)+(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.?)(:\\d*)?)(\\/((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)+(\\/(([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)*)*)?)?(\\?((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)|[\\uE000-\\uF8FF]|\\/|\\?)*)?(\\#((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)|\\/|\\?)*)?$', 'i');
// ...or...
var jsRegex = /^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i;
Note: The i modifier is assuming this needs case-insensitive matching.
(If the above isn't sufficient, please be more specific about what isn't working.)
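The backslash rule can be checked on a much shorter pattern. In this small sketch (the variable names and the sample pattern are made up for illustration), the same regex is written once as a literal and once as a string passed to the RegExp constructor, where every backslash has to be doubled.

```javascript
// Regex literal: backslashes are written once.
var fromLiteral = /^\d+\.\d+$/i;

// String passed to the constructor: each backslash must be doubled,
// otherwise "\d" reaches the regex engine as a plain "d".
var fromString = new RegExp('^\\d+\\.\\d+$', 'i');

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("3.14"));                  // true

// Forgetting to double the backslashes silently changes the pattern.
console.log(new RegExp('^\d+\.\d+$').test("3.14"));    // false
```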
"Is there any tool or any easy way to convert this regex to work in JS?"
The only generic tool I'm aware of that supposedly converts between regex flavors is RegexBuddy, but it is paid software (€29.95), although if for any reason it didn't work you could get a refund.
var YourRegEx = @"^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$";
var Ismatch = Regex.Match(input, YourRegEx, RegexOptions.IgnoreCase);
if (Ismatch.Success)
{
// does match
}
You can try this; it puts your regex in a C# verbatim string (@"").

Parsing Phrases with a Pipe Character Using JavaScript

I've been working on my Safari extension for saving content to Instapaper, and lately on improving its title parsing for bookmarks. For example, an article that I recently saved has a title tag that looks like this:
Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch
I want to use the JavaScript in my Safari extension to remove all of the text after the pipe character so that I can make the final bookmark look neater once it is saved to Instapaper.
I've attempted the title parsing successfully in a couple of similar cases using blocks of code that look like this:
if(safari.application.activeBrowserWindow.activeTab.title.search(' - ') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search(' - '));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search(' - '));
console.log(parsedTitle);
};
I got thrown for a loop, however, once I tried doing the same thing with the pipe character, since JavaScript treats it as a special character. I've tried several bits of code to solve this problem. The most recent looks like this (attempting to use a regular expression and escape the pipe character):
if(safari.application.activeBrowserWindow.activeTab.title.search('/\|') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
console.log(parsedTitle);
};
If anybody could give me a tip that works for this, your help would be greatly appreciated!
Your regex is malformed. It should be:
safari.application.activeBrowserWindow.activeTab.title.search(/\|/)
Note the lack of quotes; I'm using a regex literal here. Also, regex literals need to be delimited by /.
Instead of searching and then taking a substring, you can simply do a replace with the following regex:
str = str.replace(/\|.*$/, "");
This will remove the | character and everything after it, if it exists.
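To see why the quoted version misbehaves: String.prototype.search converts a string argument into a regular expression, and inside a string literal '/\|' is just the two characters /|. As a regex that means "a slash or nothing", which matches the empty string at index 0. The sketch below shows this, along with a variant of the replace that also drops the whitespace before the pipe. The helper name stripSiteSuffix is hypothetical.

```javascript
var title = "Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch";

// The string '/\|' is turned into the regex /\/|/ ("/" or the empty string),
// so it matches the empty string at the very start of the title.
console.log(title.search('/\|'));   // 0

// A regex literal with an escaped pipe finds the real position of "|".
console.log(title.search(/\|/) === title.indexOf("|")); // true

// Hypothetical helper: remove the pipe, the text after it,
// and any whitespace directly before it.
function stripSiteSuffix(text) {
  return text.replace(/\s*\|.*$/, "");
}

console.log(stripSiteSuffix(title));
```

Titles without a pipe pass through unchanged, so no separate search is needed before calling the helper.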

XRegExp to replace Unicode characters in IE

I developed a javascript function to clean a range of Unicode characters. For example, "ñeóñú a1.txt" => "neonu a1.txt". For this, I used a regular expression:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
name = name.replace (patternA,'');
But it does not work properly in IE. If my research is correct, IE does not detect Unicode in the same way. I'm trying to make an equivalent function using the library XRegExp (http://xregexp.com/), which is compatible with all browsers, but I don't know how to write the Unicode pattern so XRegExp works in IE.
One of the failed attempts:
XRegExp.replace(name,'\\u0300-\\u036F','');
How can I build this pattern?
The value provided as the XRegExp.replace method's second argument should be a regular expression object, not a string. The regex can be built by the XRegExp or the native RegExp constructor. Thus, the following two lines are equivalent:
name = name.replace(/[\u0300-\u036F]/g, '');
// Is equivalent to:
name = XRegExp.replace(name, /[\u0300-\u036F]/g, '');
The following line you wrote, however, is not valid:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
Instead, it should be:
var patternA = new RegExp ("[\\u0300-\\u036F]", "g");
I don't know if that is the source of your problem, but it may be. For the record, IE's Unicode support is as good as or better than that of other browsers.
XRegExp can let you identify your block by name, rather than using magic numbers. XRegExp('[\\u0300-\\u036F]') and XRegExp('\\p{InCombiningDiacriticalMarks}') are exactly equivalent. However, the marks in that block are a small subset of all combining marks. You might actually want to match something like XRegExp('\\p{M}'). However, note that simply removing marks like you're doing is not a safe way to remove diacritics. Generally, what you're trying to do is a bad idea and should be avoided, since it will often lead to wrong or unintelligible results.
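One detail worth illustrating: removing U+0300 to U+036F only strips accents that are stored as separate combining marks. A precomposed character such as ñ (U+00F1) contains no mark to remove unless the string is decomposed first. This is a minimal sketch, assuming an engine that provides String.prototype.normalize (ES2015, so not available in IE without a polyfill) and using a hypothetical function name, stripCombiningMarks. The caveat above still applies: this is not a safe general way to remove diacritics.

```javascript
// Hypothetical helper: decompose, then drop combining diacritical marks.
function stripCombiningMarks(text) {
  // NFD splits precomposed letters into base letter + combining mark(s).
  return text.normalize("NFD").replace(/[\u0300-\u036F]/g, "");
}

// "ñeóñú a1.txt" written with escapes to avoid encoding problems.
console.log(stripCombiningMarks("\u00F1e\u00F3\u00F1\u00FA a1.txt")); // "neonu a1.txt"

// Without normalization, a precomposed character is left unchanged.
console.log("\u00F1".replace(/[\u0300-\u036F]/g, "") === "\u00F1");   // true
```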

Matching regular expression string in Javascript

Does anyone know how to find regular expression literals in JavaScript code?
e.g.
var pattern = /some regular expression/;
Is it possible to do this with a regular expression :) ?
If I got your question right, and you need a regular expression which would find all the regular expressions in a JavaScript program, then I don't think it is possible. A regular expression in JavaScript does not have to use the // syntax; it can also be defined as a string. Even a full-blown JavaScript parser would not be smart enough to detect a regular expression here, for instance:
var re = "abcde";
var regexClass = function() { return RegExp; }
var regex = new regexClass()(re);
So I would give up this idea unless you want to cover only a few very basic cases.
You want a regex to match a regex? Crazy. This might cover the simplest cases.
new RegExp("\/.+\/")
However, I peeked into the JavaScript TextMate bundle, and it has two regexes for finding the start and end of a regex.
begin = '(?<=[=(:]|^|return)\s*(/)(?![/*+{}?])'
end = '(/)[igm]*';
You could probably use these as inspiration toward your goal.
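To illustrate how limited the simple pattern is, this short sketch runs it against a few made-up source lines. It does find a regex literal, but it also reports division and string contents as matches.

```javascript
// "\/" inside a string literal is just "/", so this is the regex /\/.+\//.
var simple = new RegExp("\/.+\/");

console.log(simple.test("var pattern = /abc/;"));   // true, a real regex literal
console.log(simple.test("var x = a / b / c;"));     // true, but this is only division
console.log(simple.test('var s = "http://a/b";'));  // true, but this is a string
console.log(simple.test("var y = a / b;"));         // false, only one slash
```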
Thanks for the answers. I have also found that it is a nearly impossible task, but here is my regex, which parses source code just fine:
this.mainPattern = new RegExp(//single line comment
"(?://.*$)|"+
//multiline comment
"(/\\*.*?($|\\*/))"+
//single or double quote strings
"|(?:(?:\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(?:'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'))"+
//regular expression literal in javascript code
"|(?:(?:[/].+[/])[img]?[\\s]?(?=[;]|[,]|[)]))"+
//brackets
"|([{]|[(]|[\[])|([}]|[)]|[\\]])", 'g');

Invalid regular expression error

I'm trying to retrieve the category part of this string, "property_id=516&category=featured-properties", so the result should be "featured-properties". I came up with a regular expression and tested it on http://gskinner.com/RegExr/, where it worked as expected. But when I added the regular expression to my JavaScript code, I got an "Invalid regular expression" error. Can anyone tell me what is breaking this code?
Thanks!
var url = "property_id=516&category=featured-properties"
var urlRE = url.match('(?<=(category=))[a-z-]+');
alert(urlRE[0]);
Positive lookbehinds (your ?<=) are only supported in JavaScript environments that implement the ECMAScript 2018 standard; in older engines they make your RegEx fail with this error.
You can mimic them in a whole bunch of different ways, but this might be a simpler RegEx to get the job done for you:
var url = "property_id=516&category=featured-properties"
var urlRE = url.match(/category=([^&]+)/);
// urlRE => ["category=featured-properties","featured-properties"]
// urlRE[1] => "featured-properties"
That's a super-simple example, but searching StackOverflow for a RegEx pattern to parse URL parameters will turn up more robust examples if you need them.
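As one possible more robust alternative, environments that provide the standard URLSearchParams API can parse the query string without a hand-written regex. This is a sketch rather than part of the answer above, and getCategory is a hypothetical helper name.

```javascript
// Hypothetical helper: read the "category" parameter from a query string.
function getCategory(query) {
  // URLSearchParams splits on "&" and "=" and decodes percent-escapes.
  return new URLSearchParams(query).get("category");
}

console.log(getCategory("property_id=516&category=featured-properties")); // "featured-properties"
console.log(getCategory("property_id=516"));                               // null
```

Unlike the [a-z-]+ pattern, this does not restrict which characters the value may contain, so any further validation of the category has to happen separately.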
The syntax is messing up your code.
var urlRE = url.match(/category=([a-z-]+)/);
alert(urlRE[1]);
If you want to parse URL parameters, you can use the getParameterByName() function from this site:
http://james.padolsey.com/javascript/bujs-1-getparameterbyname/
In any case, as already mentioned, regular expressions in JavaScript are not plain strings:
https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
var url = "property_id=516&category=featured-properties",
urlRE = url.match(/(category=)([a-z-]+)/i); // don't forget the i flag if you also want to match uppercase letters, e.g. "property_id=516&category=Featured-Properties"
//urlRE = url.match(/(?<=(category=))[a-z-]+/i); //this is a mess
alert(urlRE[2]);
