One of my SQL LIKE statements breaks when a user enters Chinese characters. I did some research but couldn't find a solution.
Is there any code available, such as a plugin or JavaScript function, to validate input so that it allows only English alphanumeric characters and special symbols?
If I only block Chinese characters, other languages could still get through, so I thought I would allow only English characters with any combination of numbers and symbols.
Any direction or input will be appreciated, thanks.
I tried something like this, but it only allows alphanumeric characters:
/[^a-zA-Z 0-9]+/g
How about UTF-8 encoding the text rather than banning languages?
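A minimal sketch of the check the question asks for, assuming "English alphanumerics and symbols" means printable ASCII (space through ~). The function name isPrintableAscii is hypothetical, and this only restricts input; it does not replace proper encoding or escaping on the SQL side.

```javascript
// Hypothetical helper: true when every character is printable ASCII
// (U+0020 space through U+007E tilde). An empty string also passes.
function isPrintableAscii(text) {
  return /^[\x20-\x7E]*$/.test(text);
}

console.log(isPrintableAscii("abc 123 !@#")); // true
console.log(isPrintableAscii("漢字"));         // false
```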
I have a form with some inputs, and I want the user to be able to submit only the kind of input I specify. For example, I want to make sure the password contains at least one capital letter, a number, a symbol, and at least 8 characters. How can I enforce this even if the user has disabled JavaScript?
Brief
You'll want to minimize the checking on the client-side. Any checking done at this point is pretty useless where security and/or validation is concerned. I would suggest doing a simple validation (such as minimum length) but nothing else, as any method you try client-side can easily be circumvented.
Doing all your validation server-side prevents users from editing client-side code or disabling JavaScript to prevent validation. As an added bonus, if you do everything server-side (and use minimal validation client-side) it increases maintainability since you're only defining your patterns once and you don't have to worry about compatibility across multiple regex engines (which is a pain).
For example, Unicode property classes (such as \p{L}) allow you to specify groups of Unicode characters. These are fantastic for ensuring your program works well with multiple languages (e.g. French, with characters such as é), but they are not available in older JavaScript engines: Unicode property escapes were only added in ES2018 and require the u flag, so you cannot rely on them in every browser.
You should:
Define the pattern once (coders don't like duplication)
Do the validation server-side (forget about true validation client-side, anything you implement at this step can easily be bypassed). KISS
When you're talking about password validation don't limit the characters to specific ranges (as your pattern would client-side using something like [A-Z]). You may think this increases password strength, but it may actually do exactly the opposite. Instead, allow users to use special characters as well (it's simple but using Ä is more secure than A).
Code
Client-Side
(?=.*[A-Z])(?=.*\d)(?=.*[^\w_]).{8,}
Although, honestly, I'd suggest simply using .{8,} and doing the checks solely on the server-side.
<form action="">
<input type="text" pattern="(?=.*[A-Z])(?=.*\d)(?=.*[^\w_]).{8,}" title="Must contain at least one uppercase letter, number and symbol, and at least 8 or more characters"/>
<input type="submit"/>
</form>
Server-Side
See regex in use here
^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$
Usage
Where $str in the code below is the submitted password
// PCRE patterns need delimiters; the u modifier treats pattern and subject as UTF-8
$re = '/^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$/u';
if (preg_match($re, $str)) {
    // Valid password
} else {
    // Invalid password - provide user feedback and allow them to try again
}
Explanation
The HTML regex is just a simpler variation of the regex below (without using Unicode classes). I would, once again, suggest using .{8,} for the pattern in HTML and letting PHP do the actual password validation.
^ Assert position at the start of the string
(?=.*\p{Lu}) Positive lookahead ensuring at least one uppercase Unicode character exists
(?=.*\p{N}) Positive lookahead ensuring at least one Unicode number exists
(?=.*[^\p{L}\p{N}\p{C}]) Positive lookahead ensuring at least one character that isn't a letter, number, or control character exists (includes punctuation, symbols, separators, marks)
.{8,} Match any character 8 or more times
$ Assert position at the end of the string (or before a trailing newline)
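For servers written in JavaScript, the same lookahead structure can be sketched as below. This assumes an engine with ES2018 Unicode property escapes (for example a recent Node.js); the names passwordPattern and isValidPassword are hypothetical.

```javascript
// Same lookaheads as the server-side pattern above, as a JavaScript regex literal.
// The "u" flag is required for \p{...} to be recognised.
const passwordPattern = /^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$/u;

// Hypothetical helper: true when the password satisfies every rule.
function isValidPassword(password) {
  return passwordPattern.test(password);
}

console.log(isValidPassword("Passw0rd!"));      // true
console.log(isValidPassword("\u00C4rger12#x")); // true (Ä counts as uppercase)
console.log(isValidPassword("password"));       // false (no uppercase, digit or symbol)
console.log(isValidPassword("Ab1!"));           // false (too short)
```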
This is not simple to answer as it is written but here is the idea.
First, check client-side using JavaScript: match the input against the desired pattern before allowing the submit. There are a handful of libraries out there if you don't want to puzzle it out yourself.
Second, and to satisfy the no javascript issue, check server-side. The user may have gotten past your form with faulty data but a server-side check will ensure that it matches what you like before you actually make a change to your database.
I have a dilemma here. I am trying to write a regex pattern that matches all alphabetic characters for eastern languages as well as western languages. One criterion is that no numbers can match (so "José13" is not a match but "José" is); the other is that special characters cannot match (i.e. !##$% etc.).
I've played around with this in chrome's console, and I've gotten:
"a".match('[a-zA-z]');
to come back successfully, when I put in:
"a".match('[\p{L}]');
I get a null response, and I don't quite understand why. According to http://www.regular-expressions.info/unicode.html \p{L} is a match for any letter.
EDIT: \p doesn't seem to work in my Chrome console, so I'll try a different route. I have a chart of Unicode from Unifoundry. I'll match up the regex and attempt to make the range of unwanted characters invalid.
Any input would be greatly appreciated.
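One likely reason for the null result, sketched below: in a plain string literal the unrecognised escape \p collapses to p, so the pattern becomes the character class [p{L}]. The second half assumes an engine with ES2018 support, where a regex literal with the u flag does understand \p{L}.

```javascript
// In a string literal, "\p" is not a known escape, so the backslash is dropped.
const fromString = '[\p{L}]';
console.log(fromString === '[p{L}]');  // true
console.log("a".match(fromString));    // null: "a" is not one of p, {, L, }

// A regex literal with the "u" flag treats \p{L} as "any Unicode letter".
console.log(/\p{L}/u.test("a"));       // true
console.log(/\p{L}/u.test("\u00E9"));  // true (é)
console.log(/\p{L}/u.test("1"));       // false
```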
This works in the javascript console, but it seems like a hack:
.match('^[^\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7]*');
However it does what I need it to do.
Referenced this post on SO: Javascript + Unicode regexes
Older JavaScript implementations don't support such shortcuts (\p{...} was added in ES2018 and needs the u flag), but you can specify a range, for example:
/[\u4E00-\u9FFF]+/g.test("漢字")
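Where Unicode property escapes are available (ES2018 and later, an assumption about the target browsers), the letters-only requirement from the question can be sketched without hand-written ranges. The names lettersOnly and isName are hypothetical; \p{M} is included so that accents typed as separate combining marks are accepted too.

```javascript
// Letters from any script, plus combining marks, anchored at both ends.
const lettersOnly = /^[\p{L}\p{M}]+$/u;

// Hypothetical helper: true when the text consists only of letters.
function isName(text) {
  return lettersOnly.test(text);
}

console.log(isName("Jos\u00E9"));    // true  (José)
console.log(isName("Jos\u00E913"));  // false (contains digits)
console.log(isName("Jos\u00E9!"));   // false (contains a special character)
console.log(isName("\u6F22\u5B57")); // true  (漢字)
```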
I need to validate that a field is not empty. It should allow English and foreign-language characters (UTF-8), but not special characters. I'm not good at regex, so any help on this would be great.
If you want to support a wide range of languages, you'll have to work by excluding only the characters you don't want, since specifying all of the ranges you do want will be difficult.
You'll need to look at the list of Unicode blocks and/or the character database to identify the blocks you want to exclude (for instance, U+0000 through U+001F). This Wikipedia article may also help.
Then use a regular expression with character classes to look for what you want to exclude.
For example, this will check for the U+0000 through U+001F and the U+007F characters (obviously you'll be excluding more than just these):
if (/[\u0000-\u001F\u007F]/.exec(theString)) {
// Contains at least one invalid character
}
The [] identify a "character class" (list and/or range of characters to look for). That particular one says look for \u0000 through \u001F (inclusive) as well as \u007F.
It would have been nice if I could say "Just do /^\w+$/.test(word)", but...
See this answer for the current state of Unicode support (or rather the lack of it) in JavaScript regular expressions.
You can either use the library he suggests, which might be slow, or enlist the help of the server for this (which might be slower).
You can test for a unicode letter like this:
str.match(/\p{L}/u)
Or for the existence of a non-letter like this:
str.match(/[^\p{L}]/u)
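Putting the pieces of this thread together, here is a rough sketch of a "not empty and no special characters" check. It assumes that "special characters" means Unicode control characters, punctuation, and symbols (\p{C}, \p{P}, \p{S}), which is an interpretation rather than something the question states; isValidField is a hypothetical name.

```javascript
// Characters to reject: control/other (C), punctuation (P), symbols (S).
const forbidden = /[\p{C}\p{P}\p{S}]/u;

// Hypothetical helper: non-empty after trimming and free of forbidden characters.
function isValidField(value) {
  return value.trim() !== "" && !forbidden.test(value);
}

console.log(isValidField("Hello мир")); // true
console.log(isValidField("名前"));       // true
console.log(isValidField("a!b"));       // false (punctuation)
console.log(isValidField("price$"));    // false (symbol)
console.log(isValidField("   "));       // false (empty after trimming)
```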
I need a regex that covers all alphabets. I have an input text and a target text, and each of them can be written in a different alphabet: Chinese, Latin, Cyrillic, or any other.
I need a regex that works for multi-language input and multi-language target text.
Does anybody have an idea how I can write this regex?
I will use it with JavaScript, but I think there should be a common regex that works in both Java and JavaScript for this problem.
If you are in Java (not in JavaScript!) you can use Unicode properties, e.g.
\p{L} any kind of letter from any language (the uppercase form \P{L} is the negation: anything that is not a letter).
See regular-expressions.info/unicode for more information.
For Javascript:
There is the XRegExp library, along with its XRegExp Unicode plugins, which extends JavaScript's regex features and adds support for Unicode categories, scripts, and blocks.
With those libs you would be able to use \p{L} with javascript.
See my answer to this question for a small example
Some regex engines support special character for all Unicode letters:
\p{L}
Or you can use \w (letter, digit, underscore), although in JavaScript \w covers only ASCII characters.
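A short sketch of the difference in JavaScript, assuming an engine with ES2018 Unicode property escapes. The scripts table and detectScripts helper are hypothetical and only cover the three alphabets named in the question.

```javascript
// \w is ASCII-only in JavaScript, so it misses non-Latin letters.
console.log(/^\w+$/.test("hello_1"));     // true
console.log(/^\w+$/.test("привет"));      // false
console.log(/^\p{L}+$/u.test("привет"));  // true

// Hypothetical lookup table: one pattern per script of interest.
const scripts = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Han: /\p{Script=Han}/u,
};

// Returns the names of the scripts that occur at least once in the text.
function detectScripts(text) {
  return Object.keys(scripts).filter((name) => scripts[name].test(text));
}

console.log(detectScripts("abc привет 漢字")); // [ 'Latin', 'Cyrillic', 'Han' ]
console.log(detectScripts("привет"));          // [ 'Cyrillic' ]
```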
I use the "|" character as a separator, so it is special for me. A key can contain any character except "|". This solved my problem, thanks for the answers. It can be used with JavaScript, Java, and Groovy; I tested it and it worked.
var keyPrefix ="\\|[\u0000-\u007B\u007D-\uFFEF]*";
var keySuffix = "[\u0000-\u007B\u007D-\uFFEF]*\\|";
var searchkey = keyPrefix + key.toLowerCase() + keySuffix;
I am doing internationalization in Struts. I want to write JavaScript validation for Japanese and English users. I know the regular expression for English users but not for Japanese users. Is it possible to write one regular expression for both sets of users that validates on the basis of Unicode?
Please help me.
Here is a regular expression that can be used to match English alphanumeric characters, Japanese kanji, hiragana, and katakana, full-width (zenkaku) as well as half-width (hankaku) alphanumerics, and the long-vowel dash:
/[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[ａ-ｚＡ-Ｚ０-９]+|[々〆〤ヶ]+/u
You can edit it to fit your needs, but notice the "u" flag at the end.
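In JavaScript engines with ES2018 support (an assumption about the target browsers), a similar check can be sketched with Unicode script properties instead of literal ranges. The names japaneseOrAscii and isJapaneseOrEnglish are hypothetical; the long-vowel mark U+30FC is listed explicitly because it belongs to the Common script rather than to Katakana.

```javascript
// Hiragana, katakana, kanji (Han), the long-vowel mark, and ASCII alphanumerics.
const japaneseOrAscii =
  /^[\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}\u30FCa-zA-Z0-9]+$/u;

// Hypothetical helper: true when the whole text is Japanese and/or ASCII alphanumerics.
function isJapaneseOrEnglish(text) {
  return japaneseOrAscii.test(text);
}

console.log(isJapaneseOrEnglish("ひらがな")); // true
console.log(isJapaneseOrEnglish("コーヒー")); // true
console.log(isJapaneseOrEnglish("漢字"));     // true
console.log(isJapaneseOrEnglish("abc123"));   // true
console.log(isJapaneseOrEnglish("hello!"));   // false
```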
Provided your text editor and programming language support Unicode, you should be able to enter Japanese characters as literal strings. Things like [A-X] ranges will probably not translate very well in general.
What kind of text are you trying to validate?
What flavor are the regular expressions? Perl-compatible, POSIX, or something else?