Cannot always successfully use String.startsWith() when string contains unicode characters - javascript

So I've got a proof-of-concept conjugation practice application I'm building with Vue.js. One of the key elements is that when you type in an answer to a conjugation, it compares the input text with the expected answer using String.startsWith(). This works until the string contains non-ASCII Unicode characters. It seems that the characters you type are almost always different from the ones in the database. You can actually see in this node CLI example that the "ț" character I type is a different character from the "ţ" in the database.
Here is the typed input, its value and Unicode escape, next to the comparison string:
input: anunț // anun\u021B
comparison: anunţ // anun\u0163
I've tried things like .normalize(), but it doesn't seem to affect either the input string or the comparison string.
> var input = 'anunț'
> var comparison = 'anunţ'
> input === comparison
false
> input.normalize() === comparison
false
> input.normalize() === comparison.normalize()
false
> input === comparison.normalize()
false
/// etc etc with NFC, NFD, NFKC, NFKD forms
> input.normalize()
'anunț'
> comparison.normalize()
'anunţ'
// I've also tried .normalize() with the string decoded into unicode
I've tried converting to Unicode and manually replacing one set of characters, but that only goes so far and brings its own bunch of issues, including that sometimes, while you type the answer, the comparison fails until the entire string is entered.
Finally, I've started to try regex comparisons, but I think this may be another rabbit hole.
Stripped down to its most basic logic, without any of the above attempts, this is the crux of what I am trying to do, for context:
if (this.conjugation.startsWith(this.input)) {
  this.status = "correct";
} else {
  this.status = "incorrect";
}
if (conjugation === val) {
  // okay, we are done
}
Any thoughts of how I can get around this? I am currently testing this with Romanian verbs, so the characters appear to be in the following unicode ranges:
\u0000-\u007F, \u0180-\u024F, \u0100-\u017F

You can use Intl.Collator to construct a collator that ignores some differences:
var word1 = "anunț"; // anun\u021B
var word2 = "anunţ"; // anun\u0163
var collator = new Intl.Collator("ro", { sensitivity: "base" });
console.log(word1 === word2); // the words are not equal
console.log(collator.compare(word1, word2) === 0); // ... but they are "equal enough"

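These two characters look very similar, but they are distinct code points. "ț" (U+021B) is a t with a comma below, where there is a small gap between the letter and the mark; "ţ" (U+0163) is a t with a cedilla, where the mark is attached to the letter. Neither is a canonical or compatibility equivalent of the other, which is why .normalize() changes nothing in any of its forms.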
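If you would rather keep using startsWith(), one option is to map the cedilla forms to the comma-below forms on both sides before comparing. The following is only a sketch: the helper names canonicalizeRomanian and answerStartsWith are made up for illustration, and it assumes Romanian input, where the cedilla and comma-below letters are meant to be the same letter.

```javascript
// Hypothetical lookup table: cedilla letters -> comma-below letters.
const CEDILLA_TO_COMMA_BELOW = {
  "\u015E": "\u0218", // Ş -> Ș
  "\u015F": "\u0219", // ş -> ș
  "\u0162": "\u021A", // Ţ -> Ț
  "\u0163": "\u021B", // ţ -> ț
};

// Compose combining sequences first (NFC), then unify the look-alike letters.
function canonicalizeRomanian(text) {
  return text
    .normalize("NFC")
    .replace(/[\u015E\u015F\u0162\u0163]/g, (ch) => CEDILLA_TO_COMMA_BELOW[ch]);
}

// Prefix check that treats both spellings of the letter as equal.
function answerStartsWith(answer, typed) {
  return canonicalizeRomanian(answer).startsWith(canonicalizeRomanian(typed));
}

console.log(answerStartsWith("anun\u0163", "anun\u021B")); // true
console.log(answerStartsWith("anun\u0163", "anu"));        // true
console.log(answerStartsWith("anun\u0163", "anut"));       // false
```

Because both strings go through the same function, the check also works on partial input, which is what the prefix comparison in the question needs.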

Related

How to escape apostrophe successfully in jQuery on iPhone that has Smart Punctuation

I haven't been able to successfully escape an apostrophe on iPhone. After some research, it seems the Smart Punctuation feature is causing the issue. I've tried everything I can find and nothing has worked.
A user enters text into a field and I verify whether this text is correct. Here is my jQuery code. The valid text is EMPORER'S EYE or Emporer's Eye, but neither ever comes up as valid on iPhone.
$("#actionButton").click(function() {
  var seats = $("#number2").val();
  var apostrophe = '\u0027';
  var error = null;
  if ((seats === "EMPORER\'S EYE") || (seats === 'Emporer\'s Eye')) {
    $("#message").fadeIn();
    $("#draggable").fadeOut();
    $("#draggable2").fadeOut();
  } else {
    $("#messageWrong").fadeIn().delay(2500).fadeOut();
    $("#draggable").fadeOut().delay(2500).fadeIn();
    $("#draggable2").fadeOut().delay(2500).fadeIn();
  }
  // If you really need setflag:
  var setflag = error != null;
});
Here is a codepen I have of the entire function and process I'm trying to put together.
https://codepen.io/MaxwellR/pen/BaYGPJL
To make the user experience less frustrating, I would skip checking the special characters altogether, as matching them exactly is a slippery slope.
You can convert both the input and the expected value to a format that allows some liberty in how people spell things. Consider this example:
const sanitizeString = string => string
  .trim() // remove surrounding whitespace
  .toLowerCase() // ignore the case
  .replaceAll(/[^\p{L}\s]/gu, '') // remove everything that is not a letter or whitespace
  .replaceAll(/\s+/g, ' '); // convert any consecutive whitespace to a single space
This can be a good balance between being correct and having some freedom to spell things differently.
const string = 'Emporer\'s Eye';
const sanitizedString = sanitizeString(string);
// sanitizedString is now "emporers eye"
In your case, you could have a single comparison inside the condition and it would work:
if (sanitizeString(seats) === sanitizeString('Emporer\'s Eye')) {
  // some magic
}
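To see why the strict comparison fails, here is a small sketch that assumes Smart Punctuation has replaced the straight apostrophe (U+0027) with a typographic one (U+2019). The name normalizeForComparison and the sample input are made up for illustration; the function does the same steps as the sanitizer above, using replace() with the g flag so it also runs on engines without replaceAll().

```javascript
// Same idea as the sanitizer above: keep only letters and single spaces.
const normalizeForComparison = (text) => text
  .trim()                        // remove surrounding whitespace
  .toLowerCase()                 // ignore the case
  .replace(/[^\p{L}\s]/gu, "")   // drop everything that is not a letter or whitespace
  .replace(/\s+/g, " ");         // collapse runs of whitespace

const expected = "Emporer's Eye";
// Assumed input: curly apostrophe plus some stray whitespace.
const typedOnPhone = "  EMPORER\u2019S   EYE ";

console.log(typedOnPhone.trim() === "EMPORER'S EYE"); // false
console.log(normalizeForComparison(typedOnPhone) === normalizeForComparison(expected)); // true
```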

Regex taking long time to evaluate

At login, I need to allow either a username (alphanumeric plus some special characters), an email address, or the username\domain format. For this purpose, I used one regex with alternation (|). I also need to allow characters from other languages, such as Japanese and Chinese, so I included those in the same regex. The issue is that when I enter 30 or more characters followed by # or another special character, evaluating the regex takes several seconds and the browser hangs.
export const usernameRegex = /(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+|^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$/gu;
When I tried removing the other language character set such as [\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF] it works fine.
I understand that a regex may look simple while doing a lot under the hood. Is there any modification to this regex that would make it evaluate quickly? Any help is much appreciated!
Valid texts:
stackoverflow,
stackoverflow1~,
stackoverflow!#~^-,
stackoverflow#g.co,
stackoverflow!#~^-#g.co,
こんにちは,
你好,
tree\guava
EDIT:
An example input that causes the issue:
stackoverflowstackoverflowstackoverflow#
With the above text, evaluation takes a long time.
https://imgur.com/T2Vg4lg
Your regex seems to consist of three regular expressions joined with |:
(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+
^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+
^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$
First regex: (^...)+. How many times do you think this entire pattern, which starts at the beginning of the string, can occur? A match is either a second occurrence or it starts at the beginning of the string; it can't be both. So the outer group and its + can go:
^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4}
Parts 2 and 3 are mostly identical, except that part 2 starts with the block [a-zA-Z0-9._~^#!\-]+\\ followed by what is the rest of part 3.
So let's combine them: ^(?:[a-zA-Z0-9._~^#!\-]+\\)? ... and make sure to use non-capturing groups where possible.
([abc]|[def])+ can be simplified to [abcdef]+. This, by the way, is the part that's killing your performance: when both alternatives can match the same character (here the range _-~ in the first class also covers a-z), a failed match makes the engine retry every combination of alternatives for every character.
Your regex ends with a $, but it belongs only to the last alternative. I assume you always want to match the entire string, so let's anchor all 3 (now 2) parts with ^ ... $.
Summary:
/^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u
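A quick way to sanity-check the simplified pattern is to run it against the valid texts from the question plus the input that used to hang. This is only a sketch; the names usernamePattern and shouldMatch are made up here. Note that the pattern is used without the g flag: with g, test() keeps state in lastIndex between calls, which is rarely what you want for validation.

```javascript
// The simplified pattern from the summary above (no g flag).
const usernamePattern = /^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u;

// Valid texts from the question, plus the input that caused the hang.
const shouldMatch = [
  "stackoverflow",
  "stackoverflow1~",
  "stackoverflow!#~^-",
  "stackoverflow#g.co",
  "stackoverflow!#~^-#g.co",
  "こんにちは",
  "你好",
  "tree\\guava",
  "stackoverflowstackoverflowstackoverflow#",
];

for (const candidate of shouldMatch) {
  console.log(candidate, usernamePattern.test(candidate));
}

// A space is in none of the character classes, so this is rejected.
console.log(usernamePattern.test("two words")); // false
```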
Here is a JS example of how a simple regex would try to match a string: how it fails, backtracks, retries with the other side of the |, and so on.
// let's implement what `/([a-z]|[\p{Ll}])+/u` would do,
// how it would try to match something.
const a = /[a-z]/; // left part
const b = /[\p{Ll}]/u; // right part
const string = "abc,";
const testNextCharacter = (index) => {
  if (index === string.length) {
    return true;
  }
  const pattern = index + " ".repeat(index + 1) + "%o.test(%o)";
  const character = string.charAt(index);
  console.log(pattern, a, character);
  // checking the left part && if successful checking the next character
  if (a.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  // checking the right part && if successful checking the next character
  console.log(pattern, b, character);
  if (b.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  return false;
}
console.log("result", testNextCharacter(0));
And this is only 4 characters. Try it with 5 or 6 characters to get an impression of how much work this would be at 20 characters.
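To put numbers on that, the same toy matcher can count its single-character tests instead of logging them. This is a sketch of the naive strategy simulated above, not of what a real engine does internally (engines may optimize some cases); countAttempts is a made-up name, and the input is a run of lowercase letters followed by one character that can never match.

```javascript
// Counts the single-character tests the naive matcher performs for
// `letterCount` lowercase letters followed by a "," that cannot match.
function countAttempts(letterCount) {
  const left = /[a-z]/;      // left side of the alternation
  const right = /\p{Ll}/u;   // right side of the alternation
  const input = "a".repeat(letterCount) + ",";
  let attempts = 0;

  const matchFrom = (index) => {
    if (index === input.length) return true;
    const character = input.charAt(index);
    attempts += 1; // try the left side
    if (left.test(character) && matchFrom(index + 1)) return true;
    attempts += 1; // backtrack and try the right side
    if (right.test(character) && matchFrom(index + 1)) return true;
    return false;
  };

  matchFrom(0);
  return attempts;
}

for (const n of [3, 4, 5, 10, 15]) {
  console.log(n, countAttempts(n));
}
// 3 30
// 4 62
// 5 126
// 10 4094
// 15 131070
```

Each extra letter roughly doubles the work, which is why an input of 30 or more characters appears to hang.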

JS: Check string if it is failing for some rules

Issue
Using regex to verify whether a string matches specific rules.
My Problem
My regexes don't seem to be valid, and I don't know how to check a string against multiple regexes.
Example string
This is just a senseless string with less then 1.000,00 words. and 1 x abbrevations e.g. this one ( and so on).
Rules
Every sentence must begin with an upper case character or a number
There must not be a space between number and `x`
Never multiple spaces
There must not be spaces at the beginning and at the end of bracket content
My regex attempts
/([.?!])\s*(?= [A-Z0-9])/g // Sentence have to start with upper case
/([0-9]*)(x)/g // No space between number and 'x'
/\s{2,}/g // Two or more spaces
// don't know how to do last rule
if (/([.?!])\s*(?= [A-Z0-9])/.test(string))
  failing.push('capitalizeSentence');
else if ...
But maybe it can be done a bit more dynamically...
Expected result
I need to know which rules are not matching the string if there is any. So I would suggest an array with values for those rules failed.
So in this example string the result could be an array like this, as every rule is failing.
failing = [ 'capitalizeSentence', 'spaceNumber', 'multipleSpaces', 'spaceBrackets' ];
Something like this:
var rules = {
  'capitalizeSentence': /[.?!]\s+[^A-Z\d]/,
  'spaceNumber': /\d\s+x/,
  'multipleSpaces': /\s\s/,
  'spaceBrackets': /\(\s|\s\)/
}
var check = function(str){
  return Object.keys(rules).reduce(function(results,key){
    if(rules[key].test(str)) {
      results.push(key);
    }
    return results;
  },[]);
};
console.log(check('This is just a senseless string with less then 1.000,00 words. and 1 x abbrevations e.g. this one ( and so on).'));
This works by testing the string against each rule's violation pattern and collecting the names of the violated rules in an array, which is returned.
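If you also want to show the user where a rule was violated, a variation of the same idea can return the first offending snippet for each rule. This is only a sketch: findViolations and violationPatterns are made-up names, and the patterns are the ones from the rules object above.

```javascript
// Same violation patterns as above; each one matches a rule being broken.
const violationPatterns = {
  capitalizeSentence: /[.?!]\s+[^A-Z\d]/,
  spaceNumber: /\d\s+x/,
  multipleSpaces: /\s\s/,
  spaceBrackets: /\(\s|\s\)/,
};

// Returns one entry per violated rule, with the position and text of the
// first place where that rule is broken.
function findViolations(text) {
  return Object.entries(violationPatterns).flatMap(([rule, pattern]) => {
    const match = pattern.exec(text);
    return match ? [{ rule, index: match.index, snippet: match[0] }] : [];
  });
}

console.log(findViolations("Ok. fine"));
// [ { rule: 'capitalizeSentence', index: 2, snippet: '. f' } ]
console.log(findViolations("All good here."));
// []
```

Keep in mind that the capitalizeSentence pattern also fires after abbreviations such as "e.g. this", so it may report false positives.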

Regex- match 3 or 6 of type

I'm writing an application that requires color manipulation, and I want to know when the user has entered a valid hex value. This includes both '#ffffff' and '#fff', but not the ones in between, like 4 or 5 Fs. My question is, can I write a regex that determines if a character is present a set amount of times or another exact amount of times?
What I tried was mutating the:
/#(\d|\w){3}{6}/
Regular expression to this:
/#(\d|\w){3|6}/
Obviously this didn't work. I realize I could write:
/(#(\d|\w){3})|(#(\d|\w){6})/
However I'm hoping for something that looks better.
The shortest I could come up with:
/#([\da-f]{3}){1,2}/i
I.e. # followed by one or two groups of three hexadecimal digits.
You can use this regex:
/#[a-f\d]{3}(?:[a-f\d]{3})?\b/i
This allows #<3 hex-digits> or #<6 hex-digits> inputs. The \b at the end is a word boundary, so the match cannot be followed directly by another word character.
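Neither pattern above is anchored at the start, so they are meant for finding a color inside a longer text. If the whole input has to be the color, as in the question, a sketch with ^ and $ could look like this (HEX_COLOR and isHexColor are made-up names):

```javascript
// Anchored variant: the whole input must be "#" plus 3 or 6 hex digits.
const HEX_COLOR = /^#(?:[\da-f]{3}){1,2}$/i;
const isHexColor = (value) => HEX_COLOR.test(value);

console.log(isHexColor("#fff"));    // true
console.log(isHexColor("#FFFFFF")); // true
console.log(isHexColor("#ffff"));   // false
console.log(isHexColor("#fffff"));  // false
console.log(isHexColor("#ggg"));    // false

// Without anchors, "#ffff" still passes because "#fff" matches inside it.
console.log(/#([\da-f]{3}){1,2}/i.test("#ffff")); // true
```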
I had to find a pattern for this myself today, but I also needed to include the extra digits for transparency (i.e. #FFF5 / #FFFFFF55), which made things a little more complicated, as the number of valid combinations goes up.
In case it's of any use, here's what I came up with:
var inputs = [
"#12", // Invalid
"#123", // Valid
"#1234", // Valid
"#12345", // Invalid
"#123456", // Valid
"#1234567", // Invalid
"#12345678", // Valid
"#123456789" // Invalid
];
var regex = /(^\#(([\da-f]){3}){1,2}$)|(^\#(([\da-f]){4}){1,2}$)/i;
inputs.forEach((itm, ind, arr) => console.log(itm, (regex.test(itm) ? "valid" : "-")));
Which should print:
#12 -
#123 valid
#1234 valid
#12345 -
#123456 valid
#1234567 -
#12345678 valid
#123456789 -

How to check if a string contains a number in JavaScript?

I don't get why it is so hard to tell a string containing a number apart from other strings in JavaScript.
Number('') evaluates to 0, while '' is definitely not a number for humans.
parseFloat enforces numbers, but allows them to be followed by arbitrary text.
isNaN evaluates to false for whitespace strings.
So what is the programmatic way to check whether a string is a number, according to a simple and sane definition of what a number is?
With the function below we can test whether a JavaScript string contains a digit. Pass the string as the parameter t, and the function returns either true or false.
function hasNumbers(t)
{
var regex = /\d/g;
return regex.test(t);
}
If you want something a little more complex regarding format, you could use regex, something like this:
var pattern = /^(0|[1-9][0-9]{0,2}(?:(,[0-9]{3})*|[0-9]*))(\.[0-9]+){0,1}$/;
I created this regex while answering a different question a while back. It checks that the string is a number with at least one character, and that it does not start with 0 unless it is 0 (or 0.[other digits]). It cannot have a decimal point unless digits follow it, and it may or may not have commas, but if it does, they must be 3 digits apart. You could also add a -? at the beginning if you want to allow negative numbers, something like:
/^(-)?(0|[1-9][0-9]{0,2}(?:(,[0-9]{3})*|[0-9]*))(\.[0-9]+){0,1}$/;
There's this simple solution :
var ok = parseFloat(s)==s;
If you need to consider "2 " as not a number, then you might use this one :
var ok = !!(+s==s && s.length && s.trim()==s);
You can always do:
function isNumber(n)
{
if (n.trim().length === 0)
return false;
return !isNaN(n);
}
Let's try
""+(+n)===n
which enforces a very rigid canonical way of the number.
However, JS reliably creates such number strings itself via var n = '' + some_number.
So this solution rejects '.01', all simple numbers that JS would stringify with an exponent, and all exponential representations that JS would display with a mantissa only. But as long as we stay within integers and low float ranges, it should work with otherwise supplied numbers too.
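A short sketch of that round-trip check shows what it accepts and rejects. The name isCanonicalNumberString is made up; the Number.isFinite guard is an addition, because without it the strings 'NaN' and 'Infinity' would also survive the round trip.

```javascript
// Accepts only strings that JS itself would produce when stringifying
// a finite number (the ""+(+n)===n idea, plus a finiteness guard).
function isCanonicalNumberString(n) {
  const value = +n;
  return Number.isFinite(value) && String(value) === n;
}

console.log(isCanonicalNumberString("42"));    // true
console.log(isCanonicalNumberString("0.5"));   // true
console.log(isCanonicalNumberString("1e+21")); // true
console.log(isCanonicalNumberString(".01"));   // false, JS writes "0.01"
console.log(isCanonicalNumberString("1.0"));   // false, JS writes "1"
console.log(isCanonicalNumberString("1e21"));  // false, JS writes "1e+21"
console.log(isCanonicalNumberString(" 2"));    // false
console.log(isCanonicalNumberString(""));      // false
console.log(isCanonicalNumberString("NaN"));   // false, thanks to the guard
```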
No need to panic; this snippet tells you whether a string contains any digits.
Try the below.
var pattern = /^([^0-9]*)$/; // matches only when the value contains no digits
if (!YourNiceVariable.value.match(pattern)) { /* runs when the value contains at least one digit */ }
if (YourNiceVariable.value.match(pattern)) { /* runs when the value contains no digits */ }
This might be insane depending on the length of your string, but you could split it into an array of individual characters and then test each character with isNaN to determine if it's a number or not.
A very short, wrong but correctable answer was just deleted. I could only comment on it, though it was very cool! So here is the corrected expression again:
n !== '' && +n == n
seems good. The first term eliminates the empty string. The second compares the numeric interpretation of the string with the string itself. Note that == converts the string to a number again, so this only rejects strings that parse to NaN. Strings with surrounding whitespace, or consisting only of whitespace, still pass, so add n.trim() === n if those must be rejected.
