charCodeAt is not behaving as expected


How can this be possible:
var string1 = "🌀", string2 = "🌀🌂";
//comparing the charCode
console.log(string1.charCodeAt(0) === string2.charCodeAt(0)); //true
//comparing the character
console.log(string1 === string2.substring(0,1)); //false
//This is giving me a headache.
http://jsfiddle.net/DerekL/B9Xdk/
If the char codes are the same in both strings, then comparing the characters themselves should return true. It does return true when I use a and ab, but with these strings it simply breaks.
Some said that the encoding might be causing the problem. But since it works perfectly fine when there is only one character in the string literal, I assume encoding has nothing to do with it.
(This question addresses the core problem in my previous questions. Don't worry, I deleted them already.)

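In JavaScript, strings are sequences of 16-bit UTF-16 code units rather than bytes, so a character counts as a single "character" only if its code point fits in 16 bits.
The majority of characters cause no issues, but these ones don't "fit": each is stored as a surrogate pair and occupies 2 characters as far as JavaScript is concerned. charCodeAt(0) returns only the first half of that pair, which is the same in both strings, while substring(0,1) cuts the pair in half.
In this case you need to do: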
string2.substring(0, 2) // "🌀"
For more information on Unicode quirkiness, see UTF-8 Everywhere.
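To make the surrogate-pair behaviour concrete, here is a minimal sketch. It assumes an ES2015 or later engine (for codePointAt, Array.from and \u{...} escapes), and the variable name firstCharacter is just illustrative.

```javascript
// The two strings from the question, written as code point escapes.
const string1 = "\u{1F300}";           // the cyclone emoji
const string2 = "\u{1F300}\u{1F302}";  // cyclone followed by the umbrella emoji

// Each emoji occupies two UTF-16 code units, so lengths are doubled.
const lengths = [string1.length, string2.length];

// charCodeAt sees only the high surrogate; codePointAt sees the whole code point.
const firstUnit = string1.charCodeAt(0);
const firstCodePoint = string1.codePointAt(0);

// Array.from iterates by code point, so it never splits a surrogate pair.
const firstCharacter = Array.from(string2)[0];

console.log(lengths);                              // [2, 4]
console.log(firstUnit.toString(16));               // "d83c"
console.log(firstCodePoint.toString(16));          // "1f300"
console.log(firstCharacter === string1);           // true
console.log(string2.substring(0, 1) === string1);  // false, half a pair
```

Iterating by code point is usually more robust than hard-coding substring(0, 2), since it also works when the string mixes one-unit and two-unit characters.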

substring takes the index where it starts and the index where it ends (exclusive), whereas substr takes the index where it starts and how many characters to return.

You can use the localeCompare method to compare two strings (it returns 0 when they are considered equal):
string1.localeCompare(string2);

Related

Why won't this regular expression work properly?

I need to check whether some input is strictly of this form:
PEOPLE-123456 or PERSON-12345376 (it can be any combination of numbers)
The number of numbers following the - doesn't matter. It can be from 0 to N numbers.
I've come up with the following expression:
/(PEOPLE-)|(PERSON-)?=^[0-9]+$/
The problem is, this will match even if the characters after the - are not numbers.
PEOPLE-123131 yields true
PERSON-123242 yields true
PERSON-23123.341 yields true
PEOPLE-.2341231 yields false
What am I doing wrong with it? I don't see any problems with the expression itself; maybe I am too much of a noob to see it.

Try this:
^(PERSON|PEOPLE)-[0-9]{1,}$
This ensures the string starts with exactly either PERSON or PEOPLE, followed by - and ends with at least one digit.

You need to put the grouping parentheses around both alternatives:
/^(PEOPLE|PERSON)-\d+$/
And you shouldn't mark it optional with ?. I have no idea why you put = and ^ after that part.
And if you want to allow decimal points in the number, use [0-9.] instead of \d.

This should work if the numbers are optional. If at least one digit is required, replace * with +.
/^(PEOPLE|PERSON)-\d*$/
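The anchored patterns from the answers above can be checked against the inputs from the question. This is only a quick sketch; the names strictId, lenientId and samples are made up for the illustration.

```javascript
// Prefix, hyphen, then digits only, anchored at both ends.
const strictId = /^(PEOPLE|PERSON)-\d+$/;   // at least one digit required
const lenientId = /^(PEOPLE|PERSON)-\d*$/;  // digits are optional

const samples = [
  "PEOPLE-123131",
  "PERSON-123242",
  "PERSON-23123.341",  // contains a dot, should be rejected
  "PEOPLE-.2341231",   // starts with a dot, should be rejected
  "PEOPLE-",           // no digits at all
];

const strictResults = samples.map((s) => strictId.test(s));
const lenientResults = samples.map((s) => lenientId.test(s));

console.log(strictResults);   // [true, true, false, false, false]
console.log(lenientResults);  // [true, true, false, false, true]
```

The only difference between the two is the last sample: \d* accepts an empty run of digits, \d+ does not.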

Number is different than itself (trimming strange characters)

I copied the first number from the Windows calculator and typed the second one. In the Chrome console I get:
"‭65033‬" == "65033"
//false
65033‬ == 65033
//Uncaught SyntaxError: Invalid or unexpected token
It seems there is an unknown character at the beginning and end of it.
1) Is there a way to trim all "strange" characters without knowing them a priori?
2) Why does the Windows calculator put such characters in the number?
Edit: Was not explicit in the question, but any chars with valid information, such as ã,ü,ç,¢,£ would also be valid. What I don't want is characters that do not carry any information for the human reader.

Edit: after the edit of the original question, this answer no longer offers a bulletproof solution.
var myNumber = 'foo123bar';
var realNumber = window.parseInt(myNumber.replace(/\D*/g, ''), 10);
What does this do?
It replaces all non-digit characters with an empty string and then parses an integer from the digits left in the string.

A quick solution for this case:
eval("65033‬ == 65033".replace(/[^a-zA-Z0-9 =_.-]/g, ''))
You can place your copied text in a string, then remove all unnecessary characters by explicitly listing the ones that should stay. The g flag makes the replacement apply to every occurrence, and the hyphen goes last in the character class so that it is not read as a range.
The allowed set may include alphanumeric characters plus hyphen, underscore, equals sign, space et cetera; the actual characters that need to stay will depend on your choice and needs.
Alternatively, you may try to remove all non-printable characters, as suggested here.
Finally, evaluate the resulting code. Remember that eval is not necessarily the best idea for production code.
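As a sketch of the "remove non-printable characters" idea, one option is to strip the Unicode "format" category (Cf) instead of listing allowed characters. This assumes an engine with Unicode property escapes (ES2018), and it assumes the pasted text is wrapped in directional marks such as U+202D and U+202C, which is a guess about what the calculator emitted. The function name stripFormatChars is hypothetical.

```javascript
// Remove invisible formatting characters (Unicode general category Cf),
// leaving letters, digits, punctuation and symbols untouched.
function stripFormatChars(text) {
  return text.replace(/\p{Cf}/gu, "");
}

// Assumed input: digits wrapped in LEFT-TO-RIGHT OVERRIDE / POP DIRECTIONAL FORMATTING.
const pasted = "\u202D65033\u202C";
const cleaned = stripFormatChars(pasted);

console.log(pasted === "65033");   // false
console.log(cleaned === "65033");  // true
console.log(stripFormatChars("\u00E3\u00FC\u00E7\u00A2\u00A3")); // unchanged: ãüç¢£
```

Note that Cf also covers characters such as the soft hyphen and the zero-width joiner, so this would alter text that relies on them (for example some emoji sequences).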

negative number in parentheses using javascript

I use match to split a mathematical expression into separate strings and save them in an array.
var STRING = ST.match(/\d*\.\d+|\d+|[()/*+-]/g);
But this method separates everything, including negative numbers that are inside parentheses.
For example, (-2+4) does not give me -2; instead it saves - in one index of the STRING array and 2 in the next index.
Is there any way to use match and keep negative numbers that are inside the parentheses?
This is what I want:
(-2+4):
STRING[0] gives me (
STRING[1] gives me -2
STRING[2] gives me +
STRING[3] gives me 4
STRING[4] gives me )
and if there is no negative number, it should work as normal:
(2+4):
STRING[0] gives me (
STRING[1] gives me 2
STRING[2] gives me +
STRING[3] gives me 4
STRING[4] gives me )

I don't think it's possible to parse complex cases like "(-2+4*-(3.5--8))" with just a regex, especially in JavaScript engines that lack negative lookbehind.
A solution would be to postprocess your match array by merging a sign into the following number when the sign sits between a separator and an unsigned expression.
In my opinion a regex is useful here, but only for the primary tokenization. Most of the work will still be ahead of you, as you'll have to build the binary expression tree (or any other formal representation you choose).
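The postprocessing step could look roughly like the sketch below. It is one possible reading of "merging signs", not a full parser: a - is folded into the next token only when that token is a number and the previous token is not an operand or a closing parenthesis. The function name tokenize is made up for the example.

```javascript
// Tokenize with the regex from the question, then merge unary minus signs
// into the number that follows them.
function tokenize(expr) {
  const raw = expr.match(/\d*\.\d+|\d+|[()\/*+-]/g) || [];
  const out = [];
  for (let i = 0; i < raw.length; i++) {
    const tok = raw[i];
    const prev = out[out.length - 1];
    const next = raw[i + 1];
    // The previous token ends an operand if it is ")" or contains a digit.
    const prevIsOperand = prev !== undefined && (prev === ")" || /\d/.test(prev));
    const nextIsNumber = next !== undefined && /^[\d.]/.test(next);
    if (tok === "-" && !prevIsOperand && nextIsNumber) {
      out.push("-" + next); // unary minus: merge with the number
      i++;                  // skip the number we just consumed
    } else {
      out.push(tok);
    }
  }
  return out;
}

console.log(tokenize("(-2+4)")); // ["(", "-2", "+", "4", ")"]
console.log(tokenize("(4-2)"));  // ["(", "4", "-", "2", ")"]
```

A minus in front of a parenthesis, as in 4*-(3.5), is left as a separate token, so whatever consumes the tokens still has to handle that case.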

Unfortunately, if what you're trying to do is parse a mathematical expression, regexps alone cannot be used.
RegExps can be used for languages that are describable by regular grammars, and arithmetic expressions are not: they are described by a context-free grammar (CFG). If you want to parse, and perhaps interpret the result, you'll certainly need some kind of stack-based state machine.
You can look at something like this well known algorithm.
Hope this helps.

You can add an optional sign to the numbers, that would work with your example:
var STRING = ST.match(/-?\d*\.\d+|-?\d+|[()/*+-]/g);
However, that will also turn a minus operator into a sign. The expression (4-2) would give you { "(", "4", "-2", ")" }.
Also, it will happily "parse" an expression like +---((((*** without complaining. If you want a result that makes sense, you should parse it for real, not just split it with a regular expression.

I think there is a mistake in your RegExp. Try this, it works for me:
var STRING = ST.match(/(\d*)(\.)(\d+)|(\d+)|[()\/*+-]/g);

Regex for integer, integer + dot, and decimals

I have searched StackOverflow and I can't find an answer as to how to check for regex of numeric inputs for a calculator app that will check for the following format with every keyup (jquery key up):
Any integer like: 34534
When a dot follows the integer when the user is about to enter a decimal number like this: 34534. Note that a dot can only be entered once.
Any float: 34534.093485
I don't plan to use commas to separate the thousands...but I would welcome if anyone can also provide a regex for that.
Is it possible to check the above conditions with just one regex? Thanks in advance.

Is a lone . a successful match or not? If it is then use:
\d+(\.\d*)?|\.\d*
If not then use:
\d+(\.\d*)?|\.\d+
Rather than incorporating commas into the regexes, I recommend stripping them out first: str = str.replace(/,/g, ''). Then check against the regex.
That wouldn't verify that digits are properly grouped into groups of three, but I don't see much value in such a check. If a user types 1,024 and then decides to add a digit (1,0246), you probably shouldn't force them to move the comma.
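Put together, the check on each keyup might be sketched as follows. The regex is the second variant above (a lone dot is rejected) with anchors added so that the whole input has to match; the anchors and the names numericInput and isValidSoFar are additions for this illustration.

```javascript
// Second variant from above, anchored so the whole input must match.
const numericInput = /^(?:\d+(?:\.\d*)?|\.\d+)$/;

// Strip thousands separators first, then validate what is left.
function isValidSoFar(value) {
  const withoutCommas = String(value).replace(/,/g, "");
  return numericInput.test(withoutCommas);
}

console.log(isValidSoFar("34534"));        // true
console.log(isValidSoFar("34534."));       // true, user is about to type decimals
console.log(isValidSoFar("34534.093485")); // true
console.log(isValidSoFar("1,0246"));       // true, comma placement is not checked
console.log(isValidSoFar("1.2.3"));        // false, only one dot allowed
console.log(isValidSoFar("."));            // false
```

Without the ^ and $ anchors, test would succeed as soon as any part of the input looked like a number.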

Let's write out your specifications, and develop from that.
Any integer: \d+
A dot, optionally followed by an integer: \.\d*
Combine the two and make the latter optional, and you get:
\d+\.?\d*
As for handling commas, I'd rather not go into it, as it gets very ugly very fast. You should simply strip all commas from input if you still care about them.

You can do it this way:
/^\d+\.?\d*$/
I think this covers all of your cases, whether it's 12445 or 1244. or 12445.43

I'm going to throw in a potentially downvoted answer here - this is a better solution:
function valid_float (num) {
    var num = (num + '').replace(/,/g, ''), // don't care about commas, this turns `num` into a String
        float_num = parseFloat(num);
    return float_num == num || float_num + '.' == num; // allow for the decimal point, deliberately using == to ignore type as `num` is a String now
}
Any regex that does your job correctly will come with a big asterisk after it saying "probably", and if it's not spot on, it'll be an absolute pig to debug.
Sure, this answer isn't giving you the most awesomely cool one-liner that's going to make you go "Cool!", but in 6 months time when you realise it's going wrong somewhere, or you want to change it to do something slightly different, it's going to be a hell of a lot easier to see where, and to fix.

I'm using ^(\d)+(\.(\d)+)+$ to capture each integer and to allow unlimited length, as long as the string begins and ends with integers and has dots between the integer groups. I'm capturing the integer groups so that I can compare them.

Javascript string comparison fails when comparing unicode characters

I want to compare two strings in JavaScript that are the same, and yet the equality operator == returns false. One string contains a special character (e.g. the Danish å).
JavaScript code:
var filenameFromJS = "Designhåndbog.pdf";
var filenameFromServer = "Designhåndbog.pdf";
print(filenameFromJS == filenameFromServer); // This prints false. Why?
The solution
What worked for me is unicode normalization as slevithan pointed out.
I forked my original jsfiddle to make a version using the normalization lib suggested by slevithan. Link: http://jsfiddle.net/GWZ8j/1/.

Unlike what some other people here have said, this has nothing to do with encodings. Rather, your two strings use different code points to render the same visual characters.
To solve this correctly, you need to perform Unicode normalization on the two strings before comparing them. Older JavaScript engines don't have this functionality built in; engines that implement ES2015 provide String.prototype.normalize. For environments without it, here is a JavaScript library that can perform the normalization for you: https://github.com/walling/unorm
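In engines that support ES2015 or later, the built-in String.prototype.normalize can do the same job as a library. The sketch below assumes the two filenames differ only in how å is encoded (one precomposed code point versus a plus a combining ring), which is the typical cause but is an assumption about the strings in the question. The helper name sameText is hypothetical.

```javascript
// Two visually identical filenames with different code point sequences.
const precomposed = "Designh\u00E5ndbog.pdf";  // å as a single code point
const decomposed = "Designha\u030Andbog.pdf";  // a + COMBINING RING ABOVE

// Compare after normalizing both sides to the same form (NFC here).
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(precomposed === decomposed);          // false
console.log(precomposed.length, decomposed.length); // 17 18
console.log(sameText(precomposed, decomposed));   // true
```

NFC is used here, but any form works for equality checks as long as both sides use the same one.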

The JavaScript equality operator == will appear to fail under the following circumstances. In all cases it is programmer error, not a bug in JavaScript.
The two strings do not contain the same number and sequence of characters.
There is whitespace or a newline before, within or after one string. Call trim() on both and look closely at both strings.
Surprise type coercion. The programmer is comparing data types that are incompatible.
There are Unicode characters which look identical to other Unicode characters but are in fact different characters.

Unicode is a complex thing. It has two different representations for characters such as á, é etc.: a single precomposed code point, or a base letter followed by a combining accent. As you can already see in the URL encoded version, the hex bytes that make up the character differ between the two versions.
See this answer for more information.

I had this same problem.
Adding
<meta charset="UTF-8">
to the HTML file fixed the issue.
In my case the templating engine was baking a JSON string into the HTML file. This string was in Unicode.
While the template was also a Unicode file, the browser was treating the string I wrote into the template as a Latin-1 encoded string until I added the meta tag.
I was comparing the typed-in string to one of the JSON object's items (location.title == "Mühle").

Let the browser normalize unicode for you. This approach worked for me:
function normalizeUnicode(s) {
    let div = $('<div style="display: none"></div>').html(s).appendTo('body');
    let res = div.html();
    div.remove();
    return res;
}
normalizeUnicode(unicodeVal1) == normalizeUnicode(unicodeVal2)
