javascript string comparison in a foreign language

I have some old, uncommented code that uses JavaScript differently than I've ever used it. The following is doing comparisons on strings.
if ((toFind > nextOptionText) && (toFind < lookAheadOptionText))
I've found questions like this one - that basically states that "a" < "b":
how exactly do Javascript numeric comparison operators handle strings?
However in my example I have some special characters that it is comparing against. In my code the parameters from above are:
if (("A" > "+") && ("A" < ">EINFUHRUNG ZUM"))
This is equaling TRUE - For me in my case I need it to equal FALSE, but I'm not asking how to make it false, I really just want to understand what the developer that wrote this code was thinking and how does the above if statement work.
Obviously I'm also dealing with a foreign language (German) - I'm pretty sure that this code was written prior to the application becoming multi-lingual.
If there are other suggestions that I should look into, please let me know (e.g. using the locale or doing a different type of comparison).

My quick test shows that this code evaluates to FALSE, as expected.
if (("A" > "+") && ("A" < ">EINFUHRUNG ZUM")) {
    alert('true');
} else {
    alert('false');
}
In general, the comparison is done as usual according to the character codes, therefore "A" > ">" and "A" > "+".
To compare strings with non-ASCII letters, you might find this reference useful.
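As a rough illustration of the point above, the sketch below prints the UTF-16 code units that the relational operators compare, then shows a locale-aware alternative built on the standard `Intl.Collator`. The `"de"` locale tag and the sample words are assumptions picked for the German case; nothing in the original code uses them.

```javascript
// Relational operators on two strings compare UTF-16 code units, left to right.
const codes = ["+", ">", "A"].map((ch) => ch.charCodeAt(0));
console.log(codes); // [43, 62, 65]

const first = "A" > "+";                // 65 > 43
const second = "A" < ">EINFUHRUNG ZUM"; // 65 < 62 is false, decided on the first character
console.log(first, second, first && second); // true false false

// Locale-aware ordering (assumed locale: German).
const collator = new Intl.Collator("de");
const words = ["Zebra", "Ärger", "Apfel"];
const byCodeUnit = [...words].sort();               // plain code-unit order
const byLocale = [...words].sort(collator.compare); // dictionary-style order
console.log(byCodeUnit);
console.log(byLocale);
```

With plain code-unit order, "Ärger" sorts after "Zebra" because "Ä" has a higher code unit than "Z"; a collator is one way to get the order a German reader would expect.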

Related

ANTLR: How to allow boolean values in arithmetic operations

I am trying to write ANTLR grammar for reading JavaScript arithmetic operations.
Specifically, I want to support boolean values in arithmetic operations such as 0 + true = 1 and 0 + false = 0.
I have the following right now:
BOOLEAN: 'true'
| 'false';
How can I make 1 also mean "true", and "0" also mean false?

How to allow boolean values in arithmetic operations
By making BOOLEAN one possible alternative for your expression rule. In fact, this will already be the case if you wrote your grammar the normal way. Something like 0 + true is syntactically valid in virtually any language that supports infix operators (it'd be a type error in many languages, but still syntactically valid).
How can I make 1 also mean "true", and "0" also mean false?
By treating it as such in your type checking, code generation and/or evaluation code. The grammar doesn't specify what things mean - only what is and isn't syntactically valid and what the resulting parse tree will look like.
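One way to picture the evaluation-side treatment is a tiny tree-walking evaluator. The node shape used here (`type`, `value`, `left`, `right`) is hypothetical and is not what ANTLR generates; the sketch only shows that the meaning of `true` as 1 is decided during evaluation, not in the grammar.

```javascript
// Hypothetical parse-tree nodes: { type: "num" | "bool" | "add", ... }.
function evaluate(node) {
  switch (node.type) {
    case "num":
      return node.value;
    case "bool":
      // The semantics live here: true counts as 1, false as 0.
      return node.value ? 1 : 0;
    case "add":
      return evaluate(node.left) + evaluate(node.right);
    default:
      throw new Error("Unknown node type: " + node.type);
  }
}

// Tree for the expression 0 + true.
const tree = {
  type: "add",
  left: { type: "num", value: 0 },
  right: { type: "bool", value: true },
};
console.log(evaluate(tree)); // 1
```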

Perils of <= and >= in Javascript. How to avoid them?

Bill the Lizard stated the dangers and cons of using the equality operator (==) in this answer. However, little has been said about the undercover "evil twins", the >= and the <=.
0 >= '' // true
0 <= '' // true
'' >= 0 // true
'' <= 0 // true
Therefore, my questions are:
Should elements of different types be comparable by default?
1.1 If not, what should be the returned value? false? undefined? Bear in mind that if a >= b === false, this implies that a < b === true.
What could be done to avoid, in a practical way, odd cases as the ones in the example?
Since greater than (>) and less than (<) operators also do type conversion, are there any odd cases for them?

Should elements of different types be comparable by default?
This only matters if you are designing a language. In JavaScript, values of different types are comparable, and comparison follows certain rules. You have to be aware of them, but I don't see any point in discussing "what if the rules were different".
What could be done to avoid, in a practical way, odd cases as the ones
in the example?
Just avoid comparing values of different types; it doesn't make sense in most cases. The only situation where it's useful is comparing numeric strings with numbers. And in that case JavaScript behaves just as anyone would expect, no odd results.
Since greater than (>) and less than (<) operators
also do type conversion, are there any odd cases for them?
I'm sure there are, although I can't think of one right now. But why do you think your examples are odd? You seem to understand that "" == 0 because of type conversion, so it's no wonder all the comparisons in your example return true, as they all include zero.
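For what it's worth, a few mixed-type cases do tend to surprise people. The sketch below lists some; the `compareNumbers` helper is a hypothetical name, shown only as one possible way to force an explicit, checked conversion before comparing.

```javascript
// Mixed-type relational comparisons convert both sides to numbers
// (unless both operands are strings).
console.log(0 >= "");     // true: "" converts to 0
console.log("10" < 9);    // false: "10" converts to 10
console.log("10" < "9");  // true: both are strings, compared by code unit

// Cases that tend to surprise:
console.log(null >= 0);      // true: null converts to 0
console.log(null == 0);      // false: == does not convert null to a number
console.log(undefined >= 0); // false: undefined converts to NaN
console.log("abc" < 1, "abc" >= 1); // false false: NaN compares false both ways

// One defensive option: compare only after an explicit, checked conversion.
function compareNumbers(a, b) {
  const x = Number(a);
  const y = Number(b);
  if (Number.isNaN(x) || Number.isNaN(y)) {
    throw new TypeError("Not comparable as numbers");
  }
  return x < y ? -1 : x > y ? 1 : 0;
}
console.log(compareNumbers("10", 9)); // 1
```

The NaN line also shows that `a >= b` being false does not always mean `a < b` is true.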

What does Math.random() do in this JavaScript snippet?

I'm watching this Google I/O presentation from 2011 https://www.youtube.com/watch?v=M3uWx-fhjUc
At minute 39:31, Michael shows the output of the closure compiler, which looks like the code included below.
My question is what exactly is this code doing (how and why)
// Question #1 - floor & random? 2147483648?
Math.floor(Math.random() * 2147483648).toString(36);
var b = /&/g,
c = /</g,d=/>/g,
e = /\"/g,
f = /[&<>\"]/;
// Question #2 - sanitizing input, I get it...
// but f.test(a) && ([replaces]) ?
function g(a) {
    a = String(a);
    f.test(a) && (
        a.indexOf("&") != -1 && (a = a.replace(b, "&amp;")),
        a.indexOf("<") != -1 && (a = a.replace(c, "&lt;")),
        a.indexOf(">") != -1 && (a = a.replace(d, "&gt;")),
        a.indexOf('"') != -1 && (a = a.replace(e, "&quot;"))
    );
    return a;
};
// Question #3 - void 0 ???
var h = document.getElementById("submit-button"),
i,
j = {
label: void 0,
a: void 0
};
i = '<button title="' + g(j.a) + '"><span>' + g(j.label) + "</span></button>";
h.innerHTML = i;
Edit
Thanks for the insightful answers. I'm still really curious about the reason why the compiler threw in that random string generation at the top of the script. Surely there must be a good reason for it. Anyone???

1) This code is pulled from the Closure Library. It simply creates a random string. In later versions it was replaced with code that creates a large random integer, which is then concatenated to a string:
'closure_uid_' + ((Math.random() * 1e9) >>> 0)
This simplified version is easier for the Closure Compiler to remove so you won't see it leftover like it was previously. Specifically, the Compiler assumes "toString" with no arguments does not cause visible state changes. It doesn't make the same assumption about toString calls with parameters, however. You can read more about the compiler assumptions here:
https://code.google.com/p/closure-compiler/wiki/CompilerAssumptions
2) At some point, someone determined it was faster to test for the characters that might need to be replaced before making the "replace" calls on the assumption most strings don't need to be escaped.
3) As others have stated the void operator always returns undefined, and "void 0" is simply a reasonable way to write "undefined". It is pretty useless in normal usage.

1) I have no idea what the point of number 1 is.
2) Looks to make sure that any symbols are properly converted into their corresponding HTML entities, so yes, basically sanitizing the input to make sure it is HTML safe.
3) void 0 is essentially a REALLY safe way to get undefined. Since undefined in JavaScript is an ordinary identifier rather than a keyword (it can be shadowed and, in older engines, reassigned), it's not always safe to assume that undefined actually holds the undefined value you expect.
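A small sketch of that point: `void` always yields the undefined value, while the name `undefined` can be rebound inside a function. The `shadowed` function is purely illustrative.

```javascript
// void evaluates its operand and always yields the undefined value.
console.log(void 0 === undefined); // true

// `undefined` is an ordinary identifier, so a local binding can shadow it.
function shadowed(undefined) {
  // Inside this function, `undefined` is whatever the caller passed in.
  return [undefined, void 0];
}
const result = shadowed("oops");
console.log(result[0]);            // "oops"
console.log(result[1] === void 0); // true
```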

When in doubt, check other bases.
2147483648 (base 10) = 0x80000000 (base 16). So it's just making a random number which is within the range of a 32-bit signed int. floor is converting it to an actual int, then toString(36) is converting it to a 36-character alphabet, which is 0-9 (10 characters) plus a-z (26 characters).
The end-result of that first line is a string of random numbers and letters. There will be 6 of them (36^6 = 2176782336), but the first one won't be quite as random as the others (won't be late in the alphabet). Edit: Adrian has worked this out properly in his answer; the first letter can be any of the 36 characters, but is slightly less likely to be Z. The other letters have a small bias towards lower values.
For question 2, if you mean this a = String(a); then yes, it is ensuring that a is a string. This is also a hint to the compiler so that it can make better optimisations if it's able to convert it to machine code (I don't know if they can for strings though).
Edit: OK you clarified the question. f.test(a) && (...) is a common trick which uses short-circuit evaluation. It's effectively saying if(f.test(a)){...}. Don't use it like that in real code because it makes it less readable (although in some cases it is more readable). If you're wondering about test, it's to do with regular expressions.
For question 3, it's new to me too! But see here: What does `void 0` mean? (quick google search. Turns out it's interesting, but weird)
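To make the short-circuit trick concrete, here is a hypothetical, more readable rewrite of the minified `g()` using a plain `if`. The name `escapeHtml` is made up, and the replacement strings are assumed to be the usual HTML entities.

```javascript
// Hypothetical readable equivalent of the minified g() from the question.
function escapeHtml(value) {
  let text = String(value);
  // Cheap pre-check: most strings contain none of these characters.
  if (/[&<>"]/.test(text)) {
    text = text
      .replace(/&/g, "&amp;") // runs first so later entities are not re-escaped
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");
  }
  return text;
}

console.log(escapeHtml('<b title="x">Tom & Jerry</b>'));
// &lt;b title=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/b&gt;
console.log(escapeHtml(void 0)); // "undefined", because String(undefined) is "undefined"
```

The second call mirrors the question's `j.label: void 0` case: the value is stringified before any escaping happens.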

There's a number of different questions rolled into one, but considering the question title I'll just focus on the first here:
Math.floor(Math.random() * 2147483648).toString(36);
In actual fact, this doesn't do anything - as the value is discarded rather than assigned. However, the idea of this is to generate a number between 0 and 2 ^ 31 - 1 and return it in base 36.
Math.random() returns a number from 0 (inclusive) to 1 (exclusive). It is then multiplied by 2^31 and floored to produce the range mentioned. The .toString(36) then converts it to base 36, represented by 0 to 9 followed by a to z (lowercase).
The end result ranges from 0 to zik0zj, which is 2^31 - 1 in base 36.
As to why it's there in the first place ... well, examine the slide. This line appears right at the top. Although this is pure conjecture, I actually suspect that the code was cropped down to what's visible, and there was something immediately above it that this was assigned to.
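The arithmetic can be checked directly. In the sketch below, `LIMIT` and `randomBase36` are illustrative names, not anything from the Closure output.

```javascript
const LIMIT = 2147483648; // 2^31, i.e. 0x80000000
console.log(LIMIT === 0x80000000, LIMIT === 2 ** 31); // true true

// Largest value Math.floor(Math.random() * LIMIT) can produce, in base 36.
console.log((LIMIT - 1).toString(36)); // "zik0zj"

function randomBase36() {
  // Random integer in [0, 2^31 - 1], rendered with digits 0-9 and a-z.
  return Math.floor(Math.random() * LIMIT).toString(36);
}
const sample = randomBase36();
console.log(sample, sample.length); // between 1 and 6 characters
```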

Obfuscated code without function calls

I've seen examples of code that uses only operators and "" to perform complex string operations. Basically, the idea was that something like ((+"+")+"")[+""] gives you a letter N, etc. I forgot where I found it, and I'm having no luck finding proper google keywords. Does anyone have a link at hand?

Basically there are two main concepts used here:
making a Number out of a string, i.e. Number(str), whose shortcut is +str;
stringifying numeric values, i.e. String(n), whose shortcut is n+"".
Hence, if we look at the expression thoroughly, we'll see:
+"+"        // NaN
NaN + ""    // "NaN"
+""         // 0
"NaN"[0]    // "N"
There are a lot of things you can do in JavaScript in the same way. One funny example is provided in the following question: What are JavaScript's builtin strings?
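The steps above can be run one at a time. The last two lines are extra illustrations of the same idea and do not come from the question.

```javascript
const step1 = +"+";       // NaN: "+" is not a valid numeric string
const step2 = step1 + ""; // "NaN"
const step3 = +"";        // 0: the empty string converts to 0
console.log(Number.isNaN(step1), step2, step3); // true "NaN" 0
console.log(((+"+") + "")[+""]); // "N"

// The same approach reaches other letters:
console.log((![] + "")[+[]]);  // "f", from "false"[0]
console.log((!![] + "")[+[]]); // "t", from "true"[0]
```

Note that NaN has to be checked with `Number.isNaN`, since `NaN === NaN` is false.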

Rationale for why JavaScript converts primitive values to numbers in == operator comparisons when one is boolean?

I know the rule:
If the two operands are not of the same type, JavaScript converts the operands then applies strict comparison. If either operand is a number or a boolean, the operands are converted to numbers if possible; else if either operand is a string, the other operand is converted to a string if possible.
So, if("true") passes but if("true" == true) fails because it is handled like if(NaN == 1).
I was wondering what the rationale is behind this when one value is boolean. In other weakly typed languages like PHP, this is handled differently: if one value is a boolean, the other is converted to a boolean for comparison (rather than converting both to numbers as in JavaScript).
I'm assuming this choice was made for the == operator after careful consideration. Can anyone provide a rationale for why this was the chosen functionality? Is there a common use case that this was chosen to address? I'm betting it wasn't just a mistake.

A remarkably quick response just in from Brendan Eich from the es-discuss#mozilla.org mailing list :
Consider Perl:
$ perl -e 'print 0 == "true";'
1
Ok, poor rationale -- but I created JS in May 1995, in the shadow of AWK, Perl 4, Python 1.2 (IIRC), TCL.
I should have paid more attention to AWK than Perl, given
$ awk 'END {print(0 == "0")}'
1D
$ awk 'END {print(0 == "")}'
0D
In some ways, JS's == operator splits the difference between Perl (where non-numeric strings such as "true" convert to 0) and AWK (where only "0" converts to 0) by converting to NaN. That way, at least, we have
js> 0 == ""
true
js> 0 == "true"
false
But the full truth is not that I was carefully emulating other languages. Rather, some Netscapers working to embed JS (then "Mocha") in a PHP-like server (LiveWire) wanted sloppy conversions, so programmers could match HTTP header strings (server side) or HTML form fields (client side) against, e.g., 404 and the like, without explicit coercion by the programmer.
But it was the 90s, I was in a tearing hurry, these ex-Borland Netscapers were persistent. So, as I said at Strange Loop last year, "I was an idiot! I gave them what they wanted!"
Implicit conversions are my biggest regret in JS's rushed design, bar none. Even including 'with'!
Does anyone know the exact reason the choice was made not to convert to boolean any value compared against a boolean with the == operator?
The general idea is the narrower type should widen. Thus, true == 1 follows by projecting boolean {false, true} onto {0, 1}, as in C++.
But why not widen true to string, since the other operand in your example is "true"? Good question. The bias toward comparing strings as numbers if either operand is a number or a boolean stems from the HTTP header and numeric-string HTML form field use-cases. Not good reasons, again, but that's how JS "works" :-|.
You can see this in the ECMA-262 Edition 5.1 spec, 11.9.3 The Abstract Equality Comparison Algorithm, steps 6 & 7 (read in light of steps 4 & 5):
4. If Type(x) is Number and Type(y) is String, return the result of the comparison x == ToNumber(y).
5. If Type(x) is String and Type(y) is Number, return the result of the comparison ToNumber(x) == y.
6. If Type(x) is Boolean, return the result of the comparison ToNumber(x) == y.
7. If Type(y) is Boolean, return the result of the comparison x == ToNumber(y).
This is all in a big "else" clause where Type(x) and Type(y) for x == y are not the same.
Sorry there's no pearl (sic) of wisdom here. In addition to implicit conversions, == and != do not widen operands directly (no intermediate conversions) to the narrowest width that can hold the other operand without data loss. This narrowing string to number is just a botch.
If we fixed this botch, we'd still have:
0 == "0"
1 == "1"
true != "1"
false != "0"
But we would also have what your example wants:
true == "true"
false != ""
Per my calling the preference for number over string conversion a botch, we would not have true == "1" or false == "0", because that narrows from string to number. It's true the narrowing loses no bits, and one can widen 0 back to "0" and 1 back to "1", but I meant to illustrate what removing all number-over-string bias from the implicit conversion spec for == would do.
Would such a change break a lot of code on the web? I'd bet large sums it would.
Some take this botch, on top of any implicit conversion under the hood, as another reason to use === and !== always (because they never convert), and to utterly shun == and !=. Others disagree (especially when testing x == null, a one-operator way to test x === null || x === undefined).
Since the web grows mostly-compatibly until very old forms die off, we're stuck with == and !=, so I say it pays to learn what the sloppy equality operators do. Having done that, it seems to me one may use them where they win: when you know the operands are same-type, e.g.
typeof x == "function", etc.
x == null
And otherwise, use === and !==.
The bias toward comparing strings as numbers if either operand is a number or a boolean stems from the HTTP header and numeric-string HTML form field use-cases. Not good reasons, again, but that's how JS "works" :-|.
One more note: it could be argued that narrowing from string to number would be ok (as in, useful most of the time, and not unsafe) if any non-numeric, non-empty string-to-number implicit conversion attempt threw an exception.
Here's where another path-dependent bias in JS's design bit: no try/catch in JS1 or any ECMA-262 standard till ES3.
The lack of exception handling also meant undefined is imputed for missing obj.foo property get where obj has no such property. That is still biting back, perhaps as much as or more than implicit conversions bite ==. It is also the basis of web JS's winning "object detection" pattern, which fairly beats all other versioning schemes I've seen, especially a-priori ones based on explicit numbering and opt-in.
If only I'd taken the time to add an existential operator for object detection, so one could write
function emulateRequestAnimationFrame(...) {...}
if (!window.requestAnimationFrame?)
window.requestAnimationFrame = emulateRequestAnimationFrame;
IOW, if only I'd made window.noSuchProperty throw but window.noSuchProperty? evaluate to truthy or false (details still TBD, see the "fail-fast object destructuring" thread revival, the Nil idea).

I think some clarification is in order. According to the ECMA specification the entire expression for the if statement (the part within the parentheses) is converted to a boolean.
So imagine it like this:
if (ToBoolean("true" == true)) { //false
vs
if (ToBoolean("true")) { //true
I suppose the rationale for adding the ToBoolean coercion to the if expression was to ensure the expression always evaluates safely and correctly.
ToBoolean coerces a single value to a boolean. A comparison does not coerce each value to a boolean; that wouldn't make sense, as you would get some pretty strange results. It checks for equality, which is a different operation. As for why one value isn't converted to boolean when the other is one, I am not sure, but try the Mozilla ECMA mailing list: https://mail.mozilla.org/listinfo/es-discuss
See:
http://www.ecma-international.org/ecma-262/5.1/#sec-9.2
http://www.ecma-international.org/ecma-262/5.1/#sec-12.5
http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.1
http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.3
http://www.ecma-international.org/ecma-262/5.1/#sec-8.7.1
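A short sketch of the difference between ToBoolean and the == conversions discussed in this thread. The `isNullish` helper is a hypothetical name for the `x == null` idiom.

```javascript
// An if statement applies ToBoolean to the whole expression.
console.log(Boolean("true")); // true: any non-empty string is truthy

// == with a boolean operand converts the boolean to a number first,
// then the string to a number: "true" == true -> "true" == 1 -> NaN == 1.
console.log("true" == true);  // false
console.log(Number("true"));  // NaN
console.log("1" == true);     // true: 1 == 1
console.log("0" == false);    // true: 0 == 0
console.log("" == false);     // true: 0 == 0

// The x == null idiom matches only null and undefined.
const isNullish = (x) => x == null;
console.log([null, undefined, 0, "", false].map(isNullish));
// [true, true, false, false, false]
```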
