parsing international numerals

parsing international numerals - javascript

I'm trying to figure out how my script will behave if rendered in a browser using Chinese (or other) locale using Chinese numerals (or another non-Latin symbol set). Can't seem to find any info on this on the interwebs.
Looking at the page
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString
we see examples of localized numbers when converting from number to string, but what about the other way around? I tried parseInt("一二三") in IE11 debug console which returned NaN, but I'm not using Chinese Windows. Could someone test this?
My confusion comes from JavaScript having loosely typed data, so what if I end up running into an implicit string-to-number conversion, such as this:
var a = "١٢٣";
var b = .01;
console.log(a*b);
Mind you my variables a and b could come from user input in a more complex example. How can you make sure that input coming from a non-Latin symbology is converted to the right number-representation internally before you do arithmetic if parseInt and implicit conversion don't work?

It won't work for several reasons. Notice firstly that while there is a toLocalString but there is no parseLocalStringInt or fromLocaleString. Secondly javascript only really does implicit type coercion when particular operators are used e.g. ==. * however can't be used in this fashion and even == and other operators only support very limited coercion in comparison with what you are describing.
This coercion can still be very dangerous or useful depending on your point of view e.g.
0 == false is true
but 0 === false is false but it certainly isn't as powerful and you think it is.

Related

In JavaScript, should stringified numbers be converted to integers before subtracting them? (best practices)

I recently wrote some code that is grabbing text from two separate elements and then subtracting them, to my surprise I didnt have to convert them first to integers. I did a little looking around and it seems JavaScript converts strings to numbers when using the subtraction operator. Does anyone know if this is ok to leave them as strings or for best practice should they first me converted to integers? and if so why? Thank you.
example:
"10" - "6" = 4

[…] to my surprise I didnt have to convert them first to integers […]
A couple more surprises, then:
JavaScript resolves many operations (such as arithmetic) by implicitly coercing incompatible values to a different type, instead of raising an exception. This makes JavaScript “weakly typed” to that extent.
There is no integer type built into JavaScript; the only number type is IEEE-754 floating-point.
So, your string values were coerced to floating-point values, in the context of the arithmetic operation. JavaScript didn't tell you this was happening.
This is a source of bugs that can remain hidden for a long time, because if your string values would successfully convert to a number, the operation would succeed, even if you would expect those values to raise an error.
js> "1e15" - "0x45" // The reader might have expected this to raise an error.
999999999999931
The brief “Wat” presentation by Gary Bernhardt is packed with other surprising (and hilarous) results of JavaScript's implicit type coercion.
Does anyone know if this is ok to leave them as strings or for best practice should they first me converted to [numbers]?
Yes, in my opinion you should do arithmetic only on explicitly-converted numbers, because (as you discovered) for newcomers reading the code, the implicit coercion rules are not always obvious.

Is JavaScript string comparison just as fast as number comparison?

I'd like to write a little library for JavaScript enums. For me to do that, I need to decide how to store the enum values. Therefore, I'd like to use the fastest way when comparing, but I also want something that is debuggable, so I'm torn between using strings or numbers. I know I could use objects too, but that would be another question
For example
// I don't want this because when debugging, you'd see just the value 0
var Planets = {Earth:0, Mars:1, Venus: 2}
// I'd prefer this so that Planets.Earth gives me a nice readable value ("Earth")
var Planets = {Earth: 'Earth', Mars: 'Mars'}
But I'm afraid that when I compare them using if (myPlanet === Planet.Earth), the string comparison could take a lot longer (say if it were in a tight loop). This should be the case because http://ecma-international.org/ecma-262/5.1/#sec-11.9.6 says
If Type(x) is String, then return true if x and y are exactly the same sequence of characters (same length and same characters in corresponding positions); otherwise, return false.
But when I wrote a test case, I found that they take the same amount of time http://jsperf.com/string-comparison-versus-number-comparison/2 so it doesn't seem like it's scanning the whole string.
I know this could be a micro optimization, but my question is: is string equality comparison done using pointers and therefore just as fast as number equality comparison?

String comparison could be "just as fast" (depending on implementation and values) - or it could be "much slower".
The ECMAScript specification describes the semantics, not the implementation. The only way to Know for Certain is to create an applicable performance benchmark on run it on a particular implementation.
Trivially, and I expect this is the case1, the effects of string interning for a particular implementation are being observed.
That is, all string values (not String Objects) from literals can be trivially interned into a pool such that implIdentityEq("foo", "foo") is true - that is, there need only one string object. Such interning can be done after constant folding, such that "f" + "oo" -> "foo" - again, per a particular implementation as long as it upholds the ECMAScript semantics.
If such interning is done, then for implStringEq the first check could be to evaluate implIdentityEq(x,y) and, if true, the comparison is trivially-true and performed in O(1). If false, then a normal string character-wise comparison would need to be done which is O(min(n,m)).
(Immediate falseness can also be determined with x.length != y.length, but that seems less relevant here.)
1 While in the above I argue for string interning being a likely cause, modern JavaScript implementations perform a lot of optimizations - as such, interning is only a small part of the various optimizations and code hoistings that can (and are) done!
I've created an "intern breaker" jsperf. The numbers agree with the hypothesis presented above.
If a string is interned then comparison is approximate in performance to testing for "identity" - while it is slower than a numeric comparison, this is still much faster than a character-by-character string comparison.
Holding the above assertion, IE10 does not appear to consider object-identity for pass-fast string comparisons although it does use a fast-fail length check.
In Chrome and Firefox, two intern'ed strings which are not equal are also compared as quickly as two that are - there is likely a special case for comparing between two different interned strings.
Even for small strings (length = 8), interning can be much faster. IE10 again shows it doesn't have this "optimization" even though it appears to have an efficient string comparison implementation.
The string comparison can fail as soon as the first different character is encountered: even comparing long strings of equal length might only compare the first few characters.
Do common JavaScript implementations use string interning? (but no references given)
Yes. In general any literal string, identifier, or other constant string in JS source is interned. However implementation details (exactly what is interned for instance) varies, as well as when the interning occurs
See JS_InternString (FF does have string interning, although where/how the strings are implicitly interened from JavaScript, I know not)

There are cases when string comparison can be much slower (comparing dynamically generated strings)
The following is 77% slower (in chrome and IE) than all the other tests
var StringEarth = 'Ear' + 'th';
for (var i = 0; i < ITERATIONS; i++) {
x = StringPlanets.Venus === StringEarth;
}
The flaw in the tests mentioned in the question is the fact that we are testing against literal strings. It seems that JavaScript is optimized so that string comparison for string literals is done just by testing a pointer. This can be observed by creating the strings dynamically. My best guess is that strings from the literal string pool are marked so that they can be compared using addresses only.
Note that string comparison seems just as fast in FF even for dynamic strings. Also, that it's just as slow for even literal strings.
Conclusion All browsers behave differently so string comparison may or may not be slower.

In general, at best String interning (making a string with a given value into a unique reference or a O(1) comparable symbol) is going to take O(n) time, as it can't do that effectively without looking at all the characters involved.
The question of relative efficiency then amounts to over how many comparisons the interning is going to be amortized.
In the limit, a very clever optimizer can pull out static expressions which build up strings and intern them once.
Some of the tests above, use strings which will have been interned in which case the comparison is potentially O(1). In the case where enums are based on mapping to integers, it will be O(1) in any implementation.
The expensive comparison cases arise when at least one of the operands is a truly dynamic string. In this case it is impossible to compare equality against it in less than O(n).
As applied to the original question, if the desire is to create something akin to an enum in a different language, the only lookout is to ensure that the interning can is done in only a few places. As pointed out above, different Browser use different implementations, so that can be tricky, and as pointed out in IE10 maybe impossible.
Caveat lacking string interning (in which case you need the integer version of the enum implementation give), #JuanMendes' string-based enum implementations will be essentially O(1) if he arranges for the value of the myPlanet variable to be set in O(1) time. If that is set using Planets.value where value is an established planet it will be O(1).

== vs. === in a remote JavaScript file. Which one is faster? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
JavaScript === vs == : Does it matter which “equal” operator I use?
When comparing two operands of the same type for equality in JavaScript, using == or === doesn't make any conceptual difference, so I'm wondering which operator is actually faster when, like in my case, a JavaScript file has to be downloaded from a remote Internet location.
While the strict equality operator === may perform faster on many user agents, it also requires 8 more bits of uncompressed information to be carried along the network with the JavaScript file.
As it happens, today's average CPUs are much faster in executing several hundred conditional jumps than Internet connections are in delivering one single bit, so I'd be keen to using == instead of === and != instead of !== when possible. Yet I'm confused by reading so many blogs that recommend doing the opposite.
Is there any important point I'm missing?

As you say, for comparison where both operands are guaranteed to be of the same type, the two operators are specified to perform precisely the same steps and, barring compiler optimizations, are likely to perform near enough identically. Therefore there is a slight advantage in terms of file size in using == over === in those cases.
However, some people argue that consistency is more important: === is usually closer to what you intend when testing equality, and only using === and !== is something many people find helpful and readable. Personally, I have the opposite rule and only use === when there is uncertainty about the types of the operands, but I wouldn't recommend either way over the other.
If you understand the differences between strict and non-strict equality and you're confident that using == and != won't cause you or anyone else working your code any problems reading and understanding code in the future, go ahead and use them.

Presumably blogs that recommend one over the other are doing so for reasons of logical consistency with analogous operations and not for speed. Trying to trim your Javascript programs to point of shaving off individual characters from the script is unwise; running it through minify or another automated tool before serving is one thing, but hand-tuning Javascript for minimum file size or for execution speed on the level of individual operands is an endless, thankless task that will make your site harder to maintain.
Use whichever operand makes more logical sense to you, so you won't be confused when you don't remember this line of inquiry two years from now.

If I may quote from Douglas Crockford's Javascript: The Good Parts:
JavaScript has two sets of equality operators: === and !==, and their
evil twins == and !=. The good ones work the way you would expect.
If the two operands are of the same type and have the same value, then
=== produces true and !== produces false. The evil twins do the right thing when the operands are of the same type, but if they are of
different types, they attempt to coerce the values. The rules by
which they do that are complicated and unmemorable ... The lack of
transitivity is alarming. My advice is to never use the evil twins.
Instead, always use === and !==.
Whether there is a performance difference or not, it would be difficult to justify the use of the "evil twins".

var test="1";
console.log(test==1); // true
console.log(test===1); // false
== checks whether test has 1 or not but === checks whether test has 1 and also checks it's data type. In that case 2nd expression is false because it's data type is string (test is string) but the right hand operand is not string. The following test is different
var test=1;
console.log(test==1); // true
console.log(test===1); // true
Because test contains Integer 1 that evaluates to Boolean true and right hand operand is also same. === also checks whether both operands type.

How to let user enter a JavaScript function securely?

What I'm doing is creating a truth table generator. Using a function provided by the user (e.g. a && b || c), I'm trying to make JavaScript display all combinations of a, b and c with the result of the function.
The point is that I'm not entirely sure how to parse the function provided by the user. The user could namely basically put everything he wants in a function, which could have the effect of my website being changed etc.
eval() is not secure at all; neither is new Function(), as both can make the user put everything to his liking in the function. Usually JSON.parse() is a great alternative to eval(), however functions don't exist in JSON.
So I was wondering how I can parse a custom boolean operator string like a && b || c to a function, whilst any malicious code strings are ignored. Only boolean operators (&&, ||, !) should be allowed inside the function.

Even if you check for boolean expressions, I could do this:
(a && b && (function() { ruinYourShit(); return true; })())
This is to illustrate that your problem is not solvable in the general case.
To make it work for your situation, you'd have to impose severe restrictions on variable naming, like requiring that all variables are a single alphabet letter, and then use a regular expression to kick it back if anything else is found. Also, to "match" a boolean expression, you'd actually have to develop a grammar for that expression. You are attempting to parse a non-regular language (javascript) and therefore you cannot write a regular expression that could match every possible boolean expression of arbitrary complexity. Basically, the problem you are trying to solve is really, really hard.
Assuming you aren't expecting the world to come crashing down if someone ruins your shit, you can develop a good enough solution by simply checking for the function keyword, and disallowing any logical code blocks contained in { and }.

I don't see the problem with using eval() and letting the user do whatever (s)he wants, as long as you have proper validation/filters on any of your server-side scripts that expect input. Sure, the user could do something that ruins your page, but (s)he can do that already. The user can easily enough run any javascript (s)he wants on your page already, using any number built-in browser features or add-ons.

What I would do would be to first split the user input into tokens (variable names, operators and possibly parentheses for grouping), then build an expression tree from that and then generate the relevant output from the expression tree (possibly after running some simplification on it).
So, say, break the string "a & !b" into the token sequence "a" "&" "!" "b", then go through and (eventually) build something like: booland(boolvar("a"), boolnot(boolvar("b"))) and you then have a suitable data structure to run your (custom, hopefully injection-free) evaluator over.

I ended up with a regular expression:
if(!/^[a-zA-Z\|\&\!\(\)\ ]+$/.test(str)) {
throw "The function is not a combination of Boolean operators.";
return;
}

Is JavaScript an untyped language?

I've found that some people call JavaScript a "dynamically, weakly typed" language, but some even say "untyped"? Which is it really?

JavaScript is untyped:
(source: no.gd)
Even Brendan Eich says so. On Twitter, he replied to a thread that linked to this question:
... academic types use "untyped" to mean "no static types"...
So the problem is that there's a few different definitions of untyped.
One definition has been talked about in one of the above answers - the runtime doesn't tag values and just treats each value as bits. JavaScript does tag values and has different behaviour based on those tags. So JavaScript obviously doesn't fit this category.
The other definition is from Programming Language Theory (the academic thing that Brendan is referring to). In this domain, untyped just means everything belongs to a single type.
Why? Because a language will only generate a program when it can prove that the types align (a.k.a. the Curry-Howard correspondence; types are theorems, programs are proofs). This means in an untyped language:
A program is always generated
Therefore types always match up
Therefore there must only be one type
In contrast to a typed language:
A program might not be generated
Because types might not match up
Because a program can contain multiple types
So there you go, in PLT, untyped just means dynamically typed and typed just means statically typed. JavaScript is definitely untyped in this category.
See also:
What to know before debating type systems
Does “untyped” also mean “dynamically typed” in the academic CS world?

strong/weak can be thought of in relation to how the compiler, if applicable, handles typing.
Weakly typed means the compiler, if applicable, doesn't enforce correct typing. Without implicit compiler interjection, the instruction will error during run-time.
"12345" * 1 === 12345 // string * number => number
Strongly typed means there is a compiler, and it wants you an explicit cast from string to integer.
(int) "12345" * 1 === 12345
In either case, some compiler's features can implicitly alter the instruction during compile-time to do conversions for you, if it can determine that is the right thing to do.
Thus far, JavaScript can be categorized as Not-Strongly-Typed. That either means it's weakly-typed or un-typed.
dynamic/static can be thought of in relation to how the language instructions manipulate types.
Dynamically typed means the value's type is enforced, but the variable simply represents any value of any type.
x = 12345; // number
x = "string"; // string
x = { key: "value" }; // object
y = 123 + x; // error or implicit conversion must take place.
Statically typed means the variable type is strongly enforced, and the value type is less-so enforced.
int x = 12345; // binds x to the type int
x = "string"; // too late, x is an integer - error
string y = 123; // error or implicit conversion must take place.
Thus far, JavaScript can be categorized as Not-Statically-Typed. Also, it appears to be Dynamically Typed, if typed at all. So we need to see what Typing means.
Typed means that the language distinguishes between different types such as string, number, boolean, object, array, null, undefined and so on. Also each operation is bound to specific types. So you cannot divide an integer by a string.
2 / "blah" // produces NaN
Untyped means the operation of dividing integer by string would result in treating the first four bytes of string as integer. This is because Untyped operations take place directly on bits, there are no types to observe. The outcome will be something quite unexpected:
2 / "blah" // will be treated as 2 / 1500275048
Since JavaScript behaves according to the definition of being Typed, it must be. And therefore it must be Dynamically Typed, and Weakly Typed.
If anybody claims JavaScript is Untyped, it is merely for academic theory, not for practical application.

JavaScript is weakly typed. It is most certainly not "untyped" but its weakly typed nature allows for a lot of flexibility in terms of implicit conversions.
Keep in mind that JavaScript is also dynamically typed. This method of typing allows what is know as "duck typing".
For comparison consider that JavaScript is not strongly typed nor is it statically typed. Sometimes understanding what something isn't can help you see better what it is.

To the author's point JavaScript is also classified as Dynamically typed. Wiki states that Dynamically typed languages are type checked at runtime instead of in a compiler while Weakly Typed refers to the ability to change type on the fly within your code. So yes it is both Dynamically typed AND Weakly typed.

The problem here that is confusing a lot of programmers is that definitions like this are not standardized somewhere. The term untyped programming language is ambiguous. Does that refer to a language that has no data types or a language that is a lambda calculus untyped variant?
JavaScript/ECMAScript has a type system and all of its functions' domains will accept any Reference specification type. So that means JavaScript has a single data type, in reality. That is a matter of implementation that is more important to the very advanced JavaScript programmers. The average JavaScript programmer only cares about the abstract language data types that have been specified by ECMAScript.
In the context of the everyday programmer, not the researcher or theoretical computer scientist, the term untyped is a misnomer because most people aren't doing lambda calculus. Thus the term confuses the masses and seems to declare that JavaScript does not have any data types which is simply not true. Anyone who has ever used typeof knows that JavaScript has its own language data types:
var test = "this is text";
typeof(test);
yields
"string"
ECMAScript defines the following eight language types: Undefined, Null, Boolean, String, Symbol, Number, BigInt, and Object.
A more accurate designation for JavaScript would be implicitly typed, dynamically typed, or weakly/loosely typed (or some combination thereof), in that JavaScript uses type coercion in some cases which makes the type implicit because you don't have to explicitly specify the type of your variables. It falls under weakly typed because, unlike some languages which distinguish between float and integer etc, it just uses one number type to encompass all numbers, and makes use of the type coercion mentioned previously[Section 9 of ECMAScript Spec], in strong contrast to a strongly-typed language which would have very specific data types (i.e. you would have to specify int or float).
The definitions of statically and dynamically-typed languages are not standardized, however neither was the size of a byte when computers were beginning to evolve. Static and dynamic typing most often refer to the presence of certain language features. One of which is type-checking at runtime, or what is called dynamic type-checking. If you've used JavaScript, you already know that it definitely waits until runtime to check types, which is why you get TypeError exceptions during execution of your code. Example here
I think the top-voted answer is confusing JavaScript functions' polymorphism with functions that will accept literally anything (as with untyped variants of Lambda Calculus) which is an Association Fallacy.

Remember that JavaScript allows you to ask what is the typeof(your_variable), and compare types: 5==="5" returns false.
Thus I don't think you can call it untyped.
It is dynamically and (estimated as) weakly typed.
You may want to know it uses Duck typing (see andrew's link) and offers OOP though Prototyping instead of classes and inheritance.

While it is typed (you can ask "typeof someVar" and learn its specific type, it's very weak.
Given:
var a = "5";
you might say that a is a string. However, if you then write:
var b = a + 10;
b is an int equal to 15, so a acted just like an int. Of course, you can then write:
var c = a + "Hello World";
and c will equal "5Hello World", so a is again acting like a string.

I'd argue JavaScript is strongly and dynamically typed.
But that depends on a bunch of terms which seem to to be loosely defined at best, so the above is true only if you accept the definitions below...
Strong / Weak
"Strong" typing prevents operations intended for one type of data from being executed on another type. For example, trying to write to an Nth element of an array, where the object is not an array. Strong typing is required for memory safety.
"Weak" is the opposite of strong.
Static / Dynamic
"Static" typing checks types before program execution (at compile/transpile time). This requires the type information to be encoded into the language's syntax.
"Dynamic" is the oposite of static.
JavaScript will not let you corrupt the memory (hence "strong") but does all the checking and type conversions/coercions at run-time (hence "dynamic").
I find it helpful to think of "strong vs weak" as being orthogonal to "static vs dynamic". There are languages that are strong and static (e.g. C# without unsafe context), strong and dynamic (most "scripting" languages seem to fall into that category), weak and static (C/C++).
Not sure what would be weak and dynamic... assembler, perhaps :)

another interesting example of loosely typed JS:
console.log(typeof(typeof(5)))
the result is string. why? because the initial typeof results in 'integer' which in itself is a string. I would assume in a strongly typed language this type of changing types would not be a thing. perhaps i am mistaken but that was the first instance where i started to understand how CRAZY JS can be lol

We Keep Coding

JavaScript is the programming language of the Web.