How does Javascript's string comparison work at the lower level?

How does Javascript's string comparison work at the lower level? - javascript

Why does the comparison operator work faster regardless of the length of string?
Consider the following two functions which I use to compare strings with a length of 100 million:
With ===:
// O(?)
const compare1(a, b) {
return a === b;
}
With character by character comparison:
// O(N)
const compare2(a, b) {
if(a.length !== b.length) return false;
const n = Math.max(a.length, b.length);
for(var i=0; i<n; i++){
if(a[i] !== b[i]) return false;
}
return true;
}
When testing these two functions, I find that the speed of the compare1 function is significantly faster.
In the case of the compare2 function, I think the overhead will be severe in interpreting JavaScript code, accessing and comparing memory.
But from my understanding, the compare1 function may also have to compare N characters. Does it run so much faster because it all happens at a lower level?

There are two considerations:
=== comparison of string primitives is implemented with lower level, compiled code, which indeed executes faster than explicit JavaScript looping, which has additional work, including in updating a JavaScript variable (i), performing an access with that variable (a[i]); where all the prescribed ECMAScript procedures must be followed.
The JavaScript engine may optimise memory usage and use the knowledge that two strings are the same (for instance when that is already detected at parsing time, or one string is assigned to a second variable/property) and only store that string once (cf. String pool). In that case the comparison is a trivial O(1) comparison of two references. There is however no way in JavaScript to inspect whether two string primitives actually share the same memory.
As an illustration of the second point, note how the comparison-time is different for two cases of comparing long strings that are equal -- which probably is a hint that this string pooling is happening:
function compare(a, b) {
let sum = 0, start, p;
for (let i = 0; i < 10; i++) { // Perform 10 comparisons
start = performance.now();
p = a === b;
sum += performance.now() - start;
}
return sum / 10; // Return average time to make the comparison
}
console.log("Hold on while strings are being created...");
setTimeout(function () {
// Create long, non-trivial string
let a = Array.from({length: 10000000}, (_, i) => i).join("");
let b = a.slice(0); // Plain Copy - engine realises it is the same string & does not allocate space
let c = a[0] + a.slice(1); // More complex - engine does not realise it is the same string
console.log("Strings created. Test whether they are equal:", a === b && b === c);
console.log(compare(a, b) + "ms");
console.log(compare(a, c) + "ms");
});

Related

javscript recursion string reverse

I tried to find a recursive solution on my own to reverse a string s but it returns 'undefined' for strings >1 char.
Can somebody please help me to debug this?
I know it is not the elegant way but finding the error could help me advancing.
Thank U very much!
var revFunc = function (s, r, n, i) {
if (i > n) {
return r;
}
r = r + s[n - i];
i++;
revFunc(s, r, n, i);
};
var reverse = function (s) {
var r = "";
var n = s.length - 1;
var i = 0;
if (n < 1) {
return s;
}
return revFunc(s, r, n, i);
};

Not a good fit for recursion
While the minor correction already presented fixes your function, I want to suggest that this version shows a misunderstanding about recursion.
Your function is using recursion to update what would otherwise be local variables in a while loop. It could be rewritten like this:
var reverse = function (s) {
var r = "";
var n = s.length - 1;
var i = 0;
if (n < 1) {
return s;
}
while (i <= n) {
r = r + s[n - i]
i++
}
return r
};
console .log (reverse ('abcde'))
The recursion you use is just another means of writing that same code.
Recursion is meant for something a bit different. We want our algorithms to match our data structures. For recursion to really be meaningful, we should should use it on a recursive data structure.
But we can do so for this problem.
Thinking of a string recursively
There are often many different ways to think of a single data structure. We can think of a string as a recursive structure this way:
A string is either
The empty string, "", or
a character followed by a string
We can call the two parts of a non-empty string the head and the tail.
Using the recursive definition in a recursive algorithm
With that formulation, we can write a recursive algorithm that matches the structure:
The reversal of a string is either
The reversal of the empty string (which is still the empty string), or
The reversal of tail, followed by the head
Then the code practically writes itself:
const reverse = (string) =>
string .length == 0
? ""
: reverse (string .slice (1)) + string[0]
console .log (reverse ('abcdefg'))
Variants
There are certainly variations of this possible. We could want to be ready for when tail-call-optimization is finally common in JS environments, and could change this to a tail-recursive version, something like
const reverse = (string, result = "") =>
string .length == 0
? result
: reverse (string .slice (1), string [0] + result)
(Note that there is a potential problem with this style. Default parameters can get in the way of using the function in certain contexts. You cannot, for instance, successfully call strings .map (reverse) with this version. You could do this by splitting it into a helper function and a main function, but that's taking us afield.)
Or, taking advantage of the fact that strings are iterables, we could write a nicely simple version using parameter destructuring along with a default value:
const reverse = ([head = undefined, ...tail]) =>
head == undefined
? ""
: reverse (tail) + head
The advantages of recursion
And it's hard to imagine getting much simpler than that. This version captures the algorithm quite directly. And the algorithm is cleanly tied to our description of the data structure.
Note one thing that none of these functions use: local variables. Simple recursion can often be written in such a way that there is no need for them. This is a major blessing, because that means there is no place for mutable state to hide. And mutable state is the source of many, many programming errors.

You forgot to return
return revFunc(s, r, n, i);
Updated:
var revFunc = function (s, r, n, i) {
if (i > n) {
return r;
}
r = r + s[n - i];
i++;
return revFunc(s, r, n, i);
};
Suggestion: Please use better name for variables, it sucks s,r,n,i

Can you try a simple solution :
let str = "hello";
let reverse_str = str.split("").reverse().join("");

Is it safe to use 'undefined' as sentinel in a Javascript while loop?

Is it safe to use this kind of loop in Javascript?
denseArray = [1,2,3,4,5, '...', 99999]
var x, i = 0
while (x = denseArray[i++]) {
document.write(x + '<br>')
console.log(x)
}
document.write('Used sentinel: ' + denseArray[i])
document.write('Size of array: ' + i)
It is shorter than a for-loop and maybe also more effective for big arrays, to use a built in sentinel. A sentinel flags the caller to the fact that something rather out-of-the-ordinary has happened.
The array has to be a dense array to work! That means there are no other undefined value except the value that come after the last element in the array. I nearly never use sparse arrays, only dense arrays so that's ok for me.
Another more important point to remember (thank to #Jack Bashford reminded) is that's not just undefined as a sentinel. If an array value is 0, false, or any other falsy value, the loop will stop. So, you must be sure that the data in the array does not have falsy values that is 0, "", '', ``, null, undefined and NaN.
Is there something as a "out of range" problem here, or can we consider arrays in Javascript as "infinite" as long memory is not full?
Does undefined mean browsers can set it to any value because it is undefined, or can we consider the conditional test always to work?
Arrays in Javascript is strange because "they are Objects" so better to ask.
I can't find the answer on Stackoverflow using these tags: [javascript] [sentinel] [while-loop] [arrays] . It gives zero result!
I have thought about this a while and used it enough to start to worry. But I want to use it because it is elegant, easy to see, short, maybe effective in big data. It is useful that i is the size of array.
UPDATES
#Barmar told: It's guaranteed by JS that an uninitialized array
element will return the value undefined.
MDN confirms: Using
an invalid index number returns undefined.
A note by #schu34: It is better to use denseArray.forEach((x)=>{...code}) optimized for it's use and known by devs. No need to encounter falsy values. It has good browser support.

Even if your code won't be viewed by others later on, it's a good idea to make it as readable and organized as possible. Value assignment in condition testing (except for the increment and decrement operators) is generally a bad idea.
Your check needs to be a bit more specific, too, as [0, ''] both evaluate to false.
denseArray = [1,2,3,4,5, '...', 99999]
for(let i = 0; i < denseArray.length; i++) {
let x = denseArray[i]
document.write(x + '<br>');
console.log(x);
if (/* check bad value */) break;
}
document.write('Used sentinel: ' + denseArray[i])
document.write('Size of array: ' + i)
From my experience it's usually not worth it to save a few lines if readability or even reliability is the cost.
Edit: here's the code I used to test the speed
const arr = [];
let i;
for (i = 0; i < 30000000; i++) arr.push(i.toString());
let x;
let start = new Date();
for(i = 0; i < arr.length; i++) {
x = arr[i];
if (typeof x !== 'string') break;
}
console.log('A');
console.log(new Date().getTime() - start.getTime());
start = new Date();
i = 0;
while (x = arr[i++]) {
}
console.log('B');
console.log(new Date().getTime() -start.getTime());
start = new Date();
for(i = 0; i < arr.length; i++) {
x = arr[i];
if (typeof x !== 'string') break;
}
console.log('A');
console.log(new Date().getTime() - start.getTime());
start = new Date();
i = 0;
while (x = arr[i++]) {
}
console.log('B');
console.log(new Date().getTime() -start.getTime());
start = new Date();
for(i = 0; i < arr.length; i++) {
x = arr[i];
if (typeof x !== 'string') break;
}
console.log('A');
console.log(new Date().getTime() - start.getTime());
start = new Date();
i = 0;
while (x = arr[i++]) {
}
console.log('B');
console.log(new Date().getTime() -start.getTime());
The for loop even has an extra if statement to check for bad values, and still is faster.

Searching for javascript assignment in while gave results:
Opinions vary from it looks like a common error where you try to compare values to If there is quirkiness in all of this, it's the for statement's wholesale divergence from the language's normal syntax. The for is syntactic sugar adding redundance. It has not outdated while together with if-goto.
The question in first place is if it is safe. MDN say: Using an invalid index number returns undefined in Array, so it is a safe to use. Test on assignments in condition is safe. Several assignments can be done in the same, but a declaration with var, let or const does not return as assign do, so the declaration has to be outside the condition. Have a comment abowe to explain to others or yourself in future that the array must remain dense without falsy values, because otherwise it can bug.
To allow false, 0 or "" (any falsy except undefined) then extend it to: while ((x = denseArray[i++]) !== undefined) ... but then it is not better than an ordinary array length comparision.
Is it useful? Yes:
while( var = GetNext() )
{
...do something with var
}
Which would otherwise have to be written
var = GetNext();
while( var )
{
...do something
var = GetNext();
}
In general it is best to use denseArray.forEach((x) => { ... }) that is well known by devs. No need to think about falsy values. It has good browser support. But it is slow!
I made a jsperf that showed forEach is 60% slower than while! The test also show the for is slightly faster than while, on my machine! See also #Albert answer with a test show that for is slightly faster than while.
While this use of while is safe it may not be bugfree. In time of coding you may know your data, but you don't know if someone copy-paste the code to use on other data.

Javascript Performance: `if (a in b) {b[a]()}` vs `a in b && b[a]();` vs object traversing

I noticed I can use a && b(); as a shorthand for if(a) {b()} or if(a) b();. The MOST usual case for this indecision is for objects having 4-5 elements.
Question: is there a performance gain/loss for using the shorthand when compared with an iteration of if(){} or regular object traversing for (var i in b) { b[i]() } ?
As suggested: some code
var b = { x: 20, y: 30 }, // sample object
fn = function(v){ return v+5; }, // sample function
endValue = 0; // some lol
// go through all elements via if
if ('x' in b) { endValue += fn(b['x']); }
if ('y' in b) { endValue += fn(b['y']); }
//go through all elements via shorthand
('x' in b) && (endValue += fn(b['x']));
('y' in b) && (endValue += fn(b['y']));
//go through all elements via object traversing
for (var i in b) {
endValue += fn(b[i]);
}
I would assume there could be very very negligible performance differences, I need to use the above shorthand method to have a very specific order of the object elements into the computing of endValue.

Is there a performance gain/loss for using the shorthand when compared with an if(){}?
No. With any decent compiler they perform absolutely the same. You should however use the if statements for readability and conciseness.
Is there a performance gain/loss for using an iteration of if(){} vs regular object traversing for (var i in b) { b[i]() }?
That cannot really be compared as they are doing fundamentally different things, and assuming they have the same outcome the performance depends a lot on the actual objects you pass in and their properties. Use the one whose behaviour you need.

closure compiler by google extensively uses shorthand codes and from benchmarks there was minor to none performance gains. https://closure-compiler.appspot.com/home

JavaScript most efficient function to find an item in an Array? and explain this tricky code

What can be the most efficient way to find an item in an array, which is logical and understood by a web developer?
I came across this piece of code:
var inArray = function(a, b, c, d) {
for (c in b) d |= b[c] === a;
return !!d
};
It works fine. Can someone please explain me the code?
I also came across an exact same question that might make mine as a duplicate. But my real question lies in explanation of the above code and why bitwise operator has been used into it.
Also, can there be a way without any for loop, or iteration to get index of an item in an Array?

var inArray = function(a, b, c, d) {
for (c in b) d |= b[c] === a;
return !!d
};
That is some terrible code, and you should run away from it. Bitwise operators are completely unnecessary here, and c and d being parameters makes no sense at all (as Raymond Chen pointed out, the author of the code likely did it to safe space of declaring local variables -- the problem though is that if true is passed in for d the code is suddenly broken, and the extra parameters destroys any sense of understanding that glancing at the declaration would provide).
I'll explain the code, but first, here's a better option:
function inArray(arr, obj) {
for (var i = 0; i < arr.length; ++i) {
if (arr[i] === obj) {
return true;
}
}
return false;
}
Note that this is dependent on the array being an actual array. You could use a for (k in arr) type loop to generalize it to all objects.
Anyway, on to the explanation:
for (c in b) d |= b[c] === a;
This means that for every key in b (stored in c), we will check if b[c] === a. In other words, we're doing a linear scan over the array and checking each element against a.
d |= val is a bitwise or. The bits that are high in val will be set high in d. This is easier to illustrate in languages where bits are more exposed than in JS, but a simple illustration of it:
10011011
01000001
--------
11011011
It's just OR'ing each individual bit with the same location bit in the other value.
The reason it's an abuse here is that it convolutes the code and it depends on weird implicit casts.
x === y returns a boolean. A boolean being used in a bitwise expression makes little sense. What's happening though is that the boolean is being converted to a non-zero value (probably 1).
Similarly, undefined is what d will be. This means that d will be cast to 0 for the bitwise stuff.
0 | 0 = 0, but 0 | 1 = 1. So basically it's a glorified:
for (c in b) d = (d || (b[c] === a));
As for !!x that is just used to cast something to a bool. !x will take x, implicitly cast it to a bool and then negate it. The extra ! will then negate that again. Thus, it's likely implicitly casting to a bool (!!x being true implies that x is at least loosely true (1, "string", etc), and !!x implies that x is at least loosely false (0, "", etc).
This answer offers a few more options. Note though that all of them are meant to fallback to the native indexOf that will almost certainly be faster than anything we can code in script-land.

This code is pretty poorly written, and barely readable. I wouldn't qualify it as "awesome"...
Here's a rewritten version, hopefully more readable:
function inArray (aElt, aArray) {
var key;
var ret = false;
for (key in aArray)
ret = ret || (aArray[key] === aElt);
return ret;
}
The for loop seems like the most natural way to do that. You could do it recursively but that doesn't feel like the easiest approach here.

What can be the most efficient way to find an item in an array, which is logical and understood by a web developer?
Either the native method, haystack.indexOf(needle) !== -1 or a looped method which returns as soon as it can
function inArray(needle, haystack) {
var i = haystack.length;
while (i) if (haystack[--i] === needle) return true;
return false;
};

You asked for a recursive version
var inArray = function(arr, needle, index) {
index = index || 0;
if (index >= arr.length)
return false;
return arr[index] === needle
? true
: inArray(arr, needle, index + 1);
}
console.log(inArray([1,5,9], 9));

javascript: speedy Array.contains(otherArray)?

I have an array of arrays. The inner array is 16 slots, each with a number, 0..15. A simple permutation.
I want to check if any of the arrays contained in the outer array, have the same values as
a test array (a permutation of 16 values).
I can do this easily by something like so:
var containsArray = function (outer, inner) {
var len = inner.length;
for (var i=0; i<outer.length; i++) {
var n = outer[i];
var equal = true;
for (var x=0; x<len; x++) {
if (n[x] != inner[x]) {
equal = false;
break;
}
}
if (equal) return true;
}
return false;
}
But is there a faster way?
Can I assign each permutation an integral value - actually a 64-bit integer?
Each value in a slot is 0..15, meaning it can be represented in 4 bits. There are 16 slots, which implies 64 total bits of information.
In C# it would be easy to compute and store a hash of the inner array (or permutation) using this approach, using the Int64 type. Does Javascript have 64-bit integer math that will make this fast?

That's just about as fast as it gets, comparing arrays in javascript (as in other languages) is quite painful. I assume you can't get any speed benefits from comparing the lengths before doing the inner loop, as your arrays are of fixed size?
Only "optimizations" I can think of is simplifying the syntax, but it won't give you any speed benefits. You are already doing all you can by returning as early as possible.
Your suggestion of using 64-bit integers sounds interesting, but as javascript doesn't have a Int64 type (to my knowledge), that would require something more complicated and might actually be slower in actual use than your current method.

how about comparing the string values of myInnerArray.join('##') == myCompareArray.join('##'); (of course the latter join should be done once and stored in a variable, not for every iteration like that).
I don't know what the actual performance differences would be, but the code would be more terse. If you're doing the comparisons a lot of times, you could have these values saved away someplace, and the comparisons would probably be quicker at least the second time round.
The obvious problem here is that the comparison is prone to false positives, consider
var array1 = ["a", "b"];
var array2 = ["a##b"];
But if you can rely on your data well enough you might be able to disregard from that? Otherwise, if you always compare the join result and the lengths, this would not be an issue.

Are you really looking for a particular array instance within the outer array? That is, if inner is a match, would it share the same reference as the matched nested array? If so, you can skip the inner comparison loop, and simply do this:
var containsArray = function (outer, inner) {
var len = inner.length;
for (var i=0; i<outer.length; i++) {
if (outer[i] === inner) return true;
}
return false;
}
If you can't do this, you can still make some headway by not referencing the .length field on every loop iteration -- it's an expensive reference, because the length is recalculated each time it's referenced.
var containsArray = function (outer, inner) {
var innerLen = inner.length, outerLen = outer.length;
for (var i=0; i<outerLen; i++) {
var n = outer[i];
var equal = true;
for (var x=0; x<innerLen; x++) {
if (n[x] != inner[x]) {
equal = false;
}
}
if (equal) return true;
}
return false;
}
Also, I've seen claims that loops of this form are faster, though I haven't seen cases where it makes a measurable difference:
var i = 0;
while (i++ < outerLen) {
//...
}
EDIT: No, don't remove the equal variable; that was a bad idea on my part.

the only idea that comes to me is to push the loop into the implementation and trade some memory for (speculated, you'd have to test the assumption) speed gain, which also relies on non-portable Array.prototype.{toSource,map}:
var to_str = function (a) {
a.sort();
return a.toSource();
}
var containsString = function (outer, inner) {
var len = outer.length;
for (var i=0; i<len; ++i) {
if (outer[i] == inner)
return true;
}
return false;
}
var found = containsString(
outer.map(to_str)
, to_str(inner)
);

var containsArray = function (outer, inner) {
var innerLen = inner.length,
innerLast = inner.length-1,
outerLen = outer.length;
outerLoop: for (var i=0; i<outerLen; i++) {
var n = outer[i];
for (var x = 0; x < innerLen; x++) {
if (n[x] != inner[x]) {
continue outerLoop;
}
if (x == innerLast) return true;
}
}
return false;
}

Knuth–Morris–Pratt algorithm
Rumtime: O(n), n = size of the haystack
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

We Keep Coding

JavaScript is the programming language of the Web.

How does Javascript's string comparison work at the lower level? - javascript

Related

javscript recursion string reverse

Is it safe to use 'undefined' as sentinel in a Javascript while loop?

Javascript Performance: `if (a in b) {b[a]()}` vs `a in b && b[a]();` vs object traversing

JavaScript most efficient function to find an item in an Array? and explain this tricky code

javascript: speedy Array.contains(otherArray)?

Categories

Resources