javascript: speedy Array.contains(otherArray)? - javascript

I have an array of arrays. Each inner array has 16 slots, each holding a number 0..15: a simple permutation.
I want to check whether any of the arrays contained in the outer array has the same values as a test array (a permutation of 16 values).
I can do this easily with something like this:
var containsArray = function (outer, inner) {
var len = inner.length;
for (var i=0; i<outer.length; i++) {
var n = outer[i];
var equal = true;
for (var x=0; x<len; x++) {
if (n[x] != inner[x]) {
equal = false;
break;
}
}
if (equal) return true;
}
return false;
}
But is there a faster way?
Can I assign each permutation an integral value - actually a 64-bit integer?
Each value in a slot is 0..15, meaning it can be represented in 4 bits. There are 16 slots, which implies 64 total bits of information.
In C# it would be easy to compute and store a hash of the inner array (or permutation) using this approach, using the Int64 type. Does Javascript have 64-bit integer math that will make this fast?
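The 4-bits-per-slot idea above can be sketched with BigInt, which engines supporting ES2020 provide and which postdates this discussion. This is only an illustration under that assumption: packPermutation and buildKeySet are hypothetical helper names, and whether it beats the nested loop would have to be measured.

```javascript
// Pack a permutation of 16 values (each 0..15) into one BigInt,
// 4 bits per slot, so the whole array fits in 64 bits.
function packPermutation(perm) {
  let key = 0n;
  for (let i = 0; i < perm.length; i++) {
    key = (key << 4n) | BigInt(perm[i]);
  }
  return key;
}

// Build the lookup set once; each later membership test is one hash lookup.
function buildKeySet(outer) {
  return new Set(outer.map(packPermutation));
}

const outer = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
  [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];
const keys = buildKeySet(outer);
const probe = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0];
console.log(keys.has(packPermutation(probe))); // true
```

BigInt values compare by value inside a Set, so two separately packed copies of the same permutation are treated as the same key.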

That is about as fast as it gets. Comparing arrays in JavaScript (as in other languages) is quite painful. I assume you can't gain any speed by comparing the lengths before doing the inner loop, as your arrays are of fixed size?
The only "optimizations" I can think of would simplify the syntax, but they won't give you any speed benefit. You are already doing all you can by returning as early as possible.
Your suggestion of using 64-bit integers sounds interesting, but as JavaScript doesn't have an Int64 type (to my knowledge), that would require something more complicated and might actually be slower in practice than your current method.

How about comparing the string values: myInnerArray.join('##') == myCompareArray.join('##')? (Of course, the latter join should be done once and stored in a variable, not repeated on every iteration like that.)
I don't know what the actual performance differences would be, but the code would be more terse. If you're doing the comparisons many times, you could save these values somewhere, and the comparisons would probably be quicker, at least from the second time round.
The obvious problem here is that the comparison is prone to false positives. Consider:
var array1 = ["a", "b"];
var array2 = ["a##b"];
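But if you can rely on your data well enough, you might be able to disregard that. Otherwise, comparing the lengths as well as the join results rules out this example, though not every collision: ["a##", "b"] and ["a", "##b"] have the same length and the same joined string. A separator that cannot occur in the data avoids the problem entirely.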
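For the numeric permutations in the question, the "save the joined values" idea might look like the sketch below. The helper names toKey and makeContains are made up for illustration, and the comma separator is only safe on the assumption that every element is an integer.

```javascript
// A comma-joined key is unambiguous for arrays of integers,
// because no integer's string form contains a comma.
function toKey(arr) {
  return arr.join(",");
}

// Compute the keys of the outer array once, then reuse them for every lookup.
function makeContains(outerArrays) {
  const seen = new Set(outerArrays.map(toKey));
  return function (candidate) {
    return seen.has(toKey(candidate));
  };
}

const containsPermutation = makeContains([[0, 1, 2], [2, 1, 0]]);
console.log(containsPermutation([2, 1, 0])); // true
console.log(containsPermutation([1, 2, 0])); // false
```

The up-front cost is one join per inner array, so this only pays off when the same outer array is searched repeatedly.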

Are you really looking for a particular array instance within the outer array? That is, if inner is a match, would it share the same reference as the matched nested array? If so, you can skip the inner comparison loop, and simply do this:
var containsArray = function (outer, inner) {
var len = inner.length;
for (var i=0; i<outer.length; i++) {
if (outer[i] === inner) return true;
}
return false;
}
If you can't do this, you can still make some headway by not referencing the .length field on every loop iteration. The length is not recalculated on each access, but caching it in a local variable saves a repeated property lookup, so expect at most a slight gain.
var containsArray = function (outer, inner) {
var innerLen = inner.length, outerLen = outer.length;
for (var i=0; i<outerLen; i++) {
var n = outer[i];
var equal = true;
for (var x=0; x<innerLen; x++) {
if (n[x] != inner[x]) {
equal = false;
break; // no need to compare the remaining slots
}
}
if (equal) return true;
}
return false;
}
Also, I've seen claims that loops of this form are faster, though I haven't seen cases where it makes a measurable difference:
var i = 0;
while (i++ < outerLen) {
// note: i has already been incremented here, so the current element is outer[i - 1]
}
EDIT: No, don't remove the equal variable; that was a bad idea on my part.

The only idea that comes to me is to push the loop into the implementation and trade some memory for a speed gain (speculated; you'd have to test the assumption). It relies on Array.prototype.map and on building one string key per array:
var to_str = function (a) {
// no sort here: element order is what distinguishes one permutation from another
return a.join(',');
}
var containsString = function (outer, inner) {
var len = outer.length;
for (var i=0; i<len; ++i) {
if (outer[i] == inner)
return true;
}
return false;
}
var found = containsString(
outer.map(to_str)
, to_str(inner)
);

var containsArray = function (outer, inner) {
var innerLen = inner.length,
innerLast = inner.length-1,
outerLen = outer.length;
outerLoop: for (var i=0; i<outerLen; i++) {
var n = outer[i];
for (var x = 0; x < innerLen; x++) {
if (n[x] != inner[x]) {
continue outerLoop;
}
if (x == innerLast) return true;
}
}
return false;
}

Knuth–Morris–Pratt algorithm
Runtime: O(n), n = size of the haystack
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Related

Is it safe to use 'undefined' as sentinel in a Javascript while loop?

Is it safe to use this kind of loop in Javascript?
denseArray = [1,2,3,4,5, '...', 99999]
var x, i = 0
while (x = denseArray[i++]) {
document.write(x + '<br>')
console.log(x)
}
document.write('Used sentinel: ' + denseArray[i - 1]) // the failed final test also incremented i
document.write('Size of array: ' + (i - 1))
It is shorter than a for loop, and using a built-in sentinel may also be more efficient for big arrays. A sentinel signals to the caller that something out of the ordinary has happened.
The array has to be dense for this to work! That means there is no undefined value other than the one that comes after the last element in the array. I almost never use sparse arrays, only dense ones, so that's OK for me.
Another, more important point to remember (thanks to @Jack Bashford for the reminder) is that undefined is not the only sentinel. If an array value is 0, false, or any other falsy value, the loop will stop. So you must be sure that the data in the array contains no falsy values, that is, false, 0, "" (also written '' or ``), null, undefined and NaN.
Is there something like an "out of range" problem here, or can we consider arrays in JavaScript as "infinite" as long as memory is not full?
Does undefined mean browsers can set it to any value because it is undefined, or can we count on the conditional test always working?
Arrays in JavaScript are strange because "they are Objects", so it is better to ask.
I can't find the answer on Stack Overflow using these tags: [javascript] [sentinel] [while-loop] [arrays]. It gives zero results!
I have thought about this for a while and used it enough to start to worry. But I want to use it because it is elegant, easy to read, short, and maybe efficient on big data. It is also useful that the final value of i gives the size of the array (the loop leaves i at length + 1).
UPDATES
@Barmar said: "It's guaranteed by JS that an uninitialized array element will return the value undefined."
MDN confirms: "Using an invalid index number returns undefined."
A note by @schu34: it is better to use denseArray.forEach((x) => {...code}), which is optimized for this use and well known to devs. No need to worry about falsy values. It has good browser support.
Even if your code won't be viewed by others later on, it's a good idea to make it as readable and organized as possible. Value assignment in condition testing (except for the increment and decrement operators) is generally a bad idea.
Your check needs to be a bit more specific, too, as 0 and '' both evaluate to false.
const denseArray = [1,2,3,4,5, '...', 99999];
let i; // declared outside the loop so it is still in scope afterwards
for (i = 0; i < denseArray.length; i++) {
const x = denseArray[i];
document.write(x + '<br>');
console.log(x);
if (x === undefined) break; // replace with whatever counts as a bad value in your data
}
document.write('Used sentinel: ' + denseArray[i]);
document.write('Size of array: ' + i);
From my experience it's usually not worth it to save a few lines if readability or even reliability is the cost.
Edit: here's the code I used to test the speed
const arr = [];
let i;
for (i = 0; i < 30000000; i++) arr.push(i.toString());
let x;
let start;
// run both variants three times, alternating A (for) and B (while)
for (let run = 0; run < 3; run++) {
start = new Date();
for (i = 0; i < arr.length; i++) {
x = arr[i];
if (typeof x !== 'string') break;
}
console.log('A');
console.log(new Date().getTime() - start.getTime());
start = new Date();
i = 0;
while (x = arr[i++]) {
}
console.log('B');
console.log(new Date().getTime() - start.getTime());
}
The for loop even has an extra if statement to check for bad values, and it is still faster.
Searching for "javascript assignment in while" gave results.
Opinions vary from "it looks like a common error where you try to compare values" to "If there is quirkiness in all of this, it's the for statement's wholesale divergence from the language's normal syntax." The for statement is syntactic sugar that adds redundancy. It has not made the while loop (or the older if-goto pattern) obsolete.
The question in the first place is whether it is safe. MDN says that using an invalid index number returns undefined, so it is safe to use. Testing an assignment in a condition is safe as well. Several assignments can be made in the same condition, but a declaration with var, let or const does not return a value the way an assignment does, so the declaration has to be outside the condition. Put a comment above the loop to explain to others, or to yourself in the future, that the array must remain dense and free of falsy values, because otherwise the loop can break.
To allow false, 0 or "" (any falsy value except undefined), extend it to: while ((x = denseArray[i++]) !== undefined) ... but then it is no better than an ordinary array length comparison.
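To make the difference between the two conditions concrete, here is a small sketch that counts how many elements each loop style visits. The function names and the sample data are made up for the illustration.

```javascript
// Truthiness test: stops at the first falsy element, not only at the end.
function countWithTruthyTest(arr) {
  let x, i = 0, visited = 0;
  while ((x = arr[i++])) visited++;
  return visited;
}

// Explicit comparison: stops only when the read value is undefined.
function countWithUndefinedTest(arr) {
  let x, i = 0, visited = 0;
  while ((x = arr[i++]) !== undefined) visited++;
  return visited;
}

const sample = [3, 2, 1, 0, -1, -2];
console.log(countWithTruthyTest(sample));    // 3
console.log(countWithUndefinedTest(sample)); // 6
```

The first loop silently drops everything from the 0 onward, which is the failure mode to watch for when the data is not fully under your control.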
Is it useful? Yes:
var item;
while ((item = getNext()))
{
// ...do something with item
}
Which would otherwise have to be written
item = getNext();
while (item)
{
// ...do something
item = getNext();
}
In general it is best to use denseArray.forEach((x) => { ... }), which is well known to devs. No need to think about falsy values. It has good browser support. But it is slow!
I made a jsperf that showed forEach is 60% slower than while! The test also shows that for is slightly faster than while on my machine. See also @Albert's answer, whose test shows the same.
While this use of while is safe, it may not be bug-free. At the time of coding you may know your data, but you don't know whether someone will copy and paste the code to use on other data.

Javascript how `length` is implemented

You know that in JavaScript you can access the length of a string or array with the length property:
var obj = ["Robert", "Smith", "John", "Mary", "Susan"];
// obj.length returns 5;
I want to know how this is implemented. Does JavaScript calculate the length property when it is accessed? Or is it just a static property that is changed whenever the array is changed? My question comes from the following confusion about best practices in JavaScript:
for(var i = 0; i < obj.length; i++)
{
}
My problem: if it is a static property, then accessing the length property in each iteration is nothing to be concerned about, but if it is calculated on each iteration, then it costs some memory.
I have read the definition given by ECMAScript, but it doesn't give any clue about how it is implemented. I'm afraid it might build a whole instance of the array with the length property calculated at run time; if that turns out to be true, then the above for() is dangerous to memory, and the following should be used instead:
var count = obj.length;
for(var i = 0; i < count; i++)
{
}
An Array in JavaScript is not a primitive array type; it is an Object with special handling of its length property.
[].length is not recalculated on every access; it is adjusted as elements are added or removed, much as if by the ++ or -- operators.
The example below behaves like the array.length property.
var ArrayLike = {
length: 0,
push: function(val){
this[this.length] = val;
this.length++;
},
pop: function(){
delete this[this.length-1];
this.length--;
},
display: function(){
for(var i = 0; i < this.length; i++){
console.log(this[i]);
}
}
}
// output
ArrayLike.length // length == 0
ArrayLike.push('value1') // length == 1
ArrayLike.push('value2') // length == 2
ArrayLike.push('value3') // length == 3
ArrayLike.pop() // length == 2
ArrayLike.length === 2 // true
var a = ["abc","def"];
a["pqr"] = "hello";
What is a.length?
2
Why?
a.length is updated only when the index of the array is a numeric value. When you write
var a = ["abc","def"];
It is internally stored as:
a["0"] = "abc"
a["1"] = "def"
Note that the indexes are really keys which are strings.
A few more examples:
1.)
var a = ["abc","def"];
a["1001"] = "hello";
What is a.length?
1002
2.) Okay, let's try again:
var a = ["abc","def"];
a[1001] = "hello";
What is a.length?
1002
Note here, internally array is stored as
a["0"] = "abc"
a["1"] = "def"
a["1001"] = "hello"
3.) Okay, last one:
var a = ["abc"];
a["0"] = "hello";
What is a[0]?
"hello"
What is a.length?
1
It's good to know what a.length actually means. Well, now you know: for arrays built like this, a.length is one more than the highest numeric key present in the array.
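The examples above can be run as one script. The sketch below repeats them and adds two cases that are not in the answer: a key such as "01", which is not a canonical array index, and a length that is assigned directly.

```javascript
const a = ["abc", "def"];
a["pqr"] = "hello";          // not an array index, so length is untouched
console.log(a.length);       // 2

a["1001"] = "hello";         // a string that is a valid array index
console.log(a.length);       // 1002

const b = ["abc"];
b["0"] = "hello";            // same slot as b[0]
console.log(b[0], b.length); // hello 1

const c = ["abc"];
c["01"] = "x";               // "01" is an ordinary property name, not an index
console.log(c.length);       // 1

const d = [];
d.length = 5;                // length can also be set with no elements present
console.log(d.length, Object.keys(d).length); // 5 0
```

The last case shows why "one more than the highest numeric key" is a lower bound rather than a definition: length can be larger than the keys alone would imply.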
"I want to know how this is implemented. Does JavaScript calculate the length property when it is accessed? Or is it just a static property that is changed whenever the array is changed?"
Actually, your question cannot be answered in general because all the ECMA specs say is this:
The length property of this Array object is a data property whose
value is always numerically greater than the name of every deletable
property whose name is an array index.
In other words, the specs define the invariant condition of the length property, but not its implementation. This means that different JavaScript engines could, in principle, implement different behavior.
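What can be observed from script code is only the specified behavior, not the engine internals. A quick way to see that length is an ordinary data property on the array (and not a getter computed on access) is to inspect its descriptor; the variable names here are just for the illustration.

```javascript
const list = ["Robert", "Smith", "John"];

// length is an own data property of the array: it has a value, not a getter.
const desc = Object.getOwnPropertyDescriptor(list, "length");
console.log(desc.value, desc.writable, desc.enumerable, desc.configurable);
// 3 true false false
console.log("get" in desc); // false

// Writing to it keeps the invariant by removing elements past the new length.
list.length = 1;
console.log(list.length, list[1]); // 1 undefined
```

How an engine stores that value internally is still up to the engine, as the answer says.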

Javascript a recursive function with no clear base case?

I am wondering what the best approach is to writing a recursive function with no direct base case (unlike, say, factorial), for instance to count the number of elements in a nested array. I have two approaches in mind. The first one below is preferred, as it returns the result directly:
The second one keeps the count in a variable attached to the function. It works fine, but dealing with the result and resetting the variable is bizarre.
Any pointers are appreciated.
You can simply return the value you are interested in:
function countElements(arr) {
var count = 0;
for (var i=0; i<arr.length; i++) {
if (arr[i] instanceof Array) {
count += countElements(arr[i]); // recursion here
} else {
count++; // normal element counts as 1
}
}
return count;
}
Demo: http://jsbin.com/ejEmOwEQ/1/edit
WARNING: The function might not end if the array contains self reference (var arr = []; arr.push(arr); countElements(arr);)
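One possible way to guard against the self-reference case is to remember which arrays have already been visited. This is a sketch, not part of the answer above: countElementsSafe is a hypothetical name, and the choice to count a revisited array as 0 is an assumption.

```javascript
// Counts non-array elements in a nested array, skipping any array that
// has already been visited so that cycles cannot recurse forever.
function countElementsSafe(arr, seen = new WeakSet()) {
  if (seen.has(arr)) return 0;
  seen.add(arr);
  let count = 0;
  for (let i = 0; i < arr.length; i++) {
    if (Array.isArray(arr[i])) {
      count += countElementsSafe(arr[i], seen);
    } else {
      count++;
    }
  }
  return count;
}

const nested = [1, [2, 3, [4]], 5];
console.log(countElementsSafe(nested)); // 5

const cyclic = [1, 2];
cyclic.push(cyclic);
console.log(countElementsSafe(cyclic)); // 2
```

A side effect of this design is that an inner array referenced from two places is counted only once, which may or may not be what you want.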
The correct way to write this is simply:
function countElements (obj) {
if (obj instanceof Array) {
var count = 0;
for (var i in obj)
count += countElements(obj[i]);
return count;
}
return 1
}
The terminating condition you're looking for is if not instanceof Array. Which in my code above is simply the fall through from the if instanceof Array block.
You do not need to keep a temp variable like count in recursive functions. You're still thinking iteratively (well, that for loop is iterative so you need a count variable there).
Recursive functions do everything by accepting arguments and returning results. No assignments are necessary. In fact, the code above can be written purely recursively without using a for loop and therefore without needing to use a count variable:
function countElements (obj) {
if (obj instanceof Array) {
if (obj.length) {
// slice(1) copies the rest, so the caller's array is not emptied as it would be with shift()
return countElements(obj[0]) + countElements(obj.slice(1));
}
return 0;
}
return 1;
}
There are 3 rules: if the object is not an array we return 1; if the object is an empty array we return 0; otherwise we count the first item in the array plus the sum of the rest of the array.

Javascript for..in vs for loop performance

I was clustering around 40000 points using the k-means algorithm. In the first version of the program I wrote the Euclidean distance function like this:
var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
var sum = 0;
for( var i in p1 ){
sum += Math.pow( p1[i] - p2[i], 2 );
}
return Math.sqrt( sum );
};
The overall program was quite slow, taking on average 7 seconds to execute. After some profiling I rewrote the above function like this:
var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
var sum = 0;
for( var i = 0; i < p1.length; i++ ) {
sum += Math.pow( p1[i] - p2[i], 2 );
}
return Math.sqrt( sum );
};
Now the program takes around 400 ms on average. That's a huge time difference just because of the way I wrote the for loop. I normally don't use a for..in loop for arrays, but for some reason I used it while writing this function.
Can someone explain why there is this huge difference in performance between these 2 styles?
Look at what's happening differently in each iteration:
for( var i = 0; i < p1.length; i++ )
Check if i < p1.length
Increment i by one
Very simple and fast.
Now look at what's happening in each iteration for this:
for( var i in p1 )
Repeat
Let P be the name of the next property of obj whose [[Enumerable]] attribute is true. If there is no such property, return (normal, V, empty).
It has to find the next enumerable property in the object. With your array you know that this can be achieved by a simple integer increment, whereas the algorithm to find the next enumerable property is most likely not that simple, because it has to work on arbitrary objects and their prototype chain keys.
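A small sketch makes the difference in what the two loops iterate over visible. The extra label property is made up for the illustration; it stands in for any enumerable property that is not an index.

```javascript
const point = [10, 20, 30];
point.label = "p1"; // an extra enumerable property on the array object

// for..in walks enumerable property names, delivered as strings.
const forInKeys = [];
for (const key in point) forInKeys.push(key);
console.log(forInKeys); // [ '0', '1', '2', 'label' ]

// The counted loop only ever produces the numeric indexes below length.
const indexKeys = [];
for (let i = 0; i < point.length; i++) indexKeys.push(i);
console.log(indexKeys); // [ 0, 1, 2 ]
```

In the distance function this also means p1[i] is looked up with a string key on every pass, and any non-index property would be pulled into the sum.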
As a side note, if you cache the length of p1:
var plen = p1.length;
for( var i = 0; i < plen; i++ )
you will get a slight speed increase.
...And if you memoize the function, it will cache results, so if the user tries the same numbers you will see a massive speed increase.
var eDistance = memoize(euclideanDistance);
function memoize( fn ) {
fn.memoize || (fn.memoize = {}); // create the cache once, even for zero-argument calls
return function () {
var args = Array.prototype.slice.call(arguments),
hash = "",
i = args.length,
currentArg = null; // the comma above keeps this local instead of an implicit global
while (i--) {
currentArg = args[i];
hash += (currentArg === Object(currentArg)) ?
JSON.stringify(currentArg) : currentArg;
}
return (hash in fn.memoize) ? fn.memoize[hash] :
fn.memoize[hash] = fn.apply(this, args);
};
}
eDistance([1,2,3],[1,2,3]);
eDistance([1,2,3],[1,2,3]); //Returns cached value
credit: http://addyosmani.com/blog/faster-javascript-memoization/
First, you should be aware of this in the case of for/in and arrays. No big deal if you know what you are doing.
I ran some very simple tests to show the difference in performance between different loops:
http://jsben.ch/#/BQhED
That is why I prefer to use the classic for loop for arrays.
The for/in loop simply loops through all enumerable properties of an object. Since you are not specifying the number of iterations the loop needs to take, it has to discover each next property as it goes, and it continues until there are no more properties.
With the second loop, you are specifying all the variables: a) a starting point, b) the number of iterations the loop should take before stopping, c) how the counter increases from the starting point.
You can think of it this way: for/in has to find out the number of iterations, whereas with for(a,b,c) you are specifying it.

How to search through an array in Javascript? [duplicate]

I am looking to find out whether two values in an array are the same. I have written the following code:
function validatePassTimeFields(passtimes) {
var success = true;
var length = passtimes.length;
var hashMap = new Object();
for (var j=0; j<length; j++) {
if(hashMap[passtimes[j].value]==1) {
success = false;
alert("Duplicate Found");
break;
}
hashMap[passtimes[j].value]=1;
}
return success;
}
I am new to JavaScript, so I tried using a HashMap-like object to find whether there is any duplicate. Is this the best way of finding a duplicate in JavaScript, or can I optimize it?
Your function is already very good, apart from the issue that it only works for arrays of strings or numbers. For a more thorough approach that also handles objects, see this answer. I don't think that matters for you, as you have an explicit and restricted use case (checking identity by the value property).
However, there are some points I'd do differently:
Don't use the success variable and break from the loop; just return from the whole function.
Instead of the constructor new Object, the object literal shortcut {} is usually used.
Instead of setting the values in the hashMap to 1, one might use true; you could also omit the equality operator == and just check for the truthiness of the property. I would even use the in operator.
function validatePassTimeFields(passtimes) {
var length = passtimes.length;
var hashMap = {};
for (var j=0; j<length; j++) {
if (passtimes[j].value in hashMap) {
alert("Duplicate Found");
return false;
}
hashMap[passtimes[j].value] = 1;
}
return true;
}
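One caveat with the in operator on a plain {} is that inherited names such as "toString" already count as present. In engines that support ES2015, a Set sidesteps that; the sketch below is an alternative under that assumption, with hasDuplicateValues as a made-up name, and it takes the values directly instead of objects with a value property.

```javascript
// Returns true as soon as a value is seen for the second time.
function hasDuplicateValues(values) {
  const seen = new Set();
  for (const v of values) {
    if (seen.has(v)) return true;
    seen.add(v);
  }
  return false;
}

console.log(hasDuplicateValues(["08:00", "09:30", "08:00"])); // true
console.log(hasDuplicateValues(["toString", "constructor"])); // false
console.log(hasDuplicateValues([1, "1"]));                    // false
```

Unlike object keys, a Set does not convert values to strings, so the number 1 and the string "1" are treated as different entries.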
// You would only need to optimize it if you want to use it elsewhere-
function noduplicates(array){
var next, O= {},
L= array.length;
while(L){
next= array[--L];
if(O[next]) return false;
O[next]= 1;
}
return true;
}
function validatePassTimeFields(passtimes){
if (noduplicates(passtimes)) return true;
alert("Duplicate Found");
return false;
}
It might be worth checking out underscore's implementation of this functionality. If you are just looking to eliminate dupes, you can use _.uniq(), but if you are more interested in just knowing that there are dupes or the pure implementation details, you might enjoy checking out the source of this method, which is very nicely documented.
I know this isn't a direct code answer to the question - there are a few here already so it wouldn't be useful to repeat. But I thought it was worth mentioning as underscore is a great utility library and the source is a great place to learn more about well-written javascript.
It seems that you do not want to find the duplicates, only to see whether there are any?
You're pretty close; here's a working function:
var hasDuplicates = function (arr) {
var _store = {};
for (var i = 0; i < arr.length; i++) {
if (typeof _store["_" + arr[i]] !== "undefined") {
return true;
}
_store["_" + arr[i]] = true;
}
return false;
};
The underscore prefix keeps stored keys from colliding with property names that every object inherits, such as toString. The hasDuplicates() function only works on values that have a toString() method.
To check for duplicates:
var yourArray = [1, 5, 7, 3, 5, 6];
if (hasDuplicates(yourArray)) {
// ... handle the duplicates
}
