I am currently doing program 2 for the startup engineering course offered on coursera
I'm programming using and ubuntu instance using Amazon web services and my programming is constantly hanging. There might be something wrong with my node.js program but I can't seem to locate it.
This program is meant to produce the first 100 Fibonacci numbers separated with commas.
#! /usr/bin/env node
//calculation
var fibonacci = function(n){
if(n < 1){return 0;}
else if(n == 1 || n == 2){return 1;}
else if(n > 2){return fibonacci(n - 1) + fibonacci(n-2);}
};
//put in array
var firstkfib = function(k){
var i;
var arr = [];
for(i = 1; i <= k; i++){
arr.push(fibonacci(i));
}
return arr
};
//print
var format = function(arr){
return arr.join(",");
};
var k = 100;
console.log("firstkfib(" + k +")");
console.log(format(firstkfib(k)));
The only output I get is
ubuntu#ip-172-31-30-245:~$ node fib.js
firstkfib(100)
and then the program hangs
I don't know if you are familiar with Time complexity and algorithmic analysis, but, it turns out that your program has an exponential running time. This basically means that, as the input increases, the time it takes to run your program increases exponentially. (If my explanation is not very clear, check this link)
It turns out that this sort of running time is extremely slow. For example, if it takes 1 ms to run your program for k=1, it would take 2^100 ms to run it for k=100. This turns out to be a ridiculously big number.
In any case, as Zhehao points out, the solution is to save the value of fib(n-1) and fib(n-2) (in an array, for example), and reuse it to compute fib(n). Check out this video lecture from MIT (the first 15 mins) on how to do it.
You may want to try printing out the numbers as they are being computed, instead of printing out the entire list at the end. It's possible that the computation is hanging somewhere along the line.
On another note, this is probably the most inefficient way of computing a list of fibonacci numbers. You compute fibonacci(n) and then fibonacci(n+1) without reusing any of the work from the previous computation. You may want to go back and rethink your method. There's a much faster and simpler iterative method.
writing intense computational code in nodeJS leads to blocking. since Fibonacci is an intense computational code so might end up blocking.
Related
I need some help to optimize the code below. I can't understand how to rewrite the code to avoid the deoptimizations.
Below is the code that works very slow on the Node platform. I took it from benchmarksgame-team binary-trees benchmark and added minor changes.
When it is run with --trace-deopt it shows that the functions in the hot path are depotimized, i.e.
[bailout (kind: deopt-lazy, reason: (unknown)): begin. deoptimizing 0x02117de4aa11 <JSFunction bottomUpTree2 (sfi = 000002F24C7D7539)>, opt id 3, bytecode offset 9, deopt exit 17, FP to SP delta 80, caller SP 0x00807d7fe6f0, pc 0x7ff6afaca78d]
The benchmark, run it using node --trace-deopt a 20
function mainThread() {
const maxDepth = Math.max(6, parseInt(process.argv[2]));
const stretchDepth = maxDepth + 1;
const check = itemCheck(bottomUpTree(stretchDepth));
console.log(`stretch tree of depth ${stretchDepth}\t check: ${check}`);
const longLivedTree = bottomUpTree(maxDepth);
for (let depth = 4; depth <= maxDepth; depth += 2) {
const iterations = 1 << maxDepth - depth + 4;
work(iterations, depth);
}
console.log(`long lived tree of depth ${maxDepth}\t check: ${itemCheck(longLivedTree)}`);
}
function work(iterations, depth) {
let check = 0;
for (let i = 0; i < iterations; i++) {
check += itemCheck(bottomUpTree(depth));
}
console.log(`${iterations}\t trees of depth ${depth}\t check: ${check}`);
}
function TreeNode(left, right) {
return {left, right};
}
function itemCheck(node) {
if (node.left === null) {
return 1;
}
return 1 + itemCheck2(node);
}
function itemCheck2(node) {
return itemCheck(node.left) + itemCheck(node.right);
}
function bottomUpTree(depth) {
return depth > 0
? bottomUpTree2(depth)
: new TreeNode(null, null);
}
function bottomUpTree2(depth) {
return new TreeNode(bottomUpTree(depth - 1), bottomUpTree(depth - 1))
}
console.time();
mainThread();
console.timeEnd();
(V8 developer here.)
The premise of this question is incorrect: a few deopts don't matter, and don't move the needle regarding performance. Trying to avoid them is an exercise in futility.
The first step when trying to improve performance of something is to profile it. In this case, a profile reveals that the benchmark is spending:
about 46.3% of the time in optimized code (about 4/5 of that for tree creation and 1/5 for tree iteration)
about 0.1% of the time in unoptimized code
about 52.8% of the time in the garbage collector, tracing and freeing all those short-lived objects.
This is as artificial a microbenchmark as they come. 50% GC time never happens in real-world code that does useful things aside from allocating multiple gigabytes of short-lived objects as fast as possible.
In fact, calling them "short-lived objects" is a bit inaccurate in this case. While the vast majority of the individual trees being constructed are indeed short-lived, the code allocates one super-large long-lived tree early on. That fools V8's adaptive mechanisms into assuming that all future TreeNodes will be long-lived too, so it allocates them in "old space" right away -- which would save time if the guess was correct, but ends up wasting time because the TreeNodes that follow are actually short-lived and would be better placed in "new space" (which is optimized for quickly freeing short-lived objects). So just by reshuffling the order of operations, I can get a 3x speedup.
This is a typical example of one of the general problems with microbenchmarks: by doing something extreme and unrealistic, they often create situations that are not at all representative of typical real-world scenarios. If engine developers optimized for such microbenchmarks, engines would perform worse for real-world code. If JavaScript developers try to derive insights from microbenchmarks, they'll write code that performs worse under realistic conditions.
Anyway, if you want to optimize this code, avoid as many of those object allocations as you can.
Concretely:
An artificial microbenchmark like this, by its nature, intentionally does useless work (such as: computing the same value a million times). You said you wanted to optimize it, which means avoiding useless work, but you didn't specify which parts of the useless work you'd like to preserve, if any. So in the absence of a preference, I'll assume that all useless work is useless. So let's optimize!
Looking at the code, it creates perfect binary trees of a given depth and counts their nodes. In other words, it sums up the 1s in these examples:
depth=0:
1
depth=1:
1
/ \
1 1
depth=2:
1
/ \
1 1
/ \ / \
1 1 1 1
and so on. If you think about it for a bit, you'll realize that such a tree of depth N has (2 ** (N+1)) - 1 nodes. So we can replace:
itemCheck(bottomUpTree(depth));
with
(2 ** (depth+1)) - 1
(and analogously for the "stretchDepth" line).
Next, we can take care of the useless repetitions. Since x + x + x + x + ... N times is the same as x*N, we can replace:
let check = 0;
for (let i = 0; i < iterations; i++) {
check += (2 ** (depth + 1)) - 1;
}
with just:
let check = ((2 ** (depth + 1)) - 1) * iterations;
With that we're from 12 seconds down to about 0.1 seconds. Not bad for five minutes of work, eh?
And that remaining time is almost entirely due to the longLivedTree. To apply the same optimizations to the operations creating and iterating that tree, we'd have to move them together, getting rid of its "long-livedness". Would you find that acceptable? You could get the overall time down to less than a millisecond! Would that make the benchmark useless? Not actually any more useless than it was to begin with, just more obviously so.
Here's a simple JavaScript performance test:
const iterations = new Array(10 ** 7);
var x = 0;
var i = iterations.length + 1;
console.time('negative');
while (--i) {
x += iterations[-i];
}
console.timeEnd('negative');
var y = 0;
var j = iterations.length;
console.time('positive');
while (j--) {
y += iterations[j];
}
console.timeEnd('positive');
The first loop counts from 10,000,000 down to 1 and accesses an array with a length of 10 million using a negative index on each iteration. So it goes through the array from beginning to end.
The second loop counts from 9,999,999 down to 0 and accesses the same array using a positive index on each iteration. So it goes through the array in reverse.
On my PC, the first loop takes longer than 6 seconds to complete, but the second one only takes ~400ms.
Why is the second loop faster than the first?
Because iterations[-1] will evaluate to undefined (which is slow as it has to go up the whole prototype chain and can't take a fast path) also doing math with NaN will always be slow as it is the non common case.
Also initializing iterations with numbers will make the whole test more useful.
Pro Tip: If you try to compare the performance of two codes, they should both result in the same operation at the end ...
Some general words about performance tests:
Performance is the compiler's job these days, code optimized by the compiler will always be faster than code you are trying to optimize through some "tricks". Therefore you should write code that is likely by the compiler to optimize, and that is in every case, the code that everyone else writes (also your coworkers will love you if you do so). Optimizing that is the most benefitial from the engine's view. Therefore I'd write:
let acc = 0;
for(const value of array) acc += value;
// or
const acc = array.reduce((a, b) => a + b, 0);
However in the end it's just a loop, you won't waste much time if the loop is performing bad, but you will if the whole algorithm performs bad (time complexity of O(n²) or more). Focus on the important things, not the loops.
To elaborate on Jonas Wilms' answer, Javascript does not work with negative indice (unlike languages like Python).
iterations[-1] is equal to iteration["-1"], which look for the property named -1 in the array object. That's why iterations[-1] will evaluate to undefined.
Given an array having .length 100 containing elements having values 0 to 99 at the respective indexes, where the requirement is to find element of of array equal to n : 51.
Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start and end");
for (let i = 0, k = len - 1; i < len && k >= 0; i++, k--) {
if (arr[i] === n || arr[k] === n) break;
}
console.timeEnd("iterate from start and end");
jsperf https://jsperf.com/iterate-from-start-iterate-from-start-and-end/1
The answer is pretty obvious:
More operations take more time.
When judging the speed of code, you look at how many operations it will perform. Just step through and count them. Every instruction will take one or more CPU cycles, and the more there are the longer it will take to run. That different instructions take a different amount of cycles mostly does not matter - while an array lookup might be more costly than integer arithmetic, both of them basically take constant time and if there are too many, it dominates the cost of our algorithm.
In your example, there are few different types of operations that you might want to count individually:
comparisons
increments/decrements
array lookup
conditional jumps
(we could be more granular, such as counting variable fetch and store operations, but those hardly matter - everything is in registers anyway - and their number basically is linear to the others).
Now both of your code iterate about 50 times - they element on which they break the loop is in the middle of the array. Ignoring off-by-a-few errors, those are the counts:
| forwards | forwards and backwards
---------------+------------+------------------------
>=/===/< | 100 | 200
++/-- | 50 | 100
a[b] | 50 | 100
&&/||/if/for | 100 | 200
Given that, it's not unexpected that doing twice the works takes considerably longer.
I'll also answer a few questions from your comments:
Is additional time needed for the second object lookup?
Yes, every individual lookup counts. It's not like they could be performed at once, or optimised into a single lookup (imaginable if they had looked up the same index).
Should there be two separate loops for each start to end and end to start?
Doesn't matter for the number of operations, just for their order.
Or, put differently still, what is the fastest approach to find an element in an array?
There is no "fastest" regarding the order, if you don't know where the element is (and they are evenly distributed) you have to try every index. Any order - even random ones - would work the same. Notice however that your code is strictly worse, as it looks at each index twice when the element is not found - it does not stop in the middle.
But still, there are a few different approaches at micro-optimising such a loop - check these benchmarks.
let is (still?) slower than var, see Why is using `let` inside a `for` loop so slow on Chrome? and Why is let slower than var in a for loop in nodejs?. This tear-up and tear-down (about 50 times) of the loop body scope in fact does dominate your runtime - that's why your inefficient code isn't completely twice as slow.
comparing against 0 is marginally faster than comparing against the length, which puts looping backwards at an advantage. See Why is iterating through an array backwards faster than forwards, JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing and Are loops really faster in reverse?
in general, see What's the fastest way to loop through an array in JavaScript?: it changes from engine update to engine update. Don't do anything weird, write idiomatic code, that's what will get optimised better.
#Bergi is correct. More operations is more time. Why? More CPU clock cycles.
Time is really a reference to how many clock cycles it takes to execute the code.
In order to get to the nitty-gritty of that you need to look at the machine level code (like assembly level code) to find the true evidence. Each CPU (core?) clock cycle can execute one instruction, so how many instructions are you executing?
I haven't counted the clock cycles in a long time since programming Motorola CPUs for embedded applications. If your code is taking longer then it is in fact generating a larger instruction set of machine code, even if the loop is shorter or runs an equal amount of times.
Never forget that your code is actually getting compiled into a set of commands that the CPU is going to execute (memory pointers, instruction-code level pointers, interrupts, etc.). That is how computers work and its easier to understand at the micro controller level like an ARM or Motorola processor but the same is true for the sophisticated machines that we are running on today.
Your code simply does not run the way you write it (sounds crazy right?). It is run as it is compiled to run as machine level instructions (writing a compiler is no fun). Mathematical expression and logic can be compiled in to quite a heap of assembly, machine level code and that is up to how the compiler chooses to interpret it (it is bit shifting, etc, remember binary mathematics anyone?)
Reference:
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
Your question is hard to answer but as #Bergi stated the more operations the longer, but why? The more clock cycles it takes to execute your code. Dual core, quad core, threading, assembly (machine language) it is complex. But no code gets executed as you have written it. C++, C, Pascal, JavaScript, Java, unless you are writing in assembly (even that compiles down to machine code) but it is closer to actual execution code.
A masters in CS and you will get to counting clock cycles and sort times. You will likely make you own language framed on machine instruction sets.
Most people say who cares? Memory is cheap today and CPUs are screaming fast and getting faster.
But there are some critical applications where 10 ms matters, where an immediate interrupt is needed, etc.
Commerce, NASA, a Nuclear power plant, Defense Contractors, some robotics, you get the idea . . .
I vote let it ride and keep moving.
Cheers,
Wookie
Since the element you're looking for is always roughly in the middle of the array, you should expect the version that walks inward from both the start and end of the array to take about twice as long as one that just starts from the beginning.
Each variable update takes time, each comparison takes time, and you're doing twice as many of them. Since you know it will take one or two less iterations of the loop to terminate in this version, you should reason it will cost about twice as much CPU time.
This strategy is still O(n) time complexity since it only looks at each item once, it's just specifically worse when the item is near the center of the list. If it's near the end, this approach will have a better expected runtime. Try looking for item 90 in both, for example.
Selected answer is excellent. I'd like to add another aspect: Try findIndex(), it's 2-3 times faster than using loops:
const arr = Array.from({length: 900}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
console.time("iterate using findIndex");
var i = arr.findIndex(function(v) {
return v === n;
});
console.timeEnd("iterate using findIndex");
The other answers here cover the main reasons, but I think an interesting addition could be mentioning cache.
In general, sequentially accessing an array will be more efficient, particularly with large arrays. When your CPU reads an array from memory, it also fetches nearby memory locations into cache. This means that when you fetch element n, element n+1 is also probably loaded into cache. Now, cache is relatively big these days, so your 100 int array can probably fit comfortably in cache. However, on an array of much larger size, reading sequentially will be faster than switching between the beginning and the end of the array.
I'm doing some work involving processing an insane amount of data in browser. As a result I'm trying to optimize everything down to the nuts and bolts. I don't need anyone telling me that I'm wasting my time or that premature optimization is the root of all evil.
I would just like to know if anyone that understands how JS works would know whether or not a lesser than boolean runs faster than a lesser than equals boolean. What I mean by that is, would:
return (i<2? 0:1)
Be parsed and run faster than:
return (i<=1? 0:1)
In this example we're assuming that i is an integer. Thanks.
JavaScript standard desribes the steps that needs to be taken in order to evaluate those expressions. You can take a look at ECMAScript 2015 Language Specification, section 12.9.3.
Be aware that even if there is slightly difference between steps of those two operation, other stuff in your application will have much more influence on performance that these simple operations that you cannot control in JavaScript. For example work of garbage collector, just-in-time compiler, ...
Even if you try measuring time in JavaScript, this will not work as just taking time stamps has much bigger influence on the performance than the actual expression you want to measure. Also the code that you wrote might not be the one which is really evaluated as some preoptimizations might me taken by the engine prior to actual running the code.
I wouldn't call this micro-optimisation, but rather nano-optimisation.
Cases are so similar you'll most likely have a measure precision below the gain you can expect...
(Edit)
If this code is optimised, the generated assembly code will just change from JAto JAE (in (x86) , and they use the same cycle count. 0,0000% change.
If it is not, you might win one step within a selectof the engine...
The annoying thing being that it makes you miss the larger picture : unless i'm wrong, you need a branch here, and if you're that worried about time, the statistical distribution of your input will influence WAY more the execution time. (but still not that much...)
So walk a step back and compare :
if (i<2)
return 0;
else
return 1;
and :
if (i>=2)
return 1;
else
return 0;
You see that for ( 100, 20, 10, 1, 50, 10) (1) will branch way more and for (0, 1, 0, 0, 20, 1), (2) branches more.
That will make much more difference... that might just as well be very difficult to measure !!!
(As a question left to the reader, i wonder how return +(i>1) compiles, and if there's a trick to avoid branching... )
(By the way i'm not against early optimisation, i even posted some advices here, if it might interest you : https://gamealchemist.wordpress.com/2016/04/15/writing-efficient-javascript-a-few-tips/ )
I have created a fiddle using performance.now API and console.time API's
Both the API says how much ms of the time was taken to execute the functions/loops.
I feel the major difference is the result, performance.now gives more accurate value i.e. upto 1/1000th ms.
https://jsfiddle.net/ztacgxf1/
function lessThan(){
var t0 = performance.now();
console.time("lessThan");
for(var i = 0; i < 100000; i++){
if(i < 1000){}
}
console.timeEnd("lessThan");
var t1 = performance.now();
console.log("Perf -- >>" + (t1-t0));
}
function lessThanEq(){
var t0 = performance.now();
console.time("lessThanEq")
for(var i = 0; i < 100000; i++){
if(i <= 999){}
}
console.timeEnd("lessThanEq");
var t1 = performance.now();
console.log("Perf -- >>" + (t1-t0));
}
lessThan()
lessThanEq()
I haven't much difference. May be iterating more may give different result.
Hope this helps you.
I need to generate a random big (around 4096 bit) prime number in JavaScript and I'm already using forge. Forge has to have some kind of generator for such tasks as it implements RSA which also relies on random prime numbers. However I haven't found something in the documentation of forge when you just want to get a random prime number (something like var myRandomPrime = forge.random.getPrime(4096); would have been great).
So what would be the best approach to get such a prime (with or without forge) in JavaScript?
Update 06/11/2014: Now, with forge version 0.6.6 you can use this:
var bits = 1024;
forge.prime.generateProbablePrime(bits, function(err, num) {
console.log('random prime', num.toString(16));
});
Finding large primes in JavaScript is difficult -- it's slow and you don't want to block the main thread. It requires some fairly customized code to do right and the code in forge is specialized for RSA key generation. There's no API call to simply produce a large random prime.
There are some extra operations that the RSA code in forge runs that you don't need if you're just looking for a single prime number. That being said, the slowest part of the process is in actually finding the primes, not in those extra operations. However, the RSA code also generates two primes (when you only need one) and they aren't the same bitsize you're looking for. So if you're using the forge API you'd have to pass a bitsize of 8196 (I believe ... that's off the top of my head, so it may be inaccurate) to get a 4096-bit prime.
One way to find a large random prime is as follows:
Generate a random number that has the desired number of bits (ensure the MSB is set).
Align the number on a 30k+1 boundary as all primes have this property.
Run a primality test (the slow part) on your number; if it passes, you're done, if not, add to the number to get to the next 30k+1 boundary and repeat. A "quick" primality test is to check against low primes and then use Miller-Rabin (see the Handbook of Applied Cryptography 4.24).
Step #3 can run for a long time -- and that's usually pretty undesirable with JavaScript (w/node or in the browser). To mitigate this, you can attempt to limit the amount time spent doing primality tests to some acceptable period of time (N milliseconds) or you can use Web Workers to background the process. Of course, both of these approaches complicate the code.
Here's some code for generating a 4096-bit random prime that shouldn't block the main thread:
var forge = require('node-forge');
var BigInteger = forge.jsbn.BigInteger;
// primes are 30k+i for i = 1, 7, 11, 13, 17, 19, 23, 29
var GCD_30_DELTA = [6, 4, 2, 4, 2, 4, 6, 2];
var THIRTY = new BigInteger(null);
THIRTY.fromInt(30);
// generate random BigInteger
var num = generateRandom(4096);
// find prime nearest to random number
findPrime(num, function(num) {
console.log('random', num.toString(16));
});
function generateRandom(bits) {
var rng = {
// x is an array to fill with bytes
nextBytes: function(x) {
var b = forge.random.getBytes(x.length);
for(var i = 0; i < x.length; ++i) {
x[i] = b.charCodeAt(i);
}
}
};
var num = new BigInteger(bits, rng);
// force MSB set
var bits1 = bits - 1;
if(!num.testBit(bits1)) {
var op_or = function(x,y) {return x|y;};
num.bitwiseTo(BigInteger.ONE.shiftLeft(bits1), op_or, num);
}
// align number on 30k+1 boundary
num.dAddOffset(31 - num.mod(THIRTY).byteValue(), 0);
return num;
}
function findPrime(num, callback) {
/* Note: All primes are of the form 30k+i for i < 30 and gcd(30, i)=1. The
number we are given is always aligned at 30k + 1. Each time the number is
determined not to be prime we add to get to the next 'i', eg: if the number
was at 30k + 1 we add 6. */
var deltaIdx = 0;
// find prime nearest to 'num' for 100ms
var start = Date.now();
while(Date.now() - start < 100) {
// do primality test (only 2 iterations assumes at
// least 1251 bits for num)
if(num.isProbablePrime(2)) {
return callback(num);
}
// get next potential prime
num.dAddOffset(GCD_30_DELTA[deltaIdx++ % 8], 0);
}
// keep trying (setImmediate would be better here)
setTimeout(function() {
findPrime(num, callback);
});
}
Various tweaks can be made to adjust it for your needs, like setting the amount of time (which is just an estimate) to run the primality tester before bailing to try again on the next scheduled tick. You'd probably want some kind of UI feedback each time it bails. If you're using node or a browser that supports setImmediate you can use that instead of setTimeout as well to avoid clamping to speed things up. But, note that it's going to take a while to generate a 4096-bit random prime in JavaScript (at least at the time of this writing).
Forge also has a Web Worker implementation for generating RSA keys that is intended to speed up the process by letting multiple threads run the primality test using different inputs. You can look at the forge source (prime.worker.js for instance) to see that in action, but it's a project in itself to get working properly. IMO, though, it's the best way to speed things up.
Anyway, hopefully the above code will help you. I'd run it with a smaller bitsize to test it.
It does more work then you specifically require but you can always use forge to generate a key pair and extract one of the primes from that.
//generate a key pair of required size
var keyPair = forge.pki.rsa.generateKeyPair(4096);
//at this point we have 2 primes p and q in the privateKey
var p = keyPair.privateKey.p;
var q = keyPair.privateKey.q;
The type of p and q are BigInteger they have a p.toByteArray() method to access their representations as a byte array.
If you decide to implement your own method, you may want to read Close to Uniform Prime Number Generation With Fewer Random Bits which has discussion and algorithms for faster generation of well-distributed large n-bit primes. The FIPS 186-4 publication also has a lot of information including algorithms for Shawe-Taylor proven prime construction.
dlongley's answer uses the "PRIMEINC" method, which is efficient but not a good distribution (this may or may not matter to you, and either way he's given a nice framework to use). Note that FIPS recommends a lot of M-R tests (this can be mitigated if your library includes a Lucas or BPSW test).
Re: proven primes, my experience using GMP is that up to at least 8192 bits, both Shawe-Taylor and Maurer's FastPrime are slower than using Fouque and Tibouchi algorithm A1 combined with BPSW + additional M-R tests. Your mileage may vary, and of course the proven prime methods get a proven prime as a result.