I'm making a hex dumper in JavaScript that will analyze the byte data of a file provided by the user. In order to properly display the data preview of the file, I'm escaping HTML characters using the methods from the top-rated answer to this question.
function htmlEncode(value) { return $("<div/>").text(value).html(); }
function htmlDecode(value) { return $("<div/>").html(value).text(); }
I'm not asking for suggestions of how to best encode and decode html characters. What I am curious about is whether or not calling these functions hundreds of thousands of times in rapid succession is creating a metric butt-ton of DOM elements that will slow down the utility over time.
I've noticed that running my dumper on a small file (35 bytes), which thankfully runs almost instantaneously, takes much longer after I've run my dumper on a larger file (132,832 bytes) in the same session. The encode function is essentially run once for each byte.
I know JavaScript has a garbage collector, and since these elements aren't tied to anything I would assume they get cleaned up after they're done being used, but I don't know the details or inner workings of the collector, so I don't want to make any assumptions about how quickly it will take care of the problem.
Theoretically it's possible that you're consuming a lot of memory because you are creating numerous new elements. However, since they are not being added to the DOM, they should be cleaned up either on the next garbage-collection cycle or as the stack is popped (depending on how optimized the engine is).
But, as @juvian pointed out, you can get around this by having one dedicated element that you use for this operation. Not only will it ensure you aren't filling up your memory, it will also be faster, since jQuery won't have to repeatedly process the <div/> string, create an element, generate a jQuery object, etc.
Here's my not-completely-scientifically-sound-but-definitely-good-enough-to-get-the-idea proof:
function now() {
  if (typeof performance !== 'undefined') {
    now = performance.now.bind(performance);
    return performance.now();
  } else {
    now = Date.now.bind(Date);
    return Date.now();
  }
}
// Load the best available means of measuring the current time
now();
// Generate a whole bunch of characters
var data = [];
var totalNumberOfCharacters = 132832;
for (var i = 0; i < totalNumberOfCharacters; i++) {
  data.push(String.fromCharCode((i % 26) + 65));
}
// Basic encode function
function htmlEncode(value) {
  return $("<div/>").text(value).html();
}
// Cache a single <div> to improve performance
var $div = $('<div/>');
function cachedHtmlEncode(value) {
  return $div.text(value).html();
}
// Encode using the unoptimized approach
var start = now();
var unoptimized = '';
for (var i = 0; i < totalNumberOfCharacters; i++) {
  unoptimized += htmlEncode(data[i]);
}
var end = now();
console.log('unoptimized', end - start);
document.querySelector('pre').innerText = unoptimized;
// Encode using the optimized approach
start = now();
var optimized = '';
for (var i = 0; i < totalNumberOfCharacters; i++) {
  optimized += cachedHtmlEncode(data[i]);
}
end = now();
console.log('optimized', end - start);
document.querySelector('pre').innerText = optimized;
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre></pre>
Related
I'm working on a kind of 1:1 chat system; the environment is Node.js.
For each country there is a country room (lobby); for each socket client a JS class/object is created, and each object is kept in a list under its unique user ID.
This unique ID is preserved even when users log in from different browser tabs, etc.
Each object is stored in collections like "connections" (all of them), "operators" (operators only), and "{countryISO}_clients" (users), and the reference key is the unique ID.
In some circumstances, I need to access these connections by their socket ids.
At this point, I can think of two solutions:
Using a for-each loop to find the desired object
Creating another collection, this time keyed by socket ID (or something else)
Which one makes sense? Since in JS this collection will hold references rather than copies, the second one feels like it makes sense (and looks cleaner), but I can't be sure. Which one is more expensive in memory/performance terms?
I can't run thorough tests since I don't know how to create dummy (simultaneous) socket connections.
Expected connected socket client count: 300 - 1000 (depends on the time of the day)
e.g. user:
"qk32d2k":{
"uid":"qk32d2k",
"name":"Josh",
"socket":"{socket.io's socket reference}",
"role":"user",
"rooms":["room1"],
"socketids":["sid1"]
"country":"us",
...
info:() => { return gatherSomeData(); },
update:(obj) => { return updateSomeData(obj); },
send:(data)=>{ /*send data to this user*/ }
}
e.g. Countries collection:
{
  us: {
    "qk32d2k": {"object above."},
    "l33t71": {"another user object."}
  },
  ca: {
    "asd231": {"other user object."}
  }
}
Pick a Simple Design First that Optimizes for the Most Common Access
There is no ideal answer here in the absolute. CPUs are wicked fast these days, so if I were you I'd start out with one simple mechanism of storing the sockets that you can access both ways you want, even if one way is kind of a brute force search. Pick the data structure that optimizes the access mechanism that you expect to be either most common or most sensitive to performance.
So, if you are going to be looking up by userID the most, then I'd probably store the sockets in a Map object with the userID as the key. That will give you fast, optimized access to get the socket for a given userID.
For finding a socket by some other property of the socket, you will just iterate the Map item by item until you find the desired match on some other socket property. I'd probably use a for/of loop because it's both fast and easy to bail out of the loop when you've found your match (something you can't do on a Map or Array object with .forEach()). You can obviously make yourself a little utility function or method that will do the brute force lookup for you and that will allow you to modify the implementation later without changing much calling code.
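For illustration, here's a minimal sketch of such a utility function, assuming a Map keyed by userID and the socketids property shown on the user object in the question (the other names are illustrative):
const users = new Map(); // userID -> user object

// Brute force lookup: iterate until the first match, then bail out
// (the early return is why for/of beats .forEach() here).
function findBySocketId(socketId) {
  for (const user of users.values()) {
    if (user.socketids.includes(socketId)) {
      return user;
    }
  }
  return null;
}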
Measure and Add Further Optimization Later (if data shows you need to)
Then, once you get up to scale (or simulated scale in pre-production test), you take a look at the performance of your system. If you have loads of room to spare, you're done - no need to look further. If you have some operations that are slower than desired or higher than desired CPU usage, then you profile your system and find out where the time is going. It's most likely that your performance bottlenecks will be elsewhere in your system and you can then concentrate on those aspects of the system. If, in your profiling, you find that the linear lookup to find the desired socket is causing some of your slow-down, then you can make a second parallel lookup Map with the socketID as the key in order to optimize that type of lookup.
But, I would not recommend doing this until you've actually shown that it is an issue. Premature optimization, before you have actual metrics that prove something is worth optimizing, just adds complexity to a program without any proof that it is required or even anywhere close to a meaningful bottleneck in your system. Our intuition about what the bottlenecks are is often way, way off. For that reason, I tend to pick an intelligent first design that is relatively simple to implement, maintain, and use, and then, only when we have real usage data by which we can measure actual performance metrics, would I spend more time optimizing it, tweaking it, or making it more complicated in order to make it faster.
Encapsulate the Implementation in a Class
If you encapsulate all operations here in a class:
Adding a socket to the data structure.
Removing a socket from the data structure.
Looking up by userID.
Looking up by socketID.
Any other access to the data structure.
Then, all calling code will access this data structure via the class, and you can tweak the implementation some time in the future (to optimize based on data) without having to modify any of the calling code. This type of encapsulation can be very useful if you suspect future changes to the way the data is stored or accessed.
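A rough sketch of such a class (the method names are illustrative, not prescribed):
class SocketStore {
  constructor() {
    this.byUserId = new Map(); // userID -> user object
  }
  add(user) {
    this.byUserId.set(user.uid, user);
  }
  remove(uid) {
    this.byUserId.delete(uid);
  }
  getByUserId(uid) {
    return this.byUserId.get(uid);
  }
  getBySocketId(socketId) {
    // Brute force for now; this could be swapped for a second,
    // parallel Map keyed by socketID later without touching callers.
    for (const user of this.byUserId.values()) {
      if (user.socketids.includes(socketId)) return user;
    }
    return null;
  }
}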
If You're Still Worried, Design a Quick Bench Measurement
I created a quick snippet that tests how long a brute force lookup is in a 1000 element Map object (when you want to find it by something other than what the key is) and compared it to an indexed lookup.
On my computer, a brute force (non-indexed) lookup takes about 0.002549 ms per lookup (that's an average over 1,000,000 lookups). For comparison, an indexed lookup on the same Map takes about 0.000017 ms, so you save about 0.002532 ms per lookup. In other words, fractions of a millisecond.
function addCommas(str) {
  var parts = (str + "").split("."),
      main = parts[0],
      len = main.length,
      output = "",
      i = len - 1;
  while (i >= 0) {
    output = main.charAt(i) + output;
    if ((len - i) % 3 === 0 && i > 0) {
      output = "," + output;
    }
    --i;
  }
  // put decimal part back
  if (parts.length > 1) {
    output += "." + parts[1];
  }
  return output;
}
let m = new Map();

// populate the Map with objects that have a property that
// you have to do a brute force lookup on
function rand(min, max) {
  return Math.floor((Math.random() * (max - min)) + min);
}

// keep all randoms here just so we can randomly get one
// to try to find (wouldn't normally do this)
// just for testing purposes
let allRandoms = [];
for (let i = 0; i < 1000; i++) {
  let r = rand(1, 1000000);
  m.set(i, {id: r});
  allRandoms.push(r);
}

// create a set of test lookups
// we do this ahead of time so it's not part of the timed
// section so we're only timing the actual brute force lookup
let numRuns = 1000000;
let lookupTests = [];
for (let i = 0; i < numRuns; i++) {
  lookupTests.push(allRandoms[rand(0, allRandoms.length)]);
}
let indexTests = [];
for (let i = 0; i < numRuns; i++) {
  indexTests.push(rand(0, allRandoms.length));
}

// function to brute force search the map to find one of the random items
function findObj(targetVal) {
  for (let [key, val] of m) {
    if (val.id === targetVal) {
      return val;
    }
  }
  return null;
}
let startTime = Date.now();
for (let i = 0; i < lookupTests.length; i++) {
  // get an id from the allRandoms to search for
  let found = findObj(lookupTests[i]);
  if (!found) {
    console.log("!!didn't find brute force target");
  }
}
let delta = Date.now() - startTime;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta} ms`);
//console.log(`Avg run time per lookup: ${delta/numRuns} ms`);

// Now, see how fast the same number of indexed lookups are
let startTime2 = Date.now();
for (let i = 0; i < indexTests.length; i++) {
  let found = m.get(indexTests[i]);
  if (!found) {
    console.log("!!didn't find indexed target");
  }
}
let delta2 = Date.now() - startTime2;
//console.log(`Total run time for ${addCommas(numRuns)} lookups: ${delta2} ms`);
//console.log(`Avg run time per lookup: ${delta2/numRuns} ms`);

let results = `
Total run time for ${addCommas(numRuns)} brute force lookups: ${delta} ms<br>
Avg run time per brute force lookup: ${delta/numRuns} ms<br>
<hr>
Total run time for ${addCommas(numRuns)} indexed lookups: ${delta2} ms<br>
Avg run time per indexed lookup: ${delta2/numRuns} ms<br>
<hr>
Net savings of an indexed lookup is ${(delta - delta2)/numRuns} ms per lookup
`;
document.body.innerHTML = results;
I'm trying to understand ArrayBuffer in JS, as it's one of the transferable types between a thread and a worker.
I found a huge performance gap in variable creation and I'm unable to find an answer anywhere on the internet.
I tried several benchmarks and array literals are always much, much faster to declare than TypedArrays. I tried in Node 11, Chrome, and Firefox; the results are consistent.
var LIMIT = 10000;

console.time("Array insertion time");
for (var i = 0; i < LIMIT; i++) {
  var arr = new Array();
}
console.timeEnd("Array insertion time");

console.time("ArrayBuffer insertion time");
for (var i = 0; i < LIMIT; i++) {
  var buffer = new ArrayBuffer(LIMIT * 4);
  var arr = new Int32Array(buffer);
}
console.timeEnd("ArrayBuffer insertion time");
I receive crazy results:
Array insertion time: 1.283ms
ArrayBuffer insertion time: 53.979ms
I thought it would be faster for the JS engine to declare a TypedArray than a literal. I thought ArrayBuffer was a very optimized call for allocating memory to the program.
You are simply not doing the same thing at all...
When you declare an ArrayBuffer, the browser will ask for a static memory slot, the size of this ArrayBuffer.
An Array, on the other hand, doesn't have a static memory slot; it gets reallocated as its length is updated.
So if you want to perform a fair test, you need to assign some data to these Arrays, because currently they're just empty objects as far as the engine is concerned, i.e. they have a very low footprint and are very fast to generate.
var LIMIT = 5000; // I have to lower the LIMIT because Array is so slow

console.time("Array insertion time");
for (var i = 0; i < LIMIT; i++) {
  // to be fair, they should hold the same data
  var arr = new Array(LIMIT * 4).fill(0);
}
console.timeEnd("Array insertion time");

console.time("ArrayBuffer insertion time");
for (var i = 0; i < LIMIT; i++) {
  var buffer = new ArrayBuffer(LIMIT * 4);
  var arr = new Int32Array(buffer);
}
console.timeEnd("ArrayBuffer insertion time");
Primitive types are generally going to be faster. They're used so frequently that they get the most attention for optimization in the engine. Typed arrays most likely have more overhead because they're performing type checks on operations like inserting. That doesn't come for free.
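As a quick illustration of that overhead (a sketch of the observable behavior, not the engine's internals), a typed array coerces every value you assign, while a plain array stores it as-is:
const typed = new Int32Array(2);
typed[0] = 3.9;     // coerced to the int32 3
typed[1] = '42';    // coerced to the int32 42
console.log(typed); // Int32Array [3, 42]

const plain = [];
plain[0] = 3.9;     // stored unchanged
console.log(plain); // [3.9]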
Additionally, your second example is doing more work: declaring the buffer, then creating a typed-array view over it. You're also performing very small operations, whereas ArrayBuffers are intended for much larger buffers, like binary audio or images.
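For what it's worth, here's a minimal sketch of the transfer use case the question mentions: posting an ArrayBuffer to a worker as a transferable, so no copy is made (the 'worker.js' file name is assumed for illustration):
const buffer = new ArrayBuffer(1024 * 1024); // 1 MiB
new Int32Array(buffer)[0] = 42;

const worker = new Worker('worker.js');
worker.postMessage(buffer, [buffer]); // transfer instead of copy
// After the transfer, the buffer is detached in this thread:
console.log(buffer.byteLength); // 0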
I'm messing around with the performances of JavaScript's push and pop functions.
I have an array called arr.
When I run this:
for (var i = 0; i < 100; i++) {
  for (var k = 0; k < 100000; k++) {
    arr.push(Math.ceil(Math.random() * 100));
    arr.pop();
  }
}
I get a time of 251.38515999977244 milliseconds (I'm using the performance.now() function).
But when I run a custom push and pop:
Array.prototype.pushy = function(value) {
  this[this.length] = value;
}

Array.prototype.poppy = function() {
  this.splice(-1, 1);
}

for (var i = 0; i < 100; i++) {
  for (var k = 0; k < 100000; k++) {
    arr.pushy(Math.ceil(Math.random() * 100));
    arr.poppy();
  }
}
The time is 1896.055750000014 milliseconds.
Can anyone explain why there's such a huge difference between these?
To those who worry about timing variance: I ran this test 100 times and computed an average time. I did that 5 times to ensure there weren't any outliers.
Because the built-in function is written in whatever language the browser was written in (probably C++) and is compiled. The custom function is written in JavaScript and is interpreted.
Generally, interpreted languages are much slower than compiled ones. One usually doesn't notice this with JavaScript because, for the most part, you only execute a couple of lines of JS between human interactions (which are always the slowest part).
Running JS in a tight loop, as you've done here, highlights the difference.
The reason is that the built-in function was specifically designed and optimized to perform a specific task. The browser takes whatever shortcuts possible when using the built-in function that it may not be as quick to recognize in the custom function. For example, with your implementation, the function needs to look up the array length every single time it is called.
Array.prototype.pushy = function(value) {
  this[this.length] = value;
}
However, by simply using Array.prototype.push, the browser knows that the purpose is to append a value on to the array. While browsers may implement the function differently, I highly doubt any needs to compute the length of the array for every single iteration.
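To illustrate the length point (a sketch only; real engines optimize push far beyond this), batching appends and tracking the length in a local variable avoids re-reading length on every call:
function pushMany(arr, values) {
  var n = arr.length; // read the length once
  for (var i = 0; i < values.length; i++) {
    arr[n++] = values[i];
  }
  return arr;
}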
I have an application in which I have to push a lot of values to an array, so I tested the execution time:
var st = new Date().getTime();
var a = [];
for (var i = 0; i < 20971520; i++) {
  a.push(i);
}
var ed = new Date().getTime();
console.info((ed - st) / 1000);
console.info(a.length);
I ran the code directly in the Firefox and Chrome consoles, and it took 37 seconds. During the execution, the mouse can be moved in Chrome, but there is no interactive effect.
Then I changed the code:
function push() {
  var st = new Date().getTime();
  var a = [];
  for (var i = 0; i < 20971520; i++) {
    a.push(i);
  }
  var ed = new Date().getTime();
  console.info((ed - st) / 1000);
  console.info(a.length);
}
var tr = setTimeout(push, 50);
Simply putting the code in a function and calling it using setTimeout costs only 0.844 seconds, and during the execution I can operate Chrome normally.
What's going on here?
I know that setTimeout will hand control back to the browser to do the UI work, which keeps the page responsive. For example, when I do some calculation during the page's mousemove, I would defer the calculation to prevent it from blocking the UI.
But why does it reduce the total execution time of the same code?
And during the execution, I can operate Chrome normally.
Not true. The main Chrome window will be just as frozen as in the other case (just for a shorter while). The dev tools are a separate thread, though, and will not slow down.
But why does it reduce the total execution time of the same code?
It only does if you run it in the dev tools. If you execute the code for real, where the VM can make property optimizations, the times are comparable (nearly 1 second), e.g.
var st = new Date().getTime();
var a = [];
for (var i = 0; i < 20971520; i++) {
  a.push(i);
}
var ed = new Date().getTime();
console.info('normal', (ed - st) / 1000);
console.info(a.length);

function push() {
  var st = new Date().getTime();
  var a = [];
  for (var i = 0; i < 20971520; i++) {
    a.push(i);
  }
  var ed = new Date().getTime();
  console.info('timeout', (ed - st) / 1000);
  console.info(a.length);
}
var tr = setTimeout(push, 0);
Compare at http://jsfiddle.net/gu9Lg52j/: the normal version executes just as fast as the setTimeout version.
Also, if you wrap the code in a function and execute it in the console, the time will be comparable even without a setTimeout, as the VM can make optimizations between function definition and execution:
function push() {
  var st = new Date().getTime();
  var a = [];
  for (var i = 0; i < 20971520; i++) {
    a.push(i);
  }
  var ed = new Date().getTime();
  console.info('function', (ed - st) / 1000);
  console.info(a.length);
}
push();
Both variations of code should run with almost identical speed (the latter example might be faster but not 10 times faster).
Inside the Chrome developer tools it is a different story. The expressions are evaluated inside a with block, which means variables such as a and i are first searched for inside another object (the __commandLineAPI). This adds overhead that results in the 10-times-longer execution time.
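You can approximate that overhead outside the dev tools (a rough illustration, not Chrome's actual internals) by wrapping the same loop in a with block, which forces every variable lookup to be checked against an extra object first:
var scope = {};
var st = new Date().getTime();
with (scope) {
  var a = [];
  for (var i = 0; i < 20971520; i++) {
    a.push(i);
  }
}
console.info('with block', (new Date().getTime() - st) / 1000);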
All JavaScript engines perform various optimizations. For example V8 uses 2 compilers, a simple one used by default and an optimizing one. Code not compiled by the optimizing compiler is slow, very slow.
A condition for the optimizing compiler to run is that the code must be in a (not too long) function (there are other conditions). The first code you tried in the console isn't in a function. Put your first code in a function and you'll see it performs the same as the second one; setTimeout changes nothing.
It makes zero sense to check performance in the console when the main performance factor is the optimizing compilation. If you're targeting Node, use a benchmarking framework. If you're targeting the browser, use a site like jsPerf.
Now, when you have to do a really long computation in the browser (which doesn't seem to be the case here), you should consider using web workers which do the job in a background thread not impacting the UI.
setTimeout, as others noticed, doesn't speed up the array creation and does lock the browser. If you are concerned about browser lockup during the array creation, web workers (see MDN) may come to the rescue. Here is a jsFiddle demo using a web worker for your code. The worker code is within the HTML:
onmessage = function (e) {
  var a = [], now = new Date();
  for (var i = 0; i < 20971520; i++) {
    a.push(i);
  }
  postMessage({timings: 'duration: ' + (new Date() - now) +
    ' ms, result: [' + a[0] + '...' + a[a.length - 1] + ']'});
}
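For completeness, the main-thread side looks something like this (the jsFiddle builds the worker inline from the HTML; a plain worker.js file is assumed here):
var worker = new Worker('worker.js');
worker.onmessage = function (e) {
  console.log(e.data.timings); // the UI stays responsive in the meantime
};
worker.postMessage('start'); // any message kicks off the loop above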
So I'm looking for some advice on the best method for toggling the class (a set of three) of an element in a loop ending at 360 iterations. I'm trying to avoid nested loops and ensure good performance.
What I have:
// jQuery flavour js
// vars
var framesCount = '360'; // total frames
var framesInterval = '5000'; // interval
var statesCount = 3; // number of states
var statesCountSplit = framesInterval/statesCount;
var $scene = $('#scene');
var $counter = $scene.find('.counter');
// An early brain dump
for (f = 1; f < framesCount; f += 1) {
$counter.text(f);
for (i = 1; i < statesCount; i += 1) {
setTimeout(function() {
$scene.removeClass().addClass('state-'+i);
}, statesCountSplit);
}
}
So you see, for each of 360 frames there are three class switchouts at intervals. Although I haven't tested it, I'm concerned about the performance hit here once that frames value goes into the thousands (which it might).
This snippet is obviously flawed (very), please let me know what I can do to make this a) work, b) work efficiently. Thanks :-)
Some general advice:
1) Don't declare functions in a loop
Does this really need to be done in a setTimeout?
for (i = 1; i < statesCount; i += 1) {
  setTimeout(function() {
    $scene.removeClass().addClass('state-' + i);
  }, statesCountSplit);
}
2) DOM operations are expensive
Is this really necessary? This will toggle so fast that you won't notice the counter going up. I don't understand the intent here, but it seems unnecessary.
$counter.text(f);
3) Don't optimize early
In your question, you stated that you haven't profiled the code in question. Currently, there's only about 1000 iterations, which shouldn't be that bad. DOM operations aren't too bad, as long as you aren't inserting/removing elements, and you're just modifying them.
I really wouldn't worry about performance at this point. There are other micro-optimizations you could apply (like changing the for loop into a decrementing while loop to save on a compare), but you gave no indication that the performance is a problem.
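For reference, the decrementing while loop mentioned above would look like this (a sketch only, and rarely worth the readability cost):
var f = framesCount;
while (f--) {
  // loop body runs framesCount times, counting down from framesCount - 1
}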
Closing thoughts
If I understand the logic correctly, your code doesn't match it. The code will currently increment .counter as fast as the processor can iterate over your loops (it should only take a few milliseconds or so for everything), and each of your "class switchouts" will fire 360 times within a few milliseconds of each other.
Fix your logic errors first, then worry about optimization if it becomes a problem.
Don't use a for loop for this. It will generate lots of setTimeout events, which is known to slow browsers down. Use a single chained setTimeout instead:
function animate(framesCount, statesCount) {
  $scene.removeClass().addClass('state-' + statesCount);
  if (framesCount) {
    setTimeout(function() {
      animate(framesCount - 1, (statesCount % 3) + 1);
    }, statesCountSplit);
  }
}
animate(360 * 3, 1);