I am using WebGL to do hardware skinning, but updating my model node hierarchies is causing a huge performance hit.
Every node needs to query its current location/rotation/scale keyframes, construct a local matrix with them, and multiply it with its parent's world matrix if it has a parent.
The matrix math itself is as optimized as it gets (special variants of matrix construction based on gl-matrix).
Still, if I update many models, with tens of nodes each (some even with hundreds, sadly), this hogs all of the execution time of the browser.
I have tried using a dirty state for when nodes don't actually need updating, but simply checking if their local data changed (mostly just checking if the location or rotation changed) actually causes the same amount of processing as just calculating the matrices.
WebCL would have been ideal, but that seems to go nowhere since 2014.
I am starting to think of running it all in a shader, but I can't quite wrap my head around how to design it (e.g. how to store the keyframes, which are a map of frame->data, or how to write the data back).
Another way is to cache all of the animation transformations in a texture, but this doesn't scale well. For models with few enough keyframes this is OK, but for ones with long animations it grows to hundreds of megabytes very fast.
This is mostly because I can't think of any way to store sparse data. If that were possible, I could store only as many transformations as there are keyframes, which would not take a lot of memory (right now, I store the transformations for every single frame).
Granted, this would require matrix interpolation, and I am not sure how reliable that is.
Does anyone have any ideas?
I don't think it's practical to offload the entire node hierarchy calculation to the GPU. The best you can do is upload the two absolute world transformation keyframes to the GPU and let the GPU interpolate between them. But I am not sure whether the interpolated world transformation is the same as the one you would get by actually evaluating the node hierarchy. If it is, then that would be a feasible solution. Note that you cannot interpolate directly between matrices either. You need to convert them into a form that supports interpolation, such as quaternions plus additional data.
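To make the "quaternions plus additional data" idea concrete, here is a minimal CPU-side sketch of sampling a sparse keyframe track. The track layout (`frame`, `translation`, `rotation`) and the function names are assumptions made for illustration, not something from the question or from gl-matrix. It only shows the interpolation itself; it does not settle whether interpolated world transforms match a full hierarchy evaluation.

```javascript
// Linear interpolation between two [x, y, z] vectors.
function lerpVec3(a, b, t) {
  return [
    a[0] + (b[0] - a[0]) * t,
    a[1] + (b[1] - a[1]) * t,
    a[2] + (b[2] - a[2]) * t,
  ];
}

// Spherical linear interpolation between two unit quaternions [x, y, z, w].
function slerp(a, b, t) {
  let dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  // q and -q encode the same rotation; flip b so we take the shorter arc.
  const sign = dot < 0 ? -1 : 1;
  dot *= sign;
  let wa;
  let wb;
  if (dot > 0.9995) {
    // Nearly identical rotations: use linear weights to avoid dividing by ~0.
    wa = 1 - t;
    wb = t;
  } else {
    const theta = Math.acos(dot);
    const s = Math.sin(theta);
    wa = Math.sin((1 - t) * theta) / s;
    wb = Math.sin(t * theta) / s;
  }
  wb *= sign;
  const q = [0, 1, 2, 3].map((i) => a[i] * wa + b[i] * wb);
  // Renormalize to guard against drift from the linear fallback.
  const len = Math.hypot(q[0], q[1], q[2], q[3]);
  return q.map((v) => v / len);
}

// keys: hypothetical sparse track, sorted by frame number.
// Binary-search the two keyframes around `frame` and interpolate between them.
function sampleTrack(keys, frame) {
  const first = keys[0];
  const last = keys[keys.length - 1];
  if (frame <= first.frame) {
    return { translation: first.translation, rotation: first.rotation };
  }
  if (frame >= last.frame) {
    return { translation: last.translation, rotation: last.rotation };
  }
  let lo = 0;
  let hi = keys.length - 1;
  while (hi - lo > 1) {
    const mid = (lo + hi) >> 1;
    if (keys[mid].frame <= frame) lo = mid;
    else hi = mid;
  }
  const k0 = keys[lo];
  const k1 = keys[hi];
  const t = (frame - k0.frame) / (k1.frame - k0.frame);
  return {
    translation: lerpVec3(k0.translation, k1.translation, t),
    rotation: slerp(k0.rotation, k1.rotation, t),
  };
}
```

With this layout only the real keyframes are stored, so memory grows with the number of keyframes rather than the number of frames.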
I actually ran into this problem as well in my project. I found updating the transformations to be the most time-consuming operation, despite having a full collision system and response running as well.
I solved the problem by reducing the number of times the transformation update needs to be called. For example, if you can conclude that the entire model is outside your view frustum, then you don't need to calculate its transformations at all. This may reduce the amount of calculation you need to do to somewhere between ~1/6 and ~1/4.
Secondly, for distant objects, you don't need to update their transformations every frame. Just update them every few frames or so. Remember, there are games that ship at only 30 FPS, so a few skipped frames for distant objects may not be noticeable.
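The two reductions above (skip models outside the frustum, update distant models less often) could be combined in a small gate like the following. This is only a sketch: the `inFrustum`, `distance` and `id` fields and the distance thresholds are hypothetical, and you would tune them for your own scene.

```javascript
// How many frames may pass between updates for a model at this distance.
// The thresholds are made-up example values.
function updateInterval(distance) {
  if (distance < 50) return 1; // near: update every frame
  if (distance < 200) return 2; // mid-range: every second frame
  return 4; // far: every fourth frame
}

// Decide whether a model's node hierarchy should be recomputed this frame.
function shouldUpdate(model, frameIndex) {
  // Culled models are skipped entirely.
  if (!model.inFrustum) return false;
  const interval = updateInterval(model.distance);
  // Offsetting by the model id spreads the work over several frames,
  // so throttled models do not all update on the same frame.
  return (frameIndex + model.id) % interval === 0;
}
```

In the render loop you would call `shouldUpdate(model, frameIndex)` before walking the hierarchy and reuse the previous matrices when it returns false.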
Finally, and this may not work for JavaScript at all so I haven't done it (yet), you should store and access data in a cache-coherent manner. See these slides. It could give a 10x performance increase. But again, it may not work for JavaScript because, well, JavaScript arrays are not guaranteed to pack their data sequentially.
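One way to get sequential storage in JavaScript is a typed array such as Float32Array, which is backed by a single contiguous buffer. Below is a rough sketch of a hierarchy update over flat arrays. It assumes column-major 4x4 matrices (the gl-matrix convention) and that nodes are sorted so every parent comes before its children. The function names are made up for the example, and how much speedup this gives in practice would need to be measured.

```javascript
// out[o..o+15] = a[ao..ao+15] * b[bo..bo+15] for column-major 4x4 matrices.
// The output range must not overlap either input range.
function multiplyAt(out, o, a, ao, b, bo) {
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[ao + k * 4 + row] * b[bo + col * 4 + k];
      }
      out[o + col * 4 + row] = sum;
    }
  }
}

// parents[i] is the index of node i's parent, or -1 for a root.
// local and world each hold 16 floats per node, back to back.
function updateWorldMatrices(parents, local, world) {
  for (let i = 0; i < parents.length; i++) {
    const p = parents[i];
    if (p < 0) {
      // Root node: the world matrix is just the local matrix.
      for (let j = 0; j < 16; j++) world[i * 16 + j] = local[i * 16 + j];
    } else {
      // Parents precede children, so world[p] is already up to date.
      multiplyAt(world, i * 16, world, p * 16, local, i * 16);
    }
  }
}
```

Because all nodes live in two flat buffers, the update is a single linear pass with no per-node objects to chase.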
I am building a relatively simple Three.js Application.
Outline: You move with your camera along a path through a "world". The world is all made up from simple Sprites with SpriteMaterials with transparent textures on it. The textures are basically GIF images with alpha transparency.
The whole application works fine and also the performance is quite good. I reduced the camera depth as low as possible so objects are only rendered quite close.
My problem is: I have many different "objects" (all sprites) with many different textures. I reuse the texture/material references for elements of the same type that are used multiple times in the scene (like trees and rocks).
Still, i'm getting to the point where memory usage is going up too much (above 2GB) due to all the textures used.
Now, when moving through the world, not all objects are visible/displayed from the beginning, even though I add all the sprites to the scene from the very start. Checking the console, the objects that are not visible and their textures are only loaded when "moving" further into the world, when new elements actually become visible in the frustum. Then the memory usage also gradually goes up and up.
I cannot really use "object pooling" for building the world due to its "layout", let's say.
To test, I added a function that removes objects from the scene and disposes their material.map as soon as the camera has passed them. Something like
this.env_sprites[i].material.map.dispose();
this.env_sprites[i].material.dispose();
this.env_sprites[i].geometry.dispose();
this.scene.remove(this.env_sprites[i]);
this.env_sprites.splice(i,1);
This works for garbage collection and frees up memory again. My problem is that when moving backwards with the camera, the sprites would need to be re-added to the scene and the materials/textures loaded again, which is quite heavy on performance and does not seem like the right approach to me.
Is there a known technique on how to deal with such a setup in regards to memory management and "removing" and adding objects and textures again (in the same place)?
I hope i could explain the issue well enough.
This is how the "World" looks like to give you an impression:
Each sprite individually wastes a ton of memory as your sprite imagery will probably rarely be square and because of that there will be wasted space in each sprite, plus the mipmaps which are also needed for each sprite take a lot of space. Also, on lower end devices you might hit a limit on the total number of textures you can use -- but that may or may not be a problem for you.
In any case, the best way to limit the GPU memory usage with so many distinct sprites is to use a texture atlas. That means you have at most perhaps a handful of textures, each texture contains many of your sprites, and you use distinct UV coordinates within the textures for each sprite. Even then you may still end up wasting memory over time due to fragmentation from allocating and deallocating sprites, but you would run out of memory far less quickly. If you use texture atlases you might even be able to load all sprites at the same time without having to deallocate them.
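As a rough illustration of how sprites map to UV coordinates inside one atlas, here is a very simple "shelf" packer. It is only a sketch: the function name and sprite fields are hypothetical, it assumes a square atlas with the origin at the top-left, and real packers do a much better job of minimizing wasted space.

```javascript
// Place sprites left to right in rows ("shelves") inside a square atlas and
// return the UV rectangle of each one, in the 0..1 range.
function packShelf(sprites, atlasSize) {
  let x = 0;
  let y = 0;
  let shelfHeight = 0;
  const rects = [];
  for (const s of sprites) {
    if (s.width > atlasSize || s.height > atlasSize) {
      throw new Error(`${s.name} does not fit in the atlas`);
    }
    if (x + s.width > atlasSize) {
      // The current shelf is full: start a new one below it.
      x = 0;
      y += shelfHeight;
      shelfHeight = 0;
    }
    if (y + s.height > atlasSize) {
      throw new Error("atlas is full");
    }
    rects.push({
      name: s.name,
      u0: x / atlasSize,
      v0: y / atlasSize,
      u1: (x + s.width) / atlasSize,
      v1: (y + s.height) / atlasSize,
    });
    x += s.width;
    shelfHeight = Math.max(shelfHeight, s.height);
  }
  return rects;
}
```

Depending on how your texture is loaded, the v axis may need to be flipped before the rectangles are used as sprite UVs.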
If you want to try and tackle the problem programmatically, there's a library I wrote to manage sprite texture atlases dynamically. I use it to render text myself, but you can use any canvas functions to create your images. It currently uses a basic Knapsack algorithm (which is replaceable) to manage allocation of sprites across the textures. In my case it means I need only 2 1024x1024 textures rather than 130 or so individual sprite textures of wildly varying sizes and that really saves a lot of GPU memory.
Note that for offline use there are probably better tools out there that can generate a texture atlas and generate UV coordinates for each sprite, though it should be possible to use node.js together with my library to create the textures and UV data offline which can then be used in a three.js scene online. That would be an exercise left to the reader though.
Something else you can try is to pool all your most common sprites in your "main" texture atlases, which are always loaded, and load and unload the less commonly used ones on the fly.
I have several objects, each with a priority value. The priority can range from 1 (lowest) to 200 (highest). Every value is represented by a color: the lowest value is green, rgba(0,255,0,1), and the highest is red, rgba(255,0,0,1).
I calculate the color with a classic equation in which every priority value maps to a different color. So in the end there are 200 possible colors, ranging from green (0) through yellow (100) to red (200), based on priority.
My question is: I redraw all objects on the canvas every 100ms. Is it better to calculate the color every time, or to generate an array of 200 colors ONCE in an initialization function, where array[100] is the color for an object with priority 100?
I expect there won't be a big difference, but one of those approaches must still be better.
Calculating once is the better option in almost every case (this is classically called a lookup table). Memory is cheaper than CPU cycles: consumer hardware has plenty of RAM but is always short of cycles.
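A lookup table for this case could look roughly like the sketch below. The question does not show the actual color equation, so the linear green-to-yellow-to-red ramp here is an assumption, as is the function name.

```javascript
// Build the table once at initialization: index = priority, value = CSS color.
function buildPriorityColors(maxPriority = 200) {
  const half = maxPriority / 2;
  const colors = new Array(maxPriority + 1);
  for (let p = 0; p <= maxPriority; p++) {
    // First half: green -> yellow (red ramps up).
    // Second half: yellow -> red (green ramps down).
    const r = p <= half ? Math.round((255 * p) / half) : 255;
    const g = p <= half ? 255 : Math.round((255 * (maxPriority - p)) / half);
    colors[p] = `rgba(${r},${g},0,1)`;
  }
  return colors;
}

const PRIORITY_COLORS = buildPriorityColors();
```

When redrawing, each object then only needs `PRIORITY_COLORS[object.priority]` (where `priority` is whatever field holds the value) instead of recomputing the string.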
In this case you are right, 200 colours every 100ms is insignificant even at full frame rate of 16.666...ms (60fps), but clients will have many applications/tabs/services running on the device and anything a programmer does to reduce CPU load will benefit the client.
There is also an added benefit that programmers tend to forget. CPU cycles require much more power than memory. For a single machine adding a few million cycles is nothing, but if every programmer wrote in a manner that reduced overall load, the worldwide savings in power would be considerable. I am off to hug a tree now, hope that helps.
From the question I understand that you don't want the color calculation to affect your animation. You could use a Web Worker to run the calculation on a separate thread. One more thing: read all styles at one time and write all styles at one time. Don't interleave reads and writes, because that may cause layout thrashing.
The following answer shows that matrix transform functions are faster than regular transform:
https://stackoverflow.com/a/20892177/157397
That sounds logical to me because the browser will translate transform functions into a matrix anyway.
I am writing a JS Class that renders objects using CSS. In order to be more usable for anyone, I would like to use understandable attributes (the same used in the separate transform functions):
http://codepen.io/meodai/pen/gCbrt
The thing is I would have to calculate the values for the matrix with JS. I would like to know if it is faster to let the browser handle it (like it is now), or should my class calculate a matrix? What would be faster in the end?
Well, first of all, as Seldaek says, don't try to beat the browser on performance, unless there is a real issue.
Second, most of the time, the key issue in performance is rendering, not parsing the value of the property. That would be only noticeable if you are changing your values really fast. And, in that case, probably the easy way to optimize it is to change the transform less often. And leave the smoothing to the browser, via some transition.
And third, you can hit unexpected issues. To name one, the matrices for a 0deg rotation and a 360deg rotation are the same. However, when you are at 359deg, changing to 360deg is not the same as changing to 0deg. The browser somehow preserves the rotation state here, and you will have problems handling that when working only with matrices.
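The 0deg/360deg point is easy to see if you build the matrix string yourself. The helper below is a hypothetical sketch (not taken from the question's class) that converts a 2D rotation into a CSS `matrix()` value; values are rounded to six decimals to hide floating-point noise.

```javascript
// Convert a rotation in degrees into a CSS 2D matrix(a, b, c, d, tx, ty) string.
function rotationMatrix(deg) {
  const rad = (deg * Math.PI) / 180;
  // Round to 6 decimals so that e.g. sin(2π) ≈ -2.4e-16 becomes 0.
  const f = (v) => Number(v.toFixed(6));
  const c = f(Math.cos(rad));
  const s = f(Math.sin(rad));
  return `matrix(${c}, ${s}, ${f(-s)}, ${c}, 0, 0)`;
}

// Both calls produce the identity matrix, so the "turn count" is lost.
const atZero = rotationMatrix(0);
const atFullTurn = rotationMatrix(360);
```

Once the angle has been turned into a matrix, a transition from 359deg to 360deg and one from 359deg to 0deg have the same end value, which is why the matrix form cannot express the difference.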
I would not try to beat the browser at the performance game unless you really detect an issue. It's likely they optimize more than you would already, and if not then they might in the future.
I'm investigating the possibility of producing a game using only HTML's canvas as the display media. To take an example task I need to do, I need to construct the game environment from a number of isometric tiles. Of course, working in 2D means they by necessity come in rectangular packages so there's a large overlap between tiles.
I'm old enough that the natural solution to this problem is to call BitBltMasked. Oh wait, no, an HTML canvas doesn't have something as simple and as pleasing as BitBlt. It seems that the only way to dump pixel data in to a canvas is either with drawImage() which has no useful drawing modes that ignore the alpha channel or to use ImageData objects that have the image data in an array.. to which every. access. is. bounds. checked. and. therefore. dog. slow.
OK, that's more of a rant than a question (things the W3C like tend to provoke that from me), but what I really want to know is how to draw fast to a canvas? I'm finding it very difficult to ditch the feeling that doing 100s of drawImages() a second where every draw respects the alpha channel is inherently sinful and likely to make my application perform like arse in many browsers. On the other hand, the only way to implement BitBlt proper relies heavily on a browser using a hotspot-like execution technique to make it run fast.
Is there any way to draw fast across every possible implementation, or do I just have to forget about performance?
This is a really interesting problem, and there's a few interesting things you can do to solve it.
First, you should know that drawImage can accept a Canvas, not just an image. The "sub-Canvas"es don't even need to be in the DOM. This means that you can do some compositing on one canvas, then draw it to another. This opens a whole world of optimization opportunities, especially in the context of isometric tiles.
Let's say you have an area that's 50 tiles long by 50 tiles wide (I'll say meters for the sake of my own sanity). You might divide the area into 10x10m chunks. Each chunk is represented by its own Canvas. To draw the full scene, you'd simply draw each of the chunks' Canvas objects to the main canvas that's shown to the user. If only four chunks (a 20x20m area) are visible, you would only perform four drawImage operations.
Of course, each of those individual chunks will need to render its own Canvas. On game ticks where nothing happens in the chunk, you simply don't do anything: the Canvas will remain unchanged and will be drawn as you'd expect. When something does change, you can do one of a few things depending on your game:
If your tiles extend into the third dimension (i.e.: you have a Z-axis), you can draw each "layer" of the chunk into its own Canvas and only update the layers that need to be updated. For example, if each chunk contains ten layers of depth, you'd have ten Canvas objects. If something on layer 6 was updated, you would only need to re-paint layer 6's Canvas (probably one drawImage per square meter, which would be 100), then perform one drawImage operation per layer in the chunk (ten) to re-draw the chunk's Canvas. Decreasing or increasing the chunk size may increase or decrease performance depending on the number of updates you make to the environment in your game. Further optimizations can be made to eliminate drawImage calls for obscured tiles and the like.
If you don't have a third dimension, you can simply perform one drawImage per square meter of a chunk. If two chunks are updated, that's only 200 drawImage calls per tick (plus one call per chunk visible on the screen). If your game involves very few updates, decreasing the chunk size will decrease the number of calls even further.
You can perform updates to the chunks in their own game loop. If you're using requestAnimationFrame (as you should be), you only need to paint the chunk Canvas objects to the screen. Independently, you can perform game logic in a setTimeout loop or the like. Then, each chunk could be updated in its own tick between frames without affecting performance. This can also be done in a web worker using getImageData and putImageData to send the rendered chunk back to the main thread whenever it needs to be updated, though making this work seamlessly will take a good deal of effort.
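The chunk idea could be sketched roughly like this. `ChunkCache`, `createCanvas` and `paintChunk` are names invented for the example. In a browser `createCanvas` might be `() => document.createElement('canvas')`. Chunks are placed on a plain rectangular grid here, so real isometric placement would need its own offset math.

```javascript
// Caches one off-screen canvas per chunk and repaints it only when dirty.
class ChunkCache {
  constructor(chunkPixelSize, createCanvas) {
    this.size = chunkPixelSize;
    this.createCanvas = createCanvas; // assumed factory for off-screen canvases
    this.chunks = new Map(); // "cx,cy" -> { canvas, dirty }
  }

  // Return the chunk at grid position (cx, cy), creating it on first use.
  get(cx, cy) {
    const key = `${cx},${cy}`;
    let chunk = this.chunks.get(key);
    if (!chunk) {
      const canvas = this.createCanvas();
      canvas.width = this.size;
      canvas.height = this.size;
      chunk = { canvas, dirty: true };
      this.chunks.set(key, chunk);
    }
    return chunk;
  }

  // Call this when a tile inside the chunk changes.
  markDirty(cx, cy) {
    this.get(cx, cy).dirty = true;
  }

  // Repaint dirty chunks via paintChunk(ctx, cx, cy), then copy every
  // visible chunk to the main canvas with a single drawImage each.
  draw(mainCtx, visible, paintChunk) {
    for (const [cx, cy] of visible) {
      const chunk = this.get(cx, cy);
      if (chunk.dirty) {
        const ctx = chunk.canvas.getContext("2d");
        ctx.clearRect(0, 0, this.size, this.size);
        paintChunk(ctx, cx, cy);
        chunk.dirty = false;
      }
      mainCtx.drawImage(chunk.canvas, cx * this.size, cy * this.size);
    }
  }
}
```

On ticks where nothing changed, `draw` performs only one `drawImage` per visible chunk and never calls `paintChunk`.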
The other option that you have is to use a library like pixi.js to render the scene using WebGL. Even for 2D, it will increase performance by decreasing the amount of work that the CPU needs to do and shifting that over to the GPU. I'd highly recommend checking it out.
I know that GameJS has blit operations, and I certainly assume any other html5 game libraries do as well (gameQuery, LimeJS, etc etc). I don't know if these packages have addressed the specific array-bounds-checking concern that you had, but in practice their samples seem to work plenty fast on all platforms.
You should not make assumptions about what speedups make sense. For example, the GameJS developer reports that he was going to implement dirty rectangle tracking but it turned out that modern browsers do this automatically---link.
For this reason and others, I suggest to get something working before thinking about the speed. Also, make use of drawing libraries, as the authors have presumably spent some time optimizing performance.
I have no personal knowledge about this, but you can look into the appMobi "direct canvas" HTML element which is allegedly a much faster version of normal canvas, link. I'm confused about whether this works in all browsers or just webkit browsers or just appMobi's own special browser.
Again, you should not make assumptions about what speedups make sense without a very deep knowledge of web browser internal processes. That webpage about "direct canvas" mentions a bunch of things that slow down canvas-drawing: "Reflowing text, mapping hot spots, creating indexes for reference links, on and on." Alpha-blending and array-bounds-checking are not mentioned as prominent causes of slowness!
Unfortunately, there's no way around the alpha composition overhead. Clipping may be one solution, but I doubt there would be much, if any, performance gain. Not to mention how complicated such a route would be to implement on irregular shapes.
When you have to draw the entire display, you're going to have to deal with the performance hit. Although afterwards, you have a whole screen's worth of pre-calculated alpha imagery and you can draw this image data at an offset in one drawImage call. Then, you would only have to individually draw the new tiles that are scrolled into view.
But still, the browser is having to redraw each pixel at a different location in the canvas. Which is quite expensive. It would be nice if there was a method for just scrolling pixels, but no luck there either.
One idea that comes to mind is that you could implement multiple canvases, translating each individual canvas instead of redrawing the pixels. This would allow the browser to decide how to redraw those pixels, in a more native way, at least in theory anyway. Then you could render the newly visible tiles on a new, or used/cached, canvas element. Positioning it to match up with the last screen render.
But that's just my two blits... I mean bits... duh, I mean cents :]
I've been building Conway's Life with JavaScript / jQuery in order to run it in a browser here. Chrome, Firefox, Opera and Safari do this pretty fast, so preferably don't use IE for this. IE9 is OK though.
While generating the new generations of Life I am storing the previous generations in order to be able to walk back through the history. This works fine until a certain point when memory fills up, which makes the browser(tab) crash.
So my question is: how can I detect when memory is filling up? I am storing an array for each generation in an array which forms the history of generations. This takes massive amounts of memory which crashes the browser after a few thousands of generations, depending on available memory.
I am aware of the fact that javascript can't check the amount of available memory but there must be a way...
I doubt that there is a way to do it. Even if there is, it would probably be browser-specific. I can suggest a different way, though.
Instead of storing all the data for each generation, store snapshots taken every once in a while. Since the Conway's Game of Life is deterministic, you can easily re-generate future frames from a given snapshot. You'll probably want to keep a buffer of a few frames so that you can make rewinding nice and smooth.
In reality, this doesn't actually solve the problem, since you'll run out of space eventually. However, if you store every n frames, your application will last n times longer, which might just be long enough. I would recommend that you impose some hard limits on how far into the past you can rewind so that you have a cap on how much you have to store. Determine how many frames that would be (for example, 10 minutes at 30 FPS = 18000 frames). Then divide that number by how many frames you can store (profile various web browsers to figure this out), and the result is the interval between snapshots you should use.
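A minimal sketch of the snapshot approach might look like this. The grid representation (arrays of 0/1 with dead cells beyond the edges) and the `SnapshotHistory` class are assumptions for the example, not the asker's actual code.

```javascript
// Compute the next generation on a fixed-size grid of 0/1 cells.
// Cells outside the grid are treated as dead. The input is not modified.
function step(grid) {
  const h = grid.length;
  const w = grid[0].length;
  return grid.map((row, y) =>
    row.map((cell, x) => {
      let n = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          if (dy === 0 && dx === 0) continue;
          const yy = y + dy;
          const xx = x + dx;
          if (yy >= 0 && yy < h && xx >= 0 && xx < w) n += grid[yy][xx];
        }
      }
      return n === 3 || (cell === 1 && n === 2) ? 1 : 0;
    })
  );
}

// Stores only every `interval`-th generation and rebuilds the rest on demand.
class SnapshotHistory {
  constructor(interval) {
    this.interval = interval;
    this.snapshots = [];
  }

  // Call once per generation, in order, starting at generation 0.
  record(generation, grid) {
    if (generation % this.interval === 0) {
      this.snapshots[generation / this.interval] = grid;
    }
  }

  // Rebuild a generation by stepping forward from the nearest earlier snapshot.
  restore(generation) {
    const index = Math.floor(generation / this.interval);
    let grid = this.snapshots[index];
    if (!grid) throw new Error("no snapshot for generation " + generation);
    for (let g = index * this.interval; g < generation; g++) {
      grid = step(grid);
    }
    return grid;
  }
}
```

Rewinding to an arbitrary generation then costs at most `interval - 1` calls to `step`, in exchange for storing roughly `1 / interval` of the grids.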
Dogbert pretty much nailed it. You can't know exactly how much available memory there is but you can know how potentially large your dataset will be.
So, take the size of each object stored in the array, multiply by array dimensions and that's the size of one iteration. Multiply that by the desired number of iterations to see how much space total it will take, and adjust accordingly.
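Or, inspired by Travis, simply re-run the pattern forward from the last stored array. It is deterministic after all. (Life cannot be run in reverse in general, because many different states can lead to the same next state.)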