Thursday, July 11, 2013

Damn, Voxel Data is BIG

I left off the last post with a little question about how many voxels I might need to represent a planet.  There were a few guesses of 4.7810528e+20, 9.8782084e+17 and 10^21, which at first glance seem absurdly large but in fact were pretty close to the mark.  You see, the thing about voxel data is it's big.  Very big.

Apart from the inherent inefficiency of using GPUs to render voxels with raycasting, when they would generally be much happier doing what they were designed for and rasterising triangles, it's the amount of memory voxel data sets consume that prohibits their use in many scenarios.  That's pretty much accepted wisdom and hardly newsworthy, but I thought it might be interesting to take a moment to put some cold hard numbers out there.

So, how many voxels, and therefore of course bytes of storage, do you need to render a planet?  The answer as you would expect depends on the fidelity of your representation and your method of storage, but these are my numbers for what they're worth:

To encompass a maximum data set of 25,000 Km cubed I am using 22 levels of detail each offering eight times the number of voxels of the previous one (i.e. twice the number along each axis).  The lowest detail level uses a 19^3 grid of voxel bricks each of which is 14^3 voxels in size so the entire lowest detail level provides 266^3 voxels.  Dividing the 25,000 Km region by this equates to each voxel encompassing a cubic region of space 93.98 Km on a side (830,185 Km^3).  That's pretty big!

So obviously the lowest level of detail is pretty coarse but even at that low detail level there are 266*266*266 voxels making up the space which equates to 18,821,096 individual voxels.  So that's nearly Twenty Million voxels for just one very crude detail level!

The next level down gives us 532^3 voxels each 46.99 Km on a side so we end up with over 150 Million of them there - things are scaling up pretty quickly.  Continuing this down to the maximum detail 22nd level gives us the following results:
[Table: Voxel counts, sizes and storage requirements at each level of detail]
As you can see the numbers get silly very quickly, and this is storing just a single distance field value per voxel in half precision floating point (i.e. two bytes per voxel), without any thought to storing normals or occlusion for lighting or any kind of texturing information.
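
The per-level figures can be reproduced with a few lines of arithmetic. This is only a sketch built from the numbers quoted above (25,000 Km region, 19^3 bricks of 14^3 voxels, 22 levels, two bytes per voxel); the function and constant names are made up for illustration and are not taken from the actual engine.

```python
# Sketch of the level-of-detail arithmetic, using the figures quoted in the post.
REGION_KM = 25_000.0          # side of the full cubic region
BRICKS_PER_AXIS = 19          # brick grid at the coarsest level
VOXELS_PER_BRICK_AXIS = 14    # voxels along one edge of a brick
LEVELS = 22
BYTES_PER_VOXEL = 2           # one half-precision distance value


def level_stats(level, region_km=REGION_KM):
    """Return (voxels per axis, voxel size in Km, voxel count, bytes).

    Level 0 is the coarsest; each level doubles the resolution per axis.
    """
    per_axis = BRICKS_PER_AXIS * VOXELS_PER_BRICK_AXIS * 2 ** level
    size_km = region_km / per_axis
    count = per_axis ** 3
    return per_axis, size_km, count, count * BYTES_PER_VOXEL


if __name__ == "__main__":
    for level in range(LEVELS):
        per_axis, size_km, count, n_bytes = level_stats(level)
        print(f"{level + 1:2d}  {per_axis:>11,}  {size_km * 1000:12.3f} m  "
              f"{count:.4e} voxels  {n_bytes:.4e} bytes")
```

Python integers are arbitrary precision, so the voxel counts stay exact all the way down to the finest level.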

So to put it another way, storing the full 25,000 Km sized cube of space at a maximum detail level of 4.5 cm per voxel would take 173,593,970,549,359,000,000,000,000 (173 million billion billion) voxels, taking a mind boggling 294,000 or so Zettabytes of storage (counting a Zettabyte as 2^70 bytes) just for the distance field!  Putting that astronomical number into real world terms, if you burnt it all to dual layer DVDs and stacked them on top of each other the pile would be nearly five times as high as the distance from the Earth to the edge of the Solar System!
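
For anyone wanting to check the DVD comparison, here is a rough sketch. The disc capacity (8.5 GB for a dual layer disc) and thickness (1.2 mm) are my assumptions rather than figures from the post, and which distance counts as "the edge of the Solar System" is left open.

```python
import math

VOXELS_PER_AXIS = 266 * 2 ** 21   # finest of the 22 levels
BYTES_PER_VOXEL = 2               # half-precision distance value
DVD_BYTES = 8.5e9                 # assumed dual layer capacity
DVD_THICKNESS_MM = 1.2            # assumed disc thickness


def total_bytes():
    """Bytes needed for the full cube at the finest level."""
    return VOXELS_PER_AXIS ** 3 * BYTES_PER_VOXEL


def dvd_stack_height_km(n_bytes):
    """Height in Km of a stack of discs holding n_bytes."""
    discs = math.ceil(n_bytes / DVD_BYTES)
    return discs * DVD_THICKNESS_MM / 1e6   # mm -> Km


if __name__ == "__main__":
    n = total_bytes()
    print(f"{n:.3e} bytes, stack of about {dvd_stack_height_km(n):.2e} Km")
```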

I know the cost of hard drive storage continues to decrease but I'm pretty sure I can neither afford that sort of storage nor fit it in my PC!

Let's think about it a bit more though.  Firstly, 25,000 Km is the maximum size my system can store, so let's for the sake of argument say that I'm only interested in storing something Earth sized for now.  It's not a perfect sphere, but the Earth has a generally agreed approximate radius of 6371 Km, which when plugged into my formula reduces the numbers quite substantially:
[Table: The same voxel statistics for an Earth sized cube of space]
Which while better is still pretty extreme at a cool 39,000 or so Zettabytes - but let's keep going anyway.  These numbers are for the entire cubic space, but planets are conventionally round, so with the volume of a sphere being (4/3)*PI*(Radius^3) our Earth sized planet has a volume of 1,083,206,916,845 Km^3, just 52.4% of the entire cube volume.
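
A quick sketch of that sphere-versus-cube arithmetic follows. It assumes the Earth sized cube is simply the sphere's bounding cube (side of twice the radius) and that the voxel size stays the same as in the 25,000 Km case, so storage scales with volume; the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0
FULL_REGION_KM = 25_000.0


def sphere_volume(radius_km):
    """Volume of a sphere in Km^3."""
    return 4.0 / 3.0 * math.pi * radius_km ** 3


def bounding_cube_volume(radius_km):
    """Volume of the cube that just encloses the sphere, in Km^3."""
    return (2.0 * radius_km) ** 3


def cube_scale_factor(radius_km, region_km=FULL_REGION_KM):
    """Bounding cube volume as a fraction of the full region's volume."""
    return bounding_cube_volume(radius_km) / region_km ** 3


if __name__ == "__main__":
    sphere = sphere_volume(EARTH_RADIUS_KM)
    cube = bounding_cube_volume(EARTH_RADIUS_KM)
    print(f"sphere {sphere:,.0f} Km^3 is {sphere / cube:.1%} of its cube")
    print(f"Earth cube is {cube_scale_factor(EARTH_RADIUS_KM):.1%} of the full region")
```

The sphere-to-cube ratio is PI/6 whatever the radius, which is where the 52.4% comes from.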

Before proceeding further let's define the planet a bit more accurately though; while planets are basically spheres, a perfectly smooth spherical surface isn't very interesting so we need some amount of surface detail.  Conversely though, we don't typically want to be able to travel to the planet's core, so data below a certain depth underground probably isn't needed.  Combine these two and you end up with a spherical shell of some determined thickness that defines the area of interest for rendering.  My chosen thickness is currently 18 Km (about 59,000 feet), which is approximately twice the height of Everest; I am anticipating that having roughly 10 Km available for the tallest mountains and 8 Km available for the deepest caves or underwater abysses ought to be sufficient.

Making this assumption that we are only interested in a hollow shell of data is a great optimisation for both storage and rendering performance: instead of tracing the ray through the entire cubic data set we can first intersect it with the outer sphere of the planet's data shell and start tracing from there, which removes the need to store a great many of the distance samples.  Assuming there aren't any holes in the shell you can also ignore any space inside the shell, as you're bound to hit something solid before getting there.
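
Intersecting the ray with the outer sphere is a standard ray/sphere test. The sketch below is a generic CPU-side version in Python for illustration, not the shader code the renderer actually uses; it assumes the ray direction is already normalised and the function name is hypothetical.

```python
import math


def ray_sphere_entry(origin, direction, centre, radius):
    """Distance along the ray at which tracing should start, or None on a miss.

    origin, direction and centre are (x, y, z) tuples and direction is
    assumed to be unit length. Returns 0.0 if the origin is already inside
    the sphere.
    """
    oc = tuple(o - c for o, c in zip(origin, centre))
    b = sum(a * d for a, d in zip(oc, direction))    # oc . direction
    c = sum(a * a for a in oc) - radius * radius     # |oc|^2 - r^2
    disc = b * b - c
    if disc < 0.0:
        return None                  # ray misses the sphere entirely
    root = math.sqrt(disc)
    if -b + root < 0.0:
        return None                  # sphere is wholly behind the ray
    return max(-b - root, 0.0)       # nearest hit, clamped for inside starts


if __name__ == "__main__":
    # Camera 10 units from a unit sphere, looking straight at it.
    print(ray_sphere_entry((0, 0, -10), (0, 0, 1), (0, 0, 0), 1.0))
```

Starting the distance field march at the returned distance skips all the empty space between the viewpoint and the shell.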

Some basic calculations show that for an Earth sized planet an 18 Km thick shell takes up just 0.44% of the planet's bounding cube and has a volume of about 9 billion Km^3, requiring something like 1.017E+23 voxels taking 172 ZB at two bytes per voxel.  Damn, that's still pretty large.
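
The shell numbers can be sanity checked in the same way. In this sketch I am assuming the shell runs from 8 Km below to 10 Km above the 6371 Km radius, that the voxel size is the finest level of the 25,000 Km region, and that a ZB means 2^70 bytes; the result lands close to, but not exactly on, the figures quoted, so the original calculation presumably differed slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0
SHELL_ABOVE_KM = 10.0                         # assumed room for mountains
SHELL_BELOW_KM = 8.0                          # assumed room for caves and abysses
VOXEL_SIZE_KM = 25_000.0 / (266 * 2 ** 21)    # finest level, about 4.5 cm
BYTES_PER_VOXEL = 2


def shell_volume(inner_km, outer_km):
    """Volume in Km^3 between two concentric spheres."""
    return 4.0 / 3.0 * math.pi * (outer_km ** 3 - inner_km ** 3)


def shell_stats():
    """Return (volume in Km^3, fraction of bounding cube, voxels, bytes)."""
    inner = EARTH_RADIUS_KM - SHELL_BELOW_KM
    outer = EARTH_RADIUS_KM + SHELL_ABOVE_KM
    volume = shell_volume(inner, outer)
    cube = (2.0 * EARTH_RADIUS_KM) ** 3
    voxels = volume / VOXEL_SIZE_KM ** 3
    return volume, volume / cube, voxels, voxels * BYTES_PER_VOXEL


if __name__ == "__main__":
    volume, fraction, voxels, n_bytes = shell_stats()
    print(f"{volume:.3e} Km^3, {fraction:.2%} of cube, "
          f"{voxels:.3e} voxels, {n_bytes / 2 ** 70:.0f} ZB")
```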

This estimation is pretty crude as it doesn't take account of the topology of the actual terrain, which will allow many more bricks to be discarded, and there are basic storage optimisations such as compressing the brick data.  No matter which way you look at it though, you simply can't store brick data for a whole planet at this kind of detail.

There is also a major downside to all these assumptions of course - we are restricted to sensible planet shapes with sensible features, which is a shame as being able to create crazy shaped planets you can travel through the core of sounds like a lot of fun.  So even though my efforts at the moment are focused on conventional planet shapes, I'm taking care to make sure there are no artificial restrictions in place that would preclude more radical geology.  These limitations are simply optimisations to make the data set more manageable and easier (i.e. faster) to generate while I develop the system; pushing the boundaries of what's possible is a large part of this project's raison d'être.

Although all these crazy numbers make this project sound like an exercise in futility, remember that these are for the highest 4.5cm per voxel detail level.  Each level lower takes only 1/8th the amount and of course you only need high detail data for the immediate vicinity of the viewpoint; the key therefore is to have a system that can generate brick data on demand for any given position in the clipmap hierarchy.  Combine this with a multi-tier memory and disk caching strategy and you get something usable.
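
To make the on-demand idea concrete, here is a minimal sketch of the in-memory tier only: a least-recently-used cache that calls a generator function for any brick it doesn't already hold. The class name, the `generate` callback and the (level, x, y, z) key are all hypothetical stand-ins; the real system's interfaces, and its disk tier, are not described in this post.

```python
from collections import OrderedDict


class BrickCache:
    """Keeps the most recently used bricks and generates misses on demand."""

    def __init__(self, generate, capacity):
        self._generate = generate      # hypothetical: (level, x, y, z) -> brick data
        self._capacity = capacity      # maximum number of bricks held in memory
        self._bricks = OrderedDict()   # insertion order doubles as recency order

    def get(self, level, x, y, z):
        key = (level, x, y, z)
        if key in self._bricks:
            self._bricks.move_to_end(key)       # mark as most recently used
            return self._bricks[key]
        brick = self._generate(level, x, y, z)  # cache miss: build the brick
        self._bricks[key] = brick
        if len(self._bricks) > self._capacity:
            self._bricks.popitem(last=False)    # evict the least recently used
        return brick


if __name__ == "__main__":
    cache = BrickCache(lambda level, x, y, z: f"brick {level}:{x},{y},{z}", capacity=2)
    print(cache.get(21, 0, 0, 0))
```

A disk tier would slot in between the memory lookup and the generator call, checking for a saved brick before falling back to generating one.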

Remember also that one of the benefits of clipmaps is that the memory overhead for rendering is fixed regardless of detail level.  I support ten renderable levels from the set of 22, so it makes no difference whether I'm rendering levels 0 to 9 or levels 12 to 21; the overhead is the same.
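
As a rough illustration of why the overhead is fixed, the sketch below assumes every clipmap level keeps the same 19^3 grid of 14^3 voxel bricks resident at two bytes per voxel. That grid size is quoted above for the lowest detail level; applying it to every level is my assumption, and the real renderer will have extra overhead not counted here.

```python
BRICKS_PER_AXIS = 19
VOXELS_PER_BRICK_AXIS = 14
BYTES_PER_VOXEL = 2
RENDERABLE_LEVELS = 10


def level_bytes():
    """Bytes resident for one clipmap level, whatever its detail."""
    per_axis = BRICKS_PER_AXIS * VOXELS_PER_BRICK_AXIS
    return per_axis ** 3 * BYTES_PER_VOXEL


def resident_bytes(first_level, level_count=RENDERABLE_LEVELS):
    """Total bytes for a window of levels starting at first_level.

    The level index never enters the per-level cost, which is the point:
    the grid is the same size at every level, only the region it covers
    shrinks.
    """
    return sum(level_bytes() for _ in range(first_level, first_level + level_count))


if __name__ == "__main__":
    print(resident_bytes(0), resident_bytes(12))
```

Under those assumptions both windows come to the same figure of roughly 376 million bytes.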

Finally, you may be wondering why I bother to store the bricks at all when there are some pretty cool projects out there, such as iq's Volcanic, showing what can be achieved with noise based functions directly on the GPU.  The main reason I'm sticking with bricks is that I want to be able to add all sorts of voxely infrastructure onto my planets such as roads, cities and bridges, which are hard to do dynamically in real-time, but also because I want to experiment more with creating truly heterogeneous terrains rather than the fairly homogeneous looking constructs real time noise tends to favour.  That's the goal anyway.

I realise this has been a pretty dry number-heavy post so +1 if you're still with me, hopefully the next one will be a bit more visual.


  1. Hey John,

    I think you could also consider using compression to reduce the amount of data you are really storing by a truly massive amount.

    Another thing to think about: Use procedural detail generation for areas of the planet that have yet to be modified. That is to say, use your noise functions up until the player actually goes and interacts with it. Or better yet. STILL use procedural generation algorithms, but then modify the result with the stored changes a player may have made.
    That way you get the best of both worlds.

  2. I've been thinking about this stuff along the exact same path. Though I've been considering 'fields' of data. The only data stored would be surface topology down to maybe 1m on a side. Everything else is generated on the fly through seed points. "forest starts here, with seed XXXXX" that kinda stuff. Then have generated detail based on material types and terrain seeds.

    But for everything underground and even the material itself... defined by fields. Giant boundaries that define volumes. anything inside of it is one material or another. Within that field sub fields are possible, say the area is primarily gabbro, inside of it might be concentrations of metal ores.

    All this might be 100 meters down, never looked at until someone digs down and hits gabbro, and only then does it store voxel data for the shaft dug and temporary data for the surrounding voxels (to make digging into it quick and efficient).

    This way, the important underground data is there, material type, wealth, perhaps a field defining a cave system, but ungenerated, or an underground lake, also not generated (but from the surface could allow wells to be sunk without defining the underground.)

  3. @nesetails yeah, thought about that too but didn't know anything of fields
    just told myself that there must be some way to predefine underground

  4. Hi! I have a technical (ish) question. I get that most of the difficulty when rendering voxels is how to store the data and how to load it efficiently. Now, I'm also a bit obsessed with the idea of procedural generation of planets and other celestial bodies, and I wonder if we couldn't avoid the problem altogether by describing the planet completely by equations.

