Tuesday, February 15, 2011

Generating (a lot of) Data

In my previous post I introduced the Osiris project that I’ve started working on and outlined the basic construction system for the planet I’m trying to procedurally build.  With that set up, the next step was to create a system for generating and storing the multitude of data that is required to represent an entire planet.

Now planets are pretty large things and the diversity and quantity of data that is required to represent one at even fairly low fidelity gets very large very quickly.  A requirement of this system though is that I want to be able to run my demo and immediately fly down near to the surface to see what’s there – I don’t want to have to sit waiting for minutes or hours while it churns away in the background building everything up.

Atmospheric scattering shader and starfield skybox from orbit
For this to work I obviously need some form of asynchronous data generation system that can run in the background spitting out bits and pieces of data as quickly as possible while the main foreground thread is dealing with the user interface, camera movement and most importantly rendering the view.

This fits quite well with modern CPUs where the number of logical cores and hardware threads is continuing to rise providing increasing scope for such background operations, but that does also mean that the data generation system needs to be able to run on an arbitrary number of threads rather than just a single background one.  An added bonus of such scalability is that time can even be stolen from the primary rendering thread when not much else is going on – for example when the view is stationary or when the application is minimised.

Another view of the atmospheric scattering shader and starfield skybox
Ultimately this work should be able to be farmed off to secondary PCs in some form of distributed computing system or even out into the cloud – but to support that data generation has to be completely decoupled from the rendering and able to operate in isolation.  Even if such distribution never happens though designing in such separation and isolation is still a valuable architectural design goal.

So I need to be able to generate data in the background, but to achieve my interactive experience I also need it to be generating the correct data in the background, which in this case means that at any given moment I want it to be generating data for the most significant features that are closest to the viewpoint.  This determination of what to generate also needs to be highly dynamic as the viewpoint can move around very quickly – thousands or even tens of thousands of miles per hour at times – so it’s no good queuing up thousands of jobs, the current set of what’s required needs to be generated and maintained on the fly.

Specular reflection on the water and lens flare visible
Finally as generation of data can be a non-trivial process the system needs to be able to cache data it’s already generated on disk for rapid reloading on subsequent runs or even for later on in the same run if the in-memory data had to be flushed to keep the total memory footprint down.  I can’t simply cache everything however as for an entire planet the amount of data for the level of fidelity I want to reproduce can easily run into terabytes so it’s important to only cache up to a realistic point – say a few gigabytes worth – with the rest being always generated on demand.

To maximise the effectiveness of disk caching I also want to include compression in the caching system – the computation overhead of a standard compression library such as zlib shouldn’t be exorbitantly expensive compared to the potentially gigabytes of saved disk space.

This is quite a shopping list of requirements of course, which brings home the unavoidable complexity of generating high fidelity data on a planetary scale, but even non-optimal solutions to these primary requirements should allow me to build on top of such a generic data generation system and start to look at the planetary infrastructure generation and simulation work that I am primarily interested in.

Another view of the atmospheric scattering shader and starfield skybox
Unfortunately of course data generation architecture lends itself only so well to pretty pictures so rather than some dull boxes and lines representation of data flow the images with this post show the atmospheric scattering shader that I’ve also recently added – it’s probably the single biggest improvement in both visual impact and fidelity and suddenly makes the terrain look like a planet rather than just a textured ball – more on this atmospheric shader in a later post.