## Adaptive Multisampling for Area Lights

I recently added a first pass at area lights in the ray tracer. They are rather limited at the moment, as only point lights are supported and their shape is fixed as a sphere of user-controllable radius r. The sampling is only an approximation and non-uniform, but the results are good enough for now:

### Basic Lighting Equation and Shadow Term

The basic lighting equation we’re using is something like this:

$$I = I_A + \sum S\cdot (I_D + I_S)$$

where $S$ is our shadow term. Prior to adding area lights, the $S$ term was a boolean value that determined whether a light should or shouldn’t contribute light (i.e. energy) to the surface (i.e. “add color” to it). This is basic Computer Graphics 101 Shadows – nothing special.

With area lights, $S$ now represents a “coverage” value from 0.0 to 1.0 representing an approximation of what percentage of the light surface is visible at the point being illuminated. That value is then used to scale the energy of the light passed into the rest of the equation.

### The Pseudo-Code

Here’s the basic algorithm being used:

• If the light radius is zero (i.e. a true point light, not an area light), return the classic boolean 0/1 value for the shadow term
• else…
  • Create a disc (circular region in 3-space) around the light, oriented toward the intersection point
  • Sample at the disc center and at N points around the circumference
  • If all those samples yield the same value (i.e. 0 or 1), then return that value
  • else…
    • Take many more samples at random locations on the disc representing the light surface and use the mean (i.e. average) value as the shadow term

### The Code

A goal of LxEngine is to keep the code base as self-explanatory as possible, so hopefully the code largely speaks for itself:

```cpp
float _shadowTerm (const point_light_f& light, const intersection3f& intersection)
{
    if (light.castsShadows)    // (assumed flag; this condition was lost in the original listing)
    {
        const float baseTerm = _shadowTermSingle(light.position, intersection);

        //
        // Check if the light is an area light; if so, take multiple samples
        //
        if (light.radius > 0.0f)
        {
            //
            // Compute a 3-space disc about the light oriented toward the intersection point
            //
            const vector3f L    (normalize(light.position - intersection.positionWc));
            const disc3f   disc (light.position, L, light.radius);

            auto sampler = [this, &intersection](const glgeom::point3f& pt) -> float
            {
                // Evaluate visibility toward this point on the light surface
                return _shadowTermSingle(pt, intersection);
            };

            //
            // Take several samples along the circumference of the disc to get
            // some sort of guess at the variance.  If the light is neither
            // completely visible nor completely obscured, then generate far
            // more samples using a random distribution on the disc to come up
            // with an estimate as to what percentage of the disc is visible
            // from the intersection point.
            //
            const size_t kInitial = 6;
            const size_t kFull    = 512;
            const float  kEpsilon = 1e-4f;

            const float term  = sample_disc_circumference<float>(disc, kInitial, sampler);
            const float value = glm::mix(baseTerm, term, 1.0f / float(kInitial + 1));
            if (value > 1.0f - kEpsilon || value < kEpsilon)
                return value;
            else
                return sample_disc_random<float>(disc, kFull, lx0::random_unit, sampler);
        }
        else
            return baseTerm;
    }
    else
        return 1.0f;
}
```

### Code for Sampling the Circumference

Why do we sample the circumference? The assumption, which certainly isn’t true in the most general case, is that if the light is partially obscured, one of the boundary points on the light surface will most likely have a different shadow term than some other point on the boundary (or the surface center). Again, that’s not mathematically correct, but we’re assuming it’s accurate often enough for the kind of data sets we’re dealing with…

#### Sampling Along the Circumference of a Disc

GLGeom provides the functions we need to easily generate a set of samples along the circumference:

```cpp
template <typename T>
T
sample_disc_circumference (
    const glgeom::disc3t<T>&                     disc,
    size_t                                       samples,
    std::function<T (const glgeom::point3t<T>&)> sampleFunc)
{
    auto offsets = perpendicular_circular_set(disc.normal, samples);

    auto sum = T(0);
    for (auto it = offsets.begin(); it != offsets.end(); ++it)
    {
        sum += sampleFunc(disc.origin + disc.radius * (*it));
    }
    return sum / T(samples);
}
```

…which in turn uses a function to generate a set of vectors perpendicular to a base vector…

#### Generating a Set of Equally-Spaced Vectors Perpendicular to a Base Vector

```cpp
template <typename T>
std::vector<vector3t<T>>
perpendicular_circular_set (const vector3t<T>& w, int N)
{
    typedef vector3t<T> vector3;
    typedef T           scalar;

    vector3 u, v;
    perpendicular_axes_smooth(w, u, v);

    std::vector<vector3t<T>> results;
    results.reserve(N);

    scalar step = glgeom::two_pi().value / N;
    for (int i = 0; i < N; ++i)
    {
        scalar ang = step * scalar(i);
        scalar x = cos(ang);
        scalar y = sin(ang);

        vector3 p = x * u + y * v;
        results.push_back(p);
    }
    return results;
}
```

…which in turn uses a function to generate an arbitrary, but consistent and “continuous”, perpendicular vector from the base…

#### Generating a Continuously-Defined, Arbitrary Basis About a Vector

```cpp
template <typename T>
vector3t<T>
perpendicular_axis_smooth (const vector3t<T>& w)
{
    vector3t<T> sum;

    auto q = abs(w);
    sum += (T(1) - q.x) * cross_with_x(w);
    sum += (T(1) - q.y) * cross_with_y(w);
    sum += (T(1) - q.z) * cross_with_z(w);

    return normalize(sum);
}

template <typename T>
void
perpendicular_axes_smooth (const vector3t<T>& w, vector3t<T>& u, vector3t<T>& v)
{
    u = perpendicular_axis_smooth(w);
    v = normalize(cross( normalize(w), u ));
}
```
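To make the property concrete: each cross product with a coordinate axis is perpendicular to w, so any weighted sum of them is too, and weighting by the distance from each axis keeps the result varying smoothly as w changes. Here is a standalone version of the same blending idea using a minimal stand-in vector type (not the GLGeom types):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  operator+ (Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3  operator* (float s, Vec3 v) { return { s * v.x, s * v.y, s * v.z }; }
static float dot (Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross (Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static Vec3  normalize (Vec3 v)
{
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Blend the cross products of w with each coordinate axis, weighting each
// by how far w is from that axis.  Every term is perpendicular to w, so
// the normalized sum is as well.
Vec3 perpendicularAxisSmooth (Vec3 w)
{
    Vec3 sum { 0, 0, 0 };
    sum = sum + (1.0f - std::fabs(w.x)) * cross(w, Vec3{1, 0, 0});
    sum = sum + (1.0f - std::fabs(w.y)) * cross(w, Vec3{0, 1, 0});
    sum = sum + (1.0f - std::fabs(w.z)) * cross(w, Vec3{0, 0, 1});
    return normalize(sum);
}
```

Taking a second cross product with w then completes an orthonormal basis, exactly as perpendicular_axes_smooth does above.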

…and lastly we have the case where we want to randomly sample from the disc…

#### Randomly Sampling from a Disc

One point worth noting: this is not a uniform sampling from the disc. A uniform sampling would mean that, given an infinite number of samples, the same proportion of samples would fall in any given area of the disc as in any other same-sized area of the disc.

Assuming our randomFunc below returns uniformly distributed values in $[0,1)$, the function below is clearly not uniform across the disc: the area inside radius $r$ grows with $r^2$, while the sampled radius has a uniform, linear distribution, so samples cluster toward the center.

Uniform sampling from the disc is being saved for another day. One thing at a time.
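(For reference, the usual area-uniform scheme differs only in the radius distribution: drawing the radius as $R\sqrt{u}$ compensates for the $r^2$ growth in area. A sketch, using a small hand-rolled generator in place of lx0::random_unit so it stands alone:)

```cpp
#include <cmath>
#include <cstdint>

// Tiny deterministic LCG standing in for lx0::random_unit; returns [0,1).
struct Lcg
{
    std::uint32_t state = 12345u;
    float next ()
    {
        state = state * 1664525u + 1013904223u;
        return (state >> 8) / 16777216.0f;   // top 24 bits -> [0,1)
    }
};

// Uniform-by-area sample on a disc of the given radius: taking the square
// root of the uniform variate compensates for the fact that the area
// inside radius r grows as r^2.
void sampleDiscUniform (Lcg& rng, float radius, float& x, float& y)
{
    const float kTwoPi = 6.28318530718f;
    float angle = kTwoPi * rng.next();
    float dist  = radius * std::sqrt(rng.next());
    x = dist * std::cos(angle);
    y = dist * std::sin(angle);
}
```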

```cpp
template <typename T>
T
sample_disc_random (
    const glgeom::disc3t<T>&                     disc,
    size_t                                       samples,
    std::function<T ()>                          randomFunc,
    std::function<T (const glgeom::point3t<T>&)> sampleFunc)
{
    // Create a basis from the normal direction
    glgeom::vector3t<T> u, v;
    perpendicular_axes_smooth(disc.normal, u, v);

    //
    // Sample from within the disc
    //
    T sum = T(0);
    for (size_t i = 0; i < samples; ++i)
    {
        // Generate a random point within the disc (a random angle plus a
        // linearly-distributed random radius - intentionally non-uniform
        // by area), then transform to 3-space
        const T angle = glgeom::two_pi().value * randomFunc();
        const T dist  = disc.radius * randomFunc();
        glm::detail::tvec2<T> offsetDisc (dist * cos(angle), dist * sin(angle));

        const glgeom::vector3t<T> offsetWs = u * offsetDisc.x + v * offsetDisc.y;

        sum += sampleFunc(disc.origin + offsetWs);
    }
    return sum / T(samples);
}
```

Written by arthur

September 13th, 2011 at 3:43 pm

Posted in lxengine

## Bethesda Softworks’ Morrowind

For various reasons, one of them being to test LxEngine with “real” data, I’ve been experimenting with loading and displaying the game data from Bethesda Softworks’ 2002 game, The Elder Scrolls III: Morrowind (buy it here on Steam). There’s a fair amount of information out there about the Morrowind file formats, as it is a highly moddable game. I’ve been using NifTools to parse the actual models and custom code for the ESM/BSA parsing (neither is a very complicated format).

The primary purpose of the project has been to test out LxEngine with dated, but production-quality, data and data formats. The experiment thus far has been serving its purpose. It has raised questions like, “Hey, what should the engine do when the current cell has 17 lights and the current shader only supports 8 at a time?” The LxEngine project has hardly been lacking in TODOs, but in any case, this is helping identify the necessities versus the niceties.

A secondary purpose of the project is to learn a bit more about how Morrowind works, so that, potentially as a side effect of working toward my own goals, I can produce some useful contribution to the OpenMW project. (This project certainly isn’t meant to compete with OpenMW; the goals here are to demo some basic rendering, physics, sound, etc. from Morrowind to test out LxEngine’s capabilities. The goal of the OpenMW project is to produce a fully playable game with full fidelity to the original.)

As for the current progress, here are some screenshots:

Morrowind game data rendered (with obvious limitations!) with LxEngine

An actual, in-game screenshot from Bethesda Softworks' Morrowind of that same scene

### Update: Texture Mapping

Adding texture mapping involved a couple core changes:

• Adding UVs and texture samplers to the LxEngine GLSL shader builder. This is less complex than some of the existing features of the shader builder, but hadn’t yet been added.  The support is somewhat minimal and will require revisiting for multi-texturing.
• Adding DDS texture format support to the GL Rasterizer. DDS stands for DirectDraw Surface, a Microsoft DirectX format that also carries some strange patent issues, which seems to bode poorly, but video cards usually handle this format natively. There’s an EXT_texture_compression_s3tc OpenGL extension that allows DDS format data to be passed more or less directly to the card. There’s a simple nVidia tutorial showing how to do this.
• Passing DDS streams from within a BSA understood by the TES3 loader to the LxEngine Rasterizer which knows nothing about Morrowind format data.  This was the fun one – which actually still requires a bit of work – abstracting the LxEngine rasterizer from the texture data source in a flexible, general way that both (a) allows the Rasterizer to know nothing about BSA files while the BSA loader knows nothing about OpenGL and (b) still streams the data directly from a disk-based std::istream to OpenGL without any superfluous copies.    The Rasterizer now allows textures to be created with a “type” and “acquire callback”.  In this case, the type is a stream and the callback is over in the TES3 loader: the only shared concept necessary is the std::istream.
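One concrete detail of that path: S3TC data is stored as 4×4 texel blocks (8 bytes per block for DXT1, 16 for DXT3/DXT5), so the byte size handed to glCompressedTexImage2D for each mip level can be computed directly from the dimensions. A sketch (the function name is mine, not LxEngine’s):

```cpp
#include <cstddef>

// Byte size of one mip level of S3TC-compressed data: the image is stored
// as 4x4 texel blocks, 8 bytes per block for DXT1 and 16 bytes per block
// for DXT3/DXT5.  Dimensions round up to whole blocks, so even a 1x1 mip
// occupies a full block.  This is the size argument passed along with the
// raw stream data to glCompressedTexImage2D.
std::size_t s3tcLevelSize (std::size_t width, std::size_t height, bool dxt1)
{
    const std::size_t blockBytes = dxt1 ? 8 : 16;
    const std::size_t blocksX = (width  + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * blockBytes;
}
```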

Ever wonder what the Morrowind UV mapping looked like?

And after a couple bug fixes (like, ehem, remembering to open the binary DDS stream in std::ios::binary mode)…

Morrowind cell rendered with LxEngine

Next, I need to add multisampling support to the renderer: these screenshots would look so much better with it enabled!

### Update: Multisampling

Multisampling…

16x Multisampling

Written by arthur

July 13th, 2011 at 6:15 pm

Posted in lxengine

## SuperPixel

A new feature of adaptive multisampling has been added. The adaptive multisampling code currently works by taking four samples per pixel, measuring the delta between the largest and the smallest sampled color values, and if that delta exceeds a fixed threshold, eight more samples are taken.

The code has also been refactored such that the sampling mechanism is a pluggable interface. The adaptive multisampling is one such implementation. Other implementations include a standard one-sample-per-pixel sampler, a fixed four-sample grid per pixel, and N samples jittered about the pixel.

The adaptive implementation works by changing the internal sample class from a simple RGB float tuple to a SuperPixel class. In this context, “super pixel” refers to a pixel with more data than the standard single color plus depth information. For the particular implementation here, the additional data is straightforward. Each super pixel tracks the sum of the sampled floating-point RGB values, integer count of the number of samples, as well as the maximum and minimum value of all samples thus far. As each sample is recorded, the values of the super pixel are updated accordingly.

The sampling interface is simply a loop where the sampler is asked for a sample location, the sample is taken, and then the sampler is asked if another sample is needed. Using this design, the adaptive sampler is quite straightforward. After the fourth sample, it checks the delta between the minimum and maximum samples. If the value is below the threshold, it tells the render loop to move on to the next pixel; otherwise it queues up eight more samples.

The code looks like this:

```cpp
SuperPixel spixel;
spixel.setCenter( frustum.cellCenter(x, y, width, height) );

while (!sampler->done(spixel))
{
    vec3f target;
    sampler->generate(spixel, target);
    ...
```

### First-Pass Rasterization

The current multisampling approach takes a minimum of four samples per pixel to get some measure of the color variance at that pixel. It would be useful to instead take one sample per pixel, but check the variance against neighboring pixels. Theoretically, there’s no difference between this and a regular grid sampling done at 1/4th resolution. In reality, this requires some architectural changes to the code as it is currently written.

With the above in mind, it would also be interesting to explore a fast first-pass hardware rasterization of the image. The rasterization could track the depth, the surface normal, and the material identifier. That information would likely give a good indication of shading discontinuity without even having to run a shading algorithm. Tracking directly by color would likely work, but for accuracy the rasterization shading algorithm would require fidelity with the raytraced shading algorithm, which could turn into a time-consuming maintenance issue.

Of course, the same could be said of the geometric representation: the tessellated sphere representation must match the raytraced parametric representation to avoid inaccuracies.

The gap between a coarsely tessellated and a finely tessellated sphere is shown below in the light red arcs. All those pixels would provide inaccurate first-pass information in the rasterization pass. This is very significant since one of the intentions of the first pass would be to correctly identify object boundaries for additional multisampling.

On the other hand, a first-pass hardware rasterization done with attention to accuracy could likely build in multiple advanced acceleration techniques to get ray-traced-quality results faster. For example, basic visibility tests and occlusion culling could rapidly create potentially visible sets for rectangular segments of the viewport. More obviously, it could be used as an instantaneous draft-quality preview of the scene to be rendered.

Written by arthur

January 24th, 2010 at 9:55 pm