Cube Maps

With basic texturing in place, I obviously got tempted to play around a bit more…

First, I added support for cube maps to the rasterizer, as I’ve always been eager to go back to these and dynamically generate them in 3D (i.e. a rotating spherical model of procedurally generated planets with nice atmospheres, etc.).  Nothing too tricky about basic cube map support. The big difference is simply calling glTexImage2D() six times (once per face) rather than the usual one, and generating a uvw texture coordinate rather than the usual simple uv.  For basic spherical mapping, the uvw is effectively the vector from the center of the object to the point on its surface.
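A minimal sketch of that uvw generation (the types and names here are my own, not LxEngine’s): the lookup coordinate is just the normalized center-to-point vector, and the GPU picks the cube face from its largest-magnitude component.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Cube map lookup coordinate for basic spherical mapping: the normalized
// direction from the object's center to the surface point.  The hardware
// selects the face from the largest-magnitude component of this vector.
Vec3 computeUvw(const Vec3& center, const Vec3& point)
{
    Vec3 d { point.x - center.x, point.y - center.y, point.z - center.z };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return Vec3 { d.x / len, d.y / len, d.z / len };
}
```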

Second, I added support for generating cube maps – in other words, creating the six tiles, converting each tile’s texels to coordinates in 3-space, and then calling a callback to generate a color for that point in space.  With that done, I could then easily plug in the noise and pattern functions from GLGeom to generate some seamless, not-too-distorted spherical textures.
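The tile-to-3-space conversion can be sketched roughly as below (again, names and types are mine, not LxEngine’s; the face orientations follow the usual OpenGL cube map convention):

```cpp
#include <cmath>
#include <vector>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b; };

Vec3 normalize(const Vec3& v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{ v.x / len, v.y / len, v.z / len };
}

// Convert a texel on one of the six cube faces to a direction in 3-space.
// face: 0..5 = +X,-X,+Y,-Y,+Z,-Z; u,v in [0,1].
Vec3 faceTexelToDirection(int face, float u, float v)
{
    float s = 2.0f * u - 1.0f;   // remap [0,1] to [-1,1]
    float t = 2.0f * v - 1.0f;
    switch (face)
    {
    case 0:  return Vec3{  1.0f, -t, -s };   // +X
    case 1:  return Vec3{ -1.0f, -t,  s };   // -X
    case 2:  return Vec3{  s,  1.0f,  t };   // +Y
    case 3:  return Vec3{  s, -1.0f, -t };   // -Y
    case 4:  return Vec3{  s, -t,  1.0f };   // +Z
    default: return Vec3{ -s, -t, -1.0f };   // -Z
    }
}

// Generate one face tile by evaluating a color callback at the 3-space
// direction of each texel center.
template <typename ColorFn>
std::vector<Color> generateFace(int face, int size, ColorFn colorAt)
{
    std::vector<Color> tile(size * size);
    for (int y = 0; y < size; ++y)
        for (int x = 0; x < size; ++x)
        {
            float u = (x + 0.5f) / size;
            float v = (y + 0.5f) / size;
            tile[y * size + x] = colorAt(normalize(faceTexelToDirection(face, u, v)));
        }
    return tile;
}
```

Because neighboring texels across a face boundary map to nearby directions on the sphere, any callback that is continuous in 3-space (like the GLGeom noise functions) automatically produces a seamless cube map.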

Spot procedural distorted by noise rendered to a cube map

The next step is to use something like the above to generate clouds, terrain, and water layers and start making some planets…

Update

Here’s a quick image of a phong-shaded, colored noise map with a separate noise map for the specular highlight:

Written by arthur

August 6th, 2011 at 5:53 pm

Posted in lxengine


Migrating to OpenGL 3.2 Core Profile

I’ve been moving LxEngine from using OpenGL 3.2 Compatibility Profile to the Core Profile. That means saying goodbye to various bits of functionality I had been using in the interest of cleaner, more concise (and forward-moving) OpenGL API usage.

Here’s a list of the minor items that had to be corrected:

GL_QUADS

This was the first issue, as LxEngine currently uses quad lists rather than triangle lists for much of its geometry – and quads are no longer a valid primitive type in the Core Profile.

The fix is straightforward: split each quad into two triangles.  Now where to do this?  This is done in the rasterizer right before the glBindBuffer() calls.  Why?  Because, apparently (I have no verification of this), modern drivers actually split quads into triangles before uploading to the card – so effectively doing the translation myself before the glBindBuffer() call should be more or less equivalent in terms of speed.  This allows any data higher up in the application to retain quad format data for a bit less memory usage.  If the higher level data wants to avoid the translation, it can use the triangle-based APIs.  Thus quads in LxEngine have become a compression technique…
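The index translation itself is a few lines; a sketch of what that quad-to-triangle split might look like (the function name is mine):

```cpp
#include <cstdint>
#include <vector>

// Expand a quad index list (4 indices per face) into a triangle index
// list (6 indices per face): quad (a,b,c,d) -> triangles (a,b,c), (a,c,d).
// Done right before the glBindBuffer() call, so higher-level data can
// stay in the more compact quad format.
std::vector<std::uint32_t>
quadsToTriangles(const std::vector<std::uint32_t>& quads)
{
    std::vector<std::uint32_t> tris;
    tris.reserve(quads.size() / 4 * 6);
    for (std::size_t i = 0; i + 3 < quads.size(); i += 4)
    {
        std::uint32_t a = quads[i + 0];
        std::uint32_t b = quads[i + 1];
        std::uint32_t c = quads[i + 2];
        std::uint32_t d = quads[i + 3];
        tris.insert(tris.end(), { a, b, c, a, c, d });
    }
    return tris;
}
```

Note the split shares the quad’s diagonal (a,c); for non-planar quads the choice of diagonal changes the shading, but for a soup of roughly planar quads either diagonal works.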

In terms of memory usage, for an index-buffer based quad list, the index buffer grows to 3/2 its original size (i.e. 6 indices describe the quad instead of 4).  Any associated face data obviously doubles in size.  However, the vertex data itself does not increase at all, since it is an index-based approach.   For an unindexed primitive, the vertex arrays all grow to 3/2 their original size, or 150%, as every 4 vertices now become 6.   At this point, LxEngine assumes a soup of unconnected quads as input, so no attempt is made to stripify the triangles to recover some of that memory loss.

glMatrixMode

Short answer: store the matrices in a software stack, set them as GLSL uniforms before the draw call, and use GLM to do the dirty work of the math for you.

gl_ModelViewMatrix, gl_ProjectionMatrix, and gl_NormalMatrix are gone.  How is this fixed?

Add uniform mat4 and mat3 variables to cover for them.  Presumably the model-view and projection matrices are directly available within your code, so simply set the uniforms to those values instead of making a glLoadMatrix call.

Now, if you’re also using functions like glRotate() and/or glPushMatrix(), then simply move the model-view and projection matrices to a software stack within your application.  Right before making any draw call, set the uniform variables from the current tops of those stacks.
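The stack mechanics are trivial; here’s a minimal sketch (the matrix type is a placeholder 4×4 float array – in real code this would be glm::mat4, and current() would be fed to glUniformMatrix4fv() right before the draw call):

```cpp
#include <array>
#include <vector>

// Placeholder 4x4 matrix, column-major as OpenGL expects.
using Mat4 = std::array<float, 16>;

const Mat4 kIdentity = { 1,0,0,0,  0,1,0,0,  0,0,1,0,  0,0,0,1 };

// Software replacement for glPushMatrix()/glPopMatrix().
class MatrixStack
{
public:
    MatrixStack() : m_stack{ kIdentity } {}

    const Mat4& current() const { return m_stack.back(); }
    Mat4&       current()       { return m_stack.back(); }

    void push() { m_stack.push_back(m_stack.back()); }  // duplicate the top
    void pop()  { if (m_stack.size() > 1) m_stack.pop_back(); }

private:
    std::vector<Mat4> m_stack;
};
```

The multiply-by-rotation/translation operations that glRotate() and friends used to perform then become GLM calls applied to current().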

If you want to avoid writing all the math to do a rotation like glRotate(), then I suggest using GLM.

Oh, and gl_NormalMatrix is the inverse transpose of the upper 3×3 of the model-view matrix, so simply compute that using GLM via some code like this:

// Requires <glm/gtc/matrix_inverse.hpp> for inverseTranspose
// and <glm/gtc/type_ptr.hpp> for value_ptr
glm::mat3 normalMatrix = glm::mat3(glm::inverseTranspose(modelViewMatrix));
GLint idx = glGetUniformLocation(progId, "myNormalMatrix");
glUniformMatrix3fv(idx, 1, GL_FALSE, glm::value_ptr(normalMatrix));

glAlphaFunc

Not much to say about this one: check the color.a in the fragment shader and use the discard keyword to fail the alpha test. Use uniforms if you want a non-hard coded test. (LxEngine only uses it for masking, therefore a fixed test works fine.)

GL_CLAMP

Replace GL_CLAMP with GL_CLAMP_TO_EDGE, most likely.  See http://www.khronos.org/opengles/documentation/opengles1_0/html/glTexParameter.html

layout(triangles)

The GL_GEOMETRY_INPUT_TYPE and GL_GEOMETRY_OUTPUT_TYPE are now set in the geometry shader itself using the GLSL keyword “layout”. See http://www.lighthouse3d.com/tutorials/glsl-core-tutorial/geometry-shader/.

Written by arthur

June 26th, 2011 at 5:01 pm

Posted in lxengine

Rendering Voxels

The image below is not in fact of a checker procedural material, but rather is a first image from some work towards adding voxel rendering to LxEngine.  As version 0, it’s more or less simply rendering an array of cubes.

A single cell of 16x16x4 voxels

Here’s a rough change list:

• Added support for solid colored materials (i.e. no lighting, just color)
• Added support for specifying the light set and camera to use for a particular pass (previously these had to be specified per item being rendered)
• Improved the OpenGL error checking in the rasterizer a bit

Update: 2011.06.25

Procedurally generated voxel "world"

The sample has been updated to support multiple voxel cells with a procedurally generated (noise-based) height function.  The rendering has been rewritten to create a single mesh for each cell which – as anticipated – is orders of magnitude faster than the original naive voxel-by-voxel rendering algorithm.

Building the mesh…

The algorithm for building the mesh is quite simple:

• For each voxel in the cell (i.e. each of the 16x16x4 blocks)…
• Check if the block above is empty or there is no block (i.e. the block is on the top boundary); if so add a quad for the top of the block to the vertex buffer
• Repeat the prior step for the other five sides of the block

The entire cell is treated as a single vertex buffer.  This means one draw call for all 16x16x4 blocks, rather than 1024 draw calls.  Graphics hardware operates far more efficiently on large batches; thus, the single draw call for the whole cell takes roughly as much time as each of the 1024 individual calls took when drawing block-by-block.  (Combining the buffer, of course, means the same shader and parameters need to be used for the entire buffer.  The solution for handling textured blocks – not yet implemented – will be to use a texture atlas containing all the textures so that texture parameters do not have to be reset.)
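The neighbor-culling part of the mesh build can be sketched as below; rather than emitting actual vertex data, this version just counts the quads that would be emitted (the Cell layout here is illustrative, not LxEngine’s actual data structure):

```cpp
#include <vector>

// A cell of voxels; solid(x,y,z) is the occupancy test, with anything
// outside the cell bounds counting as empty.
struct Cell
{
    int nx, ny, nz;
    std::vector<bool> blocks;  // occupancy, x-major order

    bool solid(int x, int y, int z) const
    {
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz)
            return false;
        return blocks[(z * ny + y) * nx + x];
    }
};

// For each solid block, emit (here: count) a quad only for faces that
// border empty space or the cell boundary; fully interior faces are
// implicitly discarded.
int countVisibleFaces(const Cell& c)
{
    static const int dirs[6][3] = {
        { 1,0,0 }, { -1,0,0 }, { 0,1,0 }, { 0,-1,0 }, { 0,0,1 }, { 0,0,-1 }
    };
    int faces = 0;
    for (int z = 0; z < c.nz; ++z)
        for (int y = 0; y < c.ny; ++y)
            for (int x = 0; x < c.nx; ++x)
            {
                if (!c.solid(x, y, z))
                    continue;
                for (const auto& d : dirs)
                    if (!c.solid(x + d[0], y + d[1], z + d[2]))
                        ++faces;   // exposed face: add a quad to the mesh
            }
    return faces;
}
```

For a completely solid 16x16x4 cell this yields 2·(16·16) + 2·(16·4) + 2·(16·4) = 768 faces, matching the figures discussed below.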

The second advantage is that all the interior faces of a cell get discarded implicitly.  In the naive algorithm, every block was drawn, regardless of whether it was completely obscured.  Due to the grid nature of the cell, checking neighbors is sufficient to see whether a particular face can ever be visible.  Since “most” blocks in the anticipated data sets are fully interior, this potentially trims out a sizeable chunk of obscured vertices and faces.  In a “solid” cell of 16x16x4 blocks, there are potentially 24,576 vertices and 6,144 faces (assuming each face requires a unique set of vertex parameters for each of its vertices); with the interior faces stripped out, this becomes 768 faces with 3,072 vertices.  That’s 12.5% of the original number of vertices and faces.  If vertices can be shared between faces, the number of vertices drops to 768 as well, or 3.1% of the original count.

Written by arthur

June 24th, 2011 at 12:01 pm

Posted in lxengine

To experiment with core OpenGL 3.2, I created a new simple renderer.  Before jumping into the rendering code however, I wanted to build up a solid architectural base for the renderer.  Among other things, a solid architecture means supporting multiple threads from the start.

An OpenGL 3.2 triangle: impressive, eh?

The eventual target would be a more complex hierarchy of thread interactions.  Each parent thread acts largely as a control thread for communicating with its children and its own parent.  The theoretical model below might apply to a rendering system with multiple displays (both rasterized and ray-traced) with multiple plug-ins for concurrent control over the underlying graphics database (local UI and web-based).

Starting simple though, there are basic premises that must be kept in mind:

Thread shutdown is a request, not a synchronous command
One thread should not directly and instantaneously shut down another thread.  The other thread may be anywhere in its execution, and the shutdown must occur at a well-defined point to ensure everything can be cleaned up properly.  In short: the request can happen at any point in time; the shutdown itself must happen at a well-defined point in the execution.

This is a fairly fundamental threading issue.  The threading design should leverage the operating system’s ability to suspend the thread and resume it only when it has work to do.  Busy waiting should be avoided.

As a basic design principle, the code related to threading management should be as independent of the thread’s core work as possible.  The reason is simply to reduce complexity.  As a consequence, since I’m working on Windows, this also keeps the platform-dependent threading code isolated from the platform-independent code (e.g. the OpenGL rendering code).

The Pattern
The generic pattern for the event based threads is as follows:

init
do
    wait for event
    switch (event)
        case event1:  handle1();
        case event2:  handle2();
        …
while not exit event
shutdown

The Implementation
This was implemented on Windows, where the pattern above is remarkably simple to realize.  The wait code translates to a call to WaitForMultipleObjects().  Events are set up with simple calls to CreateEvent() and triggered with SetEvent().  Note that the Windows API exposes a lot of options with events; this obscures the fact that, when the advanced options are not needed, the API is actually quite straightforward.

For the view thread, as mentioned above, it has to handle windows messages as well as any events sent to the thread.   Fortunately, the Windows API has a function that specifically blocks until either an event or a windows message is available:  MsgWaitForMultipleObjects().  Therefore setting up the view thread to respond to either a windows message or a single quit event (as sent from the console) is straightforward:

virtual void run()
{
    init();

    bool bQuit = false;
    do
    {
        // Suspend the thread until an event or windows message is available
        HANDLE hArray[1] = { m_hExitEvent };
        DWORD ret = ::MsgWaitForMultipleObjects(1, &hArray[0],
            FALSE, INFINITE, QS_ALLINPUT);
        switch (ret)
        {
        case WAIT_OBJECT_0:
            bQuit = true;
            break;

        // A return value of 1 more than the number of handles
        // passed in implies a windows message is available.
        case WAIT_OBJECT_0 + 1:
            bQuit = m_windowHost.processEvents();
            break;
        };

    } while (!bQuit);

    shutdown();
}

The shutdown process as a whole works as follows: the view/console thread calls a method on the Application singleton, which calls SetEvent() on the quit handle for each thread in the app.  The main start-up thread then simply sits and waits for all the threads to terminate (using WaitForSingleObject() on each thread handle):

void Application::requestExit()
{
    // Signal the quit event for each thread in the app; each thread
    // notices the event at its next wait point and shuts itself down.
    // (The member names below are illustrative.)
    for (auto& spThread : m_threads)
        ::SetEvent(spThread->exitEvent());
}
….

int Application::run()
{
    // The main start-up thread simply waits for all threads to terminate
    for (auto& spThread : m_threads)
        ::WaitForSingleObject(spThread->threadHandle(), INFINITE);

    return 0;
}

int main (int argc, char** argv)
{
Application app;
return app.run();
}


Written by arthur

February 27th, 2010 at 10:51 am

Posted in lxengine
