Archive for the ‘opengl’ tag
Cube Maps
With basic texturing in place, I obviously got tempted to play around a bit more…
First, I added support for cube maps to the rasterizer, as I’ve always been anxious to go back to these and dynamically generate them in 3D (e.g. a rotating spherical model of a procedurally generated planet with a nice atmosphere). Nothing too tricky about basic cube map support. The big difference is simply calling glTexImage2D() six times (once per face) rather than the usual once, and generating a uvw texture coordinate rather than the usual simple uv. For basic spherical mapping, the uvw is effectively the vector from the center of the object to the point on the object.
Second, I added support for generating cube maps: in other words, creating the six tiles, converting each texel to a coordinate in 3-space, and then calling a callback to generate a color for that point in space. With that done, I could easily plug in the noise and pattern functions from GLGeom to generate some seamless, not-too-distorted spherical textures.
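As a rough sketch of that generation step (not the actual LxEngine code), the six tiles can be filled by mapping each texel to a direction and handing that direction to a callback. The names Vec3, Rgb, cubeFaceDirection and generateCubeMap are hypothetical, and the face ordering is assumed to follow OpenGL’s GL_TEXTURE_CUBE_MAP_POSITIVE_X + i convention:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };
struct Rgb  { std::uint8_t r, g, b; };

// Map a face index (0..5 = +X, -X, +Y, -Y, +Z, -Z) and face coordinates
// (s, t) in [-1, 1] to a unit direction, using OpenGL's cube map layout.
Vec3 cubeFaceDirection(int face, float s, float t)
{
    assert(face >= 0 && face < 6);
    Vec3 d{ 0.0f, 0.0f, 0.0f };
    switch (face) {
    case 0: d = {  1.0f, -t, -s }; break;  // +X
    case 1: d = { -1.0f, -t,  s }; break;  // -X
    case 2: d = {  s,  1.0f,  t }; break;  // +Y
    case 3: d = {  s, -1.0f, -t }; break;  // -Y
    case 4: d = {  s, -t,  1.0f }; break;  // +Z
    case 5: d = { -s, -t, -1.0f }; break;  // -Z
    }
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { d.x / len, d.y / len, d.z / len };
}

// Build six size x size tiles by calling colorAt() once per texel center.
std::array<std::vector<Rgb>, 6> generateCubeMap(
    int size, const std::function<Rgb(const Vec3&)>& colorAt)
{
    assert(size > 0);
    std::array<std::vector<Rgb>, 6> faces;
    for (int face = 0; face < 6; ++face) {
        faces[face].reserve(static_cast<std::size_t>(size) * size);
        for (int y = 0; y < size; ++y) {
            for (int x = 0; x < size; ++x) {
                // Sample at the texel center, remapped to [-1, 1].
                const float s = 2.0f * (x + 0.5f) / size - 1.0f;
                const float t = 2.0f * (y + 0.5f) / size - 1.0f;
                faces[face].push_back(colorAt(cubeFaceDirection(face, s, t)));
            }
        }
    }
    return faces;
}
```

Each of the six resulting tiles would then be uploaded with its own glTexImage2D() call. Because the callback only ever sees a point on the unit sphere, a 3D noise function plugged in here is seamless across face boundaries.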
The next step is to use something like the above to generate clouds, terrain, and water layers and start making some planets…
Update
Here’s a quick image of a phong-shaded, colored noise map with a separate noise map for the specular highlight:
Migrating to OpenGL 3.2 Core Profile
I’ve been moving LxEngine from using OpenGL 3.2 Compatibility Profile to the Core Profile. That means saying goodbye to various bits of functionality I had been using in the interest of cleaner, more concise (and forward-moving) OpenGL API usage.
Here’s a list of the minor items that had to be corrected:
GL_QUADS
Short answer: split your quads into triangles and duplicate your data as needed
This was the first issue as LxEngine currently uses quad lists rather than triangle lists for much of its geometry.
The fix is straightforward: split each quad into two triangles. Now where to do this? This is done in the rasterizer right before the glBindBuffer() calls. Why? Because, apparently (I have no verification of this), modern drivers actually split quads into triangles before uploading to the card – so effectively doing the translation myself before the glBindBuffer() call should be more or less equivalent in terms of speed. This allows any data higher up in the application to retain quad format data for a bit less memory usage. If the higher level data wants to avoid the translation, it can use the triangle-based APIs. Thus quads in LxEngine have become a compression technique…
In terms of memory usage, for an index-buffer based quad list, the index buffer grows by a factor of 3/2 (i.e. 6 indices describe the quad instead of 4). Any associated per-face data obviously doubles in size. However, the vertex data itself does not grow at all since it is an index-based approach. For an unindexed primitive, the vertex arrays all grow by a factor of 3/2 (to 150% of the original size), as every 4 vertices again become 6. At this point, LxEngine is assuming a soup of unconnected quads as input, so there’s no attempt made to stripify the triangles to recover some of that memory loss.
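For the indexed case, the split can be sketched as below. This is a minimal illustration, not the LxEngine rasterizer code; the function name quadsToTriangles is hypothetical, and it assumes each quad (a, b, c, d) is convex with consistent winding so that it can be cut along the a-c diagonal:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a quad-list index buffer into a triangle-list index buffer.
// Each quad (a, b, c, d) becomes the triangles (a, b, c) and (a, c, d),
// so the output holds 6 indices for every 4 in the input.
std::vector<std::uint32_t> quadsToTriangles(const std::vector<std::uint32_t>& quads)
{
    assert(quads.size() % 4 == 0);
    std::vector<std::uint32_t> tris;
    tris.reserve(quads.size() / 4 * 6);
    for (std::size_t i = 0; i < quads.size(); i += 4) {
        const std::uint32_t a = quads[i];
        const std::uint32_t b = quads[i + 1];
        const std::uint32_t c = quads[i + 2];
        const std::uint32_t d = quads[i + 3];
        tris.insert(tris.end(), { a, b, c, a, c, d });
    }
    return tris;
}
```

The vertex buffer is untouched; only the index buffer is rewritten before it is handed to OpenGL and drawn with GL_TRIANGLES.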
glMatrixMode
Short answer: store the matrices in a software stack, set them as GLSL uniforms before the draw call, use GLM to do the dirty-work of the math for you
gl_ModelViewMatrix, gl_ProjectionMatrix, and gl_NormalMatrix are gone. How is this fixed?
Add uniform mat4 and mat3 variables to cover for them. Presumably the model-view and projection matrices are directly available within your code, so simply set the uniforms to those values instead of making a glLoadMatrix call.
Now, if you’re also using functions like glRotate() and/or glPushMatrix() then simply move the model-view and projection matrices to a software stack within your application. Right before making any draw calls, set the uniform variable to the current top of the stack for those values.
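A minimal sketch of such a software stack is shown below. To stay self-contained it uses a plain column-major std::array instead of glm::mat4, and the names Mat4, MatrixStack, identity, translation and multiply are all hypothetical; in practice the GLM types and functions would replace the hand-written math:

```cpp
#include <array>
#include <cassert>
#include <vector>

// Column-major 4x4 matrix, laid out the way glUniformMatrix4fv() expects
// when its transpose argument is GL_FALSE.
using Mat4 = std::array<float, 16>;

Mat4 identity()
{
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z)
{
    Mat4 m = identity();
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

// r = a * b, with element (row, col) stored at index col * 4 + row.
Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

// Stand-in for glPushMatrix() / glPopMatrix() / glMultMatrix().
class MatrixStack {
public:
    MatrixStack() { m_stack.push_back(identity()); }

    void push() { m_stack.push_back(m_stack.back()); }
    void pop()  { assert(m_stack.size() > 1); m_stack.pop_back(); }
    void mult(const Mat4& m) { m_stack.back() = multiply(m_stack.back(), m); }

    // Right before a draw call, upload this with something like
    // glUniformMatrix4fv(location, 1, GL_FALSE, stack.top().data()).
    const Mat4& top() const { return m_stack.back(); }

private:
    std::vector<Mat4> m_stack;
};
```

One stack for the model-view matrix and one for the projection matrix mirrors the old GL_MODELVIEW / GL_PROJECTION split.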
If you want to avoid writing all the math to do a rotation like glRotate(), then I suggest using GLM.
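Oh, and gl_NormalMatrix is the inverse transpose of the upper 3×3 of the model-view matrix, so simply compute that using GLM (glm::inverseTranspose() is declared in <glm/gtc/matrix_inverse.hpp>) via some code like this:
// modelViewMatrix is the current combined view * model matrix
glm::mat3 normalMatrix = glm::mat3(glm::inverseTranspose(modelViewMatrix));
GLint idx = glGetUniformLocation(progId, "myNormalMatrix");
glUniformMatrix3fv(idx, 1, GL_FALSE, glm::value_ptr(normalMatrix));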
glAlphaFunc
Not much to say about this one: check color.a in the fragment shader and use the discard keyword to fail the alpha test. Use uniforms if you want a test that is not hard-coded. (LxEngine only uses it for masking, so a fixed test works fine.)
GL_CLAMP
Replace GL_CLAMP with GL_CLAMP_TO_EDGE, most likely. See http://www.khronos.org/opengles/documentation/opengles1_0/html/glTexParameter.html
layout(triangles)
The GL_GEOMETRY_INPUT_TYPE and GL_GEOMETRY_OUTPUT_TYPE are now set in the geometry shader itself using the GLSL keyword “layout”. See http://www.lighthouse3d.com/tutorials/glsl-core-tutorial/geometry-shader/.
Rendering Voxels
The image below is not in fact of a checker procedural material, but rather is a first image from some work towards adding voxel rendering to LxEngine. As version 0, it’s more or less simply rendering an array of cubes.
Here’s a rough change list:
- Added support for solid colored materials (i.e. no lighting, just color)
- Added support for specifying the light set and camera to use for a particular pass (previously this had to be specified per item being rendered)
- Improved the OpenGL error checking in the rasterizer a bit
Update: 2011.06.25
The sample has been updated to support multiple voxel cells with a procedurally generated (noise-based) height function. The rendering has been rewritten to create a single mesh for each cell which – as anticipated – is orders of magnitude faster than the original naive voxel-by-voxel rendering algorithm.
Building the mesh…
The algorithm for building the mesh is quite simple:
- For each voxel in the cell (i.e. each of the 16x16x4 blocks)…
- Check if the block above is empty or there is no block (i.e. the block is on the top boundary); if so add a quad for the top of the block to the vertex buffer
- Repeat the prior step for the other five sides of the block
This has two major advantages:
The first is that the entire cell is treated as a single vertex buffer. This means one draw call for all 16x16x4 blocks, rather than 1024 draw calls. Graphics hardware operates far more efficiently on large batches; thus, the single draw call for the whole cell takes roughly as much time as each of the 1024 calls took when drawing each block individually. (Combining the buffer, of course, means the same shader and parameters need to be used for the entire buffer. The solution to handle textured blocks, not yet implemented, will be to use a texture atlas containing all the textures so that texture parameters do not have to be reset.)
The second advantage is that all the interior blocks of a cell get discarded implicitly. In the naive algorithm, every block was drawn, regardless of whether it was completely obscured or not. Due to the grid nature of the cell, checking neighbors is sufficient to see if a particular face is never visible. Since “most” blocks in the anticipated data sets are fully interior, this potentially trims out a sizeable chunk of obscured vertices and faces. In a “solid” cell of 16x16x4 blocks, there are potentially 6,144 faces (1024 blocks × 6) and 24,576 vertices (assuming each face requires a unique set of vertex parameters for each of its 4 vertices); with the interior faces stripped out, this becomes 768 faces with 3,072 vertices. That’s 12.5% of the number of vertices and faces. If vertices can be shared between faces, the number of vertices drops further to 770 (the number of grid points on the surface of the cell), or roughly 3.1% of the original count.
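The neighbor check and the face counts above can be sketched as follows. This is an illustrative reduction of the algorithm, not the LxEngine mesh builder: VoxelCell, Face and buildVisibleFaces are hypothetical names, and instead of writing four vertices per quad into a vertex buffer it only records which faces would be emitted:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct VoxelCell {
    int sx, sy, sz;
    std::vector<bool> solid;  // sx * sy * sz entries, x varies fastest

    // Anything outside the cell counts as empty, so boundary faces are kept.
    bool isSolid(int x, int y, int z) const
    {
        if (x < 0 || y < 0 || z < 0 || x >= sx || y >= sy || z >= sz)
            return false;
        return solid[(static_cast<std::size_t>(z) * sy + y) * sx + x];
    }
};

// One visible quad: the block position plus which of its six sides (0..5).
struct Face { int x, y, z, dir; };

std::vector<Face> buildVisibleFaces(const VoxelCell& cell)
{
    assert(cell.solid.size() ==
           static_cast<std::size_t>(cell.sx) * cell.sy * cell.sz);
    static const int kDirs[6][3] = {
        { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 },
        { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 } };

    std::vector<Face> faces;
    for (int z = 0; z < cell.sz; ++z)
        for (int y = 0; y < cell.sy; ++y)
            for (int x = 0; x < cell.sx; ++x) {
                if (!cell.isSolid(x, y, z))
                    continue;
                for (int d = 0; d < 6; ++d) {
                    // Emit a face only when the neighbor on that side is empty.
                    if (!cell.isSolid(x + kDirs[d][0], y + kDirs[d][1], z + kDirs[d][2]))
                        faces.push_back({ x, y, z, d });
                }
            }
    return faces;
}
```

For a completely solid 16x16x4 cell this yields 768 faces, matching the count above: 2 × (16×16) for top and bottom plus 4 × (16×4) for the sides.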
Event-based Thread Pattern
To experiment with core OpenGL 3.2, I created a new simple renderer. Before jumping into the rendering code however, I wanted to build up a solid architectural base for the renderer. Among other things, a solid architecture means supporting multiple threads from the start.
The first goal, in short, was to support a view thread responsible for the display window (including both the OpenGL rendering and windows messages like key presses and mouse clicks) and a separate thread for a basic text console window (for input and output of debugging commands). Since the view thread is not a pure rendering thread and handles windows messages, it needs to process events such as the window close button being clicked. Likewise, the console thread has a “quit” command to shut down the application. The point is that the threads are largely designed as “siblings” rather than having a parent-child relationship: the view thread can shut down itself and the console thread, and vice versa. A good initial challenge therefore was to ensure the threads can each shut down the whole application (i.e. their own thread and any sibling threads) in a clean, correct manner.
The eventual target would be a more complex hierarchy of thread interactions. Each parent thread acts largely as a control thread for communicating with its children and its own parent. The theoretical model below might apply to a rendering system with multiple displays (both rasterized and ray-traced) with multiple plug-ins for concurrent control over the underlying graphics database (local UI and web-based).
Starting simple though, there are basic premises that must be kept in mind:
Thread shutdown is a request, not a synchronous command
One thread should not directly and instantaneously shut down another thread. The other thread may be anywhere in its execution. The request can happen at any point in time, but the shutdown itself must happen at a well-defined point in the execution to ensure everything can be cleaned up properly.
Busy waiting is bad
This is a fairly fundamental threading issue. The threading design should leverage the operating system’s ability to suspend the thread and resume it only when it has work to do. Busy waiting should be avoided.
The thread management should be encapsulated from the thread’s task
As a basic design principle, the code related to threading management should be as independent of the thread’s core work as possible. The reason is simply to reduce complexity. As a consequence, since I’m working on Windows, this also keeps the platform-dependent threading code isolated from the platform-independent code (e.g. the OpenGL rendering code).
The Pattern
The generic pattern for the event based threads is as follows:
init
do
    wait for event
    switch event
        case event1: handle1();
        case event2: handle2();
        …
while not exit event
shutdown
end thread
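The same pattern can be sketched in portable C++ with a std::condition_variable standing in for the operating system’s wait call. This is an illustration of the pattern only, not the Win32 implementation described next; the EventThread class, the Event enum and the counter are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

enum class Event { Work, Exit };

class EventThread {
public:
    void start() { m_thread = std::thread([this] { run(); }); }

    // Posting is a request: the thread reacts when it next waits for events.
    void post(Event e)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_events.push(e);
        }
        m_cv.notify_one();
    }

    void join() { assert(m_thread.joinable()); m_thread.join(); }

    // Only meaningful after join() has returned.
    int workHandled() const { return m_workHandled; }

private:
    void run()
    {
        // init would go here
        bool quit = false;
        while (!quit) {
            Event e;
            {
                // Sleep until an event arrives; no busy waiting.
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return !m_events.empty(); });
                e = m_events.front();
                m_events.pop();
            }
            switch (e) {
            case Event::Work: ++m_workHandled; break;
            case Event::Exit: quit = true;     break;
            }
        }
        // shutdown would go here, at a well-defined point in the loop
    }

    std::thread m_thread;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<Event> m_events;
    int m_workHandled = 0;
};
```

Any sibling thread can call post(Event::Exit) on every EventThread in the application, and the start-up thread then only needs to join() each one.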
The Implementation
This was implemented on Windows, where the pattern above is remarkably simple to implement. The wait code translates to a call to WaitForMultipleObjects(). Events are set up with simple calls to CreateEvent() and triggered with SetEvent(). Note that the Windows API exposes a lot of options with events; this obscures the fact that, when the advanced options are not needed, the API is actually quite straightforward.
The view thread, as mentioned above, has to handle windows messages as well as any events sent to the thread. Fortunately, the Windows APIs have a function that specifically blocks until there is either an event or a windows message available: MsgWaitForMultipleObjects(). Therefore, setting up the view thread to respond to either a windows message or a single quit event (as sent from the console) is straightforward:
{
    init();
    bool bQuit = false;
    do
    {
        // Suspend the thread until an event or windows message is available
        HANDLE hArray[1] = { m_hExitEvent };
        DWORD ret = ::MsgWaitForMultipleObjects(1, &hArray[0],
                                                FALSE, INFINITE, QS_ALLINPUT);
        switch (ret)
        {
        // The exit event (handle index 0) was signaled
        case WAIT_OBJECT_0:
            bQuit = true;
            break;
        // A return value of WAIT_OBJECT_0 plus the number of handles
        // passed in implies a windows message is available.
        case WAIT_OBJECT_0 + 1:
            bQuit = m_windowHost.processEvents();
            break;
        // WAIT_FAILED or anything unexpected: exit rather than spin
        default:
            bQuit = true;
            break;
        }
    } while (!bQuit);
    shutdown();
}
The shutdown process as a whole works as follows: the view or console thread calls a method on the Application singleton, which calls SetEvent() on the quit handle for each thread in the app. The main start-up thread therefore simply sits and waits for all the threads to terminate (using WaitForSingleObject() on each thread handle):
{
    m_spViewThread->signalExit();
    m_spConsoleThread->signalExit();
}
….
int Application::run()
{
    m_spConsoleThread->start();
    m_spViewThread->start();
    m_spViewThread->join();
    m_spConsoleThread->join();
    return 0;
}
int main (int argc, char** argv)
{
    Application app;
    return app.run();
}
References
- OpenGL 3.2 tutorials and information used to write the renderer: http://www.gamedev.net/community/forums/topic.asp?topic_id=544126
- Simple Win32 tutorial for creating a thread: http://www.relisoft.com/Win32/active.html







