Archive for the ‘ray tracer’ tag
I obviously must enjoy writing basic ray tracers…
It has plenty of flaws still at this point, but the sphere on the right has bump mapping working. The interesting part is that the bump map can be specified via anything that returns a height: a 2D texture, a 2D procedural, a constant (not that that would be useful!), or anything that can be plugged into the shader graph and produces a scalar output. This does make computing the tangent space and the derivative of the function a bit more of a challenge, but I’ll save that for some later date. It’s a first pass at bump mapping at this point.
An image from the LxEngine ray tracer:
Yup, I know – it really does need area lights and soft shadows.
The priorities of LxEngine have always been to produce simple, general, and correct code rather than pursuing optimal performance. This is arguably a justifiable set of priorities given that LxEngine is a long-term hobby project with a single developer, aimed at experimenting with a variety of different technologies. In any case, I did spend a bit of time looking into the ray tracer’s performance and was able to improve it significantly via a few relatively simple code changes.
I also squeezed out another 18% or so of improvement via compiler flags alone. I’m going to confine this post to simply mentioning the effects of a few of those Visual Studio 2010 compiler optimization flags.
Let’s start with what’s most interesting: the results. These are averaged times from 6 runs of a sample scene with a few lights and a couple reflective objects at 1024×1024 resolution.
2,464 ms /arch:SSE2 /fp:fast
2,619 ms /fp:fast
2,956 ms /arch:SSE2
3,038 ms /fp:precise (default)
The slowest results came from the default compiler settings: no /arch flag set and the /fp:precise floating point model. In this case, the x87 floating point stack is used, with most calculations taking place in 80-bit precision on the FPU before the results are copied back to memory.
The first change I attempted was enabling /arch:SSE2, which tells the Visual Studio compiler (cl.exe) that it may use Streaming SIMD Extensions 2. With a ray tracer there are surely plenty of optimization opportunities for single instruction, multiple data (SIMD) instructions, and I wondered how much the compiler alone, without any code modifications, could take advantage of this. The result was, surprisingly, not much at all. The times for this single benchmark were only about 2.7% faster. I didn’t expect the compiler to be able to rework the ray tracer functions into fully parallel computations, but I figured that the presence of eight extra XMM registers alone would have a more significant impact. Lesson: make benchmarks, not assumptions, when optimizing.
I then looked at the generated assembly code and found the CVTPS2PD instruction used quite frequently. Huh? Convert single precision to double precision? Why is it doing that? All my data is in single-precision, 32-bit floating point form. Why are conversions happening? The reason was the /fp:precise flag. Even if the final results are single-precision, the intermediate calculations were being done in double precision to retain as much accuracy as possible across multiple floating point operations.
When I turned on the /fp:fast flag, the generated assembly became much more straightforward, as the CVTPS2PD instructions all disappeared. The benchmark also yielded noticeably faster results (18.9% faster), which is quite significant given that no coding effort was required to get this speedup. Now, of course, it’s important to note that both the /arch:SSE2 and /fp:fast flags change the behavior of the code. The XMM registers used by the /arch:SSE2 flag – even in /fp:precise mode – operate with at most 64 bits of precision, and with /fp:fast enabled that is reduced to 32 bits. The original code used the 80-bit FPU representation. A change from 80 bits to 32 bits is non-trivial. I haven’t done any analysis of the actual effect of the precision change, but it does need to be considered.
The final result, unfortunately, was the mysterious one. I could see how enabling SSE2 wouldn’t have much of an effect if the code was constantly converting between single and double precision; that explained my surprise at the minimal effect of /arch:SSE2 without /fp:fast. But to test out all the possibilities, I also enabled /fp:fast without making any SIMD instructions or registers available. The result was nearly as fast as using the SSE2 registers and instructions at 32 bits of precision. Huh? I haven’t dug through the assembly comparing /fp:precise and /fp:fast without SSE2 enabled, so at this point I’m simply very surprised. Rather than spout an untested, unresearched theory on what the compiler must be doing in this case to save so much time, I’ll just leave it at that: I’m surprised.
Added basic reflection to the ray-tracer sample:
Reflection is added by providing an optional std::function.
Another minor step forward… It’ll be significantly more interesting when support for reflection in the rasterizer is added.
Here’s a link to the XML scene file that created it.
The script support is added completely independently of the ray tracing code. This is the way it should be and, it turns out, that’s exactly the way it is too.
spDocument->attachComponent("ray",       create_raytracer());
spDocument->attachComponent("scripting", create_scripting());
The object created by create_scripting() is quite simple and mostly just C++ boilerplate to create a class and respond to Document changes.
(3) There is no three. The ray tracing code itself has no knowledge of whether the Element was created via a script or was part of the initial XML. In MVC terms, the ray tracing code just works since it is properly abstracted from the Model changes.
It was really cool to have the LxEngine architecture surprise me (i.e. the guy designing it and constantly setting unrealistically lofty goals for how it should all work) with how seriously easy it was to add a useful feature like scripting.