Archive for the ‘ray tracer’ tag

Ray Tracer Update

without comments

Further progress on the Javascript ray tracer…

The patterns library, written in LxLang and translated to Javascript, is now in use, along with a new mappers library that is also written in LxLang (and translated dynamically until it is developed further).

Written by arthur

January 4th, 2012 at 7:25 pm

Ray Tracing (Yet Again)

without comments

I obviously must enjoy writing basic ray tracers…

My recent realization has been that developing in Javascript – or more precisely in a browser-enabled language – makes a lot of sense for me at this point.  For my goals, immediately demonstrable results and a flexible, dynamic language for experimentation are more useful than highly optimized offline compilation.  At least for now.

So, I’m working on rewriting the LxEngine ray-tracer and ShaderBuilder in Javascript.  I’m rather excited about the possibilities that a dynamic language will open.  In any case, it’s still at a very early stage but I’m excited by the rapid progression:

Written by arthur

December 27th, 2011 at 7:30 pm

Bump mapping: First pass

without comments

It still has plenty of flaws at this point, but the sphere on the right has bump mapping working. The interesting part is that the bump map can be specified via anything that returns a height: a 2d texture, a 2d procedural, a constant (not that that would be useful!), or anything that can be plugged into the shader graph that produces a scalar output. This does make the computation of the tangent space and the derivative of the function a bit more of a challenge, but I’ll save that for some later date. It’s a first pass at bump mapping at this point.
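As a rough illustration of the idea (not the LxEngine shader graph code), a height source can be modeled as any callable that takes (u, v) and returns a scalar. Because the callable is a black box, the sketch below estimates its derivative with central differences and uses that to tilt the normal. The names `Vec3`, `HeightFn` and `bumpNormal`, the tangent-space convention, and the step size are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// Any source of height values: a texture lookup, a procedural, a constant...
// (hypothetical type, introduced only for this sketch)
using HeightFn = std::function<float(float u, float v)>;

// Returns the perturbed normal in tangent space, where the unperturbed
// normal is (0, 0, 1). The gradient of the height field is estimated with
// central differences since nothing is known about the function itself.
inline Vec3 bumpNormal(const HeightFn& height, float u, float v,
                       float strength = 1.0f, float eps = 1e-3f) {
    float dhdu = (height(u + eps, v) - height(u - eps, v)) / (2.0f * eps);
    float dhdv = (height(u, v + eps) - height(u, v - eps)) / (2.0f * eps);
    return normalize({ -strength * dhdu, -strength * dhdv, 1.0f });
}
```

A constant height function leaves the normal at (0, 0, 1), which is why a constant is a legal but useless bump map. A height that rises linearly in u tilts the normal away from the +u direction.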

Written by arthur

September 12th, 2011 at 1:04 pm

Where are we?

without comments

An image from the LxEngine ray tracer:

Things that aren’t necessarily obvious from the image: the two textures used in the image are procedurally generated from Javascript (a 2d texture and a cube map), the checker is a true procedural, 9 regular grid samples per pixel, multithreaded ray tracing over 8 threads, SSE2 enabled, 1024×1024 final resolution, trace depth limit 5, 17 seconds render time on a Core i7 laptop.

Yup, I know – it really does need area lights and soft shadows.
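The “9 regular grid samples per pixel” mentioned above amounts to 3×3 supersampling. Here is a minimal sketch, assuming a hypothetical `TraceFn` callback that returns a grayscale value for a continuous image-plane position; the actual ray tracer interface is not shown in this post.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback: returns a grayscale value for a point on the
// image plane given in continuous pixel coordinates.
using TraceFn = std::function<float(float px, float py)>;

// Averages gridSize x gridSize samples placed at the cell centers of a
// regular sub-pixel grid; gridSize = 3 gives 9 samples per pixel.
inline float samplePixel(const TraceFn& trace, int x, int y, int gridSize = 3) {
    assert(gridSize > 0);
    float sum = 0.0f;
    for (int j = 0; j < gridSize; ++j) {
        for (int i = 0; i < gridSize; ++i) {
            float px = x + (i + 0.5f) / gridSize;
            float py = y + (j + 0.5f) / gridSize;
            sum += trace(px, py);
        }
    }
    return sum / (gridSize * gridSize);
}
```

A regular grid is the simplest choice. Jittered or stratified sampling would trade the grid’s aliasing patterns for noise, but that is beyond what this sketch covers.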

VS2010 Optimization Flags

without comments

The priorities of LxEngine have always been to produce simple, general, and correct code rather than pursuing optimal performance. This is arguably a justifiable set of priorities given that LxEngine is a long-term hobby project with a single developer aimed at experimenting with a variety of different technologies. In any case, I did spend a bit of time looking into the ray tracer’s performance and was able to improve it significantly via a few relatively simple code changes.

I also eked out another 18% or so of improvement via mere compiler flags. I’m going to confine this post to simply mentioning the effects of a few of those Visual Studio 2010 compiler optimization flags.

The Results

Let’s start with what’s most interesting: the results. These are averaged times from 6 runs of a sample scene with a few lights and a couple reflective objects at 1024×1024 resolution.

2,464 ms     /arch:SSE2 /fp:fast

2,619 ms     /fp:fast

2,956 ms     /arch:SSE2

3,038 ms     /fp:precise (default)
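The percentages quoted in the commentary that follows can be reproduced from these timings. The helper below just makes the arithmetic explicit; its name is introduced here purely for illustration.

```cpp
#include <cassert>

// Percentage reduction in render time relative to a baseline time.
inline double percentFaster(double baselineMs, double candidateMs) {
    assert(baselineMs > 0.0);
    return 100.0 * (baselineMs - candidateMs) / baselineMs;
}
```

Against the 3,038 ms default, `percentFaster(3038, 2956)` is about 2.7 (/arch:SSE2 alone), `percentFaster(3038, 2619)` is about 13.8 (/fp:fast alone), and `percentFaster(3038, 2464)` is about 18.9 (both flags together).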

 

Comments

The slowest results came from the default compiler settings, which leave the /arch flag unset and use the precise floating-point model (/fp:precise). In this case, the x87 floating-point stack is used, with most calculations taking place in 80-bit precision on the FPU before being copied back to memory.

/arch:SSE2

The first change I attempted was enabling /arch:SSE2, which tells the Visual Studio compiler (cl.exe) that it should use Streaming SIMD Extensions 2. With a ray tracer there are surely plenty of optimization opportunities for single instruction multiple data (SIMD) instructions and I wondered how much the compiler alone, without any code modifications, could take advantage of this. The result was, surprisingly, not much at all. The times for this single benchmark were only about 2.7% faster. I didn’t expect the compiler to be able to rework the ray tracer functions into fully parallel computations, but I figured that the presence of eight extra XMM registers alone would have a more significant impact. Lesson: make benchmarks, not assumptions, when optimizing.

I then looked at the generated assembly code and found the CVTPS2PD instruction used quite frequently. Huh? Convert single precision to double precision? Why is it doing that? All my data is in single-precision, 32-bit floating point form. Why are conversions happening? The reason was the /fp:precise flag. Even if the final results are single-precision, the intermediate calculations were being done in double-precision to retain as much precision as possible during multiple floating point operations.
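To see why the width of the intermediates matters, here is a small sketch comparing the same two additions with 32-bit and 64-bit intermediates. It is only an illustration of the rounding behavior under IEEE 754 round-to-nearest, not a reconstruction of what the compiler emitted; the function names are made up for the example.

```cpp
#include <cassert>

// Every intermediate result is rounded to a 32-bit float. The volatile
// temporary forces that rounding even if the compiler would otherwise
// keep a wider value in a register.
inline float addTwiceSingle(float a) {
    volatile float t = a + 1.0f;
    t = t + 1.0f;
    return t;
}

// Intermediates are kept in double and only the final result is rounded
// back to float, similar in spirit to the float-to-double conversions
// seen in the generated assembly.
inline float addTwiceDouble(float a) {
    double t = static_cast<double>(a) + 1.0;
    t = t + 1.0;
    return static_cast<float>(t);
}
```

With `a = 16777216.0f` (2^24), the single-precision version returns 16777216 because each `+ 1.0f` is rounded away, while the double-precision version returns 16777218.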

/fp:fast

When I turned on the /fp:fast flag, the generated assembly became much more straightforward as the CVTPS2PD instructions all disappeared. The benchmark also yielded noticeably faster results (18.9% faster than the default settings, with both flags enabled), which is quite significant given no coding effort was required to get this speed improvement. Now, of course, it’s important to note that both the /arch:SSE2 and /fp:fast flags do change the behavior of the code. The XMM registers used by the /arch:SSE2 flag – even in /fp:precise mode – still operate with at most 64 bits of precision and, with /fp:fast enabled, that is reduced to 32 bits of precision. In the initial code, the 80-bit FPU representation was used. A change from 80 bits to 32 bits is non-trivial. I haven’t done any analysis of the actual effect of the precision change, but it does need to be considered.

The final result was unfortunately the mysterious one. I could imagine how enabling SSE2 didn’t have much of an effect if the code was constantly converting from single to double precision; that explained my surprise at the minimal effect of /arch:SSE2 without /fp:fast. But, to test all the possibilities, I also enabled /fp:fast without any SIMD instructions or registers made available. The result was nearly as fast as using the SSE2 registers and instructions at 32 bits of precision. Huh? I haven’t dug through the assembly comparing /fp:precise and /fp:fast without SSE2 enabled so, at this point, I’m simply very surprised. Rather than spout an untested, unresearched theory on what the compiler must be doing in this case to save so much time, I’ll just leave it at that: I’m surprised.

Written by arthur

August 24th, 2011 at 10:53 am

Reflection

without comments

Added basic reflection to the ray-tracer sample:

Reflection is added by providing an optional std::function trace function to the C++ shader builder. The Phong shader fragment now takes an optional reflectivity parameter, which, if set to a value greater than zero, computes the reflection direction (via glm::reflect()) and calls the trace function to trace the reflected ray. There’s nothing particularly special about this, but the use of a function object keeps the shader builder abstracted from the scene graph / spatial indexing mechanism. In the ray tracer sample itself, the function object is actually a wrapper around a method on the ray tracer main component.
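The callback arrangement can be sketched roughly as follows. This is a self-contained approximation, not the LxEngine shader builder: `Vec3`, `Ray`, `TraceFn` and `shadeWithReflection` are hypothetical names, the linear blend is an assumption, and a hand-written `reflect()` stands in for glm::reflect() (same formula, I - 2·dot(N, I)·N) so the example needs no external headers.

```cpp
#include <cassert>
#include <functional>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 operator*(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror the incident direction I about the unit normal N.
inline Vec3 reflect(Vec3 I, Vec3 N) { return I - N * (2.0f * dot(N, I)); }

struct Ray { Vec3 origin, direction; };

// Hypothetical callback type: traces a ray and returns the color it sees.
using TraceFn = std::function<Vec3(const Ray&)>;

// Blends a locally shaded color with a traced reflection. The shader only
// sees the callback, never the scene graph or its spatial index.
inline Vec3 shadeWithReflection(Vec3 localColor, Vec3 hitPoint, Vec3 normal,
                                Vec3 incident, float reflectivity,
                                const TraceFn& trace) {
    assert(reflectivity >= 0.0f && reflectivity <= 1.0f);
    if (reflectivity <= 0.0f || !trace)
        return localColor;  // reflection is optional
    Ray reflected{ hitPoint, reflect(incident, normal) };
    Vec3 bounced = trace(reflected);
    return localColor * (1.0f - reflectivity) + bounced * reflectivity;
}
```

In a real tracer the callback would typically wrap the tracer’s own trace method (for example via a lambda capturing `this`) and would also be responsible for enforcing a recursion depth limit.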

Another minor step forward… It’ll be significantly more interesting when support for reflection in the rasterizer is added.

Written by arthur

August 18th, 2011 at 12:45 pm

Posted in lxengine


Scripting

with one comment

In a nice wow moment, I added Javascript support to the ray tracer sample in a matter of minutes.  It “just worked” using the existing architecture with very little code added to the sample itself.

(Ok, admittedly, I did do a somewhat lengthy refactoring and clean-up submission to the LxEngine internal Javascript support – but just to make it more modular now that the vision of the LxEngine architecture is a bit more mature than it was back when I first started integrating V8 into the project.)

The Result

Here’s a simple scene created via an XML document with a Javascript for loop to create the various spheres and their materials.

LxEngine Ray Tracer Scripting

Here’s a link to the XML scene file that created it.

The Code

The script support is added completely independently of the ray tracing code.  This is the way it should be and, it turns out, that’s exactly the way it is too.

(1) The LxEngine Javascript subsystem (one of the standard ones) is plugged in via one line of code (the attachComponent() call):

  DocumentPtr spDocument = spEngine->loadDocument(options.filename);
  spDocument->attachComponent("javascript", lx0::createIJavascript() );

(2) A new Document::Component is registered to look for <Script> Elements, whose contents it passes to the Javascript subsystem for interpretation.  The scripts themselves use the LxQuery API (a JQuery-like, pure Javascript library for manipulating the LxEngine DOM), which manipulates Elements within the Document without needing any app-specific knowledge (i.e. it doesn’t care that this is a ray tracer, only that attributes and values on Elements in the Document are being changed).

    spDocument->attachComponent("ray", create_raytracer() );
    spDocument->attachComponent("scripting", create_scripting() );

The object created by create_scripting() is quite simple and mostly just C++ boilerplate to create a class and respond to Document changes.

The LxEngine Javascript subsystem itself was added mostly back in November for other projects, but due to the architecture, it was fully reusable without any changes.

(3) There is no three. The ray tracing code itself has no knowledge of whether the Element was created via a script or was part of the initial XML.   In MVC terms, the ray tracing code just works since it is properly abstracted from the Model changes.
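The decoupling described in the steps above can be sketched with a toy document/component pair. Everything here (`Element`, `Component`, `Document`, `SphereCounter`, and the `onElementAdded` hook) is a hypothetical stand-in invented for this example; the real LxEngine interfaces are not shown in this post and are certainly richer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM element.
struct Element { std::string tag; std::string value; };

class Component {
public:
    virtual ~Component() = default;
    // Called whenever an Element is added to the Document.
    virtual void onElementAdded(const Element& elem) = 0;
};

class Document {
public:
    void attachComponent(const std::string& name, std::shared_ptr<Component> c) {
        assert(c);
        components_[name] = std::move(c);
    }
    // Elements may come from the initial XML or from a script; the
    // components are notified the same way in both cases.
    void addElement(const Element& elem) {
        elements_.push_back(elem);
        for (auto& entry : components_)
            entry.second->onElementAdded(elem);
    }
private:
    std::map<std::string, std::shared_ptr<Component>> components_;
    std::vector<Element> elements_;
};

// A stand-in "ray tracer" component that only counts spheres. It reacts to
// Document changes and never learns who made them.
class SphereCounter : public Component {
public:
    void onElementAdded(const Element& elem) override {
        if (elem.tag == "Sphere") ++count;
    }
    int count = 0;
};
```

After `doc.attachComponent("ray", counter)`, every `doc.addElement({"Sphere", ...})` bumps the counter regardless of where the call originated, which is the property that made the scripting feature cheap to add.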

It was really cool to have the LxEngine architecture surprise me (i.e. the guy designing this and constantly setting unrealistically lofty goals of how I want this all to work) with how seriously easy it was to add a useful feature like scripting.

Written by arthur

May 9th, 2011 at 3:40 pm