## VS2010 Optimization Flags

The priorities of LxEngine have always been to produce simple, general, and correct code rather than to pursue optimal performance. This is arguably a justifiable set of priorities, given that LxEngine is a long-term, single-developer hobby project aimed at experimenting with a variety of different technologies. In any case, I did spend a bit of time looking into the ray tracer’s performance and was able to improve it significantly via a few relatively simple code changes.

I also squeezed out roughly another 18% improvement via compiler flags alone. I’m going to confine this post to the effects of a few of those Visual Studio 2010 compiler optimization flags.

### The Results

Let’s start with what’s most interesting: the results. These are average times over 6 runs of a sample scene with a few lights and a couple of reflective objects, rendered at 1024×1024.

| Time (ms) | Compiler flags |
|---:|---|
| 2,464 | /arch:SSE2 /fp:fast |
| 2,619 | /fp:fast |
| 2,956 | /arch:SSE2 |
| 3,038 | /fp:precise (default) |
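The percentages quoted in the rest of this post are relative reductions against the default build, i.e. (baseline - optimized) / baseline. Here is a small C++ sketch of that arithmetic, along with a hypothetical time_ms() helper for collecting the individual runs that get averaged; the helper names are illustrative and not taken from LxEngine.

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical helper: wall-clock time of one call to f, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Mean of several run times (e.g. the 6 runs behind each row of the table).
double average_ms(const std::vector<double>& runs) {
    if (runs.empty()) return 0.0;
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Percent reduction in run time relative to a baseline build.
double percent_faster(double baseline_ms, double optimized_ms) {
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0;
}
```

Plugging in the averages above gives about 18.9% for /arch:SSE2 /fp:fast, 13.8% for /fp:fast alone, and 2.7% for /arch:SSE2 alone.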

The slowest results came from the default compiler settings, which leave the /arch flag unset and use precise floating-point semantics (/fp:precise). In this case, the x87 floating-point stack is used, with most calculations taking place in 80-bit precision on the FPU before being stored back to memory.

### /arch:SSE2

The first change I attempted was enabling /arch:SSE2, which tells the Visual Studio compiler (cl.exe) that it may use Streaming SIMD Extensions 2 instructions. A ray tracer surely has plenty of opportunities for single instruction, multiple data (SIMD) instructions, and I wondered how much the compiler alone, without any code modifications, could take advantage of them. Surprisingly, the answer was not much at all: the times for this benchmark were only about 2.7% faster. I didn’t expect the compiler to rework the ray tracer functions into fully parallel computations, but I figured the eight extra XMM registers alone would have a more significant impact. Lesson: make benchmarks, not assumptions, when optimizing.

I then looked at the generated assembly code and found the CVTPS2PD instruction used quite frequently. Huh? Convert packed single-precision values to double precision? Why? All my data is in single-precision, 32-bit floating-point form, so why were conversions happening? The reason was the /fp:precise flag. Even though the final results are single precision, the intermediate calculations were being done in double precision to retain as much precision as possible across multiple floating-point operations.
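To make the distinction concrete, here is a source-level sketch of the two evaluation strategies for a dot product, a staple of ray tracing code. It assumes /fp:precise widens float intermediates roughly the way the CVTPS2PD instructions suggest; it illustrates the idea and is not a transcript of what cl.exe actually emits.

```cpp
#include <cstddef>

// Float accumulation: every intermediate stays in single precision,
// roughly the code shape /fp:fast allows the compiler to generate.
float dot_float(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Widened accumulation: each float operand is converted to double (the kind
// of conversion CVTPS2PD performs) before the multiply-add, and only the
// final result is rounded back to float.
float dot_widened(const float* a, const float* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return static_cast<float>(sum);
}
```

The difference shows up when magnitudes mix: for a = {1e8f, 1.0f, -1e8f} and b = {1.0f, 1.0f, 1.0f}, a typical SSE build returns 0.0f from dot_float() (the 1.0f is lost when added to 1e8f) but 1.0f from dot_widened().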

### /fp:fast

When I turned on the /fp:fast flag, the generated assembly became much more straightforward, as the CVTPS2PD instructions all disappeared. The benchmark also ran noticeably faster (18.9% faster than the default settings), which is quite significant given that no coding effort was required. Of course, it’s important to note that both /arch:SSE2 and /fp:fast change the behavior of the code. The XMM registers used under /arch:SSE2 (even in /fp:precise mode) operate with at most 64 bits of precision, and with /fp:fast enabled that drops to 32 bits. The original code used the 80-bit FPU representation. A change from 80 bits to 32 bits is non-trivial. I haven’t done any analysis of the actual effect of the precision change, but it does need to be considered.
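A cheap first step toward that analysis could be running the same routine at both precisions and comparing the answers. The sketch below does this for a textbook ray-sphere intersection; treating it as representative of LxEngine’s ray tracer is an assumption. Note that it compares 32-bit against 64-bit, not the 80-bit x87 format, which plain C++ can’t easily target with Visual C++ since its long double is the same 64-bit type as double.

```cpp
#include <cmath>

// Nearest positive hit distance t for the ray o + t*d against a sphere with
// center c and radius r, from solving |o + t*d - c|^2 = r^2.
// Returns -1 on a miss. Templated so identical code runs in float or double.
template <typename T>
T ray_sphere_t(const T o[3], const T d[3], const T c[3], T r) {
    const T oc[3] = {o[0] - c[0], o[1] - c[1], o[2] - c[2]};
    const T a = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    const T b = 2 * (oc[0] * d[0] + oc[1] * d[1] + oc[2] * d[2]);
    const T k = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - r * r;
    const T disc = b * b - 4 * a * k;
    if (disc < 0) return T(-1);
    const T sq = std::sqrt(disc);
    const T t0 = (-b - sq) / (2 * a);
    const T t1 = (-b + sq) / (2 * a);
    if (t0 > 0) return t0;
    if (t1 > 0) return t1;
    return T(-1);
}

// Relative difference between the float and double answers for one ray.
// Assumes the ray actually hits the sphere (double result > 0).
double float_vs_double_error(const double o[3], const double d[3],
                             const double c[3], double r) {
    float of[3], df[3], cf[3];
    for (int i = 0; i < 3; ++i) {
        of[i] = static_cast<float>(o[i]);
        df[i] = static_cast<float>(d[i]);
        cf[i] = static_cast<float>(c[i]);
    }
    const double tf = ray_sphere_t(of, df, cf, static_cast<float>(r));
    const double td = ray_sphere_t(o, d, c, r);
    return std::fabs(tf - td) / std::fabs(td);
}
```

Running float_vs_double_error() over a grid of rays from a real scene, and looking at the largest error rather than the average, would give a rough picture of how much the /fp:fast change matters in practice.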

The final result is, unfortunately, the mysterious one. I could see why enabling SSE2 had little effect if the code was constantly converting between single and double precision; that explained the minimal gain from /arch:SSE2 without /fp:fast. But, trying to test all the combinations, I also enabled /fp:fast without making any SSE2 instructions or registers available. The result (2,619 ms) was nearly as fast as using the SSE2 registers and instructions at 32 bits of precision (2,464 ms). Huh? I haven’t dug through the assembly comparing /fp:precise and /fp:fast without SSE2 enabled, so at this point I’m simply very surprised. Rather than spout an untested, unresearched theory about what the compiler must be doing to save so much time in this case, I’ll just leave it at that: I’m surprised.

Written by arthur

August 24th, 2011 at 10:53 am

## Solution Folders with CMake (project_group())

### Update (2011.05.25): Native CMake support

In a frustrating case of less-than-intuitive usability, hard-to-locate documentation, and/or user error, the original post below on using Python to modify the Visual Studio solution to organize projects into folders turns out to be unnecessary: CMake 2.8.3 natively supports solution folders.

The syntax is simple. Here’s what I appended to the outermost CMakeLists.txt in the Lx0 project to organize the sub-projects into folders (note: I have not tested this with any compiler other than VS2010 Professional):

```cmake
#
# Organize projects into folders
#
SET_PROPERTY(GLOBAL PROPERTY USE_FOLDERS ON)

SET_PROPERTY(TARGET lxengine                PROPERTY FOLDER "Libs/LxEngine")
SET_PROPERTY(TARGET glgeom_benchmark        PROPERTY FOLDER "Libs/GLGeom")
SET_PROPERTY(TARGET glgeom_unittest         PROPERTY FOLDER "Libs/GLGeom")

SET_PROPERTY(TARGET sm_lx_cube_rain         PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_lx_cube_asteriods    PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_lxcanvas             PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_terrain              PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_raytracer            PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_lxcraft              PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_rasterizer           PROPERTY FOLDER "Samples/LxEngine")
SET_PROPERTY(TARGET sm_ogre_minimal         PROPERTY FOLDER "Samples/ThirdPartyLibs")
SET_PROPERTY(TARGET sm_v8_basic             PROPERTY FOLDER "Samples/ThirdPartyLibs")
SET_PROPERTY(TARGET cpp_smartptr            PROPERTY FOLDER "Samples/Cpp")

SET_PROPERTY(TARGET bm_lxvar                PROPERTY FOLDER "Benchmarks")
SET_PROPERTY(TARGET elm_reference           PROPERTY FOLDER "Sandbox")
SET_PROPERTY(TARGET elm_function            PROPERTY FOLDER "Sandbox")
SET_PROPERTY(TARGET sb_fixedpoint           PROPERTY FOLDER "Sandbox")
SET_PROPERTY(TARGET ut_jsonparser           PROPERTY FOLDER "UnitTest")
SET_PROPERTY(TARGET ut_lx_vector            PROPERTY FOLDER "UnitTest")
```

Three parts worth noting:

1. SET_PROPERTY(GLOBAL PROPERTY USE_FOLDERS ON) – for some reason you need to turn this feature on before you can use it (shouldn’t simply not using it be equivalent to it being off??)
2. TARGET target name – as you might guess, the name after the TARGET keyword should be the name of your executable or library as you define it elsewhere in the CMake project setup
3. PROPERTY FOLDER “Folder/SubFolder” – use a forward slash to denote sub-folders

I’m hesitant to complain about tools I’ve done nothing to contribute to, but it is frustrating that CMake, an incredibly useful and powerful tool, is lacking in the usability department. The syntax is unnecessarily unique (why is this a TARGET PROPERTY and not a PROJECT_FOLDER() macro?), and the documentation is hard to track down and poorly organized: I found this via a Google search that pointed me to a diff on the CMake git server, which is how I realized my earlier attempts were missing the SET_PROPERTY(GLOBAL PROPERTY USE_FOLDERS ON) call.

CMake already supported exactly what I needed but it took non-trivial effort to just track down that it did!

### Original Post (2010.12.16)

CMake supports the source_group() command for grouping files within a project. I like this feature, as it maps well to how I organize projects conceptually. CMake, however, appears to lack a means of creating solution folders, i.e. folders that categorize the projects themselves into a hierarchy. For LxEngine, which has a lot of sub-projects, this would be quite a helpful feature in Visual Studio. In other words, what I really want is a project_group() CMake function.

In an effort to learn a tiny bit more Python, and to get the behavior I wanted with a little less work than properly hacking the CMake sources to add project_group() functionality, I put together a version 0.0 script that post-processes the Visual Studio 2010 solution file and injects the folder nesting I want. (Others have expressed interest in such a feature and may even have worked on it, but as far as I can tell it’s not available as of CMake 2.8.3.) The result is a project structure that looks like this:

Solution Folders after post-processing

The script, as I mentioned, is definitely a version 0.0:

• It’s not yet reusable
• It’s hard-coded to the LxEngine project layout (but that’s easy to change)
• It’s hard-coded to the Visual Studio 2010 file format
• It’s low-quality Python code (apologies, I’m new to the language and just wanted to “get it to work”)
• It’s a post-process rather than something CMake could invoke directly

However, it does work correctly, so if anyone wants to hack together their own temporary workaround to get solution folder (project_group()) functionality from CMake, feel free to use this code as a template.

The source is located here on GitHub.
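The script itself isn’t reproduced here, but the core of the post-processing step is small. In a Visual Studio 2010 .sln file, a solution folder is a Project entry with the solution-folder type GUID, and the nesting lives in a GlobalSection(NestedProjects) block that maps each child’s GUID to its parent folder’s GUID. A rough C++ sketch of generating that text, assuming the project and folder GUIDs have already been collected (the SolutionFolder and Nesting types and both function names are hypothetical):

```cpp
#include <string>
#include <vector>

// Project type GUID that Visual Studio uses for solution folders.
const char* const kSolutionFolderTypeGuid = "{2150E333-8FDC-42A3-9474-1A3956D46DE8}";

// Hypothetical description of one folder to create, e.g. "Samples".
struct SolutionFolder {
    std::string name;
    std::string guid;  // braces included; should stay stable across regenerations
};

// Hypothetical child -> parent link (the child may be a project or a folder).
struct Nesting {
    std::string child_guid;
    std::string parent_guid;
};

// Project/EndProject entry declaring one solution folder; it would be
// inserted next to the other Project entries near the top of the .sln.
std::string folder_project_entry(const SolutionFolder& f) {
    return "Project(\"" + std::string(kSolutionFolderTypeGuid) + "\") = \"" +
           f.name + "\", \"" + f.name + "\", \"" + f.guid + "\"\r\nEndProject\r\n";
}

// NestedProjects section; it would be inserted inside the Global ... EndGlobal block.
std::string nested_projects_section(const std::vector<Nesting>& links) {
    std::string out = "\tGlobalSection(NestedProjects) = preSolution\r\n";
    for (const Nesting& n : links)
        out += "\t\t" + n.child_guid + " = " + n.parent_guid + "\r\n";
    out += "\tEndGlobalSection\r\n";
    return out;
}
```

Keeping the generated folder GUIDs fixed (rather than creating new ones on every CMake run) avoids Visual Studio treating the folders as new each time the solution is regenerated.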

Update:

Despite the closed bug report on vtk.org, CMake 2.8.3 for Windows does not appear to have a project_group() or project_folder() command. Nor does manually setting a property via ‘set_property(TARGET mproject PROPERTY FOLDER “MyFolder”)’ seem to have any effect. This is true even for the nightly build 2.8.3.20101215-g4bf09. I’m assuming from the comments on the bug that the changes exist only in a branch and not in the main line?

Written by arthur

December 16th, 2010 at 3:24 am

Posted in tools

## Unnecessary Complexity in the Software Development Tool-Chain

C++ is still predominantly the best, or at least the most used, language for performance- or graphics-intensive consumer applications. I consider myself a C++ programmer and very much admire the language’s practical and flexible design. Still, I am beginning to wonder if the software development community could benefit hugely from a new primary language. But if I think highly of the language, why do I suggest a new one might be necessary?

It’s the tool chain and the process more than the language itself.

First of all, no matter how good a language is, it’s useless if the tools are not available to use that language to accomplish your goals. (I think this statement can safely be made without explicit justification.)

Now consider if your goal is easy, rapid, globally collaborative development.  Are the best tools really there to suit this problem?  Yes, there are tools, but do they fit that goal as well as they could?

What if your goal is to eliminate as much build engineering overhead as possible before software engineering on a particular problem can begin?

Old isn’t necessarily bad, but the current tool chain for C++ development is effectively very similar to that of the early Unix days. That’s fine, but does that tool chain really fit the ideal process of a global development community continually contributing to open source software? Consider how those tools would fare under a serious usability study of global collaborative development.

To save typing, I’ll assume the reader is at least a bit familiar with web development (e.g. JavaScript) and C++ development. Rather than write out a formal argument for my point, I’ll simply say this: consider how much overhead it takes for a novice to throw together a jQuery-based webpage versus the overhead of building a large open source project. Or, as another example, head to the discussion forums of any major third-party C++ SDK and count the percentage of posts about missing headers, linker errors, dependencies, library versions, or other configuration questions that have absolutely nothing to do with the purpose of the SDK itself, and then compare that to a JavaScript-based library. I know this is not a fair comparison, but bear with me for a moment.

Now, just for fun, imagine that building that large open source project were as easy as viewing a webpage. I’m not talking about development on that large project; let’s simply focus on the process of building it. Seriously consider it. Overlook the size / initial download problem for a moment and just compare the steps involved in those two tasks. A webpage usually “just works” if you have a modern browser and the page was written by a decent developer. It’s a zero-step process: merely providing the name of the webpage is viewing it. On the other hand, the steps for building a large open source project could be almost anything; it very rarely “just works” and often requires learning new skills for every project.

Maybe I’m wrong about it, but imagine the influx of casual contributions to open source projects if large applications were effectively zero-overhead to get up and running from source… but I’m getting ahead of myself. For now, focus simply on the notion of building (rather than developing) a large open source project as a zero-step process.

-

The next question: is there any technical reason why it couldn’t just work in the majority of cases (or at least to the same percentage as a well developed web site works)?

I don’t think there is.

-

I’m not going to pretend that there haven’t been lots of attempts to solve the broadly defined “write once, run anywhere” problem. Java, of course. .NET jumps to mind as a bit closer to the “build anywhere” notion discussed here, given its multi-language support. I think .NET is conceptually fairly on target, but in practice it hasn’t happened. Full virtual machines and runtime environments aside, tools like CMake exist to alleviate the build problem, but how effective are they? CMake itself is yet another “overhead” item that needs to be learned, and it likely has nothing to do with the actual problem the developer is trying to solve. It may be better, but it’s still another build engineering hurdle requiring knowledge from the user.

Or, to invert the issue completely, consider how HTML5 and technologies like WebGL are, in a way, approaching this fundamental “building C++ applications is hard” problem: they aim to bring application graphics development to the simplicity of browser-based development, rather than bring the simplicity of browser-based development to application graphics development. (I do realize the last statement has quite a few embedded assumptions, but I generally think it’s true, or at least true enough to convey the crux of the point.) Better yet, doesn’t the development of Chrome OS imply the same general trend of pushing traditional low-level development closer to the convenience of browser-based apps rather than vice versa?

Ok, so one more way of looking at this…

How difficult would it be to allow browsers to support compiled languages? Technically, not very. Chrome already compiles JavaScript into native machine code and caches the JavaScript files associated with pages. What technical difference is there, really, if that source file is C++? Yes, C++ pointers and the like would throw some wrenches into security and verifiability, but otherwise it’s just a different compiler component. Why not extend that to embed the make system into the ‘browser’? And why not embed the actual source control system implementations (at least the ‘get’ aspect) into the browser too? The user would end up with a ‘browser’ that effectively views project files, pulls the relevant source into local caches, compiles it, and runs the application. There’s still the first-run (i.e. priming the cache) problem, but that’s solvable too (for example, pre-compiled caches of popular configurations could be hosted on the application’s website). Why not hide the entire build tool chain in an application ‘browser’ of sorts?

Java essentially attempted (and still attempts) to solve much of this. Java certainly didn’t eliminate the problem, but that fact does not undo the theory behind the idea. As noted previously, in some ways Chrome OS effectively hides the application “build” in the browser. But what about approaching the problem by making the existing build processes more like a browser, rather than making more apps work from a browser?

-

Ok, so what of this problem?  Why is this more than just a rant that build engineering in C++ should be easier and just work like a browser?

I suspect that a very real source of the problem is that many experienced developers dismiss the “tool chain” issue simply because they are already so used to it. It’s not a lack of technical knowledge out there.  It’s a lack of motivation.

Sure – learning CMake may teach you about multi-platform programming and all other sorts of useful information – but to a novice programmer who wants to experiment with 3D graphics, does it really make sense that he needs at least some degree of expertise in build tools first?  No, it doesn’t.  Software should be about tackling the problem you are interested in primarily, and then secondarily be about understanding the periphery of the problem so you can improve your initial solution (i.e. in this case, learning more about multi-platform programming via CMake after the novice programmer has his experiment working on his own machine).

Encapsulating the compiler tool chain in a browser wouldn’t eliminate or ‘solve’ build engineering forever; instead, it would need to evolve through a standard, the way HTML has. What would be needed is a standard for project builds that delivers content flexibly but reliably. The issue isn’t so much with C++; it’s with the build tool chain that has become associated with the delivery of C++ content.

-

I generally try to avoid complaining about a problem without proposing a solution: the first step toward the solution might be for serious C++ developers to no longer be content with build engineering skills being a prerequisite to software engineering.

My gut feeling remains stuck on what seems obvious: on modern, powerful workstations, there’s no technical reason that the build process for large applications needs to be so complex.  And if there’s no technical reason, what is really preventing the solution?

Written by arthur

December 8th, 2010 at 12:44 pm

Posted in tools

## Exploring a New Area

As mentioned in the prior post, I’ve taken a break from the lower-level graphics programming that I’ve been posting about on this blog.

I’ve started a small project, creatively called “Adventure”. The vision (which isn’t very well defined at this point) is to create a small game that is some sort of cross-breed between a linear, story-driven adventure game (e.g. King’s Quest) and a simulation game (e.g. SimCity). What exactly this means will evolve, but the overarching theme is to create a “complete” engine that heavily utilizes existing third-party toolkits and is part “sim” and part “story.” I want to take some time away from single low-level components and spend more time understanding the big picture, not to mention honing my skills at battling the Not Invented Here tendency many developers, myself included, seem to have.

King’s Quest I and SimCity 3000

As I mentioned, completeness is a higher priority than anything else. Therefore, I’ve started out simple, with version 0.0 being a simple text adventure game. The current incarnation has only a half dozen rooms, and the player simply needs to figure out the right commands to enter in order to win.

(Needless to say, with a name like “Adventure”, I’ve  put creativity in the game concept and story on the back burner as well.)

While it is a simple, text-based game at this point, I am trying to build a solid base to expand it into a more technically noteworthy game.  For example, the build system uses a nested CMake hierarchy.  The code is linking against Boost, which opens up a lot of doors for reuse of some solid, useful code.  The game data is loaded from Lua files which allow for custom scripted actions in each room.

The Adventure code itself is not much, but the intention is to establish a good base to build upon, where code is added to rather than replaced as more advanced functionality comes online. I intend to spend a significant amount of time building a very clean base so that the architecture evolves fluidly.

### Architectural Concepts

The game architecture at this point revolves around three key concepts.  They are fairly simple, but the interesting part is that even though they are simple, it’s quickly apparent how these concepts allow for a fairly complex game system to be constructed.

#### Cell
A Cell is a self-contained part of the game world.  The player is in one and exactly one Cell at any point during game play.  The only way to exit a Cell is via a well-defined connection to another Cell.   Any actions that take place by the player are local to that Cell.

These restrictions keep the game space well defined, so that the game is simple to code for, stable, and logical. Of course, there are exceptions to these rules (for example, some actions may affect other Cells), but as a general guideline this works well.

This concept works well for a text-based game, but it should also work for a graphical game. The Cell is basically the current “area”, whether that’s a single enclosed room or a large outdoor map. The concept can be expanded by making Cells into a hierarchy, or by extending the notion of locality of action to chain to neighboring Cells to an N-th degree.

#### Action
The next concept is an Action.  An Action is simply the user doing something that causes the game world to react.  Quite simple.

In the text-based world, the Action is usually just a command like “get”, “look”, “listen”.  It’s a verb that – depending on the verb – may act upon a particular Entity in the Cell or act upon the Cell itself directly.

In a graphical world, the Action may include additional details implied by where the player is located or looking. However, the basic concept of a single discrete Action, with an accompanying Entity and some details about the Action, translates more or less equivalently whether the game is text or graphics.

#### Entity
An Entity is an object in a Cell that can receive an Action. This may be a non-player character, a lamp, a wisp of fog: it’s anything that an Action can be applied to.
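A minimal C++ sketch of how the three concepts could fit together. Everything here is hypothetical (the real Adventure code drives behavior from Lua scripts); it only encodes the rules described above: Actions are local to the current Cell, they target either the Cell itself or one of its Entities, and a Cell is left only through a named connection.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical Action: a verb, optionally aimed at a named Entity.
struct Action {
    std::string verb;    // e.g. "look", "get", "listen"
    std::string target;  // empty when the Action applies to the Cell itself
};

// Hypothetical Entity: anything in a Cell that can receive an Action.
struct Entity {
    std::string name;
    std::map<std::string, std::string> responses;  // verb -> reply text

    std::string receive(const Action& a) const {
        auto it = responses.find(a.verb);
        return it != responses.end() ? it->second : "Nothing happens.";
    }
};

// Hypothetical Cell: a self-contained area with Entities and named exits.
struct Cell {
    std::string description;
    std::map<std::string, std::shared_ptr<Entity>> entities;
    std::map<std::string, std::string> exits;  // direction -> name of the connected Cell

    // Actions stay local: they reach only this Cell or its own Entities.
    std::string apply(const Action& a) const {
        if (a.target.empty())
            return a.verb == "look" ? description : "Nothing happens.";
        auto it = entities.find(a.target);
        return it != entities.end() ? it->second->receive(a)
                                    : "You don't see that here.";
    }

    // The only way out is a well-defined connection; empty string if none.
    std::string leave(const std::string& direction) const {
        auto it = exits.find(direction);
        return it != exits.end() ? it->second : std::string();
    }
};
```

Storing exits as Cell names rather than pointers sidesteps the reference cycles that two-way connections would create with shared_ptr, and it keeps Cells independently replaceable.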

### State of the Code

The current build has fairly decent starting points defined for Cells and Actions.  Next is a good, generic, scriptable Entity concept.

I recently added the ability to dynamically reload the game data while in the game. The game stores the file modification time of each Cell it loads, and reloads any newly modified files upon an update command. Given that the system uses reference-counted boost::shared_ptr<> objects, this works correctly even if the Cell the player is currently in is reloaded (the old version won’t be discarded until it is no longer referenced).
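Here is a rough sketch of that reload scheme. It swaps in std::shared_ptr and C++17 std::filesystem for boost::shared_ptr and whatever the game actually uses to read modification times, and the LoadedCell and CellCache types are hypothetical stand-ins (the real game parses Lua files rather than storing raw text).

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for a parsed Cell; this sketch just keeps the file text.
struct LoadedCell {
    std::string source;
};

// Hypothetical cache of Cells keyed by file path, remembering each file's mtime.
class CellCache {
public:
    // Current version of the Cell stored in 'path', loaded on first request.
    std::shared_ptr<const LoadedCell> get(const fs::path& path) {
        auto it = entries_.find(path.string());
        if (it == entries_.end())
            it = entries_.emplace(path.string(), load(path)).first;
        return it->second.cell;
    }

    // The "update" command: reload every file whose modification time changed.
    // Anyone still holding the old shared_ptr keeps the old Cell alive.
    int reload_modified() {
        int reloaded = 0;
        for (auto& [key, entry] : entries_) {
            if (fs::last_write_time(key) != entry.mtime) {
                entry = load(key);
                ++reloaded;
            }
        }
        return reloaded;
    }

private:
    struct Entry {
        fs::file_time_type mtime;
        std::shared_ptr<const LoadedCell> cell;
    };

    static Entry load(const fs::path& path) {
        std::ifstream in(path);
        std::ostringstream text;
        text << in.rdbuf();
        auto cell = std::make_shared<LoadedCell>();
        cell->source = text.str();
        return Entry{fs::last_write_time(path), cell};
    }

    std::map<std::string, Entry> entries_;
};
```

Because get() hands out shared_ptr copies, a caller holding the old Cell keeps it alive after reload_modified() swaps in the new one, which is the behavior described above for the Cell the player is standing in. A deleted file would make last_write_time() throw here; real code would need to decide how to handle that.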

On a related note, as part of starting this project I also updated my Perforce server to the latest version (thank you to Perforce for making it so straightforward; server upgrades of personal data always sound like a potential disaster).

Written by arthur

July 5th, 2010 at 11:58 am

Posted in lxengine

## Textures and the Sky

Not much writing today, but instead an image:

Texture mapping and the start of a sky

### What’s New
Here’s what’s new in this image:

• Two RGB texture-mapped spheres (one solid colored, the other Phong shaded)
• PNG image support using LodePNG