Archive for the ‘Java’ tag

The Web Browser as a Build System

with one comment

This post can largely be qualified as daydreaming, or maybe just hoping aloud. In any case, here’s a look at some browser changes that I think could revolutionize the web.

The best part about this is that there’s absolutely nothing new being proposed below. It’s about taking existing ideas and fitting them together in a way that would work efficiently.

A Standard Browser Bytecode/Virtual Machine

Create an open ISO standard bytecode. Create a sandboxed virtual machine to run the code.

Now the browser can run script via Javascript or via any language that can be compiled into that byte code. Share the runtime environment so that chunks of Javascript and bytecode can be interweaved.

Wait, isn’t that what Java was? Key difference: integrate the DOM with the virtual machine. Honestly, that’s the key difference. You don’t end up with a disparate Java applet sitting in a hosted rectangle, essentially disconnected from the rest of the page, with a totally different looking UI. No, you get the simplicity of HTML5 for the UI because the virtual machine is seamlessly part of the page. Don’t reinvent what makes HTML so popular: use it instead.

Wait, emscripten already does this. Yes! In many ways it does, and it’s a fantastic idea. The big differences are that I’d love to see the LLVM interpreter as a standard part of the browser – plus, I imagine it’d be more efficient to embed LLVM’s interpreter into the browser native code than interpret the bytecode via a virtual machine running through the Javascript interpreter. Let’s get native support for integer types running in the VM.

Wait, what about Google’s NaCl? To me, it seems a lot of excellent technology but not put in its optimal place. Where is the advantage to running native binaries directly rather than adopting a standard bytecode instead since those native binaries need to be compiled through a custom toolchain anyway? Also, NaCl code can’t directly access the DOM and, in my mind, that’s critical.

Base it on LLVM bytecode

Let’s make it based on a subset of LLVM’s bytecode. Update the version via committee as the various parts stabilize and are agreed to be “good”. Let the virtual machine evolve to handle the needs of future web programming, rather than bloating the Javascript language into a do-all language.

Perhaps Microsoft’s .NET CIL is sufficient. I don’t know if it is “open” enough or not, though from what I’ve read it does seem pretty solid from a technical perspective.

The question really is: are there enough quality, free tools for generating the bytecode to hit a critical mass such that more quality tools start to be developed around it?

Using C++ on the Web Would Be Possible

This is the part I like, since I think in many ways, C++ is the best general purpose, production-code language.

Clang compiles C++ code into LLVM bytecode. If the browser can safely run sandboxed LLVM code, it can now run C++. Use clang on the developer machine to compile C++ code into bytecode, host the bytecode on the web, run the bytecode. (Ensuring simple Clang installers for all major platforms would certainly be a helpful side project here.)

Some run-time linking will be necessary on the part of the VM to hook that bytecode into the runtime environment (i.e. the DOM), but theoretically nothing overly burdensome has to happen.

(As a slight aside, don’t allow “unsafe” or “trusted” scripts to run outside the sandbox and access the machine resources directly. Just don’t. Ever. If hardware resources need to be exposed, let them be exposed as they are now by advancements in the web standards as WebGL, the HTML5 File API, etc. are being exposed. Use currently existing plug-in mechanisms outside the standard scripting environment if you really need to reach beyond the run-time environment. These will be available to scripts, as the DOM is, but not implemented as scripts. Keep the script environment itself clean and always sandboxed.)

Using C++ as a Web Scripting Language Would Be Possible

Now, Clang is self-hosting, so compile Clang (or a subset of it) to LLVM bytecode. The web page environment can now have the moral equivalent of a “compileCpp(source)” function.

Use a trivial bit of Javacript to transform script tags of type=”text/C++11″ into LLVM bytecode – and voila, C++11 can now be used as an interpreted scripting language.

Languages like CoffeeScript or even LESS already do a similar transformation from a new language to a browser standard; however, rather than targeting a language designed for direct programming, target a generalized bytecode intended from the start to be a target of other language transformations. (Things being used for what they were designed for tend to be more efficient.)

The potential uses for C++ on the web seem enormous to me: prototyping native code in a dynamic environment, a sandboxed development environment for new C++ modules, live unit tests of code bases, rapid previews of code modules and segments, live testing during code reviews, finally a semi-standard cross-platform UI solution for C++, shared code between web and desktop versions, a strict typing language to add robustness to critical production web code, etc.

The list of uses almost seem self-evident to me. (Perhaps I like C++ too much.)

Browser Caching Enhancements

I’m not an expert on browsing caching capabilities, but the above assumes either uploading the bytecode or compiling the C++ code on the web page load.

Let’s ensure the browser has sufficient logic to write out a compiled version of the C++ code to a cache (on a per-section basis, i.e. like O/OBJ object files). Visit the page the first time and wait while it compiles; visit a second time and it loads the compiled code from the cache (and probably only does some minimal linking). I mean literally the first visit to the server and the second visit: cache this on the server, not per-user.

Of course, some sort of “trusted user” mechanism would need to be in place if the compilation is occurring on the client-side and the caching being done on to a shared server-side. (Definitely a solvable problem – even if means something like invoking a PHP script in a password protected directory to write out the cached file; but it would be more fluid to have a standardized browser mechanism for this.)

Next, let’s make sure those caches work with CDNs (content delivery networks). There’s no reason to host the standard C++ library on my site and have that compile separately for every page that uses it. Instead, the script src tag references a library version hosted on a CDN where there will be a cached, compiled copy already available.

Also, once the idea of CDNs comes into place, the problem of “dependencies” becomes much simpler. Add a URL to the library on the CDN: never worry about compiling or installing any dependency. The URL would likely implicitly or explicitly contains the version number, compiler used, compile flags, etc. CDNs could even host the compilers they used to absolutely ensure no ABI (application binary interface) issues between their hosted code and locally hosted code.

A Web-based Distributed Compiler/Make

Now simply think of the above in broader terms: cache on a module-by-module basis, add in dependencies between segments of code, mark everything with version numbers, cache on CDNs…and it’s not much of a stretch to think of how a full Make system can be hosted in the browser (i.e. via a script/bytecode, not directly in the browser source) and how CDNs start making the whole internet start to act like a large distributed compiler.

Because this is all distributed, it’s not really bloating what the web browser does. A web page might have only a couple dozen lines of custom code. The rest is pre-compiled code sitting out there on CDNs in the same way stock photography or massive databases are sitting out there hosted on high-end servers elsewhere.

With CDNs and caching, “compile-time” should become a non-issue for the end-user; download time is the only bottleneck – and heck, that’s already true for native apps today.

A Bit of Client-Side Caching

Let’s also reserve a chunk of disk space for client-side caching of LLVM-to-native code caching. The browser can work with the VM’s JIT to cache optimized native code for the bottleneck sections of the bytecode. This should speed up the most frequently used apps to near natively-compiled code speeds (see current C# versus C++ performance benchmarks if you want a estimate of “near”).

For safety, simplicity, and security, let’s keep any cached native code local: don’t host anything that runs outside the safety of a sandbox on a CDN. (Sure, NaCl could run the native code in a sandbox as well, but let’s keep things simple.)

Other General Languages and Domain-Specific Languages

Now repeat this process for languages other than C++. Anything that can be compiled to LLVM bytecode can safely run on the web. Anything language that has a compiler that can be compiled into LLVM bytecode can be run on the web.

Any language that has a compiler hosted somewhere on the web can be used (no worrying about if the user has
the right version of Python installed or not!). The barrier to entry for domain-specific languages drops notably.

<script compiler=’http://cdn.somesite123.org/hosted-compilers/python-2.7.0′ src=’mycode.py’ />

<script compiler=’http://cdn.somesite123.org/hosted-compilers/joelanguage-0.2.3′ src=’dslcode.txt’ />

An Quick Aside on Code Reuse

I’m convinced that code reuse is dependent primarily on how easy it is to reuse the code.

Having an explosion of different languages on the web and of different language versions will not undermine code reuse. Rather, if including large, production code modules was as simply as adding support for JQuery via a CDN – then heck, a lot more code reuse would happen. Forget the build dependencies, install locations, etc. that can add serious overhead to using a third-party library (especially on Windows). All this disappears. Sure the specific locations for a particular compiler, library, etc. might not be standardized but because the same instance is shared across the web that itself making that single instance a standard: it’s going to be there for anyone who connected to the internet.

Standards make things easier to use: and a standard virtual machine does not create a standard for authoring code but rather creates a standard for using existing code. I conjecture that increasing the number of languages available on the web in this manner would actually drastically increase code reuse on the web.

These are Not New Ideas

Obviously, nothing above is new. I’ve mentioned Java, .NET, NaCl, etc. and I’m sure others can point out much earlier projects exemplifying these ideas.

I think one of the major strengths of Unix was something Windows still has failed to realize: include a compiler – that is always there – with the operating system and far more people will build tools and applications for it. (To be clear: I’m using “operating system” here to mean the full default operating environment, not just the kernel.)

The browser is becoming in many ways like an OS. What I’m envisioning is creating a standard “tool-chain” that’s always there in the browser. Look how successful simply having a standard interpreter in the form of Javascript has become. Now imagine if you took the good ideas behind Java, .NET, CDNs, Make systems, etc. and provided that instead of just Javascript.

Again, nothing here is new. However, the idea of connecting all these ideas into a simple, usable, coherent, and standard system seems so tantalizingly within reach at this point in the evolution of the web. It’s agitating. It’s exciting.

The web browser as a build system: I can hardly imagine how the web and computing world would change.

Postscript: Interested?

Think this is a great idea and have the discretionary funds to hire me to work on this?

I’m only half joking :)

Written by arthur

December 13th, 2011 at 9:50 am

Unnecessary Complexity in the Software Development Tool-Chain

without comments

C++ is still predominantly the best, or at least most used, language for performance or graphics intensive applications aimed at the consumer. I consider myself a C++ programmer and very much admire the practical and flexible language design. Yet, still I am beginning to wonder if the software development community could hugely benefit from a new primary language.   But if I think highly of the language, why do I suggest a new one might be necessary?

It’s the tool chain and the process more than the language itself.

First of all, no matter how good a language is, it’s useless if the tools are not available to use that language to accomplish your goals. (I think this statement can safely be made without explicit justification.)

Now consider if your goal is easy, rapid, globally collaborative development.  Are the best tools really there to suit this problem?  Yes, there are tools, but do they fit that goal as well as they could?

What if your goal is to eliminate as much build engineering overhead as possible before software engineering on a particular problem can begin?

Old isn’t necessarily bad, but the current tool chain for C++ development effectively is very similar to the days of early Unix development.  That’s fine, but does that tool chain really fit the ideal process of global development community working to continually contribute to open source software development?  Consider how those tools would do under a serious usability study of global collaborative development.

To save typing, I’ll assume the reader is at least a bit familar with web development (e.g. Javascript) and C++ development. Rather than write out a formal argument for my point, I’ll simply say this: consider how much overhead it takes a novice to throw together a JQuery-based webpage versus the overhead in building a large open source project.  Or, as another example, head to the discussion forums on any major third-party C++ SDK and count the percentage of posts about missing headers, linker errors, dependencies, library versions, or other configuration questions that have absolutely nothing to do with the purpose of the SDK itself – and then compare that to a Javascript-based library.  I know this is not  a fair comparison, but bear with me for a moment.

Now, just for fun, imagine that building that large open source project were as easy as viewing a webpage.  I’m not talking about development on that large project, let’s simply focus on the process of building the project.   Seriously consider it.   Overlook the size / initial download problem for a moment and just compared the steps involved in those two tasks.   A webpage usually “just works” if you have a modern browser and the web page was written by a decent developer.  It’s a zero step process: merely providing the name of the webpage is viewing it.  On the other hand, a large open source project build – well, the steps to do that could be almost anything…it very rarely “just works” and instead often learning new skills with every project.

Maybe I’m wrong about it, but imagine the influx of casual contributions to open source projects if large application builds were effectively zero-overhead to get up and run from source…but I’m getting ahead of myself.   For now, focus simply on the notion of building (rather than developing) a large open source project as being a zero-step process.

-

The next question: is there any technical reason why it couldn’t just work in the majority of cases (or at least to the same percentage as a well developed web site works)?

I don’t think there is.

-

I’m not going to pretend that there haven’t been lots of attempts to solve the broadly defined “write once, run anywhere” problem.  Java, of course.  .NET jumps to mind as a bit closer to “build anywhere” notion discussed here given it’s multi-language support.  I think .NET is conceptually fairly right on, but in practice it hasn’t happened.   Full virtual machines and runtime environments aside, tools like CMake exist to alleviate the build problem – but how effective are they?  CMake itself is yet another “overhead” item that needs to be learned which likely has nothing to do with the actual problem the developer is trying to solve.  It may be better, but it’s still another build engineering hurdle requiring knowledge from the user.

Or to invert issue completely, consider how HTML5 and technologies like WebGL are in a way approaching this fundamental “building C++ applications is hard” problem: they aim to make it easier to bring application graphics development to the simplicity of browser-based technology development rather than bring the simplicity of browser-based technology development to application graphics development. (I do realize the last statement has quite a few embedded assumptions, but I generally think it’s a true statement – or at least true enough to convey the crux of the point.)  Better yet, isn’t the development of Chrome OS implying the same general trend of pushing traditional low-level development closer to the convenience of browser-based apps rather than vice-versa?

Ok, so one more way of looking at this…

How difficult would it be to allow browsers to support compiled languages?  Technically, not very.   Chrome already compiles Javascript into native machine code – and caches the Javascript files associated with pages.   What technical difference is there really if that source file is C++?  Yes, C++ pointers, etc. would throw some wrenches into security and verifiability but otherwise, it’s just a different compiler component implementation.   Why not extend that embed the make system into ‘browser’?   And why not embed the actual source control system implementations (at least the ‘get’ aspect) into the browser?  The user would end up with a ‘browser’ that effectively views project files, can pull the relevant source to local caches, compile it, and run the application.   There’s still the first-run (i.e. priming the cache) problem – but that’s solvable too (how about pre-compiled caches of popular configurations could be hosted on the application’s website?).  Why not hide the entire build tool chain in an application ‘browser’ of sorts?

Java essentially attempted (and still attempts) to solve much of this.  Java certainly didn’t eliminate the problem, but that fact does not undo the theory behind the idea.   As noted previously, in some manners, Chrome OS effectively is hiding the application “build” in the browser.   But what about approaching the problem by making the existing build processes more like a browser rather than making more apps work from a browser?

-

Ok, so what of this problem?  Why is this more than just a rant that build engineering in C++ should be easier and just work like a browser?

I suspect that a very real source of the problem is that many experienced developers dismiss the “tool chain” issue simply because they are already so used to it. It’s not a lack of technical knowledge out there.  It’s a lack of motivation.

Sure – learning CMake may teach you about multi-platform programming and all other sorts of useful information – but to a novice programmer who wants to experiment with 3D graphics, does it really make sense that he needs at least some degree of expertise in build tools first?  No, it doesn’t.  Software should be about tackling the problem you are interested in primarily, and then secondarily be about understanding the periphery of the problem so you can improve your initial solution (i.e. in this case, learning more about multi-platform programming via CMake after the novice programmer has his experiment working on his own machine).

Encapsulating the compiler tool-chain in a browser doesn’t eliminate or ‘solve’ build engineering forever; instead, it would need to evolve by a standard like HTML.   What would be needed is a standard for project builds to deliver content flexibly, but reliably.   The issue isn’t so much with C++; it’s with the build tool-chain that’s become associated with the delivery of C++ content.

-

I generally try to avoid complaining about a problem without proposing a solution: the first step toward the solution might be for serious C++ developers to no longer be content with build engineering skills being a prerequisite to software engineering.

My gut feeling remains stuck on what seems obvious: on modern, powerful workstations, there’s no technical reason that the build process for large applications needs to be so complex.  And if there’s no technical reason, what is really preventing the solution?