Why Open Source Will Rule Scientific Computing (Part 5)

In the first post of this six-part series, I offered five reasons why open source will rule scientific computing. In this post, I discuss reason #4: Scalability. (Click through to see parts one, two, three and four of this discussion.)

I think we can all agree that the technological world is getting bigger: data is getting way bigger, computing systems are becoming more complex, research teams are growing in size and scope, and technical solutions require integrating multiple technologies. As a result, keeping up with and making use of current technology is becoming more difficult, and even the biggest organizations are challenged by the need to maintain staff and resources.

The failure to address this scalability challenge shows up in many surprising and insidious ways. For example, as data becomes bigger, it is natural to address the computational problems with parallel computing, for instance by having multiple computing processes operate on data in common memory. However, users typically discover that the initial benefits of shared-memory parallel computing rapidly disappear due to scalability issues. In this case, the very simple operation of writing a number to a specific location in computer memory becomes problematic as the number of computing processes grows large and traffic jams result from thousands of simultaneous write requests.
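To make the traffic jam concrete, here is a minimal Python sketch of the two patterns. It is an illustration only: the function names contended_sum and partitioned_sum are hypothetical, and because standard CPython threads are limited by the global interpreter lock, the sketch shows the structure of the problem rather than realistic timings. In the first version every worker writes to one shared location and must wait its turn; in the second, each worker accumulates privately and writes once to its own slot.

```python
import threading


def _run_workers(target, chunks):
    """Start one thread per chunk and wait for all of them to finish."""
    threads = [
        threading.Thread(target=target, args=(index, chunk))
        for index, chunk in enumerate(chunks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def contended_sum(values, num_workers):
    """Every worker adds each value directly to one shared total."""
    total = 0
    lock = threading.Lock()

    def work(index, chunk):
        nonlocal total
        for value in chunk:
            with lock:  # every single write queues up behind this lock
                total += value

    chunks = [values[i::num_workers] for i in range(num_workers)]
    _run_workers(work, chunks)
    return total


def partitioned_sum(values, num_workers):
    """Each worker sums privately, then writes once to its own slot."""
    partials = [0] * num_workers

    def work(index, chunk):
        local = 0
        for value in chunk:
            local += value  # no shared write inside the loop
        partials[index] = local  # one write per worker, to a distinct slot

    chunks = [values[i::num_workers] for i in range(num_workers)]
    _run_workers(work, chunks)
    return sum(partials)  # final reduction on the calling thread


if __name__ == "__main__":
    data = list(range(10_000))
    print(contended_sum(data, 8), partitioned_sum(data, 8))
```

Both functions return the same answer. The difference is that the number of contended writes in the first grows with the size of the data, while in the second it grows only with the number of workers, which is the kind of restructuring that shared-memory codes typically need as they scale.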

But scalability issues also show up in many other ways that are less obvious, yet equally challenging and interesting. For example, as system complexity grows, how do you develop software? Test and debug it? Create and implement intricate algorithmic solutions? Manage a large software community? If you are licensing software, how do you deal with possibly tens of thousands of computers scattered across an organization, many with multiple processors and frequent upgrades? While we are far from answering these and dozens more scalability questions, it does appear that open source approaches offer some advantages.

As many have argued before me, open source software processes scale better than proprietary models when it comes to developing and testing software. Eric Raymond famously stated in his book The Cathedral and the Bazaar that “open-source peer review is the only scalable method for achieving high reliability and quality”. The book Wikinomics argues pretty persuasively that open source approaches successfully pull together teams from disparate organizations and with widely ranging talents to solve difficult problems. I particularly love the recent example of collaborative mathematics as described in the October 2009 issue of Nature.

In this project, Timothy Gowers of Cambridge used his blog to launch an experiment in collaboration, the Polymath Project. While the goal was to solve a problem in mathematics, Dr. Gowers wanted to attack the problem using collaborative techniques inspired by open source software development (Linux) and Wikipedia. Surprisingly, the problem was solved within six weeks, through contributions of over 800 comments and 170,000 words on the project wiki page, with participants ranging from high school teachers to mathematics luminaries. To quote the article referenced above, the project was successful on many fronts:

      For the first time one can see on full display a complete account of how a serious mathematical result was discovered.
      It shows vividly how ideas grow, change, improve and are discarded, and how advances in understanding may come
      not in a single giant leap, but through the aggregation and refinement of many smaller insights. It shows the
      persistence required to solve a difficult problem, often in the face of considerable uncertainty, and how even
      the best mathematicians can make basic mistakes and pursue many failed ideas.

Gathering a brain trust like this together is nearly impossible in the usual hierarchical organization, and I believe open source approaches are far more capable of solving difficult technology problems. I think the future of scientific computing is to learn how to grow, manage and coordinate large, ad hoc communities (i.e., address the scalability problem). This will challenge many of us, whether we are coordinating international teams of academic and industrial researchers (e.g., a National Center of Biomedical Computing such as NA-MIC) or running businesses that must learn how to assemble, manage and motivate disparate communities to provide effective technology solutions.

In the next and last blog entry of this series, I will discuss reason #5: Business Model.

Questions or comments are always welcome!