Software Forethought

Here’s an all-too-common scenario. A bunch of really smart scientists and medical researchers get together. They envision a research program of unprecedented scale. They obtain funding, tens to hundreds of millions of dollars, from academic, commercial, and non-profit enterprises, and from investors and philanthropists. The plans are drawn up, and the bricks and mortar are under design and soon to be built! And at some point the data starts flooding in…

Oops, there’s a problem: what do we do about the data? How do we maintain provenance, and analyze, visualize, and share it? How are we going to build reproducible software systems and translate our research to application? Well, let’s just hire some software folks, they’ll take care of it. We’ll provide computing budgets for the scientists with which to buy software, we’ve budgeted a cluster (what more do you want after all?), we might even hire a computer scientist or two! In the meantime some of the more tech-savvy scientists (never formally trained in software process) will start writing some code. And so it goes, we muddle along.

Just another case of software as an afterthought. I’ve seen this scenario in action much too often, across groups ranging in scale from small research teams to large research institutions. To be blunt (and gentler than many deserve), the results are predictably poor: primitive computing and visualization capabilities, lost data, fractured and incompatible workflows, nonexistent software processes, inefficient collaboration methods, and poor science. It makes you want to cry for the waste of talent and resources, not to mention the missed scientific discoveries and the lost possibility of better health care outcomes.

I think it’s time to make software considerations a forethought, a fundamental driver of the scientific process. I’m convinced doing so will unleash a torrent of innovation. For a long time the scientific process, consisting of theory, experiment and computing (and maybe data-intensive scientific discovery, if you subscribe to the fourth paradigm), has been driven by the experimentalists and theorists, and computing has gone along for the ride. More and more science is computationally driven, yet I don’t think we’ve reflected this in our thinking or, more importantly, in the way we do science. It seems pretty clear to me: it’s time to place software and computing front and center in the practice of science, and then work with the theorists and experimentalists to solve their problems using the full potential of computing technology.

You probably think I’m making this stuff up (I’m not) and being overly dramatic (I am, but it’s a blog after all)! For example, today I attended the dazzling public announcement of the New York Genome Center. This $125 million venture allies some of the most brilliant minds in the genomics field with leading commercial, academic and research institutions. It is truly amazing, exciting and inspiring to see this come together, and I expect someday Kitware will be a part of it (after all, it’s in our New York backyard and they desperately need us, they just don’t know it yet).

However, as a scientific computing professional I can’t help but be concerned. How many software/computing institutions are founders of this Center? How many computational scientists are in leadership positions (as indicated by the biographies handed out at the event)? Zero and zero. More worrying, during the presentations there were all the tell-tale signs of software as an afterthought: phrases referring to software as a challenge and a limiting concern; vague statements about data sharing plans; and a singular lack of recognition that, to a large degree, understanding the genome is a computing problem. As a computational scientist sensitized to these code words, I take them as strong signals that software will come late to the party (once the data comes flooding in). It could be that there is much going on behind the scenes, everything is under control, and I’m wrong (again) about this, but I’ve seen software treated as an afterthought too often for my spider senses not to tingle.

As data grows larger and our modeling and understanding of systems more complex, the computing problem grows in importance. The point is fast approaching (if we are not there already) where the innovation bottleneck will be computing, and software in particular. Without progress on this front, the worst-case scenario is that adding more theory and experimentation will do little to advance science and the medical arts. To make progress we need to break through the computing bottleneck by treating it as the serious challenge that it is.

My recommendation is that we think about software proactively. If you are forming a research enterprise, include a software and computing professional in a leadership position, or look around and add one to your board. Alternatively, engage scientific computing organizations for which software and big data are core competencies. With the right mix of people, organizations, and considered forethought, this Center and many other research initiatives like it can succeed beyond our wildest imaginings.

Questions or comments are always welcome!