If an Experiment Fails in a Forest, Does Anyone Hear?

February 10, 2013

There are many reasons why Open Science is a good thing. For some it’s a moral argument that stresses sharing the results of (usually publicly funded) scientific research with society, preventing fraud through transparency, and benefiting teaching through the use of open materials. Others see the growing complexity and challenges of science as demanding collaboration, so that larger teams with their wider expertise can be brought to bear. Clearly there are personal benefits too: as Steve Lawrence has shown, there is a correlation between sharing the results of research and the number of paper citations. Many innovators and entrepreneurs are also fond of Open Science because sharing technology can accelerate the innovation process and empower small business by reducing intellectual property barriers. And there are a lot of us who just like to have fun: the communities and relationships that form in an open environment make the hard work of science that much more enjoyable.

While I agree with all of these sentiments, they miss the crucial issue: reproducibility. It was for good reason that the Royal Society was formed in 1660 with the militant motto “Nullius in verba” or rendered in English “take nobody’s word for it.” Once the scientific method was formalized and practiced by these and other pioneers we began to benefit from the power of science. The early scientists (actually Natural Philosophers) realized that an understanding of physical reality was based on the practice of objectively performing “experiments” and repeatedly reproducing the same results. Only then could something be called truth, and incorporated into our foundational knowledge base.

I sometimes wonder whether the early scientists were responding to a time of superstition; I can’t help but think of the Monty Python comedy skit. In the famous “How to Tell a Witch,” the reasoning used to determine whether an accused woman is a witch is something to behold, and ends up testing whether the purported witch weighs the same as a duck. It’s easy to see how superstition and emotion can produce a faulty decision about physical reality; yet using erroneous facts produces similar results. While it is hilarious when Monty Python plays it, unfortunately there is anecdotal evidence that for many people such faulty “reasoning” processes are alive and well. If you think scientists are immune, consider the recent study that found that 90% of published papers in preclinical cancer research describe work that is not reproducible, and therefore wrong. Such people either never learned or have forgotten the importance of reproducibility, with the result that we are developing therapies and pharmaceuticals on shifting ground.

I don’t think that many of us in the open source world became involved thinking that we were following in the footsteps of hallowed scientists. For most of us, the reasons articulated in the first paragraph were enough. Yet very quickly we realized that without reproducible results, i.e., testing, we were doomed to build on unstable foundations. So in the end the essential requirement that our software be built on “truth” demanded that we test constantly to ensure that our results were repeatable no matter what happened to the underlying platform, data, and algorithms. It is for this very reason that we are proponents of Open Access journals like the Insight Journal, and use tools like CMake, CTest and CDash at the heart of our software development process. In this way, when an experiment (test) fails in the forest we hear it, and take the necessary steps to maintain the integrity of our foundational reality (and to the benefit of our users who build on it).
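To make the idea of "hearing" a failed experiment concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the CMake/CTest/CDash setup mentioned above: the function names, the sample data, and the tolerance are all hypothetical. The idea is simply to record a baseline result once, rerun the same computation later, and flag any run that no longer matches.

```python
import math


def analyze(samples):
    """Hypothetical 'experiment': compute the mean of the measurements."""
    return sum(samples) / len(samples)


def is_reproduced(samples, baseline, tol=1e-9):
    """Rerun the analysis and compare it with the recorded baseline."""
    return math.isclose(analyze(samples), baseline, rel_tol=0.0, abs_tol=tol)


# Baseline recorded when the experiment was first run (assumed data).
original_data = [1.0, 2.0, 3.0, 4.0]
baseline = analyze(original_data)

# Same inputs: the result is reproduced.
same_inputs_ok = is_reproduced(original_data, baseline)

# Something changed underneath (here, one input value): the check fails,
# which is the point at which someone should "hear" it.
changed_data = [1.0, 2.0, 3.0, 4.5]
changed_inputs_ok = is_reproduced(changed_data, baseline)

print(same_inputs_ok, changed_inputs_ok)
```

In a real project a check like this would be run automatically on every change and on every platform, so that a mismatch is reported rather than silently built upon.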

Once the imperative of reproducibility is accepted, all of the other open practices follow. Without Open Access publications to describe and explain the experimental process, Open Data to provide controlled input, and Open Source to rerun computations and analysis, it is not possible to reliably reproduce experiments. So to my way of thinking, if you are a technologist then there is no choice but to practice Open Science. Anything else is tantamount to arguing that a witch weighs the same as a duck.

Tags: Company News, Data and Analytics, Software Process

7 comments to If an Experiment Fails in a Forest, Does Anyone Hear?

Luis Ibanez says:

February 10, 2013 at 3:40 pm

Well said!

—

Another worthy quote about the Royal Society’s motto ‘Nullius in verba’:

“It is an expression of the determination of Fellows to withstand the domination of authority and to verify all statements by an appeal to facts determined by experiment.”

http://royalsociety.org/about-us/history/

Lisa Avila says:

February 12, 2013 at 11:07 pm

For me there are two parts to this – access and cost – that you’ve merged into one adjective “open”. I agree that science (and really anything where a conclusion is drawn – be it science or economics or politics or whatever) requires full availability of whatever input data and analysis methods led to the final conclusion. But does that availability have to be free? Ultimately someone must pay – it costs something to host data and source code, and to maintain the source code so that it remains valuable over time. And it costs something to perform the research in the first place. It does make sense to make the access free for government-funded research – a good chunk of taxpayer money goes into creating the scientific results, so it seems reasonable to use a bit more money to make those results openly available so that we can all build upon the research we’ve all paid for. But I struggle with the concept of privately funded research – how do you recoup those costs? You can patent the technology and protect it that way (then you can make it open but no one can use it without licensing it). That is not really an option I’d want to encourage – but what are the alternatives? You can keep it secret (and it might be the experiment that succeeds in the forest.) You can make the research results a “product” that you sell to recoup your investment in performing the research (in which case you’ve given access – but it is not free.) In my opinion this is preferable to patenting – at least you haven’t prohibited others from independently performing the same research. I see this as “speculative science as a service” – that is, you perform the science in hopes that there is a market for the results, and you sell the results. This is different from the more typical consulting method of “science as a service” where a customer pays you to perform science that is delivered only to them (and they are likely to keep it secret and/or patent it.)

At this point I realize that I am going to get a very long, very detailed rebuttal from Luis claiming that those last two examples are not true science. But with that narrower viewpoint, I fear that leaves science as something that can only be truly performed via government funding (or self-funded at commercial companies, with patenting to protect the investment).

Luis Ibanez says:

February 13, 2013 at 8:02 am

Lisa

Thanks for clarifying the distinction between access and cost. Will’s article is really focused on access, and on why there is no Science where there is no open access.

You bring up two topics related to cost:

A. Cost of Publishing.

The publishing cost issue was solved a long time ago, and it is quite simple: publishing costs are 1% of the cost of doing research.

Therefore, that 1% needed to cover publishing costs should be budgeted upfront as part of the cost of doing business in research. The funds should be used to pay for open-access publications, and then: problem solved.

It is important to highlight the 1% ratio, because what seems to be lost from the conversation about cost is that pay-wall publishers, in their quest to make a business out of that 1%, ruin the remaining 99% by locking it away from those who need access to it. Every time a publisher puts up a $30 fee as the obstacle to downloading a paper, they are blocking the output of a project that typically cost $500,000. This is killing an elephant in order to feed an ant. We would do better to feed the ant first, and then let the elephant happily roam the plains.

In the open access model, the publisher gets paid $5,000 upfront (which is indeed 1% of the research cost), and then the article must be publicly accessible to anyone. The publisher got paid, the cost to the funding institution is the same, and the 99% actually gets to be heard out of the forest.

To reason properly on economic matters, we should always be looking at the entire picture, not just at how good it is for publisher X to have a monopoly in the market, and to have a business where they get their input for free (papers from researchers) and sell it back at high prices to the same researchers. There is no question that this is good for the publisher. It is not so good for researchers, nor universities, nor society at large.

B. Cost of Doing Research

You also bring into the conversation the question of how we fund research, particularly in the private sector.

The worn-out solutions are patents and trade secrets. They look good on paper, but not so much in practice. Patents do not produce money, except (in some cases) in the chemical and pharmaceutical industries. In the technology sector, patents are a waste of resources. More is spent on patent litigation than is made on patent licenses. The patent process is only good for patent attorneys and for patent trolls who drain $50 Billion out of the economy every year.

The curious irony in the case of the pharmaceutical industry is that, again, the closedness of their approach results in very inefficient research: it costs $1B and takes 15 years to take a drug from conception to market. In 2010, only 21 new compounds were brought to the market.

The pharmaceutical industry has a crisis of productivity:
http://sciencecareers.sciencemag.org/career_magazine/previous_issues/articles/2011_12_09/caredit.a1100136

Here is where it is important to think like an economist, not like an accountant.

The pharmaceutical industry has started exploring the Open Innovation approach.
http://www.biztradeshows.com/conferences/pharma-innovation-forum/

The World Pharma Innovation Congress took place just a couple of weeks ago:
http://www.terrapinn.com/conference/world-pharma-innovation-congress-usa/programme-conference-day-one-tuesday-31st-january.stm

and covered topics such as:

“Unveiling the truth: comparing a closed-innovation past with open innovation”
* Moving beyond the blockbuster mentality for sustainable drug development
* Balancing strategic insights with collaborative discussion
* Allocating intellectual property to advance scientific knowledge

and

“How to use the tools necessary for implementing an open innovation model”

* Addressing challenges of IP regulation for open innovation discovery
* Understanding different innovation models for therapeutic development
* Increasing efficiency with new platforms to access information

Overall, we need to revise the assumptions behind long-accepted answers such as patents and trade secrets. They may turn out to be the result of thinking too small. Even venture capital firms are starting to realize that a patent is not worth that much, and that things like “time to market” are much more important.

Open Innovation, which is still funded, is simply more efficient, and therefore it is better for business.

Will Schroeder says:

February 13, 2013 at 8:13 am

Maybe I didn’t read thoroughly enough, but I believe you missed a few options on how to pay, Lisa. There is the advertising model, which I would say Google practices (hosting terabytes of data http://www.wired.com/wiredscience/2008/01/google-to-provi/): host “communities” of data and code, etc., and advertise in them.

There are models based on authors paying a fee at time of submission.

There are models based on levels of access. In other words, a premium is paid to access data at high speed (the free stuff is throttled). Similarly, the core data/code can be available free, and then payment for advanced services (like visualization or analysis) is required. Here the “paid-for” services sit alongside the data so data download is not required, and the “paid-for” computational services run fast due to their proximity to the data.

Along these same lines, I can also envision models where compute capacity providers (e.g., Amazon EC2) host data for free (and maybe even actively pay for acquisition of data) then charge for the use of their compute systems to perform analytics, etc. on the data they are hosting.

I’m sure there are more crazy ideas. I can even imagine non-profits or public-private partnerships where donated funds are used to develop (i.e., do the science) and maintain communities including all the data, code, and publications necessary to maintain an Open Science designation.

Lisa Avila says:

February 13, 2013 at 10:17 am

I knew I could count on a response from you, Luis! I do have to argue with the 1% number though. If you extend the concept of publishing to include providing all the algorithms and data necessary to reproduce the results, then I say publication costs quite a bit more than 1%. And here I am not only counting the time to create the initial publication (writing a more detailed paper, cleaning up source code and documenting it so that it is usable by others, getting appropriate permissions on data or creating mock data that is representative of the real thing, etc.) but also the cost of maintaining the released software (fixing issues that arise as new versions of operating systems / compilers are released, fixing issues found over time, costs of storage, backup and bandwidth, system maintenance, etc.) This requires work beyond the initial scope of the research, and often cannot be performed by the researchers (they were students but graduated last year, or they were at one company five years ago and now have all moved on to other positions). For this we need a much expanded version of something like PubMed – one that maintains not only the static resources like the paper and input data, but also the evolving resources such as the source code and reviewer comments. This needs caretakers, and that requires funding.

Moving on to the argument about how open leads to more innovation – I agree. I think the problem is that in some cases the research is a step you take in order to create something you sell (let’s say a car company designing a new feature) and in other cases the research is the end result (a software company doing R&D and releasing it all as open source). In the first example you can fund your own research and embrace the boost in innovation you get by engaging a wider community (as long as you are good at being the first to market, and produce a high quality product that people want). But in the second case, how do you derive revenue from your research? Yes, I realize you can raise the funding first then perform the research (research-for-hire, but where you have an agreement up-front to release the results). As more product companies embrace open innovation maybe that type of funding will increase – but right now that is a rarity.

Will, I don’t think advertising revenue can cover research costs – sure, someone big like Google or Amazon might acquire lots of data that they host for free (like maps) and generate their revenue through advertising – but I really can’t see raising much revenue by selling advertising space at itk.org, for example (and that seems to perhaps taint the research results – you wouldn’t want to annoy your advertisers by publishing something that might upset them…) I do agree with your service model – that is essentially the same thing as having the research lead to a product from which you derive your revenue. But that requires some separation between what you give away for free and what you provide as a service. You have to have something of value that folks can’t get for free, otherwise they won’t pay for your service – and if that can’t be some secret algorithm or secret data (because you’ve released all that as open access) then essentially you are reselling computing services (CPU time, storage, backup, etc.) and convenience (it is easier to use the service than set it up yourself.) The problem I find here is that these are then relatively low-margin services (if the cost is too high you just encourage folks to download the software and do it themselves, or another company to set up a similar service at a lower cost), which require high volume. But targeted scientific research doesn’t always lend itself to high volume. 🙂

Will Schroeder says:

February 13, 2013 at 11:05 am

Here’s a timely announcement related to paying for science: PeerJ. $99 entitles you to publish an article a year, for life. $300 nets you unlimited articles published per year.
http://science.slashdot.org/story/13/02/13/1343220/peerj-a-new-open-access-megajournal-launches

Marcus Hanwell says:

February 13, 2013 at 11:20 am

Doesn’t a lot of this come back to what scientific publication is about: promoting your work, or disseminating the results of scientific research? If you wish to disseminate original research, then making it openly available and reproducible is absolutely essential so that the widest audience can verify and even build upon it. It is perfectly valid to do research without publishing the results, but if you want a scientific article discussing the results, then you are obliged to provide all the information necessary to verify and reproduce the claimed results. There is a nice set of slides examining a world in which mathematical proofs were not required:

http://jarrodmillman.com/talks/siam2011/ms148/leveque.pdf

