NIH Ponders Reproducibility Rules

August 4, 2013

In a recent blog post in Nature, Meredith Wadman shared details of the NIH's ongoing initiatives to foster the verification of reproducibility in the research activities funded by the agency.

“Nullius in Verba”
“Take Nobody’s word for it”

“It is an expression of the determination of Fellows of the Royal Society
to withstand the domination of authority
and to verify all statements by an appeal to facts determined by experiment.”


After hosting two workshops on reproducibility issues in 2012, the National Institutes of Health (NIH) is just beginning to take active measures to restore the proper practice of scientific research.

Lawrence Tabak, the agency’s principal deputy director, stated:

“There is certainly sufficient information now that the NIH
feels it’s appropriate to look at this at a central-agency level,”

Wadman reports that

This summer, Tabak and other senior NIH officials, including Story Landis, director at the neurology institute, and Harold Varmus, director at the cancer institute, are assessing input gathered from the directors of the agency’s 27 institutes and centres. They will then confer with NIH director Francis Collins, who will decide what steps to take.

This is in the context of recent findings, which Wadman captures in the following list:

  • September 2011  Bayer HealthCare finds inconsistencies between in-house and published data in almost two-thirds of 67 projects.
  • March 2012  Amgen publication shows that the findings from only 11% of 53 landmark papers can be replicated by company scientists.
  • January 2013  Elizabeth Iorns, chief executive of Science Exchange, asks more than 22,000 study authors if they want her company to arrange verification of their experiments; nearly 2,000 say yes.
  • May 2013  A survey at the MD Anderson Cancer Center finds that more than half of its respondents have tried and failed to reproduce published data.
  • July 2013  Science Exchange launches a verification programme for commercially sold antibodies.


Tabak suggested:

“If the premise isn’t validatable,
then we’re done;
it doesn’t matter how well you wrote the grant,”

Wadman also states:

Agency officials are also considering a requirement
that independent labs validate the results of important
preclinical studies as a condition of receiving grant funding.

There is, of course, the question of who will pay for reproducibility and how much it will cost. This is what makes it so significant that the conversation has finally escalated to the level of the largest US federal funding agency.

It has been estimated that the cost of a reproducibility study will be in the range of $25,000. This is about 5% of the original cost of doing the research, and, when put in context, it is a small price to pay for ensuring that the initial social investment in scientific research truly leads to results that stand up to inspection.
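
As a quick sanity check on that figure, here is a back-of-the-envelope calculation in Python; the ~$450K average cost of the original work is an assumption taken from the per-paper estimate discussed in the comments below, not from Wadman's article:

    replication_cost = 25_000   # estimated cost of one reproducibility study (USD)
    original_cost = 450_000     # assumed average cost of the original research (USD)

    print(f"{replication_cost / original_cost:.1%}")   # prints 5.6%, i.e. roughly 5%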

Scientific reproducibility is therefore emerging as a Market Service whose customers are the R&D organizations that take as input the results of basic research funded by public agencies.

The pioneer in this field is Science Exchange, a start-up that has taken a free-market approach to bringing the benefits of reproducible research to those who care about research being more than a flashy paper published in journals where reviewers make no effort at independent verification.

Other key players in the space of Reproducible Research are:

  • Run My Code: a site that allows researchers to replicate scientific results in order to check their validity and robustness.
  • Figshare: a place where data, negative results and true verification materials are shared publicly.
  • PLoS ONE: the progressive mega-journal that requires software-intensive papers to include source code under open-source licenses.
  • Mendeley: a platform for managing and sharing research papers.

These progressive organizations are taking it upon themselves to restore the honest practice of the scientific method, which is anchored in the practice of reproducibility verification.

(http://www.flickr.com/photos/timoni/6766900327 under CC SA 2.0 License)

The Renaissance of True Science is now Unstoppable.

3 comments to NIH Ponders Reproducibility Rules

  1. I am an Open Source, Open Science advocate. Have been since 1968.

    The devil is in the details.

    Does reproducibility of software results mean that anyone (with reasonable programming skills) can reproduce the results? Or maybe at least someone outside the author’s organization.

    For example, I write a paper that uses Cobol (or C#, or C++, or FORTRAN, or PL/1, or Ada, or APL) to produce the results in my paper. Or VTK 2.1? Or ITK 1.2? If I publish that code, data, and results, does that satisfy the reproducibility requirement?

    What if the results are created with a commercial product like MatLab? I don’t have a license. I can’t reproduce the results.

    When we presented Marching Cubes at Siggraph in 1987, the following Monday I received about 6 e-mails from people who implemented (their version) of Marching Cubes. I didn’t publish any code. Is that reproducibility?

    Maybe access to the code that actually produced the results is sufficient. I may not be able to run the code, but at least I can read their implementation.

  2. Luis,

    In the mid-1990s I wrote a paper and created a website called “Marching Through the Visible Man” and later “Marching Through The Visible Woman”.
    http://www.crd.ge.com/esl/cgsp/projects/vm/
    This was pre-VTK, using VTK’s predecessor LYMB, a closed source system.

    Here I proposed:

    …a simple methodology to report results on the Visible Human data sets. These are by no means rigid rules, just guidelines for reporting results. The goal is to allow researchers to reproduce the results reported by others and to compare their results.

    • Define methods and materials. Identify the software used in each processing step and the origin of the software. Cite any references that describe the algorithms. Describe the hardware configurations used for processing the data.
    • Define parameters for each algorithm. Be specific enough so that another researcher, using the same or similar software, could reproduce the results. Parameters include region of interest coordinates, threshold values, opacity values and connectivity seeds.
    • Report results. Include timings for each step. Although the actual times often depend on particular implementations, these times will give the reader a general feel for the processing times.
    • If additional images or animations are available, report the site where they can be obtained. Use Uniform Resource Locators (URLs) when appropriate.
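
    A minimal sketch, in Python, of how such a report could be captured in machine-readable form and published alongside a paper; every field name and value below is a hypothetical placeholder rather than part of the original guidelines:

        import json

        # Hypothetical record mirroring the guidelines above: methods and materials,
        # per-algorithm parameters, timings, and pointers to supplementary material.
        report = {
            "methods_and_materials": {
                "software": [{"name": "example-marching-cubes", "version": "1.0",
                              "origin": "https://example.org/code"}],
                "references": ["Lorensen & Cline, Marching Cubes, SIGGRAPH 1987"],
                "hardware": "example workstation, 8 cores, 32 GB RAM",
            },
            "parameters": {
                "region_of_interest": [0, 0, 0, 255, 255, 255],  # voxel coordinates
                "threshold": 1150.0,
                "opacity": 0.8,
                "connectivity_seed": [128, 128, 64],
            },
            "results": {
                "timings_seconds": {"segmentation": 12.4, "surface_extraction": 3.1},
            },
            "supplementary_urls": ["http://www.crd.ge.com/esl/cgsp/projects/vm/"],
        }

        print(json.dumps(report, indent=2))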

  3. Bill,

    You bring up very good points.

    The question at hand when publishing a result is:

    “What is the likelihood of this result being wrong?”

    Note that it is not the likelihood of it being right. This point is masterfully described by Karl Popper in his statistical interpretation of scientific reproducibility.

    This question is answered by continuous attempts to prove the result wrong, and, in the process of failing in those attempts, we build confidence in the result… without ever being able to prove it completely.

    Reproducibility is not measured by a boolean value, but by an unbounded positive integer: the number of times that others have managed to replicate the result.

    How many repetitions are enough?

    That is a question of risk assessment versus available resources.

    For example, before a pharmaceutical company invests $800 million and 10 years in pursuing a drug candidate based on the results of a published paper… how many times would they like to see the results of that paper verified by independent groups? This is indeed the scenario described here: http://www.kitware.com/blog/home/post/105. Waiting for the clinical trials stage is certainly too late and too expensive.

    Before we dictate public policy on national debt (http://www.kitware.com/blog/home/post/479), how many times should independent observers have looked at our spreadsheets to make sure that we have not excluded three of the largest economies and completely changed the conclusions of our study? We now know that the number is certainly more than one.

    You bring up some of the practical challenges of implementing reproducibility verification. I certainly agree with most of them. However, they are easily solved with 1960s technology: Makefiles and bash scripts.
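
    A minimal sketch of what such a script could do: re-run each declared step and check that the regenerated outputs match the published ones. Python is used here purely as a stand-in for the Makefiles and bash scripts mentioned above, and every command, path, and checksum is a hypothetical placeholder:

        import hashlib
        import subprocess
        from pathlib import Path

        # Each entry: the command that regenerates an output, the file it produces,
        # and the checksum of the published version of that file (all placeholders).
        STEPS = [
            (["python", "analysis.py", "--input", "data/raw.csv"],
             Path("results/table1.csv"),
             "sha256-of-the-published-table1.csv"),
        ]

        def sha256(path: Path) -> str:
            return hashlib.sha256(path.read_bytes()).hexdigest()

        for command, output, expected in STEPS:
            subprocess.run(command, check=True)   # re-run the processing step
            status = "MATCH" if sha256(output) == expected else "MISMATCH"
            print(f"{output}: {status}")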

    The challenges to reproducibility verification are not technological, but cultural: it is the classic case of resistance to change, once we have grown accustomed to a comfortable way of operating, and converted scientific research into a way of life.

    The difficulty is exacerbated by poor education on the Scientific Method. Our graduate students learn a lot about “publish or perish”, a lot about “pursuing original research”, a lot about “how to pursue funding”, and not at all about the most essential aspect of the scientific method: it has to be verifiable by independent observers.

    Most graduate students have been trained as inventors and not as scientists, two very different professions.

    The outcome of NIH-funded research is about 66,000 papers per year. From an annual budget input of $30B, this is an average cost of $450K per paper. How much is it reasonable to invest in ensuring that the results of those papers stand up to scrutiny?
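
    A rough, illustrative calculation in Python using these figures, together with the ~$25,000 replication estimate quoted earlier in the post:

        annual_budget = 30e9        # NIH annual budget (USD)
        papers_per_year = 66_000    # papers attributed to NIH funding each year
        replication_cost = 25_000   # estimated cost of one reproducibility study (USD)

        print(f"Average cost per paper: ${annual_budget / papers_per_year:,.0f}")        # ~$455K
        print(f"One replication per paper: ${papers_per_year * replication_cost:,.0f}")  # $1.65B per year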

    This can certainly be done more cheaply if the original authors plan for reproducibility from day one. That is, if they truly work according to the scientific method.

    How many independent replications are enough?

    We can always use one more… but it is also true that after many replications, there are certainly diminishing returns…

    What we know for sure is that the current absence of reproducibility is definitely Not Science, and should not be labelled as such.
