[Insight-users] [Insight-developers] Fwd: Reproducibility: When the Reproduction Fails!
David Doria
daviddoria at gmail.com
Sun Mar 27 10:24:32 EDT 2011
On Sun, Mar 27, 2011 at 6:58 AM, Luis Ibanez <luis.ibanez at kitware.com> wrote:
> ---------- Forwarded message ----------
> From: Kitware Blog at http://www.kitware.com/blog/home/post/105
>
> Reproducibility: When the Reproduction Fails! by Luis Ibanez (Open
> Source, Data Publishing)
>
> Just try to repeat it...!
>
> In a recent blog post:
>
> http://blog.stodden.net/2011/03/19/a-case-study-in-the-need-for-open-data-and-code/
>
> Victoria Stodden elaborates on a clear example of why reproducibility
> must be a practical requirement for scientific publications, and how
> open data and open code are essential to make reproducibility
> verification a practical reality.
>
>
> This is the case of a team at MD Anderson, led by Keith Baggerly, that
> set out to replicate the results of a set of papers published
> by a research group at Duke University. To their surprise, the
> Baggerly team was unable to replicate the results disclosed in the
> original papers; thanks to the fact that most of the data and
> software was publicly available, they were able to perform "forensics"
> on the papers and discover that they were plagued with poor practices,
> inconsistent management of data, and inexplicable results.
>
>
> This forensics work led to the termination of clinical trials at Duke
> last November and the resignation of Anil Potti.
>
>
> Among the interesting aspects of this case is the fact that the
> forensics team had to use the Freedom of Information Act (FOIA) in
> order to get an investigation report that Duke sent to the National
> Cancer Institute. Duke refused to share that report with the forensics
> team under the argument that it was "confidential" information.
> However, given that the NCI is a federal agency, it is subject to
> the Freedom of Information Act.
>
> ________________________________
> "The Importance of Reproducible Research in High Throughput Biology:
> Case Studies in Forensic Bioinformatics"
> Keith A. Baggerly,
>
> A full video of the presentation is available here:
>
> http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/
>
> This lecture is mandatory material for anyone who honestly cares about
> scientific research.
>
>
> As good practitioners, Baggerly's team made ALL THE MATERIALS
> needed to replicate what they did publicly available online.
> You can find them here:
>
> http://bioinformatics.mdanderson.org/Supplements/ReproRsch-Chemo/
>
> A good case of a group that practices what it preaches and teaches by example.
>
> Reports on bioinformatics work at MD Anderson are now required to be
> written in Sweave (a combination of R and LaTeX) to ensure that the
> reports are REPRODUCIBLE from the original data to the final results.
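>
> As a rough illustration of what that looks like (the file name, chunk
> label, and data below are hypothetical, not taken from an actual
> MD Anderson report), a minimal Sweave document is just LaTeX with
> embedded R code chunks:
>
>     \documentclass{article}
>     \begin{document}
>     \section*{Validation report}
>     % The chunk below is executed when the document is built, so the
>     % numbers in the PDF always come from the data, never from hand edits.
>     <<summary, echo=TRUE>>=
>     expr <- read.csv("expression_matrix.csv")  # hypothetical input file
>     dim(expr)                                  # samples and genes actually used
>     @
>     This report covers \Sexpr{nrow(expr)} samples.
>     \end{document}
>
> Building it with "R CMD Sweave report.Rnw" followed by
> "pdflatex report.tex" re-runs the analysis every time, which is exactly
> what makes the report reproducible from the original data.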
>
>
> Memorable quotes from this talk:
>
> "We wrote a paper on this [the fact that the results of the original
> experiment couldn't be reproduced], we first circulate it to a
> Biological Journal, and we got the comment back that: "We are sorry,
> this story seems TOO NEGATIVE, can you FIX that ?".
> "Duke administrators accomplished something monumental:
> They triggered a public expression of outrage from biostatisticians".
> "The most common mistakes, are simple ones"
>
> Off-by-one errors in a data table / software (see the sketch below)
> Mixing up the sample labels
> Mixing up the gene labels...
> The fun thing about simple mistakes is that if you see them, they are
> easy to fix. But if the documentation (the paper report) is poor, you
> will not see them.
>
> We suspect that the simple mistakes are more common than we would like to admit.
> Please "label the columns" and "provide code".
> When I'm reviewing papers, I look and see
>
> Do they tell me where the data is?
> Do they give me code?
> Do they have URLs, are the links live?
>
> "Reconstructing this takes a lot of time. We estimate that this took
> us between 1500 and 2000 hours to figure out how all this work" (the
> effort of replicating a published paper, due to the lack of details in
> the processes used).
> "At some stage, not only us, but also others should be able to
> precisely reproduce what we did, in other words: if they start with
> the same numbers they should get the same results"
>
> ________________________________
> Useful Links
>
> Google group on Reproducible Research
> http://groups.google.com/group/reproducible-research
> http://reproducibleresearch.net/index.php/Main_Page
>
> ________________________________
> http://www.nature.com/nm/journal/v17/n1/full/nm0111-135.html
>
> "We wish to retract this article because we have been unable to
> reproduce certain crucial experiments showing validation of signatures
> for predicting response to chemotherapies, including docetaxel and
> topotecan. Although we believe that the underlying approach to
> developing predictive signatures is valid, a corruption of several
> validation data sets precludes conclusions regarding these signatures.
> As these results are fundamental to the conclusions of the paper, we
> formally retract the paper. We deeply regret the impact of this action
> on the work of other investigators."
I would like to call your attention to this week's inSCIght podcast
about reproducibility in science, in which Luis articulates some of the
points mentioned above. Information about the podcast and the audio
can be found here:
http://inscight.org/2011/03/23/episode_5/
I implore you to listen carefully to the points raised here and always
consider these issues in your daily work!
David