Scientific Reproducibility: Begley’s Six Rules

Posted September 26th, 2012 in Biotech startup advice, Translational research

The lack of robust reproducibility in the scientific literature is both shocking and troubling, and has been a widely covered topic over the past couple of years.

One of the earliest blogs here at LifeSciVC was on the dirty secret that more than half of academic work couldn’t be replicated in an industrial setting, and how it shaped the way we view starting new companies as venture investors.  It got the attention of BioCentury/SciBX as well as the Wall Street Journal.

Later in 2011, some real data was added to strengthen the case: a Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated (Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011).  And then earlier this year, Glenn Begley (formerly Amgen) and Lee Ellis (MDACC) showed that of 53  “landmark” oncology studies from 2001-2011, each highlighting big new apparent advances in the field, only 11% (only 6!) could be robustly replicated in work done at Amgen (Begley & Ellis Nature 483, 531–533, 2012).  Adding insult to injury, the number of citations for the unreproducible findings actually outpaced those with reproducible findings according to the Amgen work: averaging 248 vs 231 citations, respectively, for papers in high impact factor journals and an even more astonishing 169 vs 13 citations for papers from other journals.
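For readers who want to check the arithmetic, here is a minimal sketch of the Amgen replication rate. The counts (6 of 53) come from the Begley & Ellis paper cited above. The 95% Wilson score interval is my own addition to convey how much uncertainty a sample of 53 carries; it is not something the paper reports, and the function name is purely illustrative.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts as reported in Begley & Ellis (2012): 6 of 53 studies replicated.
replicated, total = 6, 53
rate = replicated / total
low, high = wilson_interval(replicated, total)
print(f"replicated: {rate:.1%} (95% Wilson interval {low:.1%} to {high:.1%})")
```

Even at the optimistic end of that interval, the replication rate stays far below one half.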

These are frightening statistics for an industry predicated upon building on the prior work of others and the integrity of peer review for sorting the good from the bad.

As we think about starting new companies, initiating drug discovery campaigns, or even launching clinical studies, how do we deal with this issue?

First and foremost, we need to get better at assessing the scientific literature, and all of us involved in translational medicine need to hold ourselves to a higher standard – including investigators, their institutions, journal editors, grant-funding bodies, VCs, Pharma, etc…  From an industrial perspective, we should also do better diligence – assessing science with better filters about what’s robust and what’s not.

To that end, Glenn Begley has come up with a great list of rules for what makes a robust, high quality paper with the hallmarks of reproducibility, based on his team’s review of scores of papers.  Like Lipinski’s Rule of Five for predicting oral activity of small molecule drugs, I’d like to propose calling these Begley’s Six Rules for Reproducibility:

1) Were studies blinded?

2) Were all results shown?

3) Were experiments repeated?

4) Were positive and negative controls shown?

5) Were reagents validated?

6) Were the statistical tests appropriate?

Let’s take each one of these in turn.

1) Most studies aren’t run blinded across experimental and control arms.  Furthermore, by my estimate, fewer than 20% of Methods sections even mention whether the work was blinded to prevent experimenter bias, and in most cases the blinding methodology isn’t described.

2) Results from multiple studies are rarely shown in the same paper, as it’s usually only the “representative” example figure (read = best single result).  Outliers often disappear from figures (e.g., a telltale sign is n’s differing randomly between arms).  Many western and northern blots show only a computer-generated slice of the gel, without size markers.  It’s also often unclear if the exposures were in the linear range of the staining.

3) N-of-1 experiments are sadly fairly commonplace in the literature.  Assays often don’t have replicate values included.  Nor are aggregate n’s often used.  It’s true that some long-term animal models are a chore to do multiple times, and often critical reagents are expensive – but repeating studies before publishing should be the bar.

4) The use of both positive and negative controls to benchmark an experimental system is frequently not done.  In fairness, with a novel model, there might not be a positive control.  But if there is, it should be included and described.  Selection of the right controls is also an issue: e.g., when studying the role of a single kinase in a disease, a promiscuous dirty kinase inhibitor that happens to hit the target of interest is probably not a great control.

5) Validated reagents are essential to draw robust conclusions.  Unfortunately, Begley and his colleagues found this to be frequently overlooked, especially the strength of immunohistochemistry probes and western antibodies (e.g., species cross-reactivity).  Authors should highlight where the validated reagents were obtained.

6) Statistics is a big gap for most papers.  Proper powering of animal studies with a pre-agreed stat plan is a rarity.  Showing n’s and SEM bars in figures is important.  So is choosing the right p-value threshold: for instance, a cutoff of 0.05 isn’t meaningful for post hoc analyses hunting for signals on a chip, where thousands of comparisons are made at once.
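To make the statistical points in Rule 6 concrete, here is a rough sketch of two back-of-the-envelope checks. Everything in it is illustrative and assumed rather than taken from Begley's work: the 20,000-probe chip is a hypothetical, the function names are mine, the family-wise error formula assumes independent tests, Bonferroni is just one (conservative) correction among several, and the sample-size formula is the standard normal approximation for comparing two group means, which slightly understates what a t-test would require.

```python
from math import ceil
from statistics import NormalDist

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'hits' at threshold alpha if no real effect exists."""
    return n_tests * alpha

def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests

def animals_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided, two-group comparison.

    effect_size is the expected difference in means divided by the
    standard deviation (normal approximation, equal group sizes).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

n_probes = 20_000  # hypothetical chip size, for illustration only
print(f"false positives expected at p<0.05: {expected_false_positives(n_probes):.0f}")
print(f"Bonferroni cutoff per probe: {bonferroni_threshold(n_probes):.1e}")
print(f"chance of a false hit across 10 tests: {family_wise_error_rate(10):.0%}")
print(f"animals per arm for a 1-SD effect: {animals_per_group(1.0)}")
```

The takeaway is the scale: an uncorrected 0.05 cutoff across a chip produces hits by chance alone, and detecting anything smaller than a very large effect takes more animals per arm than many published studies use.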

These Six Rules are good guidelines for those of us in the business of finding and commercializing the next cutting edge science.  Thinking about these during diligence around an investigator’s work will undoubtedly improve the outcome of academic-to-industry translational efforts.  Furthermore, Tech Transfer Offices should hold these Rules up when they are working on invention disclosures and external outreach for the work.  Lastly, more CROs should track the literature and propose to do reproducibility work in line with these Rules for high impact science out of top tier academic centers; my guess is many academic institutions would support those studies.

Importantly, adherence to these Rules won’t make reproducible translation work 100% of the time.  Fundamentally, there are “language” differences between academic and industrial work.  The often used phrase “safe and well-tolerated” in an academic animal study means the animals didn’t look sick nor did they die.  But it doesn’t mean that even gross organ pathology was ruled out, much less full histopathology, chemistry and blood counts, liver enzyme levels, etc…  This language difference is an important factor in translation, but is much more nuanced than Begley’s Six Rules and needs to be considered in any academic-to-industry transfer.

The entire ‘system’ of biomedical research needs to change in order to address these issues and raise the bar.  Grant funding bodies need to demand it, as do journals.  Investigators should want to do it and be rewarded for their work’s robustness.  As one effort aimed at addressing this systemic issue, the Science Exchange’s Reproducibility Initiative launched last month and I’m honored to be on their Advisory Board.  The Initiative aims to provide “both a mechanism for scientists to independently replicate their findings and a reward for doing so”.  It’s received lots of press in Science, Nature Biotech, Slate, BioCentury, and others.  I’m hopeful that it will help create the momentum needed to address this troubling issue.

But, as a parting remark, let’s not forget that this is all about cutting-edge science.  There will always be studies that can’t get repeated – that’s part of the iterative nature of the scientific method of articulating and challenging hypotheses.  But as a system we can’t continue to tolerate ‘hit rates’ of reproducibility below 50% from academic scientific literature, especially from top tier journals and biomedical institutions.

  • Andy Spencer

    Excellent post. I agree … there are so many bad papers out there. A fix for #2: In this digital age, remove or vastly increase character limits on manuscripts. Fitting data into the limited space can be tough! A fix for #3: Reviewers need to reject n=1 experiments out of hand. VCs: always go to the lab notebooks during diligence!

  • Anonymous

    I’ve never worked in drug discovery or clinical trials, but for biotech research in general I think the system needs to distinguish between research (i) for tech transfer invention disclosure situations, (ii) for inclusion in patent applications, (iii) for investors, and (iv) for peer-review/academic publications. I think it’s OK to have different standards as long as people know different standards are involved. Whilst your suggestions for improving reproducibility are clearly laudable, things are not going to change very quickly for most types of research, and therefore investors are going to have to develop ways of analysing the risks that take into account lack of reproducibility.
    Also, in the situations I have encountered statistics are a very difficult thing to achieve objectivity on. Clued up scientists know how to phrase the initial hypothesis and which parameters to measure, how often, to give the best stats. Their careers depend on the right results being achieved, and there might not be anyone else in the world who can provide an objective independent analysis. That’s life.

  • Have pity for the poor graduate students! I think you just tacked 18 months onto their PhD!

    But seriously, your point is very well taken. Some irreproducibility is to be expected, but this level is alarming.

  • Stanford Grad Student

    You’re not going to change anything until the incentives for scientists change. Scientists are people too and will act in their own self-interest, thinking primarily of their careers. Academic journals reward conciseness with each figure representing a novel, high impact result. If most of your figures depict controls, you have a pretty low density of “interesting science” per page and are unlikely to have your work published in a prominent place. And with so many grad students and postdocs in biomedical fields competing for a small set of jobs, it makes a big difference whether or not you can publish in a high impact factor journal!

    The journals Science and Nature both have 5 page, 6 figure limits. Cell is slightly better with a 7 figure, 55,000 character limit. For physicists, the most prominent journal is Physical Review Letters, which I believe has a 6 page limit with no supplemental info permitted. The primary concern of journal editors is not to publish the most reliable science out there, but to publish exciting and sexy results that increase their standing in the industry (which in turn allows them to make more money from subscriptions).

    Plenty of my professors are fed up with this and make grand pronouncements about how much they hate the current system. They tell their grad students and others they shouldn’t be enslaved to the hegemony of Science/Nature/Cell and promise that what really matters is the quality of their science. Yet every time a prospective faculty candidate gives a job talk, the speaker is inevitably introduced by these same individuals with some variant of the phrase “And while a grad student and postdoc, he managed to publish 3 Science papers and 1 Cell paper!” What goes unmentioned is whether they published elsewhere or what science they may have published in lesser journals, which just goes to show that talk is cheap.

    I wish things were different, but we’re kinda stuck. Any scientist (or journal, for that matter) that tries to move away from the current system will ultimately be penalized. So I’m not optimistic about any kind of change for the foreseeable future.

  • Caltech Grad Student

    Great points Stanford. I agree, no change will come anytime soon due to the entrenched incentive structure specifically in biology where PhDs are 6+ years to get a good paper. Journals currently on top do not want the status quo to change.

  • Rod Riedel

    Thanks very much for this interesting post.
    Here’s an additional suggestion to consider: before licensing any asset from an academic laboratory, start-ups/VCs should first audit the laboratory notebooks in the lab and second, conduct an independent confirmatory study, designed along the parameters described in Begley’s Six Rules, using an independent contract laboratory. Start-ups/VCs should obtain input and agreement from the academic entity about the confirmatory study design, its endpoints and what result would be considered a success. If necessary, start-ups/VCs should agree to share the data with the academic entity offering the asset. A licensing decision could then be based on a set of data as reliable as possible – the investment would of course still carry considerable risk, but an early showstopper (an in-licensed project that turns out to have no sound scientific basis) would at least be removed from consideration… Right now, industry-wide discussions indicate that the incidence of “showstoppers” is alarmingly high, with many unfortunately discovered only after several years’ effort and millions of dollars of investment.

  • Anton Bespalov

    Dr Booth,

    Thank you very much for this post! It is very good that investors pay attention to quality aspects of research, aspects that usually escape attention of both scientific experts and due diligence teams. Quality (Begley’s six and beyond) is especially important for non-regulated areas of biomedical science where data have commercial value.

    I am working for a recently established company that is the first to provide expert evaluation of drug research quality as a service. I will be happy to provide you with more information, if interested.

    Anton Bespalov, MD