Neural networks, cloud computing, deep learning, and in silico wizardry are on the cusp of disintermediating pharmaceutical drug discovery, cutting billions upon billions off the industry’s cost of new drugs and reducing the time to get new medicines approved to just a few processing cycles. “Software eats biotech”, or so goes this new variant of a decades-old thesis. This time could be different – we could be at the singularity when “humans transcend biology” – but I don’t think so.
For reflection, here’s a quote about computer-aided drug discovery (CADD), highlighting its importance and impact:
“Drug companies know they simply cannot be without these computer techniques. They make drug design more rational. How? By helping scientists learn what is necessary, on the molecular level, to cure the body, then enabling them to tailor-make a drug to do the job… This whole approach is helping us avoid the blind alleys before we even step into the lab… Pharmaceutical firms are familiar with those alleys. Out of every 8,000 compounds the companies screen for medicinal use, only one reaches the market. The computer should help lower those odds … This means that chemists will not be tied up for weeks, sometimes months, painstakingly assembling test drugs that a computer could show to have little chance of working. The potential saving to the pharmaceutical industry: millions of dollars and thousands of man-hours”
What’s great about this quote is that you can hear its echo in current Silicon Valley tech-solves-biotech pitches, but it was from a Discover magazine article in August 1981 called “Designing Drugs With Computers”. A couple of months later, Merck’s initial foray into CADD was featured in a cover article in Fortune magazine with a great cover image: “Next Industrial Revolution: Designing Drugs by Computer at Merck.” These references highlight that over the past four decades, computer scientists have been plying their trade against hard drug R&D problems with new software. And if software has been truly eating biotech, then it’s been doing it very slowly – as there certainly seems to be plenty of biotech left to eat.
Don’t get me wrong, I’m not a Luddite on the role of in silico technology in improving R&D; quite the contrary, in fact, as I’m a big proponent and so is Atlas Venture where I’m a partner. We’ve put our money behind the concept of CADD-inspired drug discovery many times. I co-founded Nimbus Therapeutics in 2009 with Ramy Farid of Schrödinger, one of the best science-driven CADD software shops in the world, and served as its acting CEO for several years (more here on the original launch of Nimbus). And either I or my firm has been affiliated with many other companies in the CADD-inspired space, like Vitae (acquired by Allergan), SGX (acquired by Lilly), Avila (acquired by Celgene), Numerate, and a number of others.
We’re certainly advocates for the past impact and future promise of both major technology streams of CADD: structure-based drug design (think protein 3D-structures with drugs bound) and ligand-based drug design (think chemistry and quantitative structure-activity relationships, or QSAR). Over the past few decades, many approved drugs had significant CADD-enabled success, including “early” wins like the HIV protease inhibitors saquinavir, ritonavir, and indinavir in 1995-1996, over twenty years ago.
Nearly every successful small molecule drug discovery campaign I’ve been a part of has a healthy dose of CADD insight injected into it to help identify leads, as well as analyze various biophysical data (crystallography, NMR, spectroscopy) to explore the role of pharmacophores, solvation states, electrostatics/hydrophobicity, and more recently the PK/ADME space, including solubility and permeability. The prospective use of these novel CADD-enabled insights is possible today due to the significant improvements in the methodologies and computer power used for molecular dynamics simulations, free energy perturbation analyses, and quantum mechanics modeling of proteins and their ligands. These techniques enable “virtual” compound library screens, as well as better analysis and understanding of the findings from actual library screens (fragments, HTS, DELs). And as computer processing power has scaled into the cloud, additional insight has the potential to be gleaned from these efforts going forward. Broadly defined, the CADD field has been a hugely important contributor to the state of modern drug R&D.
But here’s the critical takeaway: it’s only one of many contributors to overcoming the challenges of drug discovery today. There remains a wonderful abundance of artful empiricism in the discovery of new drugs, especially around human biology, and this should be embraced. Learning how to harness the latter while pursuing the benefits of CADD is critical.
To put it bluntly, we are far, far away from a world where computers discover drugs, test them virtually in a cloud of robotic assays, and get them to patients with a few clicks of a mouse.
Furthermore, adding another dose of realist-skepticism, as these software tools have come online in the industry over the past couple of decades, R&D productivity has largely gone downwards, not upwards. Correlation is not causation, but many in silico techniques haven’t panned out in making better drugs. More QSAR doesn’t always help if the data inputs aren’t ideal; it just makes the piles bigger (and reduces the signal-to-noise ratio). More crystallography and computational work around inappropriate structures, unnatural conformers, or inaccurate protein homologues doesn’t help drive a discovery campaign in the right direction.
And lots of snake-oil charlatans have “sold” the promise of their variant of CADD over time. Many supposed “algorithms” and in silico platforms aren’t based on sound science. Over-fitting of data is a common mistake: over-fit models make great retrospective case studies (“we would have found this drug far faster than the actual chemists”), but often fail to deliver on actual prospective projects. A quick or clever piece of code doesn’t get you a breakthrough medicine. Sadly, though, lots of the claims the field makes substitute hype-based proclamations for science-based conclusions. Hype has been the bane of CADD’s reputation. Further, the field has done itself a disservice in many respects by adopting esoteric yet seductively impressive nomenclature like conducting “Metropolis Monte Carlo simulations” of “Grand Canonical Ensembles” (I mean, come on, who wouldn’t want to invest in that?). Charlatans often exploit this nomenclature to good effect, especially with naïve investors, but in the end it’s been lots of over-promise, under-deliver from this vein of CADD. The proliferation of Silicon Valley buzzwords today in drug discovery seems eerily familiar to prior snake oil speak. These and other false promises have left burnt R&D program budgets and unhappy investors in their wake.
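To make the over-fitting point concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical and invented for illustration: the single made-up "descriptor", the simulated noisy activities, and the function names. It is not drawn from any real CADD platform or dataset. It compares a "model" that simply memorizes its training compounds with a plain straight-line fit, scoring each retrospectively (on the compounds it was built from) and prospectively (on new compounds it has never seen).

```python
import random


def make_compounds(n, rng, noise_sd=0.5):
    """Simulate n hypothetical compounds as (descriptor, measured activity) pairs."""
    data = []
    for _ in range(n):
        x = rng.random()                        # made-up descriptor in [0, 1)
        y = 2.0 * x + rng.gauss(0.0, noise_sd)  # assumed true trend plus assay noise
        data.append((x, y))
    return data


def memorizer_predict(train, x):
    """Over-fit 'model': return the activity of the closest training compound."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def fit_line(train):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, data):
    """Mean squared error of a prediction function over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def run_demo(seed=0):
    rng = random.Random(seed)
    train = make_compounds(50, rng)           # the retrospective "case study" set
    new_compounds = make_compounds(500, rng)  # stand-in for a prospective project
    slope, intercept = fit_line(train)

    def memorizer(x):
        return memorizer_predict(train, x)

    def line(x):
        return slope * x + intercept

    return {
        "memorizer_retrospective": mse(memorizer, train),
        "memorizer_prospective": mse(memorizer, new_compounds),
        "line_retrospective": mse(line, train),
        "line_prospective": mse(line, new_compounds),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

By construction the memorizer scores a perfect zero error on its own training set, which is exactly what makes the retrospective pitch look so good; it cannot repeat that on new compounds, because it has memorized the assay noise along with the signal. Real drug discovery data is far messier than this one-descriptor toy, so treat it as an illustration of the failure mode, not a benchmark.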
But if current structure- and ligand-based approaches to CADD are “just” contributors to drug discovery efforts (albeit important ones), what else drives success and why can’t we model it? What’s stopping a computer from clicking its way to new medicines?
The answer is that instead of “software eats biotech”, the reality of drug discovery today is that biology consumes everything.
The primary failure mode for new drug candidates stems from a simple fact: human biology is massively complicated. Drug candidates interfere with the wrong targets or systems leading to bad outcomes (“off-target” toxicity). They interfere with the right targets but with the wrong effects (“on-target” or mechanism-based toxicity). They are most often promiscuous and interact with lots of things, some known and many unknown. Beyond their target pharmacology, drugs interact with the human body in countless ways, rendering them ineffective or worse (absorption, distribution, metabolism, excretion being four important ones). And, of critical importance, the biology might just not work at ameliorating a specific disease, improving mortality, or elevating quality of life – we often pick the wrong target to interrogate, which is a (the) major cause of attrition in Phase 2 and beyond. To make it more wonderful in its challenges, variation amongst patients (and, even more so, species!) in how biology manifests also leads to added complexity, both good (insightful) and bad (unfortunate). In fairness, even when drugs are approved, we don’t know everything about them.
Biology also drives the costs and timelines in drug R&D, as well as many of the headaches – as it still takes a lot of sweat, time, and money to figure all that biology out. Bespoke cell-based assays tackling new biology can be hard to source or create/validate, and take time. Further, much of research-stage biology is like pregnancy: even if you throw more money and people at it, you can’t speed it up significantly. Formal IND-enabling studies just take time; a 28-day GLP tox study takes months from start to final report no matter how much you are willing to pay for it. Long-term mouse efficacy models are by definition long (months). Phase 1 healthy volunteer studies, like single- and multiple-ascending dose studies, are also hard to speed up if you need to ensure safety between doses as you explore new human biology.
Importantly, better chemical equity can help address this biology more expeditiously. It’s well appreciated that the best predictor of success in drug discovery is the quality of the starting chemical equity (the “hits”). Initiating a hit-to-lead or lead optimization project with a big ugly molecule and hoping to shrink it or dial out its liabilities isn’t usually a high probability gamble. CADD can often support getting to or selecting “better” starting points for these drug discovery efforts. Yet even with good chemical equity, it still takes time, money, and the contributions of lots of thoughtful scientists to advance these successfully. Many “good” molecules have broken on the anvil of biology.
It’s worth highlighting a few examples of biology defeating or obstructing CADD-inspired discovery, though the list of programs could be very, very long. T-cell kinase ZAP70 has been attacked by CADD since the mid-1990s (here), and yet there are no approved drugs against it. MAPK/p38 is another well-trodden CADD target: there are dozens of publications out there about CADD success stories against p38 with new and improved binders and the like; yet, clinical development is a veritable graveyard for these programs, as figuring out the safe and effective biology of these projects remains a challenge. Or take renin inhibition – after years of great CADD-enabled discovery, the first program got approved, only to find out in subsequent Phase III trials that drug development wasn’t kind (see #16 in the FDA’s recent roster of failures).
With CADD, predicting affinities is easy, but the “answer” in drug discovery isn’t just a protein and ligand with predicted binding affinities or specific activities – it’s making a drug that can go all the way to patients on the market.
Beyond just biology consuming everything (which isn’t yet, nor soon to be, model-able), the wonderfully powerful “big” computing approaches like AI, machine learning, neural networks, and such technologies all suffer from a range of technology issues when applied to the drug discovery challenge today: perennial “garbage in, garbage out” concerns around the “training” data (even the large sets are full of noise) and how these relate to and capture the complexity of biology; black box algorithms that are hard to understand or deconvolute (a common problem with learning methods right now, as described here) as veins of future biological exploration; and a dominating model-first, “under-the-lamppost” myopia from CADD practitioners that often misses the value of empiricism and serendipity in science. This often shows up as “we shouldn’t make those compounds [or tackle that target] because the model says they won’t work…” (That’s precisely why they should be made – to continually test the model!). Melanie Senior has an In Vivo article coming out worth reading on the subject: “Can Artificial Intelligence Help Find Better Medicines?”
All of these things, coupled with the vast wonderful complexity of biology, give me great confidence in saying that CADD won’t be mouse-clicking its way to drugs in the absence of any lab coats any time soon.
Three final observations.
CADD delivers the best when it’s applied in an integrated manner with drug discovery teams. CADD can’t contribute without being a key member of the team, but similarly it can’t do it all by itself. Integrating high quality CADD approaches into teams of seasoned drug hunters who know when and how to apply the in silico insights (or ignore them) is critical. Assembling high performing teams that include experienced veterans in R&D is critical to creating value, and that includes both chemists and biologists – greybeard chemists who have sniffed solvents for decades, and biologists who actually poured their own sequencing gels. It may come as a surprise to Silicon Valley, but most of these drug hunters have real PhDs (and don’t lie about them) or MDs, or both, plus years if not decades of R&D experience. The judgement, insight, and scar-tissue they bring to bear on drug discovery programs are of real value. Sure, their “acquired smarts” involve extensive pattern recognition, which in theory can be replicated by computer code and AI, but they also involve creativity, putting programs in position for positive serendipity, understanding what fork on the road to take, and how to extract the “right” insight from empiricism rather than simply chasing patterns. Silicon Valley’s love affair with both youth and hyped-optimism often ignores this, but does so at its own peril.
Nimbus’ Acetyl-CoA carboxylase (ACC) program, which Gilead just shared exciting Phase 2 data on, exemplified this experienced team model. Rosana Kapeller, Nimbus’ CSO, built and led a great scientific team, including the ACC program leader Gerry Harriman and early development lead Wes Westlin, among many others. External consultants like Jim Harwood, an experienced veteran from Pfizer on the ACC target itself, as well as a broad range of other collaborators were engaged. This scientific team collaborated closely with our founding partners at Schrödinger. We conducted a virtual screen of the ACC binding pocket, informed by a natural ligand and solvation energetics, and identified a singleton hit that led to an exciting series. Optimizing that series involved significant integration of in silico structural insights, but also a vast set of non-CADD activity inputs like cell-based assay data, efficacy, PK/ADME, and tox screening that informed our modification/optimization choices. To drive liver uptake, an important and defining feature of the program today, moieties were designed by Gerry and others to facilitate its active transport into the liver, largely without computational insight. Undoubtedly the broader CADD inputs were helpful in driving this program, but like most successful drug discovery campaigns there were lots of non-CADD elements contributing in major ways to its success (including the very innovative translational approach and fructose challenge model in healthy volunteers). The summary message is that even for a CADD-based company like Nimbus, successful drug discovery isn’t yet hackable by computers alone.
An integrative CADD approach also helps enable “virtual” biotech business models. Today’s ecosystem of globally-distributed CROs and research partners also enables companies to apply “the best of CADD” without having to build a ton of infrastructure. We’ve been zealots for “virtual” biotechs without their own wet labs for years (here, here, here) – about half of our portfolio fits this description. In some ways, the “cloud biology” concept espoused of late in Silicon Valley is a riff on this same theme – virtual, distributed biotech models where experiments get done remotely. A big difference in practice, though, is that lab coats still exist in the ecosystem for the vast majority of assays and activities in state-of-the-art virtual biotech models today. High quality partners like Charles River Labs, Evotec, ChemPartner, WuXi, and many others are key to this model. Lab coats (and scientists in them) are especially required in areas of novel science where bespoke assays or unique approaches need to be developed, validated, and expanded; in fact, hybrid models are likely to be the solution for many of those new areas of biology (see Mike Gilman’s post on the subject). Laboratory science may have gone increasingly virtual – as in distributed remotely – but it’s not fully automated by robots and AI just yet. Maybe work at firms like Transcriptic and Emerald Cloud Labs will change that over time, but it’s not likely broadly applicable in the near term. I’m guessing my kids (13, 12, and 9 years old) will still be able to find a career with a lab coat if they want one, but maybe my grandkids won’t. You can be sure the future scientists of my kids’ age will need to appreciate how to apply the best CADD approaches if they are to be discovering the drugs of the future.
Putting science-first with CADD, rather than hype, is key to delivering the expected returns to patients and shareholders. As noted above, overhyped CADD solutions over the past four decades have damaged the credibility of the field. Promises of huge savings in time and money back in 1981 have failed to be delivered by CADD, in general, to date. Silicon Valley may thrive on the hype-cycle, which is clearly at work today in digital healthcare, but at the end of the day in biotech the real and only currency of long-term value is new medicines. The ever-improving CADD technology suite will continue to contribute to that currency, but we need to avoid the over-promising of the past. Silicon Valley thinks they’ve got the solution and with a flick of the switch will cure Pharma R&D of its productivity troubles. That’s not going to happen. But if lots of money chases the premise that software-eats-biotech and a proliferation of startups ensues, what will happen is lots of folks will lose lots of money. Instead of blaming biotech for those future woes, I hope they will blame their software. Doubtful.
Don’t misinterpret this post: I’m a believer in CADD and the power of big data to drive positive contributions to drug discovery. But I’m also a realist, and an investor who needs to make a return on our investments within my lifetime – and in general returns correlate with discovering and developing high impact new medicines.
Unyielding biology remains the biggest obstacle to new drugs and remains enigmatic in many aspects. The two arms of empiricism, creativity and serendipity, still play critical roles in our quest to unravel human biology. Machines are more than welcome to join us in that quest, but let’s not forget the full constellation of experienced human contributors required to deliver those new medicines.