BioByte 081: AF3 docking accuracy, reflections on top 50 biotechs, brain data lawsuit, "cervix-on-a-chip", ethics x life sci collabs
Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds and everything in between. All in one place.
Happy Summer! Before we get to the good reads, just want to briefly plug our flagship AI x Bio Summit in NYC next month. It’ll be a fun evening discussing the future of biology on the trading floor at the New York Stock Exchange. We’d love for all of you lovely biocurious and biocommitted readers to join us. If you’re interested in attending, please register here!
What we read
Approaching AlphaFold 3 docking accuracy in 100 lines of code [Alex Rich, Ben Birnbaum, and Josh Haimson, Inductive Bio, June 2024]
The recent publication of AlphaFold 3 (AF3) marks a significant leap in our ability to computationally predict the structure and properties of biomolecular systems, including proteins, DNA, RNA, and small molecules, using only sequences and SMILES strings. While AF3 claims superior accuracy in protein-ligand interactions compared to classical docking tools like Vina, it’s important to recognize that Vina is not the state-of-the-art baseline. The team at Inductive Bio found that stronger baselines can outperform the blind version of AF3, particularly with more drug-like molecules, approaching the accuracy of the pocket-specified AF3.
To assess AF3’s practical impact on drug discovery, Inductive Bio compared its small-molecule docking results to existing techniques. The AF3 paper highlights a 15% improvement in PB-valid poses over Vina without structural inputs, which increases to 26.3% with pocket specification. However, Vina does not represent the highest achievable accuracy with current techniques.
Inductive Bio developed a stronger baseline using open-source tools and 100 lines of code, achieving better results on the PoseBusters dataset by docking from an ensemble of starting ligand conformations and rescoring the docked poses with Gnina, a docking tool that scores poses with a convolutional neural network. This improved baseline outperformed the blind AF3 and closely matched the pocket-specified version.
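The ensemble-then-rescore idea can be sketched in a few lines. This is a minimal illustration and not Inductive Bio's code: `dock_ensemble_and_rescore`, `toy_dock`, `toy_rescore`, and the `keep_per_conformer` parameter are all hypothetical names, and the toy functions stand in for a real docking engine and a real CNN scorer so the sketch runs without any docking software installed.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Conformer = TypeVar("Conformer")
Pose = TypeVar("Pose")


def dock_ensemble_and_rescore(
    conformers: Sequence[Conformer],
    dock: Callable[[Conformer], List[Tuple[Pose, float]]],
    rescore: Callable[[Pose], float],
    keep_per_conformer: int = 3,
) -> List[Tuple[Pose, float]]:
    """Dock each starting conformer, pool the poses, rank them by rescoring.

    Assumptions (hypothetical interface): `dock` returns (pose, energy)
    pairs where lower energy is better; `rescore` returns a score where
    higher is better.
    """
    pooled: List[Pose] = []
    for conformer in conformers:
        # Keep only the lowest-energy poses from each starting conformer.
        ranked = sorted(dock(conformer), key=lambda item: item[1])
        pooled.extend(pose for pose, _ in ranked[:keep_per_conformer])
    # The final ranking uses the rescoring function, not the docking energy.
    rescored = [(pose, rescore(pose)) for pose in pooled]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy stand-ins (hypothetical) for a docking engine and a CNN scorer.
def toy_dock(conformer: str) -> List[Tuple[str, float]]:
    return [(f"{conformer}-pose{i}", -7.0 + i) for i in range(4)]


def toy_rescore(pose: str) -> float:
    table = {"confB-pose1": 0.9, "confA-pose0": 0.6}
    return table.get(pose, 0.1)


ranking = dock_ensemble_and_rescore(["confA", "confB"], toy_dock, toy_rescore)
best_pose, best_score = ranking[0]
```

In the toy run, the top pose after rescoring is not the one with the lowest docking energy. That is the point of the rescoring step: pooling poses from several starting conformers widens the search, and a second scorer gets the final say.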
These findings suggest AF3’s utility in operating with minimal input information, complementing rather than replacing traditional docking approaches that utilize experimental receptor structures. This nuanced view recognizes AF3’s breakthrough in blind docking while acknowledging the continued relevance of established methods for targets with available experimental structures.
The AlphaFold 3 paper provides limited information about the model's strengths and weaknesses, particularly its performance on typical drug molecules. Further research and open access to AF3 will be crucial in understanding its role in drug discovery. As the field evolves, we anticipate exciting developments and look forward to new capabilities with future iterations of AlphaFold.
Biotech Behemoths in a Review [Eddie Eltoukhy, Pear Bio]
The review highlights the significant progress in the biotech sector over the past 15 years, focusing on 50 top biotech companies. These companies, primarily therapeutics, were selected based on their valuations at IPO or acquisition. The review details the diverse paths to success, including varied therapeutic focuses, origins, and leadership profiles. It also compares biotech startups to tech startups, noting similarities in value creation but differences in exit strategies and leadership characteristics. Some interesting observations:
Oncology and rare disease were leading indications
Most companies had an approved drug in clinic
There were slightly more platform-driven companies than asset-driven companies
Slightly over half of founding CEOs stayed with the company to first exit, retaining a median 5.6% equity
The Bay Area saw more of these companies (15) than Boston (10)
A valuation of $2.7B was required to make it into the top 50 companies
How We Live Longer [a16z, June 2024]
a16z has issued a call for "outsiders" to the healthcare industry to revolutionize the current consumer healthcare experience, which is often confusing, impersonal, and reactive. The goal would be to empower individuals to better understand and manage their health – think of the Apple, Google, or Uber of healthcare – intuitive, personalized, and seamlessly integrated into our lives.
Traditional healthcare providers, while essential, may lack the incentives (typically serving payers rather than patients) and the experience (understanding consumer behavior) to drive this change.
Effects of the first successful lawsuit against a consumer neurotechnology company for violating brain data privacy [Muñoz et al., Nature Biotech Correspondence, 2024]
Last year, the Supreme Court of Chile ruled in favor of former senator Guido Girardi in an appeal filed against Emotiv, a neurotechnology company, for violating “his rights to privacy and psychological integrity”.
The plaintiff purchased Insight, Emotiv’s wireless EEG, which interprets emotions and executes mental commands. The data, however, was only available with a ‘Pro’ license. Even if the account was deleted, the data would remain in the company’s cloud system and could even be transferred to third parties. The plaintiff appealed to the courts, which led Emotiv to delete the “plaintiff’s brain information from its database”.
Girardi vs Emotiv is a significant ruling as it recognizes that the privacy of brain data gathered by neurotechnologies for non-medical purposes poses new challenges. The ruling establishes that the state must “directly protect human integrity in its entirety, an issue that includes privacy and confidentiality as well as the rights to psychological integrity and integrity of individuals included in scientific experiments”. This is an important wake-up call to ensure that consumer devices do not undermine human rights related to the brain and mind, known as neurorights.
Six things to keep in mind while reading biology ML papers [Abhishaike Mahajan and Nathan Frey, Owl Posting, June 2024]
Reading biology machine learning papers requires critical evaluation of several factors to understand the practical impact of research findings:
Established Benchmarks: Benchmarks like MoleculeNet and FLIP often don't reflect real-world scenarios due to distribution shifts across datasets. Inductive Bio suggests focusing on assay stratification and ensuring train/test splits contain dissimilar molecules for more generalizable models, even if this reduces benchmark performance.
Baselines: Many papers compare new methods to simplistic baselines, which can misrepresent true performance. For example, Inductive Bio demonstrated that docking baselines in the AlphaFold 3 paper could be significantly improved with minimal adjustments, achieving results closer to those of AlphaFold 3.
Curiosity-Driven Research: While innovative approaches drive the field forward, not all are practically useful. Exciting new methods might not outperform simpler, established models on practical tasks, as seen with DNA large language models compared to CNNs.
Assay Limitations: Life sciences assays can be error-prone and context-dependent, which researchers may not always highlight. Understanding these limitations is crucial for accurately interpreting study results.
Evaluation Challenges: Creating reliable benchmarks is complex, and data leakage or improper train/test splits can lead to inflated performance metrics. Awareness of these issues is essential for assessing the validity of reported findings.
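The point about dissimilar train/test splits and data leakage can be made concrete with a small check. The sketch below is an illustration and not taken from the post. It assumes each molecule is represented as a set of "on" fingerprint bits, and both the `flag_leaky_test_molecules` helper and the 0.7 similarity threshold are hypothetical choices.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_leaky_test_molecules(
    train: Dict[str, FrozenSet[int]],
    test: Dict[str, FrozenSet[int]],
    threshold: float = 0.7,  # hypothetical cutoff, chosen for illustration
) -> List[Tuple[str, float]]:
    """Return test molecules whose nearest training neighbor is too similar."""
    flagged = []
    for name, fingerprint in test.items():
        nearest = max(
            (tanimoto(fingerprint, other) for other in train.values()),
            default=0.0,
        )
        if nearest >= threshold:
            flagged.append((name, nearest))
    return flagged


# Toy fingerprints: x1 is a near-duplicate of t1, x2 is unrelated to both.
train_set = {"t1": frozenset({1, 2, 3, 4}), "t2": frozenset({10, 11, 12})}
test_set = {"x1": frozenset({1, 2, 3, 4, 5}), "x2": frozenset({20, 21, 22})}

flagged = flag_leaky_test_molecules(train_set, test_set)
```

Here only `x1` is flagged, since it shares four of its five bits with `t1`. A benchmark where many test molecules are flagged this way is mostly measuring memorization of close analogs, which is why a stricter split can lower reported scores while giving a more honest picture of generalization.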
Mucus production, host-microbiome interactions, hormone sensitivity, and innate immune responses modeled in human cervix chips [Nature Communications, May 2024]
Researchers at the Wyss Institute have engineered a ‘cervix-on-a-chip’ for better modeling and treatment of women’s health pathologies. This model consists of a microfluidic device lined with cervical epithelial cells to model the complex interactions between these cells, the mucus they produce and the cervical microbiome in both diseased and healthy states. Previous devices have failed to effectively model the cervical mucus in vitro, which is a critical part of maintaining a healthy cervical environment.
Schematic (left) and cross-sectional view (right) of dual channel microfluidic organ chip lined with human cervical epithelium interfaced across an ECM-coated porous membrane with human cervical fibroblasts.
Learning Molecular Representation in a Cell [Liu et al., arXiv, June 2024]
Researchers from the University of Notre Dame and the Broad Institute of MIT and Harvard have introduced a novel approach called Information Alignment (InfoAlign) for learning molecular representations within cellular contexts. The InfoAlign method integrates molecules and cellular response data, connecting them with weighted edges based on chemical, biological, and computational criteria. This approach addresses the limitations of current molecular representation learning methods by optimizing encoder representations to discard redundant structural information and ensuring alignment with cellular responses, thereby improving predictions of drug efficacy and safety.
The researchers validated InfoAlign against 19 baseline methods across various datasets, demonstrating significant improvements in molecular property prediction and zero-shot molecule-morphology matching. This innovative method shows promise for enhancing the prediction of bioactivity tasks by leveraging comprehensive molecular representations that include cell morphology and gene expression data.
Successful use of anti-CD19 CAR T cells in severe treatment-refractory stiff-person syndrome [Faissner et al., PNAS, June 2024]
Anti-CD19 CAR T-cell therapy is gaining attention in the field of autoimmune diseases due to its potential to target and deplete B cells, which play a crucial role in many autoimmune conditions. Originally developed to treat B-cell malignancies like leukemia and lymphoma, researchers have recognized its potential broader applications.
The therapy works by genetically modifying a patient's T cells to express chimeric antigen receptors (CARs) that specifically target CD19, a protein found on the surface of B cells. When reintroduced into the patient, these modified T cells can effectively eliminate CD19-positive B cells, including those contributing to autoimmune responses. Early clinical trials have shown promising results in treating severe autoimmune diseases such as systemic lupus erythematosus (SLE), myasthenia gravis, and refractory pemphigus. Cell therapy biotech Kyverna recently went public as a leader in this space.
In PNAS this week, a case study describes a new indication that may benefit from anti-CD19 CAR-T: treatment-refractory stiff-person syndrome (SPS), a rare neurological disorder characterized by progressive muscle stiffness and painful spasms that is thought to be mediated by autoantibodies to the inhibitory neuronal system. In a single patient with SPS, the cell therapy resulted in reduced leg stiffness, significant improvement in gait, and a walking speed improvement of 100%, all without significant side effects.
Collaborative ethics: innovating collaboration between ethicists and life scientists [Jeantine E. Lunshof & Julia Rijssenbeek, Nature Methods, June 2024]
As the pace of scientific discovery continues to rapidly accelerate, there’s an ever-increasing need for direct collaboration between scientists and philosophers or ethicists. Historically, ethical analyses have been performed at the end of the scientific research process, primarily concentrated around the time of publication or even after. However, this timeline results in little tangible impact on the research being analyzed. To address the disconnect, a team of bioethics researchers from Harvard Medical School and the Wyss Institute for Biologically Inspired Engineering have suggested a new model that involves a blended approach woven directly into each stage of the research process:
Conceptual analysis: Understanding what is being researched and appropriately categorizing and naming it for subsequent analyses
Normative analysis: Determining if the subject of the research is cause for ethical concern
Applied ethics: Establishing boundaries for what the research can and cannot be used for based on the ethical implications
Regulatory science and legal aspects: Combing over final ethical considerations pertaining to implementation (e.g., commercialization via spinoff companies)
The researchers highlight several vital use cases as examples to which the model has already been applied, including biobots, synthetic human entities with embryo-like features (SHEEFs), and neural organoids. Each of these case studies tackled complex ethical issues that will only continue to be exacerbated as science advances, particularly within biological domains.
Ongoing collaboration with philosophers and ethicists in labs has been shown to improve the quality of research as well as the success in securing funding, as many organizations providing grants require significant considerations of ethics to be included in proposals. As such, the model underscores the importance of the ethicist functioning as a fully integrated part of the lab, fostering frequent engagement and visibility so as to build mutual trust between both parties. Despite this emphasis on the ethicist, however, the model gives the lead scientist or PI the ultimate say. Going forward, there are still interesting questions that the authors do not fully address, such as determining when exactly a lab ethicist is needed, as well as how these collaborations might affect the rate of scientific progress.
If you enjoy reading Decoding Bio, please consider sharing with a friend
What we listened to
Notable Deals
EvolutionaryScale raises $142M seed round - the company was founded by a team of former Meta scientists, and the round was led by Lux Capital, Nat Friedman and Daniel Gross. The start-up timed the announcement of its raise to coincide with the release of its latest protein language model: ESM3.
Waypoint Bio launches with $14.5M led by Hummingbird Ventures – to develop better cell therapies for solid tumors, a new startup plans to test hundreds of them at once
Formation Bio raises $372M series D led by a16z - the AI-native company has created a ‘hyper-efficient’ development engine that has innovated on many time-consuming aspects of the drug discovery process. The new funding will be used to acquire over 10 promising assets to run through this platform.
TwoStep Therapeutics launches with a $6.5M seed round to advance a range of therapies against solid tumor targets
Exsilio launches with $82M for a new take on genetic medicines
In case you missed it
Biomap State of the Art Overview
Thoughts on data, training volume, and scales of biology
Events
Decoding Bio’s flagship AI x Bio Summit is coming up on July 25 in NYC. Grab your spot here!
Field Trip
Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @ameekapadia @ketanyerneni @morgancheatham @pablolubroth @patricksmalone