BioByte 090: LLMs for novel research, the chemistry synthesis bottleneck, on the origin of viruses, molecular de-extinction, and targeted protein relocalization
Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds, and everything in between. All in one place.
If you haven’t already, please fill out this survey to help us improve Decoding Bio. It takes just a few minutes to complete. Thank you!
What we read
FDA Authorizes First Over-the-Counter Hearing Aid Software [FDA News Release, September 2024]
In a groundbreaking move that merges consumer tech with healthcare, the FDA has authorized Apple's Hearing Aid Feature (HAF) for AirPods Pro, transforming these popular earbuds into over-the-counter hearing aids. This software innovation, designed for adults with mild to moderate hearing impairment, marks a significant leap in accessibility to hearing assistance technology.
A clinical study involving 118 subjects demonstrated that the HAF's self-fitting approach yielded comparable benefits to professional fittings. The study measured amplification levels and speech understanding in noisy environments, with no adverse events reported. Dr. Michelle Tarver from the FDA hailed this authorization as a step forward in addressing the widespread issue of hearing loss, which affects over 30 million American adults. This move aligns with the FDA's 2022 regulations aimed at improving access to safe and effective hearing solutions.
This authorization not only represents a win for consumer convenience but also potentially for public health. Research has linked hearing aid use to reduced cognitive decline and depression in older adults. By leveraging a widely adopted consumer product, Apple and the FDA may have just turned the volume up on hearing health awareness.
As the lines between consumer electronics and medical devices continue to blur, we can expect to see more innovations in this space. In the meantime, it seems that the same product that is going to destroy our hearing is also going to help us assess and restore it. You won’t need to take them out!
Can LLMs Generate Novel Research Ideas? [Si et al., arXiv, September 2024]
A new study has shed light on the potential of artificial intelligence to generate novel research ideas in the field of Natural Language Processing. The study, involving over 100 NLP researchers, compared research ideas generated by human experts with those produced by an AI agent. In a carefully controlled experiment, the researchers found that AI-generated ideas were consistently rated as more novel than those produced by human experts. This difference was statistically significant (p < 0.05) across multiple evaluation methods, suggesting a robust effect.
However, the study wasn't all in favor of AI. While the machine-generated ideas scored higher on novelty, they were judged slightly weaker on feasibility compared to human-generated ideas. This highlights an important trade-off in idea generation between novelty and practicality.
The methodology of the study was rigorous. It recruited 104 expert NLP researchers: 49 generated ideas and 79 reviewed them, with some participants serving in both roles. Ideas were compared across three conditions: human-written, AI-generated, and AI-generated with human reranking. The evaluation criteria included novelty, excitement, feasibility, effectiveness, and overall quality.
Interestingly, the study also uncovered some limitations of current LLM technology. The AI struggled to generate truly diverse ideas, often producing variations on similar themes. Additionally, the LLMs showed poor performance in evaluating their own outputs, suggesting that human oversight remains crucial in the idea selection process.
These findings have significant implications for the future of scientific research. They suggest that AI could be a powerful tool for expanding the frontiers of knowledge, potentially identifying novel approaches that human researchers might overlook.
Generative ML in chemistry is bottlenecked by synthesis [Owl, Sep 2024]
The synthesis bottleneck in small molecule chemistry significantly limits the practical utility of generative AI models in drug discovery, constraining exploration of chemical space and slowing feedback loops. Overcoming this challenge—through improved synthesis-aware models or expansion of easily accessible chemical libraries—could dramatically accelerate drug development.
In his latest piece, Abhi Mahajan (aka Owl) unpacks the fundamental challenge of synthesis in small molecule chemistry, contrasting it with the relative ease of protein synthesis. While generative models can theoretically explore vast chemical spaces, their practical utility is severely constrained by the difficulty and cost of synthesizing arbitrary small molecules. This synthesis bottleneck manifests in three key ways: limited model creativity due to the need to focus on easily synthesizable molecules, bias in training data towards readily accessible chemical space, and slow feedback loops for model improvement due to the time-consuming nature of chemical synthesis.
For those of us interested in ML applications in drug discovery, there are a couple of implications:
The development of synthesis-aware generative models that not only ensure chemical stability but also optimize for ease of synthesis (considering factors like reaction steps, reagent availability, process mass intensity, and yield) could significantly enhance the practical utility of these models. However, defining and quantifying "desirability" in synthesis pathways remains a complex challenge.
While ultra-large virtual screening libraries (e.g., Enamine REAL with 40 billion compounds) offer a potential workaround to the synthesis bottleneck, there's an open question about whether the structural limitations of combinatorially-produced compounds compared to natural products will ultimately limit their utility in drug discovery. As these libraries expand, they may eventually encompass enough practically useful chemical space to mitigate the synthesis problem for many applications.
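To make the first implication above concrete, here is a minimal sketch of how a synthesis-aware model might fold the factors mentioned (reaction steps, reagent availability, process mass intensity, and yield) into a single ranking signal. Everything in it is an assumption for illustration: the `Route` fields, the equal weighting, and the normalization caps are hypothetical and do not come from Owl's piece or any published scoring function. As the piece notes, deciding what "desirable" means is the hard part.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical summary of one proposed synthesis route."""
    n_steps: int                 # number of reaction steps
    reagent_availability: float  # fraction of reagents purchasable off the shelf, 0..1
    pmi: float                   # process mass intensity (kg of input per kg of product)
    overall_yield: float         # overall yield, 0..1


def synthesis_score(route: Route, max_steps: int = 10, max_pmi: float = 500.0) -> float:
    """Return an illustrative score in 0..1; higher means easier to make.

    The equal weights and the caps (max_steps, max_pmi) are arbitrary assumptions.
    """
    # Fewer steps and lower PMI are better, so invert them after capping.
    step_term = 1.0 - min(route.n_steps, max_steps) / max_steps
    pmi_term = 1.0 - min(route.pmi, max_pmi) / max_pmi
    terms = [step_term, route.reagent_availability, pmi_term, route.overall_yield]
    return sum(terms) / len(terms)


def rank_candidates(candidates: dict[str, Route]) -> list[str]:
    """Sort candidate molecule names from easiest to hardest to synthesize."""
    return sorted(
        candidates,
        key=lambda name: synthesis_score(candidates[name]),
        reverse=True,
    )
```

In practice a score like this would be one term in a generative model's objective alongside predicted potency or stability, and the weights would need to be fit to what chemists actually consider a reasonable route.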
Where did viruses come from? AlphaFold and other AIs are finding answers [Ewen Callaway, Nature News, September 2024]
Historically, it has been challenging to understand viral evolution through the genome, particularly because the rapid evolution of viral genomes can differentiate them to the point of unrecognizability in a very short time span. Evaluating the host of gene sequences for evolutionary similarity is a difficult, if not sometimes impossible, task, but AI models like AlphaFold and ESMFold can facilitate these comparisons with their scalable predictions of protein structures. Even as the genome itself changes very rapidly, the structure and shape of the encoded protein stays relatively constant, resulting in a far more facile analysis when considering protein structure instead of sequence.
Grove et al. utilized AlphaFold2 and ESMFold to analyze the viral entry proteins of flaviviruses, a group of positive-strand RNA viruses that includes West Nile virus, Zika virus, dengue virus, and hepatitis C virus. Some notable findings included the discovery of an enzyme that appears to have been stolen from bacteria (only the second time this has been shown in flaviviruses) and a striking similarity of the previously unknown hepatitis C viral entry protein to those of pestiviruses like swine fever virus.
These crucial discoveries will continue to enable the deciphering of certain enigmatic viral mechanisms and could thus help catalyze the development of a vaccine for diseases like hepatitis C. This accelerated understanding and treatment will only rise in importance as the growth and spread of these diseases continues to escalate.
Language agents achieve superhuman synthesis of scientific knowledge [Skarlinski et al., FutureHouse, September 2024]
We covered PaperQA’s launch last December and are back with PaperQA2, FutureHouse’s AI agent that conducts scientific literature reviews on its own. PaperQA2 is the first agent to beat trained scientists on literature review tasks and represents a new way to interact with scientific research. The agent finds relevant research, summarizes it, and goes back to refine search parameters based on what is learned. It can handle both highly specific scientific questions and broader literature reviews. For the latter, FutureHouse has released WikiCrow, an agent based on PaperQA2 that writes accurate Wikipedia-style articles, and is working on cataloging all human genes with a WikiCrow article each. A notable achievement of this research is the development of LitQA2, a hard benchmark for scientific literature research that both guided the design of PaperQA2 and provides a framework to assess retrieval capabilities of language models on scientific literature beyond abstracts. PaperQA2 was shown to match or exceed the performance of human experts on three key scientific tasks: literature retrieval, summarization, and contradiction detection. The paper further details the methods and RAG systems used, but the overarching conclusion is that AI tools are capable of enhancing scientific research workflows at a rate that can outpace human contribution.
Defensins identified through molecular de-extinction [Ferreira et al., Cell, 2024]
Molecular de-extinction is a new field seeking to discover molecules through evolutionary history and use them to solve present-day problems. In this paper, the authors computationally mined genomes to search for defensins. Defensins are proteins that play a key role in host immunity and have been proposed as potential new antibiotics.
Defensins are naturally produced by microorganisms, plants, and animals. As defensins are disulfide-rich proteins, they are amenable to genomic mining. The authors mined 8 extinct vertebrate genomes for defensins and found 6 authentic defensin molecules: two from the extinct New Zealand moa (Anomalopteryx didiformis), three from the extinct-in-the-wild Spix’s macaw (Cyanopsitta spixii), and one from the western black rhino (Diceros bicornis minor). By integrating phylogenetic analyses and molecular dynamics simulations, they showcased how molecular de-extinction techniques can be used to gain insights into evolutionary trajectories.
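To illustrate why being disulfide-rich makes a protein family easy to mine, here is a toy filter that flags translated sequences containing a cysteine-dense stretch. It is only a sketch and not the authors' pipeline: the function name, the 40-residue window, and the example sequences are made up for illustration. The threshold of six cysteines reflects the six conserved cysteines (three disulfide bonds) typical of vertebrate defensins.

```python
def has_cysteine_rich_window(protein: str, window: int = 40, min_cys: int = 6) -> bool:
    """Return True if any stretch of `window` residues has at least `min_cys` cysteines.

    Hypothetical heuristic: window and min_cys are illustrative defaults,
    not parameters taken from the paper.
    """
    seq = protein.upper()
    # Sequences shorter than the window are checked as a single segment.
    last_start = max(len(seq) - window, 0)
    return any(
        seq[start:start + window].count("C") >= min_cys
        for start in range(last_start + 1)
    )


# Made-up toy sequences, not real defensins.
candidate = "GSCAAKCRGGCTLLCKKPCCRG"           # six cysteines in a short peptide
background = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # no cysteines

hits = [
    name
    for name, seq in {"candidate": candidate, "background": background}.items()
    if has_cysteine_rich_window(seq)
]
```

A real search would layer homology models, signal-peptide prediction, and manual curation on top of a crude pass like this one.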
Targeted protein relocalization via protein transport coupling [Ng et al., Nature, 2024]
Protein localization is crucial for therapeutics because the subcellular location of a protein often determines its function, interactions, and regulatory mechanisms. Mislocalization of proteins can lead to various diseases, including cancer, neurodegenerative disorders, and metabolic conditions. Here, the authors introduce an approach to targeted protein relocalization that offers potential for treating diseases through interactome rewiring.
In this paper, the authors developed small molecules called targeted relocalization-activating molecules (TRAMs) that can redirect misplaced proteins to their proper cellular locations by using “shuttle” proteins. The team identified a collection of shuttle proteins with strong localization sequences and suitable ligands for incorporation into TRAMs. Using a custom imaging analysis pipeline, they demonstrated that protein steady-state localization could be modulated by coupling target proteins to these shuttle proteins. The team then applied this technique to relocalize various proteins, including disease-driving mutant proteins such as SMARCB1 Q318X, TDP43 ΔNLS, and FUS R495X, using nuclear hormone receptors as shuttles.
In particular, they noticed that TRAM-mediated relocalization of FUS R495X from the cytoplasm to the nucleus correlated with a reduction in stress granules during cellular stress. The researchers also demonstrated the relocalization of endogenous proteins PRMT9, SOS1, and FKBP12 using methionyl aminopeptidase 2 and poly(ADP-ribose) polymerase 1 as cytoplasmic and nuclear shuttles, respectively. In another experiment, small-molecule-mediated redistribution of nicotinamide nucleotide adenylyltransferase 1 from nuclei to axons in primary neurons slowed axonal degeneration, mimicking a genetic phenotype observed in mice resistant to certain types of neurodegeneration.
Notable Happenings
Independent Directors of 23andMe Resign from Board
Once a leader in DTC genetic testing, 23andMe’s stock has plummeted 99% from its 2021 peak. The company’s original business model—selling one-time genetic testing kits—has proven unsustainable as customers rarely require ongoing services. To counter this, 23andMe launched the 23andMe+ subscription, which offers continuous health insights and advanced ancestry reports. The modest 10% year-on-year growth hasn't lived up to investor expectations for a high-growth tech company.
The real allure for investors has always been the potential of 23andMe’s massive genetic data trove to fuel breakthroughs in drug development. However, this promise has yet to materialize, and the company has struggled to raise capital for its drug pipeline in a challenging economic environment. Meanwhile, its $400 million acquisition of Lemonaid Health, aimed at integrating genetic data with telemedicine, has failed to deliver significant synergies or revenue growth.
Compounding these challenges, recent board resignations have highlighted internal disagreements with CEO Anne Wojcicki over the company's direction. Investors are now questioning how 23andMe will reposition itself amid strategic misalignment and financial headwinds.
Looking ahead, the company must demonstrate it can unlock value from its genetic data. Cost management and a clearer strategy are critical for restoring investor confidence. Whether focusing on scaling its subscription service or making headway in drug development, 23andMe faces significant pressure to deliver on its early promise or risk a sale.
Genentech, a biotech with a storied past, confronts new turbulence in the present
Genentech, a biotech leader known for innovation, is facing new challenges with the closure of its cancer immunology group and the departure of key research chief Ira Mellman. These moves hint at a potential shift from in-house research to a more traditional in-licensing model, sparking concern about the company’s future direction.
While Tecentriq continues to generate over $4 billion annually, Genentech has struggled to produce follow-up successes. The recent phase III failure of its anti-TIGIT drug further highlights difficulties in expanding its oncology pipeline.
Pressure is likely mounting on R&D head Aviv Regev, who is measured more by the number of drugs in trials than publications. Her leadership comes at a time when pharma companies are increasingly balancing the high costs of innovation with the demand for productivity. The closure of the immunology group may indicate a shift toward late-stage development over risky, novel therapies.
This raises a broader question: Is novel research better suited for academic institutions, free from commercial pressures, while big pharma focuses on commercializing external innovations? Genentech’s future may depend on its ability to balance internal innovation with external collaborations in an increasingly competitive biotech landscape.