BioByte 163: CRISPR Targets Cancer, Tahoe's Rhaister Model Challenges the Need for the Virtual Cell and the Bio Bottleneck is Agentic

Pranay Satya, Varun Agarwal, Mikaela Kimpton, and Gia-Bao Dam

Jun 11, 2026

Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds and everything in between. All in one place.

Aamir Ahmed, Jane Pendjiky and Michael Millar, *Detecting cancer in human tissues, LM*, fluorescence immunohistochemistry & confocal microscopy

🏙️🧬🖥️ Reminder to apply to attend our fourth annual AI x Bio Summit at the NYSE on July 23rd! Spots are filling up quickly, apply here to attend. 🖥️🧬🏙️

What we read

Blogs

Back to basics: Observed statistics are sufficient to predict drug responses [Svensson et al., Tahoe Therapeutics, June 2026]

Why it matters: Predicting how cells, tumors, or patients will respond to a drug is the linchpin of most translational biology and drug discovery, and the field’s current bet is the ‘virtual cell model’ – large neural nets trained to simulate single-cell responses to perturbations (drugs, gene edits, cytokines) they’ve never seen. Tahoe Therapeutics challenges this: it may not be necessary at this scale. Their model, Rhaister, discards simulation and works directly on the summary statistics researchers already compute from screens: for each gene, the relative shift in expression from perturbation (log2 fold change), the statistical confidence (Mann-Whitney p-value), and the magnitude of change (expression delta). On the field’s standard transcriptional benchmarks, this stripped-down approach matches or surpasses the leading virtual-cell model, STATE, and reaches the screen’s own half-sample reference performance: the accuracy you get predicting one half of the measured data from its other half (i.e. an estimate of the assay’s own reproducibility limit). Beyond this reference level, improvement is dominated by measurement noise instead of biological signal – meaning strong models cannot be separated from trivial models. This implication cuts two ways: large, complex models may be unnecessary, and the benchmarks evaluating them may already be maxed out.
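To make the half-sample reference concrete, here is a minimal Python sketch under stated assumptions: cells are split at random into two halves, a per-gene log2 fold change is computed in each half, and the two halves are correlated. The function names, the pseudocount of 1, and the choice of Pearson correlation are illustrative assumptions, not Tahoe's actual pipeline, and the Mann-Whitney p-value is left out for brevity.

```python
import math
import random


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-gene log2 fold change of mean expression (treated vs. control).

    Each argument is a list of cells; each cell is a list of per-gene values.
    The pseudocount is an assumption that keeps the ratio finite for silent genes.
    """
    n_genes = len(treated[0])
    changes = []
    for g in range(n_genes):
        mean_t = sum(cell[g] for cell in treated) / len(treated)
        mean_c = sum(cell[g] for cell in control) / len(control)
        changes.append(math.log2((mean_t + pseudocount) / (mean_c + pseudocount)))
    return changes


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def half_sample_reference(treated, control, seed=0):
    """Correlate fold changes estimated from two disjoint halves of the cells.

    This approximates the assay's own reproducibility: a model scoring above
    this value is mostly fitting measurement noise.
    """
    rng = random.Random(seed)
    treated, control = treated[:], control[:]  # copy so callers' lists are untouched
    rng.shuffle(treated)
    rng.shuffle(control)
    half_t, half_c = len(treated) // 2, len(control) // 2
    first = log2_fold_change(treated[:half_t], control[:half_c])
    second = log2_fold_change(treated[half_t:], control[half_c:])
    return pearson(first, second)
```

Each condition needs at least two cells so that both halves are non-empty. With noisy real data the result falls below 1, and that gap is the headroom no model can meaningfully claim.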

Rhaister’s mechanism is deliberately plain. In a new context – a cell line, tumor, or sample – you run only a small, standard ‘panel’ of perturbations (the same handful in every context) and let Rhaister predict that context’s response to every held-out perturbation, expressing each as a linear combination of the panel’s responses. The weights are learned from reference contexts where the full screen exists. It trains in seconds and predicts in milliseconds, fast enough to pair with AI agents that reason in real time.
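The panel idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Rhaister's actual fitting procedure: it uses ordinary least squares with one weight per panel perturbation, shared across genes, and the data layout and function names are hypothetical. The real model may regularize, weight genes differently, or use the p-values and expression deltas as well.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: panel responses are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_panel_weights(reference, panel, target):
    """Least-squares weights expressing `target` as a mix of panel responses.

    reference: {context: {perturbation: [per-gene response]}} for contexts
    where the full screen exists. Returns one weight per panel perturbation.
    """
    k = len(panel)
    gram = [[0.0] * k for _ in range(k)]  # normal-equation matrix X^T X
    rhs = [0.0] * k                       # normal-equation vector X^T y
    for responses in reference.values():
        y = responses[target]
        xs = [responses[p] for p in panel]
        for g in range(len(y)):
            for i in range(k):
                rhs[i] += xs[i][g] * y[g]
                for j in range(k):
                    gram[i][j] += xs[i][g] * xs[j][g]
    return solve_linear_system(gram, rhs)


def predict_from_panel(panel_responses, panel, weights):
    """Predict a held-out perturbation in a new context from its panel only."""
    n_genes = len(panel_responses[panel[0]])
    return [
        sum(w * panel_responses[p][g] for w, p in zip(weights, panel))
        for g in range(n_genes)
    ]


# Toy reference data in which the held-out drug equals 2 * drug_a - drug_b.
reference = {
    "line_1": {"drug_a": [1.0, 0.0, 2.0], "drug_b": [0.0, 1.0, 1.0], "drug_x": [2.0, -1.0, 3.0]},
    "line_2": {"drug_a": [1.0, 1.0, 0.0], "drug_b": [2.0, 0.0, 1.0], "drug_x": [0.0, 2.0, -1.0]},
}
panel = ["drug_a", "drug_b"]
weights = fit_panel_weights(reference, panel, "drug_x")
new_line = {"drug_a": [3.0, 1.0, 0.0], "drug_b": [1.0, 1.0, 2.0]}
prediction = predict_from_panel(new_line, panel, weights)
```

Fitting reduces to solving one small linear system per held-out perturbation, and prediction is a weighted sum, which is consistent with the train-in-seconds, predict-in-milliseconds claim.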

The field assesses these models on a transcriptional task: given a perturbation, predict which genes’ expression shifts and by how much. Across three such screens, Rhaister’s performance scales with the diversity of reference data on hand – it saturates the metrics (scores of how predicted gene-level changes match real ones in magnitude and rank-order) on Tahoe-100M (Tahoe’s massive drug-screen atlas), narrows to a near-tie with a simple additive baseline (non-interaction model, where response = global gene effect + context effect + perturbation effect) on a human-donor cytokine screen (immune cells from blood donors stimulated with cytokines), and degrades on a CRISPR screen with just four contexts. Throughout, Rhaister matched or outperformed STATE – which, on the cytokine screen, trailed the additive baseline – and ceded only two metrics, on the sparsest CRISPR screen. The last screen is indicative: the wins are a product of data more than modeling.
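The additive baseline mentioned above (response = global gene effect + context effect + perturbation effect) is simple enough to write out. The Python sketch below fits it per gene by backfitting, i.e. alternately averaging residuals, which is one standard way to get a least-squares fit when some context and perturbation pairs are missing. The data layout, names, and iteration count are assumptions for illustration; the benchmarks in the post may estimate the effects differently.

```python
def fit_additive_baseline(observed, n_iter=200):
    """Fit response = gene effect + context effect + perturbation effect.

    observed maps (context, perturbation) -> list of per-gene responses.
    There are no interaction terms, so a context cannot respond to a
    perturbation differently from what the two separate effects imply.
    """
    n_genes = len(next(iter(observed.values())))
    contexts = sorted({c for c, _ in observed})
    perts = sorted({p for _, p in observed})
    gene = [0.0] * n_genes
    ctx = {c: [0.0] * n_genes for c in contexts}
    prt = {p: [0.0] * n_genes for p in perts}

    for _ in range(n_iter):
        for g in range(n_genes):
            # Global effect: mean residual over every observed pair.
            gene[g] = sum(
                y[g] - ctx[c][g] - prt[p][g] for (c, p), y in observed.items()
            ) / len(observed)
            # Context effect: mean residual within each context.
            for c0 in contexts:
                res = [y[g] - gene[g] - prt[p][g]
                       for (c, p), y in observed.items() if c == c0]
                ctx[c0][g] = sum(res) / len(res)
            # Perturbation effect: mean residual within each perturbation.
            for p0 in perts:
                res = [y[g] - gene[g] - ctx[c][g]
                       for (c, p), y in observed.items() if p == p0]
                prt[p0][g] = sum(res) / len(res)
    return gene, ctx, prt


def predict_additive(model, context, perturbation):
    """Predict an unseen (context, perturbation) pair from the fitted effects."""
    gene, ctx, prt = model
    return [m + a + b for m, a, b in zip(gene, ctx[context], prt[perturbation])]


# Toy data that is exactly additive; the pair (donor_2, cytokine_2) is held out.
observed = {
    ("donor_1", "cytokine_1"): [2.0, 4.0],
    ("donor_1", "cytokine_2"): [2.0, 0.0],
    ("donor_2", "cytokine_1"): [0.0, 4.0],
}
model = fit_additive_baseline(observed)
held_out = predict_additive(model, "donor_2", "cytokine_2")
```

On this toy data the held-out pair is recovered because the responses contain no interaction between donor and cytokine. A model that only ties this baseline, as on the cytokine screen, is by construction not capturing context-specific behavior.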

Modeling gene expression isn’t necessarily the key endpoint – measuring and predicting context-specific drug sensitivity (e.g. does the drug inhibit growth of target cells) is imperative too. Here, the team built Emerald Bay, an atlas of 91 cancer drug treatments across 52 tumor cell lines with paired drug-sensitivity and single-cell transcriptomes (~1.8 million cells). This task remains unsolved: against the same half-sample reference (0.99 here), naive baselines score ~0 and Rhaister reaches an r² of 0.26 from sensitivity data alone and 0.31 with transcriptomics added – demonstrating significant room to learn real signal.

The team goes further with Rhaister-O, a ‘zero-shot’ variant that drops the panel entirely and predicts a context’s drug responses from its baseline (untreated) expression alone. It matches or surpasses STATE across every metric and is the first model to meaningfully clear simple baselines in that harder setting. A bare linear model contending with complex perturbation models isn’t an isolated outcome: over the past year, a string of independent benchmarks reached the same verdict, that foundation models for perturbation prediction fail to beat simple baselines (most prominently, Ahlmann-Eltze et al. 2025). However, matching the complex models is not the same as solving the task. Broadly, strong prediction clusters where data is abundant and the query is in-distribution; at the critical frontier – unseen perturbations, contexts that screens cannot reach – every current model still falls short, and better-structured models trained on larger, cleaner data may still offer gains. Tahoe’s contribution is less a better predictor than a useful recalibration, demonstrating where the field has made substantive progress and where it has merely learned to saturate proxy scores.

Paving the way for agents in biology [Laura Luebbert (based on research by Nasri et al.), Anthropic Research, June 2026]

Why it matters: AI has achieved strong performance on many biology benchmarks, yet most biological research remains fundamentally human-driven. The bottleneck is no longer factual biological knowledge but the ability to execute complex scientific workflows involving literature review, data analysis, hypothesis generation, tool use, and experimental planning. This post argues that progress in AI for biology will come from building agents, not just better models.

Luebbert argues that biology is particularly well-suited for agentic systems because modern research is already conducted through a sequence of computational tasks. Scientists routinely search literature, write code, analyze datasets, run bioinformatics tools, interpret results, and iteratively refine hypotheses. Many of these activities map naturally onto AI systems capable of tool use, planning, and long-horizon execution. However, existing evaluations fail to measure this capability. Most biology benchmarks assess factual recall or narrow prediction tasks, providing little information about whether an AI system can contribute to real scientific discovery. Anthropic proposes shifting evaluation toward workflow-level performance, where agents complete realistic research tasks spanning multiple tools, datasets, and reasoning steps.

The article draws an analogy to software engineering. Early coding models could answer programming questions but were unable to build software systems. Meaningful progress emerged only after models acquired tool use, execution environments, iterative debugging, and the ability to operate across entire development workflows. Anthropic argues that biology is approaching a similar transition – going from models that answer biology questions to agents that participate in research itself. To support this shift, the company highlights collaborations with research institutions and life science organizations aimed at developing realistic evaluation environments and agentic workflows. The goal is not a biology chatbot, but systems capable of operating within scientific research processes alongside human researchers.

Luebbert reframes biological AI from a modeling problem to an agency problem. The next frontier is systems that can autonomously navigate the design-analyze-hypothesize cycle of scientific discovery, forming the foundation for AI-driven biological research.

Papers

Targeting Cancer-Specific Mutations with RNA-Triggered Chromatin Shredding [Zeng et al., Nature, June 2026]

Why it matters: Certain cancer mutations have historically been difficult to target because tumor suppressors lack druggable pockets. This paper uses a CRISPR-Cas12a2 system that detects RNA transcripts of mutated cancer genes and induces “chromatin shredding” and cell death, which may serve as a promising therapeutic approach.

The progression of certain cancers, such as ovarian and pancreatic cancer, can be attributed to mutations in tumor suppressor genes like TP53, which encodes the p53 transcription factor. However, developing inhibitors and therapies against mutated p53 has proven challenging due to the lack of druggable pockets and the difficulty of restoring normal function. In this paper, a team of scientists from Jennifer Doudna’s group presents an alternative strategy: killing tumor cells that carry mutated p53 by using a CRISPR-Cas12a2 nuclease to detect aberrant mRNA transcripts. The team shows that Cas12a2 can successfully target a range of p53 mutations while sparing healthy, wild-type cells, both in cultured mammalian cells and in in vivo mouse models.

The most well-known CRISPR-based systems have predominantly been used for precise base editing and feature minimal trans cleavage activity, that is, indiscriminate, non-targeted cutting of surrounding nucleic acids. Drawing on CRISPR-Cas12a2’s ability to induce cell death in bacteria through trans cleavage in response to foreign transcripts, the authors reasoned that such a system could be used to specifically target cancer cells carrying mutated mRNA transcripts. After showing that Cas12a2 could work in eukaryotic settings (on a plasmid assembled into chromatin using yeast histone proteins), the team set out to show that the system could kill mammalian cells in a targeted manner. Using HEK293T cells expressing GFP, the authors designed Cas12a2s with guide RNAs targeting GFP transcripts. Follow-up imaging experiments showed that cells expressing GFP stopped replicating, while those without GFP remained unaffected even in the presence of the Cas system. Analysis also showed that the affected cells had enlarged and fragmented nuclei consistent with significant DNA damage. Further experiments showed an elevation of histone modifications and markers consistent with DNA double-stranded breaks and damage, indicating that Cas12a2 was cleaving DNA rather than activating some other cell death mechanism.

Moving on to cancer-related targets, the team demonstrated that guide RNA design and affinity were crucial for modulating Cas12a2 efficacy against overexpressed oncogene targets. Another important focus of the study was showing that the system could target a range of specific mutations in different cancers. First, the authors targeted an EGFR exon deletion found in lung cancer and verified that a guide RNA designed against the deletion junction sequence could induce cell death in mutant-carrying cells without affecting wild-type EGFR cells. However, the most interesting application was in TP53 point mutations. The team screened 28 guide RNAs across a range of mutation sites to identify sequences with nearly 100-fold selectivity for mutants over wild type. To test the system in vivo, the authors used lipid nanoparticles to deliver Cas12a2 and p53-targeting guide RNAs into mouse models of liver and lung tumors. In the hepatocellular carcinoma model, mice showed a significantly reduced tumor burden; the lung cancer models showed similar results, along with delayed metastatic progression. In summary, this work demonstrates the potential for Cas12a2 to form the basis of a highly specific therapeutic platform for cancers where drug development efforts have continued to struggle.

Notable deals

Lila Sciences in talks to raise $2B Series B from CalPERS and NVentures. After launching with a staggering $200M seed round closely followed by a $350M Series A—giving an initial valuation of over $1B—just this past October, the Flagship Pioneering spinout is reportedly now seeking more funds and an even greater valuation: $8.5B. The massive fundraises hinge on Lila’s promise of building scientific superintelligence—increasing the rate and efficiency of scientific discovery through use of AI tools.
Ethyreal Bio emerges from stealth with $101M total financing between Series A and B. The startup is targeting thyroid diseases with high unmet need via precision therapies. Ethyreal’s lead candidate, ETHY-001, a monoclonal antibody which blocks autoantibody-mediated activation of the thyroid stimulating hormone receptor (TSHR), is already positioned to enter clinical trials later this year with preclinical data being presented next week. The therapeutic is aimed at treating Graves’ disease and thyroid eye disease—two debilitating conditions sharing a causal pathogenic mechanism—via low volume subcutaneous administration by an autoinjector, with half-life extension technology allowing for infrequent dosing. The Series A was co-led by Atlas Venture and Medicxi Ventures with participation from Nandi Life Sciences and Checkpoint Capital, while the Series B was led by Avoro Capital—joined by all investors from Ethyreal’s Series A.
City Therapeutics raises $99.5M Series B co-led by Viking Global Investors and Sofinnova Investments. The company is pioneering RNAi (RNA interference) therapeutics and will use the funds to push its lead asset through Phase I trials, further build out its platform, and advance the rest of its pipeline, ideally putting two more therapeutics into clinical studies by the end of the year. The lead asset, CITY-FXI, targets Factor XI to treat thromboembolic diseases, while the next most advanced asset (CITY-RBP4, in preclinical studies) is going after Stargardt disease. Alongside the two leads, new investors included Casdin Capital and NYBC Ventures. Existing investors AN Ventures, ARCH Venture Partners, Fidelity Management & Research Company, Invus, Rock Springs Capital, Regeneron Ventures, Slate Path Capital, and others undisclosed also participated.
Scispot secures $8M Series A led by Avenue Growth Partners. The Canada-based life sciences automation startup is building an AI-native operating layer for labs that aggregates disparate data sources (lab notebooks, different instruments, spreadsheets, etc.) into a cohesive data package, providing lab context for each data point and making the data usable by AI agents. The company’s tech is currently used by 100+ labs across the biopharma ecosystem, with millions of samples and thousands of experiments already supported. Funds from the round will be put towards expanding several of its internal teams as well as its customer base.

In case you missed it

World-first: therapy to make cells young again trialled in a person

David Sinclair has been studying aging for over two decades at his lab at Harvard Medical School. In 2017, he co-founded Life Biosciences with Tristan Edwards with the goal of tackling aging via cellular rejuvenation-based methods, aka cellular reprogramming. Earlier this week, the company dosed its first patient using its epigenetic restoration platform, which uses OCT4, SOX2, and KLF4 to shift cell transcriptomic profiles towards more youthful states. The patient is part of a Phase 1 clinical trial of ER-100, which is intended to treat optic neuropathies like open-angle glaucoma and non-arteritic anterior ischemic optic neuropathy (NAION). These diseases are characterized by damage to retinal ganglion cells (RGCs), which do not naturally regenerate, and gradually cause vision impairment. There is currently no treatment for NAION.

Enveda-180: generation of a large open multimodal MS/MS and ion mobility spectral library for drug-like small molecules

The Deliverome Project: How to break the delivery bottleneck in precision medicine.

What we liked on social channels

Events

Save the Date! Apply here to attend.

Field Trip

Support Us!

Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @decodingbio.
