BioByte 150: Reconstructing Cellular History, FlashPPI Improves Protein-Protein Predictions, LNPs Allow for Pancreatic Drug Delivery, and a Proof of Concept for Parasites in Drug Discovery

Gia-Bao Dam, Pranay Satya, Mikaela Kimpton, and Matthew's Biotech Musings

Mar 05, 2026

Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds and everything in between. All in one place.

David Goodsell, *Abiogenesis*, 2018, watercolor

What we read

Blogs

Mapping parasite molecules to treat autoimmune disease [Borges et al., Ditto Bio, March 2026]

Why it matters: Having co-evolved alongside humans for the entirety of our history, parasites offer unique insights into the mechanisms of our own immune systems—a valuable tool for novel target and drug discovery. With reduced sequencing costs leading to an abundance of genomic data and ever-advancing AI models, Ditto Bio is seeking to leverage this angle to demystify the immune system and unlock new therapeutics for autoimmune diseases and beyond.

Parasites have co-evolved with humans for as long as our species has existed, and along the way they have developed a plethora of ways to modulate the human immune system for their own survival. Because the immune system takes a “seek and destroy” approach toward potentially harmful foreign bodies, viruses and other parasites must continually evolve new ways to slip through undetected. Recently backed by YC, Ditto Bio is counting on these mechanisms to offer new insights into drug discovery. While the approach is not entirely novel, the past decade has brought significantly reduced sequencing costs, leading to a proliferation of genomic and transcriptomic data not just around human health and disease but across organisms. Coupled with the extraordinary surge of AI capabilities within the life sciences, there has been no better time than the present to look to parasite genomes.

As covered in their first post, the team at Ditto sought to test how well viruses could yield both clinically relevant and effective targets and therapeutics [molecules] for autoimmune diseases. To do so, they used their proprietary MoleculeMapper platform to computationally map ~10K viral proteins to their respective human targets, then compared the results against existing literature to generate the autoimmune target landscape. The predicted number of viral molecules hitting each target was then added to a UMAP analysis to examine associations between molecules, targets, and the diseases in which they are implicated, in pursuit of “surfacing promising cohorts of molecules to move forward to experiments.”

They found that this model could indeed surface clinically relevant targets. As a case study, the top 100 genes classified as high-priority targets by the literature for rheumatoid arthritis (RA) were analyzed against the viral data. In the comparison, 10 of the 19 RA targets with FDA-approved drugs appeared in the top 100 set.
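To get a feel for how far 10 of 19 is from chance, the overlap can be compared against a hypergeometric null. The sketch below is a back-of-the-envelope check, not Ditto's analysis: the background of 20,000 genes is an assumption (roughly the number of human protein-coding genes), since the post does not state which gene universe was used, and the function names are hypothetical.

```python
from math import comb

def expected_overlap(background, approved, shortlist):
    """Mean overlap if the shortlist were drawn at random from the background."""
    return shortlist * approved / background

def hypergeom_tail(background, approved, shortlist, observed):
    """P(overlap >= observed) when sampling the shortlist without replacement."""
    total = comb(background, shortlist)
    hits = sum(
        comb(approved, k) * comb(background - approved, shortlist - k)
        for k in range(observed, min(approved, shortlist) + 1)
    )
    return hits / total

BACKGROUND = 20_000  # ASSUMPTION: gene universe size, not given in the post
APPROVED = 19        # RA targets with FDA-approved drugs (from the post)
SHORTLIST = 100      # size of the top-100 set (from the post)
OBSERVED = 10        # approved targets found in the top 100 (from the post)

recall = OBSERVED / APPROVED
chance = expected_overlap(BACKGROUND, APPROVED, SHORTLIST)
p_value = hypergeom_tail(BACKGROUND, APPROVED, SHORTLIST, OBSERVED)

print(f"share of approved targets recovered: {recall:.1%}")
print(f"overlap expected by chance: {chance:.3f} genes")
print(f"hypergeometric tail probability: {p_value:.2e}")
```

Under that assumed background, a random list of 100 genes would be expected to contain well under one approved target, so the exact choice of universe matters far less than the order of magnitude.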

Additionally, tumor necrosis factor (TNF), a well-documented and potent proinflammatory cytokine, featured prominently in this functional proof of concept because of the substantial number of viral proteins that inhibit TNF signalling. The authors hint that this study alone uncovered many more novel targets that may represent powerful immune regulators, so it’s safe to say this will be an exciting company and a promising area to watch.

Papers

Genetically encoded assembly recorder temporally resolves cellular history [Yan et al., Nature, March 2026]

Why it matters: Understanding how cells change over time requires reconstructing the sequence of events that led to their current state. Most approaches infer history retrospectively from snapshots such as transcriptomes or lineage barcodes, which capture relationships but lose precise temporal ordering. This paper introduces a genetically encoded system that records cellular events directly into DNA through progressive molecular assembly, enabling continuous reconstruction of cellular history with temporal resolution.

Yan et al. develop a genetically encoded assembly recorder (GEAR) that converts transient cellular signals into cumulative DNA records. The system uses programmable recombinases and modular DNA “recording units” that assemble sequentially over time in response to specific stimuli. Each activation event adds a new DNA segment to the growing record, creating an ordered molecular archive that preserves both event identity and timing.

In mammalian cells, the recorder reliably captures sequential activation of signaling pathways and environmental inputs. Distinct stimuli trigger incorporation of specific DNA modules, allowing multiple signals to be logged within the same genomic locus. Sequencing the assembled DNA reveals the exact order in which events occurred, enabling reconstruction of cellular trajectories from a single endpoint measurement. The authors demonstrate temporal resolution of signaling dynamics by recording pulses of pathway activation and differentiating closely spaced events that occur over hours to days. Because the record accumulates over time, the system can integrate multiple exposures and preserve long-lived histories that would otherwise be lost in conventional snapshot assays.
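The recording logic can be pictured with a toy append-only model. This is purely illustrative: the real system works through recombinase-driven DNA assembly, and the class, module labels, and stimulus names below are hypothetical. The toy also captures only the order of events, not the elapsed time between them.

```python
from dataclasses import dataclass, field

@dataclass
class AssemblyRecorder:
    """Toy append-only record: one module is added per recognized stimulus."""
    module_for: dict                      # hypothetical map: stimulus -> module label
    record: list = field(default_factory=list)

    def expose(self, stimulus):
        # Only stimuli with a matching recording unit leave a trace.
        module = self.module_for.get(stimulus)
        if module is not None:
            self.record.append(module)

    def readout(self):
        """Stand-in for endpoint sequencing: stimuli in the order they were logged."""
        stimulus_for = {module: stim for stim, module in self.module_for.items()}
        return [stimulus_for[module] for module in self.record]

recorder = AssemblyRecorder({"signal_A": "mod1", "signal_B": "mod2"})
for stimulus in ["signal_A", "unrecorded_signal", "signal_B", "signal_A"]:
    recorder.expose(stimulus)

print(recorder.record)     # assembled modules, oldest first
print(recorder.readout())  # event order recovered from a single endpoint read
```

The point of the sketch is the data structure: because modules are appended rather than overwritten, a single endpoint read is enough to recover the full sequence, including repeated exposures to the same stimulus.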

Overall, this work expands the toolkit for cellular recording by introducing assembly-based memory, where biological events are stored as ordered DNA structures rather than static mutations. This provides a scalable strategy for reconstructing cellular histories in development, disease progression, and synthetic biology systems where understanding the sequence of events is critical.

Linear-time prediction of proteome-scale microbial protein interactions [Cornman et al., bioRxiv, March 2026]

Why it matters: FlashPPI offers significant improvements in speed and accuracy for proteome-wide protein-protein interaction screens while achieving results comparable to current structural methods. Such capabilities may offer a path towards improved annotation of poorly understood proteins and the discovery of new biomolecular mechanisms.

Mapping the space of protein-protein interactions (PPIs) within a cell is key to understanding drivers of biological function. Current computational tools for structure prediction like AlphaFold-Multimer and AlphaFold 3 are technically capable of such tasks, but the cost of screening an entire proteome scales quadratically with the number of proteins, rendering such approaches infeasible. To that end, the team at Tatta Bio has released FlashPPI, a contrastive learning framework that builds on a genome language model to “enable linear-time prediction of physical protein interfaces across a microbial genome.”
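A quick count shows why the quadratic term bites. The sketch below compares the number of unordered pairs an all-vs-all screen must score with the number of per-protein encoder passes an embedding approach needs. The proteome sizes are round, assumed figures chosen for illustration, not numbers from the preprint, and the function names are hypothetical.

```python
def pairwise_evaluations(n_proteins):
    """Unordered protein pairs an all-vs-all screen must score: n(n-1)/2."""
    return n_proteins * (n_proteins - 1) // 2

def encoder_passes(n_proteins):
    """Embedding every protein once takes n forward passes."""
    return n_proteins

# ASSUMPTION: illustrative proteome sizes, not taken from the paper.
proteomes = {"small genome": 500, "larger genome": 4_000}

for name, n in proteomes.items():
    pairs = pairwise_evaluations(n)
    passes = encoder_passes(n)
    print(f"{name}: {n:,} proteins -> {pairs:,} pairs vs {passes:,} encoder passes")
```

Growing the proteome eightfold multiplies the pair count by roughly 64 while the number of encoder passes grows only eightfold. The retrieval step on top of the embeddings has its own cost, which depends on the search index used.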

Rather than trying to improve the throughput of structural tools for PPI screening, the team chose to frame the problem as a dense retrieval task. This approach aims to learn a latent space in which the distance between protein embedding vectors reflects the likelihood of an interaction. Crucially, this moves the problem away from expensive quadratic time complexity toward a more practical linear-time search. The team built FlashPPI on the gLM2 mixed-modality genome language model, which had shown promising results in learning some protein-protein contact maps without supervision. Briefly, gLM2 represents genomes with nucleic acid tokens for noncoding regions and a separate set of amino acid tokens for coding regions that may be translated into proteins. FlashPPI uses a dual-encoder architecture to embed proteins individually, rather than the conventional concatenation strategy employed by other tools. The model optimizes an InfoNCE loss to “[maximize] similarity between interacting pairs while minimizing similarity for in-batch negatives.” While this step teaches the model a general sense of which proteins may interact, it does not learn residue-level interactions or physical binding interfaces. To that end, the team added a contact head that is supervised by PDB contacts to learn interpretable interface maps; notably, there was significant effort to ensure the model could properly distinguish true physical interactions from spurious noise.
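For readers unfamiliar with the contrastive objective, here is a minimal InfoNCE sketch with in-batch negatives. It is a generic textbook form written in plain Python, not FlashPPI's training code: the temperature, the cosine similarity, the one-directional loss, and the toy two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss where queries[i] and keys[i] form the positive pair
    and every other key in the batch acts as a negative."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / len(q)

# Toy embeddings (ASSUMPTION): partner proteins point in similar directions.
proteins_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
proteins_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned = info_nce(proteins_a, proteins_b)
shuffled = info_nce(proteins_a, proteins_b[1:] + proteins_b[:1])
print(f"loss with true partners:     {aligned:.4f}")
print(f"loss with shuffled partners: {shuffled:.4f}")
```

The loss is low when each protein sits closest to its true partner and high when partners are mismatched, which is the pressure that shapes the latent space. Because each protein is embedded on its own, the embeddings can be computed once and reused for every candidate partner.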

FlashPPI was evaluated on held-out E. coli PPIs and showed significant improvements in Area Under the Precision-Recall Curve (AUPRC) as well as a speedup of roughly 2,400x. The contact head was found to improve the contrastive head, and both modules outperformed existing methods. The team attributed their performance gains to the use of the contrastive learning task as well as model initialization based on gLM2, noting that it proved to be a far superior base compared to ESM2 protein language model embeddings. To benchmark against a structural method, the authors also performed a screen with AF3 on the Mycoplasma genitalium genome PPIs. FlashPPI recovered a larger fraction of known experimental interactions across its top-ranked predictions per protein than AF3, but ultimately most interactions fell into the low-scoring range of both models. Still, FlashPPI’s speed advantage alongside its comparable results demonstrates the model’s utility for proteome-wide screens, given further improvement. FlashPPI is an interesting step towards high-throughput screening capabilities that may uncover novel biological mechanisms and help address questions surrounding unannotated proteins. It would be interesting to see whether such approaches are also compatible with genome language models that do not use mixed-modality representations.
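As a reminder of what AUPRC rewards, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. The summary above does not say which estimator the authors used, and the scores and labels here are made-up toy values, not results from the preprint.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values at each true positive,
    visiting candidates from highest to lowest score."""
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_positive

# Toy example (ASSUMPTION): five candidate pairs, two of them true interactions.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0, 0]

ap = average_precision(scores, labels)
baseline = sum(labels) / len(labels)  # what a random ranking achieves on average
print(f"average precision: {ap:.3f} (random baseline ~ {baseline:.2f})")
```

Because true interactions are rare among all possible pairs, the random baseline for this metric is the positive fraction rather than 0.5, which is why precision-recall metrics are a natural fit for PPI screens.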

Pancreatic-targeted lipid nanoparticles based on organ capsule filtration [Lei et al., Nature, March 2026]

Why it matters: Using a unique insight about the connective-tissue “capsules” surrounding internal organs, Lei et al. engineer a lipid nanoparticle (LNP) platform with strong pancreas selectivity, enabling new organ-specific gene delivery and downstream therapeutic application.

This work starts from a specific observation: most abdominal organs are wrapped in a dense connective-tissue capsule, whereas the pancreas has a much thinner capsule. The authors show that these capsules act as a physical size filter for intraperitoneal nanoparticles. Larger LNPs (on the order of a few hundred nanometers, for example ~300 nm) are effectively excluded from penetrating capsule-covered organs, while the pancreas remains comparatively accessible. As a consequence, larger LNPs preferentially accumulate in the pancreas, creating a pancreas-targeting mechanism driven by organ-scale anatomy rather than cell-specific ligands.

They then iteratively engineer the formulation by dissecting how these particles enter and transfect pancreatic tissue. A key early constraint is a Goldilocks problem: larger particles can be better for pancreas accumulation, but smaller particles are typically better for cellular uptake and downstream mRNA translation. The authors exploit a known feature of LNPs: they adsorb proteins and lipoproteins in vivo, forming a “protein corona” that can change their effective size and biological interactions. They design an amino-acid–modified LNP that starts near ~100 nm but rapidly assembles into much larger LNP–protein complexes in peritoneal fluid (hundreds of nanometers), preserving the uptake and expression advantages of a smaller particle while leveraging capsule filtration to avoid capsule-covered organs. In an initial screen, one candidate achieves very strong pancreas localization (reported as >94% of total bioluminescent signal in pancreas), and the authors further optimize this concept into their lead formulation (AH-LNP) with a mechanistic emphasis on corona composition and receptor-mediated uptake.

Recent work has shown that tuning LNP interactions with biology (for example, via surface chemistry that engages defined receptors) can redirect delivery to specific cell types such as T cells, enabling in vivo CAR-T approaches now reaching the clinic. Lei et al. make the analogous case for organ-selective delivery: pancreas targeting expands what gene delivery can plausibly do in pancreatic disease. As one proof of concept, they pair a KRAS mRNA vaccine with AH-LNP–delivered IL-2 in pancreatic tumor models, reporting sharp tumor control: in KPC mice, tumor incidence drops from 93.3% to 30.0% and survival rises from 23.3% to 90.0%; in a KRAS(G12D) patient-derived xenograft model, they observe 95.2% tumor inhibition with 100% remission within 40 days.

Notable deals

Generate Biomedicines prices IPO at $16/share, raising $400M in the year’s largest biotech offering. The Flagship Pioneering company sold 25 million shares on Nasdaq (GENB), with an additional 3.75 million available via greenshoe option. Generate has raised $800M+ in venture funding since its 2018 founding, plus ~$110M from Amgen and Novartis collaborations. The clinical-stage generative biology company uses its AI platform to design protein-based therapeutics in immunology/inflammation and oncology. Lead candidate GB-0895, an AI-designed potential competitor to AstraZeneca/Amgen’s Tezspire for severe asthma, entered Phase 3 trials in January 2026. Proceeds fund clinical development and platform R&D. First-day trading saw shares close at $12.72, giving a market cap of ~$1.6B.
GSK acquires Canadian biotech 35Pharma for $950M cash to gain pulmonary hypertension candidate HS235. The investigational activin receptor signaling inhibitor has completed Phase 1 in healthy volunteers, with PAH and PH-HFpEF studies set to begin imminently. Beyond pulmonary hypertension, early clinical data showed metabolic benefits including fat-selective weight loss, preservation of lean mass, and improved insulin sensitivity. GSK positions the asset as expanding opportunities across metabolic, inflammatory, vascular, and fibrotic drivers of chronic lung, liver, and kidney disease within its Respiratory, Immunology & Inflammation portfolio. Deal subject to HSR and Canadian regulatory clearances.
Ginkgo Bioworks launches Ginkgo Cloud Lab, opening browser-based access to its autonomous lab infrastructure. The platform runs on Ginkgo’s proprietary Reconfigurable Automation Carts (RACs) providing remote access to 70+ instruments spanning sample prep, liquid handling, analytical readouts, and incubation. Central to the launch is EstiMate, an AI agent that lets scientists submit protocols in natural language and receive immediate feasibility assessments and transparent pricing. The company is inviting academic and biopharma researchers to submit protocols at cloud.ginkgo.bio. Ginkgo also plans to spin off its biosecurity business in H1 2026 to focus squarely on autonomous lab offerings.
Novo Nordisk partners with Vivtex in a deal worth up to $2.1B to develop next-generation oral biologics for obesity and diabetes. The Boston-based biotech will license its oral drug-delivery technologies to Novo Nordisk in exchange for upfront payments, research funding, milestones, and tiered royalties on future sales. The deal comes as Novo faces competitive pressure following CagriSema’s underperformance versus Eli Lilly’s Zepbound in a head-to-head Phase 3 trial.
Eli Lilly inaugurates LillyPod, pharma’s most powerful AI supercomputer, at Indianapolis headquarters. The world’s first NVIDIA DGX SuperPOD with DGX B300 systems houses 1,016 Blackwell Ultra GPUs delivering over 9,000 petaflops of AI performance. Assembled in four months following its November 2025 GTC unveiling, LillyPod enables genomics teams to harness 700 terabytes of data across 290+ terabytes of high-bandwidth GPU memory. The system will train protein diffusion models, small-molecule graph neural networks, and genomics foundation models at scale. Where wet labs historically constrained teams to ~2,000 molecular ideas per target annually, LillyPod creates a computational dry lab capable of simulating billions of hypotheses in parallel before physical validation. Select models will be available through TuneLab, Lilly’s federated learning platform built on NVIDIA FLARE, allowing biotechs to access discovery tools trained on $1B+ of proprietary Lilly data while keeping their own data private. Infrastructure targeted for 100% renewable electricity by 2030 using liquid cooling.
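
The figures in the Generate Biomedicines item above can be cross-checked with a few lines of arithmetic. Only the share count, prices, and approximate market cap come from the item; the implied share count is derived from the rounded ~$1.6B figure and is an estimate, not a reported number, and gross proceeds are shown before any underwriting fees.

```python
SHARES_SOLD = 25_000_000      # from the item above
GREENSHOE_SHARES = 3_750_000  # from the item above
IPO_PRICE = 16.00             # USD per share
FIRST_DAY_CLOSE = 12.72       # USD per share
MARKET_CAP = 1.6e9            # USD, approximate figure as reported

gross_proceeds = SHARES_SOLD * IPO_PRICE
gross_with_greenshoe = (SHARES_SOLD + GREENSHOE_SHARES) * IPO_PRICE
first_day_change = FIRST_DAY_CLOSE / IPO_PRICE - 1
implied_shares = MARKET_CAP / FIRST_DAY_CLOSE  # derived estimate, not reported

print(f"gross proceeds: ${gross_proceeds / 1e6:,.0f}M")
print(f"gross proceeds if greenshoe fully exercised: ${gross_with_greenshoe / 1e6:,.0f}M")
print(f"first-day move: {first_day_change:.1%}")
print(f"implied shares outstanding: ~{implied_shares / 1e6:,.0f}M")
```

The headline $400M matches 25 million shares at $16, and the first-day close sits roughly a fifth below the offer price.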

In case you missed it

Can Frontier LLMs Predict Clinical Trial Outcomes?

ADMET Predictions Get AI Boost, Federated Data Network Unites Pharma

Support Us!

Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @decodingbio.
