BioByte 158: Reducing the Amino Acid Alphabet Lends Insights to the Evolution of Life, Bridge Recombinases Enable Microbial Genome Rewriting, and Decoding ncORFs for Expansion of the Human Proteome

Varun Agarwal, Matthew's Biotech Musings, Gia-Bao Dam, and Mikaela Kimpton

May 07, 2026

Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds, and everything in between. All in one place.

Fernan Federici & Jim Haseloff, Confocal micrograph of Bacillus subtilis. Bacillus subtilis is a Gram-positive, rod-shaped bacterium, commonly found in soil. Fluorescent proteins (TagRFP-T, sfGFP, TagBFP, mKate2 and mOrange2), time-lapse confocal microscopy and biophysical models are being used to understand the organization of bacterial biofilms.

What we read

Papers

Toward life with a 19–amino acid alphabet through generative artificial intelligence design [Liu et al., Science, April 2026]

Why it matters: Modern life uses 20 amino acids. However, this couldn’t have always been the case. What could life have looked like with only 19 amino acids? In a collaboration between Harris Wang’s lab at Columbia and Sergey Ovchinnikov’s at MIT, researchers demonstrate how we can take a step backward in evolutionary time by removing isoleucine from the ribosome with the aid of protein generative AI models.

First, the team had to identify which of the 20 amino acids they would try to remove from E. coli. To do this, they systematically profiled all 20 amino acids using evolutionary conservation, biosynthetic cost, and prior theories of amino acid chronology. Out of the fray, isoleucine emerged as a candidate: it is one of the least conserved amino acids across protein orthologs, is metabolically expensive to produce, and is chemically similar to valine. Evolution itself hints at this redundancy: isoleucine-to-valine substitutions are among the most commonly tolerated swaps in natural proteins.

Now came the hard work: actually removing isoleucine and replacing it with amino acids that could maintain protein function. They first started broadly, replacing isoleucine residues in 39 essential or highly expressed E. coli proteins with valine or leucine; this worked ~43% of the time. For the rest, the team turned to generative AI tools including ESM2, MSA Transformer, ProteinMPNN, and AlphaFold2. Rather than only swapping out isoleucine, these models sometimes introduced additional compensatory mutations to preserve protein structure and function. In one case, the final working design of a ribosomal protein required eight additional mutations beyond the isoleucine substitutions themselves.
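To make the naive first pass concrete, here is a minimal Python sketch of swapping every isoleucine for valine or leucine. It is an illustration under our own assumptions, not the authors' pipeline: the function names and the toy peptide are hypothetical, and the paper may have tested far fewer combinations than this enumerates.

```python
from itertools import product


def isoleucine_positions(seq: str) -> list[int]:
    """Return the 0-based positions of isoleucine (I) in a protein sequence."""
    return [i for i, aa in enumerate(seq) if aa == "I"]


def naive_variants(seq: str, replacements: str = "VL"):
    """Yield every variant in which each I is replaced by one of `replacements`."""
    positions = isoleucine_positions(seq)
    for combo in product(replacements, repeat=len(positions)):
        residues = list(seq)
        for pos, aa in zip(positions, combo):
            residues[pos] = aa
        yield "".join(residues)


toy = "MKIAVLIGE"  # hypothetical peptide, not taken from the paper
variants = list(naive_variants(toy))
print(isoleucine_positions(toy), variants)
```

With k isoleucines there are 2^k valine/leucine combinations, which is one way to see why exhaustive testing stops being practical for larger proteins and model-guided design becomes attractive.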

They focused this effort on the ribosome, one of the cell’s most important molecular machines, composed of both proteins and rRNA. There are 382 isoleucines spread across the 50 ribosomal proteins that contain it. After iterative rounds of design-build-test, the team successfully redesigned every single one. The next challenge was assembling these redesigned components into a functioning system. Even though individual proteins retained ~90% of wild-type fitness, combining dozens of altered subunits risked collapsing ribosomal function altogether. The assembly of a 21-gene ribosomal operon proved especially difficult, requiring the team to debug a bizarre four-base-pair overlap between two neighboring ribosomal genes that repeatedly broke the design. The resulting strain, Ec19, grew more slowly but remained viable. After more than 450 generations, the team found no mutations reverting these substitutions back to isoleucine, suggesting the redesigned ribosome was stable enough to persist.

While not a full 19-amino-acid organism, this paper is a compelling example of how protein AI can do more than design flashy new antibodies: it can help answer fundamental questions about how life itself may have evolved and open up new genome redesign possibilities.

Bridge recombinase enables versatile rewriting of bacterial genomes [Patel et al., bioRxiv, May 2026]

Why it matters: Microbial communities underpin processes from geochemical cycles to human gut health, but the tools to reliably engineer them remain bespoke. Existing approaches pose difficult tradeoffs: CRISPR-Cas systems depend on double-strand breaks (DSBs) and host repair machinery – limiting their use in non-model bacteria and making them poorly suited for large genomic rearrangements, and classical site-specific recombinases like Cre or Flp require pre-installed target sequences at the editing site, preventing their use on native genomes. Bridge recombinases (BRs) – a recently characterized class of RNA-guided enzymes from IS110-family insertion sequences – combine the programmability of CRISPR with the recombinase ability to perform large, single-step insertions, excisions, and inversions without DSBs. Patel et al. demonstrate that this system scales to the physical limits of bacterial genomes, functions across diverse microbes, edits intact human gut communities without prior strain isolation, and supports new editing modalities such as HR-independent search-and-replace and programmable horizontal gene transfer (HGT).

Here’s a quick refresher on how BRs function. A single non-coding bridge RNA (bRNA) specifies both recombination sites through two independently programmable loops – one matching the target (genomic) sequence, and the other matching the donor (incoming) sequence, each via a 14-nucleotide stretch with a conserved core dinucleotide. Reprogramming these loops directs scarless insertions, excisions, or inversions at user-defined sequences without any pre-modification of the genome.

In E. coli, BR efficiency holds across cargo sizes spanning an order of magnitude – the team integrated bacterial artificial chromosomes from 13.8 kbp to 141 kbp at up to 90.9% efficiency with no falloff as cargo size increased. Moreover, they excised 50 kbp of the biosynthetic gene cluster for colibactin (a genotoxic metabolite implicated in early-onset colorectal cancer) and achieved 2.3 Mbp inversions – approximately half of the E. coli chromosome and the largest effective inversion possible in a circular genome. The experiments indicate that BR performance is primarily bounded by host biology – the amount of exogenous DNA a cell can take up, the proximity of essential genes to deletion targets, and genome size – rather than enzyme constraints.
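Why is half the chromosome the ceiling? On a circular genome, inverting one arc relative to the rest is equivalent to inverting the complementary arc, so the effective size of an inversion is the shorter of the two. A quick sketch of that arithmetic follows; the genome length is the E. coli K-12 MG1655 reference, which is our assumption, since the strain is not named above.

```python
GENOME_BP = 4_641_652  # E. coli K-12 MG1655 reference length (assumed strain)


def effective_inversion_length(segment_bp: int, genome_bp: int = GENOME_BP) -> int:
    """Effective inversion size on a circular genome: the shorter of the two arcs."""
    return min(segment_bp, genome_bp - segment_bp)


max_effective = GENOME_BP // 2
print(max_effective)                          # upper bound on effective size
print(effective_inversion_length(2_300_000))  # the reported 2.3 Mbp inversion
print(effective_inversion_length(3_000_000))  # a "bigger" arc is effectively smaller
```

Under that assumption the ceiling is about 2.32 Mbp, so a 2.3 Mbp inversion sits essentially at the limit.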

Beyond E. coli, the team targeted a 14-nucleotide sequence within the universally conserved 16S ribosomal RNA gene with a single bRNA, directing integration across 11 bacterial species (spanning five phyla), including Bacillus subtilis, Bacteroides thetaiotaomicron, and Corynebacterium glutamicum. They then deployed the same approach directly in two human gut microbial communities (an infant stool sample and an adult intestinal mucosal sample), simultaneously editing multiple species without needing to isolate any of them – a meaningful capability given difficulties in culturing prokaryotes. Specificity varied across hosts: more than 99.9% of Enterobacteriaceae integrations were on-target at the 16S locus, but B. thetaiotaomicron showed split targeting between the intended site and a single-mismatch off-target in a non-essential pectinesterase gene, suggesting that bRNA accuracy is host-dependent.
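To see how a single-mismatch off-target can arise, here is a rough sketch that scans a sequence for sites within one mismatch of a 14-nucleotide target on either strand. Everything in it is hypothetical: the target and toy genome are invented, the genome is treated as linear, and real bRNA recognition (including the role of the core dinucleotide) is more involved than a Hamming-distance match.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome: str, target: str, max_mismatches: int = 1):
    """Return (position, strand, mismatches) for every near-match to `target`."""
    k = len(target)
    hits = []
    for strand, query in (("+", target), ("-", revcomp(target))):
        for i in range(len(genome) - k + 1):
            d = hamming(genome[i:i + k], query)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return hits


target = "ACGTTGCAAGCTTA"     # hypothetical 14-nt target
off_site = "ACGTTGCATGCTTA"   # same sequence with one substitution
genome = "GG" + target + "CCCC" + off_site + "TT"  # toy linear "genome"
hits = find_sites(genome, target)
print(hits)
```

A scan like this only flags candidate sites. Whether a given host actually recombines at them is exactly the host-dependent part that sequence alone does not yet explain.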

The team expanded this technique to achieve two new editing modalities. TRADE (targetable recombinase-assisted DNA exchange) uses two co-expressed bridge RNAs to swap one genomic segment for another in one step – programmable replacement without homologous recombination, which has historically been challenging in bacteria. Two insights made this work. First, when two bRNAs share the same core dinucleotide, they pair combinatorially with each other (rather than just their intended partners) – generating off-target products that scale quadratically with the number of co-expressed bRNAs. The team resolved this by giving each bRNA a different core dinucleotide (one CT, one GT), suppressing cross-pairing. Second, most cells stopped after a single recombination event rather than completing both, leaving plasmid-integrated intermediates as the dominant product. They addressed this by adding a counter-selectable marker on the plasmid backbone, killing cells that failed to complete both swaps. The team then extended TRADE to programmable HGT, capturing chromosomal pathways from E. coli and transferring them to phylogenetically distant recipients (e.g. Klebsiella michiganensis, Pseudomonas simiae, Corynebacterium glutamicum) with functional verification: the lacZY operon transferred to C. glutamicum conferred the ability to grow on lactose as the sole carbon source.
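The quadratic scaling is easy to see with a toy count. The sketch below assumes a deliberately simple model that is ours, not the paper's: any target loop can pair with any donor loop that carries the same core dinucleotide, and only each bRNA's own target-donor pairing is intended.

```python
from collections import Counter


def pairing_counts(cores: list[str]) -> tuple[int, int]:
    """Return (intended, off_target) pairings for co-expressed bRNAs.

    Toy model: the target loop of bRNA i can pair with the donor loop of
    bRNA j whenever the two share a core dinucleotide. Only i == j is intended.
    """
    intended = len(cores)
    possible = sum(n * n for n in Counter(cores).values())
    return intended, possible - intended


print(pairing_counts(["CT", "CT"]))   # two bRNAs sharing a core
print(pairing_counts(["CT", "GT"]))   # distinct cores, as in TRADE
print(pairing_counts(["CT"] * 4))     # four bRNAs sharing one core
```

In this model, n bRNAs sharing one core give n² possible pairings, of which n² − n are off-target, while distinct cores bring the off-target count to zero. There are only 16 possible dinucleotides, so the trick cannot be extended indefinitely.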

A few key limitations persist, namely (a) predictive bRNA design is currently intractable, with significant variation in efficiency and specificity across species and target sites that is not clearly explainable from sequence, and (b) cross-pairing issues compound for multiplexing beyond two simultaneous edits. Nonetheless, mobilizing chromosomal segments between species opens up experimental access to questions about how horizontal gene transfer shapes microbial fitness and ecology, and may lead to what the authors dub ‘synthetic microbiomics’ – the engineering of microbial ecosystems by reprogramming gene flow within them.

Expanding the human proteome with microproteins and peptideins [Deutsch et al., Nature, May 2026]

Why it matters: For decades, the canonical human proteome has been anchored to roughly 19,500 protein-coding genes. Ribosome profiling has been flagging translation at thousands of sites outside those annotations for years, but the field lacked a framework to determine which sequences actually matter, which are noise, and what to call them. This paper closes that gap, establishing the first consortium-grade evidence base for non-canonical open reading frames (ncORFs) and introducing a new protein class that sits between confirmed translation and annotated function.

The TransCODE Consortium queried 7,264 ncORFs against 95,520 proteomics experiments, including 3.5 billion tryptic mass spectra and 240 million HLA immunopeptidomics spectra. About 25% of ncORFs produced detectable HLA-presented peptides – a hit rate that dwarfs conventional tryptic detection, which found peptides for only 2.5%. The disparity reflects a real technical blind spot: microproteins are too small to generate the two tryptic peptides required for HUPO-HPP verification. The immunopeptidome, which requires neither trypsin nor protein stability, sees them clearly.
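The blind spot is easy to reproduce in silico. Below is a minimal trypsin digest (cleave after K or R unless the next residue is P) that counts peptides long enough to be useful. Both sequences are hypothetical, and the 9-residue minimum is our recollection of the HPP guideline threshold, so treat it as an assumption rather than a quoted rule.

```python
def tryptic_peptides(protein: str) -> list[str]:
    """Digest a protein with trypsin: cut after K or R, but not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide with no K/R
    return peptides


def usable_peptides(protein: str, min_len: int = 9) -> list[str]:
    """Peptides at least `min_len` residues long (assumed threshold)."""
    return [p for p in tryptic_peptides(protein) if len(p) >= min_len]


micro = "MKRLPSKAGRWTEK"                      # hypothetical microprotein
longer = "MASEQLLTGKAVDWNPELLSRGGTFHIEAVVK"   # hypothetical longer protein
print(tryptic_peptides(micro), len(usable_peptides(micro)))
print(tryptic_peptides(longer), len(usable_peptides(longer)))
```

The toy microprotein shatters into fragments too short to count, while the longer sequence clears a two-peptide bar with room to spare.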

From this, the paper proposes a tiered annotation framework. Fifteen ncORFs were elevated to candidate protein-coding genes, three of which GENCODE has already annotated. The remaining translated-but-uncharacterized sequences are formally designated peptideins, which are confirmed translation products of indeterminate functional consequence. This gives annotation projects a principled middle ground between “protein” and “noise,” and gives drug hunters a catalog of sequences that are real, presented on HLA, and targetable.

The paper also introduces ORBL, a new evolutionary metric that measures conservation of ORF structure – start codon, stop codon, reading frame openness – independent of amino acid sequence conservation. Conventional tools like PhyloCSF, which require amino acid conservation, fail for most ncORFs by design. ORBL finds that 30% of ncORFs show significant evolutionary constraint on ORFness alone, and that constrained ncORFs are significantly more likely to appear in HLA datasets. The metric will likely become standard in future annotation workflows.
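To illustrate what "conservation of ORFness" means, here is a toy check of whether an ORF stays structurally intact across orthologs regardless of the amino acids it encodes. This is not the ORBL algorithm, whose details are not described above. The sequences, species labels, and the ATG-only start rule are all our assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_intact(dna: str) -> bool:
    """True if the sequence starts with ATG, ends in a stop, and has no in-frame stop."""
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    open_frame = not any(c in STOP_CODONS for c in codons[1:-1])
    return has_start and has_stop and open_frame


def fraction_intact(orthologs: dict[str, str]) -> float:
    """Share of orthologous sequences in which the ORF structure is preserved."""
    return sum(orf_intact(seq) for seq in orthologs.values()) / len(orthologs)


orthologs = {  # hypothetical aligned ORF regions
    "species_a": "ATGGCTAAAGGTTGA",  # intact
    "species_b": "ATGTCAAGAGGCTAA",  # intact, different amino acids
    "species_c": "ATGGCTTAAGGTTGA",  # premature in-frame stop
    "species_d": "ACGGCTAAAGGTTGA",  # start codon lost
}
print(fraction_intact(orthologs))
```

Note that species_b counts as conserved even though its encoded peptide differs, which is the kind of signal an amino-acid-based tool would miss.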

The deepest functional result focuses on c10riboseqorf92, a 123-amino-acid peptidein encoded within the OLMALINC lncRNA. Across CRISPR screens in 485 cancer cell lines, its knockout reduced viability in 85% of lines. Transcriptome profiling confirmed the effect is ORF-specific: re-expressing the coding sequence rescued the phenotype after OLMALINC knockdown. Its knockout signature correlated with mitosis and DNA damage regulation genes. It remains annotated as a peptidein, and its function in normal physiology is unestablished, but its cancer essentiality profile is striking enough that reclassification seems likely as data accumulate. The human proteome is larger than the reference genome suggested. The question now is which pieces of it are druggable.

Pumpkinseed raises $20M Series A co-led by NFX and Future Ventures to scale protein sequencing platform. Dubbed “The Biology Mining Company”, Pumpkinseed has its sights set on “extracting the molecular intelligence embedded in every cell and tissue at a resolution and scale the field has never had before.” Built via semiconductor manufacturing, the Stanford spinout’s proprietary platform, deSIPHR, is a nanophotonic chip technology featuring 100 million sensors per square centimeter, rendering it capable of reading any protein across the known and unknown proteome at the amino acid level, without a reference catalog and at high throughput. Such an approach offers significant gains in resolution and scale without sacrificing efficiency, outpacing traditional mass spectrometry approaches. Amidst the recent meteoric rise of machine learning in drug discovery and medicine, nearly every biopharma AI discussion touches on a looming limitation: gaps in existing datasets. Pumpkinseed’s answer is to generate the high-resolution, high-fidelity proteomic datasets that next-generation AI and virtual cell models require, which likely positions the company well for these near-term advancements. The startup has already secured several notable partnerships, including $12M in near-term revenue contracts with Genentech, DARPA, and BARDA, spanning immunology, precision medicine, and rapid biothreat detection and mitigation. This latest financing will power scaling of Pumpkinseed’s platform from peptide to full-length protein sequencing as well as propel progress toward further partnerships across biopharma and biosecurity. Additional investor participation in the round comes from Base4, ADVentures (CVC of Analog Devices, Inc.), and Stanford, as well as other undisclosed investors.

Moleculent announces $20M financing round led by Rubicon Healthcare Partners to map cell-to-cell communication at scale. The Swedish biotech’s functional profiling platform is described in the press release as “the kind of tool that reshapes entire research programs.” Moleculent’s technology is based on its proprietary Proximity Ligation Assay, which both detects high-plex cell-cell interactions in their native environment and enables cell typing via detection of individual proteins, providing tissue context. Beyond platform buildout, funds from the round will go toward expanding the company’s commercial operations in the US via its Techstart Early Access Program, which gives select academic and pharmaceutical partners priority access to Moleculent’s product ahead of its full commercial launch. In tandem, the company is reportedly developing an automation tool to reduce “hands-on” time and improve reproducibility in large-scale studies, further paving the way for the technology’s integration into translational labs worldwide. Additional participants in the round include ARCH Venture Partners, Eir Ventures, and other unnamed existing investors.

Latus Bio closes $97M Series A led by 8VC for clinical gene therapy pipeline buildout and expansion. Armed with its proprietary AAV capsid discovery platform, Latus is on a mission to make CNS-focused gene therapies scalable, positioning them as major contenders for treating more widespread diseases that affect larger patient populations. Funding from the round will primarily serve to drive initial clinical data for the company’s two lead candidates, LTS-201 and LTS-101, targeting Huntington’s Disease and late-infantile neuronal ceroid lipofuscinosis type 2 (CLN2), respectively, as well as advancement of other undisclosed programs extending beyond CNS indications. LTS-101 has already received IND clearance along with Orphan Drug Designation, Rare Pediatric Disease Designation, and Fast Track Designation from the FDA; LTS-201 is close behind, on track for IND submission in the latter half of 2026. Other participating investors in the round include DCVC Bio, BioAdvance, Benjamin Franklin Technology Partners, Modi Ventures, Gaingels, Hatch BioFund, Korea Development Bank and Helen’s Pink Sky Foundation (the latter two being new investors).

Bayer agrees to acquire Perfuse Therapeutics in a deal worth up to $2.45B. Perfuse specializes in pioneering treatments for ischemia-induced ocular diseases, which generally involve significant, irreversible vision loss and blindness. The company’s lead candidate, PER-001, a small molecule endothelin receptor antagonist taking the form of an intravitreal bio-erodible implant (administered into the vitreous cavity of the eye), is currently in Phase 2 clinical trials for Open Angle Glaucoma and Diabetic Retinopathy. Both diseases are major indications with few existing treatments, affecting ~76-80M and 146M patients, respectively, with projections rising to 112M and 160M over the next two decades. Under the terms of the agreement, Bayer will receive full rights to PER-001 in exchange for a $300M upfront payment, with further installments contingent on meeting certain development, regulatory, and commercial milestones.

In case you missed it

Limitations of serial cloning in mammals | Nature Communications

Cooperative Native Contact Formation Facilitates Free Energy Barrier Crossing in Protein Folding

What we listened to

What we liked on social channels

Field Trip

Support Us!

Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @decodingbio.
