BioByte 119: e-tattoos that measure mental workload, introducing Biomni, FrankenROCS identifies drug candidates for Covid, and evolutionary watermarking to trace protein structures with FoldMark

Pablo Lubroth, Pranay Satya, Mikaela Kimpton, and 4 others

Jun 05, 2025

Welcome to Decoding Bio’s BioByte: each week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds, and everything in between. All in one place.

We warmly invite the Decoding Bio community to join us for a reception to close out our third annual AI x Bio Summit on July 22. Enjoy an evening of wine and conversation beneath the lights of the storied New York Stock Exchange trading floor as we reflect, connect, and celebrate. It’s a chance to engage with fellow founders, researchers, and operators shaping the future of biology and technology. We can’t wait to see you there! Please register here.

What we read

Blogs

Can you model biology mechanistically? [Jesse Johnson, Scaling Biotech, May 2025]

AI models have cemented their position as an integral part of the drug discovery process and are now nearly ubiquitous across pharma and biotech. Drawing on broad patterns that emerged from countless implementations of these technologies, author Jesse Johnson proposes three model types that can be used to classify most usage in the biopharma sphere: mechanistic, black box, and knowledge. Mechanistic models seek to replicate real-world mechanisms, largely relying on external data to do so. Black box models are quite different, relying almost entirely on experimental data; they seek to elucidate trends within the provided data without any mechanistic understanding. Knowledge models are generally applications built on LLMs, which extract information from external data with little mechanistic understanding and little experimental data.

An example that demonstrates the differences between the three model types is target ID, the process of finding a targetable biological component, usually a protein, in a given indication. Knowledge models are well positioned to scrape literature and return components with validated associations to the indication. A mechanistic approach could directly model protein-protein or gene-protein interactions, seeking those which, if regulated, would result in the desired therapeutic effect. Black box models could be used to build correlations between disease prevalence and protein abundance from experimental data. All three approaches are useful but limited in their own right.

Of note, the author mentions that although these models are often integrated after initial processing (for example, by aggregating the outputs of the aforementioned approaches to target ID and then ranking potential targets by how often they are predicted in the combined dataset), there is a lack of composite implementation before this stage. Whether or not earlier integration proves valuable, AI is radically transforming biotech and is here to stay, so understanding when to implement it in its different forms is increasingly important.
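To make the late-integration idea concrete, here is a minimal Python sketch that aggregates target nominations from the three model types and ranks them by how many approaches agree. The function name, target names, and counting rule are illustrative assumptions, not something taken from the blog post.

```python
from collections import Counter

def rank_targets(predictions_by_model):
    """Rank candidate targets by how many model types nominated them."""
    counts = Counter()
    for targets in predictions_by_model.values():
        counts.update(set(targets))  # each model counts at most once per target
    # Sort by descending support, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical outputs from the three approaches (placeholder target names).
predictions = {
    "knowledge": ["TARGET_A", "TARGET_B", "TARGET_C"],
    "mechanistic": ["TARGET_B", "TARGET_C"],
    "black_box": ["TARGET_C", "TARGET_D"],
}
ranking = rank_targets(predictions)
print(ranking)
```

A real pipeline would likely weight each model's confidence rather than count votes equally; equal voting is just the simplest version of the ranking step described above.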

Papers

Biomni: A General-Purpose Biomedical AI Agent [Huang et al., Stanford (pre-print), May 2025]

“Can we build a virtual AI biomedical scientist?”

This group of scientists' response to this question is Biomni, a general-purpose biomedical AI agent that can execute a wide range of research tasks across 25 subfields.

The agent has two main components: 1) Biomni-E1, the foundational environment that defines the biomedical action space, built upon 150 biomedical tools, 59 databases, 105 software packages and protocols from the literature, and 2) Biomni-A1, a general-purpose agent that can flexibly execute a broad spectrum of biomedical tasks by using the tools and datasets provided by Biomni-E1.

Upon receiving a user prompt, the agent identifies the most relevant tools, databases and software needed. It then reasons to generate a step-by-step plan expressed as executable code. Because the agent uses code as a universal action interface, it supports a flexible, iterative and adaptive strategy, which in turn enables “responsive, context-aware” behaviour. This allows Biomni-A1 to generalize to previously unseen tasks and domains.
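The "select tools, then run a plan written as code" loop can be illustrated with a toy Python sketch. This is not Biomni's implementation: the tool registry, the keyword-overlap retrieval, and the hard-coded plan string (which an LLM would generate in a real agent) are all assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def reverse_complement(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

# Hypothetical tool registry: name -> (description, callable).
TOOLS = {
    "mean": ("compute average of numeric values", mean),
    "variance": ("compute variance spread of numeric values", variance),
    "reverse_complement": ("reverse complement of a dna sequence", reverse_complement),
}

def select_tools(prompt, tools, top_k=2):
    """Pick the tools whose descriptions share the most words with the prompt."""
    words = set(prompt.lower().split())
    ranked = sorted(
        tools,
        key=lambda name: (-len(words & set(tools[name][0].split())), name),
    )
    return ranked[:top_k]

def run_plan(plan_source, tool_names, data):
    """Execute a plan given as Python source, exposing only the selected tools."""
    namespace = {name: TOOLS[name][1] for name in tool_names}
    namespace["data"] = data
    exec(plan_source, namespace)  # toy only: never exec untrusted code like this
    return namespace["result"]

prompt = "report the average and variance of these temperature values"
chosen = select_tools(prompt, TOOLS)
# In a real agent the plan would be written by the model; here it is hard-coded.
plan = "result = {'mean': mean(data), 'variance': variance(data)}"
report = run_plan(plan, chosen, [36.5, 37.0, 37.5])
print(chosen, report)
```

Expressing the plan as code means any combination of available tools can be composed without a dedicated action type for each one, which is the flexibility the paragraph above describes.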

Biomni was evaluated using Humanity’s Last Exam (HLE) and LAB-Bench (DbQA and SeqQA subtasks). Biomni achieved 17.3% accuracy on HLE, significantly outperforming the base LLM (6.0%), a coding agent (12.8%) and a literature agent (12.2%). On DbQA, which requires structured querying over biological databases, it matched expert human performance, and on SeqQA, which involves reasoning over DNA and protein sequences, it exceeded human performance.

To test Biomni in real-world biomedical workflows, the team invited scientists to apply it directly to their own research questions. In one case study (as shown in the image above), a researcher asked “can we uncover biologically meaningful thermogenic patterns?” based on their input data (459 Excel files containing months-long CGM and temperature data from 30 participants). Biomni generated and executed a 10-step analysis and delivered a report that identified the postprandial thermogenic response as well as the average temperature and its variance. The paper includes many other impressive cases.

A wireless forehead e-tattoo for mental workload estimation [Huh et al., Cell, May 2025]

Although wearable electronics hold much promise in performance and medical applications, the historical standards for neuronal monitoring, EEG and EOG systems, have always been a hassle: think bulky headcaps, a tangle of wires, and gooey gel-based electrodes that take forever to set up and quickly become uncomfortable. Plus, because these setups rely on rigid scaffolds and adhesive gels, any head or facial movement tends to disrupt the electrode–skin contact, introducing motion artifacts. Many wearable EEG setups (dry-electrode headbands, EEG glasses, in-ear sensors) face issues of their own: headbands lose contact during movement, glasses only capture periocular signals and feel stiff, and in-ear designs sample a limited cortical area and remain rigid. Early e-tattoo prototypes were thinner but suffered from high impedance, lacked on-board amplification, or were costly to fabricate. To overcome these gaps, Huh et al. introduce a new material and device that unites low impedance, flexibility, and integrated electronics.

For the electrodes, the authors used graphite-deposited polyurethane (GPU) as a low-cost, stretchable base (≈$20 per sheet, ~50 Ω/sq, ~138% strain) that can be patterned with a simple blade plotter. Because raw GPU still has high skin impedance (>25 kΩ·cm²), they coated only the sensing “pixels” with an adhesive PEDOT:PSS composite (APC). This APC blends conductive PEDOT:PSS (common in organic electronics) with β-cyclodextrin, citric acid, and crosslinked polyvinyl alcohol to create a sticky, stretchable layer that conforms to sweaty, moving skin. By testing PEDOT:PSS loadings from 1.8% to 23.3% by weight, they found 3.6% to be ideal: contact impedance starts around 8 kΩ·cm² at 10 Hz and drops to ~2.5 kΩ·cm² after hours, while stretching over 130% and maintaining ~80 N/m adhesion. In contrast, bare GPU sits above 25 kΩ·cm², and other printed films often require cleanroom or high-temperature processes. They used this material as the foundation of flexible electrodes which they successfully used to capture EEG and EOG signals.

To prove it all actually worked, the group ran people through a dual N-back task (0-back up to 3-back) to vary working memory load. They collected EEG features—things like frontal theta and alpha band power or an engagement index—and EOG markers (blink rate, saccade frequency) in both 2-second “stimulus-level” windows and 40-second “trial-level” epochs. Participants also reported their perceived workload via NASA-TLX, and performance metrics like accuracy and reaction time were recorded. Then a random forest model took those physiological features and successfully classified the N-back level (AUC well above 0.5 for 2-second windows) and even regressed the exact load on 40-second trials (r ~ 0.89). Importantly, those signals matched what a commercial BrainVision system captured—same amplitude ranges and spectral patterns—even while subjects were moving, talking, or turning their heads. This showed you can actually decode mental workload in real time with a flexible, low-profile system that was resilient to motion.
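As a rough illustration of what a band-power feature is, the Python sketch below computes theta and alpha power over a synthetic 2-second window using a naive discrete Fourier transform. The sampling rate, band edges (4 to 8 Hz for theta, 8 to 13 Hz for alpha), and the synthetic signal are assumptions for illustration, not the authors' preprocessing; features like these would then be passed to a classifier such as the random forest mentioned above.

```python
import cmath
import math

def band_power(signal, sample_rate, low_hz, high_hz):
    """Mean normalized DFT power of `signal` over bins in [low_hz, high_hz)."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            powers.append(abs(coeff) ** 2 / n ** 2)
    return sum(powers) / len(powers)

# Synthetic 2-second window (assumed 64 Hz sampling): a strong 6 Hz (theta)
# component plus a weaker 10 Hz (alpha) component.
sample_rate = 64
window = [
    2.0 * math.sin(2 * math.pi * 6 * t / sample_rate)
    + math.sin(2 * math.pi * 10 * t / sample_rate)
    for t in range(2 * sample_rate)
]
theta = band_power(window, sample_rate, 4, 8)
alpha = band_power(window, sample_rate, 8, 13)
print(theta, alpha, theta / alpha)
```

In practice one would use an FFT-based estimator (for example Welch's method) on filtered, artifact-cleaned recordings; the naive transform is used here only to keep the sketch dependency-free.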

In summary, Huh et al. created an inexpensive, ultra-thin, and flexible wearable “e-tattoo” platform with impressive battery life, then demonstrated its utility in a real-world cognitive monitoring scenario. If this can be manufactured at scale, it could unlock a wide range of wearable sensors designed to boost human performance and enable continuous patient monitoring.

FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking [Zhang et al., bioRxiv, June 2025]

In a field as competitive as machine learning for structural biology, where advances in protein structure prediction and design drive progress in therapeutics, enzymes, and biomaterials, intellectual property concerns are growing. For example, users of AlphaFold3 know that access to the model is tightly controlled, which raises a natural question: how does DeepMind identify when a structure originates from their proprietary model? To develop FoldMark, Zhang et al. leverage recent advances in LLM watermarking to systematically embed hidden watermarks into generated protein structures. By “signing” each model output with a robust, traceable binary code, developers can later verify provenance and prevent unauthorized use.

Watermarking work has recently boomed in the text and image generation fields. For instance, SynthID-Text slightly biases word selection to weave an imperceptible signature into generated sentences, which a decoder can later extract. In the image realm, WaDiff and AquaLoRA fine-tune diffusion models to nudge pixel values or latent features just enough to conceal a hidden message, and CNN-based decoders can recover it even after common edits or compression. These examples reveal two guiding principles: the changes must be imperceptible (so outputs remain high-quality) and recoverable (so watermarks survive typical post-processing). But proteins present unique challenges: atomic coordinates require SE(3)-equivariance, and tiny shifts can break folding or function.

FoldMark’s implementation centers on two key ideas: distributional and evolutionary watermarking. Distributional: a k-bit message is encoded by an SE(3)-equivariant encoder into tiny coordinate shifts across all residues. A matching decoder then aggregates per-residue predictions to recover the message even if parts are cropped or noised. Evolutionary: FoldMark uses evolutionary data (MSA profiles, language-model embeddings, pLDDT scores) to determine which parts of the protein structure to modify: highly conserved, high-pLDDT residues remain nearly unchanged to preserve important structure, while more flexible loops take on larger watermark signals. They use LoRA (low-rank adaptation) modules to efficiently fine-tune each model for a given watermark, meaning that each new FoldMark encoder/decoder pair is trained to stamp and look for one specific watermark.
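The two ideas can be caricatured in a few lines of Python. This toy is a sketch under heavy assumptions: it uses one scalar shift per residue instead of 3D coordinates, has no neural networks or SE(3)-equivariance, and lets the decoder read shifts directly (the real decoder must infer them from the structure alone). It only shows why spreading a message redundantly across residues survives noise and cropping, and how conservation scores can scale the perturbation.

```python
import random

def embed(message, conservation, max_shift=0.05):
    """Per-residue signed shifts; conserved residues receive smaller shifts."""
    k = len(message)
    shifts = []
    for i, cons in enumerate(conservation):
        sign = 1.0 if message[i % k] else -1.0
        shifts.append(sign * max_shift * (1.0 - cons))
    return shifts

def decode(observed, k):
    """Recover k bits by summing per-residue evidence for each bit slot.

    `observed` holds (residue_index, shift) pairs so cropping keeps indices.
    """
    totals = [0.0] * k
    for index, shift in observed:
        totals[index % k] += shift
    return [1 if total > 0 else 0 for total in totals]

rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical conservation scores in [0, 0.9); higher means more conserved.
conservation = [rng.random() * 0.9 for _ in range(400)]
shifts = embed(message, conservation)

# Simulate post-processing: add Gaussian noise, then crop away half the residues.
noisy = [(i, s + rng.gauss(0.0, 0.01)) for i, s in enumerate(shifts)]
cropped = noisy[:200]
recovered = decode(cropped, len(message))
print(recovered == message)
```

Because every bit is carried by many residues, the decoder effectively takes a weighted vote, so losing half the structure or adding small noise leaves enough evidence to recover each bit.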

FoldMark offers a practical way to watermark protein designs, giving researchers and companies a tool to track provenance and enforce licensing. Benchmarks show that watermarked structures retain over 95% of their original binding affinities and maintain low RMSD compared to unwatermarked counterparts, while decoders recover more than 90% of bits even after realistic edits like molecular dynamics relaxation or partial cropping. As generative models become central to drug discovery and enzyme engineering, FoldMark provides a clear path to protect intellectual property, ensure accountability, and build trust in computational protein design.

Exploration of structure-activity relationships for the SARS-CoV-2 macrodomain from shape-based fragment linking and active learning [Correy et al., Science Advances, May 2025]

Some key bottlenecks in the speed of drug development campaigns are the intensive compound screening and optimization steps. While there has been significant progress in de novo protein engineering and molecular design tools, other medicinal chemistry approaches like fragment linking have also shown great promise in guiding the design of improved drug candidates from initial weak hits. Briefly, fragment linking assembles small chemical fragments with experimentally validated activity into larger molecules with more ‘drug-like’ activity. These larger molecules are then queried against massive medicinal chemistry datasets like the Enamine REAL database to look for easily synthesizable analogs. However, these queries require searches of vast chemical spaces, and their outputs are usually not optimized for biologically relevant properties (like permeability). In this work, a team of scientists from UCSF and Relay Therapeutics describe FrankenROCS, an algorithm that more efficiently explores vast chemical space databases, and demonstrate it by identifying potent drug candidates against the SARS-CoV-2 NSP3 macrodomain (Mac1).

A key advantage of FrankenROCS is its ability to screen in the reagent space, rather than the product space. Unlike conventional screening methods that must loop through virtual molecules, FrankenROCS uses Thompson sampling (TS) to strategically evaluate candidate molecules based on their 3D shape and pharmacophore profile. Crucially, integrating Thompson sampling allows for the generation of lead molecules while traversing much less chemical search space. Such reductions in run time allowed the team to screen the whole Enamine REAL database (>22 billion virtual molecules). This approach successfully delivered multiple novel, neutral Mac1 inhibitors with diverse binding poses confirmed by X-ray crystallography, ultimately yielding low-micromolar compounds with favorable characteristics. The team identified compound AVI-313 as their most promising candidate, and reasoned that the replacement of a carboxylic acid with a hydroxyl group on the parent fragment was a major driver of optimized binding affinity.
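To show what searching in reagent space with Thompson sampling can look like, here is a toy Python sketch for a two-component library. It is not the FrankenROCS code: the additive scoring function, the Gaussian beliefs, the library size, and all names are assumptions for illustration, whereas the real method scores products by 3D shape and pharmacophore overlap.

```python
import random
import statistics

def thompson_search(score_fn, n_a, n_b, n_iter, rng):
    """Thompson sampling over reagents instead of enumerating every product."""
    # Per-reagent belief: [sum of observed scores, number of observations].
    stats_a = [[0.0, 0] for _ in range(n_a)]
    stats_b = [[0.0, 0] for _ in range(n_b)]
    evaluated = {}

    def draw(stat):
        total, count = stat
        mean = total / count if count else 0.0
        # Uncertainty shrinks as a reagent accumulates observations.
        return rng.gauss(mean, 1.0 / (count + 1) ** 0.5)

    for _ in range(n_iter):
        # Sample a plausible value for every reagent, then combine the best two.
        a = max(range(n_a), key=lambda i: draw(stats_a[i]))
        b = max(range(n_b), key=lambda j: draw(stats_b[j]))
        if (a, b) not in evaluated:
            evaluated[(a, b)] = score_fn(a, b)
        score = evaluated[(a, b)]
        for stat in (stats_a[a], stats_b[b]):
            stat[0] += score
            stat[1] += 1
    return evaluated

rng = random.Random(7)
# Hypothetical hidden reagent qualities; a product's score is their sum.
quality_a = [rng.uniform(-1, 1) for _ in range(30)]
quality_b = [rng.uniform(-1, 1) for _ in range(30)]

def score(a, b):
    return quality_a[a] + quality_b[b]

evaluated = thompson_search(score, 30, 30, 200, rng)
best_pair, best_score = max(evaluated.items(), key=lambda kv: kv[1])
all_scores = [score(a, b) for a in range(30) for b in range(30)]
print(len(evaluated), "of", len(all_scores), "products scored; best:", best_pair)
```

The search keeps one belief per reagent (60 here) rather than one per product (900 here), which is why the approach scales to libraries where enumerating every product is impractical.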

Methods like FrankenROCS and Thompson sampling show promise in enabling otherwise prohibitively expensive compound screens and efficiently guiding downstream optimization efforts. Such pipelines are a necessary counterpart to de novo machine learning drug design efforts, allowing medicinal chemists to pick molecules that have validated reaction mechanisms and synthesizability. However, the authors concede that FrankenROCS does face limitations. While the fragment-linking approach did generate diverse poses, the overall molecular diversity of generated compounds was low and some molecules were found to be false positives for activity. The team plans to continue using their method to explore other methods of inhibition for the SARS-CoV-2 virus and to release their final assay results as a public dataset.

Notable deals

Led by Khosla Ventures, Vivodyne raises $40M Series A to remove the need for animal testing during preclinical drug development. Using a fully automated robotics and AI platform, Vivodyne grows thousands of lab-cultured human tissues daily, producing multi-omic data that potentially more accurately recapitulates true human responses. By scaling this human-based workflow in a new 23,000 sq ft facility in South San Francisco, Vivodyne can screen drug candidates directly on complex human tissues (including immune and disease states), reducing both animal cruelty concerns and the risk of late-stage failures.
Sanofi acquires Blueprint Medicines for $9.5B. The French pharma company is known for successful partnerships, including the one with Regeneron that led to its blockbuster Dupixent. Now, this recent acquisition brings the approved rare immunology disease medicine avapritinib as well as a pipeline of earlier immunology candidates.
Merck made a $3B-plus offer for MoonLake. The deal comes at a time when investor sentiment is souring on the New Jersey-based biotech (shares have fallen 39% compared to 11% for the S&P 500 pharma index over the past year). MoonLake’s lead candidate is progressing through phase III and treats chronic acne and psoriatic arthritis.
Antheia closed a $56 million Series C round led by Global Health Investment Corporation and EDBI to scale its platform that engineers yeast to convert sugars into pharmaceutical ingredients in weeks rather than years. The company’s first commercial shipment of thebaine (used in overdose-reversal medications) proved the technology can be scaled, and Antheia intends to launch more products each year to address drugs prone to shortages. The funding will expand domestic manufacturing in the US alongside collaborations with the BioMaP Consortium, while strategic partnerships in Singapore and Asia will support further global adoption.

What we liked on social channels

Field Trip

Did we miss anything? Would you like to contribute to Decoding Bio by writing a guest post? Drop us a note here or chat with us on Twitter: @decodingbio.
