Adaptyv Bio - Designer Spotlight: Can a language model reason about protein design?

TL;DR
Michael Hla built Pro-1, a language-based reasoning model able not just to design proteins, but also reason about how it’s designing them.
We offered him free binding affinity and thermostability testing in the Adaptyv Foundry for 19 of his FGF-1 (Fibroblast growth factor 1) sequences optimized by Pro-1.
The results are interesting: Pro-1 managed to improve the melting temperature of 3 designs while maintaining their binding affinity.
One variant (K116E) reached a similar melting temperature to the most optimized design from literature (Q40P, S47I, H93G, K112N)!
We will run more benchmarks or test cool protein design hypotheses you have - reach out to us!

Pro-1: the protein reasoning model

In March, Michael Hla took the protein design community by storm when he released Pro-1 - the first protein reasoning language model. His proposal is simple: let a language model distil biochemical intuition so it can do all the protein optimization you want. What this means more precisely: use a complex training scheme, with recent innovations in large language model reasoning used by AI labs like OpenAI and DeepSeek, to have a model that both proposes mutations and explains why it chose them.

In his blog post, Michael makes several good points for why protein design should be delegated to language models. The ones we found most interesting are interpretability (the model argues for each mutation it makes, pointing to relevant or hallucinated biochemical motives and paper references) and flexibility (he mentions Pro-1 can be prompted with sequences, PDB structures, and even experimental results).

How to train your protein thinker

We found his training scheme highly original. Michael combines biochemical intuition with synthetic data generation from specialized protein language models, a training framework from reinforcement learning, and a physics-based representation of protein stability. We will briefly describe it, but you should check out Michael’s blog post and thread for more info!

Fine-tuning on synthetic reasoning traces
Language models are often trained on large text corpora in two ways: either words are masked from the input and the model is tasked to predict them (masked language models), or the model has to correctly predict the next token (autoregressive language models). This is the pre-training stage.
Pro-1 uses the pre-trained autoregressive Llama-3.1-8B-Instruct and Llama-3.3-70B-Instruct, adapting them to the protein design task in a process called fine-tuning. To make the models reason about their designs, Michael generated synthetic “reasoning traces”: initial proteins from a collection of enzyme sequences (BRENDA database) were “perturbed” with the ESM-3 protein language model. He then used a different language model to generate text explanations for how to get from the perturbed proteins back to the originals. This is a highly original approach! Michael points out that “this method needs to be tested more but has substantial implications if it scales well, especially since bio data is exceedingly scarce”.
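To make the perturb-then-explain idea concrete, here is a minimal sketch of how one such training pair could be assembled. It is not Michael’s pipeline: random substitutions stand in for ESM-3 sampling, the toy sequence is made up, and the function names (`perturb`, `mutations_between`, `make_training_pair`) and the prompt wording are our own assumptions. The one fixed convention is the mutation notation used throughout this post (for example K116E: original residue, 1-based position, new residue).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def perturb(sequence, n_mutations, rng):
    """Substitute n_mutations distinct positions at random.

    Assumption: a crude stand-in for sampling perturbations from ESM-3.
    """
    residues = list(sequence)
    for pos in rng.sample(range(len(residues)), n_mutations):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def mutations_between(source, target):
    """List the point mutations that turn source into target (1-based)."""
    if len(source) != len(target):
        raise ValueError("sequences must have equal length")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(source, target), start=1)
        if a != b
    ]


def make_training_pair(original, n_mutations=3, seed=0):
    """Build one hypothetical example: perturbed input, mutations that undo it."""
    rng = random.Random(seed)
    perturbed = perturb(original, n_mutations, rng)
    return {
        "prompt": f"Improve this sequence: {perturbed}",
        "target_mutations": mutations_between(perturbed, original),
    }


# Toy sequence, not a real enzyme.
pair = make_training_pair("MKTAYIAKQR", n_mutations=3, seed=1)
print(pair["prompt"])
print(pair["target_mutations"])
```

In the real scheme a second language model would write the explanation for each of the target mutations; that step is left out here.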
Reinforcement learning with the Rosetta energy function
Next, Pro-1 uses group relative policy optimization (GRPO), a reinforcement learning algorithm now well known because of DeepSeek’s R1. In summary, Michael takes the Rosetta energy function, which accounts for several physical interactions and is well correlated with protein stability, to score proteins designed by Pro-1 and then folded with ESMFold. The final value is integrated into GRPO and fed back into the model to improve it. Pro-1 should now output more stable proteins and “learn heuristics about the physical world and the effects of specific mutations”.
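The “group relative” part of GRPO can be sketched in a few lines: several candidate designs are sampled for the same prompt, each one is scored, and every score is compared with the group’s mean and spread. The snippet below is only an illustration under our own assumptions: the energies are made-up numbers, the reward is taken to be the negative energy (lower energy meaning more stable), and the population standard deviation is one possible choice of normalizer. Pro-1’s actual reward shaping may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled designs."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical energy scores for four designs sampled from one prompt.
energies = [-310.0, -325.0, -298.0, -325.0]
rewards = [-e for e in energies]  # assumption: reward = negative energy
advantages = group_relative_advantages(rewards)

for energy, adv in zip(energies, advantages):
    print(f"energy={energy:8.1f}  advantage={adv:+.2f}")
```

Designs that score better than their group get a positive advantage and are reinforced, while worse-than-average ones are pushed down, so no separate value network is needed to provide a baseline.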
Creativity rewards
Michael mentions the final model got “somewhat repetitive and bland, suggesting the same types of point mutations (polar aa -> nonpolar aa)”. He then added a judge model to the training scheme, which scores mutations based on how “creative” they are. It boosted the performance on his benchmark from 43% to 47%!

We were all impressed by Pro-1, so we wanted to put it to the ultimate test: lab validation. We gave Michael some free binding affinity and thermostability assays for any designs he wanted. He chose to optimize the fibroblast growth factor 1 (FGF-1).

Why FGF-1?

As a growth factor, FGF-1 is one of the most versatile proteins. It regulates the fate of bone marrow cells and may promote bone repair, the development of lung epithelial cells with a therapeutic effect on pulmonary fibrosis, and is highly expressed in inflammatory cells. Its role in type 2 diabetes is becoming better understood, with experiments showing FGF-1 injections reduced the levels of glucose and increased the sensitivity to insulin in mice.

It binds to plenty of targets, including the fibroblast growth factor receptor 1 (FGFR1) and FGFR2. FGFR1 aberrations occur in several types of cancer and there are already FGFR1-inhibiting drugs like Pemigatinib for bile duct cancer treatment. FGF-like binders to FGFR1, especially when conjugated with cytotoxic drugs, could be a potent cancer therapeutic.

Michael mentioned another interesting fact about FGF-1: it has a pretty low denaturation temperature. Maintaining its binding while also increasing the melting temperature is a worthwhile task for Pro-1.

How we are measuring melting temperatures and binding affinities

We ran our standard automated assay for affinity characterization and thermostability. Proteins were expressed with a cell-free system, followed by affinity characterization via bio-layer interferometry (BLI) with the FGFR1 target, the FGF-1 wild-type control, and the designs Michael uploaded on our Foundry Portal. BLI measures the binder association and dissociation kinetics via the interference pattern of light reflected from a sensor surface. With these measurements and our in-house post-processing and curve-fitting software, we can calculate the binding affinity ($K_D$) of a protein to its target.
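As a rough illustration of how kinetics turn into an affinity, the sketch below assumes the textbook 1:1 binding model, in which $K_D = k_{off} / k_{on}$. This is not our in-house fitting software, and the rate constants, the function names and the `r_max` parameter are hypothetical values chosen for the example.

```python
import math


def dissociation_constant(k_on, k_off):
    """K_D in M from k_on (1/(M*s)) and k_off (1/s), 1:1 model."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Sensor response at time t (s) during association, 1:1 model.

    conc is the analyte concentration in M; r_max is the response at
    saturation (hypothetical units).
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * conc / (conc + k_off / k_on)
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Hypothetical rate constants, not measured values.
k_on, k_off = 1e5, 1e-3
k_d = dissociation_constant(k_on, k_off)
print(f"K_D = {k_d * 1e9:.1f} nM")

# At a concentration equal to K_D, the plateau is half of r_max.
print(association_response(t=1e5, conc=k_d, k_on=k_on, k_off=k_off, r_max=1.0))
```

A lower $K_D$ means tighter binding: it can come from faster association, slower dissociation, or both, which is why the kinetic curves are worth keeping alongside the single affinity number.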

To measure the melting temperature of Michael’s designs, we used our newly developed thermostability assay. The melting temperature (or $T_m$) is the temperature at which 50% of a protein is in its unfolded state. This is around 49 °C for the wild-type FGF-1. We are using an automated nanoDSF (nano differential scanning fluorimetry) protocol: proteins are heated up and we measure the fluorescence shift of the tryptophan and tyrosine residues as they become more exposed from the protein’s core. We normalize these values and quantify the melting temperatures in our post-processing pipeline.
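The normalize-then-quantify step can be sketched as follows. This is a simplified illustration and not our post-processing pipeline: it assumes a clean two-state curve whose signal rises as the protein unfolds, min-max normalization, and a $T_m$ read off at the 50% crossing by linear interpolation. The curve is synthetic, generated with a midpoint we picked ourselves.

```python
import math


def normalize(signal):
    """Min-max scale a melting curve to the range 0..1."""
    lo, hi = min(signal), max(signal)
    return [(s - lo) / (hi - lo) for s in signal]


def melting_temperature(temps, signal):
    """Temperature at which the normalized signal first crosses 0.5.

    Assumes the signal increases with unfolding.
    """
    frac = normalize(signal)
    for i in range(len(temps) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never crosses 50% unfolded")


# Synthetic two-state curve, 25 to 95 degrees C in 0.5 degree steps.
true_tm = 50.8  # midpoint chosen for the example
temps = [25.0 + 0.5 * i for i in range(141)]
signal = [0.6 + 0.3 / (1.0 + math.exp((true_tm - t) / 2.0)) for t in temps]

print(f"estimated Tm = {melting_temperature(temps, signal):.1f} C")
```

Real curves are noisier and often have sloping baselines, so production pipelines typically fit a model or use the first derivative of the curve instead of a bare threshold crossing.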

Pro-1 yields more stable binders

Most of the variants tested were expressed. Of these, only 6 maintained binding to their target in the same range as the wild-type. In the figure above, we have highlighted the 3 variants that also bind to FGFR1 and have a $T_m$ higher than that of the measured control (the wild-type FGF-1, at 50.8 °C).

Even more impressive is that a single-point mutant (K116E) reached a melting temperature improvement of 24 °C over the wild-type, and that Pro-1 suggested this variant at all. Considering it was trained on synthetic “perturbed” data, reasoning traces, and an in silico objective (the Rosetta energy function), these are spectacular results! Most other Rosetta-based thermostability optimization studies also reach an improvement of 20 °C, yet none of them has a model able to explain in writing why it chose those mutations.

Michael showed an example of Pro-1’s reasoning trace for the v37 variant (the 7-mutant in our bar plot). We found it interesting that it knows FGF-1’s binding partners and offers some plausible biochemical interpretations (e.g., mutations that reduce flexibility should increase stability, targeting hydrophobic patches to reduce the chance of aggregation). However, do not attempt to fact-check its references: we tried that and could not find any “Wu et al., 2017” that suggested the K127E mutation could increase stability of the FGF2 heterodimer, nor any “Kim et al., 2015”. But this should not diminish Pro-1’s success. Who knows, maybe Pro-2 will align its reasoning with verified references. If OpenAI’s Deep Research can do it, so could Pro-1.

Resources and links

You can find all thermostability data here and the binding affinity data here.
Try out Pro-1 here
Say hi to Michael Hla: Website, X, LinkedIn
Have some novel proteins you want to test in the lab? Come talk to us. We’d like to run many more of these protein designer spotlights, so if you have a cool new hypothesis or model to test, we’d love to hear from you!
