Oct 1, 2025

What GB1 Taught Us About Smarter Protein Optimization

Insights into Risk, Cost, and Efficiency for Better Protein Binder Design

At Adaptyv Bio, we believe that innovation in protein engineering starts with better experimental data. In this post, we benchmark Bayesian optimization (BO) techniques to shed light on how labs can optimize proteins while making the best use of wet lab data. Here’s what we learned.

Protein optimization is a high-stakes puzzle. Every experiment costs time and money, and the search for the best protein binder is like navigating an uneven landscape full of peaks (good binders) and valleys (poor binders). Our task is to find the highest peak while testing as few candidates as possible—a problem that Bayesian optimization is uniquely suited to solve.

What is Bayesian Optimization, and Why Use It?

Bayesian optimization is a smart approach to making decisions when every “test” is expensive. Instead of randomly testing sequences, BO uses machine learning models to predict which sequence to try next, balancing two key strategies:

1. Exploration: Testing sequences we know little about to uncover new peaks.

2. Exploitation: Refining sequences near known peaks to find the highest point.

Think of it as a well-informed treasure hunt: each test builds on what we’ve learned, guiding us closer to the gold.
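The exploration/exploitation balance can be illustrated with a UCB-style score, predicted mean plus kappa times predicted uncertainty. This toy sketch uses made-up candidate numbers; the sequence names and values are hypothetical, not from the benchmark.

```python
# Toy illustration of the exploration/exploitation trade-off.
# Each candidate: (predicted affinity, model uncertainty). Values are made up.
candidates = {
    "seq_A": (0.90, 0.05),  # well-characterized, near a known peak
    "seq_B": (0.60, 0.40),  # poorly characterized, high uncertainty
}

def ucb(mean, std, kappa):
    """Upper Confidence Bound score: reward predicted value and uncertainty."""
    return mean + kappa * std

# A small kappa exploits the known good sequence...
exploit = max(candidates, key=lambda s: ucb(*candidates[s], kappa=0.5))
# ...while a large kappa explores the uncertain one.
explore = max(candidates, key=lambda s: ucb(*candidates[s], kappa=2.0))
print(exploit, explore)  # → seq_A seq_B
```

Tuning kappa is how a BO loop shifts between refining known peaks and probing unknown regions.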

How Did We Test It?

We used the GB1 protein binding affinity dataset as our testing ground. GB1 is a small protein domain known for binding to immunoglobulins. The dataset maps how small mutations at specific sites affect binding strength, offering a clear “landscape” to optimize.

We tested over 1,000 combinations of BO settings—different machine learning models, embedding techniques, and batch sizes—to find the most efficient and reliable way to navigate the landscape.

What Did We Learn?

1. Gaussian Processes Are the Gold Standard

Gaussian processes (GPs) consistently outperformed other machine learning models in predicting binding affinity. They provide more reliable uncertainty estimates, which are crucial for effective exploration.

• Best combinations used either one-hot encodings or ESM2 embeddings with GPs.

• Exploratory acquisition functions, like Expected Improvement (EI) and Upper Confidence Bound (UCB), worked particularly well with GPs, striking a balance between exploring new areas and refining known peaks.
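To make the GP + EI recipe concrete, here is a minimal, self-contained sketch (not the benchmark's actual code): a NumPy Gaussian process with an RBF kernel over one-hot encodings, scored with Expected Improvement on a made-up 4-mer "affinity" landscape. The kernel length scale, noise level, and toy scoring function are all illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
AA = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a peptide sequence into a binary one-hot vector."""
    v = np.zeros(len(seq) * len(AA))
    for i, aa in enumerate(seq):
        v[i * len(AA) + AA.index(aa)] = 1.0
    return v

def rbf(X1, X2, ls=2.0):
    """Squared-exponential kernel; length scale is an assumption."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

def gp_posterior(Xtr, ytr, Xte, noise=1e-2):
    """Exact GP posterior mean and standard deviation at test points."""
    Kinv = np.linalg.inv(rbf(Xtr, Xtr) + noise * np.eye(len(Xtr)))
    Ks = rbf(Xtr, Xte)
    mu = Ks.T @ Kinv @ ytr
    var = np.diag(rbf(Xte, Xte) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for maximization: expected gain over the best value seen so far."""
    z = (mu - best) / sigma
    pdf = np.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf

# Toy landscape over random 4-mers (a hypothetical stand-in for GB1).
pool = ["".join(rng.choice(list(AA), 4)) for _ in range(200)]
truth = {s: sum(s.count(a) for a in "VWY") + rng.normal(0, 0.1) for s in pool}

tested = pool[:8]                              # initial random batch
for _ in range(5):                             # five BO rounds
    Xtr = np.stack([one_hot(s) for s in tested])
    ytr = np.array([truth[s] for s in tested])
    rest = [s for s in pool if s not in tested]
    Xte = np.stack([one_hot(s) for s in rest])
    mu, sigma = gp_posterior(Xtr, ytr, Xte)
    ei = expected_improvement(mu, sigma, ytr.max())
    tested.append(rest[int(np.argmax(ei))])    # test the top-EI candidate

print(max(truth[s] for s in tested))
```

The GP's calibrated sigma is what makes EI meaningful: candidates can score highly either because the mean prediction is good or because the model is genuinely uncertain about them.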

2. Batch Size Matters

Smaller batch sizes (12 sequences tested per round) were more cost-efficient and less risky. They:

• Reached high-affinity binders faster.

• Reduced the chance of plateauing in suboptimal regions.

Larger batch sizes, while faster in theory, often required more iterations to compensate for early missteps.
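A batched round can be sketched as simply taking the top-k acquisition scores per round. Real batched BO typically also diversifies the batch, so this is a simplification; the scores below are made up.

```python
def select_batch(scores, k):
    """Return indices of the k highest-scoring candidates for one wet-lab round."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical acquisition scores for six candidate sequences.
scores = [0.1, 0.8, 0.3, 0.95, 0.2, 0.7]
small_batch = select_batch(scores, 2)  # tighter feedback loop, less spend per round
large_batch = select_batch(scores, 4)  # more sequences per round, slower learning
print(small_batch, large_batch)        # → [3, 1] [3, 1, 5, 2]
```

With a smaller batch, the model retrains on fresh measurements before committing budget to its next picks, which is why small batches recovered from early missteps more cheaply.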

3. Risk Awareness is Key

In real-world applications, we can’t afford to gamble on optimization methods that might fail under different conditions. By applying risk metrics like Expected Shortfall (a measure of worst-case performance), we identified methods that consistently delivered good results, even in the least favorable scenarios.
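Expected Shortfall (also called CVaR) can be computed as the mean outcome over the worst alpha-fraction of repeated optimization runs. This sketch uses hypothetical final-affinity scores from ten runs; where higher is better, one bad run drags the shortfall down even if the average looks fine.

```python
def expected_shortfall(outcomes, alpha=0.1):
    """Mean of the worst alpha-fraction of outcomes (here, higher is better)."""
    n_worst = max(1, int(len(outcomes) * alpha))
    worst = sorted(outcomes)[:n_worst]
    return sum(worst) / len(worst)

# Hypothetical best-affinity scores from ten repeated BO campaigns.
run_scores = [0.91, 0.88, 0.95, 0.40, 0.90, 0.87, 0.93, 0.89, 0.92, 0.86]
print(expected_shortfall(run_scores, alpha=0.2))  # mean of the two worst runs
```

Ranking methods by this tail average, rather than by their mean performance, surfaces the configurations that hold up even in unlucky campaigns.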

Why Does This Matter?

For labs with limited budgets and time, these insights can significantly improve protein optimization campaigns:

• Efficient Testing: Use GPs with exploratory acquisition functions to quickly identify high-affinity binders.

• Minimize Risk: Choose models and strategies that perform reliably, even under uncertainty.

• Save Resources: Adopt smaller batch sizes to get the most out of your testing budget.

Takeaways for Protein Engineers

Protein optimization doesn’t have to feel like throwing darts in the dark. By leveraging Bayesian optimization, labs can focus their efforts, reduce costs, and manage risks effectively. Whether you’re designing binders, enzymes, or other functional proteins, these strategies can help you achieve better results faster.

Ready to Optimize Smarter?

At Adaptyv Bio, we’re here to help you take your designs to the next level. Whether you’re exploring new landscapes or refining your binders, our platform provides the data you need to succeed.