Feb 1, 2025
Protein inpainting to design better binding interfaces with ProtFill
ProtFill is a new sequence & structure co-design model that allows you to re-design binding interfaces of both antibodies and proteins.
Most de novo design models are built mainly for antibodies. We wanted to tackle a broader problem: given the sequence and structure of a binder and a target, can we optimize the design by redesigning both sequence and structure simultaneously?
ProtFill is our model for protein inpainting. It is trained on antibody-antigen datasets and is meant to recover antibody binding loops.
This problem can be nicely represented by diffusion, first used in image generation and now applied to protein design.
Today we introduce ProtFill, a novel method for protein design. While existing alternatives focus exclusively on designing antibodies and nanobodies, ProtFill outperforms them on that task and extends to optimising binding in any other protein.

Architecture
The model consists of one shared encoder and two decoders (one for sequence outputs and one for structure outputs). Both the encoder and the decoders consist of three stacked message-passing layers. The encoder generates a structure embedding, which is concatenated with a simple sequence embedding (the output of a single embedding layer) and passed to the decoders.
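To make the data flow concrete, here is a minimal, dependency-free sketch of that layout: three message-passing layers in a shared encoder, a single embedding lookup for the sequence, concatenation, and two three-layer decoders. It is not the ProtFill implementation. The class name ToyCoDesignModel, the layer sizes, the mean-aggregation update, the mask token and the random weights are all assumptions made for illustration, and the toy layers carry scalar features only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = len(AMINO_ACIDS)  # hypothetical index for a "masked residue" token


def linear(x, w):
    """Multiply a feature vector x (length n) by a weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def rand_matrix(rng, n_in, n_out):
    """Small random weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def message_passing(h, neighbors, w):
    """Toy layer: each node averages its neighbours, concatenates that with its own features, and mixes."""
    out = []
    for i, h_i in enumerate(h):
        nbrs = [h[j] for j in neighbors[i]] or [h_i]
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out.append([math.tanh(v) for v in linear(h_i + mean, w)])
    return out


class ToyCoDesignModel:
    """Shared encoder plus separate sequence and structure decoders (illustrative sizes)."""

    def __init__(self, feat_dim=4, hidden=8, seq_emb=4, seed=0):
        rng = random.Random(seed)
        dec_in = hidden + seq_emb
        # Each layer sees [own features, neighbour mean], hence the factor of 2.
        self.enc = [rand_matrix(rng, 2 * d, hidden) for d in (feat_dim, hidden, hidden)]
        self.emb = rand_matrix(rng, len(AMINO_ACIDS) + 1, seq_emb)  # single embedding layer
        self.seq_dec = [rand_matrix(rng, 2 * d, hidden) for d in (dec_in, hidden, hidden)]
        self.str_dec = [rand_matrix(rng, 2 * d, hidden) for d in (dec_in, hidden, hidden)]
        self.seq_head = rand_matrix(rng, hidden, len(AMINO_ACIDS))
        self.str_head = rand_matrix(rng, hidden, 3)

    def forward(self, struct_feats, seq_tokens, neighbors):
        h = struct_feats
        for w in self.enc:  # three stacked encoder layers
            h = message_passing(h, neighbors, w)
        # Concatenate the structure embedding with the sequence embedding per residue.
        z = [h_i + self.emb[t] for h_i, t in zip(h, seq_tokens)]
        s, x = z, z
        for w in self.seq_dec:  # three stacked sequence-decoder layers
            s = message_passing(s, neighbors, w)
        for w in self.str_dec:  # three stacked structure-decoder layers
            x = message_passing(x, neighbors, w)
        logits = [linear(s_i, self.seq_head) for s_i in s]  # 20 scores per residue
        coords = [linear(x_i, self.str_head) for x_i in x]  # one xyz triple per residue
        return logits, coords


# Five residues on a fully connected graph, two of them masked.
n = 5
feats = [[0.1 * (i + k) for k in range(4)] for i in range(n)]
tokens = [0, 3, MASK, MASK, 7]
neighbors = [[j for j in range(n) if j != i] for i in range(n)]
logits, coords = ToyCoDesignModel().forward(feats, tokens, neighbors)
```

Both decoders read the same concatenated embedding, which is what lets sequence and structure be predicted jointly from one encoding of the complex.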
ProtFill builds upon the foundation laid by previous models but takes it a step further. It is based on a novel neural network architecture called GVPe that passes a richer representation of the protein data through the model. ProtFill also boosts its performance through recycling, that is, gradually refining the model's predictions.
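As a rough illustration of recycling, the sketch below assumes it means feeding the previous prediction back into the model as its next input; the exact mechanism used in ProtFill is not described here. Both recycle and toy_predict are hypothetical names, and toy_predict is a stand-in that simply moves coordinates halfway to a fixed target on each pass.

```python
def recycle(predict, sequence, coords, n_cycles=3):
    """Run `predict` repeatedly, feeding each prediction back in as the next input."""
    for _ in range(n_cycles):
        sequence, coords = predict(sequence, coords)
    return sequence, coords


def toy_predict(sequence, coords, target=(1.0, 2.0, 3.0)):
    """Hypothetical stand-in for the model: moves every residue halfway to a fixed target."""
    refined = [[c + 0.5 * (t - c) for c, t in zip(xyz, target)] for xyz in coords]
    return sequence, refined


start = [[0.0, 0.0, 0.0]]
_, one_pass = recycle(toy_predict, "G", start, n_cycles=1)
_, three_passes = recycle(toy_predict, "G", start, n_cycles=3)
```

In this toy, each extra pass halves the remaining distance to the target, which is the intuition behind refining a prediction gradually rather than in a single shot.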

Diffusion has been shown to generate groundbreaking results for tasks like image generation, and in protein design it opens up the potential for generating molecules with fine-grained properties.
We have two variants: ProtFill, a one-shot model, and ProtFillDiff, a diffusion model that lets you condition the generation and control it more finely.
In training ProtFill, we've adopted a method known as diffusion, which involves carefully adding and removing noise from the data to teach the model about different protein structures and sequences.
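The "adding noise" half of that idea can be sketched as follows. This assumes a standard Gaussian (DDPM-style) forward process applied to the coordinates of the masked residues only. The schedule values, the function names and the choice to leave unmasked residues untouched are assumptions for illustration, not details of the ProtFill training code. Sequences are discrete and would need a categorical variant, which is not shown.

```python
import math
import random


def make_alpha_bars(n_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear noise schedule (assumed values)."""
    alpha_bars, prod = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(coords, mask, t, alpha_bars, rng):
    """Noise the masked residues to step t: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    a = alpha_bars[t]
    noisy, eps_all = [], []
    for xyz, masked in zip(coords, mask):
        if not masked:
            # Context residues stay fixed so the model can condition on them.
            noisy.append(list(xyz))
            eps_all.append([0.0, 0.0, 0.0])
            continue
        eps = [rng.gauss(0.0, 1.0) for _ in xyz]
        noisy.append([math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(xyz, eps)])
        eps_all.append(eps)
    return noisy, eps_all


alpha_bars = make_alpha_bars()
coords = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
mask = [False, True, False]
noisy, eps = add_noise(coords, mask, 50, alpha_bars, random.Random(0))
```

During training, a denoising model would be asked to predict eps (or the clean coordinates) from noisy, and generation then runs that denoiser step by step starting from pure noise.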

What can you do with ProtFill
We designed ProtFill to redesign antibody-antigen interfaces: the model is trained to generate new CDRs on demand.
We show good sequence prediction (amino acid recovery).
We show good structure prediction (RMSD).
We compare against two co-design baselines, both retrained on the same dataset:
MEAN: co-design model, one-shot
DiffAb: co-design model, diffusion
We see markedly improved sequence recovery, with comparable structure predictions that stay close to the ground truth.
ProtFill is well suited to antibody optimization.
You start from a parental antibody, e.g.
from the PDB, or
from immunization, phage display, etc., with a structure generated by an antibody structure model.
If you provide the model with a weakly binding antibody and an antigen epitope,
it generates new CDRs.
ProtFill beats the closest alternative on amino acid identity prediction by 13-25 percentage points and achieves comparable results in the recovery of atom coordinates.
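For readers unfamiliar with the two metrics, here is a minimal sketch of how amino acid recovery and RMSD over a redesigned region can be computed. The function names are ours. The RMSD assumes the predicted and reference coordinates are already in the same frame (no superposition step) and uses one point per residue, which may differ from the evaluation protocol behind the numbers above.

```python
import math


def amino_acid_recovery(pred_seq, true_seq, mask):
    """Fraction of masked positions where the predicted residue matches the reference."""
    idx = [i for i, masked in enumerate(mask) if masked]
    return sum(pred_seq[i] == true_seq[i] for i in idx) / len(idx)


def rmsd(pred_coords, true_coords, mask):
    """Root-mean-square deviation over masked positions, one xyz point per residue."""
    idx = [i for i, masked in enumerate(mask) if masked]
    sq = sum(
        sum((p - q) ** 2 for p, q in zip(pred_coords[i], true_coords[i]))
        for i in idx
    )
    return math.sqrt(sq / len(idx))


mask = [False, False, True, True, True, False]
recovery = amino_acid_recovery("ACDEFG", "ACDKFG", mask)  # 2 of 3 masked residues match

true_xyz = [[float(i), 0.0, 0.0] for i in range(6)]
pred_xyz = [[x + 1.0, y, z] for x, y, z in true_xyz]  # every residue shifted by 1 along x
deviation = rmsd(pred_xyz, true_xyz, mask)
```

Recovery is a fraction, so a gap between two models is naturally reported in percentage points: 0.50 versus 0.35 is a difference of 15 percentage points.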

Here's an example: redesigning different targets from the PDB.

We generate two datasets using the proteinflow package: a diverse dataset (using all of PDB as the source) and an antibody/nanobody dataset (using SAbDab). For the experiments on the antibody dataset, the masked regions are CDRs, and one CDR is masked at a time during training. For the diverse protein dataset, we mask random areas on the complex interface.
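The two masking strategies can be sketched in a few lines. This does not use the proteinflow API. The function names, the CDR index ranges, the contiguous-window choice for interface masking and the max_len default are all assumptions made for illustration.

```python
import random


def mask_one_cdr(cdr_ranges, length, rng):
    """Pick a single CDR at random and mask only its residues."""
    name = rng.choice(sorted(cdr_ranges))
    start, end = cdr_ranges[name]
    return name, [start <= i < end for i in range(length)]


def mask_interface_span(interface_idx, length, rng, max_len=50):
    """Mask a random contiguous window around a randomly chosen interface residue."""
    center = rng.choice(sorted(interface_idx))
    span = rng.randint(1, max_len)
    start = max(0, center - span // 2)
    end = min(length, start + span)
    return [start <= i < end for i in range(length)]


rng = random.Random(0)

# Hypothetical CDR positions on a 120-residue chain (half-open index ranges).
cdrs = {"CDR1": (26, 33), "CDR2": (51, 58), "CDR3": (96, 110)}
cdr_name, cdr_mask = mask_one_cdr(cdrs, 120, rng)

# Hypothetical interface residues on a 200-residue chain.
interface = {40, 41, 42, 88, 90, 131}
interface_mask = mask_interface_span(interface, 200, rng)
```

Both functions return a per-residue boolean mask, so the rest of the pipeline does not need to know whether the masked region is a CDR or an arbitrary stretch of interface.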
Expanding beyond antibody applications
Current model architectures are specifically designed for antibodies and cannot be used for general protein-protein interactions (PPIs).
Our model is general enough that we can mask any part of a protein structure:
in the case of antibodies, that's the CDRs;
in the case of a different PPI, that's the interface.
This gives us a better foundation to extend model capabilities towards solving all kinds of protein-protein interaction problems.
We used proteinflow again to generate a PPI dataset and trained on that.
ProtFill is also the first model that has been shown to work for redesigning binding interfaces up to 50 amino acids long for the general protein case.

Next steps
Availability and code
ProtFill is freely available.