Feb 1, 2025

A Cartography of Antibody Developability Landscape

In this collaboration with the Greiff lab, we analyzed antibody developability parameters using protein language models.


Designing effective monoclonal antibody (mAb) therapeutics faces a significant challenge known as “developability”, which reflects an antibody’s ability to progress through development stages based on its physicochemical properties. While natural antibody repertoires may provide valuable guidance for antibody selection, we lack a comprehensive understanding of natural developability parameter (DP) plasticity (redundancy, predictability, sensitivity) and how the DP landscapes of human-engineered and natural antibodies relate to one another. These knowledge gaps hinder fundamental developability profile cartography. To chart natural and engineered DP landscapes, we computed 40 sequence- and 46 structure-based DPs of over two million native and human-engineered single-chain antibody sequences. We found lower redundancy among structure-based compared to sequence-based DPs. Sequence DP sensitivity to single amino acid substitutions varied by antibody region and DP, and structure DP values varied across the conformational ensemble of antibody structures. Sequence DPs were more predictable than structure-based ones across different machine learning (ML) tasks and embeddings, indicating a constrained sequence-based design space. Human-engineered antibodies were localized within the developability and sequence landscapes of natural antibodies, suggesting that human-engineered antibodies explore mere subspaces of the natural one. Our work charts a roadmap to evaluate the plasticity of antibody developability, providing a more rational perspective for therapeutic mAb design.
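To make the idea of sensitivity to single amino acid substitutions concrete, here is a minimal Python sketch. It is not the analysis pipeline used in the study: the toy DP (`net_charge`, a crude count of K/R minus D/E), the function names, and the choice of averaging absolute DP changes per position are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def net_charge(seq):
    """Toy sequence DP (an assumption): +1 per K/R, -1 per D/E."""
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


def single_substitution_variants(seq):
    """Yield (position, variant) for every single-residue substitution."""
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield pos, seq[:pos] + aa + seq[pos + 1:]


def positional_sensitivity(seq, dp=net_charge):
    """Mean absolute DP change per position over its 19 substitutions."""
    baseline = dp(seq)
    totals = [0.0] * len(seq)
    for pos, variant in single_substitution_variants(seq):
        totals[pos] += abs(dp(variant) - baseline)
    return [total / (len(AMINO_ACIDS) - 1) for total in totals]


if __name__ == "__main__":
    # Hypothetical three-residue fragment, for illustration only.
    for pos, score in enumerate(positional_sensitivity("AKD")):
        print(pos, round(score, 3))
```

Any function that maps a sequence to a number can be passed as `dp`, so the same loop could be grouped by antibody region to compare how strongly different regions respond to point substitutions.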

Introduction: The development of therapeutic mAbs takes years, and DPs dictate the selection and design of candidates for (pre-)clinical testing. Here, we analyzed the plasticity of the developability landscapes of natural antibodies in terms of DP redundancy (extent of DP intercorrelation), sensitivity (extent of DP change as a function of antibody sequence change), and predictability (how well a given DP can be predicted from one or several DPs).

Methods: To analyze the constraints on natural antibody developability and to relate these to current human-engineered antibody datasets, we assembled a dataset of over 2M native antibody sequences (heavy and light chain isotypes, human and murine) and computed 40 sequence- and 46 structure-based DPs. To reduce redundancy, we determined the minimum-weight dominating sets (MWDS) of DP correlation networks. To quantify sensitivity, we analyzed single-amino-acid-substituted variants and characterized the impact of sequence variation on DP distributions. To compute predictability and assess the interdependence of DPs, we trained multiple linear regression (MLR) models on developability profile (DPL) and protein language model (PLM) embeddings. These embeddings were also used to relate native antibodies to human-engineered ones via principal component analysis (PCA). Moreover, we performed classical molecular dynamics simulations to analyze the distributions of antibody DP values and to determine how the rigid models fit into these distributions.

Results: Our results address all three research areas (redundancy, sensitivity, and predictability).

Redundancy: We found a lower degree of interdependence among structure-based DPs than among sequence-based ones for all isotypes of the native dataset, and higher pairwise antibody sequence similarity was not always associated with higher pairwise antibody developability similarity. Native antibody datasets contained species- and chain-specific developability signatures.

Sensitivity: We propose methods to quantify the sensitivity of antibody DPs to minimal sequence changes.

Predictability: We found that structure-based DPs are less predictable than sequence-based DPs when using MLR models trained on DPL and PLM embeddings. The comparison between native and human-engineered datasets revealed that human-engineered (therapeutic, patented, and Kymouse) datasets were localized within the native developability landscape.
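The redundancy-reduction step described in the Methods (dominating sets of DP correlation networks) can be sketched as follows. This is a simple greedy approximation, not an exact MWDS solver and not the authors' implementation. The DP names, the correlation values, the 0.6 threshold, and the default unit node weights are hypothetical and chosen only for illustration.

```python
def build_correlation_graph(names, corr, threshold=0.6):
    """Connect two DPs when |correlation| meets the (assumed) threshold."""
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) >= threshold:
                graph[a].add(names[j])
                graph[names[j]].add(a)
    return graph


def greedy_dominating_set(graph, weights=None):
    """Greedy approximation: repeatedly pick the node with the lowest
    weight per newly dominated node until every node is dominated.
    Weights must be positive; they default to 1.0 (an assumption)."""
    weights = weights or {node: 1.0 for node in graph}
    undominated = set(graph)
    chosen = []

    def cost(node):
        covered = ({node} | graph[node]) & undominated
        return weights[node] / len(covered) if covered else float("inf")

    while undominated:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = min(sorted(graph), key=cost)
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical DP names and correlations, for illustration only.
    names = ["charge", "pI", "hydrophobicity", "length", "mw"]
    corr = [
        [1.00, 0.90, 0.10, 0.00, 0.05],
        [0.90, 1.00, 0.05, 0.10, 0.00],
        [0.10, 0.05, 1.00, 0.20, 0.10],
        [0.00, 0.10, 0.20, 1.00, 0.95],
        [0.05, 0.00, 0.10, 0.95, 1.00],
    ]
    graph = build_correlation_graph(names, corr)
    print(greedy_dominating_set(graph))
```

Every DP left out of the returned set is correlated above the threshold with at least one DP in it, which is what makes the selected subset a compact representative of the full DP panel in this sketch.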