Feb 1, 2025

Probe and rescue Sequencing for high-throughput protein variant retrieval

Retrieval of unique sequences from naive and enriched DNA pools

Pooled NGS (PNGS) is a new workflow developed at Adaptyv Bio Inc. that allows for barcoding thousands of DNA samples in a single run. This high-throughput, automated method is focused on retrieving a high number of protein variants for testing. The pipeline was developed with cell-free downstream assays in mind, but it is also compatible with other downstream assays. The pipeline is scalable, fast, and cost-effective, making it an attractive option for protein variant generation. In a series of blog posts, we will explore the importance of the pooled NGS pipeline, its various components, and how it can enable protein data generation platforms.

Why it is important

The rise of machine learning-focused approaches in biology demands an abundance of data to train models that can infer statistical correlations. Such data is often obtained through experiments on various protein variants, which are ultimately encoded by different DNA molecules. However, purchasing individual DNA molecules is prohibitively costly. A more scalable approach is to generate pools of DNA molecules through a combination of AI-generated designs, display techniques, and/or other pooled methods. The DNA pools, once transformed into bacterial pools, can then be passed through the pooled NGS pipeline to retrieve a high number of unique variants for downstream testing.

How it works

Starting Material: Bacterial Pools

As starting material, you need a bacterial pool transformed with variant-coding plasmids. These variant bacterial pools can be generated using various techniques, such as FACS, display technologies, droplet microfluidics, pool cloning, and others. These techniques filter and select from a large starting library of variants (sequence space on the order of 10^6 to 10^12 variants, depending on the technique), based on certain constraints (affinity, activity, solubility, stability), to arrive at smaller pools of thousands to tens of thousands of unique variants.

A more recent alternative for filtering and selecting within sequence space is to use AI-powered computational approaches to design thousands to tens of thousands of unique variants. After the design process, DNA can be ordered in the form of oligo pools and then cloned in a library approach to generate the variant bacterial pools (Check out this blogpost on how AI-designed sequences can be fed in a library approach into our PNGS pipeline).

Clonal Isolation: A single bacterial cell dispenser (e.g. Cytena’s B.Sight) is used to dispense individual bacterial cells from a bacterial pool into the wells of 384-well plates. The plates are then incubated for 24 hours to allow for bacterial growth, resulting in wells containing single bacterial clones. It's important to note that not every well will necessarily have a unique DNA sequence, but the wells will reflect the levels of diversity and enrichment present in the original pool.
Sequence Barcoding: To barcode and amplify the DNA sequences in each well, we use a liquid handler (e.g. Hamilton Starlet) to perform high-throughput barcoded colony PCRs. The PCR products from each plate are pooled and purified with magnetic beads, giving one pool of barcoded PCR products per plate. For each pool, we perform an NGS library prep for either Illumina or Oxford Nanopore Technologies (ONT). These library preps use barcoded adapters that are unique to each pool. The library preps are ligation-based, and no amplification steps are necessary.
Sequencing: The resulting libraries are pooled together and sequenced on the desired NGS platform.
Data Analysis: The NGS reads are first demultiplexed by their barcodes and assigned to the individual wells of each plate. The reads assigned to each well are then aligned to obtain a consensus sequence for that well. The consensus sequences are analyzed to assess the diversity of the sequenced pools and to pick wells of interest, which are passed along to the next step of the pipeline.
Cherry Picking: Wells containing sequences of interest are cherry-picked from the initial plates and grown in new 384-well plates. A colony PCR is then performed on the new plates to generate DNA that is ready for downstream applications.
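
To make the Data Analysis step more concrete, here is a minimal Python sketch of demultiplexing followed by a per-well consensus call. It is an illustration under simplifying assumptions, not our production code: the barcode tables are made up, both barcodes are assumed to sit as exact-match prefixes at the start of each read (plate barcode first, then well barcode), and the inserts are assumed to be already aligned so that a column-wise majority vote is enough. A real pipeline would tolerate sequencing errors in the barcodes and use a proper aligner.

```python
from collections import Counter, defaultdict

# Hypothetical barcode tables (barcode sequence -> label), for illustration only.
PLATE_BARCODES = {"AAAA": "plate1", "CCCC": "plate2"}
WELL_BARCODES = {"ACGT": "A1", "TGCA": "A2"}


def demultiplex(reads, plate_barcodes, well_barcodes):
    """Group reads by (plate, well) using two stacked, exact-match prefix barcodes."""
    plate_len = len(next(iter(plate_barcodes)))
    well_len = len(next(iter(well_barcodes)))
    groups = defaultdict(list)
    for read in reads:
        plate = plate_barcodes.get(read[:plate_len])
        well = well_barcodes.get(read[plate_len:plate_len + well_len])
        if plate is None or well is None:
            continue  # drop reads whose barcodes are not recognised
        # Keep only the insert, with both barcodes trimmed off.
        groups[(plate, well)].append(read[plate_len + well_len:])
    return dict(groups)


def consensus(seqs):
    """Column-wise majority vote over inserts that are assumed to be aligned."""
    length = min(len(s) for s in seqs)
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
    )


def call_wells(reads, plate_barcodes=PLATE_BARCODES, well_barcodes=WELL_BARCODES):
    """Return one consensus sequence per (plate, well)."""
    groups = demultiplex(reads, plate_barcodes, well_barcodes)
    return {key: consensus(seqs) for key, seqs in groups.items()}


if __name__ == "__main__":
    example_reads = [
        "AAAA" + "ACGT" + "GATTACA",
        "AAAA" + "ACGT" + "GATTACA",
        "AAAA" + "ACGT" + "GATCACA",  # one read with a sequencing error
        "CCCC" + "TGCA" + "TTTT",
        "GGGG" + "ACGT" + "AAAA",     # unknown plate barcode, dropped
    ]
    for (plate, well), seq in sorted(call_wells(example_reads).items()):
        print(plate, well, seq)
```

Wells whose consensus sequences are identical carry the same variant, so counting distinct consensus sequences across wells gives a first picture of the diversity of the pool.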

Example 1: Sybody pool

Example 2: scFv pool

Operation and Throughput

Currently, the pipeline is not entirely autonomous; it requires a small amount of human intervention for:

Loading and retrieving plates on the B.Sight and Hamilton systems
Loading reagents on the Hamilton system (e.g., Taq polymerase master mix, NGS library prep reagents)
Loading the sequencer with the flow cell, sample, and reagents

The entire process of control and analysis is efficiently managed on the cloud using AWS's infrastructure. Protocols for the Hamilton liquid handler are developed and implemented using PyHamilton, an open-source interface for programming Hamilton liquid-handling robots.

Using the pipeline, thousands of unique DNA variants can be obtained cost-effectively in a single run, which takes approximately one week and requires minimal hands-on time.

The pipeline can be fitted into different workflows, which are described in a separate blog post. Another blog post describes how using ONT makes the process cheaper, faster, and more flexible.

Easy and Approachable for labs of different sizes

The development and deployment of a pipeline like ours is technically approachable and cost-friendly for labs of various sizes in both academia and industry.

Equipment Required:

Process for Clonal Isolation (several options available):
Liquid Handling Robot (e.g., Hamilton Starlet) or any other liquid handler with open-source software for programming. This avoids reliance on companies for protocol development, which can be inefficient and costly.
Thermocycler with a 384-well block.
Sequencing platform. Most academic labs have access to sequencing platforms through core facilities. Still, if an academic or industrial lab does not have access, there are multiple options:

At the moment, the overall per-well cost of running the pipeline is between 50 and 80 cents, depending on the throughput and sequencing platform used. Planned modifications to the pipeline should reduce this to ~40 cents (Check out this blog post describing how ONT’s sequencing platform allows for a faster, more flexible, and lower-cost workflow). The cost per retrieved sequence depends on the pool’s enrichment profile: pools with higher diversity and more homogeneous per-sequence representation have a lower cost per retrieved variant. A pool with a typical diversity and enrichment profile requires sequencing an average of 6 wells per unique sequence retrieved. At a per-well cost of 60 cents, that is 6 × $0.60 = $3.60, or ~$4 per unique sequence retrieved. For comparison, a standard unique DNA sequence of 1000 bp would cost $70 if ordered as a gene fragment from Twist Bioscience.
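
The cost reasoning above can be written as a short Python sketch. The function names are ours and purely illustrative. The second function rests on a simplifying assumption that is not part of the pipeline itself: each well holds one clone drawn independently from the pool, so a variant with pool frequency p is missed by n wells with probability (1 - p)^n. The two example pools are invented to show the trend, not measured data.

```python
def cost_per_unique(per_well_cost, wells_per_unique):
    """Cost per unique sequence = per-well cost x wells sequenced per unique hit."""
    return per_well_cost * wells_per_unique


def expected_unique(frequencies, n_wells):
    """Expected number of distinct variants seen in n_wells single-clone wells.

    Assumes each well independently samples one clone according to the
    pool frequencies (an idealised model of clonal isolation).
    """
    return sum(1 - (1 - p) ** n_wells for p in frequencies)


if __name__ == "__main__":
    # Figures from the text: 60 cents per well, 6 wells per unique sequence.
    print(f"cost per unique sequence: ${cost_per_unique(0.60, 6):.2f}")

    # Two made-up pools of 1,000 variants, each sampled with one 384-well plate.
    uniform = [1 / 1000] * 1000
    skewed = [0.9] + [0.1 / 999] * 999  # one variant dominates the pool
    for name, pool in (("uniform", uniform), ("skewed", skewed)):
        n_unique = expected_unique(pool, 384)
        wells_per_unique = 384 / n_unique
        print(
            f"{name}: ~{n_unique:.0f} unique variants per plate, "
            f"~${cost_per_unique(0.60, wells_per_unique):.2f} per unique sequence"
        )
```

Under this model, the evenly represented pool yields far more unique variants per plate than the pool dominated by a single variant, which is why the cost per retrieved sequence depends so strongly on the enrichment profile.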

The pipeline is an automated, cost-effective, and scalable method that retrieves thousands of unique DNA variants in one run, taking around one week with little hands-on time. The whole process of control and analysis is streamlined on the cloud, making it easy to use and accessible. Developing the pooled NGS approach was itself approachable, fast, and cheap, and lab automation has streamlined the process further. The falling cost of NGS keeps the approach low-cost and scalable, and new NGS platforms like ONT make it even more flexible.

Outcomes and Looking Forward

The method described can generate thousands of unique protein variants in a single run, taking around one week with minimal hands-on time. This represents a significant increase in throughput compared to traditional methods, such as 96-well picking followed by Sanger sequencing. The pooled NGS approach can be used to increase throughput of variant testing for any kind of protein application, allowing for deeper searches into the sequence space and the discovery of hits that may not have been found otherwise. For binder search, a range of binders can be obtained, which is important for some applications and for training machine learning models. Well-behaved binders with low expression levels that would have low enrichment in a display pool can sometimes be found.

In conclusion, the pooled NGS approach is a game-changer for protein variant testing. It is scalable, fast, and cost-effective, making it an attractive option for any kind of protein-related application. With lab automation and new NGS platforms, the process has become even more streamlined and accessible to everyone.