Deep learning reveals predictive sequence concepts within immune repertoires to immunotherapy

 Last Update: 2022-10-03
 Source: Internet
 Author: User

Abstract

T cell receptor (TCR) sequencing has been used to characterize the immune response

Introduction

The advent of checkpoint inhibition in cancer [α-programmed death 1 (α-PD1) and α-cytotoxic T-lymphocyte-associated protein 4 (α-CTLA4)] has changed how oncologists understand and treat advanced and aggressive disease (1–3).

A promising area of research is the characterization of T cell responses by profiling the T cell repertoire with T cell receptor sequencing (TCR-seq), in which TCR sequences are obtained by next-generation sequencing, allowing characterization of the antigenic determinants of the response (12, 13, 21–23).

We used the repertoire classifier from DeepTCR, a previously described set of deep learning algorithms, to search for sequence concepts that can predict response to immunotherapy (22, 23).

Results

Extending DeepTCR's repertoire classifier with human leukocyte antigen

We first extended our previously described multiple-instance learning (MIL) repertoire classifier to incorporate human leukocyte antigen (HLA) information into the TCR representation, yielding a joint TCR-HLA representation of the antigenic latent space (Figure 1).
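
The Methods describe the HLA background of a TCR as a categorical multi-hot encoded input. A minimal sketch of such an encoding is given here; the helper names and the example genotypes are hypothetical, and this is not DeepTCR's own code.

```python
# Hypothetical helpers illustrating a multi-hot HLA encoding (not DeepTCR's API).

def build_hla_vocab(hla_sets):
    """Map every allele seen in the cohort to a fixed column index."""
    alleles = sorted({allele for hla in hla_sets for allele in hla})
    return {allele: i for i, allele in enumerate(alleles)}

def multi_hot(hla_set, vocab):
    """Return a 0/1 vector with a 1 at each allele the donor carries."""
    vec = [0] * len(vocab)
    for allele in hla_set:
        if allele in vocab:
            vec[vocab[allele]] = 1
    return vec

# Made-up genotypes for two donors, for illustration only.
donors = {
    "patient_1": {"A*02:01", "A*01:01", "B*07:02"},
    "patient_2": {"A*02:01", "B*08:01"},
}
vocab = build_hla_vocab(donors.values())
encoded = {pid: multi_hot(hla, vocab) for pid, hla in donors.items()}
```

Every TCR from a given donor would share that donor's vector, which could then be combined with the sequence features before classification.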

Figure 1.

We extended our previous work by modifying the TCR feature block to incorporate the HLA background, in which a given set

DeepTCR's repertoire classifier is predictive of response

We first applied DeepTCR to pre-treatment tumor specimens in a patient cohort in the CheckMate-038 clinical trial (Figure S1

Figure 2 Repertoire classification

(A) Receiver operating characteristic (ROC) curves for predicting response to immunotherapy (complete response and partial response) when TCR, HLA, or TCR+HLA information was provided to the supervised repertoire classifier [100 Monte Carlo (MC) simulations with a training size of 37 and a test size of 6].

Unsupervised representation reveals the nature of the predictive antigenic response

To describe the distribution of TCR sequences in patients who did or did not respond to treatment, we trained a variational autoencoder (VAE), another model within the DeepTCR framework (22), to learn an unsupervised representation of all the data and to visualize the distributions of non-responders and responders [by uniform manifold approximation and projection (UMAP)] (Figure 3A; per-sample distributions are shown in Figure S6).

Figure 3

(A) To provide a descriptive understanding of the T cell responses of responders and non-responders in CheckMate-038 clinical trials, we attempted to characterize

During treatment, the predictive signal persists

Next, we asked whether this predictive signature of response (pre-treatment) persisted in the TCR repertoire after treatment (post-treatment

Figure 4 TCR records

(A) Three models trained on 35 pairs of post-treatment TCR profiles were applied to pre-treatment and post-treatment TCR

The predictive signature of non-response is associated with tumor-specific TCRs

To further describe the antigen specificity of the predictive signature, we first created residue sensitivity logos, as described in the original DeepTCR publication (22), for the top 50 most predictive TCRs of responders (CRPR) and non-responders (SDPD) (Figure 5A). The most predictive residues lie in the central part of the sequence, suggesting that the predictive signal is indeed related to the antigen-specific nature of the TCR. We then used a previously published melanoma dataset from Oliveira et al. (36), in which the authors paired TCR sequence data with known specificity (i.e., viral, neoantigen, or tumor-associated antigen) and the known HLA background of the individuals from whom the sequences were obtained. Using the previously trained DeepTCR repertoire classifier, we assigned each of these TCRs a likelihood of response and of non-response. When looking at the likelihood distributions for the different types of antigens, virus-specific TCRs [Epstein-Barr virus (EBV), influenza, and yellow fever (YF)] had a higher likelihood of response than tumor-specific TCRs [neoantigen (NeoAg) and melanoma antigen recognized by T cells (MART-1)] (Figure 5, B and C). To further validate these findings, we took TCRs from the McPAS-TCR database (37) that exactly matched TCRs found in the CheckMate-038 cohort and examined the corresponding response likelihoods of these cross-matched TCRs. The findings were similar to those from the dataset of Oliveira et al. (36): virus-specific TCRs (EBV, cytomegalovirus, influenza, and YF) had a higher likelihood of response than tumor-specific TCRs (MART-1) (Figure 5D). When looking at these virus-specific and MART-1-specific TCRs in the unsupervised TCR sequence space, we also found that virus-specific TCRs were more abundant in the responder-specific region, whereas MART-1-specific TCRs were more abundant in the non-responder region of the UMAP (Figure 5E).

Figure 5 Specificity and dynamics of predictive TCRs.

(A) Residue sensitivity logos were created for the top 50 most predictive TCR sequences of responders (CRPR) and non-responders (SDPD). (B) TCRs collected from the previous publication by Oliveira et al. (36) are shown in the UMAP space of phenotype (from single-cell RNA sequencing), highlighted in red by antigen specificity (tumor-specific versus virus-specific TCRs), together with the corresponding response likelihood determined by the trained DeepTCR repertoire classifier. Distribution of response likelihood for each class of antigen in (C) the Oliveira et al. (36) dataset and (D) the McPAS-TCR database (orange, tumor-specific; gray, virus-specific).

(E) Virus-specific versus MART-1-specific TCRs from the McPAS-TCR database are shown in the unsupervised TCR sequence space of the predictive TCRs of the CheckMate-038 cohort. Color corresponds to the kernel density estimate at that point in sequence space. (F) For pre- and post-treatment TCRs, clone-level frequency changes are shown as a function of the probability of being virus-specific versus tumor-specific. Each pair of box plots represents a 10% probability bin (i.e., the first pair of box plots represents sequences with a 0 to 10% probability of being tumor-specific).

The mean of each box plot is marked by a green triangle. (G) Frequency changes of the sequences in each probability bin are aggregated within each patient and shown as a function of the probability of being virus-specific versus tumor-specific, giving the net change in frequency of a given set of sequences within each patient.

Tumor-specific responses show more dynamic changes in non-responders

Finally, we wanted to see whether these predictive TCRs had any distinctive dynamics. To do this, we first examined, at the clonal level, changes in TCR sequence frequency between pre-treatment and post-treatment samples as a function of tumor versus viral specificity (Figure 5F). While the predictive virus-specific TCRs differed little between non-responders and responders, tumor-specific TCRs found in pre-treatment samples decreased significantly in frequency, more so in non-responders than in responders, and tumor-specific TCRs found in post-treatment samples increased significantly in frequency, again more so in non-responders than in responders. This finding suggests a higher turnover of tumor-specific clones, relative to virus-specific clones, in non-responders than in responders. When frequency changes were aggregated within each patient, we observed the same finding: the turnover of tumor-specific clones was higher in non-responders than in responders (Figure 5G), suggesting that non-responders mount an ineffective tumor-specific response and that tumor-specific TCRs turn over at a higher rate in these patients.

Discussion

In this work, we sought to understand the T cell sequence determinants of response to immunotherapy in the clinical setting and their potential antigen specificity. While studies in this area have examined quantitative aspects of the TCR repertoire (i.e., diversity, abundance, etc.), there is still a need to study the sequence motifs/concepts within the TCR repertoire that may predict response to immunotherapy. We used and extended a previously described set of deep learning algorithms for TCR repertoire analysis to create models that not only predict clinical response but also allow us to understand and propose a biological model that explains the differences in the TCR repertoire between responders and non-responders.

In cancer immunology, much previous work has attempted to understand the antigenic determinants of treatment response, often from the perspective of the putative epitope/antigen. Computational pipelines have been built to take whole-exome sequencing (WES) data and predict epitopes (7–9, 11, 12, 38). However, the accuracy of these pipelines suffers because many successive steps/algorithms are required to get from a mutation to an immunologically relevant epitope (i.e., expression, proteasomal cleavage, major histocompatibility complex binding, and T cell recognition). The advantage of interrogating the TCR sequences/repertoire directly is that it is a direct measurement of the antigen-specific response. The current obstacle, however, is decoding the antigen information carried in a TCR sequence: apart from direct empirical validation of individual TCR clones, there is no high-throughput, efficient way to determine the antigen specificity of TCR sequences. There is therefore an effort in the machine learning field to extract this antigenic information from TCR sequences, including methods such as DeepTCR (used in this work) (22, 23, 44–50). Although this field is still in its infancy, as more data become available to train these models, they have the potential to change how we understand the antigen specificity of an immune response directly from the TCR repertoire, avoiding the highly variable and inaccurate methods that attempt to predict the relevant epitopes. In this work, we show how methods such as DeepTCR may be used not only to create potential predictive biomarkers in cancer but also to extract meaningful biological insights from TCR repertoires.

In the first part of this study, we extended our previous work to integrate HLA into the representation of TCR sequences. Although a TCR sequence can be thought of as carrying the information needed to understand antigen specificity, it actually carries antigen/epitope information only in the context of HLA. Given the high heterogeneity of HLA alleles in the human population, there is no guarantee that the same TCR sequence will respond to the same epitope/antigen in different individuals. We therefore created a joint representation of the TCR sequence and its HLA background. This joint representation is a more complete and robust measure of the epitope and allows direct comparison of TCR repertoires between HLA-mismatched individuals. When we applied this method to response prediction, we found that combining TCR sequence information with HLA background did improve the predictive power of the model.

While predictive power is a key strength of our approach, since we were able to aggregate the sum of the TCR information in a repertoire to predict response to treatment, much of this work focused on the interpretability of the model, in the hope of revealing previously unrecognized biological insights. We first used a VAE, a fully unsupervised method for representing TCR sequences, to describe and visualize the predictive signature of response. In doing so, we found that our supervised MIL model did extract relevant predictive features from the background "noise" of the TCR repertoire. This orthogonal, unsupervised validation gives further evidence that our supervised model did not overfit the data, and when looking at the distribution of the predictive sequences in each patient, we observed that conserved TCR sequence features of responders and non-responders were shared among multiple patients. This raised the natural question of what the specificities of these predictive TCR sequences are. Using two previously published datasets of TCRs with known specificity, we found that responders were enriched for a predictive signal resembling viral responses, whereas non-responders were enriched for a signal resembling tumor-specific responses. Although this was initially unexpected, we infer that the viral signal represents the background T cell response within the tumor [as corroborated by other work (36)] and that the accumulation of tumor-specific T cells in non-responders stands out relative to this background viral signal. Because previously published data link TCR sequence not only to antigen specificity but also to phenotype, we infer that these accumulated tumor-specific T cells are terminally differentiated effector T cells that are likely dysfunctional, which would explain their accumulation in non-responders.

When studying the dynamics of these antigen-specific responses before and after the start of immunotherapy, we were surprised to find that, although the antigen-specific signature itself did not change during treatment, tumor-specific T cells had a higher turnover in patients who did not respond. Combining all of these observations, we propose a biological model of the dynamics and antigen-specific characteristics of the response to immunotherapy, and of how these characteristics differ between responders and non-responders (Figure 6). Notably, non-responders are characterized by dysfunctional tumor-specific T cells that undergo higher turnover under immunotherapy, suggesting that the ongoing T cell response to the tumor is futile. In contrast, responders maintain an existing tumor-specific response within the tumor whose function is rescued by immunotherapy, so that T cells already present in the tumor can exert their anti-tumor activity effectively.

Figure 6 Dynamics of tumor-specific T cells during immunotherapy.

Before treatment, non-responders to immunotherapy have an accumulation of dysfunctional tumor-specific effector T cells. After immunotherapy is started, tumor-specific T cells in non-responders have a higher rate of turnover than those in responders. Created with BioRender.com.

Finally, this biological model is consistent with what has been reported in previous transcriptomic studies in the field. In the study by Oliveira et al. (36), non-responding patients were characterized by a high level of accumulation of tumor-reactive T cells, with markedly elevated levels of these cells in patients with non-responding melanoma. This specificity, even at high frequency, did not produce a potent anti-tumor response because of the high level of exhaustion measured by single-cell RNA sequencing in the tumor microenvironment (36). Consistent with this view, patients with melanoma who responded to checkpoint blockade had a higher proportion of putative virus-specific T cells in their tumor specimens, whereas patients who did not respond were characterized by exhausted tumor-infiltrating lymphocytes (35, 36).

While the findings of this study demonstrate a way to apply interpretable machine learning to TCR repertoire analysis and to obtain biological insights that can be readily appreciated, there are certainly limitations to this work. The main limitation is the small size of the training/validation cohorts. Deep learning models are notorious for their ability to overfit, and many factors must be considered when training them so that they do not fit spurious or irrelevant information. To address this limitation, we ensured that model performance was assessed only on test sets held out during cross-validation. In addition, by corroborating the discovered predictive sequence signature with a VAE (a fully unsupervised method), we provide further evidence that our supervised model did not overfit the data. Finally, we validated the predictive signature from the CheckMate-038 cohort in two other clinical cohorts receiving checkpoint blockade therapy, further supporting the observed findings.

Taken together, these findings highlight the utility of deep learning for determining the key specificity characteristics of the TCR repertoire, their dynamics under immunotherapy, and their relationship to clinical response. Further work in this area may use the methods described here to develop biomarkers and contribute to the understanding and development of better targeted therapies in the era of precision oncology.

Methods

CheckMate-038 experimental model and participant details

CheckMate-038 is a multi-arm, multi-institutional, institutional review board-approved prospective study (CA209-038; NCT01621490). Patients in parts 2 to 4 received nivolumab (3 mg/kg) every 2 weeks (n = 21) or nivolumab (1 mg/kg) + ipilimumab (3 mg/kg) every 3 weeks for four doses followed by nivolumab (3 mg/kg) every 2 weeks (n = 62), until progression or for up to 2 years. Radiographic assessment of response was performed approximately every 8 weeks until disease progression. Progression was confirmed by a repeat computed tomography scan, typically 4 weeks later. Tumor response was defined by RECIST v1.1. Unless otherwise indicated, response to treatment refers to the best overall response. All patients had a biopsy of a metastatic lesion before starting treatment (1 to 7 days before the first dose). Tumor tissue was divided for formalin fixation and paraffin embedding (FFPE) or for storage in RNAlater (Ambion) for subsequent RNA/DNA extraction. PD-L1 expression on the surface of tumor cells (Dako 28-8 antibody) was assessed in a central laboratory. The clinical trial protocol and its amendments were approved by the relevant institutional review boards, and the study was conducted in accordance with the Declaration of Helsinki and the International Conference on Harmonisation Guidelines for Good Clinical Practice. All patients signed written informed consent before any study procedure was performed.

CheckMate-038 TCR-seq and HLA data generation

Tumor biopsy samples were collected before the start of treatment and stored in RNAlater. DNA was extracted and submitted to Adaptive Biotechnologies for survey-level TCRβ chain sequencing, in which targeted amplicon libraries covering all TCRβ chain V/D/J gene segments were prepared by multiplex polymerase chain reaction and sequenced on the Illumina HiSeq system (51, 52). Data for individual TCR sequences previously analyzed by Anagnostou et al. (21), including V/D/J gene segment identification and CDR3-β sequence, were analyzed with DeepTCR. Tumor biopsy DNA was also submitted for WES (Personal Genome Diagnostics) to determine TMB, and patient HLA genotypes were inferred using OptiType (53). Data from patients who consented to deposition are submitted to the European Genome-phenome Archive (21).

Data management

TCR-seq files were collected as raw tsv/csv files from the various sources cited in the manuscript. The sequencing files were parsed to obtain the amino acid sequence of the CDR3 after removing non-productive sequences. Clones with different nucleotide sequences but the same amino acid sequence were aggregated under a single amino acid sequence, and their read counts were summed to determine their relative abundance. In the analysis code, we also specified that sequences containing non-IUPAC (International Union of Pure and Applied Chemistry) letters (*, X, O) be ignored and that sequences longer than 40 amino acids be removed. The maximum length can be altered for the purposes of the algorithm, but we chose 40 because we do not expect any real sequence to be longer than this.
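
The curation steps above can be sketched as a small filter-and-aggregate function. This is only an illustration under stated assumptions: the row layout, the function name, and the rule that a row with no amino acid call is non-productive are all hypothetical, and the example rows are made up.

```python
from collections import defaultdict

EXCLUDED_LETTERS = {"*", "X", "O"}  # letters the text says are ignored
MAX_LEN = 40                        # maximum CDR3 length kept, per the text

def curate(rows, max_len=MAX_LEN):
    """rows: iterable of (cdr3_nucleotide, cdr3_amino_acid, read_count).

    Returns {cdr3_amino_acid: relative_abundance}.
    """
    counts = defaultdict(int)
    for _nucleotide, amino_acid, reads in rows:
        if not amino_acid:                       # assumed marker of a non-productive row
            continue
        if set(amino_acid) & EXCLUDED_LETTERS:   # contains *, X, or O
            continue
        if len(amino_acid) > max_len:
            continue
        counts[amino_acid] += reads              # collapse nucleotide variants of one CDR3
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}

# Made-up rows: two nucleotide variants of one CDR3, one stop codon, one empty call.
rows = [
    ("TGTGCCAGC", "CAS", 3),
    ("TGCGCCAGC", "CAS", 1),
    ("TGTGCCTGA", "CA*", 5),
    ("", "", 2),
    ("TGTGCCAGCTTC", "CASF", 4),
]
abundance = curate(rows)
```

Relative abundance is computed after filtering, so the retained clones sum to 1; whether the original analysis normalizes before or after filtering is not stated in the text.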

Training the DeepTCR repertoire classifier

To determine the predictive signature of response in the TCR repertoire of the tumor microenvironment before the start of treatment, we used DeepTCR (v2.1.6), a deep learning framework that reveals sequence concepts within T cell repertoires (22). We made one notable modification to the existing software to allow HLA information to be included in the representation of the TCR. This was achieved by representing the observed HLA background of a given TCR as a categorical multi-hot encoded variable provided as an input to the neural network. All other aspects of the method are as described in the original manuscript that introduced DeepTCR. Notably, we fit repertoire classifiers on the CheckMate-038 data using TCR sequence information (CDR3-β and V/D/J), HLA, or TCR+HLA information to show how each type of input contributes to the predictive power of the model. For each type of input tested, exactly the same train/test splits were used during MC cross-validation, so that models trained on different inputs could be compared fairly. In addition, because the CheckMate-038 dataset is small, training had to be done in a way that prevents the repertoire classifier from overfitting. To train the repertoire classifiers on these data, we therefore used MC cross-validation with a hinge loss during training, which stops the model from further reducing the loss of any given sample once it falls below a defined threshold. The idea behind this objective function is that once a sample is predicted correctly enough, the network is no longer incentivized to keep reducing its loss, which reduces overfitting to the training data. Training with this hinge loss was stopped once a predetermined threshold was reached, and model performance was then assessed on the held-out test data of that train/test split within the MC cross-validation. We then used a bootstrap method in which we drew 5,000 samples of the MC predictions to estimate a confidence interval around the AUC. All hyperparameters of the DeepTCR models can be found in the publicly available GitHub repository (see Data and Materials Availability).
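
To make the training and evaluation scheme concrete, the sketch below shows a thresholded ("hinge") loss, reproducible MC train/test splits, and a percentile bootstrap interval for the AUC. All function names are hypothetical, the thresholded loss is one plausible reading of the description rather than DeepTCR's implementation, and the labels and scores are made up.

```python
import random

def thresholded_loss(per_sample_losses, threshold):
    """Mean loss in which samples already below `threshold` contribute zero."""
    clipped = [max(0.0, loss - threshold) for loss in per_sample_losses]
    return sum(clipped) / len(clipped)

def mc_splits(n_samples, test_size, n_folds, seed=0):
    """Repeated random train/test partitions; the same seed gives the same
    splits, so models fit on different inputs see identical partitions."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    splits = []
    for _ in range(n_folds):
        rng.shuffle(idx)
        splits.append((sorted(idx[test_size:]), sorted(idx[:test_size])))
    return splits

def auc(labels, scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples lacking a class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        pick = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in pick]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in pick]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Illustrative sizes: 43 samples with 6 held out in each of 100 folds.
splits = mc_splits(n_samples=43, test_size=6, n_folds=100, seed=0)
labels, scores = [1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]  # made-up predictions
point_auc = auc(labels, scores)
ci = bootstrap_auc_ci(labels, scores, n_boot=200)
```

Fixing the seed is what allows the TCR, HLA, and TCR+HLA models to be compared on identical partitions.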

Validation cohorts

TCR-seq data were collected from two previously published studies, by Yost et al. (34) and Sade-Feldman et al. (35), comprising patients with basal cell/squamous cell carcinoma and melanoma, respectively. The Yost dataset includes samples from 11 patients whose TCR sequences from pre-treatment biopsies are available on immuneACCESS; the Sade-Feldman dataset includes samples from 19 patients whose TCR sequences from pre-treatment biopsies are available in the original published material. Both cohorts consisted of patients receiving checkpoint blockade therapy whose clinical response to treatment was assessed by RECIST criteria in a manner similar to that used in the CheckMate-038 cohort. Repertoire-level inference was then performed on patients in these two independent clinical cohorts using the DeepTCR repertoire classifier fit on the CheckMate-038 cohort, and predictive performance was assessed by ROC curves and AUC.

Unsupervised representation via VAE and UMAP

To provide interpretability of the predictive signature found, we used the DeepTCR VAE for unsupervised dimensionality reduction of all TCRs found in the CheckMate-038 cohort. Each instance entering the VAE was defined by the CDR3-β sequence, V/D/J gene usage, and HLA background of the TCR. The VAE converted this input into a 128-dimensional latent vector, which was then further reduced to two dimensions by UMAP (default settings of the Python package umap-learn). For visualization, because each TCR has an associated frequency, this information was used to construct a two-dimensional histogram showing the density of these TCRs in the UMAP space.
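
The final visualization step, a frequency-weighted two-dimensional histogram, can be sketched as follows. The sketch assumes the 2D coordinates have already been produced (for example by UMAP applied to the VAE latent vectors); the function name, bin count, and example points are hypothetical.

```python
def weighted_hist2d(points, weights, bins, bounds):
    """Accumulate clone frequencies on a bins x bins grid.

    points  : list of (x, y) coordinates in the 2D embedding
    weights : clone frequency of each point
    bounds  : (x_min, x_max, y_min, y_max), with x_max > x_min and y_max > y_min
    Returns grid[row][col], rows indexed by y and columns by x.
    """
    x0, x1, y0, y1 = bounds
    grid = [[0.0] * bins for _ in range(bins)]
    for (x, y), w in zip(points, weights):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the plotted window
        col = min(int((x - x0) / (x1 - x0) * bins), bins - 1)
        row = min(int((y - y0) / (y1 - y0) * bins), bins - 1)
        grid[row][col] += w
    return grid

# Made-up embedding of four clones and their frequencies.
points = [(0.0, 0.0), (1.0, 1.0), (0.9, 0.9), (0.1, 0.9)]
freqs = [0.5, 0.2, 0.2, 0.1]
density = weighted_hist2d(points, freqs, bins=2, bounds=(0.0, 1.0, 0.0, 1.0))
```

Weighting by frequency means an expanded clone contributes more to the density than a singleton at the same location.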

Post-treatment inference

To apply models fit on the pre-treatment cohort to the post-treatment cohort, we used an approach that prevents performance from being inflated by the high correlation between pre- and post-treatment samples from the same patient. Specifically, post-treatment samples were scored only by models that had not been trained on the pre-treatment tumor of the same individual. In other words, when a model was trained on one partition (train set) of the pre-treatment data and tested on another partition (test set) of the pre-treatment data, it was applied only to the post-treatment samples paired with that test set. This form of cross-validation prevents a model from making predictions for any patient it was trained on, whether before or after treatment.
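
A minimal sketch of this pairing rule: for each model, only post-treatment samples from patients in that model's pre-treatment test set are eligible for scoring. The function name and patient identifiers are hypothetical.

```python
def post_treatment_assignments(splits, has_post):
    """For each (train_patients, test_patients) split used to fit a model on
    pre-treatment data, list the patients whose post-treatment sample that
    model may score: those in its test set that have a post-treatment sample.
    """
    allowed = []
    for train, test in splits:
        if set(train) & set(test):
            raise ValueError("train and test patients must not overlap")
        allowed.append(sorted(p for p in test if p in has_post))
    return allowed

# Made-up example: two models, five patients, three with post-treatment biopsies.
splits = [
    (["p1", "p2", "p3"], ["p4", "p5"]),
    (["p3", "p4", "p5"], ["p1", "p2"]),
]
has_post = {"p1", "p4", "p5"}
eligible = post_treatment_assignments(splits, has_post)
```

Splitting by patient rather than by sample is what keeps paired pre- and post-treatment biopsies on the same side of every split.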

Associating the predictive model with known antigen specificity

To examine the antigen specificity of the predictive signature of response/non-response, we collected two previously published datasets in which CDR3 sequences had been empirically validated against their cognate antigen/epitope. Because our clinical cohort consists of melanoma patients, we first used a melanoma dataset in which the authors linked TCR sequence, antigen specificity, and gene expression phenotype (36). We also used McPAS-TCR, a larger database of TCR sequences with known specificities (37). Because the melanoma dataset (36) provides CDR3-β sequences, V/D/J gene usage, and the HLA background of the individuals, we were able to score each TCR with the pre-trained models. For the McPAS-TCR database, to maximize the overlap between TCRs found in our patient cohort and TCRs in the database, we cross-matched TCRs in the clinical cohort (with their predicted probabilities) to TCRs in the McPAS-TCR database at the level of the CDR3-β sequence, thereby linking TCRs of known antigen specificity to their likelihood of response/non-response.
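
The cross-matching step amounts to an exact join on the CDR3-β string. The sketch below assumes a simple in-memory layout; the function names, the category labels, and the sequences and probabilities are all made up for illustration and are not taken from the database.

```python
def cross_match(cohort_predictions, database):
    """Exact-match join at the CDR3-beta level.

    cohort_predictions : {cdr3_beta: predicted probability of response}
    database           : iterable of (cdr3_beta, antigen_category)
    Returns a list of (cdr3_beta, antigen_category, probability).
    """
    return [
        (cdr3, category, cohort_predictions[cdr3])
        for cdr3, category in database
        if cdr3 in cohort_predictions
    ]

def by_category(matches):
    """Group matched probabilities by antigen category."""
    groups = {}
    for _cdr3, category, prob in matches:
        groups.setdefault(category, []).append(prob)
    return groups

# Made-up sequences and probabilities.
cohort = {"CASSAAAF": 0.8, "CASSGGGF": 0.3, "CASSLLLF": 0.6}
database = [("CASSAAAF", "viral"), ("CASSGGGF", "tumor"), ("CASSTTTF", "viral")]
matches = cross_match(cohort, database)
groups = by_category(matches)
```

The per-category lists are what would then be compared, for example as the distributions shown in Figure 5D.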

Clonal dynamics as a function of response likelihood

Because both pre- and post-treatment biopsies were available for patients in the CheckMate-038 cohort, we wanted to examine clonal dynamics in light of the information provided by the response prediction model. To do this, we divided all TCR sequences into 10 bins spanning the spectrum from virus-specific to tumor-specific TCRs as predicted by our model. We then further stratified these sequences by whether they came from responders (CRPR) or non-responders (SDPD) and examined their clonal dynamics before and after treatment. For TCR sequences present in pre-treatment biopsies, we examined the change in frequency after treatment relative to the pre-treatment frequency; for TCR sequences present in post-treatment biopsies, we examined the change in frequency before treatment relative to the post-treatment frequency. To further quantify TCR dynamics at the sample/patient level, we summed the frequency changes within each patient along the virus-to-tumor spectrum to obtain a net change in frequency per patient.
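
The binning and per-patient aggregation can be sketched as below. It assumes each clone carries a single model-derived probability placing it on the virus-to-tumor spectrum (called `p_tumor_like` here, a hypothetical name) and uses a plain post-minus-pre difference as the frequency change; the data are made up.

```python
def prob_bin(p_tumor_like, n_bins=10):
    """Index of the equal-width probability bin (0 = most virus-like)."""
    return min(int(p_tumor_like * n_bins), n_bins - 1)

def net_change_per_patient(clones, n_bins=10):
    """clones: iterable of (patient, p_tumor_like, freq_pre, freq_post).

    Returns {patient: [net frequency change in each probability bin]}.
    """
    out = {}
    for patient, p, freq_pre, freq_post in clones:
        row = out.setdefault(patient, [0.0] * n_bins)
        row[prob_bin(p, n_bins)] += freq_post - freq_pre
    return out

# Made-up clones for two patients.
clones = [
    ("p1", 0.05, 0.10, 0.12),
    ("p1", 0.95, 0.20, 0.05),
    ("p1", 0.97, 0.00, 0.10),
    ("p2", 0.95, 0.10, 0.10),
]
net = net_change_per_patient(clones)
```

Comparing the resulting rows between responders and non-responders is the kind of summary shown in Figure 5G.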

Statistical testing and machine learning models

All statistical tests applied to the data were implemented with the scipy.stats module. Classical machine learning techniques and performance metrics were implemented with scikit-learn.
