Summary
T cell receptor (TCR) sequencing has been used to characterize the immune response
Introduction
The advent of checkpoint inhibition in cancer [α-programmed death 1 (α-PD1) and α-cytotoxic T-lymphocyte-associated protein 4 (α-CTLA4)] has changed how oncologists understand and treat advanced and aggressive disease (1–3).
A promising area of research is the characterization of T cell responses through analysis of the full T cell repertoire by T cell receptor sequencing (TCR-seq), in which TCR sequences are obtained by next-generation sequencing, allowing characterization of the antigenic determinants of the response (12, 13, 21–23).
We used the repertoire classifier from DeepTCR, a previously described suite of deep learning algorithms, to search for sequence concepts that can predict the response to immunotherapy (22, 23).
Results
Extending DeepTCR's repertoire classifier with human leukocyte antigen information
We first extended the multiple-instance learning (MIL) repertoire classifier we described earlier to incorporate human leukocyte antigen (HLA) information into the featurization of the TCR, providing a representation of the joint TCR-HLA antigenic latent space (Figure 1).
DeepTCR's repertoire classifier can predict response
We first applied DeepTCR to pre-treatment tumor specimens in a patient cohort in the CheckMate-038 clinical trial (Figure S1
Unsupervised representation reveals the nature of the predictive antigenic response
To describe the distribution of TCR sequences in patients who did or did not respond to treatment, we trained a variational autoencoder (VAE), another model in the DeepTCR framework (22), on all of the data to obtain an unsupervised representation, and visualized the distributions of non-responders and responders by uniform manifold approximation and projection (UMAP) (Figure 3A; per-sample distributions are shown in Figure S6).
During treatment, the predictive signal persists
Next, we asked whether the predictive signature of response found before treatment (pre-treatment) persists in the TCR repertoire after treatment (post-treatment
The predictive signature of non-response is associated with tumor-specific TCRs
To further describe the antigen specificity of the antigenic response, we first created residue sensitivity logos, as described in the original DeepTCR publication (22), for the top 50 most predictive TCRs in responders (CRPR) and non-responders (SDPD) (Figure 5A). We note that the most predictive residues lie in the central part of the sequence, suggesting that the predictive signal is indeed related to the antigen-specific nature of the TCR. We then used the previously published melanoma dataset of Oliveira et al. (36), in which the authors paired TCR sequence data with known specificity (i.e., viral antigens, neoantigens, and tumor-associated antigens) and the known HLA background of the individuals from whom the sequences were obtained. Using the previously trained DeepTCR repertoire classifier, we assigned each of these TCR sequences a likelihood of response and of non-response. Looking at the likelihood distributions across antigen types, virus-specific TCRs [Epstein-Barr virus (EBV), influenza, and yellow fever (YF)] had a higher likelihood of response than tumor-specific TCRs [neoantigen (NeoAg) and melanoma antigen recognized by T cells 1 (MART-1)] (Figure 5, B and C). To further validate these findings, we extracted from the McPAS-TCR database (37) the TCRs whose sequences exactly matched TCRs found in the CheckMate-038 cohort and examined the response likelihoods of these cross-matched TCRs. Again, the findings were similar to those from the Oliveira et al. dataset (36): virus-specific TCRs (EBV, cytomegalovirus, influenza, and YF) had a higher likelihood of response than tumor-specific TCRs (MART-1) (Figure 5D). When looking at these virus-specific and MART-1-specific TCRs in the unsupervised TCR sequence space, we also found that virus-specific TCRs were more abundant in the responder-specific region of the UMAP, whereas MART-1-specific TCRs were more abundant in the non-responder region (Figure 5E).
Tumor-specific responses show more dynamic changes in non-responders
Finally, we wanted to see whether these predictive TCRs had any distinctive dynamics. To do this, we first examined the change in TCR sequence frequency between pre-treatment and on-treatment samples at the clonal level as a function of predicted tumor versus viral specificity (Figure 5F). We note that while the frequency of predicted virus-specific TCRs changed little in both non-responders and responders, the frequency of tumor-specific TCRs found in pre-treatment samples decreased significantly more in non-responders than in responders, and the frequency of tumor-specific TCRs found in post-treatment samples increased significantly more in non-responders than in responders. This finding suggests a higher turnover of tumor-specific clones than of virus-specific clones, particularly in non-responders. When pooling the frequency changes within each patient, we observed the same finding: the turnover of tumor-specific clones was higher in non-responders than in responders (Figure 5G), suggesting that non-responders mount ineffective tumor-specific responses and that tumor-specific TCRs turn over at a higher rate in these patients.
Discussion
In this work, we sought to understand the T cell sequence determinants of response to immunotherapy in the clinical setting and their potential antigen specificity. While there have been studies in this area examining quantitative aspects of the TCR repertoire (i.e., diversity, abundance, etc.), there remains a need to study the sequence motifs/concepts within the TCR repertoire that may predict response to immunotherapy. Here, we used and extended a previously described suite of deep learning algorithms for TCR repertoire analysis to build models that not only predict clinical response but also allow us to understand and propose a biological model that explains the differences in the TCR repertoire between responders and non-responders.
In cancer immunology, much previous work has attempted to understand the antigenic determinants of treatment response, often from the perspective of the putative epitope/antigen. Computational pipelines have been built that take whole-exome sequencing (WES) data and predict epitopes (7–9, 11, 12, 38). However, the accuracy of these pipelines suffers because many successive steps/algorithms are required to get from a mutation to an immunologically relevant epitope (i.e., expression, proteasomal cleavage, major histocompatibility complex binding, and T cell recognition). The advantage of interrogating the TCR sequences/repertoire directly is that it is a direct measurement of the antigen-specific response. The current obstacle, however, is deciphering the antigenic information encoded in a TCR sequence: apart from direct empirical validation of individual TCR clones, there is no high-throughput, efficient way to determine the antigen specificity of TCR sequences. There is therefore an effort in the machine learning field to extract this antigenic information from TCR sequences, including methods such as DeepTCR (used in this work) (22, 23, 44–50). Although this field is still in its infancy, as more data become available to train these models, they have the potential to transform how we understand the antigen specificity of an immune response directly from the TCR repertoire, avoiding highly variable and inaccurate methods that attempt to predict the relevant epitopes. In this work, we show how methods such as DeepTCR can be used not only to create possible predictive biomarkers in cancer but also to extract meaningful biological insights from TCR repertoires.
In the first part of this study, we extended our previous work to integrate HLA into the representation of TCR sequences. While a TCR sequence can be thought of as containing the information needed to understand antigen specificity, it actually encodes antigen/epitope information only in the context of HLA. Given the high heterogeneity of HLA alleles in human populations, there is no guarantee that the same TCR sequence responds to the same epitope/antigen in different individuals. We therefore created a method that builds a joint representation of the TCR sequence and its HLA background. This joint representation is a more complete and reliable measure of the epitope and allows direct comparison of TCR repertoires between HLA-mismatched individuals. When applying this method to response prediction, we found that combining TCR sequence information with HLA background did improve the predictive power of the model.
While the predictive power of the model is a key advantage of our approach, because we were able to aggregate the sum of TCR information across the whole repertoire to predict response to treatment, much of this work focused on the interpretability of the model, in the hope of revealing previously unrecognized biological insights. We first used a completely unsupervised TCR sequence representation method, the VAE, to describe and visualize the predictive signature of response. In doing so, we found that our supervised MIL model did extract relevant predictive features from the background "noise" of the TCR repertoire. This orthogonal, unsupervised validation gives further evidence that our supervised model did not overfit the data. When looking at the distribution of the predictive sequences in each patient, we also observed that conserved TCR sequence features in responders and non-responders were shared among multiple patients. This led us to the natural question of what the specificities of these predictive TCR sequences are. Using two previously published datasets of TCRs with known specificity, we found that the predictive signal enriched in responders resembled viral responses, whereas the signal enriched in non-responders resembled tumor-specific responses. Although initially unexpected, we infer that the viral signal represents a background T cell response within the tumor [as confirmed by other studies (36)] and that tumor-specific T cells accumulate in non-responders relative to this background viral signal. Because previously published data show that TCR sequence is related not only to antigen specificity but also to phenotype, we infer that the accumulated tumor-specific T cells are terminally differentiated effector T cells that may already be dysfunctional, which would explain their accumulation in non-responders.
When studying the dynamics of these antigen-specific responses before and after the start of immunotherapy, although the antigen-specific signal did not change during treatment, we were surprised to find that tumor-specific T cells had a higher turnover in patients who did not respond. Combining all of these observations, we present a biological model of the dynamics and antigen-specific features of immunotherapy and of how these features differ between responders and non-responders (Figure 6). Notably, non-responders are characterized by dysfunctional tumor-specific T cells that undergo higher levels of turnover on immunotherapy, suggesting a sustained but futile T cell response against the tumor. In contrast, responders maintain a pre-existing tumor-specific response within the tumor whose function is rescued by immunotherapy, so that the T cells already present in the tumor can exert their anti-tumor activity effectively.
Finally, this biological model is consistent with what has been reported in previous transcriptomic studies in the field. In the study by Oliveira et al. (36), patients with melanoma who did not respond were characterized by high levels of accumulation of tumor-reactive T cells. Even at high frequency, this specificity did not produce a potent anti-tumor response because of the high levels of exhaustion measured by single-cell RNA sequencing in the tumor microenvironment (36). Consistent with this view, patients with melanoma who responded to checkpoint blockade had a higher proportion of presumed virus-specific T cells in their tumor specimens, while patients who did not respond were characterized by exhausted tumor-infiltrating lymphocytes (35, 36).
While the findings of this study demonstrate a way to apply interpretable machine learning to TCR repertoire analysis and to derive biological insights from it, there are certainly limitations to this work. The biggest limitation is the small size of the training/validation cohorts. Deep learning models are notorious for their ability to overfit data, and many considerations must be taken into account when training them so that they do not fit spurious or irrelevant information. To address this major limitation, we ensured that model performance was assessed only on the test sets held out during cross-validation. In addition, by confirming the predictive sequence signature with the VAE (a completely unsupervised method), we provide further evidence that our supervised model did not overfit the data. Finally, we validated the predictive signature from the CheckMate-038 cohort in two other clinical cohorts treated with checkpoint blockade, further supporting the observed findings.
Taken together, these findings highlight the utility of deep learning for identifying key antigen-specific characteristics of the TCR repertoire, their dynamics under immunotherapy, and their relationship to clinical response. Further work in this area may use the methods described here to develop biomarkers and to contribute to the understanding and development of better targeted therapies in the era of precision oncology.
Methods
CheckMate-038 experimental model and participant details
CheckMate-038 is a multi-arm, multi-institutional, prospective study approved by the institutional review boards (CA209-038; NCT01621490). Patients in parts 2 to 4 received nivolumab (3 mg/kg) every 2 weeks (n = 21) or nivolumab (1 mg/kg) + ipilimumab (3 mg/kg) every 3 weeks for 4 doses, followed by nivolumab (3 mg/kg) every 2 weeks (n = 62), until progression or for up to 2 years. Radiographic assessment of response was performed approximately every 8 weeks until disease progression. Progression was confirmed by a computed tomography scan, typically 4 weeks later. Tumor response was defined by RECIST v1.1. Unless otherwise indicated, response to treatment refers to the best overall response. All patients underwent a biopsy of a metastatic lesion before starting treatment (1 to 7 days before the first dose). Tumor tissue was divided for formalin fixation and paraffin embedding (FFPE) or for storage in RNAlater (Ambion) for subsequent RNA/DNA extraction. PD-L1 expression on the surface of tumor cells (Dako 28-8 antibody) was assessed in a central laboratory. The clinical trial protocol and its amendments were approved by the relevant institutional review boards, and the study was conducted in accordance with the Declaration of Helsinki and the International Conference on Harmonisation Guidelines for Good Clinical Practice. All patients provided written informed consent before any study procedure was performed.
CheckMate-038 TCR-seq and HLA data generation
Tumor biopsy samples were collected before the start of treatment and stored in RNAlater. DNA was extracted and submitted to Adaptive Biotechnologies for survey-level TCRβ chain sequencing, in which targeted amplicon libraries covering all TCRβ chain V/D/J gene segments were prepared by multiplex polymerase chain reaction and sequenced on the Illumina HiSeq system (51, 52). The individual TCR sequence data, previously analyzed by Anagnostou et al. (21) and including V/D/J gene segment identification and CDR3-β sequences, were analyzed with DeepTCR. Tumor biopsy DNA was also sent for WES (Personal Genome Diagnostics) to determine tumor mutational burden (TMB), and each patient's HLA genotype was inferred using OptiType (53). Data from patients who consented to deposition will be submitted to the European Genome-phenome Archive (21).
Data management
TCR-seq files were collected as raw tsv/csv files from the various sources cited in the manuscript. The sequencing files were parsed to take the amino acid sequence of the CDR3 after removing non-productive sequences. Clones with different nucleotide sequences but identical amino acid sequences were aggregated under one amino acid sequence, and their reads were summed to determine their relative abundance. In the analysis code, we also specified that sequences containing non-IUPAC (International Union of Pure and Applied Chemistry) letters (*, X, O) be ignored and that sequences longer than 40 amino acids be removed. The maximum length can be changed for the purposes of the algorithm, but we chose 40 because we do not expect any real sequence to be longer than this.
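The curation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name `aggregate_clonotypes`, the input layout (pairs of CDR3 amino acid sequence and read count), and the convention that a non-productive rearrangement appears as an empty sequence are all hypothetical.

```python
from collections import defaultdict

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid letters
MAX_LEN = 40  # maximum CDR3 length kept, as stated in the text


def aggregate_clonotypes(rows, max_len=MAX_LEN):
    """Collapse (cdr3_amino_acid, read_count) rows into relative abundances.

    Returns {cdr3_amino_acid: relative_abundance}.
    """
    counts = defaultdict(int)
    for cdr3, reads in rows:
        # Assumed convention: a non-productive rearrangement has no sequence.
        if not cdr3:
            continue
        # Drop sequences with letters outside the standard alphabet (*, X, O).
        if not set(cdr3) <= VALID_AA:
            continue
        # Drop sequences longer than the maximum length.
        if len(cdr3) > max_len:
            continue
        # Nucleotide variants of the same amino acid sequence are summed.
        counts[cdr3] += reads
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cdr3: n / total for cdr3, n in counts.items()}


# Made-up example rows, for illustration only.
rows = [
    ("CASSLGQAYEQYF", 30),   # two nucleotide variants of one clonotype
    ("CASSLGQAYEQYF", 10),
    ("CASSPGTGGYEQYF", 60),
    ("CASS*GQAYEQYF", 5),    # contains a stop codon symbol: dropped
    ("", 7),                 # non-productive: dropped
]
abundances = aggregate_clonotypes(rows)
```

Relative abundance is computed after filtering, so the retained clonotypes always sum to 1 within a sample.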
Training the DeepTCR repertoire classifier
To determine the predictive signature of response in the TCR repertoire of the tumor microenvironment before the start of treatment, we used DeepTCR (v2.1.6), a deep learning framework that reveals sequence concepts within T cell repertoires (22). We made one notable modification to the existing software to allow HLA information to be included in the representation of the TCR: the HLA background in which a given TCR was observed is provided to the neural network as a categorical multi-hot encoded variable. All other aspects of the method are as described in the original DeepTCR manuscript. Notably, we fit repertoire classifiers on the CheckMate-038 data using TCR sequence information (CDR3-β and V/D/J), HLA, or TCR + HLA information to show how much each type of input contributes to the predictive power of the model. For every type of input tested, exactly the same train/test splits were used during Monte Carlo (MC) cross-validation so that models trained on different inputs could be compared fairly. In addition, because the CheckMate-038 dataset is small, training had to be done in a way that prevents the repertoire classifier from overfitting. We therefore trained with a hinge loss during MC cross-validation, which prevents the model from further reducing the loss of any given sample below a defined threshold. The idea behind this objective is that once a sample is predicted correctly enough, the network is no longer encouraged to reduce its loss further, which reduces overfitting to the training data. Training with this hinge loss was stopped once a predetermined threshold was reached, and model performance was assessed on the held-out test data of each train/test split during MC cross-validation.
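Two of the ideas in this paragraph, the multi-hot HLA encoding and a per-sample loss that stops contributing below a threshold, can be illustrated with a short sketch. It is written in plain Python for clarity and makes several assumptions: the allele list, the function names, and the choice of cross-entropy as the underlying per-sample loss are illustrative and are not taken from the DeepTCR source code.

```python
import math


def multi_hot(alleles, vocabulary):
    """Encode a set of HLA alleles as a 0/1 vector over a fixed allele list."""
    present = set(alleles)
    return [1 if allele in present else 0 for allele in vocabulary]


def cross_entropy(p_true):
    """Cross-entropy of one sample given the probability of its true class."""
    return -math.log(p_true)


def thresholded_loss(p_true_per_sample, threshold):
    """Mean of max(loss - threshold, 0) over samples.

    A sample whose loss is already below `threshold` contributes zero, so the
    optimizer gets no further incentive to fit that sample more tightly.
    """
    losses = [max(cross_entropy(p) - threshold, 0.0) for p in p_true_per_sample]
    return sum(losses) / len(losses)


# Illustrative allele vocabulary and one patient's HLA background.
vocab = ["A*01:01", "A*02:01", "B*07:02", "B*08:01"]
hla_vector = multi_hot(["A*02:01", "B*07:02"], vocab)

# One sample already predicted well (p = 0.95) and one predicted poorly (p = 0.40).
loss = thresholded_loss([0.95, 0.40], threshold=0.1)
```

In this example only the poorly predicted sample contributes to the loss, which is the behavior the text describes: well-fit samples stop driving the model toward further memorization.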
We then used a bootstrap method in which we resampled the MC predictions 5,000 times to approximate a confidence interval around the AUC. All hyperparameters of the DeepTCR models can be found in the publicly available GitHub repository listed below (Data and Materials Availability).
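A bootstrap confidence interval of this kind can be sketched as follows. This is a minimal illustration, assuming a percentile interval and resampling of individual (label, score) pairs with replacement; the text does not state which bootstrap variant was used, and the labels and scores below are made up.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=5000, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from resampled predictions."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # A resample needs both classes for the AUC to be defined.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up held-out predictions: 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
```

With cohorts this small the interval is wide, which is why reporting it alongside the point estimate matters.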
Validation cohorts
TCR-seq data were collected from two previously published manuscripts, Yost et al. (34) and Sade-Feldman et al. (35), consisting of patients with basal cell/squamous cell carcinoma and melanoma, respectively. The Yost dataset includes samples from 11 patients whose TCR sequences from pre-treatment biopsies are available on immuneACCESS; the Sade-Feldman dataset includes samples from 19 patients whose TCR sequences from pre-treatment biopsies are available in the originally published material. Both cohorts consisted of patients receiving checkpoint blockade therapy whose clinical response was assessed by RECIST criteria in a manner similar to that used in the CheckMate-038 cohort. Inference was then performed at the repertoire level on patients in these two independent clinical cohorts using the DeepTCR repertoire classifier fit on the CheckMate-038 cohort, and predictive performance was assessed by receiver operating characteristic (ROC) curves and AUC.
Unsupervised representation through VAE and UMAP
To provide interpretability of the predictive signature, we used the DeepTCR VAE for unsupervised dimensionality reduction of all TCRs found in the CheckMate-038 cohort. Each instance entering the VAE was defined by its CDR3-β sequence, V/D/J gene usage, and the HLA background of the TCR. The VAE converts this input to a 128-dimensional latent vector, which is then further reduced to two dimensions by UMAP (default settings of the Python package umap-learn). For visualization, because each TCR has an associated frequency, this information was used to construct a two-dimensional histogram showing the density of these TCRs in the UMAP latent space.
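The frequency-weighted two-dimensional histogram used for visualization can be sketched as below. The sketch takes the 2D coordinates as given (it does not run the VAE or UMAP), and the function name, bin count, and coordinate ranges are assumptions chosen for illustration.

```python
def weighted_hist2d(points, weights, bins, x_range, y_range):
    """Sum `weights` into a bins x bins grid over the given x and y ranges.

    Returns a list of rows; grid[j][i] holds the total weight in x-bin i, y-bin j.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0.0] * bins for _ in range(bins)]
    for (x, y), w in zip(points, weights):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the plotted window
        # Map each coordinate to a bin index; the upper edge goes in the last bin.
        i = min(int((x - x0) / (x1 - x0) * bins), bins - 1)
        j = min(int((y - y0) / (y1 - y0) * bins), bins - 1)
        grid[j][i] += w
    return grid


# Hypothetical 2D embedding coordinates and clone frequencies for four TCRs.
coords = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.8), (0.5, 0.5)]
clone_freqs = [0.5, 0.2, 0.25, 0.05]
density = weighted_hist2d(coords, clone_freqs, bins=2,
                          x_range=(0, 1), y_range=(0, 1))
```

Weighting by clone frequency means that a region occupied by a few expanded clones appears as dense as a region occupied by many rare ones, which is the property the density plot is meant to convey.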
Post-treatment inference
To apply the models trained on the pre-treatment cohort to the post-treatment cohort, we used a method that prevents inflation of the performance estimates, because pre- and post-treatment samples from the same patient are highly correlated.
To do this, we applied models only to post-treatment samples from individuals whose pre-treatment tumors had not been used for training. In other words, when a model was trained on a given partition of the pre-treatment data and tested on the held-out partition (test set) of the pre-treatment data, it was also tested on the post-treatment samples paired with that test set. This type of cross-validation prevents the model from making predictions on patients it was trained on, whether before or after treatment.
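The paired evaluation scheme can be made concrete with a small sketch that only plans the splits and does no model fitting. The patient identifiers, the test fraction, and the function names are hypothetical; the point is that post-treatment samples are scored only for patients held out of training in the same split.

```python
import random


def mc_split(patients, test_fraction, rng):
    """One Monte Carlo train/test split at the patient level."""
    shuffled = patients[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def paired_evaluation_sets(patients, has_post, test_fraction=0.25,
                           n_splits=3, seed=0):
    """For each split, list who is trained on and who is scored pre/post."""
    rng = random.Random(seed)
    plans = []
    for _ in range(n_splits):
        train, test = mc_split(patients, test_fraction, rng)
        plans.append({
            # Pre-treatment samples used to fit the model.
            "train_pre": train,
            # Held-out pre-treatment samples.
            "test_pre": test,
            # Post-treatment samples paired with the held-out patients only.
            "test_post": [p for p in test if p in has_post],
        })
    return plans


patients = [f"P{i:02d}" for i in range(8)]   # hypothetical patient IDs
has_post = {"P01", "P02", "P05", "P07"}       # patients with a post-treatment biopsy
plans = paired_evaluation_sets(patients, has_post)
```

Splitting by patient rather than by sample is what prevents a model from being scored on a post-treatment repertoire after having seen the same patient's pre-treatment repertoire during training.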
Associating the predictive model with known antigen specificity
To test the antigen specificity of the response/non-response predictive signature, we collected two previously published datasets in which CDR3 sequences had been empirically validated against their cognate antigens/epitopes. Since our clinical cohort consists of patients with melanoma, we first used a melanoma dataset in which the authors established links between TCR sequence, antigen specificity, and gene expression phenotype (36). We also used McPAS-TCR, a larger database of TCR sequences and their known specificities (37). Because the melanoma dataset (36) provides CDR3-β sequences, V/D/J gene usage, and the HLA background of the individuals, we were able to score each TCR with the pre-trained models. For the McPAS-TCR database, to maximize the overlap between the TCRs found in our patient cohort and those in the database, we cross-matched the TCRs in the clinical cohort (with their predicted probabilities) to the TCRs in the McPAS-TCR database at the CDR3-β sequence level, so as to assign known antigen-specific TCRs their likelihood of response/non-response.
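The cross-matching step amounts to an exact join on the CDR3-β amino acid sequence. A minimal sketch, assuming the cohort predictions are held as a dictionary and the reference database as (sequence, label) pairs; the sequences, labels, and probabilities below are made up and do not come from McPAS-TCR or the study cohort.

```python
from collections import defaultdict


def cross_match(cohort_predictions, reference):
    """Join cohort TCRs to reference annotations on exact CDR3-beta identity.

    cohort_predictions: {cdr3: predicted probability of response}
    reference: iterable of (cdr3, antigen_label) pairs
    Returns {antigen_label: [predicted probabilities of matched TCRs]}.
    """
    by_antigen = defaultdict(list)
    for cdr3, antigen in reference:
        if cdr3 in cohort_predictions:
            by_antigen[antigen].append(cohort_predictions[cdr3])
    return dict(by_antigen)


# Made-up values, for illustration only.
cohort = {
    "CASSLGQAYEQYF": 0.81,
    "CASSPGTGGYEQYF": 0.22,
    "CASSQDRGNTEAFF": 0.64,   # no annotation in the reference
}
reference = [
    ("CASSLGQAYEQYF", "viral"),
    ("CASSPGTGGYEQYF", "tumor"),
    ("CASSIRSSYEQYF", "viral"),  # not observed in the cohort
]
matched = cross_match(cohort, reference)
```

Only sequences present in both sources survive the join, so the per-antigen distributions can then be compared directly.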
Clonal dynamics as a function of response likelihood
Because biopsies were available from both before and after treatment in the CheckMate-038 cohort, we wanted to examine clonal dynamics in light of the information provided by the response prediction model. To do this, we divided all TCR sequences into 10 bins spanning the spectrum from virus-specific to tumor-specific TCRs as predicted by our model. We then further classified these sequences by whether they came from responders (CRPR) or non-responders (SDPD) and examined their clonal dynamics before and after treatment. For TCR sequences found in pre-treatment biopsies, we measured the change in frequency after treatment relative to the pre-treatment frequency; for TCR sequences found in post-treatment biopsies, we measured the change in frequency before treatment relative to the post-treatment frequency. To further quantify TCR dynamics at the sample/patient level, we aggregated the frequency changes along the virus-to-tumor spectrum within each patient to produce a net change in frequency per patient.
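The binning and per-patient aggregation can be sketched as follows. This is a simplified illustration under several assumptions: clones are binned by their predicted probability of response, the change is taken as the signed difference between post- and pre-treatment frequency, and pre-found and post-found clones are pooled rather than analyzed separately as in the text. All names and values are hypothetical.

```python
from collections import defaultdict


def spectrum_bin(p_response, n_bins=10):
    """Map a predicted response probability in [0, 1] to one of n_bins bins."""
    return min(int(p_response * n_bins), n_bins - 1)


def net_change_per_patient(clones):
    """Sum post-minus-pre frequency changes per patient and spectrum bin.

    clones: iterable of (patient, p_response, pre_freq, post_freq); a clone
    absent at a time point has frequency 0.0 there.
    Returns {patient: {bin_index: net frequency change}}.
    """
    net = defaultdict(lambda: defaultdict(float))
    for patient, p_response, pre, post in clones:
        net[patient][spectrum_bin(p_response)] += post - pre
    return {patient: dict(bins) for patient, bins in net.items()}


# Made-up clones: (patient, predicted P(response), pre frequency, post frequency).
clones = [
    ("P01", 0.05, 0.10, 0.02),  # low-probability clone that contracts
    ("P01", 0.07, 0.00, 0.06),  # low-probability clone that newly appears
    ("P01", 0.95, 0.20, 0.21),  # high-probability clone, nearly stable
    ("P02", 0.12, 0.04, 0.03),
]
net_change = net_change_per_patient(clones)
```

Note that a net change can be close to zero even when turnover is high, because a contracting clone and an emerging clone in the same bin cancel; this is one reason to also look at pre-found and post-found clones separately, as the text describes.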
Statistical testing and machine learning models
All statistical tests applied to the data were implemented with the scipy.stats module. Classical machine learning techniques and performance metrics were implemented with scikit-learn.
This article is an English version of an article which is originally in the Chinese language on echemi.com and is provided for information purposes only.