Machine learning analysis of humoral and cellular responses to SARS-CoV-2 infection in young adults

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent of the coronavirus disease 2019 (COVID-19). Both humoral and cellular immune responses against SARS-CoV-2 have major implications for the clinical outcome of COVID-19, the risk of reinfection and the efficacy of vaccination.

Antibodies are produced by plasma cells. B cell maturation into plasma cells is supported by CD4 T cells via cell-cell interactions and cytokine secretion, while CD8 T cells eliminate virus-infected cells through cytolytic activity. B and T cells are therefore essential for eliminating the virus and establishing immunological memory. However, the extent to which T cell responses contribute to SARS-CoV-2 clearance and, more importantly, long-term protection is still under investigation. Early studies with human subjects reported that COVID-19 patients with X-linked or autosomal-recessive agammaglobulinemia were able to recover from infection without severe disease requiring intensive care (1); however, subsequent studies also reported increased respiratory viral detection and symptom burden among patients with primary antibody deficiency (2). While B cells are critical for preventing infection or reducing inoculum size, T cell responses play a prominent role in clearing the infection (3, 4). Furthermore, immune responses to SARS-CoV-2 that generate coordinated CD4 and CD8 T cell-based immunity have been shown to correlate with favorable outcomes in COVID-19 patients (5–7). COVID-19 patients with B cell immunodeficiencies showed favorable outcomes when they mounted strong CD8 T cell responses (8). These findings underline the importance of CD8 T cell-mediated cytotoxicity in viral clearance, potentially contributing to a milder disease course. Together, these observations indicate that T cells provide substantial protective immunity that limits severe disease in settings where antibody responses are diminished and thereby benefits COVID-19 patients.

In this study, we used machine learning (ML) and classical statistical modeling to analyze SARS-CoV-2-specific humoral and T cell immune responses in a cohort of young convalescent adults with asymptomatic or mildly symptomatic SARS-CoV-2 infections. We aimed to assess whether particular viral antigens induce antibody- and/or cell-mediated immunodominance. To this end, we determined which combinations of T cell subsets and activation markers allow the antibody status to be predicted. Furthermore, we employed ML methods in addition to conventional statistical modeling to uncover potentially nonlinear and complex associations among humoral and T cell immune responses. Our integrated approach to studying B cell, CD4 and CD8 T cell responses to SARS-CoV-2 allowed us to identify associations between the class of immune cells responding to SARS-CoV-2 and the virus components triggering such responses. Finally, we explored associations between the T cell responses of COVID-19 patients and self-reported symptoms scores.

Materials and methods

CoV-ETH cohort

Ethical approval for the CoV-ETH study (CoV-ETH cohort) was obtained from the Cantonal Ethics Commission Zurich (BASEC-Nr. 2020-00949). Written informed consent was received from all participants. The study was performed in accordance with the Declaration of Helsinki of 1975.

The CoV-ETH study launched in May 2020 and included 2,911 voluntary participants from the ETH Zurich community and respective household members, aged 18 to 64 years (Figure 1, Table 1, Supplementary Table 1). The first sampling of blood [at time point 1 (t1)] for the collection of plasma and peripheral blood mononuclear cells (PBMC) was in May 2020. The status of respiratory infections prior to the sampling was assessed. Symptoms scores were reported as 0 (without symptoms), 1 (local: with any one or several symptoms, but no fever), and 2 (systemic: fever alone or with any symptom or several). A compound symptoms score was assessed across two screenings (t1 and t2) by taking the maximum of the two scores for each participant.
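
The compound symptoms score described above can be illustrated with a minimal Python sketch. The function name and the handling of a missing screening (ignored rather than imputed) are assumptions made for illustration; the text only specifies the 0/1/2 coding and the maximum across t1 and t2.

```python
from typing import Optional

# Coding taken from the text:
# 0 = without symptoms, 1 = local (symptoms, no fever), 2 = systemic (fever).
VALID_SCORES = {0, 1, 2}


def compound_symptoms_score(score_t1: Optional[int],
                            score_t2: Optional[int]) -> Optional[int]:
    """Return the maximum of the per-screening symptoms scores.

    A missing screening (None) is ignored. This handling is an assumption;
    the text does not state how missing scores were treated.
    """
    observed = [s for s in (score_t1, score_t2) if s is not None]
    for s in observed:
        if s not in VALID_SCORES:
            raise ValueError(f"unexpected symptoms score: {s}")
    return max(observed) if observed else None
```

For example, a participant who reported local symptoms at t1 and fever at t2 receives a compound score of 2.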

www.frontiersin.org

Figure 1 Study overview. The CoV-ETH study was launched in May 2020 and comprised 2,911 individuals with no prior knowledge of their SARS-CoV-2 immune status. A serological screening identified 65 seroconverted individuals at the first screening in May 2020. From these and 69 randomly chosen non-seroconverted negative controls, blood samples from May and September 2020 were analyzed for humoral and T cell responses. As positive and negative controls, 56 samples from 36 PCR-confirmed hospitalized SARS-CoV-2 infected individuals and pre-pandemic samples from 56 healthy individuals were used, respectively.

Table 1 The age distribution, function at ETH Zurich, COVID-19 contact in the last three months prior to sampling, and smoking status of the participants, and number of SARS-CoV-2 tested participants are shown.

In case of seroconversion, determined by seropositivity for RBD using an enzyme-linked immunosorbent assay (ELISA), blood specimens were obtained in September 2020 (at time point 2, t2) for plasma and PBMC isolation. At this time, no vaccination was available.

Based on RBD IgG levels, we included 134 probands into our study, of whom 69 were seronegative and 65 were seropositive individuals.

C+ cohort (positive controls – hospitalized COVID-19 individuals)

The C+ cohort comprised 56 PCR-confirmed SARS-CoV-2 infected samples from 36 unique individuals aged between 18 and 70. The samples were taken between 15 and 152 days after the symptoms’ onset. Blood sample processing was performed as reported earlier (9). Blood collection was performed under institutional review board approval number 2020-039 (ethics committee of the University Medical Center Halle).

C- cohort (negative controls – pre-pandemic individuals)

Healthy pre-pandemic control samples were collected at the Rockefeller University Hospital, US, between 1996 and 2000 and originated from 56 pre-pandemic healthy individuals aged between 21 and 85. Donor consent for their samples to be used in research was obtained from all participants and the study was approved by the Rockefeller University Ethics Committee. Plasma samples were stored permanently at -80°C.

Further information on sample collection and processing can be found in the Supplementary Methods section.

Enzyme-linked immunosorbent assays (ELISA)

All CoV-ETH cohort samples were screened for SARS-CoV-2 specific IgG, IgM, and IgA antibodies targeting the receptor-binding domain (RBD) using a previously described SARS-CoV-2 RBD ELISA (10). Three further in-house immunoassays were developed for the detection of SARS-CoV-2 specific IgG antibodies targeting Spike S1, S2 and Nucleocapsid (N), respectively (Supplementary Table 1). To characterize seroconverted participants of the CoV-ETH cohort, C+ and C- cohort samples, six different plasma sample dilutions for each of the individual assays were employed to achieve respective ED50 values. Assay details can be found in the Supplementary Methods section.

T cell analysis

The T cell response assays were performed on PBMCs collected at t1 and t2 from the CoV-ETH cohort individuals. Each assay plate contained PBMCs collected from a single healthy donor as an intra-assay control (IAC). After overnight cultivation, viability and cell count (adjusted to 5 × 10⁶ lymphocytes/mL) were assessed by flow cytometry on a MACSQuant® Analyzer 16 (Miltenyi Biotec).

Cells were aliquoted and stimulated with (1) PBS (negative control), (2) SARS-CoV-2 PepTivator mix (CoV-Mix), (3) Prot_N, (4) Prot_S1, (5) Prot_S, (6) Positive Control and (7) Prot_M. The IACs received only the negative control, positive control and a mix of 10 µM of each human PepTivator CMV pp65, PepTivator EBV Consensus, and PepTivator AdV5 Hexon (Miltenyi Biotec) for four hours (Supplementary Figure 1A, B). Afterwards, cell staining was performed against CD14-VioBlue® (Miltenyi Biotec, Cat. No. 130-110-525), CD20-VioBlue® (Miltenyi Biotec, Cat. No. 130-111-531), CD8-VioGreen™ (Miltenyi Biotec, Cat. No. 130-110-684), CD4-VioBright™515 (Miltenyi Biotec, Cat. No. 130-114-535), IFNγ-PE (Miltenyi Biotec, Cat. No. 130-113-496), IL-2-PE-Vio615 (Miltenyi Biotec, Cat. No. 130-111-307), TNFα-PE-Vio®770 (Miltenyi Biotec, Cat. No. 130-120-492), CD3-APC (Miltenyi Biotec, Cat. No. 130-113-135), CD154-APC-Vio®770 (Miltenyi Biotec, Cat. No. 130-114-130), and cells were analyzed by flow cytometry. Assay details can be found in the Supplementary Methods section.

Flow data files in MQD format were directly imported to FlowJo™ v10.6 (BD Life Sciences) for the analysis. Singlet viable CD3 T cells, CD8 cytotoxic T cells and CD4 T helper cells were analyzed using quadrant gating for co-expression of TNF and IFN-γ, as well as IL-2 and CD154 (Supplementary Figure 1B). Gate thresholds were set based on the negative and positive controls of each sample. Cell counts (#; cells per quadrant), frequency of parent (%; portion of cells in a specific quadrant) and mean fluorescence intensity (MFI; per cells in the assigned quadrant) values were reported for the double-positive populations and calculated for the single-positive populations. In total, 155 parameters were reported for each analyzed sample well.

Neutralizing antibodies

Neutralizing antibody titers (nAb) in all selected samples of the CoV-ETH cohort and the samples of the C+ cohort were determined as reported earlier (9). Forty samples of the C- cohort as well as the pre-pandemic samples reported in the previous study (9) tested negative.

Serological data analysis

For the re-sampling of PBMC in September, we defined a raw optical density (OD) threshold for RBD IgG ≥ 0.7 or IgM ≥ 1.0 or IgA ≥ 1.0. To validate the performance of the serological tests, receiver operating characteristic (ROC) curves were constructed from the pre-pandemic C- cohort and the SARS-CoV-2 PCR-confirmed C+ cohort.

Statistical and machine learning analyses

Establishment of antibody level cutoffs

Antibody response was defined as a binary variable based on the measurements acquired from the C+ and the C- cohorts. For each antibody targeting different SARS-CoV-2 antigens and domains (RBD, S1, S2, N and nAb), an optimal range of cutoffs maximizing the balanced accuracy (11) was selected. Supplementary Table 2 reports optimal cutoff intervals and the corresponding balanced accuracies attained on the control cohorts, in addition to sensitivities and specificities. ROC curves for all antibody types are plotted in Supplementary Figure 2. In the current analysis, we used cutoffs of 50, 20, 5, 5 and 20 for RBD, S1, S2, N and nAb, respectively. In addition, the compound antibody response (see Table 2) was obtained by aggregating responses across several antibodies targeting different SARS-CoV-2 antigens and domains. A subject was labeled positive if they had an ED50 of RBD ≥ 50 and either an ED50 of N ≥ 5, an ED50 of S1 ≥ 20 or an ED50 of S2 ≥ 5 in at least one of the screenings (t1 or t2). Otherwise, the subject was assigned to the negative group (Figure 2). The resulting compound antibody response comprised two balanced categories: negative (n=69) and positive (n=65). The criteria for defining antibody responses described above are summarized in Table 2.
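
The cutoff selection and the compound labeling rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation: the function names are hypothetical, the candidate grid is supplied by the caller, and the RBD criterion and the N/S1/S2 criterion are assumed to be met at the same screening, which the text does not state explicitly.

```python
from typing import Dict, List, Sequence, Tuple

# ED50 cutoffs as stated in the text.
CUTOFFS = {"RBD": 50, "S1": 20, "S2": 5, "N": 5, "nAb": 20}


def balanced_accuracy(values_pos: Sequence[float],
                      values_neg: Sequence[float],
                      cutoff: float) -> float:
    """Mean of sensitivity (C+ at or above cutoff) and specificity (C- below cutoff)."""
    sensitivity = sum(v >= cutoff for v in values_pos) / len(values_pos)
    specificity = sum(v < cutoff for v in values_neg) / len(values_neg)
    return 0.5 * (sensitivity + specificity)


def best_cutoffs(values_pos: Sequence[float],
                 values_neg: Sequence[float],
                 candidates: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the best balanced accuracy and every candidate cutoff attaining it."""
    scored = [(balanced_accuracy(values_pos, values_neg, c), c) for c in candidates]
    best = max(score for score, _ in scored)
    return best, [c for score, c in scored if score == best]


def is_positive_at_screening(ed50: Dict[str, float]) -> bool:
    """RBD above its cutoff AND at least one of N, S1, S2 above its cutoff."""
    return ed50["RBD"] >= CUTOFFS["RBD"] and (
        ed50["N"] >= CUTOFFS["N"]
        or ed50["S1"] >= CUTOFFS["S1"]
        or ed50["S2"] >= CUTOFFS["S2"]
    )


def compound_antibody_response(ed50_t1: Dict[str, float],
                               ed50_t2: Dict[str, float]) -> bool:
    """Positive if the rule holds in at least one screening (assumed: same screening)."""
    return is_positive_at_screening(ed50_t1) or is_positive_at_screening(ed50_t2)
```

Returning all candidates that tie for the best balanced accuracy mirrors the notion of an optimal cutoff interval rather than a single optimal value.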

Table 2 Definitions of the response with respect to different antibody types.

Figure 2 Correlation analyses of the assessed humoral and cellular parameters at time points t1 and t2 for (A) CD3, CD4 and CD8 T cells and (B) a detailed analysis of the different stimulating peptides in CD4 T cells. Text color indicates serology and T cell assay measurements at t1 and t2. The magnitude of correlation coefficients is indicated by the color bar to the right. Statistically non-significant correlations are not displayed (t-test at significance level α=0.05). Correlation test p-values were adjusted for multiple comparisons using the Benjamini-Hochberg method.

Preprocessing

As preprocessing steps before training and validating predictive models, raw T cell counts, frequencies and MFIs were (i) normalized by subtracting the corresponding T cell response to the negative control treatment (background subtraction) and (ii) standardized by subtracting the mean and scaling to unit variance across participants (standardization). No further feature transformations were performed. During exploratory analysis, we also considered normalization by dividing by the T cell response to the negative control treatment. Results for this normalization technique are reported in Supplementary Table 3. Henceforth, all reported results relate to the normalization by subtraction.
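
The two preprocessing steps can be sketched for a single feature as follows. This is a minimal illustration with hypothetical function names; the use of the population standard deviation (the convention of scikit-learn's `StandardScaler`) is an assumption, as the text does not specify the variance estimator.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def background_subtract(stimulated: Sequence[float],
                        negative_control: Sequence[float]) -> List[float]:
    """Step (i): subtract each participant's negative-control response."""
    return [s - n for s, n in zip(stimulated, negative_control)]


def standardize(values: Sequence[float]) -> List[float]:
    """Step (ii): subtract the mean and scale to unit variance across participants.

    Uses the population standard deviation (assumption, see lead-in).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]


# Hypothetical frequencies (%) for three participants.
normalized = background_subtract([0.5, 0.2, 0.9], [0.1, 0.3, 0.1])
z_scores = standardize(normalized)
```

Background subtraction can yield negative values whenever the negative-control response exceeds the stimulated response, as for the second hypothetical participant here.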

Statistical and machine learning models

To explore relationships between the compound antibody and T cell responses, we leveraged statistical and ML predictive models. We trained and validated predictive models classifying negative and positive antibody response based on T cell measurements. ML analysis was performed in the Python programming language (version 3.8.8) (12) and in the R programming language (version 4.2.2) (13). We performed binary classification using (i) logistic regression (LR) (14), as implemented in the scikit-learn library (version 0.24.1) (15), and (ii) gradient boosting (GB) (16), as implemented in the XGBoost library (version 1.3.3) (17). GB was considered in addition to LR due to its ability to model nonlinear relationships without extensive feature engineering and transformation. No hyperparameter tuning was performed for GB, and default hyperparameter values were used to avoid overfitting. Features (also known as predictors or explanatory variables) were given by T cell measurements expressed as (i) #, (ii) %, and (iii) MFIs, or a combination thereof.

Model evaluation and comparison

The predictive performance of models was evaluated using stratified bootstrapped train-test split. In this procedure, the dataset was resampled with replacement 1,000 times. For every bootstrap resample, the resampled dataset was split into the train (80%) and test (20%) sets, stratified by the response variable, and a predictive model was trained and tested. Test set performance was aggregated across the bootstrap resamples, and empirical confidence intervals (CI) were constructed. The bootstrapping (18) was performed to construct more conservative confidence intervals and was preferred to repeated train-test splits, standard in the ML literature, since the latter might produce misleading CIs and significance (19). To compare different predictive models, we used areas under the receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves (20) computed on held-out test data. In addition, we evaluated the models’ balanced accuracy (BA) (11), sensitivity, and specificity by considering a threshold of 0.5 on predicted probabilities. The latter evaluation metrics are reported in Supplementary Table 6.
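
The evaluation procedure can be sketched in plain Python as follows. This is a simplified illustration, not the published code: resampling is done within each class (one possible reading of "stratified"), the confidence interval uses a simple index-based percentile, and `first_feature_scorer` is a hypothetical stand-in for a trained LR or GB model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_bootstrap_split(y: Sequence[int], test_frac: float,
                               rng: random.Random) -> Tuple[List[int], List[int]]:
    """Resample indices with replacement per class, then split each class into train/test."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        resampled = [rng.choice(idx) for _ in idx]
        rng.shuffle(resampled)
        n_test = max(1, round(test_frac * len(resampled)))
        test += resampled[:n_test]
        train += resampled[n_test:]
    return train, test


def bootstrap_auroc(X: Sequence[Sequence[float]], y: Sequence[int],
                    fit_predict: Callable, n_boot: int = 1000,
                    test_frac: float = 0.2, seed: int = 0):
    """Return the mean test-set AUROC and an empirical 95% interval."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        train, test = stratified_bootstrap_split(y, test_frac, rng)
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        aucs.append(auroc([y[i] for i in test], scores))
    aucs.sort()
    lower = aucs[int(0.025 * (len(aucs) - 1))]
    upper = aucs[int(0.975 * (len(aucs) - 1))]
    return sum(aucs) / len(aucs), (lower, upper)


def first_feature_scorer(X_train, y_train, X_test) -> List[float]:
    """Hypothetical stand-in model: scores each test sample by its first feature."""
    return [x[0] for x in X_test]
```

Any callable with the same `(X_train, y_train, X_test)` signature that returns one score per test sample could be substituted for the stand-in scorer.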

Results

Descriptive statistics of the cohort and primary serology data

The descriptive statistics of the total number of 2,911 participants of the CoV-ETH study and the subgroup used in this study are shown in Table 1. Overall, the cohort is young (50% of participants younger than 30) and healthy (>90% of participants without any underlying cardiovascular risk factors). In May 2020, we found 4% of seropositive cases, which is in line with the reported numbers in Geneva in April 2020 (21). Interestingly, more than 80% of seropositive participants had no prior history of COVID-19 symptoms and therefore are considered asymptomatic, while 20% of individuals exhibited mild to moderate disease symptoms.

We next assessed the antibody titers against RBD, S1, S2 and N in the plasma of 165 donors, including 96 seropositive, and 69 randomly chosen donors displaying antibody levels below all positivity thresholds, and thus considered as seronegative (Figure 1). Thirty-one donors that, based on RBD measurements, were initially included in the study, were subsequently excluded because of a false-positive RBD cross-reactive signal. The cross reactivity was identified through a lack of additional seropositivity for S1, S2 or N and an absence of RBD decay over a period of 1 year. Finally, 65 individuals were identified as seroconverted. The details related to the distinct antibodies of the individual participants are shown in Supplementary Figure 3.

Basic exploratory data analysis

T cell responses were assessed by stimulating isolated PBMCs with SARS-CoV-2 protein-derived peptide pools and by determining frequencies of reacting T cells via flow cytometry. We (i) evaluated whether flow cytometry read-out alternatives were impacted by normalization strategies accounting for background noise, (ii) assessed the test specificity and (iii) determined the repeatability of the measurements. A considerable proportion of measurements resulted in numerically negative values after normalization (Supplementary Figure 4), a procedure that is strongly recommended to account for background noise, particularly in a low-input PBMC setting. We found that the general stimulation of T cells did not lead to a spurious association with humoral antibody status against SARS-CoV-2. Lastly, there was considerable variation for the cytokine assays measuring T cell responsiveness, especially in the negative control stimulation, due to only few cells per target quadrant (Supplementary Table 4). Nevertheless, all variances were lower than the differences between the infected and non-infected individuals found later. The detailed analysis can be found in the Further Results Supplementary section.

B and T cell responses against SARS-CoV-2 proteins

Correlation analysis

To assess associations between antibody and T cell responses we initially performed a Pearson’s correlation analysis displayed in Figure 2. Figure 2A shows correlations among humoral and cellular parameters at time points t1 and t2 for CD3, CD4 and CD8 T cells. Figure 2B provides correlation analysis results for different stimulating peptides of CD4 cells. The analysis of multiple populations of circulating T cells reactive to SARS-CoV-2 revealed specific responses in the total CD3 T cell compartment, as well as the CD4 and CD8 T cell subsets in response to peptide pools covering S1, S2 as well as the N and M proteins. We found a high correlation of frequency of responding CD3 and CD4 T cells against the different respective peptide pools between t1 and t2, indicating that SARS-CoV-2 induces a stable cellular immune response over four months. Cytokine production of IFN-γ, TNF and CD154 correlated strongly in CD4 T cells within and between t1 and t2. Stimulation of CD8 T cells resulted in IFN-γ and TNF release, which showed a positive correlation within t1 and t2 and between these time points. Furthermore, there was a high correlation of IL-2 production between CD3, CD4 and CD8 T cells at t1, which largely disappeared at the later time point t2, demonstrating that the IL-2 response was short-lived.

When searching for associations between humoral and T cell responses to SARS-CoV-2 infection, we found moderate correlations of RBD and S1 with CD4 T cells for all cytokines at t1. However, this correlation was lost at t2. A positive correlation of nAb was only detected with IL-2 production in CD3, CD4 and CD8 T cells at t1. This association, however, was also lost at t2, most likely due to a faster decay of RBD/S1 antibodies. We observed almost no correlations between CD4 and CD8 T cells, either within or across time points t1 and t2. A greater correlation of CD4 T cells across t1 and t2 than for CD8 T cells may suggest that CD4 T cells have a longer half-life.
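
The building blocks of the correlation analysis, Pearson's coefficient and the Benjamini-Hochberg adjustment named in the Figure 2 caption, can be sketched as follows. The function names are hypothetical, and the sketch omits the computation of the per-pair p-values, which requires the t distribution.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, a function such as `scipy.stats.pearsonr` returns both the coefficient and its p-value; the adjusted p-values would then be compared against α=0.05 to decide which correlations to display.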

General predictive relationships

In the next step, we assessed if a statistical model and ML approach would relate T cell stimulation to antibody status. We modeled the relationship between T cell and compound antibody responses using LR and GB models. Table 3 summarizes the test-set performance of the predictive models trained on numbers [#], frequencies [%] and mean fluorescence intensity [MFI] data.

Table 3 Test-set bootstrapped areas under receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves of logistic regression (LR) and gradient boosting (GB) models predicting the compound antibody response based on T cell data.

The data indicate that both LR and GB were able to capture a significant association of T cell counts and frequencies to the antibody response. Note that for all models, the lower bound of the empirical CI was above the expected performance of a random guess. Moreover, the found association was remarkably strong; for instance, the GB model trained on counts had an average test-set AUROC and AUPRC of 0.96 (95% empirical CI [0.80, 1.00]) and 0.96 (95% CI [0.81, 1.00]), respectively. MFI measurements featured a slightly weaker association, resulting in lower average AUROC and AUPRC and wider CIs. Thus, there was an overall significant association between T cell reactivity and antibody titer.

Predictors of B cell and T cell association

We next assessed which variables contributed the most to the association between T cell stimulation and antibody status. To this end, we explored the most important predictor variables in LR and GB. In GB, a feature’s importance was quantified by the gain in accuracy from adding the variable of interest to the set of all the other variables. In LR, the conventionally used absolute value of the rescaled coefficient was utilized. Figures 3A, B show each model’s top 5 most important variables. For the GB, the most important predictors were the frequencies of CD4 IL-2+/CD154+ T cells at both time points. The LR model also ranked the percentage of CD4 IFN-γ+/TNF+ T cells at t2 alongside total frequencies of CD4 and CD8 T cells even higher. In conclusion, after stimulation with specific SARS-CoV-2 antigens, multifunctional IL-2+/CD154+ or IFN-γ+/TNF+ CD4 T cells clearly reveal a relationship between seropositivity and T cell reactivity.
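
The LR importance measure can be illustrated with a small sketch that ranks features by the absolute value of their coefficients across bootstrap resamples. Aggregating by the median is an assumption made here for illustration, and the feature names are hypothetical placeholders rather than the study's variable names.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def rank_by_abs_coefficient(coef_samples: Dict[str, Sequence[float]],
                            top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank features by the median absolute coefficient across resamples."""
    importance = {
        name: median(abs(c) for c in coefs)
        for name, coefs in coef_samples.items()
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical coefficients of three standardized features over three resamples.
example = {
    "feature_a": [0.1, -0.2, 0.3],
    "feature_b": [-1.0, -0.8, -1.2],
    "feature_c": [0.5, 0.4, 0.6],
}
top_two = rank_by_abs_coefficient(example, top_k=2)
```

Because the features are standardized before fitting, coefficient magnitudes are comparable across features; with correlated features, however, the ranking can vary considerably between resamples.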

Figure 3 Results of applying logistic regression (LR) and gradient boosting (GB) to relate T cell and antibody responses. (A, B) Variable importance values for the top 5 most relevant predictors in the (A) GB and (B) LR models, trained on T cell percentages. Box plots were obtained by resampling the dataset 1,000 times with replacement. We attribute large variations in importance and coefficient values to the small sample size and correlations among features. (C) Changes in the normalized and standardized percentage of CD4 IL-2+/CD154+ T cells stimulated with CoV-Mix at t1 and t2. Participants with negative and positive compound antibody responses can be differentiated quite well based on this measurement alone. (D, E) Test-set bootstrapped areas under (D) receiver operating characteristic (AUROC) and (E) precision-recall (AUPRC) curves of GB models predicting various antibody responses based on different treatments. For reference, we plotted the expected performance of a random guess in red.

Given the findings above, we were interested in defining which viral antigens triggered the strongest CD4 T cell response. In addition, we explored whether the time point of sampling led to varying outcomes indicating decay of T cell responsiveness over time. Furthermore, we addressed whether a repeated measurement of T cell stimulation added benefit to the T cell stimulation read-out. Figure 3C shows changes in the normalized and standardized percentage of CD4 IL-2+/CD154+ T cells responding to CoV-Mix (peptide pool mix against Prot_N, Prot_S1, Prot_S) across t1 and t2. The unstandardized percentages are reported in Supplementary Figure 6. Participants without previous SARS-CoV-2 infection tended to have a considerably lower percentage of CoV-Mix-responding CD4 T cells than infected participants. Neither the change across the two time points nor the slope of change was associated with the antibody response. In conclusion, CD4 T cells that are dual-positive for IL-2+/CD154+ and stimulated with CoV-Mix alone can be used to discriminate between healthy and infected individuals. The mix of peptides from all three antigens triggered the strongest CD4 T cell response, followed by S1 and N, which were second most strongly associated with the humoral response. As there was no change in T cell responsiveness across the two time points, a single sampling would have been sufficient to discriminate an infected from a non-infected individual.

T cell sublineages

We next assessed which T cell subtype featured the best concordance with T cell response and antibody titer. We trained GB models only on the subsets of % corresponding to CD3, CD4, and CD8 T cell data. Table 4 reports the test-set performance of these models in terms of AUROC and AUPRC. We found that stimulation of CD4 T cells alone allowed discriminating between healthy and infected individuals, while CD8 T cells did not. Interestingly, we drew a similar conclusion from the sparse principal component analysis on the T cell data (see the Supplementary Material and Supplementary Figure 5), showing that the first principal component strongly correlated with the antibody response and mainly comprised CD4 T cell measurements.

Table 4 Test-set bootstrapped areas under receiver operating characteristic (AUROC) and precision-recall (AUPRC) curves of gradient boosting (GB) models predicting the compound antibody response based on T cell percentages.

Specific antigens and antibody types

To investigate which antigens were associated with the strongest correlation of humoral and T cell response, we performed a more detailed analysis evaluating how predictive the separate treatment with peptide pools covering the SARS-CoV-2 S1, S2, M and N were for different antibody responses. Figures 3D, E show the corresponding test-set AUROCs and AUPRCs achieved by GB models trained on the T cell frequency.

Considering the six different antibody responses to RBD, S1, S2 and nAb (Table 2), we found that GB models based on cell responses stimulated with S- and M-peptide pools tended to have lower average AUROCs and AUPRCs in predicting all of the response types and larger performance variability across bootstrap resamples. These results suggest that the simultaneous presence of antibodies against RBD, S1, S2, and N most strongly correlated with the T cell response. The antibody response against RBD alone resulted in a comparable association. Neither N nor nAb alone gave conclusive evidence. Yet, here the number of positive cases was low, displaying a high variability (in the case of N due to the fast antibody decline between t1 and t2).

Symptoms and T cell reactivity

Finally, we assessed whether the occurrence of symptoms during SARS-CoV-2 infection correlated with the magnitude of the T cell response, i.e., whether individuals with different compound symptoms scores had comparable or different SARS-CoV-2-specific T cell responses. We examined T cell frequencies and self-reported compound symptoms scores across t1 and t2. Figure 4 shows normalized T cell frequencies against the compound symptoms score for CD4 IL-2+/CD154+ and CD4 IFN-γ+/TNF+ T cells, which proved to be most important for the prediction of compound antibody responses (see Figures 3A, B). Participants with a compound symptoms score of 2 had a higher average T cell frequency and higher variability than those with a compound symptoms score of 0, particularly for CD4 IL-2+/CD154+ T cells. We trained and validated a GB model to predict the compound symptoms score based on T cell percentages; however, neither the model’s test-set AUROC (0.52; 95% CI: [0.27, 0.78]), computed by averaging over all possible one-vs-one pairwise class combinations, nor its balanced accuracy (0.36; 95% CI: [0.17, 0.66]) was significantly different from the expected performance of a random guess. To conclude, the frequency of dual-positive IL-2+/CD154+ CD4 T cells tended to be higher in COVID-19-positive individuals with mild symptoms, and was highest in individuals with fever. However, our moderately sized dataset falls short of providing evidence for a significant association between T cell response and symptoms score.
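
The one-vs-one averaging of the AUROC for the three-class symptoms score can be sketched as follows. The unweighted averaging over class pairs (in the style of Hand and Till) is an assumption, since the text does not specify the weighting; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def binary_auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """P(random positive outscores random negative), with ties counted as 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def ovo_auroc(y_true: Sequence[int], proba: Sequence[Sequence[float]],
              classes: Sequence[int]) -> float:
    """Unweighted mean of pairwise AUROCs over all class pairs.

    For each pair (a, b), only samples of class a or b are used; the pair's
    AUROC averages the two directions (class a as positive, then class b).
    """
    pair_aucs: List[float] = []
    for a, b in combinations(range(len(classes)), 2):
        pa_for_a = [p[a] for y, p in zip(y_true, proba) if y == classes[a]]
        pa_for_b = [p[a] for y, p in zip(y_true, proba) if y == classes[b]]
        pb_for_b = [p[b] for y, p in zip(y_true, proba) if y == classes[b]]
        pb_for_a = [p[b] for y, p in zip(y_true, proba) if y == classes[a]]
        auc_a = binary_auroc(pa_for_a, pa_for_b)  # class a treated as positive
        auc_b = binary_auroc(pb_for_b, pb_for_a)  # class b treated as positive
        pair_aucs.append(0.5 * (auc_a + auc_b))
    return sum(pair_aucs) / len(pair_aucs)
```

Under this definition, a model that assigns identical probabilities to every participant scores 0.5, which corresponds to the expected performance of a random guess against which the reported AUROC of 0.52 is compared.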

Figure 4 Boxplots of normalized and standardized T cell percentages against the compound symptoms score across two time points (by taking the maximum score) for (A, B) CD4 IL-2+/CD154+ and (C, D) CD4 IFN-γ+/TNF+ T cells, which were previously found to be important predictors of the compound antibody response. Herein, symptoms scores were reported as 0 for subjects without symptoms, 1 for subjects with any one or several symptoms but no fever, and 2 for subjects with fever alone or with any other symptoms. A compound symptoms score was assessed across t1 and t2 by taking the maximum of the two scores for each participant.

Discussion

Supervised machine learning approaches have been gaining increased attention in many application domains, including immunology (22) and the analysis of COVID-19 data (23). They can be easily applied to large high-dimensional datasets and could help discover predictive patterns and associations among measured covariates and response variables. This approach is especially helpful in studying the complex interactions of antibodies and T cell subsets during an immune response. In addition, it can inform regarding differences in the kinetics between the different arms of the immune response. Uncovering such interactions could facilitate the optimization of vaccines and the identification of potentially critical COVID-19 cases, which require special and time-sensitive medical care. The current study exemplified the use and benefit of ML techniques to describe complex, nonlinear, and nonadditive relationships between humoral and T cell responses.

The major results of our correlation analysis on the antibody levels are in line with previous findings on COVID-19 immune responses, as we found that antibodies targeting the S1-subunit of the SARS-CoV-2 spike protein, which exhibits a high mutation rate and mediates the binding to the receptor on the surface of target cells, make up the largest fraction of nAbs (24). Antibodies targeting the S2-subunit, which has a relatively low tolerance for sequence variation and mediates viral cell membrane fusion, contribute only a comparably small fraction of nAbs (25).

Correlation analysis of the measured soluble and cellular responses revealed an as yet underestimated role of T lymphocytes, especially CD4 T helper cells, in predicting antibody titers. It confirmed previous reports suggesting that SARS-CoV-2-specific antibodies decline faster than SARS-CoV-2-reactive T cells, which showed a much longer persistence (26–28). However, our analysis revealed a strong correlation between antibody and CD4 T cell responses. Thus, solely based on the strength of the CD4 T cell response, it was possible to identify individuals who mounted high SARS-CoV-2 antibody titers. The identification of such correlations might depend on the underlying antigen but could nevertheless prove valuable for the improvement or development of novel vaccines, to reach higher antibody titers and reduce vaccination failure rates. Moreover, our CoV-ETH study consisted of a young study population that lacked severe cases. SARS-CoV-2-reactive CD4 T cells predisposing to favorable disease courses have been described in young and unexposed individuals, but with declining numbers in risk groups (29–31). Generally, CD4 T cells are critical for the activation and maturation of B cells into antibody-producing plasma cells. Since CD4 memory T cells are generated after infection and vaccination, they are considered beneficial for mounting a faster antibody response upon reinfection. As we show almost no decline in antigen-specific T cells within the study period, future vaccines might employ techniques to induce a long-lasting CD4 memory T cell compartment. This could be instrumental in providing immune protection via fast stimulation of SARS-CoV-2 nAb production. However, such approaches would require determining the extent to which a memory CD4 T cell compartment provides sufficient protection from infection when antibodies are no longer detectable and when antibody epitopes have undergone mutations.

Of note, the antigen-specific CD8 T cell responses did not correlate with CD4 T cell responses. This might be partly due to the different decay rates of the two T cell subsets, suggesting a faster turnover of antigen-specific CD8 T cells. In addition, CD4 and CD8 T cell responses have previously been found to correlate with disease severity in different ways: while a CD8 T cell response is more prominent in mild courses, a CD4 T cell response is dominant in more severe disease courses (32, 33). We detected a higher percentage of antigen-reactive CD4 T cells in individuals who recovered from COVID-19 with mild local symptoms than in those with asymptomatic disease courses, and the highest percentage in individuals with fever. Since the CoV-ETH study group did not include severely ill or ICU patients, the role of CD4 T cells in individuals with severe COVID-19 outcomes could not be established. Additional data are therefore required to address the functional role of CD8 T cells, especially since these might also be a target of future vaccines.

Given the presented correlation between CoV-Mix-, S1-, and N-induced T cell stimulation and antibodies, our analysis indicates the feasibility of developing a model that can predict structures of the SARS-CoV-2 proteome that could be considered as future vaccine targets. Furthermore, our analysis may be useful beyond the scope of this specific research question, as it showcases a machine-learning-based analysis pipeline for immunological data and may interest domain experts seeking to enrich their data analysis toolset. To facilitate this, we made our data analysis readily replicable by publishing the deidentified data and the code (available at https://github.com/i6092467/t-cells-response-sars-cov-2).

Taken together, our machine learning analysis suggests that T cells might play a substantial yet underestimated role in the virus-specific immune response. T cell responses might have the potential to improve future vaccine development, as antibody responses alone are insufficient to provide long-lasting protection (27).

Limitations

This study does not provide information on acute viral diagnostics in individuals with a previous SARS-CoV-2 infection. No seronegative individuals were included in the study, as the study design did not screen for them. From the statistical and machine learning perspective, the sample size of 134 subjects is small, particularly given the large number of model comparisons and statistical correlation tests conducted. Therefore, the reported findings are exploratory and need to be interpreted cautiously. A larger cohort would allow the reported results to be corroborated and would facilitate the use of potentially more powerful models, such as neural networks. It would also be helpful to validate the resulting ML models and findings on external data obtained under a similar experimental setup but from a more diverse set of individuals.
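The caution regarding the number of correlation tests can be made concrete with a multiple-testing correction. The following minimal Python sketch applies the Benjamini-Hochberg step-up procedure to a set of p-values. It is an illustrative assumption rather than a description of the correction used in this study; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected while controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis whose rank is at most largest_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from six correlation tests (NOT study results).
p_values = [0.001, 0.012, 0.030, 0.045, 0.200, 0.650]

uncorrected = [p <= 0.05 for p in p_values]
corrected = benjamini_hochberg(p_values, alpha=0.05)

print("significant without correction:", sum(uncorrected))
print("significant after correction:  ", sum(corrected))
```

In this toy example, fewer tests remain significant after correction than before, which is why findings from a small cohort with many tests are best treated as exploratory.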

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Materials. The data and code are available in a GitHub repository at https://github.com/i6092467/t-cells-response-sars-cov-2. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving human participants were reviewed and approved by the Cantonal Ethics Commission Zurich (BASEC-Nr. 2020-00949). The patients/participants provided their written informed consent to participate in this study.

Author contributions

RM, PS, and A-KH organized and performed the sampling, data assessment and analyses and wrote the first draft of the manuscript. CD and MSc interpreted the data and wrote the manuscript. PS, KC, SG, AG, LH, PH, HK, ML, JM, AT, XS, A-KH, NB, SB, LD, MS-d-J, MSa, GS, SZU, TU, and TW performed the blood sampling. PS, KC, SG, AG, ML, JM, and AT performed the cell assays. DS, FA-Q, and SY set up the data pipeline. MB and CS provided the samples of the positive control cohort. CZ and CK set up, organized and performed the clinical blood sampling procedure. JG and MS set up, organized and performed the clinical blood sampling and provided medical care. JG was responsible for the ethical approval. LP and FS performed data assessment, analysis and interpretation. SC performed the neutralizing antibody analysis. JV supervised the ML data analyses. MS and SU conceptualized the study, performed data analyses and interpretation and wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was partially funded by the ETH Foundation, Switzerland and the state of North Rhine-Westphalia, Germany. RM was supported by the SNSF grant #320038189096. Open access funding by ETH Zurich.

Acknowledgments

We gratefully acknowledge the help of numerous individuals from ETH facility management, who made it possible to conduct this study despite the first lockdown of the pandemic. Many thanks to all study participants. Our special thanks also go to the private sponsors of the study.

Conflict of interest

CD, MSc, BS, PZ, FH, AR, Y-JH, GL, and HB are employees of Miltenyi Biotec B.V. & Co. KG. LP is an employee of Humabs BioMed SA, a subsidiary of Vir Biotechnology.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2023.1158905/full#supplementary-material

References

1. Quinti I, Lougaris V, Milito C, Cinetto F, Pecoraro A, Mezzaroma I, et al. A possible role for B cells in COVID-19? Lesson from patients with agammaglobulinemia. J Allergy Clin Immunol (2020) 146:211–213.e4. doi: 10.1016/j.jaci.2020.04.013

2. Ponsford MJ, Shillitoe BMJ, Humphreys IR, Gennery AR, Jolles S. COVID-19 and X-linked agammaglobulinemia (XLA) - insights from a monogenic antibody deficiency. Curr Opin Allergy Clin Immunol (2021) 21:525–34. doi: 10.1097/ACI.0000000000000792

3. Van Assen S, Holvast A, Benne CA, Posthumus MD, Van Leeuwen MA, Voskuyl AE, et al. Humoral responses after influenza vaccination are severely reduced in patients with rheumatoid arthritis treated with rituximab. Arthritis Rheumatism (2010) 62:75–81. doi: 10.1002/art.25033

4. Eisenberg RA, Jawad AF, Boyer J, Maurer K, McDonald K, Prak ETL, et al. Rituximab-treated patients have a poor response to influenza vaccination. J Clin Immunol (2013) 33:388–96. doi: 10.1007/s10875-012-9813-x

5. Rydyznski Moderbacher C, Ramirez SI, Dan JM, Grifoni A, Hastie KM, Weiskopf D, et al. Antigen-specific adaptive immunity to SARS-CoV-2 in acute COVID-19 and associations with age and disease severity. Cell (2020) 183:996–1012.e19. doi: 10.1016/j.cell.2020.09.038

6. Sekine T, Perez-Potti A, Rivera-Ballesteros O, Strålin K, Gorin JB, Olsson A, et al. Robust T cell immunity in convalescent individuals with asymptomatic or mild COVID-19. Cell (2020) 183:158–168.e14. doi: 10.1016/j.cell.2020.08.017

7. Dan JM, Mateus J, Kato Y, Hastie KM, Yu ED, Faliti CE, et al. Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection. Science (2021) 371(6529):eabf4063. doi: 10.1126/science.abf4063

8. Bange EM, Han NA, Wileyto P, Kim JY, Gouma S, Robinson J, et al. CD8 T cells contribute to survival in patients with COVID-19 and hematologic cancer. Nat Med (2021) 27:1280–9. doi: 10.1038/s41591-021-01386-7

9. Schultheiß C, Paschold L, Simnica D, Mohme M, Willscher E, von Wenserski L, et al. Next-generation sequencing of T and B cell receptor repertoires from COVID-19 patients showed signatures associated with severity of disease. Immunity (2020) 53:442–455.e4. doi: 10.1016/j.immuni.2020.06.024

10. Piccoli L, Park YJ, Tortorici MA, Czudnochowski N, Walls AC, Beltramello M, et al. Mapping neutralizing and immunodominant sites on the SARS-CoV-2 spike receptor-binding domain by structure-guided high-resolution serology. Cell (2020) 183:1024–1042.e21. doi: 10.1016/j.cell.2020.09.037

11. Brodersen KH, Ong CS, Stephan KE, Buhmann JM. (2010). The balanced accuracy and its posterior distribution, in: 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey: IEEE. pp. 3121–4.

12. van Rossum G. Python Tutorial. Amsterdam, the Netherlands: Centrum voor Wiskunde en Informatica (1995).

14. Hosmer DW Jr., Lemeshow S, Sturdivant RX. Applied logistic regression. Hoboken, New Jersey, USA: John Wiley & Sons (2013). doi: 10.1002/9781118548387

15. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res (2011) 12:2825–30.

17. Chen T, Guestrin C. (2016). XGBoost: a scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA: Association for Computing Machinery. pp. 785–94.

18. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York, New York, USA: CRC Press (1994).

19. Vanwinckelen G, Blockeel H. (2012). On estimating model accuracy with repeated cross-validation. BeneLearn 2012, in: BeneLearn 21: Proceedings of the 21st Belgian-Dutch Conference on Machine Learning, Ghent, Belgium: Benelearn 2012 Organization Committee. pp. 39–44.

20. Davis J, Goadrich M. (2006). The relationship between precision-recall and ROC curves, in: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA: Association for Computing Machinery. pp. 233–40.

21. Stringhini S, Wisniak A, Piumatti G, Azman AS, Lauer SA, Baysson H, et al. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study. Lancet (2020) 396:313–9. doi: 10.1016/S0140-6736(20)31304-0

22. Fontanella S, Frainay C, Murray CS, Simpson A, Custovic A. Machine learning to identify pairwise interactions between specific IgE antibodies and their association with asthma: a cross-sectional analysis within a population-based birth cohort. PLoS Med (2018) 15:e1002691. doi: 10.1371/journal.pmed.1002691

23. Patterson BK, Guevara-Coto J, Yogendra R, Francisco EB, Long E, Pise A, et al. Immune-based prediction of COVID-19 severity and chronicity decoded using machine learning. Front Immunol (2021) 12:700782. doi: 10.3389/fimmu.2021.700782

24. Chia WN, Zhu F, Ong SWX, Young BE, Fong S-W, Bert NL, et al. Dynamics of SARS-CoV-2 neutralising antibody responses and duration of immunity: a longitudinal study. Lancet Microbe (2021) 2:e240–9. doi: 10.1016/S2666-5247(21)00025-2

25. Shah P, Canziani GA, Carter EP, Chaiken I. The case for S2: the potential benefits of the S2 subunit of the SARS-CoV-2 spike protein as an immunogen in fighting the COVID-19 pandemic. Front Immunol (2021) 12:637651. doi: 10.3389/fimmu.2021.637651

26. Reynolds CJ, Swa
