Blood transcriptomics to facilitate diagnosis and stratification in pediatric rheumatic diseases – a proof of concept study

Transcriptome profiles of rheumatic diseases, viral infection, and convalescent controls

We compared the transcriptome profiles of six rheumatic disease groups (i.e., JIA, AID, CRMO, HLA-B51, IFN, and vasculitis) with those of viral infection cases and convalescent controls. Clustering analyses using t-SNE and hierarchical algorithms are displayed in Fig. 1 and Figure S1, respectively. As shown in the t-SNE plot in Fig. 1, most controls gathered in cluster 1, while infection cases grouped into a separate cluster 2. This implies that gene expression in actively infected cases and in remission cases (i.e., controls) differs substantially, even though the samples come from the same participants. However, patients with different rheumatic diseases were not well distinguished: most were assigned to cluster 3, while cluster 4 contained a mixture of different categories.

Fig. 1. t-SNE plot of 4 different clusters

Classifier development

The Random Forest algorithm was used for classifier development because it is an ensemble learning method that is robust to outliers, stable with new data, and able to handle non-linear relationships. The first classifier was developed to distinguish between control, infection, and pediatric rheumatic cases based on normalized transcriptome data. Leave-one-out cross-validation results in Fig. 2a confirm that the classifier could differentiate pediatric rheumatic patients from negative controls (AUC = 0.8 ± 0.1) and from viral infection cases (AUC = 0.7 ± 0.1). The Boruta algorithm selected 349 genes out of 31,319 initial genes (Table S2) for training this classifier. Notable selected genes included CD3G, CD96, and CD200R1 (CD200 receptor 1). The gene CD3G encodes the CD3γ polypeptide, which forms part of the CD3-TCR (T-cell receptor) complex. This complex plays an important role in antigen recognition and in several intracellular signal-transduction pathways. This finding suggests that some of the rheumatic diseases may be connected to the alteration and malfunction of γ T-cells. Previous studies have also reported an association of γ and δ T-cells with (immunodeficiency and) autoimmune diseases [12]. CD96 is expressed on T-cells and natural killer cells. It belongs to a family of molecules that provide costimulatory and coinhibitory signals during T-cell activation. It was shown to inhibit the expansion and IL-9 production of Th9 cells and thus to reduce inflammation and pathogenicity [13]. CD200R1 is expressed on T-cells as well as on myeloid cells. It was reported to alter the balance between Th17 cells and regulatory T-cells in SLE patients [14] and has also been identified as a genetic susceptibility factor for JIA, especially oligoarticular JIA [15]. Aberrant expression of CD200R1 was shown to contribute to abnormal Th17 cell differentiation and chemotaxis in patients with rheumatoid arthritis [15].
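
The leave-one-out evaluation described above can be illustrated with a minimal sketch. It is not the study's pipeline: a nearest-centroid scorer stands in for the Random Forest to keep the example dependency-free, the toy expression values are invented, and the helper names (`auc`, `centroid`, `sq_dist`, `loo_scores`) are hypothetical.

```python
from statistics import mean

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels are 1 (positive) or 0 (negative); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_scores(X, y):
    """Leave-one-out: score each sample with a model fit on all the others."""
    scores = []
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        c_pos = centroid([x for x, lab in train if lab == 1])
        c_neg = centroid([x for x, lab in train if lab == 0])
        # Higher score = closer to the positive centroid than to the negative one.
        scores.append(sq_dist(X[i], c_neg) - sq_dist(X[i], c_pos))
    return scores

# Toy data (invented): two "genes" per sample, 1 = rheumatic case, 0 = control.
X = [[5.1, 2.0], [4.8, 2.2], [5.5, 1.9], [2.0, 4.1], [2.3, 3.8], [1.9, 4.4]]
y = [1, 1, 1, 0, 0, 0]

held_out = loo_scores(X, y)
print(f"LOO AUC = {auc(held_out, y):.2f}")
```

Because every sample is scored by a model that never saw it, the resulting AUC estimates performance on unseen patients, which matters when group sizes are as small as in this dataset.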

Fig. 2. ROC curves and AUC values from leave-one-out cross-validation of the classifiers between (a) negative controls (i.e., control), virally infected subjects (i.e., infection), and subjects with rheumatic diseases (i.e., Pedrheum); and more specifically between (b) CRMO, IFN, JIA and control/infection cases

More specific classifiers were then developed per disease group. As the number of rheumatic patients in our dataset was limited, these classifiers focused only on the CRMO, IFN, and JIA groups, which had more subjects for model training and validation than the other disease groups. Three classifiers were developed to distinguish patients with CRMO (n = 6), IFN (n = 6), and JIA (n = 20) from control (n = 35) and infection (n = 46) cases. All three performed well, with AUC values of 0.8 or higher (Fig. 2b). Since CRMO, IFN, and JIA were differentiated well from control and infection cases, we next examined how well they could be distinguished from one another. ROC curves and AUC values of a classifier between CRMO, IFN, and JIA (Figure S2) indicated that IFN could be distinguished relatively well from CRMO and JIA (AUC = 0.7 ± 0.2/0.3), whereas CRMO was not easily differentiated from JIA (AUC = 0.5 ± 0.3), which is likely explained by the limited sample size. The Boruta-identified genes for these classifiers are also presented in Table S2. There were 349 selected genes from the CRMO-control-infection classifier, 247 genes from the IFN-control-infection classifier, and 286 genes from the JIA-control-infection classifier. As expected, more interferon-related genes were selected for the IFN classifier than for the CRMO and JIA classifiers.
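
The idea behind Boruta-style gene selection can be sketched in simplified form: each real feature is compared against shuffled "shadow" copies, and only features that repeatedly beat the best shadow are kept. The sketch below makes two loudly stated simplifications: the importance measure is an absolute difference of class means rather than a Random Forest importance, and a fixed hit-rate threshold replaces Boruta's statistical test. The function names, gene names, and values are hypothetical.

```python
import random
from statistics import mean

def importance(values, labels):
    """Stand-in importance: absolute difference between class means.

    Assumption: the study relies on Random Forest importances; this proxy
    is used only to keep the sketch free of external dependencies.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def shadow_select(columns, labels, n_iter=30, min_hit_rate=0.5, seed=0):
    """Keep features that beat the best shuffled shadow feature in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        best_shadow = 0.0
        for values in columns.values():
            shadow = values[:]      # copy, then permute to break any link to the labels
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, importance(shadow, labels))
        for name, values in columns.items():
            if importance(values, labels) > best_shadow:
                hits[name] += 1
    # Simplification: a fixed threshold instead of Boruta's statistical test.
    return [name for name, h in hits.items() if h / n_iter > min_hit_rate]

# Toy data (invented): one gene tracks the label, the other does not.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "gene_signal": [10, 11, 12, 13, 14, 1, 2, 3, 4, 5],
    "gene_noise": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
}
selected = shadow_select(columns, labels)
print(selected)
```

Comparing against shuffled copies gives each feature a data-driven null reference, so a gene is retained only when its association with the label exceeds what chance alone produces in the same data.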

Differential expression and gene ontology enrichment analyses

We analyzed the differentially expressed genes (DEGs) of CRMO, IFN, and JIA versus controls. The resulting DEGs were mapped to the corresponding Gene Ontology (GO) categories to understand which pathways are involved in the disease pathophysiology. Many of the top 10 GO categories of the CRMO, IFN, and JIA groups are related to innate immunity, including myeloid leukocyte and granulocyte activation as well as neutrophil activation and degranulation (Fig. 3a and Table S3). In the IFN group in particular, immunity appears to be largely mediated by antibacterial and antifungal defense responses. Results from GO analyses of CRMO, IFN, and JIA against the other Pedrheum groups are displayed in Figure S3 and Table S3. Although the classifiers could not adequately differentiate between CRMO and JIA, we noted that 1,106 DEGs were found between CRMO and all other Pedrheum groups, 1,730 DEGs in the case of IFN, and 1,216 DEGs for JIA (Table S5). Additionally, more than 170 DEGs were found in each pairwise comparison: CRMO versus IFN, CRMO versus JIA, and IFN versus JIA (Table S5).
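
Mapping DEGs to enriched GO categories is commonly done with an over-representation test. This section does not state which tool or statistic was used, so the sketch below assumes a one-sided hypergeometric test; the function names and the toy gene sets are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def enrichment_p(degs, term_genes, background):
    """One-sided over-representation p-value for a single GO term."""
    background = set(background)
    degs = set(degs) & background        # only genes that were actually measured
    term = set(term_genes) & background
    overlap = len(degs & term)
    return hypergeom_sf(overlap, len(background), len(term), len(degs))

# Toy example (invented): 20 measured genes, 5 annotated to the term,
# 6 DEGs of which 4 carry the annotation.
background = [f"g{i}" for i in range(20)]
term_genes = ["g0", "g1", "g2", "g3", "g4"]
degs = ["g0", "g1", "g2", "g3", "g10", "g11"]

p = enrichment_p(degs, term_genes, background)
print(f"p = {p:.4f}")
```

In practice the test is repeated for every GO term, so the resulting p-values need a multiple-testing correction before the top categories are ranked.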

ISG scores

Using the whole blood gene expression obtained from 3’ mRNA sequencing, we calculated the ISG scores of IFN patients and compared them with those of the other disease groups. As displayed in Fig. 3b, IFN patients had the highest mean score, at 18. The other groups had lower mean scores than IFN (6.0 for AID, 4.0 for CRMO, 9.0 for HLA-B51, 3.9 for JIA, 7.8 for vasculitis, and 14 for infection cases) but did include some patients with particularly high scores: one AID patient had a score of 58, one HLA-B51 patient had a score of 48, and one vasculitis patient had a score of 44. Interestingly, longitudinal tracking of ISG scores proved feasible using 3’ mRNA sequencing. Indeed, we showed that one patient with Aicardi-Goutières syndrome had markedly elevated ISG scores at early presentations, which decreased following initiation of JAK-inhibition via tofacitinib (see Figure S4).
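
This section does not restate how the ISG score is computed. One common convention, assumed here and not confirmed by the text, takes the median across a panel of interferon-stimulated genes of each gene's expression relative to the control median. The gene names, panel, and expression values below are placeholders.

```python
from statistics import median

def isg_score(sample, controls, panel):
    """Median over panel genes of sample expression / median control expression.

    Assumption: every panel gene has a non-zero control median.
    """
    ratios = []
    for gene in panel:
        baseline = median(c[gene] for c in controls)
        ratios.append(sample[gene] / baseline)
    return median(ratios)

# Placeholder panel and values (invented for illustration only).
panel = ["ISG_A", "ISG_B", "ISG_C"]
controls = [
    {"ISG_A": 10, "ISG_B": 20, "ISG_C": 5},
    {"ISG_A": 12, "ISG_B": 18, "ISG_C": 5},
    {"ISG_A": 8, "ISG_B": 22, "ISG_C": 5},
]

# Hypothetical longitudinal samples from one patient, before and after treatment.
visits = {
    "baseline": {"ISG_A": 200, "ISG_B": 100, "ISG_C": 150},
    "follow_up": {"ISG_A": 30, "ISG_B": 40, "ISG_C": 20},
}
scores = {visit: isg_score(expr, controls, panel) for visit, expr in visits.items()}
print(scores)
```

Under this convention, a control-like sample scores close to 1, and taking the median limits the influence of any single outlying gene.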
