CAPN2 correlates with insulin resistance states in PCOS as evidenced by multi-dataset analysis

Data resource

For this investigation, we selected multiple datasets published within the past five years to examine the molecular mechanisms of early, localized insulin resistance in granulosa cells of PCOS patients. We included datasets covering a wide range of body types within the PCOS population and excluded those in which participants had diagnosed insulin resistance, since the goal was to identify characteristics of early-stage insulin resistance in PCOS. To ensure the accuracy of the study, datasets in which participants showed minimal differences in BMI were also excluded.

Ultimately, we chose four datasets from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), each consisting of granulosa cell tissue samples from both PCOS patients and control subjects. In all of these datasets, PCOS was diagnosed strictly according to the Rotterdam criteria [14]. Control groups were age-matched with PCOS cases to ensure the comparability of the study groups.

The primary dataset was GSE80432 [15], profiled on the Affymetrix Human Gene 1.0 ST Array (platform GPL6244) and comprising 16 samples split evenly between normal individuals and patients diagnosed with PCOS. This dataset covers a wide range of body types, and therefore varying metabolic states, which provides a broad view of the PCOS spectrum. This diversity makes it well suited as the training set.

GSE155489 [16] served as the validation dataset. It was sequenced on the HiSeq X Ten platform (platform GPL20795) and includes 8 samples, evenly divided between normal and PCOS groups. Despite its smaller size, this dataset played a crucial role in validating the diagnostic markers identified in the primary analysis.

To deepen our understanding of the regulatory mechanisms in PCOS, we constructed a ceRNA network, incorporating the GSE138518 lncRNA dataset [17] and the GSE138572 miRNA dataset [17], both analyzed on the Illumina HiSeq 2000 platform (Platform GPL11154). The former features 6 samples, equally divided between normal and PCOS subjects, while the latter comprises 10 samples, also evenly split between the two groups.

WGCNA

The ssGSEA algorithm calculates IR scores for each sample by first ranking all genes according to their expression levels. It then assesses the relative position of IR-related genes within this ranked list to compute an enrichment score. This score quantifies the degree to which IR-related genes are overrepresented at the top of the ranked gene list, providing a numerical IR score for each sample. The approach allows for the direct quantification of IR pathway activity in individual samples based on gene expression data.

The gene set used for calculating IR scores comprised 80 IR-related genes from MSigDB, retrieved with the keyword ‘Insulin Resistance’ (gene set HP_INSULIN_RESISTANCE). MSigDB served as the gene set resource for our enrichment analysis, supporting an in-depth exploration of the molecular basis of insulin resistance in PCOS.
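
The rank-based scoring described above can be illustrated with a minimal sketch. This is a simplified, unnormalized version of the ssGSEA running-sum statistic, not the implementation used in the study; the function name, the weighting exponent alpha = 0.25, and the toy gene identifiers are assumptions made for illustration only.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score (illustrative sketch).

    expression: dict mapping gene -> expression value for ONE sample.
    gene_set:   set of gene identifiers (e.g. IR-related genes).
    alpha:      rank-weighting exponent (assumed value, not from the paper).
    """
    # Rank genes from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    hits = [gene in gene_set for gene in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must overlap partially with the ranked list")

    # Genes in the set are weighted by their rank (top gene has rank n).
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(hits)]
    total_weight = sum(weights)
    miss_step = 1.0 / (n - n_hits)

    p_hit = p_miss = score = 0.0
    for weight, hit in zip(weights, hits):
        if hit:
            p_hit += weight / total_weight
        else:
            p_miss += miss_step
        # Accumulate the gap between the two running distributions.
        score += p_hit - p_miss
    return score


# Toy sample with hypothetical gene names.
sample = {"G1": 10.0, "G2": 9.0, "G3": 8.0, "G4": 1.0, "G5": 0.5, "G6": 0.1}
top_heavy = ssgsea_score(sample, {"G1", "G2"})
bottom_heavy = ssgsea_score(sample, {"G5", "G6"})
```

A gene set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score, which is the property the IR score relies on.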

After quantifying IR scores, we preprocessed the normalized gene expression data. Genes with minimal variability across samples, defined as a median absolute deviation (MAD) of 0.1 or lower, were removed. Hierarchical clustering was then used to identify outlier samples, which led to the exclusion of sample GSM2127212 from the analysis. This ensured that only genes with meaningful variability, and samples representative of typical expression patterns, were retained.
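
The variability filter can be sketched as follows. This is a minimal illustration that assumes the unscaled MAD (the default `mad()` in R multiplies by 1.4826, and the text does not state which convention was used); the function names and toy values are hypothetical.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def filter_low_variability(expr, threshold=0.1):
    """Keep genes whose MAD across samples exceeds the threshold.

    expr: dict mapping gene -> list of expression values (one per sample).
    """
    return {gene: vals for gene, vals in expr.items() if mad(vals) > threshold}


# Toy matrix with hypothetical gene names.
expr = {
    "GENE_FLAT": [5.00, 5.02, 4.98, 5.01],  # nearly constant, removed
    "GENE_VAR": [2.0, 4.5, 7.1, 3.3],       # variable, kept
}
kept = filter_low_variability(expr)
```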

With the cleaned dataset, we applied WGCNA to identify gene modules closely associated with IR. A soft-thresholding power was chosen to optimize the network topology, which is essential for constructing a meaningful gene co-expression network. The minimum module size was set to 70 genes to ensure a robust analysis.
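
To show what soft thresholding does, the sketch below builds an unsigned co-expression adjacency by raising the absolute Pearson correlation to a power beta. It is an illustration only: the value beta = 6, the unsigned network type, and the gene names are assumptions, since the text does not report the power that was selected.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for all gene pairs."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression profiles with hypothetical gene names.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with g1
    "g3": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with g1
}
adjacency = soft_threshold_adjacency(expr, beta=6)
```

Raising correlations to a power keeps strong connections close to 1 while pushing moderate ones toward 0, which is what moves the network toward a scale-free topology.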

The correlation between these gene modules and the quantified IR trait was subsequently calculated. This step enabled us to pinpoint key modules that exhibit a significant association with insulin resistance, underlining the modules’ potential role in the pathophysiology of PCOS related to IR. The integration of IR scores derived from ssGSEA with WGCNA highlighted the importance of a quantitative approach to understanding the genetic underpinnings of insulin resistance in PCOS patients.

Differential gene expression analysis

For the GSE80432 and GSE155489 datasets from GEO, the differential analysis method was chosen according to the characteristics of the downloaded data. For GSE80432, the limma package was used to assess differential expression between PCOS and normal samples within the mRNA expression matrix [18]. Limma, known for its robustness with small sample sizes and complex experimental designs, fits linear models and uses empirical Bayes methods to obtain more precise variance estimates. For GSE155489, the DESeq2 package [19] was used, a method well suited to count data from RNA sequencing experiments. This approach also fits models to the data but is specifically designed to handle the discrete count nature of sequencing data. For both datasets, genes with P < 0.05 were considered differentially expressed genes (DEGs) between PCOS and normal samples. Hub genes identified from WGCNA were then intersected with the DEGs from these datasets to obtain a set of candidate genes. Enrichment analyses for these genes, including GO and KEGG, were conducted using the R package clusterProfiler [20] with the same significance threshold.

Machine learning refinement of identified DEGs

For the GSE80432 dataset, feature dimensionality was first reduced using LASSO [21] logistic regression via R’s ‘glmnet’ package, which selects genes based on expression and group labels for effective sample classification. Key genes were then ranked using the SVM [22] algorithm with RFE [23] through the ‘e1071’ package, which assesses each gene’s importance and ranks genes by error rate and accuracy. The Boruta method was then applied to further refine feature selection [24]. This algorithm uses Random Forest classification to iteratively compare actual features against randomly generated shadow features, retaining those that are consistently more important. The final set of characteristic genes was determined by intersecting the features identified by LASSO, SVM-RFE, and Boruta using the jVenn tool [25].
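
The final intersection step amounts to a set operation over the three selections. A minimal sketch follows; the gene identifiers are placeholders and do not represent the genes selected in the study.

```python
def consensus_features(*selections):
    """Return the features present in every selection, sorted by name."""
    sets = [set(selection) for selection in selections]
    return sorted(set.intersection(*sets))


# Placeholder outputs of the three selectors (hypothetical gene names).
lasso_genes = ["GENE_A", "GENE_B", "GENE_C"]
svm_rfe_genes = ["GENE_B", "GENE_C", "GENE_D"]
boruta_genes = ["GENE_C", "GENE_B", "GENE_E"]

characteristic_genes = consensus_features(lasso_genes, svm_rfe_genes, boruta_genes)
```

Requiring agreement across all three methods trades sensitivity for specificity: a gene is retained only if each selector, with its own biases, independently supports it.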

Construction of ceRNA Network and Drug Target Prediction in IR-Related PCOS

To further investigate the role of key genes in insulin resistance-related PCOS, the study focused on elucidating the ceRNA regulatory network and identifying potential therapeutic targets. Differential expression analysis of lncRNAs and miRNAs was conducted on GSE138518 and GSE138572 datasets using the DESeq2 package. The analysis concentrated on mRNA-miRNA pairs with opposite regulation patterns, and lncRNA predictions were exclusively performed using the ENCORI database to select miRNA-lncRNA pairs demonstrating inverse regulation [26]. This led to the construction of a PCOS-specific ceRNA network based on the interactions of mRNA, miRNA, and lncRNA. The final phase involved identifying potential drug targets by querying each key gene against the CTD database (https://ctdbase.org/) [27], and the relationships between these drugs and key genes were visualized using Cytoscape software, forming a comprehensive drug-target network. This approach aims to advance the development of targeted therapies for IR-related PCOS.
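
The selection of pairs with opposite regulation can be sketched as a filter on the signs of the log2 fold changes. This is an illustration under assumptions: the function name, the input layout, and all identifiers and values are hypothetical, and predicted interactions would in practice come from a database such as ENCORI.

```python
def inverse_pairs(predicted_pairs, log2fc_left, log2fc_right):
    """Keep predicted pairs whose two members change in opposite directions.

    predicted_pairs: iterable of (left_id, right_id) interactions.
    log2fc_left / log2fc_right: dicts mapping id -> log2 fold change
    for differentially expressed molecules only.
    """
    return [
        (left, right)
        for left, right in predicted_pairs
        if left in log2fc_left
        and right in log2fc_right
        and log2fc_left[left] * log2fc_right[right] < 0
    ]


# Hypothetical predicted mRNA-miRNA interactions and fold changes.
pairs = [("MRNA_1", "MIR_1"), ("MRNA_1", "MIR_2"), ("MRNA_2", "MIR_3")]
mrna_fc = {"MRNA_1": 1.4}                 # up-regulated in PCOS
mirna_fc = {"MIR_1": -0.9, "MIR_2": 0.7}  # MIR_3 not differentially expressed

kept_pairs = inverse_pairs(pairs, mrna_fc, mirna_fc)
```

The same filter can be applied a second time to miRNA-lncRNA pairs, after which the retained mRNA-miRNA and miRNA-lncRNA pairs that share a miRNA form the ceRNA network.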

Specimen collection and qPCR procedures

Following approval from the Ethics Committee of the First People’s Hospital of Yunnan Province, granulosa cell tissues were obtained from patients diagnosed with PCOS. These patients were diagnosed based on the Rotterdam criteria, encompassing oligo- or anovulation, clinical and/or biochemical signs of hyperandrogenism, and the presence of polycystic ovaries. Patients with other endocrine disorders or gynecological conditions mimicking PCOS were excluded from the study. Informed consent was secured from all participants before tissue collection. The granulosa cells were harvested during routine oocyte retrieval procedures, typically part of IVF treatments, and were either immediately processed for RNA extraction or stored at -80 °C for subsequent analysis.

For qPCR analysis, total RNA was extracted from granulosa cell tissues using the TRIzol method (Thermo Fisher Scientific, Waltham, MA, USA). The integrity and concentration of the RNA were determined using the NanoDrop ND-1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). For mRNA and lncRNA, the RNA was reverse-transcribed into cDNA using the Quantscript RT Kit (KR103, TIANGEN, Beijing, China). For miRNA analysis, cDNA synthesis was performed using the miRcute Plus miRNA First-Strand cDNA Kit (KR211, TIANGEN, Beijing, China).

qPCR was conducted on a Bio-Rad thermal cycler (CFX96 Touch, Hercules, CA, USA). The FastReal qPCR PreMix (SYBR Green, FP217, TIANGEN, Beijing, China) was used for mRNA/lncRNA analysis, and the miRcute Plus miRNA qPCR Kit (SYBR Green, FP411, TIANGEN, Beijing, China) for miRNA analysis. The qPCR conditions included an initial denaturation at 95 °C for 3 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 30 s. Expression levels of target genes and miRNAs were normalized to the housekeeping gene GAPDH and the internal control hsa-U6 (hsa-U6 qPCR Primer, CD201-0145, TIANGEN, Beijing, China), respectively, and calculated using the 2^-ΔCt method. Primers for hsa-miR-433-3p (hsa-miR-433-3p qPCR Primer, CD201-0478, TIANGEN, Beijing, China) were used, and details of other primers are provided in Supplementary Table 1. All reactions were performed in triplicate to ensure accuracy and reproducibility.
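
For reference, the 2^-ΔCt calculation can be written out as a short sketch, where ΔCt = Ct(target) - Ct(reference). Averaging the technical triplicates before taking the difference is an assumption, as are the function name and the Ct values shown.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^-dCt method.

    ct_target:    Ct values of the target (technical replicates).
    ct_reference: Ct values of the reference gene (e.g. GAPDH or U6).
    """
    mean_target = sum(ct_target) / len(ct_target)
    mean_reference = sum(ct_reference) / len(ct_reference)
    delta_ct = mean_target - mean_reference
    return 2 ** (-delta_ct)


# Hypothetical triplicate Ct values: the target crosses the threshold
# four cycles after the reference, so its relative level is 2^-4.
level = relative_expression([24.0, 24.0, 24.0], [20.0, 20.0, 20.0])
```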

Analyzing the IR-related differential molecular markers in PCOS

The bioinformatics findings on IR-related mRNA, lncRNA, and miRNA in PCOS were validated using Prism 9 (GraphPad Software, San Diego, CA, USA). qPCR data from PCOS patient samples were compared using unpaired t-tests, with a two-tailed P < 0.05 indicating statistical significance. The capacity of these markers to differentiate PCOS was further assessed by ROC curve analysis, summarized by the AUC. This process aimed to align empirical data with bioinformatics predictions, thereby confirming the roles of these markers in the context of PCOS.
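
The AUC has a direct interpretation that a short sketch makes concrete: it is the probability that a randomly chosen PCOS sample has a higher marker value than a randomly chosen control, with ties counted as one half. The code below is an illustration of that definition, not the procedure used in Prism; the function name and the expression values are hypothetical.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Assumes higher marker values indicate the case group; for a marker
    that is lower in cases the result falls below 0.5.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical relative expression values for one marker.
pcos = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.5]
auc = roc_auc(pcos, controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.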
