RASGRF2 as a potential pathogenic gene mediating the progression of alcoholic hepatitis to alcohol-related cirrhosis and hepatocellular carcinoma

2.1 Bulk transcriptome data preprocessing

To construct the discovery set for bulk transcriptome analysis, multiple AH and HCC datasets were included. For AH, two RNA-Seq datasets, GSE142530 [11] and GSE167308 [12], were used. For GSE142530, 10 samples from patients with AH and 12 from healthy controls were included; six samples from patients with cirrhosis without AH were excluded from the analysis. For GSE167308, seven samples from patients with AH and five from healthy controls were included; seven samples from patients with AC were excluded. Ultimately, liver samples from 17 patients with AH and 17 healthy controls were analyzed.

For HCC, two RNA-Seq datasets were selected: GSE184733 [13] and GSE148355 [14]. For GSE184733, samples from 17 patients with HCC and 17 healthy controls were included. For GSE148355, after excluding duplicate samples and samples from patients with other conditions (fibrosis, cirrhosis, or dysplastic nodule), samples from 60 patients with HCC and 15 healthy controls were included. In total, samples from 77 patients with HCC and 32 healthy controls were included.

The high-throughput sequencing data used in this study were generated on the Illumina platform. Raw read counts were downloaded from the Gene Expression Omnibus (GEO) database for analysis. Data were analyzed using R (4.4.1). For each disease, batch effects between samples from different datasets were corrected using the ComBat-seq function in the sva package [15]. The trimmed mean of M values (TMM) algorithm from the edgeR package was then used to normalize the data, and differentially expressed genes (DEGs) were identified [16]. P values were adjusted for multiple testing using the Benjamini–Hochberg method. Gene co-expression networks were constructed using the WGCNA package [17]. The optimal soft threshold was selected automatically by the software during the calculation process. The results were visualized using the ggplot2 package [18].
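As an illustration of the Benjamini–Hochberg adjustment mentioned above, the following is a minimal Python sketch. It is not the code used in the study, where the analysis was run in R; the function name `benjamini_hochberg` and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted P value falls below the chosen threshold would then be reported as DEGs; the threshold itself is not specified in this section.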

Potential shared genes screened by the machine learning models were combined with the immune infiltration correlation results and then validated in independent bulk transcriptome cohorts to determine the core shared genes. For AH, GSE143318 [19] was used as an independent validation cohort, including samples from 12 patients with AH, 6 patients with AC, and 7 healthy controls. GSE112790 [20] was used as the validation cohort for HCC, comprising samples from 183 patients with HCC and 15 healthy controls.

2.2 Functional analysis of potential shared genes

The STRING database [21] was used to construct an interaction network for the selected potential shared genes. Additionally, the KEGG and GO databases were used on the OmicShare platform [22] to annotate gene functions and draw functional bubble charts.

2.3 Machine learning to screen potentially shared genes

The mlr3 [23] package was used to develop machine learning models, employing two algorithms: support vector machine (SVM) and random forest (RF). Feature selection was conducted using recursive feature elimination (RFE). A line chart depicting the relationship between receiver operating characteristic (ROC) values and feature numbers was generated to identify the differential gene set with the highest area under the curve (AUC) on the ROC curve. The intersection of the differential gene sets identified by both algorithms was determined. The gene expression levels in each disease were subsequently compared. Gene sets that were simultaneously upregulated or downregulated in both diseases were retained for further verification using the bulk transcriptome data of the validation set.
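The final filtering logic described above (intersect the gene sets chosen by the two algorithms, then keep genes that change in the same direction in both diseases) can be sketched in Python as follows. This is an illustration under assumptions, not the study's mlr3 code: the function name, the use of log fold changes to define direction, and the placeholder gene names are all hypothetical.

```python
def concordant_shared_genes(svm_genes, rf_genes, log_fc_ah, log_fc_hcc):
    """Keep genes selected by both algorithms that move in the same
    direction in AH and HCC.

    svm_genes, rf_genes: iterables of gene names chosen by each algorithm.
    log_fc_ah, log_fc_hcc: dicts mapping gene name to log fold change
    (disease vs. control); using log fold change here is an assumption.
    """
    shared = set(svm_genes) & set(rf_genes)
    kept = {}
    for gene in sorted(shared):
        if gene not in log_fc_ah or gene not in log_fc_hcc:
            continue  # direction cannot be compared without both values
        ah, hcc = log_fc_ah[gene], log_fc_hcc[gene]
        if ah > 0 and hcc > 0:
            kept[gene] = "up"
        elif ah < 0 and hcc < 0:
            kept[gene] = "down"
    return kept


if __name__ == "__main__":
    # Placeholder gene names and values, for illustration only.
    svm = ["GENE_A", "GENE_B", "GENE_C"]
    rf = ["GENE_B", "GENE_C", "GENE_D"]
    ah = {"GENE_B": 1.2, "GENE_C": -0.8, "GENE_D": 0.5}
    hcc = {"GENE_B": 0.9, "GENE_C": 0.4, "GENE_D": 0.7}
    print(concordant_shared_genes(svm, rf, ah, hcc))
```

In this toy input only GENE_B survives: GENE_C is selected by both algorithms but changes in opposite directions, and GENE_D is selected by only one algorithm.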

2.4 Immune infiltration analysis

The immune infiltration in the discovery cohorts of AH and HCC was evaluated through a CIBERSORT analysis [24] using the IOBR2 package [25]. Spearman’s correlation analysis was applied to determine the correlation between the potentially shared genes and immune infiltration indicators.
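For readers unfamiliar with Spearman’s correlation, the sketch below shows the calculation in plain Python: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The function names and example vectors are hypothetical; the study itself performed this step in R.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical gene expression vs. immune cell fraction per sample.
    expression = [2.1, 3.4, 1.0, 5.6, 4.2]
    cell_fraction = [0.10, 0.15, 0.05, 0.30, 0.12]
    print(spearman_rho(expression, cell_fraction))
```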

2.5 Single-cell sequencing analysis

For the single-cell analysis of the relationship between AH, AC, and potentially shared genes, we included data from GSE255772 [26] (including liver tissues from 5 patients with AH and 1 patient with AC) and GSE136103 [27] (including liver tissues from 2 patients with AC, 2 patients with NAFLD cirrhosis, 1 patient with PBC cirrhosis, and 5 healthy controls). In this study, patients with NAFLD cirrhosis and PBC cirrhosis were included in the non-alcoholic liver fibrosis (NALC) group for analysis and compared with the AC group. For single-cell analysis of HCC, the GSE242889 [28] dataset (including liver tissues from 5 healthy controls and 5 patients with HCC) was included.

The analysis was conducted using Seurat (V5.1.0) [29]. The canonical correlation analysis (CCA) integration algorithm was used to remove batch effects between samples. CellMarker (V2.0) [30] and the ACT tool [31] were used to annotate the cell populations, while the FeaturePlot() function was used to visualize the expression of potentially shared genes in different cells. CellChat (2.1.0) [32] was employed for cell communication analysis.

2.6 Spatial transcriptomics analysis

The spatial transcriptomics data published by Wu et al. [33] were downloaded from the Genome Sequence Archive (HRA000437). Seurat (V5.1.0) was used for data standardization, dimensionality reduction, cell clustering, and visualization of the expression levels of core shared genes in tumor and non-tumor tissues.

2.7 Immunohistochemical experiment

The liver cancer tissue microarray (ZL-LivHcc962) was purchased from Shanghai Zhuoli Biotech Co., Ltd. (Shanghai, China). Conventional immunohistochemical staining was used to detect the expression of RASGRF2 (26788-1-AP, Proteintech Group, IL, USA) in liver cancer and para-cancerous samples in the tissue microarray. The ethical approval number of the tissue microarray is LLS M-15-01. The deep learning (artificial intelligence [AI]) analysis module in the Visiopharm software (Denmark) was used to calculate the histochemical score for each sample, enabling a semi-quantitative analysis of the immunohistochemical results.
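The histochemical score is conventionally defined as the weighted sum of the percentages of weakly, moderately, and strongly stained cells, giving a value between 0 and 300. The Python sketch below implements that conventional definition only; how Visiopharm computes the score internally is not described here, so the function name, the three intensity bins, and the example percentages should be read as assumptions.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-score: 1*weak% + 2*moderate% + 3*strong% (0 to 300).

    Each argument is the percentage (0-100) of cells in that staining
    intensity bin; unstained cells contribute nothing to the score.
    """
    percentages = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in percentages):
        raise ValueError("percentages must be non-negative")
    if sum(percentages) > 100 + 1e-9:
        raise ValueError("stained percentages cannot exceed 100 in total")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


if __name__ == "__main__":
    # Hypothetical sample: 20% weak, 30% moderate, 10% strong staining.
    print(h_score(20, 30, 10))
```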

2.8 Construction of RASGRF2 overexpression cell line

HepG2 cells (Xiamen Immocell Biotechnology Co., Ltd.) were cultured in a cell incubator at 37 °C, 95% air, and 5% CO2 with DMEM + 10% FBS + 1% P/S. The cells were divided into two groups: the OE-NC group (control cells transfected with a control plasmid) and the OE-RASGRF2 group (RASGRF2 overexpression cells transfected with a RASGRF2 overexpression vector). The pc-RASGRF2 lentiviral overexpression vector (synthesized by Shanghai Sangon Biotech Co., Ltd.) was constructed using the pcDNA3.1 vector. The RASGRF2 sequence (NM_006909.3:377-4090) was obtained from the National Center for Biotechnology Information (NCBI), and RASGRF2 was inserted into the pcDNA3.1 vector to construct the pcDNA3.1-RASGRF2 plasmid. After vector construction, the vector was transfected into the cells using Lipofectamine 3000.

The transcription of RASGRF2 in the OE-NC and OE-RASGRF2 groups was verified through RT-qPCR experiments. The primers were synthesized by Shanghai Sangon Biotech Co., Ltd. The primer sequences were as follows:

Primers     Direction   Sequence (5′ → 3′)
GAPDH       Forward     AATGGGCAGCCGTTAGGAAA
GAPDH       Reverse     GCGCCCAATACGACCAAATC
RASGRF2     Forward     CCTGTACCTGGCCTTTCTGG
RASGRF2     Reverse     CCTCGGCCGTCTTCTTACTC
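The text does not state how relative RASGRF2 expression was calculated from the RT-qPCR data. A common choice when a reference gene such as GAPDH is measured alongside the target is the 2^(-ΔΔCt) method; the Python sketch below illustrates that method purely as an assumption, with a hypothetical function name and made-up Ct values.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^(-ddCt) method.

    'treated' would correspond to OE-RASGRF2 and 'control' to OE-NC;
    'ref' is the reference gene (GAPDH). Assumes roughly 100% primer
    efficiency for both primer pairs.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Made-up Ct values for illustration only.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

A fold change above 1 would indicate higher RASGRF2 transcription in the OE-RASGRF2 group than in the OE-NC group.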

2.9 Detection of cell migration, invasion, and proliferation ability

For the cell proliferation experiment, cells in the logarithmic growth phase were seeded at 5,000 cells/well in a 96-well plate and incubated at 37 °C with 5% CO2 for 6 h. After the cells had adhered, they were treated according to their assigned experimental groups, with six replicates per group. The cells in each group were incubated for 24 h. Ten microliters of Cell Counting Kit-8 (CCK8) solution was added to each well 2 h before completion of the incubation. After incubation, the optical density at 450 nm was measured using a microplate reader.

For the cell scratch test, reference lines were drawn on the back of a 6-well plate, with parallel lines spaced 0.5 cm apart. Cells from each experimental group were collected in the logarithmic growth phase, and the cell suspension concentration was adjusted after counting. Then, 1 × 10⁶ cells were seeded into each well of the 6-well plate. When the cells reached more than 90% confluence, a pipette tip was used to make a scratch perpendicular to the lines on the back of the plate. After washing twice with PBS, images were captured under a microscope to record the size of the scratch at 0 h. After 24 h of incubation, the plate was removed from the incubator and photographed under a microscope to record the size of the scratch at 24 h.
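The scratch sizes recorded at 0 h and 24 h are typically summarized as a percentage of wound closure. The exact formula used in the study is not given, so the Python sketch below shows one common definition as an assumption; the function name and the example measurements are hypothetical.

```python
def wound_closure_percent(size_0h, size_24h):
    """Percentage of the initial scratch that has closed after 24 h.

    size_0h and size_24h are scratch sizes (width or area) measured in
    the same units; a larger result means faster cell migration.
    """
    if size_0h <= 0:
        raise ValueError("initial scratch size must be positive")
    if size_24h < 0:
        raise ValueError("scratch size cannot be negative")
    return 100 * (size_0h - size_24h) / size_0h


if __name__ == "__main__":
    # Hypothetical scratch widths in micrometers.
    print(wound_closure_percent(500, 200))
```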

For the Transwell invasion experiment, Matrigel diluted with serum-free medium (Matrigel to serum-free medium ratio of 1:4) was evenly spread in the Transwell chamber (8-μm pore size) and incubated at 37 °C for 6 h to allow solidification. Cells in the logarithmic growth phase were digested and resuspended in serum-free medium, and the cell density was adjusted to 2 × 10⁵ cells/mL. The cell suspension was then added to the upper chamber, medium containing 10% fetal bovine serum was added to the lower chamber, and the plate was incubated at 37 °C for 24 h. After incubation, the upper chamber was removed and examined under a microscope. Non-invading cells in the upper chamber were wiped off with a cotton swab, and 4% paraformaldehyde was added to the 24-well plate. The upper chamber was fixed for 20 min and then removed, the fixative in the lower chamber was aspirated, and the cells were stained with 0.2% crystal violet (Sigma-Aldrich, USA) for 10 min. The excess staining solution was then washed off with PBS, the upper surface of the chamber was gently wiped with a cotton swab to remove any excess water, and the chamber was air-dried. The invaded cells were observed, photographed, and counted under a microscope, and the cell invasion rate was calculated.

2.10 Statistical analysis

The Shapiro–Wilk test was used to assess the normality of the data, and Bartlett’s test was used to evaluate the homogeneity of variance. Normally distributed data with homogeneous variance were compared between groups using Dunnett’s test (for more than two groups) or Student’s t-test (for two groups). For normally distributed data with unequal variances, Dunnett’s T3 test was used for inter-group comparisons. Non-normally distributed data were analyzed using the Kruskal–Wallis test.
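The test-selection rules above form a small decision tree, which the following Python sketch makes explicit. It only returns the name of the test that would be applied; the function name is hypothetical, and the outcomes of the Shapiro–Wilk and Bartlett’s tests are passed in as booleans because the significance thresholds are not stated in the text.

```python
def choose_test(n_groups, is_normal, equal_variance):
    """Return the name of the comparison test implied by the rules above.

    n_groups: number of groups being compared (at least 2).
    is_normal: True if the Shapiro-Wilk test did not reject normality.
    equal_variance: True if Bartlett's test did not reject homogeneity.
    """
    if n_groups < 2:
        raise ValueError("at least two groups are required")
    if not is_normal:
        return "Kruskal-Wallis test"
    if not equal_variance:
        return "Dunnett's T3 test"
    if n_groups == 2:
        return "Student's t-test"
    return "Dunnett's test"


if __name__ == "__main__":
    # Two normally distributed groups with homogeneous variance,
    # e.g. OE-NC vs. OE-RASGRF2.
    print(choose_test(2, is_normal=True, equal_variance=True))
```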
