Obesity is a prevalent and escalating health issue, closely associated with severe conditions such as cardiovascular diseases, type 2 diabetes, cancer, and neuropsychiatric disorders (Lopez-Jimenez et al., 2022; Khan and Hegde, 2020; Martins, 2013). Globally, obesity rates have more than tripled since 1975, with recent data indicating that over 650 million adults are obese, contributing significantly to the global burden of non-communicable diseases and healthcare costs (Mohajan and Mohajan, 2023). This condition not only contributes significantly to the global burden of non-communicable diseases but also imposes a heavy psychological and physical toll on individuals, often leading to depression, anxiety, reduced mobility, and diminished quality of life. Additionally, obesity is associated with increased healthcare costs and reduced productivity, thereby impacting both personal wellbeing and socioeconomic stability. The urgent need for effective pharmacological interventions is undeniable. Current treatments, including phentermine (Smith et al., 2013), fluoxetine (Wise, 1992), orlistat (Ballinger and Peikin, 2002), sibutramine (Padwal and Majumdar, 2007), and rimonabant (Curioni and André, 2006), each with distinct mechanisms of action. Phentermine and sibutramine primarily work by suppressing appetite through central nervous system stimulation, while orlistat inhibits lipid absorption in the gastrointestinal tract. Fluoxetine and rimonabant modulate neurotransmitter activity to influence appetite and satiety. Despite their efficacy, these pharmacological interventions are often associated with adverse effects, including nausea, dizziness, insomnia, and gastrointestinal discomfort, which limit their long-term use for obesity management (Velazquez and Apovian, 2018; Patel, 2015). These challenges, combined with the difficulty of maintaining a healthy lifestyle and the invasiveness of surgery, have sparked interest in natural therapies (Acosta et al., 2014). Traditional Chinese medicine, known for its milder side effects, has emerged as a promising alternative (Qi et al., 2015).
Plant-derived compounds have attracted considerable attention for their potential role in obesity management due to their multifaceted mechanisms of action, lower toxicity, and diverse bioactive components. These compounds, which include polyphenols, flavonoids, alkaloids, and terpenes, are known to interact with key metabolic pathways, influence lipid metabolism, and exhibit antioxidant and anti-inflammatory properties that can address various obesity-related health issues. For instance, polyphenols such as curcumin, resveratrol, and proanthocyanidins have demonstrated the ability to modulate adipocyte differentiation, reduce lipid accumulation, and improve insulin sensitivity, making them valuable in the fight against obesity (Sergent et al., 2012; Boccellino and D’Angelo, 2020). Additionally, Garcinia cambogia extract, which contains hydroxycitric acid, is widely used for weight management without toxic effects (Onakpoya et al., 2011; Semwal et al., 2015).
One particularly promising natural therapy is the use of Nelumbo nucifera leaves (Nelumbo nucifera) (Wang et al., 2021), which have been employed for their anti-obesity properties since the Ming Dynasty in China over 1,000 years ago (Fan et al., 2021; Zheng et al., 2010). Initially documented in “The Key to Diagnosis and Treatment,” Nelumbo nucifera leaves have recently gained popularity in China as tea and dietary supplements for weight loss and lipid reduction (Allison et al., 2001). Additionally, Diospyros (D.) Nelumbo nucifera, known for its sedative, anti-diabetic, antiseptic, and anti-tumor properties, has fruits and roots that exhibit anti-proliferative and cytotoxic effects on various cancer cell lines (Rauf et al., 2017). Nelumbo nucifera leaves have also been used to alleviate muscle and joint pain (Sridhar and Bhat, 2007). However, the anti-obesity potential of Nelumbo nucifera leaves remains underexplored (Liu et al., 2023). While synthetic anti-obesity drugs target specific pathways such as appetite suppression and lipid absorption, natural compounds like polyphenols and alkaloids provide a multi-target approach with generally milder side effects. This distinction highlights the complementary potential of plant-derived compounds, which can influence similar metabolic pathways in a holistic manner, offering potentially safer and more sustainable solutions for obesity management.
In recent years, machine learning has revolutionized various domains of science, and its integration into cheminformatics has significantly transformed drug discovery and development processes. Cheminformatics involves the application of computational techniques to solve chemical problems, particularly in drug design and toxicology. Machine learning, a subset of artificial intelligence, has been widely used in cheminformatics to predict molecular properties, bioactivities, and toxicity profiles, enabling the identification of potential drug candidates more efficiently. By learning from large datasets of chemical and biological information, machine learning models can predict the interactions between small molecules and biological targets, facilitating virtual screening, molecular docking, and drug repurposing efforts. These advancements are crucial in identifying novel therapeutic agents, including those derived from natural products, such as Traditional Chinese Medicine (TCM) (Nestler, 2002).
This research adopts a comprehensive approach utilizing various computational biology methods. Sixteen highly absorbable small molecules from Nelumbo nucifera leaves were screened in the Traditional Chinese Medicine Systems Pharmacology Database (TCMSP) (Ru et al., 2014). Clustering analysis identified three representative molecules, and network pharmacology analysis revealed PPARG as their common target gene. Subsequent molecular docking examined the inhibitory effects of these molecules on PPARG, and molecular dynamics simulations explored the underlying mechanisms. By integrating these advanced computational techniques, the study aims to elucidate the molecular basis of the anti-obesity effects of Nelumbo nucifera Leaf Bioactive Compounds, potentially paving the way for the development of new, effective, and safer obesity treatments. Our workflow was shown in Figure 1.
Figure 1. The work flow of our study.
2 Materials and methods2.1 Clustering analysis of small molecule dataUsing the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP, https://tcmsp-e.com/), we screened all active components of Nelumbo nucifera leaves based on criteria of oral bioavailability (OB) ≥ 20% (Veber et al., 2002) and drug-likeness (DL) ≥ 0.18 (Ursu et al., 2011). To analyze the clustering of small molecule similarities, we utilized molecular fingerprints, dimensionality reduction, clustering, and 3D visualization techniques. Specifically, Morgan fingerprints were computed for each compound using RDKit (Landrum, 2013), with a radius of 2 and 2048 bits. SMILES strings were converted to molecular objects for fingerprint generation, excluding invalid ones, and duplicates were removed to ensure uniqueness in the dataset. We applied t-SNE (Van der Maaten and Hinton, 2008) with a perplexity of 30 to reduce the fingerprint data to three dimensions. Hierarchical clustering using Ward’s method was then performed on the t-SNE results (Van der Maaten and Hinton, 2008), setting the number of clusters to three. Each compound was assigned a cluster label, and within each cluster, we calculated a similarity matrix using the Tanimoto coefficient (Bajusz et al., 2015). The molecule with the highest average similarity within each cluster was identified as the representative. A 3D scatter plot was created using Plotly, coloring each compound by cluster assignment and distinctly marking representative molecules, enabling intuitive exploration of small molecule similarities and relationships (He et al., 2022).
2.2 Prediction and analysis of potential obesity targets and Nelumbo nucifera leaf component interactionsIn this study, potential target genes related to obesity were initially identified from four authoritative databases: DisGeNET (Piñero et al., 2016), GeneCards (Safran et al., 2010), PharmGKB (Hewett et al., 2002), and UniProt (UniProt Consortium, 2015). By comprehensively comparing these databases and removing duplicates, we screened a total of 2,063 potential target genes for further analysis. To predict possible interactions between the active components of Nelumbo nucifera leaves and these obesity target genes, we utilized databases and tools such as SEA, SuperPred (Gallo et al., 2022), and SwissTargetPrediction (Daina et al., 2019). These platforms enabled us to perform an in-depth prediction of intersection target genes for the active components found in Nelumbo nucifera leaves.
2.3 Construction and analysis of the Protein-Protein Interaction networkTo explore the potential interactions among key target proteins, we constructed a comprehensive Protein-Protein Interaction (PPI) (Athanasios et al., 2017) network using the STRING database (Wagner and Fischer, 1974), a prominent bioinformatics tool. This network was visualized with the latest version of Cytoscape software (Kohl et al., 2011). We conducted a thorough analysis of the network’s topological features using advanced analytical tools within Cytoscape, identifying ten hub genes that play a crucial role in the disease mechanism.
2.4 Comprehensive enrichment analysis of GO and KEGG pathwaysWe performed enrichment analyses of Gene Ontology (GO) (Harris et al., 2004) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000) pathways using R packages including clusterProfiler (Yu et al., 2012), org. Hs.eg.db (Carlson et al., 2019), and pathview. Gene lists were converted to Entrez IDs and analyzed for enrichment using enrichGO and enrichKEGG with p-value and q-value cutoffs of 0.01 and 0.05, respectively. Results were visualized with dot plots and bar plots, highlighting significant terms and pathways. Additionally, pathview was used to map genes onto KEGG pathways for detailed visualization (Wang et al., 2024).
2.5 Batch molecular docking of active componentsA receptor protein model based on PPARG (PDB ID: 8WFE) was constructed, encompassing 268 residues. The crystal structure was the latest structure available and was determined using the gold standard X-ray diffraction method, with a relatively high resolution of 2.20 Å. Additionally, it is a human-derived protein, and it is free from the conformational influences of other ligands or bound proteins, thus more closely representing the natural conformation. Therefore, we selected this structure. Homology modeling was performed with MODELER 10.1 to include missing residues and domains. Small molecules, initially in SMILES format, were converted to pdbqt format using Open Babel (O’Boyle et al., 2011). Molecular docking was then carried out using AutoDock Vina 1.2.0 (Eberhardt et al., 2021). The docking box was set to dimensions of 29.25 Å × 34.5 Å × 21.0 Å with center coordinates at x = −7.574, y = 10.801, z = 138.847. The docking site was chosen because it was described as the binding site for PPARG inhibitors and inverse agonists (Irwin et al., 2022; Nolte et al., 1998). Batch docking of 18 small molecules with their respective target proteins was conducted. This methodology enabled a detailed evaluation of potential inhibitors by analyzing their predicted interaction strengths with the target enzyme (Song et al., 2024). We also used the R357A PPARG mutant (PDB ID: 4O8F) and the V290M PPARG mutant (PDB ID: 4OJ4) and conducted molecular docking at the same site with a consistent method.
2.6 Machine learningIn this study, machine learning models were developed to predict the activity of PPARG inhibitors based on different molecular fingerprints. Molecular data for PPARG inhibitors were collected from publicly available databases, including CHEMBL (Gaulton et al., 2012) and PUBCHEM (Kim et al., 2019) and were represented as SMILES (Simplified Molecular Input Line Entry System) strings (Weininger, 1988). There were in total 7899 PPARG inhibitors, and for data balancing, we selected 7,900 inactive molecules. These SMILES strings were then converted into various molecular fingerprints to capture different structural and chemical features. The dataset was divided into training and test sets in an 8/2 ratio using the train_test_split function from Scikit-Learn, ensuring that both sets maintained a representative distribution of the data.
Five types of molecular fingerprints were generated: MACCS Keys (Jow et al., 1990), Morgan (Zhong and Guan, 2023), RDKit (Qiao et al., 2021), Topological Torsion (Nilakantan et al., 1987), and Atom Pairs fingerprints (Awale et al., 2015). MACCS Keys fingerprints were generated using RDKit’s GetMACCSKeysFingerprint function, which provides a 166-bit vector representation of predefined structural keys. Morgan fingerprints, analogous to Extended Connectivity Fingerprints (ECFP), were created using GetMorganFingerprintAsBitVect with a radius of 2 and 1,024 bits to capture circular substructures. RDKit fingerprints, generated with RDKFingerprint, encoded molecular substructures based on paths of bonded atoms. Topological Torsion fingerprints, produced by GetHashedTopologicalTorsionFingerprintAsBitVect, represented topological torsions within the molecular structure. Finally, Atom Pairs fingerprints, created using GetHashedAtomPairFingerprintAsBitVect, captured the relationship between atom pairs in terms of their topological distances.
Two machine learning algorithms, Random Forest (RF) (Belgiu and Drăguţ, 2016) and Extreme Gradient Boosting (XGBoost) (Sheridan et al., 2016), were employed for model development. The Random Forest model, implemented using Scikit-Learn’s RandomForestClassifier, is an ensemble method based on decision trees. Hyperparameters for the RF model, including the number of trees, maximum tree depth, minimum samples per split, minimum samples per leaf, and bootstrap sampling, were optimized using GridSearchCV. XGBoost, implemented with the XGBClassifier from the XGBoost library, is a gradient boosting algorithm. Key hyperparameters such as the number of estimators, maximum tree depth, learning rate, subsample ratio, and column sampling by tree were similarly optimized. Both models were trained using 5-fold cross-validation with KFold to ensure robust evaluation and hyperparameter optimization. The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) was used as the primary scoring metric for model selection.
The performance of the trained models was evaluated on the test set using several metrics. These included AUC-ROC, which provides a comprehensive measure of model performance across all classification thresholds, sensitivity (SE) to assess the true positive rate, specificity (SP) to evaluate the true negative rate, and the Matthews Correlation Coefficient (MCC) to measure the overall quality of binary classifications. These metrics collectively provided a thorough assessment of the models’ ability to predict RANKL inhibitory activity based on different molecular fingerprints.
2.7 Molecular dynamics simulationsWe conducted molecular dynamics (MD) simulations (Hollingsworth and Dror, 2018) on two key molecules, Cycloartenol and Sitogluside, using the PMEMD module of Amber 22 (Case et al., 2022) with CUDA acceleration. Each system underwent 100 ns MD simulations. Hydrogen bonds were constrained using the SHAKE algorithm (Elber et al., 2011), and electrostatic interactions were managed with the Particle Mesh Ewald (PME) method (He et al., 2024), with an 8 Å cutoff. Following initial system construction, atomic clashes were resolved through 500 steps of steepest descent and conjugate gradient minimization. Systems were then heated from 0 K to 300 K over 50 ps, followed by density equilibration and constant pressure operations at 300 K for 500 ps in the NPT ensemble. Once the systems stabilized, three 100 ns MD simulations were run, with data recorded every 1 fs using a 2 fs time step and a Langevin thermostat with a 1 ps collision frequency. Data storage occurred every 2 ps, resulting in 2,000 frames for analysis. Trajectory analysis was performed using the CPPTRAJ (Roe et al., 2013) module of Amber 22, assessing RMSD, RMSF, radius of gyration (Rg), and solvent-accessible surface area (SASA). K-means clustering within CPPTRAJ produced 10 representative structures. The binding free energy differences of protein-ligand complexes were estimated using MM-PBSA (Genheden and Ryde, 2015) calculations from 500 snapshots of the final trajectory. This method reduces errors related to covalent energy and is frequently used alongside MM-GBSA to predict binding free energies in a continuum solvent model.
2.7.1 Secondary structure analysisWe analyzed the protein’s secondary structure using AMBER tools. First, the protein structures were aligned, and the secondary structure of each residue for every frame was outputted into a data file. The file was then modified to correct initial settings and enhance visualization by adding lines to set parameters for grid settings, color palettes, and axis labels. These adjustments allowed for accurate plotting of the data. The modifications included mapping the secondary structure data over time and adjusting the output settings to generate a detailed visual representation. Finally, the modified script was executed to produce a comprehensive plot of the protein’s secondary structure, enabling thorough analysis and interpretation of structural changes over time.
2.7.2 Covariance matrix analysisTo analyze the covariance matrix of the protein, we utilized the Bio3D package (Grant et al., 2021) in R. Initially, the protein structure file (PDB) and the trajectory file (DCD) were loaded into the environment. The Bio3D package was then employed to facilitate the analysis. We selected the C-alpha atoms for the analysis to focus on the protein’s backbone. The trajectories were fitted to the reference structure to ensure proper alignment by aligning the C-alpha atoms in both the fixed and mobile structures. Subsequently, the covariance matrix was computed using the aligned coordinates of the C-alpha atoms. Finally, the covariance matrix was visualized to interpret the correlations between the movements of different parts of the protein.
2.7.3 Principal component analysisWe conducted a Principal component analysis (PCA) (Abdi and Williams, 2010) using AMBER tools. First, the root mean square deviation (RMSD) was calculated for the initial structure (residues 1–269, excluding hydrogen atoms). An average structure was generated, and its RMSD was recalculated against the initial structure. A covariance matrix for the same residues was then constructed. Principal component analysis was performed by diagonalizing the covariance matrix, and the first two eigenvectors were extracted. These eigenvectors were used to project the conformational changes of the protein, providing insights into the dominant motions within the molecular dynamics simulation.
2.8 Validation of sitogluside on adipogenesis in 3T3-L1 cellsThe murine embryonic fibroblast cell line, 3T3-L1, was procured from the National Infrastructure of Cell Line Resource (China Infrastructure of Cell Line Resource). Dulbecco’s Modified Eagle’s Medium (DMEM) was purchased from Gibco. Sitogluside was obtained from TargetMol. The triglyceride (TG) assay kit was sourced from Applygen Technologies. Fetal calf serum (FCS) was supplied by Lablead. The modified Oil Red O staining kit was procured from Beyotime. Isobutylmethylxanthine (IBMX) was acquired from Sigma-Aldrich, Dexamethasone from MedChemExpress, and Insulin from Lablead.
Cell Culture and Treatment: 3T3-L1 preadipocytes were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM) supplemented with 10% fetal calf serum, at 37°C in a 5% CO2 atmosphere. Differentiation was induced when cells reached confluence, using a differentiation medium containing 1 µM Dexamethasone, 0.5 mM IBMX, and 10 μg/mL Insulin with 10% FCS for 48 h. Thereafter, the medium was replaced with DMEM containing only 10 μg/mL Insulin and the cells were cultured for an additional 5–7 days, with the medium refreshed every 2 days.
Sitogluside Treatment: Post-differentiation, 3T3-L1 adipocytes were treated with Sitogluside at concentrations of 0 μM, 5 μM, and 10 µM. The treatments were administered in DMEM throughout the experiment to evaluate their effects on lipid accumulation and metabolic activity.
Lipid Accumulation Assessment (Oil Red O Staining): Cells were washed with PBS and fixed in 4% paraformaldehyde for 10 min at room temperature. Fixed cells were stained with Oil Red O for 40 min to visualize lipid droplets. The dye was subsequently eluted with isopropanol and quantified by measuring the absorbance at 490 nm.
Triglyceride Quantification: Triglycerides were extracted using a lysis buffer and quantified using a triglyceride quantification kit according to the manufacturer’s instructions. Absorbance was measured at 570 nm and concentrations were calculated against a standard curve.
Optical Density Measurements: The optical density of the dye extracted from the stained cells was measured at 490 nm using a spectrophotometer, to quantify the lipid content indicative of adipogenesis under various treatment conditions.
All experiments were conducted in triplicate. Data are expressed as mean ± standard deviation (SD). Differences between treated and control groups were analyzed using one-way ANOVA followed by Tukey’s post hoc test, where a p-value of less than 0.05 was considered statistically significant.
3 Results3.1 Targets of active ingredientsAs shown in Figure 2 and Table 1, 18 active molecules in Nelumbo nucifera leaves were clustered, resulting in three categories of molecules. The first category is represented by Sitogluside, the second category by Kaempferol, and the third category by Nuciferine.
Figure 2. Molecular clustering diagram. The diagram shows three clusters of small molecules: the first cluster is green representing Sitogluside, the second cluster is brown representing Kaempferol, and the third cluster is gray representing Nuciferin.
Table 1. Results of small molecule clustering.
Using multiple databases, we predicted targets and intersected the results with collected obesity disease targets. As shown in Figure 3, Sitogluside intersected with 13 genes, Kaempferol with 48 genes, and Nuciferine with 39 genes. Using the Matthews Correlation Coefficient (MCC) algorithm from the cytoHubba toolkit, we identified the crucial nodes within the Nelumbo nucifera leaf-obesity interactome. The MCC scores, indicating the strength of connectivity, were visually represented with varying color intensities, where a deeper hue indicated higher relevance to obesity pathogenicity. We then cataloged the top 10 targets for each active small molecule, revealing key players such as PPARG, as shown in Figure 4.
Figure 3. The Venn diagram illustrates the shared intersection genes between lotus leaf small molecules and obesity disease. (A) The number of shared intersection genes between Sitogluside and obesity disease is 13. (B) The number of shared genes between Kaempferol and obesity disease is 48. (C) The number of shared intersection genes between Nuciferin and obesity disease is 39.
Figure 4. The diagram illustrates protein interactions involved in lotus leaf treatment of obesity. (A) Top 10 gene interactions of Sitogluside. (B) Top 10 gene interactions of Kaempferol. (C) Top 10 gene interactions of Nuciferin.
3.2 KEGG and GO analysisFigures 5–7 collectively present the results of GO and KEGG pathway enrichment analyses for genes related to Nelumbo nucifera leaf small molecules and obesity disease. The GO enrichment analyses (A of each figure) identify key biological processes, cellular components, and molecular functions. In the first figure, significant biological processes include cellular response to chemical stress and regulation of inflammatory response, with cellular components such as membrane rafts and neuronal cell bodies, and molecular functions focusing on protein tyrosine kinase and serine hydrolase activities. The KEGG pathway analysis highlights pathways like endocrine resistance, EGFR tyrosine kinase inhibitor resistance, and several cancer-related pathways including prostate, breast, and gastric cancer. In the second figure, enriched biological processes include organic hydroxy compound transport and vascular processes in the circulatory system, with cellular components like synaptic and presynaptic membranes and molecular functions emphasizing G protein-coupled receptor activity and neurotransmitter binding. Key KEGG pathways include neuroactive ligand-receptor interaction, serotonergic synapse, and chemical carcinogenesis, as well as addiction pathways such as cocaine and amphetamine addiction. The third figure’s GO analysis emphasizes processes like organic hydroxy compound transport and stress responses, with cellular components including membrane microdomains and neuronal cell bodies, and molecular functions like neurotransmitter receptor activity. KEGG pathway analysis reveals significant pathways such as lipid and atherosclerosis, various signaling pathways (e.g., MAPK and PI3K-Akt), and cancer-related pathways. Collectively, these analyses suggest that the interaction between Nelumbo nucifera leaf small molecules and obesity involves complex networks affecting inflammation, cellular signaling, and metabolic processes, with broad implications for cancer, neurological disorders, and metabolic diseases.
Figure 5. The results presented are based on the GO (A) and KEGG (B) pathway enrichment analyses of intersection genes between Sitogluside and obesity.
Figure 6. The results presented are based on the GO (A) and KEGG (B) pathway enrichment analyses of intersection genes between kaempferol and obesity.
Figure 7. The results presented are based on the GO (A) and KEGG (B) pathway enrichment analyses of intersection genes between Nuciferin and obesity.
3.3 Molecular docking and machine learning screeningDue to the inclusion of PPARG as a key target for the three categories of molecules and considering the importance of PPARG in treating obesity, we performed molecular docking to screen the affinity of active small molecules in Nelumbo nucifera leaves for PPARG. The docking energy results are shown in Table 2. The ChemDraw (.cdx) structures of all compounds listed in Table 2 have been uploaded in the supplementary materials. Molecular docking results revealed key interactions between Nelumbo nucifera leaf bioactive compounds and PPARG (Table 2). Cycloartenol showed the highest binding affinity (−9.5 kcal/mol), engaging residues such as HIS351 and LYS502 through hydrophobic interactions. Sitogluside (−8.5 kcal/mol) formed hydrogen bonds with HIS351 and HIS477, and had multiple hydrophobic interactions. Isorhamnetin (−7.9 kcal/mol) formed a hydrogen bond with SER317, while kaempferol (−7.8 kcal/mol) formed one with LEU496. Both compounds also exhibited significant hydrophobic interactions. Other notable compounds, such as anonaine and remirin, showed binding affinities of −7.8 kcal/mol with distinct hydrogen and hydrophobic bonds. TThese findings highlight the potential of cycloartenol and sitogluside as promising PPARG ligands, contributing to the anti-obesity effects of Nelumbo nucifera leaf bioactive compounds.
Table 2. Compound_name and docking energy results.
The two molecules with the highest affinity, Sitogluside (Figure 8A) and Cycloartenol (Figure 8B), both belong to the first category of molecules, with binding affinities of −8.5 kcal/mol and −9.5 kcal/mol, respectively. The docking conformations and binding sites are illustrated in Figure 8. We conducted molecular dynamics simulations on the complexes of these two molecules with the PPARG protein, as well as on the apo protein. The interaction diagram in Figure 8A highlights various residues involved in interactions: van der Waals interactions with residues such as TYR348 and LEU496; conventional hydrogen bonds with residues like HIS351 and HIS477; and alkyl interactions with residues such as VAL478 and LYS502. Similarly, the interaction diagram in Figure 8B details van der Waals interactions with residues like CYS313 and PHE310, and alkyl interactions with residues such as VAL321 and LEU497. These interactions are crucial for understanding the binding affinity and specificity of the ligands to the PPARG protein.
Figure 8. Docking results of PPARG with two active compounds from lotus Leaf. (A) Interaction between sitogluside and PPARG. (B) Interaction between cycloartenol and PPARG.
PPARG contains a well-defined hydrophobic binding pocket that plays a critical role in ligand recognition and stabilization. This pocket is composed of key hydrophobic residues, including LEU, VAL, and ILE, which facilitate van der Waals interactions and stabilize the binding of non-polar regions of ligands. Additionally, hydrogen bonds contribute significantly to binding specificity, with residues such as GLN487 and LYS488 forming stable hydrogen bonds with ligand functional groups. Aromatic interactions, including π-alkyl and π-π stacking with residues like TYR473 and HIS477, further enhance ligand binding by providing additional stability through π-electron interactions. These findings suggest that an effective PPARG inhibitor may ideally engage multiple interaction types, including hydrophobic, hydrogen bond, and π-interactions, to ensure both specificity and binding strength.
Target prediction and reverse docking alone are insufficient for achieving biological significance, as it is challenging to distinguish between activity and inhibitory activity. To address this, we developed machine learning models for predicting PPARG inhibitors. The development of machine learning models for PPARG prediction included the use of Random Forest (RF) and Extreme Gradient Boosting (XGB) algorithms, with five molecular fingerprints (MACCS, Morgan, RDKit, Topological Torsion, AtomPairsFP) as inputs, resulting in 10 models in total. The performance of each model, evaluated using metrics such as sensitivity (SE), specificity (SP), accuracy (ACC), Matthews correlation coefficient (MCC), precision (P), F1 score (F1), balanced accuracy (BA), and area under the ROC curve (AUC), is summarized in Figure 9.
Figure 9. Performance metrics of Random Forest (RF) and Extreme Gradient Boosting (XGB) models using five different molecular fingerprints for PPARG inhibitor prediction.
The XGB models generally performed well, with Morgan-XGBoost achieving the highest AUC (0.9661) and precision (0.9775), indicating its robustness in identifying true positive PPARG inhibitors while minimizing false positives. The confusion matrices for the XGB models (Figure 10) further demonstrate this, with Morgan-XGBoost showing a very high true negative rate (97.89%) and a reasonably high true positive rate (86.14%), suggesting that this model can reliably predict PPARG inhibitors with high specificity and good sensitivity. Other XGB models, such as AtomPairsFP-XGBoost and TopologicalTorsion-XGBoost, also showed strong performance in both specificity (91.58% and 94.74%, respectively) and sensitivity (87.13% and 83.17%, respectively). The MACCS-XGBoost model exhibited a slightly lower sensitivity (81.19%) but maintained a high specificity (96.84%), reflecting its effectiveness in filtering out false positives while being slightly more conservative in identifying true positives. The Morgan-XGBoost model, with its high precision and balanced accuracy, is particularly suited for distinguishing between active and inactive PPARG inhibitors. Using the best-performing model, Morgan-XGBoost, Cycloartenol and Sitogluside were both identified as potential PPARG inhibitors.
Figure 10. Confusion matrices for the Extreme Gradient Boosting (XGB) models using different molecular fingerprints for PPARG inhibitor prediction.
Several mutations, including R357A and V290M in PPARG, render it unresponsive to drugs. To mitigate this impact, we used the R357A PPARG mutant (PDB ID: 4O8F) and the V290M PPARG mutant (PDB ID: 4OJ4) and conducted molecular docking at the same site using a consistent method. The results are provided in the supplementary material. Sitogluside showed an affinity of −7.5 kcal/mol with R357A and −8.5 kcal/mol with V290M, while Cycloartenol had an affinity of −8.0 kcal/mol with R357A and −8.6 kcal/mol with V290M. It can be observed that both Sitogluside and Cycloartenol maintained high affinity in unresponsive mutants (Bharti et al., 2021).
3.4 Molecular dynamics simulationMolecular dynamics simulations were performed for the three systems (Sitogluside, Cycloartenol and Apo). The solvent-accessible surface area (SASA) of a protein can be used to analyze its hydrophobicity and the degree of surface exposure (Wang et al., 2023). Higher SASA values indicate greater hydrophobicity, while lower values indicate less. As shown in Figure 11A, the SASA of Sitogluside decreases after 40 ns, falling below that of the apo protein (Apo) and Cycloartenol, indicating that the conformation of the protein changes after binding with this inhibitor, resulting in increased hydrophobicity and a more closed surface. Figure 11B shows that the SASA distribution center of Sitogluside is at 140 nm2, while those of Apo and Cycloartenol are at 145 nm2, which is consistent with the trend observed in the line chart.
Figure 11. Structural stability analysis of three systems. (A) SASA during a 100-nanosecond molecular dynamics simulation. (B) Relative frequencies of SASA. (C) Time evolution of RMSD from their initial structures for the three systems. (D) Relative frequencies of RMSDs. (E) Radius of gyration (Rg) for three systems during a 100-nanosecond molecular dynamics simulation. (F) Relative frequencies of radius gyration.
The root mean square deviation (RMSD) of the backbone carbon atoms relative to their initial positions is an indicator of the stability of the simulated system and reveals the deviation of the complex from its initial conformation, indicating conformational changes. Analysis of Figure 11C shows that the RMSD of Sitogluside is higher than that of the apo protein (Apo) and Cycloartenol after 20 ns, indicating greater conformational changes. As shown in Figure 11D, the RMSD distribution center of Sitogluside is at 1.7 Å, also higher than that of Apo and Cycloartenol, which aligns with the trend observed in the line chart.
The radius of gyration (Rg) reveals the compactness of the complex. Analysis of Figure 11E, F shows that the Rg of Sitogluside is smaller than that of Cycloartenol, indicating a more compact conformation after binding. However, the Rg of Sitogluside is slightly larger than that of Apo, because Apo is an apo protein without the supporting effect of a ligand.
The root mean square fluctuation (RMSF) of protein amino acids is used to analyze the extent of fluctuation of individual amino acids during the simulation process, revealing the flexibility changes of residues (Figure 12). The overall lower RMSF of Sitogluside indicates better stability, suggesting that Sitogluside induces a stable conformational change in PPARG. Therefore, despite having a higher RMSD value, Sitogluside maintains a relatively low RMSF.
Figure 12. The RMSFs of the CA atoms.
We performed a protein secondary structure (DSSP) analysis and found no significant changes (Figure 13). This reveals that the influence of Sitogluside on PPARG is not closely related to secondary structure changes. Instead of affecting the conformation of alpha-helices, Sitogluside impacts the overall compactness of the protein by modulating the flexibility of loops.
Figure 13. Secondary structure analysis of the protein in three systems. (A) Apo (B) Sitogluside (C) Cycloartenol.
After studying the conformational changes, we then focused on the dynamic characteristics of the protein. The analysis of specific motion patterns, as shown by PCA (Figures 14A–C), indicates that the first two principal components of the apo group and the Cycloartenol group account for only 27.1% and 28.8% of the motion modes, with PC1 accounting for only 16.7% and 16.2%, respectively. In contrast, the first two principal components of the Sitogluside group account for 40.5%, with PC1 alone accounting for 32.9%. This reveals that the binding of Sitogluside leads to significant differences in the protein’s motion patterns.
Figure 14. PCA analysis of the three systems. (A) Apo. (B) Sitogluside (C) Cycloartenol. Covariance matrix analysis of three systems. (D) Apo. (E) Sitogluside (F) Cycloartenol.
To investigate the changes in the internal motion patterns of the protein, we calculated the dynamic cross-correlation matrix (DCCM) for each residue in the three systems (Figures 14D–F). Blue indicates positive correlated motion between re
留言 (0)