Proteome and genome integration analysis of obesity

Introduction

Definition, dangers, and status of obesity

Obesity is a chronic metabolic disease characterized by excessive accumulation or abnormal distribution of body fat.[1] Body mass index (BMI), which is weight (kg) divided by height squared (m²), is a commonly used obesity phenotype. The World Health Organization (WHO) states that for adults, a BMI of 25.0–29.9 kg/m² is considered overweight, and a BMI of ≥30 kg/m² is considered obese. Obesity represents a major health challenge because it substantially increases the risk of diseases such as type 2 diabetes, coronary heart disease, hypertension, obstructive sleep apnea, and several types of cancer.[2] Moreover, recent studies have reported an impact of obesity on communicable disease: individuals with obesity are at increased risk of hospitalization and severe illness from coronavirus disease 2019 (COVID-19).[3,4] According to the Global Burden of Disease Study, obesity-attributable diseases cause over 5 million deaths and 160 million disability-adjusted life years (DALYs) globally each year.[5] The increments of deaths and DALYs attributable to high BMI in China ranked 59th and 52nd among 204 countries/territories worldwide in the past 30 years, and the increasing rates remain on the rise.[5] The prevalence of obesity has significantly increased worldwide over the past few decades.[6] According to the National Health and Nutrition Examination Survey, from 1999–2000 through 2017–2020, the prevalence of obesity among adults aged ≥20 years increased from 30.5% to 41.9% in the United States. From 1993 to 2019, the prevalence of obesity among Chinese adults increased from 4.2% to 16.4%,[7,8] and this prevalence had likely not yet reached its peak by 2019.[5]
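The BMI definition and the two WHO adult cut-offs quoted above can be written as a short Python sketch. The function names and the example weight and height are illustrative assumptions; only the overweight and obese thresholds given in the text are encoded, so every lower value falls into a single catch-all label.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_adult_category(bmi_value: float) -> str:
    """Classify an adult BMI using only the two WHO cut-offs quoted in the text."""
    if bmi_value >= 30.0:
        return "obese"
    if bmi_value >= 25.0:
        return "overweight"
    return "below the overweight threshold"


# Hypothetical adult: 95 kg, 1.75 m.
example = bmi(weight_kg=95.0, height_m=1.75)
print(round(example, 1), who_adult_category(example))
```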

Obesity not only severely damages health but also imposes a substantial economic burden on society. For example, the estimated annual medical cost of obesity in the United States was nearly $173 billion in 2019.[9] A recent study estimated the direct medical cost of obesity in China at RMB 64 billion, which accounted for ~8.0% of China's total national medical costs in 2010. This number could increase to RMB 418 billion, which would be ~22.0% of total medical costs, in 2030.[10,11] Therefore, it is of great social significance to explore the pathogenesis of obesity.

Genetics of obesity

The pathogenesis of obesity is far more complex than just an imbalance between energy intake and expenditure that leads to the passive accumulation of excess weight.[12] Obesity is a complex chronic disease influenced by numerous genetic, environmental, and social factors, and their interactions. Genetic factors are now known to play a substantial role in the development of obesity. Twin, family, and adoption studies have estimated the heritability of obesity to be between 40.0% and 70.0%.[13,14] As a consequence, genetic approaches can be utilized to understand the molecular mechanisms underlying obesity.

Obesity is generally classified into two main groups based on genetic influence: common polygenic obesity and monogenic obesity.[15] Monogenic obesity is caused by single gene defects or chromosomal deletions. This condition is typically rare and presents with an early onset and severe phenotypes. Most monogenic obesity mutations,[16–19] including variants in leptin (LEP), leptin receptor (LEPR), melanocortin 4 receptor (MC4R), and proopiomelanocortin (POMC), have been identified in cohorts of patients with severe and early onset obesity.[15] Polygenic obesity is the most common type of obesity. The completion of the Human Genome Project and the International HapMap Project has greatly improved our knowledge of the human genome and has provided a better context for the study of genetic variations underlying complex diseases. Advances in high-throughput genotyping technologies have made genome-wide association studies (GWASs), a research approach used to identify genomic variants that are statistically associated with a disease/trait, easier to accomplish. To date, dozens of GWASs and their meta-analyses have identified hundreds of polygenic obesity-associated genes/loci,[20–22] such as fat mass and obesity associated (FTO), catenin beta like 1 (CTNNBL1), and cathepsin S (CTSS).

Despite the accomplishments of GWAS, several limitations still hinder its success.[23] First, the loci identified thus far generally explain only a small fraction of the heritable component of obesity. To date, the largest published GWAS of BMI in ~700,000 participants uncovered 941 near-independent single nucleotide polymorphisms (SNPs) associated with BMI, explaining ~6.0% of the variance in BMI.[21] Second, while Mendelian diseases generally result from variants in coding regions of genes, common diseases usually result from variants in gene regulation regions. Third, the same genetic variants often contribute to different outcomes because of the influence of the environment and genetic background. For such reasons, the specific mechanisms of action of the obesity susceptibility loci identified by GWAS are still poorly understood, and the translation of GWAS results into clinical practice to improve human health remains extremely challenging.[23]
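To make the first limitation concrete, the short Python sketch below divides the ~6.0% of BMI variance explained by GWAS loci by the 40.0%–70.0% heritability range quoted earlier. Treating the twin-study heritability of obesity as comparable to the BMI variance figure is a simplifying assumption, and the function name is hypothetical.

```python
def fraction_of_heritability_explained(variance_explained: float,
                                       heritability: float) -> float:
    """Share of the heritable component accounted for by the identified loci."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must lie in (0, 1]")
    if not 0 <= variance_explained <= heritability:
        raise ValueError("variance explained must lie in [0, heritability]")
    return variance_explained / heritability


GWAS_VARIANCE_EXPLAINED = 0.06  # ~6.0% of BMI variance, as quoted in the text

for h2 in (0.40, 0.70):  # heritability range quoted in the text
    share = fraction_of_heritability_explained(GWAS_VARIANCE_EXPLAINED, h2)
    print(f"h2 = {h2:.2f}: loci explain {share:.1%} of the heritable component")
```

Under these assumptions, the identified loci account for roughly one-tenth to one-sixth of the heritable component.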

Protein levels are strongly associated with obesity

Proteins are the main functional components of biological processes and are also potential drug targets. The proteome, all the proteins encoded by a genome, more accurately reflects the dynamic state of a cell, tissue, or organism. Proteomics is expected to yield better disease markers for diagnosis and therapy monitoring.[24] Proteomic analysis demonstrated that the levels of many proteins, such as LEP and C-reactive protein (CRP), vary significantly between individuals with obesity and normal-weight individuals.[25–27] The peptide hormone LEP has been shown to have a function in reducing food intake and controlling body weight, and it plays a key role in regulating body weight through a negative feedback mechanism between adipose tissue and the hypothalamus.[28] Moreover, adipose tissue is an active endocrine organ that releases a variety of hormones and cytokines that contribute to CRP elevation.[29]

Proteomic analysis has become one of the most important disciplines for elucidating the pathogenesis of obesity. However, most proteomic analyses of obesity have been limited by small sample sizes or a limited number of measured proteins.[12] Thus, systematic investigation of functional links between the proteome and obesity in large samples seems to be the next step in obesity research.

Proteome and genome integration analysis in obesity

Genomics and proteomics are complementary fields, as proteomics extends functional analysis. It is believed that through genomics and proteomics, new disease markers and drug targets can be identified, and such targets will ultimately aid in the development of products that can be used to prevent, diagnose, and treat diseases.[30] With the development of proteomic and genomic technologies, many genome-wide protein quantitative trait loci (pQTL) analyses have emerged.[24,31,32] Researchers have observed many overlaps between pQTLs and obesity-associated variants. For example, the adiposity-related variant rs6235 is significantly correlated with proprotein convertase subtilisin/kexin type 1 (PCSK1) abundance,[33] suggesting that rs6235 may influence obesity by regulating PCSK1 expression. Thus, integrating genomics and proteomics data may help to bridge a gap in knowledge regarding SNP–obesity associations.[34,35]
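At its core, a pQTL test asks whether protein abundance changes with the number of copies of an allele. The Python sketch below fits a simple linear regression of protein level on allele dosage (0, 1, or 2) for one SNP and one protein. It is a minimal illustration under stated assumptions: the data are invented toy values, the function name is hypothetical, and the covariate adjustment, normalization, and multiple-testing correction used in real pQTL studies are omitted.

```python
from math import sqrt


def pqtl_slope(dosages, protein_levels):
    """Least-squares slope of protein level on allele dosage, with its standard error."""
    n = len(dosages)
    if n != len(protein_levels) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(protein_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all dosages are identical; the SNP is monomorphic here")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, protein_levels))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares gives the residual variance on n - 2 degrees of freedom.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, protein_levels))
    se = sqrt(rss / (n - 2) / sxx)
    return beta, se


# Toy data (invented): six individuals, protein level rising with dosage.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_levels = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
beta, se = pqtl_slope(toy_dosages, toy_levels)
print(f"per-allele effect = {beta:.3f}, standard error = {se:.3f}")
```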

Overview

In this review, we first provided an overview of the published papers on the integrated analysis of proteomic and genomic data in obesity. Then, we summarized the mainstream methods for integrating genomic and proteomic data and discussed their limitations. We also discussed some potential alternative approaches.

Methods for Searching and Identifying Eligible Papers

We used two search formulas, "((pQTL) OR (((Quantitative Trait Loci) OR (Quantitative Trait Locus)) AND (proteome))) AND (human))" AND "((GWAS) OR (Genome-Wide Association Study) OR (GWA Study) OR (Whole Genome Association Study)) AND (proteome) AND (human)", to identify potential published genomic and proteomic integration studies on obesity from inception to July 2022.

We initially retrieved 993 publications from PubMed and China National Knowledge Infrastructure. Based on the criteria of integrating proteomic and genomic data and analyzing obesity-related traits, we identified 17 eligible papers. Four main integrating strategies were used in these papers, namely overlap analysis between pQTL variants and GWAS signals, colocalization analysis, Mendelian randomization (MR) analysis, and proteome-wide association study (PWAS). Specifically, we identified nine overlap analyses and two colocalization analyses (one study performed both overlap and colocalization analyses), five MR analyses, and two PWASs.

Overlap Analysis between pQTL Variants and GWAS Signals

In recent years, genome-wide pQTL analyses have accumulated rapidly, laying the foundation for cross-omics studies of obesity. Here, we identified nine eligible genome-wide pQTL studies with overlap analysis between pQTL variants and obesity-related GWAS hits.[26,32,36–42] Seven of the nine overlap analyses compared plasma/serum pQTLs with obesity GWAS signals; the other two focused on hepatic pQTLs and human induced pluripotent stem cell (iPSC) pQTLs, respectively.[38,39] These overlap analyses identified many loci associated with both obesity and protein expression levels, as shown in Table 1.

Table 1 - Overlap and colocalization between pQTL associations and GWAS hits of obesity-related traits.

| Integration strategies | Obesity-related traits | Cohorts | Population | Sample size | Lead protein | Platform | Annotation | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap of the significant pQTL SNPs | BMI | AddNeuroMed study | European | 96 | CD33 | SOMAscan | Elderly people, blood plasma | [42] |
| | BMI, WC, HC, weight, and obesity | XenoTech LLC, LTCDS, CHTN | 59.9% European | 287 | GLDC | DIA-TPA | Liver tissue | [38] |
| Overlap considering the condition of SNPs in high LD | BMI | The DiOGenes cohort | European | 494 | IL-1RAcP | SOMAscan | Obese people, blood plasma | [26] |
| | BMI, obesity, WHR, WC, HC, and other body composition parameters* | AGES Reykjavik cohort | European | 5343 | PRTN3 | SOMAscan | Elderly people (exome assay), serum | [36] |
| | BMI, obesity, WHR, WC, HC, and other body composition parameters* | The Icelandic Cancer Project and DeCODE genetics | European | 35,559 | LRRN1 | SOMAscan | The largest ever since, blood plasma | [37] |
| | Obesity related-traits, fat body mass | The KORA F4 study | European | 1000 | CTSS | SOMAscan | Blood plasma | [32] |
| | BMI, WHR, and WC | Human Induced Pluripotent Stem Cells Initiative (HipSci) | European | 151 | P49411 | Tandem Mass Tag (TMT) | IPS cells | [39] |
| | HC and BMI | The DiOGenes cohort, The Ottawa Study | European | 376–548 | ITIH3 | SOMAscan, MS | Obese people, blood plasma | [40] |
| | BMI, WC, HC, weight, and other body composition parameters* | The Fenland Study | European | 10,708 | Siglec-9 | SOMAscan | Blood plasma | [41] |
| Colocalization | BMI, WC, HC, weight, and other body composition parameters* | The Fenland Study | European | 10,708 | NEC1/PCSK1 | SOMAscan | Blood plasma | [41] |
| | BMI, WC, HC, weight, and other body composition parameters* | The Fenland Study | European | 10,708 | LRIG1 | SOMAscan | Blood plasma, two techniques overlap | [44] |
| | | | | 485 | FBLN3 | Olink PEA | | |

*Other body composition parameters: Arm fat mass, Arm fat percentage, Arm fat-free mass, Body fat percentage, Leg fat mass, Leg fat percentage, Leg fat-free mass, Trunk fat mass, Trunk fat percentage, Trunk fat-free mass, Whole body fat mass, Whole body fat-free mass, and Fat body mass. BMI: Body mass index; GWAS: Genome-wide association studies; HC: Hip circumference; LD: Linkage disequilibrium; pQTL: protein quantitative trait locus; SNPs: Single nucleotide polymorphisms; WC: Waist circumference; WHR: Waist-to-hip ratio.


Overlap analysis of plasma/serum pQTLs

Plasma and serum are the primary clinical specimens and are widely used for proteomics-based biomarker discovery. In 2012, Lourdusamy et al[42] provided a genetic association study of proteins in plasma. They quantified the protein abundances of 813 proteins measured from the plasma of 96 elderly healthy European individuals. After overlapping the pQTL variants with the nominally significant (P < 1 × 10⁻⁵) association signals of the GWAS Catalog, they identified that the CD33 protein-related SNP rs1878047 was also associated with BMI.

In 2017, Suhre et al[32] investigated the levels of 1124 proteins in blood plasma samples from 1000 individuals living in southern Germany and replicated the results in 338 participants of Arab and Asian ethnicities. They identified 539 pQTLs, 384 of which displayed nominal significance in the replication sample. After overlapping, they observed that pQTLs of CTSS were associated with body fat mass. Carayol et al[26] designed a pQTL analysis based on a set of 1129 proteins from 494 obese subjects before and after a weight loss intervention. They revealed 55 BMI-associated pQTLs at baseline and three pQTLs after low-calorie dietary intervention. By performing overlapping analysis of variants in linkage disequilibrium (LD) with pQTL SNPs, they identified 16 pQTL SNPs nominally (P <0.05) associated with BMI.[20]

In 2020, Ruffieux et al[40] conducted a multivariate Bayesian QTL analysis and overlapped their pQTLs with previously reported GWAS signals (P < 1 × 10⁻⁵). They identified that the pQTLs of ITIH3 were associated with BMI. In 2021, Pietzner et al[41] performed a large pQTL study including 4775 protein targets measured in plasma from 10,708 individuals of European descent. They identified 10,674 genetic variant–protein target associations, covering 3892 distinct protein targets. After overlapping analysis, they identified that the pQTLs of many proteins, such as adenosine triphosphatase Na+/K+ transporting subunit beta 2 (AT1B2), fibulin 3 (FBLN3), and R-spondin 3 (RSPO3), were also associated with obesity-related traits, including BMI, waist-to-hip ratio (WHR), waist circumference (WC), hip circumference (HC), and some other body composition parameters. Additionally, in 2021, Ferkingstad et al[37] performed the largest genome-wide pQTL study to date; the plasma levels of 4907 aptamers were measured in 35,559 Icelandic individuals. This study identified 5007 sentinel pQTLs at a false discovery rate (FDR) of 1.3% via conditional analysis. After overlapping these with the National Human Genome Research Institute (NHGRI) GWAS catalog (P < 5 × 10⁻⁸), they identified 104 pQTLs of many plasma proteins, such as leucine-rich repeat neuronal protein 1 (LRRN1) and fucosyltransferase 5 (FUT5), which were associated with obesity-related traits.

All the above pQTL analyses were based on common genetic variants. In 2022, Emilsson et al[36] compared 54,469 low-frequency and common exome-array variants to 4782 protein measurements in the serum from 5343 individuals from the AGES Reykjavik cohort. They identified 2021 independent exome array variants that were associated with the serum levels of 1942 proteins. After overlapping these with the association signals at the genome-wide significance (P < 5 × 10⁻⁸) level from the Phenoscanner database, they identified pQTLs of many plasma proteins, such as secretogranin-3 (SCG3), CD300c molecule (CD300C), and arginyl aminopeptidase (RNPEP), which were associated with obesity-related traits, including BMI, obesity, WHR, WC, and HC.

Overlap analysis of hepatic pQTLs

The liver is an important organ for human metabolism and contains many metabolism-related proteins. The obesity-related proteins in liver tissue have great reference value for the prevention and treatment of obesity. In 2020, He et al[38] conducted a genome-wide pQTL study of 287 normal human liver samples using DIA-TPA proteomic measurement technology. This study identified 6155 pQTL variants at the genome-wide significance level (P < 2.99 × 10⁻⁸). After overlapping these with the significant association signals (P < 1 × 10⁻⁵) from the GWAS Catalog, ClinVar, and PharmGKB databases, they observed associations between the pQTLs of proteins, such as sulfotransferase family 1A member 1 (SULT1A1) and glycine decarboxylase (GLDC), and obesity-related phenotypes (including BMI, WC, and HC).

Overlap analysis of human iPSC pQTLs

Human iPSCs are a key cell type for disease modeling. Mirauta et al[39] reported the first comprehensive proteomic analysis of human iPSCs. By analyzing 202 iPSC lines derived from 151 donors using integrated transcriptome and genomic sequence data from the same lines, 650 pQTLs in iPSCs were identified. The overlap of these pQTLs with variants identified in GWAS (P < 1 × 10⁻⁵) was examined, and pQTLs of cell proteins, such as serine/threonine-protein kinase Nek4 (NEK4), mitochondrial elongation factor Tu (TUFM), and mitochondrial carrier homolog 2 (MTCH2), were found to be associated with obesity-related traits (including BMI, WC, HC, body fat percentage, and fat-free mass).

Overlap analysis, a common downstream analysis of pQTL studies, can help us explore the physiological relevance of pQTL variants. However, an overlap between pQTL associations and GWAS hits indicates only correlation, not causation.
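In its simplest form, the overlap strategy described in this section is a lookup of pQTL SNPs among GWAS hits that pass a chosen P-value threshold. The Python sketch below illustrates that lookup. All SNP identifiers, protein names, and P-values are invented placeholders, the function name is hypothetical, and only exact SNP matches are considered; matching through variants in high LD, as some of the studies above did, would need an additional LD reference that is not modeled here.

```python
def overlap_pqtl_with_gwas(pqtl_hits, gwas_hits, gwas_p_threshold=1e-5):
    """Return (snp, protein, trait) triples where a pQTL SNP is also a GWAS hit.

    pqtl_hits: list of (snp_id, protein_name) pairs.
    gwas_hits: dict mapping snp_id to a list of (trait, p_value) pairs.
    """
    overlaps = []
    for snp, protein in pqtl_hits:
        for trait, p_value in gwas_hits.get(snp, []):
            if p_value < gwas_p_threshold:
                overlaps.append((snp, protein, trait))
    return overlaps


# Invented placeholder data, for illustration only.
toy_pqtl_hits = [
    ("rs_toy_1", "PROTEIN_A"),
    ("rs_toy_2", "PROTEIN_B"),
    ("rs_toy_3", "PROTEIN_C"),
]
toy_gwas_hits = {
    "rs_toy_1": [("BMI", 3e-9)],
    "rs_toy_2": [("WHR", 2e-4)],   # fails the 1e-5 threshold
    "rs_toy_4": [("BMI", 1e-12)],  # GWAS hit with no pQTL
}

print(overlap_pqtl_with_gwas(toy_pqtl_hits, toy_gwas_hits))
```

The threshold matters: the studies summarized above used cut-offs ranging from 1 × 10⁻⁵ to 5 × 10⁻⁸, and a looser cut-off returns more overlaps at the cost of more chance findings.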

Colocalization Analysis of pQTL Associations and GWAS Hits

Previous studies demonstrated that pQTLs were mostly enriched within the intronic regions of genes.[38] Thus, it is still unclear whether proteins and obesity share a causal variant.

Colocalization analysis, a posterior probability-based statistical method, is used to assess whether two association signals are consistent with a shared causal variant.[43] We found two colocalization analyses between proteins and obesity in the Fenland cohort, as shown in Table 1. Pietzner et al[41] performed a colocalization analysis between cis-pQTLs and GWAS from GSK or Open GWAS. They identified several proteins, such as neuroendocrine convertase 1 (NEC1), apolipoprotein L3 (APOL3), and LRIG1, which shared causal variants with BMI, WC, HC, and other body composition parameters. Soon after, considering the complementarity between different proteomic platforms, Pietzner et al[44] integrated two proteomic measurement technologies, the SomaScan v4 assay and the Olink proximity extension assay, and identified the phenotypic consequences of 871 protein targets across hundreds of pQTLs. Then, they performed colocalization analysis between the Open GWAS database and those pQTLs and identified some links between proteins and body composition parameters, such as links between leucine rich repeats and immunoglobulin like domains 1 (LRIG1) and leg fat-free mass (left), scavenger receptor class F member 2 (SREC-II) and trunk fat-free mass, and HEXIM P-TEFb complex subunit 1 (HEXI1) and whole body fat-free mass.

Colocalization analysis can help us determine whether a protein and obesity share the same causal variants. However, colocalization analysis is calculated based on a specific region rather than the whole genome, and so the colocalization results are highly dependent on the choice of colocalization regions. In addition, colocalization analysis can only calculate the posterior probability of shared causality among traits but not the specific causal effect sizes.
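The posterior probabilities mentioned above can be illustrated with a small Python sketch. It assumes the widely used approximate-Bayes-factor formulation with at most one causal variant per trait in the region, and five hypotheses: H0 (no association), H1 and H2 (association with one trait only), H3 (two distinct causal variants), and H4 (one shared causal variant). The prior values, the prior effect-size standard deviation, the function names, and the toy summary statistics are all assumptions made for illustration; the reviewed studies may have used different settings or software.

```python
from math import exp, sqrt


def approx_bayes_factor(beta, se, prior_sd=0.15):
    """Approximate Bayes factor for association vs. no association at one SNP."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return sqrt(1 - r) * exp(z * z * r / 2)


def coloc_posteriors(betas1, ses1, betas2, ses2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0-H4 for two traits over the same SNPs in one region.

    Plain products are used for readability; a production implementation
    would work in log space to avoid overflow with very strong signals.
    """
    abf1 = [approx_bayes_factor(b, s) for b, s in zip(betas1, ses1)]
    abf2 = [approx_bayes_factor(b, s) for b, s in zip(betas2, ses2)]
    sum1, sum2 = sum(abf1), sum(abf2)
    sum12 = sum(a * b for a, b in zip(abf1, abf2))
    weights = {
        "H0": 1.0,
        "H1": p1 * sum1,
        "H2": p2 * sum2,
        "H3": p1 * p2 * (sum1 * sum2 - sum12),  # causal variants at different SNPs
        "H4": p12 * sum12,                      # same SNP causal for both traits
    }
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Toy region with three SNPs (invented summary statistics).
ses = [0.05, 0.05, 0.05]
protein_betas = [0.30, 0.025, 0.01]          # strong pQTL signal at SNP 1
trait_same_snp = [0.275, 0.015, 0.005]       # trait signal also at SNP 1
trait_other_snp = [0.015, 0.005, 0.275]      # trait signal at SNP 3 instead

shared = coloc_posteriors(protein_betas, ses, trait_same_snp, ses)
distinct = coloc_posteriors(protein_betas, ses, trait_other_snp, ses)
print("shared signal:   PP.H4 =", round(shared["H4"], 3))
print("distinct signal: PP.H3 =", round(distinct["H3"], 3))
```

Because the sums run only over the SNPs supplied, changing the region boundaries changes the result, which is the region dependence noted above; the output is also a set of probabilities, not an effect size.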

Mendelian Randomization Analysis

MR is a causal inference method that uses genetic variants as instrumental variables (IVs) to assess the causal relationship between an exposure and a disease/trait.[45] Because alleles are randomly assigned at conception, genetic associations are far less susceptible to confounding and reverse causality than conventional observational associations. Therefore, using GWAS data of proteins and obesity-related traits as exposure/outcome, MR can integrate genomic and proteomic data to identify the causal relationships between proteins and obesity, as shown in Figure 1.

Figure 1:

MR to assess the causal effect of proteins on obesity. Dotted lines represent potential pleiotropic or direct causal effects between variables that would violate MR assumptions. Three assumptions: (1) IV is strongly associated with exposure; (2) IV must be independent of confounders of the exposure-outcome relationship; and (3) the IVs are associated with outcome only through exposure. IVs: Instrumental variables; MR: Mendelian randomization.

The rapid development of genome-wide pQTL analysis laid the foundation for the application of MR in the identification of candidate risk proteins for obesity. The validity of MR analysis relies on three basic assumptions: (1) the IV is strongly associated with the exposure; (2) the IV is independent of confounders of the exposure-outcome relationship; and (3) the IV is associated with the outcome only through the exposure. To satisfy the first assumption, IVs (SNPs or polygenic risk scores [PRS]) are usually screened according to the genome-wide significance level (P < 5 × 10⁻⁸). We identified five eligible MR analyses that integrated plasma proteomic data and obesity GWAS results.[27,46–49] All of these studies were published in the last 2 years and are summarized in Table 2. Three studies investigated the causal association between proteins and BMI. The other two studies evaluated the causal associations between proteins and body composition parameters.
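The instrument screening and effect estimation steps can be sketched in Python with summary statistics. The example below filters SNPs at the genome-wide significance level and then computes two commonly used two-sample MR estimators, the single-SNP Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. These estimators are shown as an illustration and are not necessarily the ones used in the reviewed studies; the dictionary keys, function names, and toy numbers are assumptions, and the sketch presumes independent instruments and a first-order approximation for the standard errors.

```python
from math import sqrt


def select_instruments(snps, p_threshold=5e-8):
    """Keep SNPs whose association with the exposure passes the threshold."""
    return [s for s in snps if s["p_exposure"] < p_threshold]


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(instruments):
    """Fixed-effect inverse-variance weighted estimate across independent SNPs."""
    if not instruments:
        raise ValueError("no instruments passed the screening step")
    num = sum(s["beta_exposure"] * s["beta_outcome"] / s["se_outcome"] ** 2
              for s in instruments)
    den = sum(s["beta_exposure"] ** 2 / s["se_outcome"] ** 2
              for s in instruments)
    return num / den, sqrt(1 / den)


# Invented toy summary statistics: exposure = protein level, outcome = BMI.
toy_snps = [
    {"beta_exposure": 0.2, "p_exposure": 1e-10, "beta_outcome": 0.1, "se_outcome": 0.02},
    {"beta_exposure": 0.4, "p_exposure": 1e-20, "beta_outcome": 0.2, "se_outcome": 0.02},
    {"beta_exposure": 0.1, "p_exposure": 1e-3, "beta_outcome": 0.3, "se_outcome": 0.02},
]

instruments = select_instruments(toy_snps)
estimate, se = ivw_estimate(instruments)
print(f"{len(instruments)} instruments, IVW estimate = {estimate:.3f} (SE {se:.3f})")
```

Neither estimator checks assumptions (2) and (3); pleiotropic instruments would bias both, which is why sensitivity analyses are typically run alongside them.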

Table 2 - Published MR analysis between proteins and obesity-related traits.

| Direction | Cohort | Population | Sample size of pQTL analyses | The number of significant proteins | Lead protein | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| BMI → Proteins | CKB | Asian | 628 | 6 | TRAIL | [48] |
| BMI → Proteins | KORA | European | 996 | 24 | IGFBP1 | [27] |
| Proteins → BMI | | | | 6 | CTSA | |
| BMI → Proteins | INTERVAL | European | 2737 | 8 | LEP | [46] |
| Proteins → BMI | INTERVAL, KORA, BETTER, FHS, AGES | European | 996–6861 | 3 | PDCD1LG2 | [49] |
| Proteins → Weight | | | | 11 | EFEMP1 | |
| Proteins → BMI | INTERVAL, KORA, BETTER, FHS, AGES | European | 996–6861 | 18 | PCSK1 | [47] |
| Proteins → Other body composition parameters* | | | | 76 | FST | |
| Proteins → WC | | | | 12 | PCSK1 | |
| Proteins → WHR | | | | 16 | RSPO3 | |
| Proteins → HC | | | | 21 | NEGR1 | |

*Other body composition parameters: Arm fat mass, Arm fat percentage, Arm fat-free mass, Body fat percentage, Leg fat mass, Leg fat percentage, Leg fat-free mass, Trunk fat mass, Trunk fat percentage, Trunk fat-free mass, Whole body fat mass, Whole body fat-free mass, and Fat body mass. BMI: Body mass index; HC: Hip circumference; MR: Mendelian randomization; pQTL: protein quantitative trait locus; WC: Waist circumference; WHR: Waist–hip ratio.


MR analysis of proteome and BMI

Due to the ease of measurement, BMI is currently the predominant indicator of obesity. In 2021, Zaghlool et al[27] performed a bidirectional MR analysis to evaluate the causal association between plasma proteins and obesity. They identified that BMI had a causal effect on 24 proteins, such as serpin family E member 1 (SERPINE1), LEP, growth hormone receptor (GHR), WAP, follistatin/kazal, immunoglobulin, Kunitz and netrin domain containing 2 (WFIKKN2), and insulin like growth factor binding protein 1 (IGFBP1). In the reverse direction, they identified causal effects of six proteins on BMI: LEPR, IGFBP1, WFIKKN2, advanced glycosylation end-product specific receptor (AGER), dermatopontin (DPT), and cathepsin A (CTSA). In addition, an MR study based on the China Kadoorie Biobank (CKB) cohort explored the causal relationships between BMI and plasma proteins.[48] It found that genetically predicted BMI was associated with six proteins (at a 5.0% FDR threshold), such as interleukin-6 (IL-6), interleukin-18 (IL-18), and C-C motif chemokine ligand 3 (CCL3). In 2021, Goudswaard et al[46] also performed an MR analysis to evaluate the causal effects of BMI on plasma proteins. The genetic risk scores (GRSs) of BMI were constructed based on the largest BMI GWAS meta-analysis, and the protein association results were obtained from the INTERVAL cohort.[20] They identified that BMI was causally associated with eight proteins, such as LEP, fatty acid binding protein 4 (FABP4), complement C5 (C5), and sex hormone binding globulin (SHBG).

MR analysis of the proteome and other obesity-related phenotypes

Although BMI is easy to measure, it is not an ideal indicator of obesity. Body weight consists primarily of fat, muscle, and bone minerals, which have distinct genetic mechanisms and clinical consequences. Fat is a major contributor to the clinical consequences of obesity.[50] Therefore, it is more reasonable to use body fat content and distribution as obesity phenotypes. In 2020, Zheng et al[49] published a phenome-wide MR study that mapped the influence of the plasma
