Cohort profile update: Tehran cardiometabolic genetic study

Although the Iranian population is likely to have extensive genetic diversity, corresponding data are largely absent from public genomic repositories. Therefore, the Tehran Cardiometabolic Genetic Study (TCGS) was initiated in 2011 to investigate the genetic makeup of Iranians. TCGS is a longitudinal, transgenerational, family-based study that provides demographic, biochemical, and genetic data on 20,367 participants over 22 years [1]. These data allow early screening to identify high-risk individuals who can be targeted for preventive interventions in primary and secondary healthcare systems.

Healthcare systems strive to improve efficiency and therapeutic benefit by utilizing genetic or molecular profiling for groups of patients. As a result, precision medicine, by incorporating established biomarkers, functional tests, imaging, and new genomics and omics developments for each population, aims to provide the “right treatment to the right patient at the right time” [1]. TCGS aims to identify key genetic factors that contribute to the development of cardiometabolic disease in Iranians, as the effects of omics layers, particularly genetics, are well-known to vary across ethnicities. By combining genetic findings with additional non-genetic data types available in the TCGS cohort, it will be possible to accurately identify high-risk individuals and provide timely intervention and effective treatment. This project provides significant information and data relevant to the Iranian population that contributes to global knowledge and serves as the first step toward implementing precision medicine in Iran.

TCGS design and characterization

The TCGS originated from the TLGS [2], which has followed up over 15,000 participants for at least 22 years without any pre-specified exclusion criteria. All individuals older than three years living in the area were invited to participate. The ongoing cohort project involves extensive surveys at baseline and follow-ups every three years, with trained staff recording the development of cardiometabolic diseases, corresponding biochemical factors such as high cholesterol, low HDL-C, and high TG, and behavior patterns such as smoking and physical activity [2]. In 2017 [3], we included the TOTS [4], the TCS [5], and the CFS, which further focus on obesity intervention, thyroid cancer, and clinical genetic disorders. The TCGS project comprises 20,367 participants selected from four longitudinal, ongoing, family-based studies (Fig. 1). Demographic characteristics, ethnicity, blood test results, medical history, drug history, gynecology-related information, diet, physical activity, and smoking information were gathered for each participant.

Fig. 1 Description of the projects that contributed to the TCGS

Between 1999 and 2022, data from all original projects except TOTS were collected on paper-based questionnaires and subsequently digitized. In the TOTS study, physicians used computer-based questionnaires to record participants' medical conditions. Medical information was coded according to the International Statistical Classification of Diseases and Related Health Problems (ICD) version 11 (Supplementary file 2). All medical information was categorized as self-reported (SR) or obtained through follow-up during hospitalization (H). A detailed description of the TCGS population, including sex, age, ethnicity, and disease frequency, is provided in Table 1. Ethnicity status for 6,177 participants was collected through self-reporting and questionnaires on the birthplaces of the past three generations. Genotype information was also recorded for 3,851 of these 6,177 individuals. The inclusion of various ethnic groups in the TCGS project provides insight into the diversity of the Iranian population, with the most frequent groups being Persian (76.6%), Turk (12.1%), and Gilak (3.8%) (Table 1).

Table 1 Baseline characteristics of participants in the TCGS

The genotyping of 16,226 TCGS participants was performed using the Illumina Human OmniExpress-24-v1-0 bead chip, which contains 652,919 SNP loci, at deCODE genetics/Amgen (Iceland) according to the manufacturer's specifications (Illumina Inc., San Diego, CA, USA) [6] (Tables S2, S3, S4, and S5). IBD information was derived from genotypes and used to confirm all familial relationships. Around 660 couples met the threshold of 0.05 for consanguineous marriage relatedness, representing a consanguineous marriage rate of 28.50% (Figure S7). Among the couples with no IBD information, 543 declared consanguinity, resulting in an overall rate of 28.15%. Genotyping information was used to correct familial relationships and assign family IDs (FIDs) (Figure S5). Before correction, the number of clusters was 4,428 (mean ± standard deviation (sd): 3.87 ± 1.86; min, max: 1, 16); after correction, the number of FIDs was 3,320 (mean ± sd: 5.51 ± 5.41; min, max: 3, 56) with 1,132 individuals. There were 6,368 nuclear families with 26,392 parents/offspring and 5,051 sibships after correction. The relationship distribution in TCGS families is shown in Figure S6.
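The threshold-based flagging of couples described above can be sketched as follows. This is only an illustration: the couple identifiers and IBD values are hypothetical, and treating "met the threshold" as "greater than or equal to 0.05" is an assumption rather than a detail stated in the study.

```python
from typing import Dict, Tuple

# The 0.05 cutoff is the threshold quoted in the text; the data below are made up.
IBD_THRESHOLD = 0.05


def consanguinity_rate(couple_ibd: Dict[Tuple[str, str], float],
                       threshold: float = IBD_THRESHOLD) -> float:
    """Return the fraction of couples whose estimated IBD sharing meets the threshold."""
    if not couple_ibd:
        raise ValueError("no couples supplied")
    flagged = sum(1 for share in couple_ibd.values() if share >= threshold)
    return flagged / len(couple_ibd)


# Hypothetical couples with their estimated proportion of the genome shared IBD.
example = {
    ("P1", "P2"): 0.00,
    ("P3", "P4"): 0.06,
    ("P5", "P6"): 0.13,
    ("P7", "P8"): 0.01,
}
rate = consanguinity_rate(example)
print(f"{rate:.2%}")
```

Couples without genotype-derived IBD estimates would need a separate path, such as the self-declared consanguinity mentioned in the text.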

The TCGS project has used the ICD-11 classification to create a comprehensive and standard system for diagnosing health conditions. A total of 24 ICD-11 chapters were used to identify 873 specific health conditions in TCGS participants [7], as detailed in Supplementary file 2. To examine the most commonly reported mortality and morbidity statistics, we focused on the top five ICD-11 groups presented in Table S6. The findings indicate that a significant number of participants in the study had diabetes mellitus in pregnancy, goiter, and hypothyroidism as their top health conditions. These findings could be attributed to the study's initial focus on cardiovascular disorders, diabetes, and thyroid disease.

Risk factors and outcomes assessment

Here we present the key results from analyzing the most extensively studied traits in this population. Our goal is to identify genetic markers and their specific effects on this population, which can be used in more targeted risk management toward precision medicine. We outline the objectives of studying these traits and summarize the findings.

Blood pressure

Objective

Elevated BP is the most common risk factor for cardiovascular and renal diseases and is responsible for a significant proportion of morbidity and mortality worldwide [8]. Both genetic and environmental factors influence BP. To investigate the genomic basis of BP, we conducted various analyses, including familial aggregation analysis, heritability analysis [9], a family-based linkage study [10], GWAS, epistasis analysis, and estimation of PRS [11].

Key methods and data collection

The primary outcome of interest was the incidence of HTN, as well as corresponding SBP and DBP levels, across three age groups: children (1–9 years), adolescents (10–17 years), and adults (18 years and older). For adults, HTN was defined as SBP ≥ 140 mmHg, DBP ≥ 90 mmHg, or current use of antihypertensive medication [13]. Participants were included in the subsequent analysis with their average SBP and DBP across follow-up visits. Missing BMI and WC values were imputed.
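The adult HTN definition and age grouping above translate directly into a small classification routine. The sketch below uses the cutoffs quoted in the text; the function names and the example readings are illustrative and are not taken from the study's own pipeline.

```python
from statistics import mean
from typing import Sequence


def age_group(age_years: float) -> str:
    """Assign the age group used in the text: child, adolescent, or adult."""
    if age_years >= 18:
        return "adult"
    if age_years >= 10:
        return "adolescent"
    if age_years >= 1:
        return "child"
    raise ValueError("age below the youngest group described in the text")


def average_bp(readings: Sequence[float]) -> float:
    """Average a participant's readings (mmHg) across follow-up visits."""
    return mean(readings)


def is_hypertensive_adult(sbp: float, dbp: float, on_bp_medication: bool) -> bool:
    """Adult HTN: SBP >= 140 mmHg, DBP >= 90 mmHg, or antihypertensive medication."""
    return sbp >= 140 or dbp >= 90 or on_bp_medication


# Hypothetical adult with three follow-up visits.
sbp = average_bp([150, 138, 141])
dbp = average_bp([85, 88, 84])
print(age_group(42), is_hypertensive_adult(sbp, dbp, on_bp_medication=False))
```

The definition for children and adolescents is not given in the text, so it is deliberately left out here.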

Main results

Our research on Iranian families revealed that SBP, DBP, BMI, and WC were highly correlated in mother–offspring and sister–sister relationships, with heritability estimates of 25% and 30% for SBP and DBP, respectively, during a follow-up of 15 years. Members of index families with higher familial BMI or WC were found to have a significantly increased risk of hypertension. We observed consistent and strong signals in the AGT gene linked with SBP and DBP, in the NLGN1 gene linked with SBP and HTN, and epistasis between the TNXB gene and known genetic variants linked with all BP traits. In the GWAS analysis, we identified consistent signals in the ZBED9 gene associated with HTN at the borderline genome-wide significance threshold after adjusting for different environmental predictors. Our finding on the ZBED9 gene was confirmed by linkage analysis in an independent sample for all BP traits. Furthermore, single-locus analysis identified two missense variants, in ZBED9 (rs450630) and AGT (rs4762), associated with hypertension. Interestingly, the G allele of rs450630 exhibited an antagonistic effect on hypertension, but further IGENT analysis revealed significant epistasis effects for different combinations of ZBED9, AGT, and TNXB loci [12].

Future direction

In upcoming research, we plan to concentrate on two areas of investigation. First, we aim to study individuals with MH, which is commonly observed in childhood or adolescence and is characterized by elevated blood pressure levels and resistance to standard treatment. Our population will be explored for mutations that enhance renal sodium reabsorption through mineralocorticoid-dependent or independent mechanisms, resulting in fluid retention and increased blood pressure. Carriers will be identified and excluded from the case group in the association analysis for complex hypertension.

Second, we intend to calculate the PRS for hypertension in our population. The PRS is a promising tool for predicting the risk of complex diseases. However, it has been shown that the performance of PRS may vary across different ethnic populations due to differences in the underlying genetic architecture of these populations. Therefore, we aim to identify the associated genetic variants and their effects on hypertension in our study population. Then, we will calculate the PRS for each individual using a standardized approach that considers the population-specific genetic variants and their effect sizes. We will also assess the performance of the PRS in our study population and compare it to other populations to evaluate the generalizability of the PRS across different ethnicities. This finding will allow us to determine the clinical utility of the PRS for hypertension prediction in our population and potentially other populations with similar genetic backgrounds. Calculating the PRS for hypertension will provide valuable insights into the genetic basis of hypertension and facilitate the development of personalized prevention and treatment strategies for this complex disease.

Diabetes

Objective

T2D is a serious and widespread disease associated with increased mortality and the development of CVD [13] on a global scale. Given the prevalence of T2D in the Iranian population, our study seeks to examine the genetic architecture underlying the disease. Specifically, we aim to identify and characterize genetic variants associated with an increased risk of T2D in the Iranian population and to understand better the mechanisms underlying this disease in this specific population. By elucidating the genetic underpinnings of T2D in the Iranian population, we hope to develop more targeted and effective interventions for preventing and treating this debilitating disease. Our study may also contribute to a broader understanding of the genetic basis of T2D and inform the development of clinical strategies for managing this disease in other populations.

Key methods and data collection

As an initial step in finding the genetic architecture of T2D in Iranians, we evaluated the segregation, aggregation, and family-based heritability among TCGS participants. Additionally, we calculated a restricted weighted PRS (p ≤ 5e-8) for TCGS participants and assessed its association with T2D incidence.
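A restricted weighted PRS of this kind sums, for each participant, the risk-allele dosage of every SNP that passes the p-value cutoff, weighted by its effect size. The sketch below assumes per-SNP effect sizes and p-values from a reference GWAS; the SNP names, weights, and dosages are hypothetical and only the 5e-8 cutoff comes from the text.

```python
from typing import Dict, NamedTuple


class SnpWeight(NamedTuple):
    beta: float     # per-allele effect size (for example a log odds ratio), assumed input
    p_value: float  # association p-value from the same reference GWAS, assumed input


P_THRESHOLD = 5e-8  # genome-wide significance cutoff quoted in the text


def weighted_prs(dosages: Dict[str, float],
                 weights: Dict[str, SnpWeight],
                 p_threshold: float = P_THRESHOLD) -> float:
    """Sum beta * dosage over SNPs with p <= p_threshold that were genotyped."""
    score = 0.0
    for snp, weight in weights.items():
        if weight.p_value <= p_threshold and snp in dosages:
            score += weight.beta * dosages[snp]
    return score


# Hypothetical weights; snp_c fails the p-value restriction and is ignored.
weights = {
    "snp_a": SnpWeight(beta=0.10, p_value=1e-9),
    "snp_b": SnpWeight(beta=0.25, p_value=3e-8),
    "snp_c": SnpWeight(beta=0.40, p_value=1e-4),
}
# Risk-allele counts (0, 1, or 2) for one hypothetical participant.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 2}
score = weighted_prs(person, weights)
print(round(score, 3))
```

In practice, scores are usually standardized within the cohort before being related to disease incidence; that step is omitted here.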

Main results

Familial aggregation and heritability of T2D were estimated by utilizing the TCGS family structure [14]. We constructed 2,594 constituent pedigrees based on 13,741 individuals aged over 20 (mean ± sd: 39.71 ± 16.56 years), in which the familial aggregation of T2D was found to be statistically significant (p < 0.05), and family-based heritability indicated that genetic variation accounted for 65% of T2D development and expression (SE = 0.034). Complex segregation analysis showed that the polygenic model was a good fit for illustrating the mode of inheritance of T2D among the TCGS participants. Within first-degree relatives, the parental effect was stronger than the sibling effect (OR = 4.11 vs. OR = 1.65), and a family history of T2D among first-degree relatives carried a higher risk than a history among second-degree relatives (OR = 3.84 vs. OR = 0.59).
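For readers less familiar with the quantity being reported, heritability is conventionally defined as the share of phenotypic variance attributable to genetic variance. The notation below is the standard textbook form and is an assumption here; the text does not state which variance-component model was fitted.

```latex
h^2 = \frac{\sigma_G^2}{\sigma_P^2},
\qquad
\sigma_P^2 = \sigma_G^2 + \sigma_E^2
```

Under this definition, the reported estimate corresponds to $h^2 = 0.65$ with a standard error of 0.034, where $\sigma_G^2$ is the genetic variance, $\sigma_E^2$ the residual (environmental) variance, and $\sigma_P^2$ the total phenotypic variance.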

Later, as a first step toward predicting T2D development using a person-specific genetic profile, Moazzam-Jazi et al. identified multiple T2D-associated SNPs that were significantly enriched in the TCGS cohort compared to the global population [15]. These results could partly explain the differences in drug response and subsequent treatment efficiency among cases with diverse ancestries. The cumulative effect of enriched risk SNPs was assessed by computing the PRS for adults aged 20 years or older. A significant association was found between the PRS and T2D incidence in the TCGS cohort. Hence, the high genetic burden of T2D across the Iranian population could contribute to the enhanced prevalence of the disease in this population. They also demonstrated a high hazard of T2D development in the genetically high-risk individuals compared to the genetically low-risk individuals in the model adjusted for age, sex, BMI, and other biochemical T2D risk factors [15].

Future direction

In the near future, we intend to focus on the genetic characterization of different types of diabetes, including T1D, T2D, and MODY. It is essential to differentiate between these three types of diabetes for effective treatment and management at different healthcare system levels. For example, sulfonylureas, which stimulate insulin secretion, can be effective for treating T2D and MODY 3 subtypes but may be less effective or even harmful in patients with T1D due to the absence of functional beta cells. Therefore, accurate diagnosis and classification of diabetes are critical steps toward the personalized treatment and management of this disease. This can be made feasible by characterizing the key and novel genetic variants in diabetic patients in the Iranian population.

Furthermore, it is necessary to construct a PRS based on the associated genetic variants and their weights in the Iranian population. So far, this population has not been part of the global GWAS effort, and we do not know which variants, with what weights, are best suited for creating a PRS for the Iranian population. As such, conducting a large-scale GWAS in the Iranian population is crucial to identify and characterize the genetic variants associated with T2D and other types of diabetes. By doing so, we can develop more accurate and effective PRS models for predicting the risk of diabetes in this population. Additionally, integrating genetic data with clinical and lifestyle factors can further enhance the precision of personalized diabetes management and prevention strategies. Ultimately, this may lead to better health outcomes and a reduced burden of diabetes on the Iranian healthcare system.

Lipid profile

Objective

Due to the central role of lipid profile in the development of cardiometabolic disease, we performed candidate gene analyses to identify variants associated with lipids in the Iranians. We also investigated the potential of genomic prediction for lipid profile traits.

Key methods and data collection

In the TCGS population, we utilized a nested case–control approach to investigate two SNPs related to CHD incidence. In addition, we proposed a novel strategy to identify the optimal number of SNPs contributing most to the explanation of genomic phenotypic variation. This strategy employed 10-fold cross-validation repeated 10 times and utilized both WGR and GWAS. By implementing this approach, we aimed to enhance the computational efficiency of constructing the GRM in gBLUP [16]. Furthermore, we evaluated the strategy on lipid traits, including HDL-C, LDL-C, TG, and TC, among TCGS participants.
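The resampling scheme named above, 10-fold cross-validation repeated 10 times, can be sketched generically as follows. This shows only how the train/test index splits are produced; the model fitted within each split (WGR, GWAS, gBLUP) is outside the scope of the sketch, and the function name and seed are illustrative.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int,
                   n_folds: int = 10,
                   n_repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th sample forms one fold
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx


splits = list(repeated_kfold(n_samples=50))
print(len(splits))  # 10 folds x 10 repeats
```

Prediction accuracy would then be averaged over all 100 splits. With family data, splitting by pedigree rather than by individual may be preferable to avoid relatives appearing on both sides of a split; whether the original analysis did so is not stated.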

Main results

Our findings demonstrated that certain genetic variations, specifically rs2048327-G (SLC22A3) and rs17465637-C (MIA3), increase the risk of CHD. The risk is approximately twofold higher in males with the rs2048327-G allele and in females with the rs17465637-C allele. We have observed that in male carriers of the rs2048327-G allele, HDL-C levels can significantly increase the likelihood of developing CHD in the future [17]. Additionally, our study has shown a significant association between the risk allele of rs7865618 and CHD development in the TCGS population (p = 0.03, OR = 1.73, 95% CI: 1.04–2.88) [18].
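To make the reported effect measures concrete, the sketch below computes an odds ratio and its Wald 95% confidence interval from a 2×2 case–control table. It is the unadjusted textbook calculation with hypothetical counts, not the adjusted models used in the cited studies.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers among cases      b: carriers among controls
    c: non-carriers among cases  d: non-carriers among controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 65 cases and 60 of 135 controls carry the risk allele.
or_value, ci_low, ci_high = odds_ratio_wald_ci(40, 60, 25, 75)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in the rs7865618 result above, indicates an association at the 5% level.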

In addition, we aimed to investigate whether the CETP gene polymorphisms rs5882 and rs3764261 modify the relationship between diet and changes in serum lipid profile. For this purpose, we selected 4,700 individuals aged 18 and above from the TCGS participants and assessed changes in their serum lipid profile after a 3.6-year follow-up period. Our results showed that carriers of the rs3764261-A allele had a greater reduction in TC levels in the higher quartiles of fish intake than those with the CC genotype. Conversely, carriers of the rs5882-G allele showed an ascending trend in TG levels across quartiles of total fat, monounsaturated fat, and saturated fat consumption compared to those with the AA genotype. In contrast, carriers of rs5882-G showed a declining trend in mean changes in TG concentrations across quartiles of carbohydrate intake compared to those with the AA genotype [19].

We conducted a study to identify informative SNPs that could explain the genotypic heritability of lipid traits. The findings showed that prediction accuracy for these traits was highest when all SNPs were considered. In contrast, using only the subsets of SNPs associated with these traits in previous GWAS resulted in the lowest prediction accuracy. However, the subset of SNPs referred to as "truly influential SNPs" showed interesting results in capturing significant genotypic variance and contributing to heritability [16].

Additionally, Sung's two-step method [20] was employed to identify pleiotropic genetic variants that exhibited a significant association with the longitudinal data of HDL-C, LDL-C, TC, and TG. Initially, a three-level GLMM was fitted for each longitudinal trait as a response variable, followed by a simultaneous genetic association test via the GQLSM for each SNP. The results indicated that twenty variants from six genes, including C16orf95, SLC12A3, CETP, NLRC5, ESRP2, and C16orf95, were strongly associated (p-value < 6.6 × 10⁻⁵) with HDL-C, TC, and TG [21].

Future direction

The TCGS population presents a valuable opportunity to study monogenic lipid disorders, as it features a high rate of consanguineous marriages and extensive familial data. This opportunity allows for exploring the genetic basis of FH and other monogenic lipid disorders and investigating underlying molecular mechanisms. The genomic study on lipids in the TCGS population may uncover potential novel targets for therapeutic interventions.

In addition, the development of PRS can aid in comprehending the intricate genetic basis of diseases and their occurrence in specific populations. PRS can identify high-risk individuals for lipid metabolism disorders, including FH, and provide personalized preventive and therapeutic strategies. Integrating PRS with monogenic lipid disorder studies may also help explain the significant lipid abnormalities in the Iranian population. The studies could provide valuable insights into the Iranian population's genetic and environmental determinants of lipid metabolism and cardiovascular disease.

Obesity

Objective

The absence of studies on the genetic factors contributing to obesity in the Iranian population is a significant limitation for advancing personalized medicine. Therefore, to address this knowledge gap, our objective was to identify genetic variants associated with obesity and related traits, including WC, WHR, TG, TC, LDL-C, and HDL-C, and to assess their aggregated effects on the incidence of obesity among Iranians.

Key methods and data collection

After implementing quality control measures on the data, we utilized various regression tests to perform association analyses. All models were adjusted for the relevant covariates, such as age and sex. For PRS calculation, we used weighted PRS. We applied a false discovery rate (FDR) correction at the 5% significance level to account for multiple testing.
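FDR control at the 5% level is commonly implemented with the Benjamini-Hochberg step-up procedure. The text does not name the specific procedure used, so the sketch below is an assumption about the method, shown with hypothetical p-values.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each test, whether it is rejected under BH control at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five association tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
flags = benjamini_hochberg(p_values)
print(flags)
```

With these inputs only the two smallest p-values pass, even though four of the five are below 0.05 when considered individually.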

Main results

We conducted a family-based linkage and linkage disequilibrium analysis of 3,109 pedigrees in the first comprehensive study on Iranian pedigrees. Our results showed that RPGRIP1L is the key gene within the 16q12.2 region, and its polymorphisms could be associated with obesity risk factors among TCGS participants [22]. Moreover, we found that different SNP clusters composed of rare and common SNPs within the 16q12.2 region significantly increased BMI among Iranians. These clusters were randomly distributed across the region, with a higher density around the FTO, AKTIP, and MMP2 genes [23].

In a separate study, we discovered that nine correlated SNPs located upstream of the PPARG gene are significantly involved in the occurrence of long-term and persistent obesity [24]. Four SNPs in the MC4R gene are also significantly associated with the percentage of excess weight loss (EWL%) and excess BMI loss (EBMIL%), particularly 12 months after bariatric surgery [25]. Moreover, rs13107325 was significantly associated with an increased likelihood of persistent metabolically healthy obesity in postmenopausal women [26].

FTO is regarded as one of the central genes involved in obesity and its corresponding traits. Several FTO variants (rs1421085, rs1558902, rs1121980, and rs8050136) were significantly associated with the metabolically unhealthy obesity (MUO) phenotype even after adjusting for lipid profile. However, no significant association was observed between those SNPs and metabolically healthy obesity [27]. Another study investigated the interaction between dietary patterns and FTO polymorphisms regarding changes in BMI and WC over a 3.6-year follow-up period [28]. Six common SNPs (rs1421085, rs1121980, rs17817449, rs8050136, rs9939973, and rs3751812) within the FTO gene region were examined. The study revealed that individuals with the risk alleles and higher WDP scores had nearly a two-fold higher BMI than those without the risk alleles.

In individuals with a higher PRS, BMI and WC tend to increase with increasing WDP score [29]. This finding suggests that individuals with a genetic predisposition to obesity are more susceptible to the detrimental effects of an unhealthy diet and emphasizes the importance of reducing the consumption of unhealthy foods to prevent obesity. Additionally, WC increased with increasing WDP score in carriers of the risk alleles of rs1121980 and rs3751812 but not in those without any risk alleles. A higher intake of TFAs in adults carrying the FTO rs8050136 risk allele was also found to significantly increase BMI and WC over an average follow-up of 3.6 years [29]. However, no significant interaction was found between combined FTO variants (rs1121980, rs1421085, and rs8050136) and dietary diversity score concerning general obesity, indicating that dietary diversity patterns may play a mediatory role in the presentation of obesity-related factors [30].

A healthy dietary pattern could modify the impact of MC4R rs17782313 on general obesity. Carriers of the rs17782313 risk allele with a higher healthy dietary pattern score had a lower risk of prevalent obesity than those without the risk allele [31]. In a recent study, we identified eight common SNPs in or near the MC4R gene that are significantly associated with increased BMI, WC, and WHR over a lifetime. Interestingly, the aggregated effect of these SNPs significantly influenced increased BMI and WC only in early adulthood, not during middle or late adulthood. Therefore, the effect of MC4R risk SNPs may not remain constant during the lifetime [32].

Future direction

The etiology of monogenic obesity differs substantially from that of the more prevalent form of obesity. Our initial approach involves screening known and suggested mutations associated with monogenic obesity to advance our understanding of this complex trait. As a further step, we aim to conduct a GWAS that encompasses various subtypes of obesity, such as normal weight obese, metabolically obese normal weight, metabolically healthy obese, and metabolically unhealthy obese. By doing so, we hope to identify potential genetic markers associated with the various forms of obesity and gain insight into the underlying mechanisms driving this condition.

Metabolic syndrome

Objective

Metabolic syndrome (MetS) is a complex disease characterized by metabolic disorders such as abdominal obesity, dyslipidemia, hyperglycemia, and HTN. The development of MetS is primarily influenced by environmental factors such as an inappropriate diet and physical inactivity [33, 34]. Genetic factors also play a role, as suggested by familial aggregation and heritability studies [35, 36]. Researchers have identified several loci associated with an increased risk of this syndrome [37]. By examining the interaction between these genetic variants and dietary factors associated with MetS, new strategies for preventing and treating MetS can be developed. We investigated the role of genetic variations, advanced statistical models, and gene-nutrient interactions in the risk of MetS.

Key methods and data collection

The first study involved a retrospective cohort study of 5,666 participants from the TCGS. The aim was to examine the relationship between MetS and its components with three GCKR polymorphisms (rs780093, rs780094, and rs1260326) using linear and logistic regression analyses in an additive genetic model. Moreover, the Cox regression analysis was performed to evaluate the association of these variants with the development of MetS over time [38].
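In an additive genetic model, each genotype enters the regression as the number of copies of the effect allele (0, 1, or 2), so one coefficient describes the per-allele effect. The sketch below shows that coding step only. The choice of T as the effect allele and the example genotypes are illustrative assumptions.

```python
def additive_dosage(genotype: str, effect_allele: str) -> int:
    """Count copies of the effect allele in a two-letter genotype such as 'CT'."""
    genotype = genotype.upper()
    effect_allele = effect_allele.upper()
    if len(genotype) != 2 or len(effect_allele) != 1:
        raise ValueError("expected a two-letter genotype and a one-letter allele")
    return genotype.count(effect_allele)


# Hypothetical genotypes at one SNP, coded for T as the effect allele.
genotypes = ["CC", "CT", "TT", "TC"]
dosages = [additive_dosage(g, "T") for g in genotypes]
print(dosages)
```

The resulting 0/1/2 values would then serve as the predictor in linear, logistic, or Cox regression alongside covariates.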

Furthermore, several studies with smaller sample sizes have investigated gene-diet interactions in TCGS participants to predict the risk of MetS. One candidate gene study analyzed GCKR gene variants together with clinical and demographic information on 4,756 eligible TCGS participants to develop optimal prediction model(s) for MetS. Predictive models were then compared (logistic regression (LR), Random Forest (RF), decision tree (DT), support vector machines (SVM), and discriminant analyses) [42]. Continuing the search for optimal statistical models for predicting MetS, we designed a study that evaluated the association of MetS with 18 SNPs in three genes, namely BUD13, ZPR1, and APOA5, in 5,421 TCGS participants. This cross-sectional study employed two models for data analysis. The first model examined the association between variants and MetS, while the second model (HTG-MetS) evaluated the associations between genetic variants and MetS patients with high plasma TG levels. Four-gamete rules were also used to form SNP sets from correlated SNPs. Kernel machine regression models and single SNP regression were employed to estimate the association between SNP sets and MetS [43]. Two studies subsequently examined the association of CETP gene polymorphisms (rs5882 and rs3764261) in 441 MetS cases and 844 matched controls, and of TCF7L2 gene variants (rs7903146 and rs12255372) in 1,423 individuals, with dietary intakes to predict the risk of MetS [39, 40].
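The four-gamete rule mentioned above treats two biallelic SNPs as belonging to the same block when fewer than four of the possible two-locus haplotypes are observed, since seeing all four suggests historical recombination between them. The sketch below shows the pairwise check on phased haplotypes. The 0/1 allele coding and the example data are assumptions, and how pairwise results were combined into SNP sets in the cited study is not described in the text.

```python
from typing import Iterable, Tuple


def passes_four_gamete_test(haplotypes: Iterable[Tuple[int, int]]) -> bool:
    """Return True if fewer than four distinct two-locus haplotypes are observed.

    Each haplotype is a pair of alleles (coded 0 or 1) at the two SNPs being compared.
    """
    observed = {(first, second) for first, second in haplotypes}
    return len(observed) < 4


# Hypothetical phased haplotypes for two SNP pairs.
no_recombination = [(0, 0), (0, 1), (1, 1), (0, 0)]
with_recombination = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(passes_four_gamete_test(no_recombination),
      passes_four_gamete_test(with_recombination))
```

Real implementations usually ignore haplotypes below a minimum frequency so that a single genotyping error does not split a block.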

Main results

The study showed that functional GCKR variants were associated with higher TG and lower fasting blood sugar (FBS) levels. Moreover, the results of the adjusted Cox regression model revealed that carriers of the rs780094, rs780093, and rs1260326 TT genotypes had a higher risk of MetS incidence [38]. The logistic regression model showed a significant association of MetS with age, gender, schooling years, BMI, physical activity, rs780094, and rs780093. Random Forest analysis revealed that BMI, physical activity, and age were the most influential model features. Decision tree analysis showed that individuals with BMI < 24 and physical activity < 8.8 had a lower risk of developing MetS [41]. In another study, the kernel machine analysis showed that two of the three sets of correlated SNPs had a significant joint effect in the MetS and HTG-MetS models. Moreover, a single SNP regression analysis indicated that the highest ORs in the HTG-MetS model were for the G allele of rs2266788 (MetS: OR = 1.3, HTG-MetS: OR = 1.4) and the T allele of rs651821 (MetS: OR = 1.3, HTG-MetS: OR = 1.4). Although the ORs were similar between the two models, the p-values in the HTG-MetS model were marginally more significant [42].

Another study investigated the potential relationship between a specific genetic variant (rs5882) and dietary macronutrient intake concerning metabolic syndrome (MetS) risk. The results indicate that this genetic variant does not interact with macronutrient intake concerning MetS risk. However, the study found that individuals carrying the G allele and consuming monounsaturated fatty acids and total fat in the lowest quartile had a reduced risk of low HDL-C. Conversely, those carrying the G allele and consuming higher levels of trans-fatty acids had an increased risk of high blood pressure [39]. We also observed that consuming nuts in the highest tertile was associated with a reduced risk of MetS among T allele carriers of rs12255372, resulting in a 34% reduction of MetS risk [40].

Future direction

Metabolic syndrome is a complex condition characterized by three or more risk factors out of five. Each risk factor has unique genetic variations. Therefore, it may be beneficial to cluster individuals based on their combination of risk factors and track them longitudinally. We can identify individuals with similar genetic profiles and risk factor combinations by doing so. This information can then be used to perform genetic analyses specific to each group, allowing for more targeted and personalized treatment options. This approach may improve the effectiveness of treatments by considering the specific genetic and metabolic factors present in each individual.
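The grouping idea described above can be sketched as follows: flag MetS when at least three of the five components are present, then group affected individuals by their exact combination of components. The component names are assumptions based on common MetS definitions, no clinical cutoffs are encoded, and the participants are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed component labels; the text only states "three or more risk factors out of five".
COMPONENTS = ("abdominal_obesity", "high_tg", "low_hdl", "high_bp", "high_glucose")


def has_mets(flags: Dict[str, bool]) -> bool:
    """MetS is present when at least three of the five components are flagged."""
    return sum(bool(flags[c]) for c in COMPONENTS) >= 3


def combination_key(flags: Dict[str, bool]) -> Tuple[str, ...]:
    """Return the ordered tuple of components present for one participant."""
    return tuple(c for c in COMPONENTS if flags[c])


def cluster_by_combination(people: Dict[str, Dict[str, bool]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group participants with MetS by their combination of components."""
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for person_id, flags in people.items():
        if has_mets(flags):
            clusters[combination_key(flags)].append(person_id)
    return dict(clusters)


def make_flags(*present: str) -> Dict[str, bool]:
    """Helper for the example: build a flag dictionary from the components present."""
    return {c: c in present for c in COMPONENTS}


people = {
    "id1": make_flags("abdominal_obesity", "high_tg", "low_hdl"),
    "id2": make_flags("abdominal_obesity", "high_tg", "low_hdl"),
    "id3": make_flags("high_bp", "high_glucose", "abdominal_obesity"),
    "id4": make_flags("high_tg", "low_hdl"),  # only two components, so no MetS
}
clusters = cluster_by_combination(people)
print(clusters)
```

Each resulting group could then be followed longitudinally and analyzed separately, as proposed in the text.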
