Human rs75776403 polymorphism links differential phenotypic and clinical outcomes to a CLEC18A p.T151M-driven multiomics

Prioritizing the CLEC18A p.T151M residue and functional predictions

CLEC18 family genes (CLEC18A, CLEC18B, and CLEC18C) genotype data from 2504 samples were queried from the 1000 Genome (1000G) Phase 3 database (https://www.internationalgenome.org/; accessed Dec 2, 2016). These 1000G individuals were categorized into five major populations of African, Admixed American, European, East Asian, and South Asian.

Missense annotation

Information (missense or not) regarding CLEC18 family gene variants was obtained from Haploreg v4.1.

Expression quantitative trait locus (cis-eQTL) annotation

Tissue-specific cis-eQTLs of the CLEC18A, CLEC18B, and CLEC18C genes were queried from the Genotype-Tissue Expression (GTEx) Portal v8 (http://www.gtexportal.org/home/; accessed Aug 27, 2019).

Prediction of deleteriousness

Computational predictions of the impacts of missense variants were built based on the biochemical properties of amino acid substitutions. To evaluate the deleteriousness to protein function and the structure of amino acid substitutions, we adopted three in silico prediction tools including Sort Intolerant From Tolerant (SIFT; https://sift.bii.a-star.edu.sg/; accessed Oct 2019) [5], Polymorphism Phenotyping v2 (PolyPhen-2; http://genetics.bwh.harvard.edu/pph2/; accessed Oct 2019) [6], and Combined Annotation Dependent Depletion (CADD; https://cadd.gs.washington.edu/snv/; accessed Aug 2021) [7, 8]. The SIFT score ranges 0 ~ 1. The cutoff score for SIFT is 0.05. A substitution is predicted to be tolerated (or deleterious) with a SIFT score of > 0.05 (or < 0.05). PolyPhen-2 uses the same score range as SIFT, but in the opposite direction, with a score closer to 1 representing high confidence of being damaging. CADD computes scores for all potential genetic variants throughout the reference genome. The raw score is further transformed into a CADD Phred score by ranking the variants for all 8.6 billion genetic variants. CADD Phred scores range 1 ~ 99. Phred scores for ranking the top 10% of causal genetic variants are assigned as 10. The top 1% are assigned as 20, etc. With a reasonable cutoff for deleteriousness ranging 10 ~ 20, we arbitrarily defined variants with a Phred score of > 20 as deleterious and < 10 as non-deleterious.

The CLEC18A gene plot was directly adopted from the Biodalliance website (http://www.biodalliance.org/index.html; accessed Feb 7, 2021).

Structural homology modelling of the CLEC18A protein domain

We constructed a three-dimensional (3D) homology model by inputting the CAP/SCP/TAPS domain sequences of CLEC18A via the SWISS MODEL server [9]. Briefly, the algorithm identifies a “template protein”, a sequence homologue of the input sequence, and uses the template protein to build a 3D model. In this case, human glioma pathogenesis-related protein 1 (GLIPR1) was selected as the template protein for model construction and further structural analysis. Visual rendering of 3D homology models was performed using PyMol v2.4.2 (https://pymol.org/2/; accessed Oct 26, 2021).

Lipid-binding affinity test for wild-type and mutant form CLEC18A residue

DNA fragments of the CAP/SCP/TAPS domain, which encoded the wild-type (WT; p.T151) and mutant form (p.M151) of the CLEC18A residue, were amplified by a reverse-transcription polymerase chain reaction (RT-PCR) and subcloned into the pcDNA3.1-hIgG1 Fc (mut) vector to generate the WT and mutant CLEC18A.Fc fusion proteins. The FreeStyle 293 Expression System (Invitrogen, Carlsbad, CA, USA) was applied to overexpress the CLEC18A.Fc fusion proteins. The detailed procedures were described in our previous study [1].

To perform the protein-lipid overlay assay, the phosphorylated derivatives of phosphatidylinositol (PIP) strips (P23751) membranes and Sphingo strips (S23753) membranes were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Light exposure of the lipid strip membranes was avoided during the entire process before detection. The lipid strip membranes were initially blocked in 3% bovine serum albumin (BSA) in phosphate-buffered saline (PBS) for 1 h (h) at room temperature. Next, the SCP domain of the WT and mutant CLEC18A.Fc fusion proteins were added at final concentrations of 500 and 50 ng/ml for 2 h of incubation at room temperature. After three washes with 3% BSA-PBST (PBS plus 0.1% Tween20), the lipid membranes were incubated with an anti-human immunoglobulin G (IgG) horseradish peroxidase (HRP) antibody (1:5000) in blocking buffer for 1 h at room temperature. After three washes with 3% BSA-PBST, the lipids were detected with enhanced chemiluminescence reagents (GERPN2235, Cytiva, Chicago, IL, USA).

Data from the Taiwan biobank (TWB)

The Taiwan Biobank (TWB), a nationwide research database in Taiwan, was launched to facilitate biomedical research and further translate work into clinical settings by incorporating genomic, environmental, and disease profiles [10].

Participants were recruited from local communities, with inclusion criteria of being aged 30 ~ 70 years, physically active, without a cancer history, and self-reported to be of Han ancestry (i.e., both parents). Clinical (and follow up) data from anthropometric measurements, biospecimen tests, physical examinations, and questionnaires were obtained during sample enrollment. Written informed consent was provided by all individuals who participated in the TWB project. Ethical approval was obtained from the Institutional Review Board (IRB) of Taipei Medical University (IRB no. N201906005), the Ethics and Governance Council of the TWB (TWBR10807-05, TWBR10906-03), and Academia Sinica (AS-IRB01-16,018).

CLEC18A p.T151M genotypic data of the TWB cohort

Genomic DNA of participants in the TWB project was harvested using a standardized protocol and subjected to genotyping using the Axiom Genome-Wide TWB v2.0 Array Plate. Stringent quality-control filters for TWB biallelic SNV and indel genotype data were applied using PLINK v1.9 [11]. Variants with heterozygous haploid genotypes were addressed by (i) checking heterozygous calls in the pseudoautosomal region of chromosome (chr) X in males; (ii) sex checking; and (iii) directly removing the variants. Subjects with ambiguous sex data were removed during sex checks. Individuals and genotypes were sequentially filtered with a call rate threshold of 98%. Variants with a minor allele frequency (MAF) of < 1% and a Hardy–Weinberg equilibrium (HWE) P value of < 10–10 were further discarded.

Next, a subset of independent variants was selected through linkage disequilibrium (LD) pruning by calculating the pairwise LD of autosomal variants (MAF > 10%) with parameters of a window size of 200 kb, a step size of 5, and a variance inflation factor threshold of 0.2. Outliers were detected and filtered according to the heterozygosity rate calculated from the pruned variant subset. Genetic relationships among individuals were estimated using Genome-wide Complex Trait Analysis (GCTA) [12], followed by calculating the principal components (PCs).

Local imputation of rs75776403 (chr16:69954569; GRCh38) was performed by first extracting biallelic single-nucleotide polymorphisms (SNPs) of around ± 2.5 Mb. Next, phasing and imputation were conducted using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html#!; accessed Dec 2, 2021) by choosing the 1000G Phase 3 (v5) East Asian population as the reference. Post-imputation quality filtering was applied with an imputation score of > 0.8 and MAF of > 5%.

Phenotypic (clinical) profiles of the TWB cohort

Phenotypic varieties of TWB participants were collected from test data of biological specimens (blood and urine), physical examinations, and questionnaires. The phenotype data were further processed as follows.

Quantitative traits

In total, 86 quantitative traits were included in our study: (i) 62 original traits (spanning 11 phenotype categories) available from clinical profiles of the TWB cohorts, including five anthropometric (height [abbreviated as Ht], weight [Wt], body fat [BF], waist [WC] and hip [HC] circumference), three cardiac (systolic [Sys] and diastolic [Dias] blood pressure, heartbeat speed [Heartbeat]), two gynecological (age of menopause [Menps], age of menarche [Menrc]), three habitual (alcohol intake [DRK_y], nut intake [Nut_DsYr], smoke intake [SMK_PkYr]), five hematological (red blood cell count [RBC], white blood cell count [WBC], platelet count [Plt], hemoglobin [Hb], hematocrit [Hct]), six hepatic (total bilirubin [TBil], serum albumin [Alb], aspartate aminotransferase [AST], alanine aminotransferase [ALT], gamma-glutamyl transferase [GGT], alpha fetoprotein [AFP]), six metabolic (glycated hemoglobin [HbA1c], fasting glucose [FGlu], total cholesterol [TCho], triglyceride [TG], high- [HDLc] and low-density [LDLc] lipoprotein cholesterol), four nephrotic (blood urea nitrogen [BUN], creatinine [Cr], uric acid [UA], microalbumin [mAlb]), five orthopedic (stiffness index [SI], T score [T], Z score [Z], speed of sound [SOS], broadband ultrasound attenuation [BUA]), 18 pulmonary (vital capacity [VC], tidal volume [TV], expiratory reserve volume [ERV], inspiratory reserve volume [IRV], inspiratory capacity [IC], VC-to-Ht ratio [VcHtRatio], forced VC [FVC], forced EV in 1 s [FEV1], FEV1-to-FVC ratio [Fev1FvcRatio], FEV1-to-VC ratio [Fev1VcRatio], mean maximal flow [MMF], peak expiratory flow [PEF], 25% forced expiratory flow [FEF25], 50% FEF [FEF50], 75% FEF [FEF75], FEF75-to-Ht ratio [Fef75HtRatio], extrapolated volume-to-FVC ratio [EvFvcRatio], forced inspiratory volume in 1 s-to-FVC ratio [Fiv1FvcRatio]), and 5 virological (hepatitis C virus antibody [AtHCVAb], hepatitis B surface antigen [HBsAg], hepatitis B e antigen [HBeAg], hepatitis B surface antibody [AtHBsAb], hepatitis B core antibody [AtHBcAb]).

(ii) Additional 24 traits were derived from original traits, including nine anthropometric (body-mass index [BMI], corpulence index [CI], waist-to-stature ratio [WSR], waist-to-hip ratio [WHR], body adiposity index [BAI], BMI-adjusted WHR [BMIAdjWHR], BMI-adjusted waist circumference [BMIAdjWC], BMI-adjusted hip circumference [BMIAdjHC], height-adjusted BMI [HtAdjBMI]), three cardiac (pulse pressure [Pulse], systolic-to-diastolic blood pressure ratio [SysDiasRatio], diastolic-to-pulse pressure ratio [DiasPulseRatio]), one hematological (hematocrit-to-hemoglobin ratio [HctHbRatio]), four metabolic (BMI-adjusted fasting glucose [BMIAdjFGlu], fasting glucose-to-glycated hemoglobin ratio [FGluHbA1cRatio], triglycerides-to-high density lipoprotein cholesterol ratio [TgHDLcRatio], total cholesterol-to-high density lipoprotein cholesterol ratio [TChoHDLcRatio]), three hepatic (aspartate aminotransferase-to-alanine aminotransferase ratio [AstAltRatio], gamma-glutamyl transpeptidase-to-alanine aminotransferase ratio [GgtAltRatio], alpha fetoprotein-to-transaminase ratio [AfpAstAltRatio]), and four nephrotic parameters (glomerular filtration rate based on serum creatinine (estimated using four-variable Modification of Diet in Renal Disease (MDRD) study equation and further cropped into 15 ~ 200) [eGFR], uric acid-to-creatinine ratio [UaCrRatio], blood urea nitrogen-to-creatinine ratio [BunCrRatio] and microalbumin-to-creatinine ratio [mAlbCrRatio]).

Notably, for sitting systolic/diastolic pressures and heartbeat speed, values were averaged across three measurement time points. If a quantitative trait presented a censored value due to the detection limit, we directly replaced the record with the censored value.

Binary traits

In total, 84 binary traits were included: three alimentary canal diseases (gastroesophageal reflux [GasRef], irritable bowel syndrome [IBS], peptic ulcer [PepUlc]), seven arthritis (arthritis [Arthr], adhesive capsulitis [AC], ankylosing spondylitis [AS], degenerative arthritis [DA], palindromic rheumatism [PR], psoriatic arthritis [PsA], rheumatoid arthritis [RA]), one bone disease (osteoporosis [Osteo]), 11 cancers (breast cancer [Brst], cervical cancer [Cerv], colorectal cancer [Colon], gastric cancer [Gast],liver cancer [Liv], lung cancer [Lung], nasopharyngeal cancer [Nasph], ovarian cancer [Ova], uterine cancer [Uter], prostate cancer [Prst], other cancer [OtherCA]), six cardiovascular diseases (arrhythmia [Arythm], congenital heart disease [CongHrt], coronary artery disease [CoroArt], cardiomyopathy [Crdmyo], valve heart disease [ValHrt], other heart disease [OtherHrt]), seven gynecologic (dysmenorrhea [Dys], endometriosis [Endo], ovarian cyst [OvaCys], hormone drug usage for menopause [HorDrgMenps], irregular menstruation [IrgMS], myoma [Myo], natural abortion [NatAbor]), seven habitual (coffee consumption [Cof], tea consumption [Tea], snake consumption [Snak], drink [DRK], nut [NUT], smoke [SMK], sport [Spt]), three headache-induced symptoms (affect diary life when headache [AchDiary], nausea when headache [AchNau], photophobia when headache [AchePhot]), one hepatic symptom (liver gall stone [LivGaStn]), one immune disease (type 1 diabetes mellitus [tp1DM]), seven metabolism disorders (diabetes mellitus [DM], type 2 diabetes mellitus [tp2DM], gestational diabetes mellitus [tyGDM], gout [Gout], hyperlipidemia [HypLip], hypertension [HypTens], stroke [Stroke]), two nephrotic symptoms (kidney stone [KdnStn], renal failure [RenlFail]), six nervous-system diseases (dementia [Dmntia], epilepsy [Epilps], hemicrania [Hemcra], multiple sclerosis [MS], Parkinson’s disease [Prkins], vertigo [Vrtig]), eight ophthalmic symptoms (blind [Blnd], color blind [ColBlnd], cataract [Ctrc], floaters [Flt], glaucoma [Gluc], retinal detachment [RntDe], xerophthalmia [Xrph], other eye disease [OtherEye]), six psychiatric disorders (alcoholism/drug abuse [AlcoDrgAbuse], depression [Deprs], manic depression [ManDeprs], obsessive compulsive disease [ObsComp], postpartum depression [PostDeprs], schizophrenia [Schiz]), two pulmonary syndromes (asthma [Asthm], emphysema bronchitis [EmphBrnch]), and six soreness parameters (articulus ache [Art], back waist ache [BakWst], headache [Head], neckache [Nck], sciatica [Sct], other ache [OtherAch]).

Ordinal traits

In total, 19 ordered traits were also included: one habitual (snake consumption frequency [SNKFrq]), seven ophthalmologic (cataract eye number [CtrcN], blind eye number [BlndN], color blind eye number [ColBlndN], floaters eye number [FltN], glaucoma eye number [GlucN], retinal detachment eye number [RtDeN], xerophthalmia eye number [XrphN]), seven soreness (headache frequency [HeadFrq], headache severity [HeadSvr], neck-ache frequency [NckFrq], back waist ache frequency [BakWstFrq], articulus ache frequency [ArtFrq], sciatica frequency [SctFrq], dysmenorrhea frequency [DsmFrq]), and four psychiatric condition levels (nervousness [Nerv], feeling down [Dwn], anxiety [Anxt], depression [Dprs]).

Phenome-wide association study (PheWAS) of rs75776403 in Taiwanese Hans

We conducted a phenome-wide association study (PheWAS) of rs75776403 using the TWB cohort. First, a list of genetically unrelated samples by filtering out 1st- or 2nd-order relationships was obtained using the “–unrelated” option implemented in KING v2.2.7 [13]. Next, association tests were conducted by incorporating covariates of age, squared age, sex (ignored in sex-restricted traits), and the top 20 PCs. The association model was determined according to the type of trait.

Quantitative traits

For the 86 quantitative traits, a linear regression model was adopted. The traits were normalized using the inverse normal transformation (Elfving method) before fitting the association model. Hypothesis testing was conducted using the type I (sequential) method.

Binary traits

For the 84 dichotomized traits, a logistic regression model was applied.

Ordinal traits

For the 19 ordinal traits, an ordered logistic regression model was applied using the “polr” function implemented in the MASS package. The Z statistics were calculated using the “coeftest” function implemented in the lmtest package.

To avoid false positivity, a local false discovery rate (FDR) was estimated using the “lfdr” function implemented in the qvalue package.

Validation of PheWAS results

Summary statistics regarding the associations between rs75776403 and 2173 traits from UK Biobank (UKB, ~ 456 K subjects of European ancestry) were directly adopted from the GCTA website (https://yanglab.westlake.edu.cn/software/gcta/#DataResource; accessed Dec 25, 2021) [14, 15]. Moreover, summary statistics of rs75776403 associations with 229 traits in a Japanese population were adopted from Biobank Japan (BBJ, ~ 178 K individuals) and pheweb.jp website (https://pheweb.jp/downloads; accessed Dec 27, 2021) [16].

Meta-analysis

A fixed-effect or random-effect model was applied for a trans-ethnic meta-analysis across different populations using a restricted maximum likelihood to estimate the heterogeneity variance. Knapp-Hartung adjustments were further applied for the random-effect model. Statistical tests and forest plot visualization were conducted using the meta package.

Cancer cell line encyclopedia (CCLE) multiomics

We queried multiple omics data (including genomic, transcriptomic, proteomic, metabolomic, and phosphoproteomic profiles) of the CCLE from the Dependency Map (DepMap) portal (https://depmap.org/portal/; accessed Oct 28, 2021) [17]. The data preprocessing and analytical steps of each profile are described as follows.

Genomic data

Raw genomic data (Affymetrix Genome-Wide Human SNP 6.0 Array) from CCLE samples in Affymetrix CEL format were converted to Affymetrix CHP and then binary variant call format using the affy2vcf (https://github.com/freeseek/gtc2vcf) tool. Variants (SNPs and short insertion/deletions (indels)) were normalized according to the human genome reference assembly 38 (GRCh38). We next extracted only biallelic SNPs that satisfied the following criteria: (i) autosomal or chrX; (ii) call rate of > 98%; (iii) MAF of > 1%; (iv) non-somatic (according to CCLE mutations (~ 1.3 M somatic calls across 1741 cell lines) from whole-exome sequencing profiles); and (v) showing LD of r2 > 0.4 to at least one of the nearby 50 SNPs (within 1000 Kb). After excluding cell lines that failed the heterogeneity test, these putative “germline” variants were next subjected to genotype imputation through the Michigan Imputation Server by using 1000G Phase 3 v5 as a reference. Finally, variants with a post-imputation info score of > 0.3 and a MAF of > 5% were selected for association analyses.

Transcriptomic and proteomic data

For normalized CCLE transcriptomic (mRNA sequencing) and proteomic (quantitative multiplexed proteomics profiles by mass spectrometry from the Gygi lab (https://gygi.hms.harvard.edu/index.html; accessed Dec 14, 2021) [18] data, gene symbols were first harmonized using the HGNChelper package. Next, the expression values of duplicate genes were mean-aggregated.

Metabolomic data

Data regarding CCLE metabolomes containing expression values for 228 metabolites [19] were directly downloaded from the DepMap website.

Phosphoproteomic data

Data on phosphorylation sites (p-sites) across ~ 10 K genes (by trypsin and/or GluC digestion) were directly adopted from M. Frejno et al. [20]. Gene names were harmonized using the HGNChelper package.

Herein, we adopted the general term “molecular feature” to indicate mRNA, protein, metabolite, and p-site of multiomics. Associations between rs75776403 and molecular features of each omic profile were conducted as follows: (i) molecular features with zero variance were removed; (ii) a linear model between a molecular feature (as a dependent variable) and rs75776403 (as the independent variable) was fitted by including sex, age, histology, ethnicity, pathology and primary cancer type as covariates, with the type I (sequential) significance was assessed using an exact permutation test with the “lmp” function implemented in the lmPerm package; and (iii) a differentially expressed molecular feature was defined based on a significant threshold of P = 0.01.

Gene set and pathway enrichment analysisTranscriptome and proteome

We conducted gene set enrichment analysis (GSEA) to test for the enrichment of pathways (and/or gene sets) in expression data. A fast algorithm (i.e. Fast Gene Set Enrichment Analysis; FGSEA) implemented in the fgsea package was adopted with using weighted differential expression test statistics (i.e. − log10(P) × β; where P is the significant value and β is the regression coefficient of a molecular feature) from differentially expressed test results as input. Pathways (and/or gene sets) used for the test were compiled from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, gene ontology (GO) biological process (BP) database, and Reactome pathway database. Significant enrichment was defined as P ≤ 0.01 and the direction of enrichment was confirmed using a normalized enrichment score (NES of > 0 for positively enriched and < 0 for negatively enriched).

Robust expression enrichment was defined as follows: the weighted differential expression test statistics of each omic were first scaled. Next, pathways with scaled statistics of > 2 or < (− 2) in both expression omics were considered to be robustly enriched.

Phosphoproteome

The weighted differential expression test statistics of p-sites from trypsin and gluC were separately calculated, and then aggregated by the maximum, which takes account of the difference in site-specificity of the two digestive enzymes. The weighted statistics of each digestive enzyme and the merged weighted statistics were subjected to GSEA analysis. Reference pathways regarding human phosphosite-specific signatures were adopted from PTMsigDB v1.9.0 (https://github.com/broadinstitute/ssGSEA2.0; accessed 30 Dec 2021) [21]. After harmonizing gene symbols using the HGNChelper package, we conducted the FGSEA as described above.

Tissue-specific mRNA expression data

The mRNA-sequencing (Seq) expression profiles in transcript per million of several tissues, including the adrenal gland, brain, liver, ovary, pituitary, testis, and thyroid were queried from the Genotype-Tissue Expression (GTEx; v8) portal (https://gtexportal.org/home/datasets; accessed Nov 29, 2021). Gene symbols were harmonized using the HGNChelper package. Next, expression values of duplicated genes were aggregated by the maximum. A Spearman's rho (ρ) statistic was calculated to assess the rank-based association of pairwise genes.

留言 (0)

沒有登入
gif