Colorectal cancer risk susceptibility loci in a Swedish population

1 INTRODUCTION

Colorectal cancer (CRC) is caused by inherited genetic and environmental risk factors. A small proportion, 3%–5% of all CRC, is caused by germline mutations in high-penetrance genes that give rise to rare syndromes such as Lynch syndrome and familial adenomatous polyposis.1 The vast majority of CRC is considered a complex disease, in which the genetic contribution consists of numerous low-risk genetic variants.1 Numerous genome-wide association studies (GWASs) have been undertaken to identify genetic loci associated with CRC risk.2-9, 11-23 In total, close to 100 independent loci have been published.2 Several CRC GWASs have been ancestry specific, for example in Asian, African, or European-descent populations, to mitigate potential confounding due to population stratification. Still, we hypothesized that studying an even more homogeneous population, such as the Swedish, would improve the possibility of finding novel risk loci.

We have previously used the Swedish population to search for cancer risk loci and observed higher odds ratios (ORs) than in previous GWASs.24-27 In those studies we used haplotype analysis as well as single nucleotide polymorphism (SNP) analysis, and when a founder haplotype was associated with risk, its OR was often higher than that of any single SNP at that locus. We concluded that haplotypes could define individuals at risk better than SNPs. Moreover, haplotype analysis suggested that more than one risk factor may sometimes act at the same locus to influence risk.24-27

To search for Swedish CRC risk haplotypes, we used samples from the Swedish Low-risk Colorectal Cancer Study.28 The samples were included and genotyped within the CORECT consortium.22 A second Swedish study, also included and genotyped in the CORECT consortium, was used to validate the findings.29, 30

2 MATERIALS AND METHODS

2.1 Cases and controls for the discovery GWAS

To perform a GWAS on Swedish CRC samples, a collaboration was established in central Sweden to recruit new CRC cases. Samples were collected in 2004–2009 and genotyped in 2014 in collaboration with the American consortium CORECT (http://epi.grants.cancer.gov/gameon/). In total, a cohort of more than 3300 unrelated patients who underwent surgery for CRC at 14 hospitals in and around Stockholm and Uppsala between 2004 and 2009 was included in the Swedish Colorectal Cancer Low-risk study.28 All patients gave informed consent and provided blood for genetic studies. A detailed family history was obtained by interview for each recruited case. Cancer in first- and second-degree relatives as well as in cousins was recorded, and pedigrees were constructed for the family of each index person (the patient). All CRC diagnoses in family members were verified using medical records or death certificates; other diagnoses were coded as stated by the index case. Cases with no relative diagnosed with CRC were considered sporadic, and familial CRC was defined as a case with at least one relative with CRC, as defined above. All patients whose relatives were at increased risk because of the family history were offered genetic counseling. Sex, age, and tumor location of the index patients were recorded from the medical records. All tumors were evaluated by a local pathologist directly after surgery. Blood donors from the same region and spouses of CRC cases in the study were used as controls. No information on age was available for the blood donors. Spouses were selected to be healthy and without a family history of cancer. Samples were not matched for sex or age. The vast majority of cases and controls were of Swedish origin, as estimated from their typically Swedish names.
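The sporadic/familial rule stated above can be written as a small classification function. This is a minimal sketch for illustration only: the `Relative` record, its field names, and the relationship labels are hypothetical and do not reflect the study's actual data format.

```python
from dataclasses import dataclass
from typing import Iterable

# Relatives whose cancer history was recorded at interview (hypothetical labels)
RECORDED_RELATIONSHIPS = {"first-degree", "second-degree", "cousin"}


@dataclass
class Relative:
    relationship: str  # hypothetical coding: one of RECORDED_RELATIONSHIPS
    has_crc: bool      # CRC diagnosis, verified from medical records or death certificates


def classify_case(relatives: Iterable[Relative]) -> str:
    """Return 'familial' if any recorded relative had CRC, otherwise 'sporadic'."""
    if any(r.has_crc and r.relationship in RECORDED_RELATIONSHIPS for r in relatives):
        return "familial"
    return "sporadic"
```

For example, a case whose only affected relative is a cousin is classified as familial, while a case with no affected recorded relative is sporadic.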
All patients gave written informed consent in accordance with Swedish legislation and the study was approved by the Regional research ethics committees in Stockholm 2002 (Stockholms regionala etikprövningsnämnd), and Uppsala, Dnr: 02-489 and 03-114 (Uppsalas regionala etikprövningsnämnd).

2.2 Cases and controls for the replication

The replication data set included 580 cases and 857 controls from the Swedish Mammography Cohort,29 and the Cohort of Swedish Men,30 two population-based cohorts including over 100,000 participants from central Sweden. CRC cases diagnosed within the cohorts for whom DNA from saliva samples was available were matched with controls by sex and by year and month of birth. The study was approved by the Regional Ethics Committee in Stockholm.
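Matching on sex and on year and month of birth can be sketched as greedy exact matching on that three-part key. The record fields, the greedy first-come order, and the number of controls per case (`k`) are assumptions for illustration; the text does not state how many controls were matched to each case or how ties were resolved.

```python
from collections import defaultdict


def match_controls(cases, controls, k=1):
    """Greedy exact matching on (sex, birth_year, birth_month).

    Each control is used at most once; k (controls per case) is a hypothetical parameter.
    """
    # Group available control IDs by matching key, preserving input order
    pool = defaultdict(list)
    for c in controls:
        pool[(c["sex"], c["birth_year"], c["birth_month"])].append(c["id"])

    matches = {}
    for case in cases:
        key = (case["sex"], case["birth_year"], case["birth_month"])
        available = pool[key]
        # Take up to k still-unused controls with an identical key
        matches[case["id"]] = [available.pop(0) for _ in range(min(k, len(available)))]
    return matches
```

A case with no available control sharing its key simply receives an empty list, which makes unmatched cases easy to spot.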

2.3 Genotyping and quality control

All cases and controls, from both the Swedish Colorectal Cancer Low-risk study and the replication cohort, were genotyped using the OncoArray chip.22, 23 All Swedish samples underwent a first QC in the CORECT study.22 Data from that study were retrieved for the two Swedish studies to be used in the Swedish analysis, and both cohorts underwent a separate, second QC before analysis. The discovery analysis was a logistic association study of cancer risk that initially comprised 4381 individuals (2709 cases and 1672 controls) and 344,234 markers. SNPs with a call rate <98%, a minor allele frequency (MAF) <5%, or genotypes inconsistent with Hardy–Weinberg equilibrium in controls were removed, leaving 342,359 SNPs. In the final step, a multidimensional scaling (MDS) analysis was conducted on all remaining markers to assess population stratification and to identify ethnic outliers among the samples (Figure S1). The outliers were excluded from the data set, and the remaining samples were plotted in an MDS plot (Figure S1).

The scaling is based on the covariates C1, C2, C3, and C4, which represent the position on the first, second, third, and fourth dimensions, respectively. The two axes represent distances, and the graph is analyzed by proximity scaling. Plotting C1 against C2 gives a scatter plot in which each point is an individual and the two axes correspond to a reduced, two-dimensional representation of the data, which is useful for identifying clustering. In our study, we set the axis limits to 0.04 × 0.04, a conservative goodness-of-fit cut-off, and all individuals outside these limits were considered outliers. In total, 342,359 SNPs and 4305 individuals remained for analysis: 2663 cases (1454 males and 1209 females) and 1642 controls (870 males and 772 females).
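The three SNP filters described above (call rate, MAF, and Hardy–Weinberg equilibrium in controls) can be sketched as follows. This is a minimal illustration, not the CORECT or PLINK pipeline: the `SnpCounts` container is hypothetical, the HWE check uses the asymptotic 1-df chi-square test rather than an exact test, and the HWE threshold `hwe_alpha = 1e-6` is an assumption because the text does not state the cut-off used.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (hypothetical container)."""
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples without a genotype call


def call_rate(c: SnpCounts) -> float:
    """Fraction of samples with a genotype call."""
    called = c.n_aa + c.n_ab + c.n_bb
    return called / (called + c.n_missing)


def minor_allele_freq(c: SnpCounts) -> float:
    """Frequency of the less common allele among called genotypes."""
    n = c.n_aa + c.n_ab + c.n_bb
    p_a = (2 * c.n_aa + c.n_ab) / (2 * n)
    return min(p_a, 1.0 - p_a)


def hwe_chisq_p(c: SnpCounts) -> float:
    """Asymptotic 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = c.n_aa + c.n_ab + c.n_bb
    p = (2 * c.n_aa + c.n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.n_aa, c.n_ab, c.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(all_samples: SnpCounts, controls: SnpCounts,
              min_call_rate: float = 0.98, min_maf: float = 0.05,
              hwe_alpha: float = 1e-6) -> bool:
    """Keep a SNP only if call rate and MAF pass and controls are in HWE.

    hwe_alpha is a hypothetical threshold; the text does not state the one used.
    """
    return (call_rate(all_samples) >= min_call_rate
            and minor_allele_freq(all_samples) >= min_maf
            and hwe_chisq_p(controls) >= hwe_alpha)
```

The checks short-circuit in order, so a monomorphic or poorly called SNP is rejected before the HWE test is ever computed.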
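The outlier screen can be sketched as classical (Torgerson) MDS on a pairwise genetic distance matrix, followed by the box filter on C1 and C2. This is an illustration under stated assumptions, not PLINK's implementation: the distance is taken as 1 minus the identity-by-state (IBS) proportion, the coordinates are left unscaled, and the 0.04 × 0.04 limits are read as |C1| ≤ 0.04 and |C2| ≤ 0.04 around the origin. The function names `ibs_distance`, `classical_mds`, and `box_outliers` are hypothetical, and `ibs_distance` builds an n × n × m array, so it only suits small examples.

```python
import numpy as np


def ibs_distance(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise 1 - IBS distance from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = genotypes.astype(float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # 0, 1 or 2 alleles differ at each SNP
    return diff.mean(axis=2) / 2.0                # average proportion of alleles not shared IBS


def classical_mds(dist: np.ndarray, k: int = 4) -> np.ndarray:
    """Embed n points in k dimensions from an n x n distance matrix; columns are C1..Ck."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)


def box_outliers(coords: np.ndarray, limit: float = 0.04) -> np.ndarray:
    """Flag samples whose C1 or C2 lies outside [-limit, +limit] (assumed reading of the cut-off)."""
    return (np.abs(coords[:, 0]) > limit) | (np.abs(coords[:, 1]) > limit)
```

Because classical MDS is only defined up to rotation and reflection, the sign of C1 or C2 can flip between runs; a symmetric box around the origin is unaffected by such flips.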

2.4 Statistical analysis

A logistic regression model was used to examine the association between a single SNP, or a haplotype, and cancer risk. The corresponding odds ratios (ORs), standard errors, 95% confidence intervals, and p values were calculated using PLINK v1.07.31 When running PLINK, the following options were requested: “hap-logistic” (haplotype logistic regression analysis) and “hap-window 1-25” (sliding-window sizes of 1 to 25 SNPs); all other settings were left at their defaults. These include haplotype phasing with the expectation–maximization (E-M) algorithm, a minimum haplotype frequency of 0.01, and an omnibus association test.
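As a rough illustration of the per-SNP test, the sketch below fits a logistic regression of case status on allele dosage (0, 1, or 2 copies) by Newton–Raphson and reports the OR, standard error of the log OR, Wald 95% CI, and two-sided Wald p value. It is not PLINK's code; the function name and the additive dosage coding are assumptions for illustration, and no covariates are included.

```python
import math

import numpy as np


def logistic_snp_test(genotype, status, max_iter=25, tol=1e-10):
    """Fit logit P(case) = b0 + b1 * dosage; return (OR, SE of log OR, 95% CI, Wald p)."""
    x = np.column_stack([np.ones(len(genotype)), np.asarray(genotype, dtype=float)])
    y = np.asarray(status, dtype=float)  # 1 = case, 0 = control
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(x @ beta)))   # fitted case probabilities
        w = mu * (1.0 - mu)                      # binomial variance weights
        grad = x.T @ (y - mu)                    # score vector
        info = x.T @ (x * w[:, None])            # Fisher information matrix
        step = np.linalg.solve(info, grad)       # Newton-Raphson update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance of the estimates at the final fit
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(x.T @ (x * w[:, None]))
    b1, se = beta[1], math.sqrt(cov[1, 1])
    p = math.erfc(abs(b1 / se) / math.sqrt(2.0))  # two-sided normal p value
    z95 = 1.959963984540054                       # 97.5th percentile of N(0, 1)
    ci = (math.exp(b1 - z95 * se), math.exp(b1 + z95 * se))
    return math.exp(b1), se, ci, p
```

A useful sanity check: with a binary predictor (carrier versus non-carrier), the fitted OR equals the 2 × 2 cross-product ratio ad/bc and the SE equals sqrt(1/a + 1/b + 1/c + 1/d).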
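The E-M phasing step can be illustrated for the simplest case of two SNPs, where only double heterozygotes have an ambiguous phase. The sketch below is a textbook two-locus EM, not PLINK's implementation, which handles multi-SNP windows and further details not shown here; the function name and the 0/1/2 genotype coding are assumptions.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")  # allele at SNP 1, allele at SNP 2


def em_two_snp(genotypes, max_iter=500, tol=1e-12):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: iterable of (g1, g2) pairs, each the count (0, 1, 2) of allele "1".
    """
    genotypes = list(genotypes)
    fixed = Counter()      # haplotype copies whose phase is unambiguous
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # Alleles on the two chromosomes at each SNP (phase is determined here)
        a1 = (0, 1) if g1 == 1 else (g1 // 2, g1 // 2)
        a2 = (0, 1) if g2 == 1 else (g2 // 2, g2 // 2)
        for x, y in zip(a1, a2):
            fixed[f"{x}{y}"] += 1

    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes carrying 00/11 rather than 01/10
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        share_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = {h: float(fixed[h]) for h in HAPLOTYPES}
        counts["00"] += n_double_het * share_cis
        counts["11"] += n_double_het * share_cis
        counts["01"] += n_double_het * (1.0 - share_cis)
        counts["10"] += n_double_het * (1.0 - share_cis)
        # M-step: frequencies are expected counts over all chromosomes
        new = {h: counts[h] / total for h in HAPLOTYPES}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs
```

When no individual is a double heterozygote, the estimates reduce to simple haplotype counting; the iteration only matters for resolving the ambiguous genotypes.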

A p value threshold of p < 5 × 10−8 has been suggested for genome-wide statistical significance of single SNPs. Which threshold should be used in sliding-window analysis is open to debate: the same, a stricter, or a looser criterion could all be argued for. We therefore chose not to define what should be considered statistically significant and instead present, for each haplotype, the result with the lowest p value. No adjustments were made for sex or age. A quantile–quantile (QQ) plot comparing the observed p values in all samples with those expected under the null distribution was generated in R using the qqman package (Figure S2). This separate Swedish association analysis used only genotyped SNPs; imputed SNPs were not used because imputation might miss haplotypes typical of the Swedish population.
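The expected values on a QQ plot of p values are the quantiles of a uniform distribution on (0, 1). A minimal sketch, assuming the common plotting positions (i − 0.5)/n; qqman's exact choice may differ slightly, so treat this as an approximation of what the package draws, and the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p pairs for a QQ plot under a uniform null."""
    obs = sorted(p_values)  # smallest p first
    n = len(obs)
    # i-th smallest p is paired with the uniform quantile (i + 0.5) / n (0-based i)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    observed = [-math.log10(p) for p in obs]
    return list(zip(expected, observed))
```

Points above the diagonal correspond to p values smaller than expected under the null; points on the diagonal indicate no departure from it.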

After the first QC within the CORECT study and the later, separate QC, the two Swedish data sets did not contain exactly the same SNPs. To decide which window sizes to use for haplotype analysis, we previously tested different sizes and found that windows with more than 25 SNPs rarely showed positive results.24-27 Windows of 1 to 25 SNPs were therefore chosen for analysis.
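The window scheme can be made concrete by enumerating every run of consecutive SNPs of length 1 to 25. This sketch assumes the SNPs are already sorted by position on a single chromosome and that windows do not cross chromosome boundaries; PLINK's internal handling may differ, and the function name is hypothetical.

```python
def sliding_windows(snps, min_size=1, max_size=25):
    """Yield every run of consecutive SNPs whose length is min_size..max_size (inclusive)."""
    for size in range(min_size, max_size + 1):
        # A chromosome with m SNPs has m - size + 1 windows of this size (none if size > m)
        for start in range(len(snps) - size + 1):
            yield tuple(snps[start:start + size])
```

For m ≥ 25 SNPs on a chromosome this gives 25m − 300 windows, many times the number of single-SNP tests.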

3 RESULTS

3.1 SNP analysis

In total, 2663 CRC cases and 1642 controls were used in a SNP analysis to determine whether the results were similar to those of previously published GWASs. The QQ plot showed slightly lower p values than expected under the null distribution (Figure S2); however, no locus reached genome-wide statistical significance for CRC risk (Figure S3, Manhattan plot). The strongest associations were observed at 5p13.3 (rs4867061-G, OR = 1.27, p = 5.3 × 10−6), 8q23.3 (rs6469653-G, OR = 1.22, p = 7.8 × 10−6), and 20q13.3 (rs714506-A, OR = 1.28, p = 4.1 × 10−7) (Figure 1A−C and Table S1). The SNP on 5p13.3 is not close to any published SNP. The SNP on 8q23.3 (position 117631964), however, lies between two previously published SNPs, rs16892766,12 and rs1170791142.10 The SNP rs714506 (Figure 2B, black arrow) on 20q13.33 is in close linkage disequilibrium with the published SNP rs4925386 (Figure 2B, red arrow).5 The published SNP was a protective allele, whereas our study identified a risk allele. Another risk allele at this locus, rs2738783, was previously published.23 This SNP is not presented in Table 1 because our data were included in that paper.23
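The pairwise LD values discussed here and shown in the LD plots are r² values computed from haplotype and allele frequencies. A minimal sketch using the standard definitions D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)); the function name is hypothetical, and the haplotype frequency p_AB would in practice come from a phasing step.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: allele frequencies of A (SNP 1) and B (SNP 2)
    """
    d = p_ab - p_a * p_b                            # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)       # product of the allelic variances
    return d * d / denom
```

r² is 1 when the two alleles always occur on the same haplotypes at equal frequency and 0 when the haplotype frequency equals the product of the allele frequencies.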

Figure 1. Regional plots of the strongest associations observed in single-SNP analysis. (A) rs4867061 on 5p13.3 (OR = 1.27, p = 5.3 × 10−6); (B) rs6469653 on 8q23.3 (OR = 1.22, p = 7.8 × 10−6); and (C) rs714506 on 20q13.3 (OR = 1.28, p = 4.1 × 10−7). SNPs are plotted using LocusZoom (ref) by chromosomal position against association with CRC (−log10 P). The SNP with the strongest association is named on each plot. SNPs are color coded to reflect their LD with the strongest SNP, and blue lines show the estimated recombination rates (from 1000 Genomes European data). Genes, with exons and direction of transcription, were obtained from the UCSC Genome Browser and are shown below each association plot. CRC, colorectal cancer; LD, linkage disequilibrium; OR, odds ratio

Figure 2. LD blocks showing the relationships between the SNPs in the haplotypes with the strongest associations. The pairwise r² between SNPs is shown both as a value and by color coding. ORs and p values, as reported in Table 2, are shown above. (A) Three haplotypes observed at the 2q36.1 locus. (B) LD across 20q13.33; the previously reported SNP is marked with an orange arrow and the SNP showing the strongest association with a blue arrow. LD between these two SNPs is 0.75. Two haplotypes each were observed at (C) 1q43 and (D) 10q25.3. (E) Three haplotypes were observed at 15q22.31. One haplotype each was observed at (F) 17p11.2, (G) 1p34.2, and (H) 3q24. (I) Seven haplotypes were observed at 3p24.3. LD, linkage disequilibrium

Table 1.
GWAS results for previously published autosomal CRC susceptibility loci

LOCUS | rsID:CHR:BP | Alleles (R/O) | OR-published | p value-published | F1 | OR1 | P1 | F2 | OR2 | P2 | F3 | OR3 | P3 | PMID
1p32.3 | rs12143541:1:55247852 | G/A | 1.1 (1.06–1.13) | 9.44E−10 | 0.15 | 1.26 | 3E−04 | 0.15 | 1.52 | 3E−05 | 0.146 | 1.21 | 0.004 | 31089142
1p31.3 | rs7542665:1:62673037 | C/T | 1.08 (1.05–1.11) | 3.51E−08 | 0.71 | 1.05 | 0.286 | 0.71 | 1.23 | 0.0147 | 0.705 | 1.02 | 0.679 | 30529582
1q25.3 | rs10911251:1:183081194 | A/C | 1.10 (1.06–1.14) | 0.0000013 | 0.57 | 1.06 | 0.241 | 0.57 | 1.07 | 0.401 | 0.571 | 1.05 | 0.282 | 23266556; 24737748
1q41 | rs6691170:1:222045446 | T/G | 1.06 (1.03–1.09) | 9.55E−10 | 0.39 | 1.08 | 0.112 | 0.39 | 1.16 | 0.048 | 0.391 | 1.06 | 0.23 | 20972440
1q41 | rs6687758:1:222164948 | A/G | 1.09 (1.06–1.12) | 2.27E−09 | 0.78 | 0.869 | 0.009 | 0.78 | 0.889 | 0.183 | 0.78 | 0.861 | 0.0081 | 20972440
2p16.3 | rs7606562:2:48686695 | T/A | 1.10 (1.07–1.14) | 1.21E−08 | 0.66 | 0.999 | 0.987 | 0.66 | 0.944 | 0.464 | 0.666 | 1.01 | 0.803 | 30529582
2q11.2 | rs11692435:2:98275354 | G/A | 1.12 (1.07–1.16) | 1.22E−08 | 0.88 | 1.15 | 0.038 | 0.88 | 0.992 | 0.942 | 0.88 | 1.19 | 0.0141 | 31089142
2q32.3 | rs11903757:2:192587204 | T/C | 0.85 (0.78–0.91) | 1.38E−06 | 0.83 | 0.946 | 0.352 | 0.83 | 0.944 | 0.546 | 0.825 | 0.945 | 0.365 | 23266556
2q35 | rs992157:2:219154781 | A/G | 1.10 (1.06–1.13) | 3.15E−08 | 0.6 | 1.05 | 0.294 | 0.6 | 0.941 | 0.424 | 0.604 | 1.07 | 0.132 | 27005424
3p14.1 | rs812481:3:66442435 | C/G | 1.09 (1.05–1.12) | 2.5E−08 | 0.56 | 1.03 | 0.466 | 0.56 | 1.11 | 0.147 | 0.553 | 1.02 | 0.707 | 26151821
3q22.2 | rs113569514:3:133748789 | T/C | 1.10 (1.07–1.13) | 2.45E−12 | 0.86 | 0.947 | 0.399 | 0.86 | 1.08 | 0.49 | 0.859 | 0.922 | 0.227 | 30529582
3q26.2 | rs10936599:3:169492101 | T/C | 0.93 (0.91–0.96) | 3.37E−08 | 0.25 | 0.93 | 0.158 | 0.25 | 0.926 | 0.363 | 0.253 | 0.931 | 0.18 | 20972440
4q24 | rs17035289:4:106048291 | T/C | 1.1 (1.07–1.13) | 2.73E−10 | 0.84 | 0.939 | 0.296 | 0.84 | 0.923 | 0.423 | 0.84 | 0.943 | 0.348 | 31089142
5q31.1 | rs639933:5:134467751 | C/A | 1.07 (1.05–1.10) | 1.14E−09 | 0.39 | 1.04 | 0.382 | 0.39 | 1.15 | 0.0671 | 0.387 | 1.02 | 0.688 | 31089142
5q31.1 | rs647161:5:134499092 | A/C | 0.89 (0.85–0.92) | 1.22E−10 | 0.68 | 1.05 | 0.329 | 0.68 | 1.13 | 0.132 | 0.672 | 1.03 | 0.52 | 23263487
6p21.33 | rs3131043:6:30758466 | G/A | 1.07 (1.05–1.1) | 2.67E−08 | 0.39 | 1.01 | 0.912 | 0.39 | 0.946 | 0.512 | 0.394 | 1.02 | 0.726 | 31089142
6p21.2 | rs1321311:6:36622900 | A/C | 0.95 (0.92–0.98) | 1.14E−10 | 0.21 | 0.959 | 0.45 | 0.21 | 0.98 | 0.82 | 0.213 | 0.956 | 0.43 | 22634755
6p21.1 | rs6933790:6:41672769 | T/C | 1.1 (1.07–1.14) | 3.65E−10 | 0.83 | 0.989 | 0.849 | 0.83 | 0.879 | 0.175 | 0.831 | 1.02 | 0.803 | 31089142
6p21.1 | rs4711689:6:41692812 | A/G | 1.01 (0.97–1.04) | 0.0000894 | 0.56 | 0.919 | 0.059 | 0.56 | 0.933 | 0.342 | 0.557 | 0.915 | 0.0594 | 26965516
6q25.3 | rs7758229:6:160840252 | T/G | 1.28 (1.18–1.39) | 7.92E−09 | 0.34 | 0.976 | 0.601 | 0.34 | 1.02 | 0.748 | 0.341 | 0.966 | 0.48 | 21242260
7p12.3 | rs3801081:7:47511161 | G/A | 1.08 (1.06–1.11) | 2E−11 | 0.71 | 1.01 | 0.815 | 0.71 | 0.895 | 0.156 | 0.709 | 1.04 | 0.427 | 31089142
8q23.3 | rs16892766:8:117630683 | A/C | 0.78 (0.72–0.84) | 3.3E−18 | 0.89 | 0.801 | 0.003 | 0.89 | 0.846 | 0.164 | 0.893 | 0.793 | 0.0026 | 18372905
8q24.21 | rs10505477:8:128407443 | A/G | 1.17 (1.12–1.23) | 3.16E−11 | 0.53 | 1.15 | 0.002 | 0.53 | 1.21 | 0.0102 | 0.524 | 1.14 | 0.0056 | 17618283; 17630503
8q24.21 | rs6983267:8:128413305 | T/G | 1.21 (1.18–1.24) | 5.50E−44 | 0.47 | 0.863 | 0.001 | 0.47 | 0.825 | 0.0097 | 0.468 | 0.872 | 0.0034 | 17618283
8q24.21 | rs7014346:8:128424792 | A/G | 1.19 (1.15–1.23) | 8.6E−26 | 0.38 | 1.13 | 0.01 | 0.38 | 1.24 | 0.0046 | 0.372 | 1.11 | 0.0271 | 18372901; 19011631
8q24.21 | rs7837328:8:128423127 | A/G | 1.17 (1.10–1.24) | 7.44E−08 | 0.44 | 1.16 | 9E−04 | 0.44 | 1.24 | 0.0036 | 0.436 | 1.14 | 0.0043 | 21242260
9p21.3 | rs1412834:9:22110131 | T/C | 1.08 (1.06–1.11) | 4.13E−14 | 0.56 | 1.06 | 0.203 | 0.56 | 1.12 | 0.126 | 0.552 | 1.04 | 0.347 | 31089142
9p24.1 | rs719725:9:6365683 | A/C | 1.17 (1.12–1.23) | 3.16E−11 | 0.63 | 1.09 | 0.059 | 0.63 | 1.18 | 0.0323 | 0.623 | 1.07 | 0.146 | 17618283
10p14 | rs10795668:10:8701219 | A/G | 0.89 (0.86–0.91) | 2.5E−13 | 0.32 | 0.856 | 0.001 | 0.32 | 0.901 | 0.187 | 0.316
