Next-Generation Sequencing and Bioinformatics-Based Protocol for the Full-Length CYP2E1 Gene Polymorphism Analysis

Introduction

Recent advances in pharmacogenetics provide clinically relevant information on the previously identified associations between genetic variants and individual variability in drug response, which, in turn, offers great promise for guiding personalized drug therapy and clinical trial design.1–3 Currently, the Clinical Pharmacogenetics Implementation Consortium (CPIC) has provided 26 evidence-based, peer-reviewed and updated pharmacogenetics clinical practice guidelines covering 23 genes and 89 drugs across several therapeutic areas, and incorporated recommendations to health-care providers on the clinical use of drugs when such genotyping results are available.1 CYP2E1 (cytochrome P450 family 2 subfamily E member 1) belongs to the well-studied CYP450 enzyme superfamily, which metabolizes both endogenous substrates, such as lauric acid, steroids, and acetone with further oxidation to precursors of gluconeogenesis, as well as exogenous compounds with low molecular weight, including some drugs such as paracetamol, salicylic acid, and general anaesthetics.4–6 CYP2E1 enzyme is induced by many of its substrates, including isoniazid and ethanol, but also by various pathophysiological conditions, such as uncontrolled diabetes, obesity, starvation, and non-alcoholic liver disease.5–7 In addition, several studies have suggested that mitochondrial CYP2E1, compared to its microsomal counterpart, is a major source of alcohol- and drug-induced reactive oxygen species production, thus contributing to genotoxic and toxicological effects.8 CYP2E1 is also involved in the metabolic activation of pro-carcinogens and chemical carcinogens, which contributes to tumorigenesis.6,7,9

The ability to identify specific genetic variants associated with drug response phenotypes is a key step towards the implementation of personalized medicine. The CYP2E1 gene, which spans 11,761 nucleotides, shows substantial variability; however, despite several epidemiological studies, there is still very limited evidence about the functional significance of its polymorphic variants, including specific allelic or single nucleotide variants (SNVs), in the context of altered drug metabolism and/or treatment response (https://www.pharmgkb.org/).10 A next-generation sequencing (NGS)-based methodology could considerably facilitate systematic investigation of inter-individual genetic polymorphisms in the full-length CYP2E1 gene, and could provide more comprehensive data on the SNVs of interest than methods targeting only certain genetic regions. In this study, we evaluated the performance of a targeted NGS approach for sequencing the full-length CYP2E1 gene, including all nine exons with the intervening introns, untranslated regions (UTRs) and intergenic regions. The developed protocol combines the breadth and quality of NGS data with cost-effective, relatively simple and fast technical execution, allowing simultaneous analysis of several genomic regions of interest and thereby facilitating the detection of genetic variants with clinically relevant consequences.

Materials and Methods

Clinical Samples

Human DNA samples (n = 3) were obtained from the national biobank Genome Database of Latvian population;11 these samples were used as a training set to check the primer performance and to optimize the PCR conditions. The effectiveness of the developed target amplification and sequencing protocol was evaluated using the test set comprising human DNA samples (n = 3) obtained from tuberculosis patients admitted to the Riga East University Hospital, Centre of Tuberculosis and Lung Diseases. Genomic DNA was extracted from the peripheral white blood cells using the standard phenol-chloroform method. The investigation followed the Helsinki Declaration; the study protocol was approved by the Central Medical Ethics committee of Latvia (approval No 01-29.1/1), the Ethical Committee of Riga East University Hospital (approval No 24-A/15), and the Ethical Committee of Riga Stradins University (approval No 105/28.01.2016.); informed consent was obtained from all participants.

Full-Length CYP2E1 Gene Amplification and NGS Assay

Seven gene-specific oligonucleotide primer pairs targeting overlapping CYP2E1 gene fragments, which together span all nine CYP2E1 exons with the intervening introns, untranslated regions (UTRs) and intergenic regions, were designed using the online Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) (Figure 1). Sequences of the designed primers and their positions in the reference genome are listed in Table 1. PCR amplification of the large gene fragments, the NGS library preparation workflow and the subsequent data analysis were carried out according to the protocol by Kivrane et al with a few minor modifications.12 Briefly, a total of 20–35 ng of DNA per reaction was used; PCR was performed using the All Taq PCR Core Kit (QIAGEN, Germany) following the manufacturer’s protocol for amplification of large fragments; the PCR amplicon size ranged from 2059 to 3820 bp (Table 1). The PCR products were analyzed by 1.5% agarose gel electrophoresis. If non-specific bands were detected, the PCR amplicons were pretreated using NucleoMag NGS Clean-up and Size Select (MACHEREY-NAGEL, Germany) magnetic beads. Amplicons were normalized to a final concentration of 1 ng/μL, pooled separately for each DNA sample, and measured using a Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA, USA) to estimate the resultant concentration.
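Because the amplicons are meant to overlap and jointly cover the whole target, a candidate primer design can be checked for uncovered stretches before any PCR is run. The following is a minimal sketch of such a check, not part of the published protocol; the function name and the toy coordinates in the usage lines are hypothetical placeholders and are not the primer positions from Table 1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def uncovered_gaps(target: Interval, amplicons: List[Interval]) -> List[Interval]:
    """Return the parts of `target` that no amplicon covers."""
    start, end = target
    gaps: List[Interval] = []
    cursor = start  # first target position not yet known to be covered
    for a_start, a_end in sorted(amplicons):
        if a_end < cursor:
            continue  # amplicon ends before the first uncovered position
        if a_start > end:
            break  # amplicon starts beyond the target
        if a_start > cursor:
            gaps.append((cursor, a_start - 1))
        cursor = max(cursor, a_end + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


# Toy coordinates for illustration only (hypothetical, not from Table 1).
print(uncovered_gaps((1, 100), [(1, 40), (30, 70), (60, 100)]))  # []
print(uncovered_gaps((1, 100), [(1, 40), (60, 100)]))            # [(41, 59)]
```

An empty list means the amplicons tile the target without gaps; any returned interval marks a stretch that would need an additional or redesigned primer pair.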

Table 1 Primer Sequences for the Full-Length CYP2E1 Gene Amplification

Figure 1 Graphical representation of the PCR fragment position in the reference CYP2E1 gene. The exons are indicated by green boxes. Names of the seven primer pairs used to generate the PCR fragments are indicated.

Abbreviations: Fw, forward; Rw, reverse; K, thousand.

NGS paired-end libraries were prepared using the Nextera XT DNA Library Prep Kit and Nextera Index Kit (Illumina, CA, USA) according to the manufacturer’s instructions. Different sequencing indexes were used for each amplicon pool, enabling simultaneous CYP2E1 gene sequencing for several DNA samples. NucleoMag NGS Clean-up and Size Select magnetic beads were used for library purification and double size selection to achieve an optimal library size of 300–500 bp. Libraries underwent quality assessment using an Agilent High Sensitivity DNA Kit and Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) following the manufacturer’s protocol, and were sequenced on the MiSeq platform (Illumina) using the MiSeq Reagent Nano Kit v2 (500 cycles) (Illumina) to generate paired-end reads with a maximum length of 250 bp. The desired sequencing coverage was set to 500x.
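The relationship between the desired coverage, the target length and the read length can be made explicit with the usual approximation coverage = (number of reads × read length) / target length. The sketch below applies it to the 11,761-nucleotide CYP2E1 region, 500x coverage and 250 bp reads. It assumes that every read is full length, on target and evenly distributed, so it gives a lower bound rather than the number of reads actually sequenced in this study; the function name is illustrative.

```python
import math


def reads_for_coverage(target_bp: int, coverage: float, read_len: int) -> int:
    """Minimum number of reads so that reads * read_len / target_bp >= coverage."""
    return math.ceil(coverage * target_bp / read_len)


TARGET_BP = 11_761  # length of the full-length CYP2E1 region given in the text

reads = reads_for_coverage(TARGET_BP, coverage=500, read_len=250)
print(reads)       # 23522 single reads per sample
print(reads // 2)  # 11761 read pairs per sample
```

In practice, adapter and quality trimming, duplicate removal and uneven amplicon representation reduce the usable depth, so the sequencing budget per sample would need to be set above this idealized figure.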

Data Analysis

The obtained sequencing data were analyzed on the Galaxy web-based platform using the public server at https://usegalaxy.org.13 Trimmomatic (v0.38) was used to trim adapter sequences and low-quality read ends (Phred quality score <20). Reads were mapped against the human reference genome (GRCh38.p13, GCF_000001405.39) using Map with BWA-MEM (v0.7.17.1). BAM filter (v0.5.9) was applied to keep only mapped reads, remove PCR duplicates and discard reads shorter than 50 bp, and Mark duplicates (v2.18.2.2) was used to finalize deduplication. Samtools depth (v1.9) was used to compute the depth at each position of the target region on Chr10 (NC_000010.11, positions 133527363-133539123). The alignment was converted with Samtools mpileup (v2.1.4) for SNV calling with VarScan (v2.4.2), using the same Chr10 region (NC_000010.11, positions 133527363-133539123) for pileup generation. Detected variants were filtered using the following criteria: at least six supporting reads, with the SNV represented on both the positive and negative strands; read depth ≥10; base quality ≥20; and a minimum variant read frequency of 75% for homozygous positions.
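The variant filtering criteria listed above can be written as a small predicate, which may help when reproducing the filter outside Galaxy. This is an illustrative sketch rather than the VarScan implementation: the record fields and function names are hypothetical, the base quality is represented as a mean over variant-supporting bases (an assumption), and calls below the 75% threshold are labeled heterozygous because the text does not state a separate lower frequency limit.

```python
from dataclasses import dataclass

MIN_ALT_READS = 6     # minimum variant-supporting reads
MIN_DEPTH = 10        # minimum read depth at the position
MIN_BASE_QUAL = 20    # minimum base quality (Phred)
HOM_MIN_FREQ = 0.75   # minimum variant read frequency for a homozygous call


@dataclass
class VariantCall:
    depth: int                 # total reads covering the position
    alt_fwd: int               # variant-supporting reads, forward strand
    alt_rev: int               # variant-supporting reads, reverse strand
    mean_alt_base_qual: float  # assumed: mean quality of variant bases


def passes_filters(v: VariantCall) -> bool:
    """Apply the depth, support, strand and quality criteria from the text."""
    alt_total = v.alt_fwd + v.alt_rev
    return (
        alt_total >= MIN_ALT_READS
        and v.alt_fwd > 0 and v.alt_rev > 0  # seen on both strands
        and v.depth >= MIN_DEPTH
        and v.mean_alt_base_qual >= MIN_BASE_QUAL
    )


def zygosity(v: VariantCall) -> str:
    """Label a call by variant read frequency (assumed two-class scheme)."""
    freq = (v.alt_fwd + v.alt_rev) / v.depth
    return "homozygous" if freq >= HOM_MIN_FREQ else "heterozygous"


call = VariantCall(depth=40, alt_fwd=18, alt_rev=16, mean_alt_base_qual=32.0)
print(passes_filters(call), zygosity(call))  # True homozygous

one_strand = VariantCall(depth=40, alt_fwd=8, alt_rev=0, mean_alt_base_qual=32.0)
print(passes_filters(one_strand))            # False
```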

All detected variants were visually inspected using the Integrative Genomics Viewer (IGV) (v2.8.9).14 Functional annotation of the identified single nucleotide variants (SNVs) was performed using the web-based wANNOVAR tool (http://wannovar.wglab.org/).15 Reference single nucleotide polymorphism (SNP) reports in the dbSNP database (https://www.ncbi.nlm.nih.gov/snp/) and PharmGKB (https://www.pharmgkb.org/) were used for the identification and annotation of detected SNVs.10,16

Confirmation of Detected Variants

Randomly selected SNVs were confirmed by Sanger sequencing using either forward or reverse amplification primers (Table 1). Sequencing was performed using the BrilliantDye Terminator Cycle Sequencing kit v1.1 (NimaGen, Nijmegen, The Netherlands) according to the manufacturer’s recommendations on an ABI Prism 3100 Genetic Analyzer (Perkin-Elmer, MA, USA). Sequence analysis and SNV identification were performed using FinchTV and MEGA software with the sequence of the human CYP2E1 gene (E.C.1.14.13.n7) (GenBank: NG_008383.1) as the reference. The Basic Local Alignment Search Tool (BLAST, https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used for sequence comparison with previously published data in GenBank.

Results

Amplification of the CYP2E1 gene fragments using all primer pairs resulted in specific products of the expected length for all samples in the study set (Figure 2). The 2E1_4_Fw/2E1_4_Rw primer pair generated additional shorter PCR fragments; thus, an additional amplicon pre-treatment step involving a single-sided size selection was introduced to reduce the amount of non-specific fragments. Sequencing data quality for the study sample set (n = 3) after the applied quality filters is summarized in Table 2. The mean read base quality score was >30, indicating high base call accuracy. For all the test samples, approximately 500-fold coverage and a mean mapped read depth of >200 were achieved, supporting a high degree of confidence in the results. In all study samples, 98–99% of the target gene sequence was covered at least tenfold, while less than 0.1% of the target region had zero coverage. No significant differences in the coverage distribution between exons and introns were observed. Sanger sequencing of randomly selected variants (n = 8; Sample 30: rs12761234, rs2070676, rs8192777; Sample 32: rs12761234, rs2070676, rs28371747; Sample 33: rs8192777, rs28371747) confirmed that (a) the amplification products corresponded to the target gene, and (b) the detected SNVs were consistent with the NGS results, supporting the accuracy of the developed protocol. Representative chromatograms of the Sanger sequencing results are shown in Supplementary Figure 1.
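Coverage figures of this kind (mean depth, fraction of the target covered at least tenfold, fraction with zero coverage) can be derived from the per-position output of Samtools depth, which is tab-separated text with chromosome, position and depth columns. The sketch below is one possible way to compute them and is not the authors' script; the function name is hypothetical, and positions that are absent from the input are counted as zero depth, which is an assumption about how the depth file was generated.

```python
from typing import Dict, Iterable

# Target region on Chr10 (NC_000010.11) given in the Data Analysis section.
REGION_START = 133_527_363
REGION_END = 133_539_123


def summarize_depth(
    lines: Iterable[str],
    start: int = REGION_START,
    end: int = REGION_END,
    min_depth: int = 10,
) -> Dict[str, float]:
    """Summarize per-position depth lines (chrom, pos, depth) over a region."""
    region_len = end - start + 1
    depths: Dict[int, int] = {}
    for line in lines:
        if not line.strip():
            continue
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        if start <= int(pos) <= end:
            depths[int(pos)] = int(depth)
    at_min = sum(1 for d in depths.values() if d >= min_depth)
    nonzero = sum(1 for d in depths.values() if d > 0)
    return {
        "region_length": region_len,
        "mean_depth": sum(depths.values()) / region_len,
        "pct_at_min_depth": 100 * at_min / region_len,
        "pct_zero": 100 * (region_len - nonzero) / region_len,
    }


print(REGION_END - REGION_START + 1)  # 11761, the gene length quoted in the text

# Toy four-position region for illustration only.
toy = ["chr10\t1\t20\n", "chr10\t2\t5\n", "chr10\t4\t0\n"]
print(summarize_depth(toy, start=1, end=4))
```

For a real sample, the lines would come from the depth file exported from Galaxy, for example by passing an open file handle as `lines`.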

Table 2 Sequencing Data Quality for the Study Sample Set (n = 3)

Figure 2 Visualization of PCR products of the seven primer pairs amplifying the full-length CYP2E1 gene for the test samples (n = 3) by agarose gel electrophoresis. (A) PCR products of primer pairs No. 1, 2 and 3. (B) PCR products of primer pair No. 4. (C) PCR products of primer pairs No. 5, 6 and 7. Names of the primer pairs used to generate the PCR fragments are indicated. M – DNA molecular weight marker GeneRuler 1 kb DNA ladder (Thermo Fisher Scientific Baltics, UAB, Lithuania); the sizes of three reference bands (6000, 3000 and 1000 bp) are indicated; 30, 32 and 33 – test DNA samples.

Abbreviations: Fw, forward; Rw, reverse; NC, negative control.

All detected SNVs were grouped according to their location and functional classification. Each sample contained 24–40 SNVs; the majority of the detected SNVs were intronic (87/102), one was a synonymous exonic variant, three were located in the 3′ untranslated region (3′UTR), five were intergenic, and two were upstream variants (Table 3). Overall, 57 of the detected variants were referenced in the dbSNP database, and one SNP (rs2249694) was included in the Obesity-related traits database17 (Supplementary Table 1). In our sample set, three allelic variants, CYP2E1*6 (rs6413432, T>A), CYP2E1*1B (rs2070676, G>A/C/T) and CYP2E1*7A (rs2070673, A>T), were identified; according to the PharmGKB clinical annotations, all these variants have been associated with impaired drug efficacy and/or toxicity (level of evidence: 3).10

Table 3 Classification of the Identified Single Nucleotide Variants

Discussion

The study results demonstrate the successful development of an NGS-based protocol that generates sequencing data of sufficient quality and can be used to detect polymorphic sites dispersed throughout the entire CYP2E1 gene with a high degree of confidence. The primer sets selected for the amplification of overlapping gene fragments appeared to be highly specific, as only a minimal number of byproducts was observed, and the additional size selection step during library preparation efficiently helped to overcome this problem. The coverage depth across the entire target region indicates that the method can detect variants of interest without being limited to targeted screening of specific SNVs or to sequencing of the coding regions only. In addition, no significant differences in the coverage distribution between exons and introns were observed, indicating the consistent performance of the assay.

In this study, apart from a number of different SNVs, three CYP2E1 alleles that are common in the European population, namely CYP2E1*6, CYP2E1*1B, and CYP2E1*7A, were identified among the study participants. As the number of samples in this study was very low, this result is only indicative, pointing to a high variability of the CYP2E1 gene. Overall, the available information regarding the clinical significance of CYP2E1 gene variations is scarce. Regarding the effects of individual CYP2E1 gene polymorphisms, the rs2515641 T allele was associated with a decreased likelihood of toxic liver disease, as compared to the C allele, in people with myeloid leukemia treated with cytarabine, fludarabine, gemtuzumab ozogamicin and idarubicin.18 Recently, it was reported that the synonymous mutation rs2515641 affects CYP2E1 mRNA and protein expression and susceptibility to drug-induced liver injury.19 The CYP2E1*1B CG genotype (rs2070676) showed an association with the development of adverse drug reactions in patients with latent TB infection,20 while the rs6413432 polymorphism was associated with increased progression-free survival in ovarian cancer patients receiving cisplatin-cyclophosphamide therapy.21,22

The lack of information concerning the evidence-based clinical annotations of specific CYP2E1 genetic variants, in contrast to other members of the CYP450 family such as CYP2D6, CYP2C9 and CYP2C19, indicates that current understanding of CYP2E1 genetic variation is incomplete and further studies are needed.

Besides its role in the metabolism of xenobiotics, including drugs, toxins and procarcinogens, CYP2E1 is also implicated in several diseases and pathophysiological conditions.23 Thus, this reliable full-length CYP2E1 gene sequencing approach could be useful in many fields of study, especially those aiming to identify possible associations between genetic variants and corresponding phenotypes affecting treatment response.

Conclusions

In summary, the developed NGS-based sequencing protocol provides a comprehensive and consolidated overview of CYP2E1 genetic diversity and inter-individual variability, which could be useful for the implementation of population-specific genotyping strategies.

Abbreviations

SNV, single nucleotide variant; NGS, next-generation sequencing; IGV, Integrative Genomics Viewer.

Ethics Approval and Informed Consent

The authors state that they have obtained appropriate institutional review from the Central Medical Ethics committee of Latvia (approval No 01-29.1/1), the Ethical Committee of Riga East University Hospital (approval No 24-A/15), and the Ethical Committee of Riga Stradins University (approval No 105/28.01.2016.) for the research described. Informed consent has been obtained from the patients involved.

Acknowledgments

The authors acknowledge the laboratory personnel of the Latvian Biomedical Research and Study Centre core facility for their contribution to sequencing. Additionally, we would like to thank the Genome Database of the Latvian Population for providing the human DNA samples used in this study.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This study was supported by FLP Program/Ministry of Education and Science, Republic of Latvia, project No lzp2020/1-0050. This funding source had no role in the design of this study, analyses, interpretation of the data, or decision to submit results.

Disclosure

The authors declare that they have no conflicts of interest in this work.

References

1. Stanford University and St. Jude Children’s Research Hospital. Clinical pharmacogenetics implementation consortium guidelines. Available from: https://cpicpgx.org/guidelines/. Accessed October 31, 2022.

2. Zhou Y, Ingelman-Sundberg M, Lauschke VM. Worldwide distribution of cytochrome P450 alleles: a meta-analysis of population-scale sequencing projects. Clin Pharmacol Ther. 2017;102(4):688–700. doi:10.1002/cpt.690

3. Goh LL, Lim CW, Sim WC, Toh LX, Leong KP, Ahmad A. Analysis of genetic variation in CYP450 genes for clinical implementation. PLoS One. 2017;12(1):e0169233. doi:10.1371/journal.pone.0169233

4. Zanger UM, Schwab M. Cytochrome P450 enzymes in drug metabolism: regulation of gene expression, enzyme activities, and impact of genetic variation. Pharmacol Ther. 2013;138:103–141. doi:10.1016/j.pharmthera.2012.12.007

5. Danielson PB. The cytochrome P450 superfamily: biochemistry, evolution and drug metabolism in humans. Curr Drug Metab. 2002;3(6):561–597. doi:10.2174/1389200023337054

6. Guengerich FP. Cytochrome P450 2E1 and its roles in disease. Chem Biol Interact. 2020;322:109056. doi:10.1016/j.cbi.2020.109056

7. Ronis MJJ, Lindros KO, Ingelman-Sundberg M. The CYP2E subfamily. In: Ioannides C, Parke DV, editors. Cytochromes P450, Metabolic and Toxicological Aspects. Boca Raton: CRC Press; 1996:211–239.

8. Harjumäki R, Pridgeon CS, Ingelman-Sundberg M. CYP2E1 in alcoholic and non-alcoholic liver injury. Roles of ROS, reactive intermediates and lipid overload. Int J Mol Sci. 2021;22(15):8221. doi:10.3390/ijms22158221

9. Na HK, Lee JY. Molecular basis of alcohol-related gastric and colon cancer. Int J Mol Sci. 2017;18(6):1116. doi:10.3390/ijms18061116

10. Whirl-Carrillo M, Huddart R, Gong L, et al. An evidence-based framework for evaluating pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther. 2021;110(3):563–572. doi:10.1002/cpt.2350

11. Rovite V, Wolff-Sagi Y, Zaharenko L, Nikitina-Zake L, Grens E, Klovins J. Genome Database of the Latvian population (LGDB): design, goals, and primary results. J Epidemiol. 2018;28(8):353–360. doi:10.2188/jea.JE20170079

12. Kivrane A, Igumnova V, Kimsis J, et al. Implementation of a next-generation sequencing-based targeted approach for full-length CYP3A4 gene sequencing. Pharmacogenomics. 2021;22(9):519–527. doi:10.2217/pgs-2020-0128

13. Jalili V, Afgan E, Gu Q, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res. 2020;48(W1):W395–W402. doi:10.1093/nar/gkaa434

14. Robinson J, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–26. doi:10.1038/nbt.1754

15. Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc. 2015;10(10):1556–1566. doi:10.1038/nprot.2015.105

16. National Center for Biotechnology Information, National Library of Medicine. Database of Single Nucleotide Polymorphisms (dbSNP). Available from: http://www.ncbi.nlm.nih.gov/SNP/. Accessed October 31, 2022.

17. Comuzzie AG, Cole SA, Laston SL, et al. Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One. 2012;7(12):e51954. doi:10.1371/journal.pone.0051954

18. Iacobucci I, Lonetti A, Candoni A, et al. Profiling of drug-metabolizing enzymes/transporters in CD33+ acute myeloid leukemia patients treated with Gemtuzumab-Ozogamicin and Fludarabine, Cytarabine and Idarubicin. Pharmacogenomics J. 2013;13(4):335–341. doi:10.1038/tpj.2012.13

19. Chen K, Guo R, Wei C. Synonymous mutation rs2515641 affects CYP2E1 mRNA and protein expression and susceptibility to drug-induced liver injury. Pharmacogenomics. 2020;21(7):459–470. doi:10.2217/pgs-2019-0151

20. Yu Y, Tsao S-M, Yang W-T, et al. Association of drug metabolic enzyme genetic polymorphisms and adverse drug reactions in patients receiving rifapentine and isoniazid therapy for latent tuberculosis. Int J Environ Res Public Health. 2019;17(1):210. doi:10.3390/ijerph17010210

21. Hlaváč V, Holý P, Souček P. Pharmacogenomics to predict tumor therapy response: a focus on ATP-binding cassette transporters and cytochromes P450. J Pers Med. 2020;10(3):108. doi:10.3390/jpm10030108

22. Khrunin A, Ivanova F, Moisseev A, et al. Pharmacogenomics of cisplatin-based chemotherapy in ovarian cancer patients of different ethnic origins. Pharmacogenomics. 2012;13:171–178. doi:10.2217/pgs.11.140

23. García-Suástegui WA, Ramos-Chávez LA, Rubio-Osornio M, et al. The role of CYP2E1 in the drug metabolism or bioactivation in the brain. Oxid Med Cell Longev. 2017;2017:4680732. doi:10.1155/2017/4680732
