Detection of DNA replication errors and 8-oxo-dGTP-mediated mutations in E. coli by Duplex DNA Sequencing

E. coli has been a model organism for genetic analysis for many decades [1], and it is still widely used to gain insights into the fundamental mechanisms by which cells maintain genome integrity. It is also a popular expression platform for the production of recombinant proteins, some of which have found their way into therapeutics. For such an important genetic workhorse, a clear picture is needed of the mutations occurring in its DNA and of the mechanisms by which they occur or are prevented. Under normal circumstances, mutation rates in E. coli are low (10⁻⁹ to 10⁻¹⁰) [2] owing to a number of pathways that contribute to the integrity of the genome. Base selection, exonucleolytic proofreading and post-replicative DNA mismatch repair operate serially to ensure high intrinsic fidelity of DNA replication [3], although some errors escape this scrutiny and end up as mutations. DNA damage from a variety of endogenous and exogenous sources, if left unrepaired, is also expected to be a significant contributor to spontaneous mutations. Transcription, replication-transcription conflicts and movement of transposable sequences are further potential sources of mutations [4].

Several approaches have been used to estimate the level of spontaneous mutations in this organism. Most of these studies, including many from our laboratory, have used long-established methods such as fluctuation assays and clone-based sequencing that rely on the selection of detectable phenotypes [5], [6]. More recently, studies using whole-genome sequencing after multiple generations of mutation accumulation have been reported [7]. Estimates of E. coli mutation rates have also been reported using comparative genomics [8]. The frequencies and mutational spectra from these studies tend to agree at some level, but also show significant differences, as discussed in [9]. Techniques using mutant selection are indirect measurements that require many assumptions and extrapolations [9], while mutation accumulation methods, although less subject to selection issues, are quite laborious and time-consuming. Detecting mutations directly in DNA, without the need for selection or extensive genetic manipulations, may have significant advantages in this respect.

Duplex DNA Sequencing is a recently developed technique that can detect mutations directly in isolated DNA at potentially very low frequencies [10]. The approach relies on tagging and sequencing each of the two strands of unique DNA duplex fragments. Bona fide mutations in double-stranded DNA are identified by their presence at the same position in both complementary strands, so the technique can readily be used to detect mutations in heterogeneous populations. It has been used to detect mutations in mitochondria and within tumors, and mutant frequencies in the range of 10⁻⁵ to 10⁻⁷ have been reported [11], [12]. This is a significant improvement over other mutation-calling programs that rely entirely on information from single-stranded DNA (ssDNA) and are therefore hampered by the high error rate (10⁻² to 10⁻³) of next-generation sequencing (NGS) [13].
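
The two-strand agreement rule described above can be illustrated with a minimal sketch. This is not the published Duplex Sequencing pipeline: the function names, the input format (reads already grouped by tag and oriented to the reference strand), and the thresholds `min_reads` and `min_agreement` are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_reads=3, min_agreement=0.7):
    """Collapse the reads from one strand of one tagged fragment.

    min_reads and min_agreement are illustrative thresholds (assumptions).
    Positions without sufficient agreement are masked as 'N'.
    """
    if len(reads) < min_reads:
        return None
    length = min(len(r) for r in reads)
    consensus = []
    for i in range(length):
        base, count = Counter(r[i] for r in reads).most_common(1)[0]
        consensus.append(base if count / len(reads) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(top, bottom):
    """Keep a base only where both single-strand consensuses agree."""
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(top, bottom))


def call_duplex_mutations(tagged_reads, reference):
    """Report (tag, position, ref_base, alt_base) supported by both strands.

    tagged_reads: iterable of (tag, strand, sequence) with strand in
    {'top', 'bottom'}; sequences are assumed to be pre-aligned and
    expressed in reference-strand orientation.
    """
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for tag, strand, seq in tagged_reads:
        families[tag][strand].append(seq)

    mutations = []
    for tag, fam in families.items():
        top = strand_consensus(fam["top"])
        bottom = strand_consensus(fam["bottom"])
        if top is None or bottom is None:
            continue  # fragment lacks adequate coverage of one strand
        dcs = duplex_consensus(top, bottom)
        for pos, (ref_base, base) in enumerate(zip(reference, dcs)):
            if base != "N" and base != ref_base:
                mutations.append((tag, pos, ref_base, base))
    return mutations


# Toy data (hypothetical): fragment f1 carries A->T at position 4 on both
# strands; fragment f2 shows a change on the top strand only.
REFERENCE = "ACGTACGT"
demo_reads = (
    [("f1", "top", "ACGTTCGT")] * 3
    + [("f1", "top", "ACCTTCGT")]          # one read with a sequencing error
    + [("f1", "bottom", "ACGTTCGT")] * 3
    + [("f2", "top", "ACGAACGT")] * 3
    + [("f2", "bottom", "ACGTACGT")] * 3
)
demo_calls = call_duplex_mutations(demo_reads, REFERENCE)
```

In this toy input, the change present on both strands of fragment f1 is reported, while the single-read error and the top-strand-only change in fragment f2 are masked, which is the property that lets strand agreement suppress the per-read error rate of NGS.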

In the present study, we applied the Duplex Sequencing methodology to investigate mutations occurring in the bacterium E. coli as a continuation of the multiple studies of mutagenesis in this organism by traditional methods. Specifically, we used E. coli BL21-AI, a member of the B-lineage of E. coli [14], which is 98.8% identical to the more traditionally used E. coli K-12. The B strains have been used most prominently for studies and applications of protein expression and overproduction [15]. BL21-AI is a derivative that is used in conjunction with T7-based expression vectors to achieve more controlled expression of harmful or toxic proteins through tighter control of basal expression levels [14]. This control is achieved by the presence of an ara-T7pol cassette in its chromosome, in which the expression of the T7 RNA polymerase gene is controlled by the tight arabinose-inducible (and glucose-repressible) PBAD promoter. DNA sequencing of the BL21-AI genome has shown the chromosomal sequence to be essentially identical to that of the parental E. coli B, except for a 4-kb insertion of the araC-T7 pol cassette (introduced from an E. coli K-12 strain by P1 transduction) [14]. As shown in this work, the mutability of BL21-AI, as measured by the appearance of rifampicin-resistant mutants, is low and comparable to frequencies obtained for K-12 strains, justifying its use in the present investigation.

Duplex Sequencing was performed on a ~10.3-kb region of the BL21-AI genome that, in addition to normal E. coli B sequences, also included the genes involved in recombinant protein expression (ara operon, RNAP gene, tetB). This region is shown in Fig. 1. We first used the wild-type BL21-AI strain to investigate any background frequency that could be detected using Duplex Sequencing. We then expanded our investigation by creating two genetic variants of the strain: (a) a mutL derivative to potentially detect DNA replication errors and (b) a mutT derivative to identify mutations resulting from oxidative stress. Both strains are well-established mutators that show elevated mutation rates with a characteristic mutational fingerprint that relates to their underlying error prevention/production systems. Our results show that the Duplex Sequencing approach is indeed capable of revealing the specific classes of base-substitution mutations produced in both these strains, and that these mutations are readily detected above the background of the system. We also address issues with the present background of Duplex Sequencing, which may help in further refinement of this technique and expansion of its applicability.
