Species-specific identification of Pseudomonas based on 16S–23S rRNA gene internal transcribed spacer (ITS) and its combined application with next-generation sequencing

Primary and secondary structures

For this study, the complete genome sequences of 103 Pseudomonas strains, belonging to 62 Pseudomonas species, were collected from GenBank and analyzed. The ITS sequence characteristics of Pseudomonas are summarized as follows. A total of 560 rrn operon sequences were collected, and the number of ITSs in each strain was 3–8; the number of operons differed among the selected Pseudomonas species (Supplemental Table 1). According to the tRNA genes (tDNA) contained in the ITS, all rrn operons can be divided into two types: (1) type N (ITS without tDNA), with a length of 310 ± 20 bp; and (2) type-IA (ITS containing tDNAIle and tDNAAla), with a length of 500 ± 50 bp. Type-IA accounts for 91.1% of the 560 ITS sequences collected. Because type-IA appears in all Pseudomonas species, whereas type N appears much less often, we selected the type-IA ITS as the research sequence. The Pseudomonas species containing type N ITS sequences were P. putida, P. antarctica, P. entomophila, P. fulva, P. mandelii, P. plecoglossicida and P. psychrophila, and the proportions of the two ITS types differed among them.
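The two-type classification described above can be sketched in a few lines of Python. This is only an illustration: the function names, the lowercase amino-acid labels and the "other" fallback are assumptions made for the example, and the study does not state how the tDNA annotations were obtained.

```python
from typing import Iterable, List


def classify_its(trna_genes: Iterable[str]) -> str:
    """Classify one ITS by the tDNAs annotated inside it.

    'N'  : no tDNA in the spacer
    'IA' : tDNA-Ile and tDNA-Ala both present
    Anything else is returned as 'other' (a hypothetical fallback that is
    not reported in the text) so that it can be checked by hand.
    """
    found = {name.strip().lower() for name in trna_genes}
    if not found:
        return "N"
    if found == {"ile", "ala"}:
        return "IA"
    return "other"


def type_fraction(types: List[str], wanted: str = "IA") -> float:
    """Fraction of ITS sequences that belong to the requested type."""
    if not types:
        raise ValueError("no ITS sequences supplied")
    return sum(t == wanted for t in types) / len(types)


# Toy input: three spacers from one hypothetical strain.
strain_its = [["Ile", "Ala"], ["Ile", "Ala"], []]
types = [classify_its(genes) for genes in strain_its]
print(types, type_fraction(types))
```

Applied to all collected spacers, the same fraction corresponds to the share of type-IA sequences reported above.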

Type-IA ITS sequences were aligned using GeneTool Lite 1.0 software, and the alignment showed a mosaic structure. Each sequence was divided into five parts: (1) the upstream sequence of tDNAIle (US), with a length of 90 ± 30 bp; (2) tDNAIle; (3) the linker sequence between tDNAIle and tDNAAla (LS), with a length of 20 ± 10 bp; (4) tDNAAla; and (5) the downstream sequence of tDNAAla (DS), with a length of 240 ± 20 bp. The US, LS and DS, as well as the whole ITS sequence (WS), all contain conserved (C) regions and variable (V) regions. The type N sequences of 13 strains were also aligned, and C regions and V regions could likewise be identified in them.
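The five-part division can be illustrated with a minimal sketch. It assumes that the positions of the two tDNAs are already known (for example from a genome annotation) and are given as 0-based, end-exclusive coordinates. The function name and the coordinate convention are hypothetical choices for this example, not part of the original workflow.

```python
from typing import Dict, Tuple


def split_type_ia(its: str, ile: Tuple[int, int], ala: Tuple[int, int]) -> Dict[str, str]:
    """Split a type-IA ITS into US, tDNA-Ile, LS, tDNA-Ala and DS.

    `ile` and `ala` are (start, end) positions of the two tDNAs inside
    the ITS, 0-based and end-exclusive (an assumed convention).
    """
    ile_start, ile_end = ile
    ala_start, ala_end = ala
    if not (0 <= ile_start < ile_end <= ala_start < ala_end <= len(its)):
        raise ValueError("expected tDNA-Ile before tDNA-Ala, both inside the ITS")
    return {
        "US": its[:ile_start],              # upstream of tDNA-Ile
        "tDNA_Ile": its[ile_start:ile_end],
        "LS": its[ile_end:ala_start],       # linker between the two tDNAs
        "tDNA_Ala": its[ala_start:ala_end],
        "DS": its[ala_end:],                # downstream of tDNA-Ala
    }


# Toy sequence, far shorter than a real ITS, only to show the slicing.
toy = "AAAAA" + "CCCC" + "GG" + "TTTT" + "AAAAAA"
parts = split_type_ia(toy, ile=(5, 9), ala=(11, 15))
print({name: len(seq) for name, seq in parts.items()})
```

The WS is simply the unsplit input, so concatenating the five parts in order returns the original sequence.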

After the secondary structure of the type-IA rrn operon was simulated with RNAstructure 4.2 software, the secondary structures of the Pseudomonas species were found to share a common trunk. Taking P. aeruginosa as an example (Fig. 1), the secondary structure contains a variety of stem-loop structures. Three stems are formed by hybridization with the sequence upstream of the 16S rRNA gene or downstream of the 23S rRNA gene; these reverse complementary sequences are called hybrid stems (h-stem). Two further stems are folded with neighbouring sequences and are called inner stems (i-stem). In addition, each ITS sequence contains three C regions (C1, C2, C3) and three V regions (V1, V2, V3), which corresponds to the mosaic structure obtained from the GeneTool Lite software. We determined that the diversity of the sequence was mostly in the inner stems. The ITS sequence participates in the folding of the 16S and 23S rRNA genes, indicating that the ITS is an important and indispensable structure.

Fig. 1

Secondary structures of P. aeruginosa ITS. Green frame, C1, C2, C3 block; gray frame, upstream of 16S rRNA gene and downstream of 23S rRNA gene; red frame, tRNAIle and tRNAAla; blue frame, V block; orange frame, mutation region; i-stem, inner stem; h-stem, hybrid stem

Species-specific analysis of ITS sequences

The specificity of the four Pseudomonas substructures (US, LS, DS and WS) was evaluated by BLAST. The Gap values of the 62 Pseudomonas species were obtained as the difference between the lowest RS value among target bacteria and the highest RS value among non-target bacteria in the BLAST results (Fig. 2).
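As a minimal sketch, the Gap described here can be written as a one-line difference. The function name and the input format (two lists of RS values already extracted from the BLAST hits) are assumptions for illustration; the RS values in the example are invented apart from 0.62, which echoes the low RS value discussed later in the text.

```python
from typing import Sequence


def gap_value(target_rs: Sequence[float], non_target_rs: Sequence[float]) -> float:
    """Gap = lowest RS among target hits minus highest RS among non-target hits.

    A positive Gap means every target hit scores above every non-target
    hit; zero or a negative Gap means the two groups touch or overlap.
    """
    if not target_rs or not non_target_rs:
        raise ValueError("both groups need at least one RS value")
    return min(target_rs) - max(non_target_rs)


# Invented RS values: the first case is cleanly separated, the second
# contains one low-scoring target hit that pulls the Gap below zero.
print(round(gap_value([0.98, 0.95], [0.80, 0.72]), 2))
print(round(gap_value([0.97, 0.62], [0.70, 0.55]), 2))
```

Because the Gap depends only on the two extreme values, a single low-scoring target sequence is enough to make it negative, which is why a further analysis is needed for such species.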

Fig. 2

S-gap of Pseudomonas ITS. Vertical axis, S-gap value. Blue rhombic point, S-gap of US; red circular point, S-gap of LS; gray square point, S-gap of DS; green triangle point, S-gap of WS

Although five red dots representing LS reached a Gap value of 100%, the LS is so short that BLAST often showed no difference between target bacteria and non-target bacteria, resulting in a Gap value of zero. This occurred in 41 strains, including P. aeruginosa, P. denitrificans, P. entomophila, P. fluorescens, P. granadensis and P. knackmussii. LS is therefore not suitable as a species-specific DNA marker. Apart from the red dots, the figure clearly shows that the grey dots representing DS and the green dots representing WS have higher Gap values, whereas the blue dots representing US have lower Gap values because the US is only about one-third the length of the DS.

Analysis of the Gap values showed that the US, DS and WS of 27 Pseudomonas species gave positive values, including P. alcaligenes, P. alcaliphila, P. asturiensis, P. balearica, P. corrugata and P. cremoricolorata. The ITS sequence was therefore an efficient genetic marker in these 27 species. However, the Gap values of the other 35 species were negative, and the performance of US, DS and WS was consistent.

Therefore, these 35 species were further analyzed, and frequency graphs were generated by calculating the frequency of the RS values of target bacteria and non-target bacteria at each RS level. According to the frequency analysis results, the 35 species can be divided into four types.
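One way to picture the frequency analysis is as a binned relative-frequency distribution of RS values, computed separately for target and non-target hits. The sketch below is an assumption-laden illustration: the bin width of 0.05, the RS range of 0 to 1 and the function name are not taken from the study, and the RS values are invented.

```python
from collections import Counter
from typing import Dict, Sequence


def rs_frequency(rs_values: Sequence[float], bin_width: float = 0.05) -> Dict[float, float]:
    """Relative frequency of RS values per bin, keyed by the bin's lower edge.

    Assumes RS values lie in [0, 1]; an RS of exactly 1.0 is counted in
    the top bin. The small epsilon guards against floating-point error
    when a value sits exactly on a bin edge.
    """
    if not rs_values:
        raise ValueError("no RS values supplied")
    n_bins = round(1 / bin_width)
    counts = Counter(
        min(int(rs / bin_width + 1e-9), n_bins - 1) for rs in rs_values
    )
    total = len(rs_values)
    return {round(i * bin_width, 10): c / total for i, c in sorted(counts.items())}


# Invented example: target hits cluster near 1.0 although one is low,
# while non-target hits cluster at lower RS values.
target = rs_frequency([0.95, 0.97, 1.0, 0.62])
non_target = rs_frequency([0.55, 0.58, 0.70, 0.66])
print(target)
print(non_target)
```

Plotting the two dictionaries as lines against the RS axis gives a chart of the kind described here, in which overlapping ranges can still show clearly separated peaks.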

The first type. P. aeruginosa is the species with the largest amount of data in NCBI, and its BLAST results are correspondingly more complex. Its specificity was examined in three ways: (1) a frequency diagram was made based on all BLAST results (Fig. 3a–c); (2) a frequency diagram was made from BLAST results containing only complete-genome data (Fig. 3d–f); and (3) the three P. aeruginosa sequences with low RS values in the BLAST results were used as target bacteria for a second round of BLAST (Fig. 3g–i). The RS ranges of target bacteria and non-target bacteria overlapped, but the RS values of target bacteria clustered at significantly higher positions on the horizontal axis than those of non-target bacteria. Therefore, frequency analysis can reveal the species-specificity of the ITS in P. aeruginosa. The same conclusion was obtained by frequency analysis of species of the same type, such as P. putida, P. fluorescens and P. stutzeri.

Fig. 3

Frequency analysis of P. aeruginosa. Vertical axis, frequency value; horizontal axis, RS value. The red line, P. aeruginosa as the target species; the blue line, non-target species. a US of P. aeruginosa based on BLAST results. b DS of P. aeruginosa. c WS of P. aeruginosa. d US of P. aeruginosa based on genome-complete data. e DS of P. aeruginosa. f WS of P. aeruginosa. g WS of P. aeruginosa strain EPa3. h WS of P. aeruginosa strain G1. i WS of P. aeruginosa strain ATCC 10145 T

The second type. Few P. parafulva sequences are available in the NCBI database. In the BLAST results, one of the ITS sequences showed a low RS value (0.62), so that the RS values of some non-target bacteria exceeded it and the Gap value was negative. To address this, both the low-RS sequence and the full BLAST result were analyzed, giving six frequency diagrams (Fig. 4a–f). Frequency analysis still gave good results. The same conclusion was obtained from the analysis of species of the same type, namely P. mendocina, P. monteilii and P. brassicacearum.

Fig. 4

Frequency analysis of P. parafulva and P. amygdali. Vertical axis, frequency value; horizontal axis, RS value. a-f The red line, P. parafulva as the target species; the blue line, non-target species. a US of P. parafulva. b DS of P. parafulva. c WS of P. parafulva. d US of P. parafulva of lower RS values. e DS of P. parafulva of lower RS values. f WS of P. parafulva of lower RS values. g-i The red line, P. amygdali as the target species; the gray line: P. syringae; the blue line, other non-target species. g US of P. amygdali. h DS of P. amygdali. i WS of P. amygdali

The third type. The BLAST results of P. amygdali showed that its Gap value was affected by P. syringae. Instead of comparing two groups, three groups were therefore compared: P. amygdali, P. syringae and the other non-target bacteria. The frequency diagram is shown in Fig. 4g–i. The RS values of P. amygdali, represented by the red line, clustered at significantly higher positions on the horizontal axis than those of the other two groups, indicating the species-specificity of the ITS in P. amygdali.

The fourth type. P. syringae can be pathogenic to a variety of organisms and can be divided into pathovars such as P. syringae pv. actinidiae, P. syringae pv. tomato, P. syringae pv. syringae and others. Frequency analysis demonstrated the species-specificity of the ITS in P. syringae pv. actinidiae and P. syringae pv. tomato (Fig. 5a–f). However, for P. syringae pv. syringae and P. syringae pv. maculicola, specificity could not be accurately determined because of interference from other pathovars (Fig. 5g, h). In conclusion, the ITS has high specificity for the species P. syringae.

Fig. 5

Frequency analysis of P. syringae. Vertical axis, frequency value; horizontal axis, RS value. a-f The red line, the target species; the blue line, non-target species. a US of P. syringae pv. actinidiae. b DS of P. syringae pv. actinidiae. c WS of P. syringae pv. actinidiae. d US of P. syringae pv. tomato. e DS of P. syringae pv. tomato. f WS of P. syringae pv. tomato. g WS of P. syringae pv. syringae. The red line, P. syringae pv. syringae as the target species; the gray line, P. syringae pv. porri; the blue line, other non-target species. h WS of P. syringae pv. maculicola. The red line, P. syringae pv. maculicola as the target species; the gray line, P. syringae pv. actinidiae; the blue line, other non-target species

Therefore, by applying frequency analysis in different ways to the 35 species with negative Gap values, all species studied could be accurately distinguished from the frequency charts. These results suggest that the ITS and its subdomains can be used as species-specific DNA markers. They also indicate that the method is highly specific between species, but not within species.

Identification of Pseudomonas by ITS

To verify this conclusion, 200 ITS sequences were selected from the NCBI, comprising the ITS sequences of 160 Pseudomonas strains and of 40 non-Pseudomonas strains belonging to Pseudomonadales, and then randomly shuffled for a double-blind experiment (Supplemental Table 2). In this experiment, neither the experimenter nor the analyst knew which strain each sequence belonged to. The results showed that 66 ITS sequences could be directly identified by Gap, and the remaining 134 ITS sequences were identified successfully by further frequency analysis, giving a success rate of 100% (Table 1).
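The blinding step can be sketched as a shuffle followed by replacement of strain names with neutral codes. The study does not describe how the shuffling was performed, so the function name, the code format ("S001", "S002", ...) and the fixed seed below are assumptions for illustration only.

```python
import random
from typing import Dict, List, Tuple


def blind_sequences(
    labelled: Dict[str, str], seed: int = 0
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Shuffle labelled sequences and hide the labels behind neutral codes.

    Returns the coded sequences handed to the analyst and the answer key,
    which is held back until the identifications are scored.
    """
    labels = list(labelled)
    random.Random(seed).shuffle(labels)  # seeded only to make the sketch repeatable
    coded: List[Tuple[str, str]] = []
    key: Dict[str, str] = {}
    for i, label in enumerate(labels, start=1):
        code = f"S{i:03d}"
        coded.append((code, labelled[label]))
        key[code] = label
    return coded, key


# Toy input with placeholder sequences, not real ITS data.
toy = {"strain_A": "ACGT", "strain_B": "GGCC", "strain_C": "TTAA"}
coded, key = blind_sequences(toy)
print([code for code, _ in coded])
```

Scoring then amounts to comparing each identification with the held-back key, so the success rate is the number of correct identifications divided by the number of coded sequences.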

Table 1 Results of the double-blind experiment

Identification of Pseudomonas sp. in samples

The ITS sequences of the bacteria in the samples were amplified, and species-level identification was performed. The ITS sequence analysis revealed the species and proportions of Pseudomonas in the samples. Abundance values of Pseudomonas were obtained from 12 samples. The three species with the highest abundance in each sample and their proportions are listed in Table 2. P. putida, P. monteilii, P. koreensis, P. aeruginosa and P. fluorescens are widely distributed in water. Among them, P. putida is the most widely distributed and has the highest abundance.
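Ranking the most abundant species in a sample is a simple proportion calculation once reads have been assigned to species. The sketch below assumes a dictionary of per-species read counts as input; the function name and the counts are invented for illustration and are not values from Table 2.

```python
from typing import Dict, List, Tuple


def top_species(read_counts: Dict[str, int], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n most abundant species with their share of all reads."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample has no assigned reads")
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    return [(species, count / total) for species, count in ranked[:n]]


# Invented read counts for one hypothetical sample.
sample = {
    "P. putida": 50,
    "P. monteilii": 30,
    "P. koreensis": 15,
    "P. aeruginosa": 5,
}
for species, share in top_species(sample):
    print(f"{species}: {share:.1%}")
```

Here the proportions are taken relative to the reads assigned to Pseudomonas species in the sample; a different denominator, such as all bacterial reads, would change the values but not the ranking.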

Table 2 Results of next-generation sequencing
