Phylogeny-guided genome mining of roseocin family lantibiotics to generate improved variants of roseocin

Phylogenetic analysis for selection of diverse RosM homologs

RosM (WP_010071701.1) installs thioether rings on two peptides (Rosα and Rosβ) that differ structurally and functionally but display synergistic antimicrobial activity as roseocin. Given this unusual natural promiscuity, we speculated that RosM may have evolved distinctively and therefore used it as the query for a search of the GenBank database, which returned hits from a wide range of actinobacterial species and from four other phyla (Additional file 1: Figure S1A). Among the top 100 hits, sequence identity ranged from 33.8 to 99.0%, and the CHG motif in the active site was conserved. The majority of the hits came from actinobacteria (n = 45) and cyanobacteria (n = 29) (Additional file 1: Figure S1A). As expected, using amino acid sequence identity as the criterion for genome mining generated an uneven distribution of hits, making it challenging to evaluate all the hits for novel BGCs. However, a Bayesian phylogenetic analysis of the hits resolved phylum-wise clades, along with subclades containing BGCs of similar properties (Additional file 1: Figure S1B), which allowed a systematic evaluation of distantly related RosM homologs. Interestingly, the RosM query search did not return any hits from firmicutes, the only phylum in which lacticin 3147-like two-component lantibiotics have been discovered so far (Zhang et al. 2012). This observation suggests that the roseocin family evolved independently of the lacticin family of two-component lantibiotics.

To understand the features of the respective BGCs, we analyzed the genome sequences of RosM hits with the BAGEL 4 (Hart and Moffat 2016) and antiSMASH 5.0 (Blin et al. 2019) webservers. Both tools identified the BGC boundaries, including all the major genes of the BGC, but showed limitations in defining the genes encoding precursor peptides. Hence, we located the putative precursor peptide genes in the GenBank file or subjected intergenic regions to NCBI ORFfinder, which enabled us to identify the specific precursor peptide-encoding genes. As a major advantage of mining in a phylogeny-guided manner, identical lanthipeptide precursors (termed redundant hits) were easily identified in an initial analysis across the BGCs of the same subclade. For example, in BGC analysis across actinobacteria, 18 out of 22 hits from the Streptomyces genus and 10 out of 12 species from the Micromonospora genus encoded an identical precursor (Additional file 1: Figure S1B). Such redundant hits were eliminated to limit the sample size and prevent skewing of the final sequence alignment. In actinobacteria, LanMs from BGCs encoding identical precursors were separated by phylogenetic branch lengths of < 0.05 (Additional file 1: Figure S1). Hence, RosM hits separated by such short branch lengths were also removed from the other phyla. Finally, 42 RosM homologs from five phyla (Additional file 1: Table S1) were selected for phylogenetic analysis using an appropriate outgroup for rooting. Unrooted trees, such as that in Additional file 1: Figure S1B, are useful only for visualizing the relatedness of sequences from different clades, whereas only a rooted tree provides insight into evolution. The outgroup was selected carefully, as suggested by Adamek et al. 2019, to be neither too distant from nor too close to the ingroups of the dataset under study. In a recent genome mining study (Makarova et al. 2019), archaea were shown to contain lanthipeptide BGCs across species of the Halorussus genus. Interestingly, these archaeal lanthipeptide BGCs are of the class II type, with a single CCG motif LanM and an unknown class of lanthipeptides. We selected three BGCs from the Halorussus genus in the archaea database and placed them as the outgroup to construct the maximum likelihood (ML) phylogenetic tree with 500 bootstrap replicates (Fig. 2).
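The removal of redundant hits described above can be sketched in a few lines. This is a minimal illustration, assuming each hit is represented by an identifier and the list of precursor peptide sequences found in its BGC; the function name `drop_redundant_hits`, the hit identifiers, and the toy sequences are hypothetical and are not taken from the study, which does not describe how this step was automated.

```python
def drop_redundant_hits(hits):
    """Keep one representative hit per distinct set of precursor sequences.

    hits: iterable of (hit_id, precursors), where precursors is a list of
    precursor peptide strings found in that hit's BGC.
    """
    representatives = {}
    for hit_id, precursors in hits:
        # Order-independent, case-insensitive key for the precursor set.
        key = tuple(sorted(p.strip().upper() for p in precursors))
        # setdefault keeps the first hit seen for each key.
        representatives.setdefault(key, hit_id)
    return list(representatives.values())


# Hypothetical toy data, not sequences from the study.
toy_hits = [
    ("hit_A", ["MSTAAC", "MKGCC"]),
    ("hit_B", ["MKGCC", "MSTAAC"]),  # same precursors as hit_A
    ("hit_C", ["MSTAAC", "MKGCT"]),  # one precursor differs
]
kept = drop_redundant_hits(toy_hits)
print(kept)
```

Here only exact sequence identity counts as redundancy; a branch-length cutoff such as the < 0.05 threshold mentioned above would need the tree itself and is not modelled in this sketch.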

Fig. 2

Phylogenetic tree of 42 selected RosM homologs showed conservation of gene locus and characteristic features along the phylogenetic tree. The roseocin family comprised the BGCs from actinobacteria, which have three types of BGC organization (type 1–3), each forming a separate subclade. A CHG motif LanM for processing of an NHLP type leader sequence was found in all the BGCs except in a subclade of cyanobacteria (Synechocystis sp. PCC 7509 and C. minutus), where two types of LanMs (CHG and CCG motif) and two types of leader sequence (NHLP and N11P) were conserved in a single BGC. LanMs from the BGCs of the Halorussus genus were placed at the root. The value from the 500-replicate bootstrap test is indicated on each branch. The numbers in brackets give the number of members of that particular subclade. NHLP nitrile hydratase leader peptide, N11P Nif11 derived peptides

BGC analysis of each of the 42 RosM hits from the final phylogenetic tree (Additional file 1: Figure S2) showed a gradual shift in the genomic location of the minimally required biosynthetic genes, lanA (lanthipeptide precursor), lanM and a bifunctional lanTp (peptidase domain-containing transporter), across the subclades (Fig. 2). In most BGCs, precursor genes were found upstream of lanM, which is probably the natural temporal order of their synthesis. The feature most commonly observed in the precursor peptide genes was the NHLP (nitrile hydratase leader peptide) family signature (Haft et al. 2010) in their leader region, with a single CHG-type lanM gene present for their processing (Fig. 2). However, BGCs carrying more than one lanM gene were also found across actinobacterial and cyanobacterial species. In cyanobacteria, these BGCs encoded precursor peptides with leader regions conserved from two divergent leader families, i.e. the NHLP and N11P (Nif11 derived peptides) families, discussed later in detail (Fig. 4). Overall, these conservations and variations made it intriguing to study the dataset further for the conserved features of lanthipeptide evolution.

Roseocin family BGCs identified in actinobacteria

Actinobacteria showed the presence of three divergent subclades (Fig. 2), each displaying a characteristic pattern of genetic arrangement within the BGC. Initially, the BGCs seemed unrelated owing to differences in the organization of genes, the sequence, and the number of lanthipeptide precursors, with some encoding more than one CHG-type LanM (Fig. 2). However, further analysis showed that all BGCs encode precursor peptides homologous to either Rosα or Rosβ (Additional file 1: Figure S3A and B, respectively). Based on the chronological order in the phylogenetic tree, these 13 BGCs were classified as type 1–3, to represent their respective subclades (Figs. 2 and 3A). Roseocin was grouped as a member of the type 2 class, with stringent conservation of roseocin BGC features among the other mined members of the same subclade. Precursor peptides of type 2 BGCs also display features that agree with the earlier postulated structure of roseocin (Singh et al. 2020) and hence were used as a platform for designing the variants of Rosα (explained in later sections). However, precursor peptides in type 1 and 3 BGCs showed more significant variation in the amino acids, which might result in a different ring topology of these lanthipeptides (Additional file 1: Figure S3). The type 1 BGCs, which contain three instead of two precursor genes, deviate from the usual two-component lantibiotics (Fig. 3A). Such BGCs have been characterized earlier in the lacticin 3147 family (Xin et al. 2016; Zhao and Van Der Donk 2016). There, the additional precursor was found to result from the duplication of one of the two genes. In contrast, we did not observe any core sequence similarity between the third precursor peptide of the type 1 subclade (designated as LanA2A) and either of the other two lanthipeptides (Additional file 2), which ruled out an evolutionary gene duplication event. Phylogenetic analysis showed that LanA2A is closely related to alpha homologs (Fig. 3B). This indicates that LanA2A is either an alpha peptide that synergizes with a common beta-peptide, or a constituent of a novel three-component synergistic system. Because the study by Walker et al. 2020 mined genomes for lanthipeptides in an untargeted manner, it also found many small-sized roseocin homologs (Additional file 2). To classify these small-sized homologs into the type 1–3 subclades, we analyzed their BGCs in the current study (Additional file 1: Figure S5). Various peculiar attributes were noted, such as missing or duplicated genes and multiple LanMs; hence, they could not be assigned to any of the type 1–3 subclades. These genes are probably of lower significance and may have come into existence only temporarily, to be eliminated during natural selection for the most effective genes.

Fig. 3

Diversity among the 13 representative members of the roseocin family. A Three common types of BGCs encode roseocin homologs; the type-1, type-2, and type-3 BGC examples are from S. rhizosphaericus, S. roseosporus NRRL 11379 and C. methionotrophica, respectively. B Phylogenetic tree of lanthipeptide core sequences built with the ML method, with bootstrap values from 500 replicates. The exceptional third precursor (LanA2A) of the single LanM-three precursor BGC, i.e. type-1, is phylogenetically related to alpha peptides. Colour coding in Fig. 3B is red: alpha peptides; green: beta peptides; blue: third precursor core region. C and D Variation in the core peptide sequences as a function of Shannon entropy in the roseocin alpha and beta homologs, respectively. The alpha peptides contain a S/TxxxxTxGCC motif at the N-terminal end, and beta homologs contain a GS/TxxxxS/TxGCC motif at the C-terminal end. E A gigantic Rosα homolog from the Micromonosporaceae family contains nine Cys and thirteen Ser/Thr residues that may form as many lanthionine rings. Rosα of Streptomyces roseosporus (S. roseo) contains an indispensable disulfide bond and four (methyl)lanthionine rings (dotted lines depict the proposed ring topology in Rosα, Singh et al. 2020); M. arida-Micromonospora arida. LanA precursor peptide, LanM modification enzyme, HP Hypothetical Protein, LanT dual function peptidase-domain containing transporter

The type 3 BGCs of the roseocin family, consisting of two LanMs and two precursor peptides (Fig. 3A), were confined to the Micromonosporaceae family (Fig. 2). These BGCs encoded a supersized homolog of Rosα, with the highest number of thioether-forming moieties (13 Ser/Thr and 9 Cys residues) in a single precursor peptide (Fig. 3E; Rosα has 5 Ser/Thr and 6 Cys residues). Such a large precursor peptide probably necessitates a dedicated LanM for efficient post-translational modification, in parallel to the LanM for the beta peptide. The high pairwise sequence identity of the LanMs of the Micromonosporaceae family (~ 50%) (Additional file 1: Table S2) indicates that this separate LanM might have resulted from a recent LanM gene duplication event (Additional file 1: Figure S4). This is unlike the two LanMs in the lacticin 3147 family, which have low sequence identity (24–29%) and each of which has evolved specificity for modifying only one of the two precursors (Mcclerren et al. 2006). A similar sequence identity score in the pairwise alignment of lanthipeptide leaders (Additional file 1: Table S2) is surprising and can be explained only by coevolution, a perspective discussed in detail in the following sections.

As discussed earlier, the alpha peptide initiates the interaction with the bacterial membrane by targeting lipid II, a key step in the mechanism of action of two-component lantibiotics (Bakhtiary et al. 2017; Oman et al. 2011). Most of the alpha peptides characterized to date possess a lipid II binding motif containing an Asp/Glu residue (CTxTxD/EC), which is absent in the Rosα peptide (Singh et al. 2020). Given the diversity of the roseocin family revealed in the current study, it seems necessary to look for a novel motif underlying a similar or divergent mechanism of action. To understand the variability and conservation of amino acid substitutions among all the roseocin homologs, a Shannon entropy (SE) analysis was done. A lower SE value (< 2.0) indicates higher conservation of amino acid residues through evolution (Garcia-Boronat et al. 2008). Conservation of a ten amino acid stretch, S/TxxxxTxGCC, at the N-terminus of Rosα homologs (Fig. 3C) and of an 11 amino acid stretch, GS/TxxxxS/TxGCC, at the C-terminus of Rosβ homologs (Fig. 3D) was observed. Both motifs were proposed to have a structure with overlapping lanthionine rings in our earlier study (Singh et al. 2020). Such a ring structure at the N-terminus of Rosα homologs is analogous to that of the nisin-like peptides (which have two N-terminal rings, proven to be responsible for target binding), rather than to the Asp/Glu residue-specific target binding motif of the two-component lacticin 3147-family lantibiotics (Cooper et al. 2008; Bakhtiary et al. 2017). Increased SE (> 2.0) at the other amino acid sequence positions (Fig. 3C, 3D, and Additional file 1: Figure S3A, S3B) revealed the innumerable combinations explored by nature, as is evident from the changes in the number of Ser/Thr and Cys residues of the core region among the Rosα and Rosβ homologs. Except for the stretches mentioned above, substitutions were tolerated at all the amino acid positions. Further, the plausible exchange of the indispensable disulfide of Rosα for a thioether ring in Streptomyces rhizosphaericus (Additional file 1: Figure S3A), the exchangeability of lanthionine (Lan) and methyllanthionine (MeLan) rings, and the insertion/deletion of one or more thioether rings suggest enormous scope for modular engineering of both the Rosα and Rosβ peptides (Additional file 1: Figure S3). The presence of a conserved motif and variability in the rest of the core region probably results from balanced combinatorial chemistry, operating in parallel with the conserved motif-oriented evolution of lanthipeptides.
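The per-position Shannon entropy used above can be illustrated with a short sketch. It assumes base-2 logarithms (so the value for 20 amino acids ranges from 0 to about 4.32, which is consistent with a cutoff of 2.0), equal-length aligned sequences, and gaps written as "-" and ignored; these choices, the function names, and the toy alignment are assumptions made for illustration and are not details reported in the study.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = sum over residues of p * log2(1 / p)
    return sum((n / total) * log2(total / n) for n in counts.values())


def alignment_entropy(aligned):
    """Entropy for every column of a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*aligned)]


# Hypothetical toy alignment, not sequences from the study.
toy_alignment = ["SAGCC", "TVGCC", "SLGCC", "TKGCC"]
entropies = alignment_entropy(toy_alignment)
conserved = [i for i, h in enumerate(entropies) if h < 2.0]
print(entropies, conserved)
```

In this toy case the trailing GCC columns have zero entropy, the S/T column has an entropy of 1 bit, and the fully variable second column reaches 2 bits, so it is the only position not counted as conserved under the < 2.0 cutoff.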

However, the rest of the BGCs from other phyla showed no significant core sequence conservation. The lanthipeptides of proteobacteria, chloroflexi, acidobacteria, and cyanobacteria phyla seldom have significant antimicrobial activity (Mohr et al. 2015; Cubillos-Ruiz et al. 2017; Bothwell et al. 2021). Nevertheless, we proceeded further and discovered many overlooked aspects of lanthipeptide BGCs, providing new insights into lanthipeptide evolution.

A new diversity-oriented class of lanthipeptides in cyanobacteria

Unlike the significant conservation observed above in the core region of the roseocin-like lanthipeptides, diversity-oriented evolution is characterized by the generation of a vast variety of lanthipeptide core sequences with no conservation at all (Zhang et al. 2012; Cubillos-Ruiz et al. 2017). So far, prochlorosin-like BGCs are the only known example to have evolved a highly promiscuous LanM (with a CCG motif) for the maturation of diverse lanthipeptide sequences, in the marine cyanobacteria Synechococcus and Prochlorococcus (Li et al. 2010; Mukherjee and Van Der Donk 2014). Similarly, in our dataset, freshwater cyanobacterial species from Synechococcales also showed diversity-oriented lanthipeptide BGCs, but with a novel and markedly divergent mechanism (Fig. 4).

Fig. 4

Novel BGCs, encoding diverse lanthipeptide core sequences, consist of two LanMs for processing two types of precursor peptides. A Two BGCs encoding NHLP and N11P family lanthipeptide leaders in their precursor peptides with the corresponding synthetases, i.e. CHG motif and CCG motif LanM, were identified in Synechococcales. B Sequence logos of NHLP family and N11P family lanthipeptide leader sequences using the precursor sequences from the above two BGCs. C Sequence alignment of the cyclase domain of putative LanMs from the characteristic BGC of Synechococcales showed a difference in the catalytic motif. RosM-like LanMs have a CHG motif, while ProcM-like LanMs have a CCG motif. D Sequence identity percentage in the pairwise alignment of the 12 lanthipeptide precursors’ leader sequences (lower half) and core sequences (upper half). Diversity among lanthipeptide core sequences was high, irrespective of leader conservation. E Multiple sequence alignment of lanthipeptide core sequences depicts natural diversity. LanA precursor peptide, LanM modification enzyme, HP Hypothetical Protein, LanT dual function peptidase-domain containing transporter

In the current study, despite the expansion of hits across different phyla, the BGCs obtained had a CHG-type LanM for processing NHLP-type lanthipeptide precursors (Fig. 2). The NHLP (nitrile hydratase leader peptide; cl22942 subfamily TIGR03898) and N11P (Nif11 derived peptides; cl06756 subfamily TIGR03798) families are the two well-characterized lanthipeptide leader types, which have evolved from the nitrile hydratase enzyme and Nif11 proteins, respectively (Haft et al. 2010). Usually, a single type of lanthipeptide leader, i.e. either NHLP or N11P, is observed in a BGC (Zhang et al. 2014). However, an exception was observed in the cyanobacteria (Fig. 2), which were earlier the source of the prochlorosin family lanthipeptides as well (Cubillos-Ruiz et al. 2017). A non-conventional BGC with both types of lanthipeptide leader (NHLP and N11P), along with two LanMs in a single BGC, for the maturation of three and nine precursor peptides (Fig. 4A and B) was identified by a manual search of the nearby ORFs. This type of BGC was confined to Synechococcales and included Synechocystis sp. PCC 7509 and Chamaesiphon minutus as member species (Fig. 4A). As N11P family lanthipeptides are associated only with CCG motif LanMs, we speculated that one of the two LanMs would be a CCG motif LanM. Indeed, sequence alignment showed the presence of a ProcM-like CCG motif LanM in the same BGC alongside a CHG motif LanM (Fig. 4C). This unprecedented association between two leader types and two LanM types in a single BGC indicates another evolved mechanism of diversity-oriented BGCs in cyanobacterial species, which could be a more efficient route to the biosynthesis of diverse lanthipeptide core sequences (Fig. 4D and E).

Three lanthipeptide precursor sequences (2 + 1) in C. minutus and nine (4 + 5) in Synechocystis sp. PCC 7509 represent an intermediate number of the diverse sequences observed earlier for prochlorosin-like genes (Cubillos-Ruiz et al. 2017). A truncated gene found in the C. minutus genome (Additional file 2) could result from mutations such as a frameshift or an early stop codon, which prevent the synthesis of a functional product. Such pseudogenes are a common feature of the prochlorosin family lanthipeptides and support the ongoing diversification of precursor genes in a diversity-oriented manner (Cubillos-Ruiz et al. 2017). In the C. minutus genome, four more distantly located N11P-type lanthipeptide precursors were found, which might also be associated with this BGC (Additional file 1: Figure S2). Intrigued by the novel mechanism of diversity generation in Synechococcales, we further analyzed the other BGCs to identify conserved features of relevance.

Coevolution of lanthipeptide leader and lanthipeptide synthetase among different phyla

A lanthipeptide precursor is derived by the assimilation of a protein tailored as a leader sequence with an independently evolving core sequence rich in Ser/Thr and Cys residues (Haft et al. 2010; Zhang et al. 2012). In a previous study by Zhang et al. 2014, ProcM (having a CCG motif) was used for genome mining, and the BGCs thus obtained showed highly varying lanthipeptide leader families. However, in our study, despite the diversity among BGCs from a wider range of phyla, high conservation in the leader region of the precursor peptides was observed (Figs. 2 and 5A). The only exception was the cyanobacterial species (Fig. 4), whose additional precursors could be unearthed only by manual inspection of ORFs and would otherwise have been missed (Singh and Sareen 2014; Zhang et al. 2014). Thus, we found that all leader peptide sequences belonged to the NHLP family (Fig. 5A). In 80% of the pairwise alignments of leader sequences, we observed > 39% sequence identity, whereas an identity of > 39% was observed in only 10% of the pairwise alignments of core sequences (Fig. 5B and C). These identity scores support the view that conservation in leader peptides does not restrict lanthipeptide core diversification, even among different phyla. The variability pattern was also plotted for all the 42 LanMs (Additional file 1: Figure S6A) and, surprisingly, overlapped with the variability in the leader regions (Additional file 1: Figure S6B). As noted earlier, the conservation of two leader family types and two different types of LanMs in a single BGC of C. minutus and Synechocystis sp. (Fig. 4) also suggested an essential linkage between the leader and the lanthipeptide synthetases.
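The comparison of leader and core conservation above rests on the fraction of pairwise alignments exceeding an identity cutoff, which can be sketched as follows. This is an illustration only: it assumes the sequences are already aligned to equal length with "-" for gaps, and it defines identity over the columns where both sequences carry a residue, which is one of several common conventions and may differ from the one used in the study. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def fraction_above(aligned, threshold=39.0):
    """Fraction of all sequence pairs whose identity exceeds the threshold."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    if not scores:
        raise ValueError("need at least two sequences")
    return sum(score > threshold for score in scores) / len(scores)


# Hypothetical toy alignments, not sequences from the study.
toy_leaders = ["MSEELLKK", "MSEDLLKR", "MTEELLQK"]
toy_cores = ["TCSGAC--", "SSWCTPGC", "CTA-GSCK"]
leader_fraction = fraction_above(toy_leaders)
core_fraction = fraction_above(toy_cores)
print(leader_fraction, core_fraction)
```

With these toy inputs every leader pair exceeds the 39% cutoff and no core pair does, mirroring in exaggerated form the 80% versus 10% contrast reported above.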

Fig. 5

Conservation of lanthipeptide leader sequence over diverse core sequences. A Multiple sequence alignment using MUSCLE of all the identified 68 precursor peptides from 42 BGCs. All leader sequences are from the NHLP family of lanthipeptide leaders. Conserved residue positions are highlighted. B Pairwise identity among 68 lanthipeptide leader sequences (lower half) and core sequences (upper half) showed higher similarity among leader sequences than among core sequences (except for the lanthipeptide core sequences of the roseocin family, which have core conservation). C Cumulative frequency of pairwise sequence identity among the lanthipeptide leader and core sequences, respectively. 80% of the lanthipeptide leader pairwise alignments showed > 39% identity. However, only 10% of the lanthipeptide core pairwise alignments fulfilled the same criterion (mainly those of the roseocin family)

To explore further, we determined the mutation rates of both the lanM and the lanthipeptide leader genetic regions by calculating the dN/dS ratio, which is the ratio of the rate of nonsynonymous to the rate of synonymous mutations. The lanM and lanthipeptide leader genetic regions from 13 BGCs of actinobacteria (roseocin family) and 9 BGCs of cyanobacteria were selected, and the two phyla were analyzed separately. The calculated dN/dS for lanM and the leader peptide exhibited distinct patterns for the two phyla (Fig. 6A). The dN/dS ratio was in agreement with the phylum-wise evolution of lanthipeptide synthetases (Zhang et al. 2012), as well as with the coevolution of lanthipeptide synthetases and leader sequences. It was proposed earlier by Cubillos-Ruiz et al. 2017 that a lower dN/dS ratio is confined to the lanMs having the CCG motif of the prochlorosin family (or ProcMs), suggesting an evolutionarily locked state that favors the catalytic promiscuity needed for processing diverse precursors. Interestingly, the cyanobacterial lanMs with a CHG motif found in the current study (for which we propose the name CyanMs) also displayed a lower dN/dS ratio, i.e. 0.24 (Fig. 6B), suggesting an evolutionary linkage between CyanMs and ProcMs. However, phylogenetic analysis showed a significant divergence of ProcMs from CyanMs (Fig. 6C), even with similar dN/dS values (Fig. 6B). This indicates that the lanthipeptide synthetases of the cyanobacteria must have diverged during evolution into two subclades of CHG and CCG motif LanMs, both locked into a similar evolutionarily conserved state and probably having a similar level of substrate tolerance. The reason for such a divergence is not clear, but the significance of phylum in deciding the fate of lanthipeptide synthetases points to a stronger phylum-dependent effect on the evolution of lanthipeptides than proposed earlier (Cubillos-Ruiz et al. 2017).
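For readers unfamiliar with the dN/dS calculation, the sketch below shows one common way to estimate it for a pair of aligned coding sequences, in the style of the Nei-Gojobori counting method with a Jukes-Cantor correction. The study does not state which method or software was used, so this choice is an assumption, as are the function names and toy sequences. The sketch is simplified: it expects uppercase, gap-free, in-frame DNA using the standard genetic code, and it does not exclude mutational paths that pass through stop codons.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites (0 to 3) in one codon."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1 / 3
    return total


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = nonsyn = 0.0
    for path in paths:
        current = c1
        for pos in path:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)


def jukes_cantor(p):
    """Corrected distance; None when the proportion is saturated (p >= 0.75)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return None
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS); the ratio is None when it is undefined."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    # Average the site counts of the two sequences.
    s_sites = sum(syn_sites(c) for c in codons1 + codons2) / 2
    n_sites = 3 * len(codons1) - s_sites
    syn = nonsyn = 0.0
    for c1, c2 in zip(codons1, codons2):
        s, n = count_diffs(c1, c2)
        syn += s
        nonsyn += n
    d_s = jukes_cantor(syn / s_sites) if s_sites else 0.0
    d_n = jukes_cantor(nonsyn / n_sites) if n_sites else 0.0
    if d_n is None or not d_s:
        return d_n, d_s, None
    return d_n, d_s, d_n / d_s


# Hypothetical toy sequences: TTT -> TTC is a synonymous (Phe -> Phe) change.
d_n, d_s, ratio = dn_ds("TTTCTG", "TTCCTG")
print(d_n, d_s, ratio)
```

A ratio well below 1, as reported for the cyanobacterial lanMs, means that nonsynonymous changes accumulate more slowly per site than synonymous ones. For real analyses an established package would normally be preferred over a hand-written counter like this one.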

Fig. 6

Coevolution of lanthipeptide leader and synthetase in a phylum-dependent manner. A Pairwise dN/dS distribution of the lanthipeptide leader and lanM from the actinobacteria (13 BGCs of the roseocin family) and cyanobacteria (9 BGCs) phyla showed variation between phyla, suggesting coevolution of the lanthipeptide leader and synthetases. B Standard box plot of the evolutionary rates of CCG motif prochlorosin family LanMs (Cubillos-Ruiz et al. 2017) (Additional file 1: Table S3) and CHG motif LanMs of cyanobacteria (from the current study) showed a similar pattern (median 0.21 and 0.24, respectively), while the CHG motif lanMs of the roseocin family in actinobacteria have a higher value (median 0.48). In the standard box plot, the lower and upper box edges show the first and third quartile values, respectively, separated by the median value. The error bars mark the minimum and maximum values. C Phylogenetic tree of CCG motif ProcMs (Cubillos-Ruiz et al. 2017) and CHG motif LanMs of cyanobacteria and actinobacteria (from the current study). Prochlorosin family LanMs displayed significant divergence from CHG motif LanMs of cyanobacteria and actinobacteria
