Evolutionary divergence of motifs in B-class MADS-box proteins of seed plants

Identification of B class gene sequences

We retrieved 97 known B protein sequences (Table 1) in a Basic Local Alignment Search Tool (BLAST) search [20], using proteins from A. thaliana (AT3G54340 and AT5G20240) and Oryza sativa (OsMADS2, 4, and 16) as query sequences [1, 3, 6, 18, 19, 20]. The retrieved sequences were then submitted to the Simple Modular Architecture Research Tool (SMART) to confirm that they contain MADS-box domains [22]. The sequence alignment of the MADS domains is shown in Additional file 2: Fig. S1.

Motif identification in B class genes

We used the MEME tool [3] to identify conserved sequence motifs in the 97 B protein sequences (Fig. 1 and Additional file 1). A total of 10 conserved motifs were identified in the AP3/PI proteins from 21 plant species (Fig. 1). Detailed per-site amino acid conservation profiles are shown in Additional file 2: Fig. S2. Motif 1 (consensus sequence IEIKRIENPTNRQVTYSKRRNGIFKKAHELTVLCDAKVSLIMFSS) and motif 9 (consensus sequence KAAELTVLCDAKVSLIMFSST) overlap with the MADS domain (M domain). In the B proteins of Nelumbo nucifera (ABE11602, ADD25193, ADD25194, and ADD25195) and Glycine max (GmMADS121, GmMADS133, GmMADS147, and GmMADS175), and in some of the B proteins of Amborella trichopoda (LOC18429933 and LOC18424280), motif 1 was replaced by motif 9 (Fig. 1). However, other A. trichopoda B protein sequences (LOC18436882 and LOC18448591) have motif 1 (Fig. 1). The M domains of gymnosperm B proteins belong to the motif 1 group (Fig. 1).
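The relationship between the two M-domain consensus sequences quoted above can be checked with a short script. The following is only an illustrative sketch, not the MEME procedure used in this study: it slides the motif 9 consensus along the motif 1 consensus without gaps and reports the placement with the most identical residues. The function name `best_ungapped_overlap` is a hypothetical helper introduced for this example.

```python
def best_ungapped_overlap(a: str, b: str) -> tuple[int, int, int]:
    """Slide b along a without gaps.

    Returns (offset, matches, overlap_length) for the placement with the
    most identical residues; offset is the 0-based index in a at which
    b[0] is placed (negative if b starts before a).
    """
    best = (0, -1, 0)
    for offset in range(-(len(b) - 1), len(a)):
        start_a = max(offset, 0)
        start_b = max(-offset, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        matches = sum(a[start_a + i] == b[start_b + i] for i in range(overlap))
        if matches > best[1]:
            best = (offset, matches, overlap)
    return best


# Consensus sequences as reported in the text.
MOTIF_1 = "IEIKRIENPTNRQVTYSKRRNGIFKKAHELTVLCDAKVSLIMFSS"
MOTIF_9 = "KAAELTVLCDAKVSLIMFSST"

offset, matches, overlap = best_ungapped_overlap(MOTIF_1, MOTIF_9)
print(offset, matches, overlap)  # prints: 25 19 20
```

Under this simple comparison, motif 9 aligns to the C-terminal part of motif 1 (starting at residue 26 of the motif 1 consensus), with 19 identical residues over the 20 overlapping positions; the single difference is H in motif 1 versus A in motif 9, and the final T of motif 9 extends one residue beyond the end of motif 1.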

Fig. 1

MEME search results regarding the protein motifs in AP3/PI. The protein motifs of the B proteins in 97 sequences were separately obtained using the MEME motif search tool for each group. Ten motifs were identified, each of which is represented as a colored box. Asterisks and red letters: motif 9. Gymnosperms (numbers 1–3) and angiosperms (numbers 4–97): Basal angiosperm (Amborella trichopoda: 4–7), monocots (Oryza sativa: 8–10, Zea mays: 11–12, Phalaenopsis aphrodite: 13–17 and Musa acuminata: 18–21), and magnoliopsida and eudicots (22–97)

Gymnosperm B proteins (GbMADS4, GbMADS9, and GGM2) lack motif 4. This motif overlaps with the intervening domain (I domain), a less-conserved region of approximately 30 amino acids that is involved in protein dimerization [10].

Motifs 2, 3, 5, 7, and 10 were found in the keratin-like domain (K domain). The K domain of AP3/PI consists of approximately 70 amino acids that are divided into the K1, K2, and K3 subdomains [13]. K1 and K2 are required for dimer formation, whereas K3 may participate in multimerization [13, 23]. We found motif 2 and motif 10 in the K1 subdomain and motif 3 in K2. Motif 5 (consensus sequence KYHVIKTQTDTCKKKVRNLEE) or its alternative, motif 7 (consensus sequence QMEYWKMMKRNDKMLEDENKQLTF), was found in K3 (Fig. 1 and Additional file 2: Fig. S2).

In Malus domestica (MdMADS65, MdMADS99, MdMADS121, MdMADS127, and MdMADS151) and Populus trichocarpa (PtMADS30, PtMADS38, and PtMADS45), motif 10 replaced the motif 2 found in the K1 subdomain of other angiosperms. Motifs 6 and 8 were found in the C-terminal domain (C domain) (Fig. 1).
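The motif-to-domain assignments described in this section can be summarized as a small lookup table. The sketch below is only a convenience for reading Fig. 1, not part of the original analysis; the names `DOMAIN_MOTIFS`, `domains_covered`, and `missing_domains` are hypothetical, and the example motif set is an assumed illustration of a protein lacking motif 4 rather than data taken from a specific sequence.

```python
# Motif-to-domain assignments as described in the text.
DOMAIN_MOTIFS = {
    "M": {1, 9},    # MADS domain: motif 1 or its alternative, motif 9
    "I": {4},       # intervening domain
    "K1": {2, 10},  # K1 subdomain: motif 2 or its alternative, motif 10
    "K2": {3},
    "K3": {5, 7},   # K3 subdomain: motif 5 or its alternative, motif 7
    "C": {6, 8},    # C-terminal domain
}


def domains_covered(motifs: set[int]) -> dict[str, list[int]]:
    """Map each domain to the motifs from `motifs` that fall inside it."""
    return {d: sorted(motifs & members) for d, members in DOMAIN_MOTIFS.items()}


def missing_domains(motifs: set[int]) -> list[str]:
    """List domains for which none of the assigned motifs is present."""
    return [d for d, found in domains_covered(motifs).items() if not found]


# Hypothetical motif set for a protein without motif 4 (illustration only).
example = {1, 2, 3, 5, 6, 8}
print(domains_covered(example))
print(missing_domains(example))  # prints: ['I']
```

A protein whose motif set contains no I-domain motif, as reported here for the gymnosperm B proteins, is flagged with a missing "I" entry, while the alternative motif pairs (1/9, 2/10, 5/7) are treated as interchangeable markers of the same domain.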

Phylogeny analyses of plant AP3/PI genes

To investigate the phylogenetic relationships among these 97 AP3/PI sequences (Additional file 1), a Bayesian phylogeny was reconstructed (Fig. 2). Overall, the resulting phylogeny was consistent with the species tree, indicating that the reconstruction was reliable. As shown in Fig. 2, the target genes divided into two major groups, representing AP3 and PI, respectively. MADS protein sequences containing motif 9 (highlighted in red) were identified within both the AP3 group (LOC18424280, ABE11602, ADD25194, GmMADS121, GmMADS133, GmMADS147, and GmMADS175) and the PI group (ADD25195 and LOC18429933), indicating that motif 9 may have evolved independently within the two separate lineages. The position of the Amborella MADS genes (gene IDs starting with LOC) was consistent with this species being an evolutionary intermediate between lower plants and core eudicot plants. Notably, the Amborella MADS genes containing motif 1, LOC18448591 and LOC18436882 (highlighted in green), were identified in both the AP3 and PI groups (Fig. 2).

Fig. 2

Phylogenetic tree of the plant AP3/PI genes. The phylogeny was reconstructed using a Bayesian approach. Bayesian posterior probabilities are annotated above each branch in the phylogenetic tree. M domain: motif 1 (black and green; green: A. trichopoda: LOC18448591 and LOC18436882) and motif 9 (red): LOC18424280, LOC18429933, GmMADS121, GmMADS133, GmMADS147, GmMADS175, ABE11602, ADD25193, ADD25194, and ADD25195

Protein structural modelling of MADS domain

To investigate the potential impact of the substitution between motif 1 and motif 9, protein structural modelling was performed for plant MADS domains using AT5G20240 as a reference. The previously well-determined structure of human myocyte enhancer factor-2 (MEF2, PDB ID: 1TQE, MADS-box superfamily) in complex with DNA and its interacting protein (Fig. 3A) was used as a template for homology structural modelling (~48% amino acid identity). Structural superimposition (Fig. 3B) showed that the MADS domain of AT5G20240 was well conserved in comparison with MEF2. Based on the 3D structural superimposition with MEF2, the spatial positions of motif 1 and motif 9 of AT5G20240 relative to the bound DNA were displayed (Fig. 3C). A total of 10 amino acids in AT5G20240 were identified as potential DNA-binding sites (shown as sticks in Fig. 3C, Additional file 2: Fig. S3), six of which were located in motif 1 versus only one in motif 9, suggesting that motif 1, rather than motif 9, may be responsible for DNA binding.

Fig. 3

Protein structural modelling of plant MADS domain (AT5G20240). A Overall structure (PDB: 1TQE) of human MEF2 (dimers) in complex with DNA and interacting protein (green color). B Superimposition of modelled AT5G20240 (yellow) with MEF2 (deep salmon). C Displays the spatial locations of motif 1 (cyan & red) and motif 9 (red), with DNA-binding residues shown in sticks. D Displays the identified dimerization residues in sticks. E Displays potential tetramerization residues in sticks. F Displays residues (sticks) involved in protein–protein interaction. Amino acid numbering is according to the M domain of AT5G20240

As indicated by the MEF2 structure, the MADS domain binds DNA as a dimer. Thus, amino acid sites involved in dimerization were also identified for AT5G20240 (Fig. 3D). A total of 52 residues out of the 89 MADS domain amino acids were identified as participating in the dimerization interaction (Fig. 3D, Additional file 2: Fig. S3); these cover the majority of motif 9 residues and a significant number of residues in motif 1. Motif 1, which overlaps with motif 9, was intertwined between the MADS monomers to form the dimer complex (Fig. 3D & E), suggesting that both motif 1 and motif 9 were critical for the dimerization interaction. In addition, residues for protein–protein interaction (Fig. 3E) and potential tetramerization (Fig. 3F) were also identified and displayed. None of these interactions involved residues from motif 1 or motif 9.
