Decoding the Virtual 2D Map of the Chloroplast Proteomes

The Molecular Mass of the Chloroplast Protein Ranged from 0.448 to 616.334 kDa

An extensive analysis of the chloroplast proteome, based on the fully annotated protein sequences of 2893 species and comprising a total of 256,387 protein sequences, revealed that the molecular mass of chloroplast proteins ranged from 0.448 to 616.334 kDa (Supplementary File 1). The ribosomal protein L16 (accession: AWK02406.1) of Cercidiphyllum japonicum (accession: MG605672.1) was the smallest protein (0.448 kDa). In comparison, the cell division protein (accession: AID67672.1) of Nephroselmis astigmatica (accession: KJ746600.1) was the largest protein (616.334 kDa) present in the chloroplast proteome. Additional low-molecular-mass proteins found in the chloroplast proteome included the ribosomal protein S12 of Spondias bahiensis (0.478 kDa, accession: ANI86804.1), acetyl-CoA carboxylase beta subunit of Carpinus putoensis (0.713 kDa, accession: APS87155.1), NADH-plastoquinone oxidoreductase subunit 4 of Trompettia cardenasiana (0.969 kDa, accession: AMP19627.1), Ycf1 of Euryale ferox (1.120 kDa, accession: AUD56613.1), and ribosomal protein L23 of Lathyrus odoratus (1.363 kDa, accession: AIL55910.1) (Supplementary File 1). The smallest protein in the chloroplast proteome comprised only four amino acids, M-S-L-V (accession: MG605672.1). Other low-molecular-mass proteins with short peptide sequences included M-L-S-E (ribosomal protein S12, accession: ANI86804.1), M-V-F-S-C-K (acetyl-CoA carboxylase beta subunit, accession: APS87155.1), M-C-S-K-I-K-I-F (NADH-plastoquinone oxidoreductase subunit 4, accession: AMP19627.1), M-I-L-K-Y-N-I-L-I (Ycf1, accession: AUD56613.1), and M-I-I-M-L-E-P-G-Y-S-I-P (ribosomal protein L23, accession: AIL55910.1).
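
The masses reported for the short peptides above can be approximately reproduced from sequence alone. The following is a minimal sketch, not the authors' pipeline: it assumes unmodified peptides and uses standard average residue masses plus one water molecule; the function name `peptide_mass_kda` is hypothetical.

```python
# Standard average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def peptide_mass_kda(sequence: str) -> float:
    """Average molecular mass of an unmodified peptide in kDa.

    Accepts one-letter codes, optionally separated by hyphens (e.g. "M-S-L-V").
    Raises KeyError for ambiguous codes such as X, B, or J.
    """
    residues = sequence.replace("-", "").upper()
    total_da = sum(RESIDUE_MASS[r] for r in residues) + WATER
    return total_da / 1000.0


# Short peptides quoted in the text; printed values can be compared with
# the reported masses (0.448, 0.478, 0.713, and 0.969 kDa).
for seq in ("M-S-L-V", "M-L-S-E", "M-V-F-S-C-K", "M-C-S-K-I-K-I-F"):
    print(seq, f"{peptide_mass_kda(seq):.3f} kDa")
```

Small differences in the last decimal place relative to the reported values are expected, since the rounding and mass table used in the original analysis are not stated.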

A principal component analysis (PCA) of the low-molecular-mass proteins of the chloroplast proteome revealed that monocots, magnoliids, gymnosperms, and bryophytes share similar low-molecular-mass chloroplast proteins, while the low-molecular-mass proteins of eudicots, nymphaeales, pteridophytes, and algae cluster separately, indicating distinct differences in the low-molecular-mass proteins present within these two groups (Fig. 1). A Pearson correlation analysis (p < 0.05) indicated that the low-molecular-mass proteins of eudicots and nymphaeales are negatively correlated (−0.289), while the low-molecular-mass proteins of bryophytes and algae (0.299), pteridophytes and bryophytes (0.389), bryophytes and eudicots (0.24), and nymphaeales and magnoliids (0.303) are all positively correlated (Fig. 1).
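
For readers unfamiliar with the statistic, the Pearson coefficient quoted above can be computed as in the sketch below. The text does not specify which per-group vectors were correlated, so the two input lists here are purely hypothetical placeholders, as is the function name `pearson_r`.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (dev_x * dev_y)


# Hypothetical low-molecular-mass values (kDa) for two taxonomic groups.
group_a = [0.448, 0.713, 1.120, 1.363, 2.050]
group_b = [0.478, 0.969, 1.010, 1.500, 1.900]
print(f"r = {pearson_r(group_a, group_b):.3f}")
```

A coefficient near +1 or −1 indicates a strong linear relationship; values such as the 0.24 to 0.389 reported above indicate weak to moderate association.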

Fig. 1

Statistical analysis of low-molecular-mass proteins present in chloroplast proteomes. A Principal component analysis (PCoA) of low-molecular-mass proteins in chloroplast proteomes. Low-molecular mass proteins of algae, pteridophytes, nymphaeales, and eudicots are independent, suggesting little to no commonality. B Pearson’s correlation analysis of low-molecular-mass proteins in the plant kingdom’s chloroplast proteome of different taxonomic groups. C Heat map of Pearson’s correlation values (p < 0.05) of low-molecular-mass proteins found in the chloroplast proteome. The majority of low-molecular-mass proteins are negatively correlated with each other

The largest identified chloroplast protein (cell division protein) has a molecular mass of 616.334 kDa and comprises 5242 amino acids (Supplementary File 1). Other high-molecular-mass chloroplast proteins included hypothetical chloroplast RF21 (575.771 kDa, accession: AWH11312.1), cell division protein (487.534 kDa, accession: ALO62775.1), hypothetical chloroplast RF1 (485.475 kDa, accession: AHZ11038.1), and Ycf1a (482.348 kDa, accession: GAQ93691.1) (Supplementary File 1). The high-molecular-mass cell division protein was found only in algal species and was absent from all other species. Principal component analysis of the high-molecular-mass chloroplast proteins revealed that the high-molecular-mass proteins of gymnosperms, bryophytes, magnoliids, protists, and pteridophytes clustered together, while the high-molecular-mass proteins of algae, monocots, nymphaeales, and eudicots clustered independently (Fig. 2). These data suggest commonality in the high-molecular-mass proteins in the lower eukaryotic plant taxa (gymnosperms, bryophytes, magnoliids, protists, and pteridophytes). In comparison, no commonality is present in the higher eukaryotic plant taxa (monocots, nymphaeales, and eudicots). A Pearson’s correlation analysis (p < 0.05) revealed that the high-molecular-mass proteins in the bryophytes and nymphaeales were positively correlated (0.476) with each other, while several other groups were negatively correlated (Fig. 2).

Fig. 2

Statistical analysis of high-molecular-mass proteins present in chloroplast proteomes. A Principal component analysis (PCoA) of high-molecular-mass proteins in chloroplast proteomes. High-molecular-weight proteins in the chloroplast proteome of different taxonomic groups indicate that monocots, eudicots, and algae are independent, suggesting a lack of commonality in the high-molecular mass chloroplast proteins in these taxonomic groups. B Pearson’s correlation analysis (p < 0.05) values for high-molecular-mass proteins in the chloroplast proteome of different taxonomic groups. C Heat map of the Pearson’s coefficients of high-molecular-mass proteins. A high correlation between nymphaeales and bryophytes is evident, while several others are negatively correlated

The number of proteins encoded per chloroplast proteome ranged from 3 to 370. Pilostyles aethiopica (eudicot) contained the lowest number of chloroplast-encoded proteins (3), while Pinus koraiensis encoded the highest number (370). On average, a chloroplast proteome contained 88.749 chloroplast-encoded proteins with an average mass of 32.483 kDa (Fig. 3, Supplementary file 1). Other species with a low number of chloroplast-encoded proteins were Monoraphidium neglectum (4), Pilostyles hamiltonii (4), Asarum minus (7), and Cytinus hypocistis (15). Similarly, some of the species encoding a high number of chloroplast proteins were Grateloupia taiwanensis (233), Grateloupia filicina (233), Porphyridium purpureum (224), Osmundaria fimbriata (224), Lophocladia kuetzingii (221), and Kuetzingia canaliculata (218) (Supplementary file 2). All of these species encoding a high number of chloroplast proteins were algal species (Supplementary file 2). Chloroplast proteomes contained an average of 25,307.87 amino acids per proteome (Supplementary file 2). The highest average protein size was found in Monoraphidium neglectum, with an average of 1743 amino acids per chloroplast protein (Supplementary file 2). The chloroplast proteome of Grateloupia filicina encoded the highest number of amino acids (51,662) (Supplementary file 2). Other species encoding a high number of amino acids in their chloroplast proteome were Pyropia haitanensis (50,281), Porphyra purpurea (50,195), Porphyra pulchra (50,192), and Palmaria palmata (50,141). The chloroplast proteome of Pilostyles aethiopica encoded the lowest number of amino acids (621) (Supplementary file 2). Other species encoding a low number of amino acids in their chloroplast proteome were Pilostyles hamiltonii (911), Asarum minus (1727), and Cytinus hypocistis (2215) (Supplementary file 2). The average chloroplast protein size was only 288.9613 amino acids (Supplementary file 2). Approximately 33.22% of chloroplast proteins contained ≤100 amino acids, and 15.44% contained ≤50 amino acids. Notably, only 4.69% of chloroplast-encoded proteins contained ≥1000 amino acids.

Fig. 3

Box and Whisker plot analysis of chloroplast proteomes. A An average number of protein sequences. B An average number of amino acids per protein. C The average molecular mass of chloroplast proteins (kDa). D Average isoelectric point, E average percentage of acidic pI proteins, and F average percentage of basic pI proteins

The Chloroplast Proteome of Grateloupia filicina Is the Heaviest (5854.794 kDa), and That of Pilostyles aethiopica Is the Lightest (72.579 kDa)

Approximately 4.8% of chloroplast-encoded proteins had a molecular mass of ≥100 kDa, while 15.502% had a molecular mass ranging from 50 to 100 kDa, and 79.662% had a molecular mass ranging from 0.44 to 50 kDa. The chloroplast proteome of Grateloupia filicina had a total molecular mass of 5854.794 kDa, the greatest of any chloroplast proteome examined (Supplementary file 3). Other species with large-molecular-mass proteomes included Grateloupia taiwanensis (5636.905 kDa), Pyropia haitanensis (5636.98 kDa), Palmaria palmata (5631.679 kDa), and several other species (Supplementary file 3). The lowest-molecular-mass chloroplast proteome was found in Pilostyles aethiopica (72.579 kDa), followed by Pilostyles hamiltonii (106.661 kDa) and Elytrophorus spicatus (175.639 kDa) (Supplementary file 3). The average molecular mass of the chloroplast proteome was 2877.533 kDa (Supplementary file 3). The average molecular mass of the chloroplast proteomes of algae, bryophytes, eudicots, gymnosperms, magnoliids, monocots, nymphaeales, protists, and pteridophytes was 3805.064, 2562.121, 2921.544, 2624.771, 2808.423, 2467.242, 2993.64, 2652.881, and 2873.399 kDa, respectively (Supplementary file 3). In descending order, the average molecular mass of chloroplast proteomes was algae (3805.064 kDa) > nymphaeales (2993.64 kDa) > eudicots (2921.544 kDa) > pteridophytes (2873.399 kDa) > magnoliids (2808.423 kDa) > protists (2652.881 kDa) > gymnosperms (2624.771 kDa) > bryophytes (2562.121 kDa) > monocots (2467.242 kDa). Algae thus had the highest average chloroplast proteome molecular mass (3805.064 kDa), while monocots had the lowest (2467.242 kDa) (Supplementary file 3).

Chloroplast Proteomes Encode a Greater Number of Basic pI Proteins

The pI of chloroplast proteins ranged from 2.854 to 12.954 (Table 1, Supplementary file 1). The average pI of all chloroplast proteomes was 7.852 (Fig. 3, Supplementary file 1). The hypothetical plastid protein (accession: CCP38196.1) in Chondrus crispus exhibited the lowest pI (2.854), while ORF62e (accession: AAO74126.1) in Pinus koraiensis had the highest pI (12.954) (Supplementary file 1). Other chloroplast-encoded proteins with a low pI included the putative ribosomal protein 3 (pI: 2.905, accession: AOM65352.1), photosystem I subunit VIII (pI: 3.058, accession: AWT39761.1), photosystem I protein I (pI: 3.058, accession: BAK19043.1), cytochrome b6-f complex subunit VI (pI: 3.058, accession: ALM87861.1), and several others (Supplementary file 1). Chloroplast-encoded proteins with a high pI included ribosomal protein L34 (pI: 12.881, accession: AOM66732.1), ribosomal protein S11 (pI: 12.193, accession: API85172.1), ribosomal protein L32 (pI: 12.164, accession: ASA34479.1), ribosomal protein S18 (pI: 12.12, accession: AHL24798.1), ribosomal protein L36 (pI: 12.091, accession: YP_009470691.1), and several others (Supplementary file 1). Among the 256,387 chloroplast proteins analyzed, 56.334% were in the basic pI range, 43.611% were in the acidic pI range, and only 0.054% had a neutral pI (pI 7) (Supplementary file 4). DNA-directed RNA polymerase alpha subunit, a 38.64 kDa protein, was identified as the largest neutral pI protein. Although several other proteins with a pI of 7 were identified, the DNA-directed RNA polymerase alpha subunit was the most abundant among them.
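
The acidic/neutral/basic split described above amounts to binning each protein by its pI relative to 7. The sketch below illustrates that bookkeeping under stated assumptions: the text does not say how "neutral" was decided numerically, so the tolerance parameter `tol` and the function names are hypothetical, and the sample pI list simply reuses a few values quoted in the text.

```python
from collections import Counter


def classify_pi(pi: float, neutral: float = 7.0, tol: float = 1e-9) -> str:
    """Label a protein as acidic, neutral, or basic from its isoelectric point.

    `tol` is an assumed tolerance around pI 7; the original analysis may have
    used rounded values instead.
    """
    if abs(pi - neutral) <= tol:
        return "neutral"
    return "acidic" if pi < neutral else "basic"


def pi_class_percentages(pi_values):
    """Percentage of proteins falling into each pI class."""
    if not pi_values:
        raise ValueError("no pI values supplied")
    counts = Counter(classify_pi(p) for p in pi_values)
    n = len(pi_values)
    return {label: 100.0 * counts.get(label, 0) / n
            for label in ("acidic", "neutral", "basic")}


# A few pI values quoted in the text, used here only as sample input.
sample = [2.854, 2.905, 3.058, 7.0, 12.091, 12.164, 12.881, 12.954]
print(pi_class_percentages(sample))
```

Applied to all 256,387 proteins, the same tally would yield the three percentages reported in Supplementary file 4.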

Table 1 Amino acid composition in the chloroplast proteome of different taxonomic groups of plants and their highest and lowest abundance in the different taxonomic groups

Protists Encode More Basic pI Proteins in Their Chloroplast Proteomes

The chloroplast proteomes of protists encoded the greatest percentage of basic pI proteins (63.505%), while the chloroplast proteomes of gymnosperms had the lowest percentage (51.193%) (Supplementary file 4). The average isoelectric point of the basic pI proteins in the overall chloroplast proteome was 9.669 (Fig. 3), while the average isoelectric point of the acidic pI proteins was 5.506 (Fig. 3). PCA revealed that the basic pI proteins of the chloroplast proteomes of algae and nymphaeales were distant from those of other groups, while monocots and eudicots clustered together (Fig. 4). The basic pI proteins of protists, magnoliids, bryophytes, pteridophytes, and gymnosperms grouped independently of each other (Fig. 4). Chloroplast proteomes with the highest percentage of basic pI proteins, in descending order, were protists (63.505%) > algae (61.936%) > bryophytes (59.380%) > pteridophytes (59.358%) > monocots (55.797%) > eudicots (55.244%) > magnoliids (53.768%) > nymphaeales (52.088%) > gymnosperms (51.193%). Correlation analysis indicated that, with the exception of magnoliids and bryophytes (−0.294) and bryophytes and nymphaeales (−0.179), the basic pI proteins of all the other groups were positively correlated (Fig. 4). The algal species Prototheca stagnorum encoded the highest percentage (96.428%) of basic pI proteins, followed by Burmannia oblonga (95.454%), Prototheca zopfii (94.736%), Burmannia championii (94.285%), Neottia listeroides (94.285%), and Hydnora visseri (94.117%) (Supplementary file 4).

Fig. 4

Statistical analysis of basic pI proteins in chloroplast proteomes. A Principal component analysis (PCoA) of basic pI proteins from the chloroplast proteome of different taxonomic groups of plants. The analysis indicated that monocots and eudicots cluster together, suggesting a commonality in these groups’ pI of chloroplast proteins. In contrast, algae, protists, and bryophytes are located distantly from the monocot-eudicot cluster. B Pearson’s correlation analysis (p < 0.05) values of basic pI proteins in the chloroplast proteins of different taxonomic groups of plants. C Heat map of the correlation between basic pI proteins. The analysis indicated several positive correlations between basic pI proteins in the chloroplast proteomes of different taxonomic groups of plants

The chloroplast proteome of Asarum minus encoded the lowest percentage (28.571%) of basic pI proteins, followed by Coscinodiscus radiatus (36.690%), Schrenkiella parvula (44.827%), and Cephalotaxus sinensis (45.121%) (Supplementary file 4). The chloroplast proteomes of at least 23 species contained more than 90% basic pI proteins (Supplementary file 4). In contrast to the pattern for basic pI proteins, the chloroplast proteome of gymnosperms had the highest percentage (48.680%) of acidic pI proteins, while the chloroplast proteomes of protists encoded the lowest percentage (36.470%) of acidic pI proteins (Supplementary file 4). A principal component analysis indicated that the acidic pI proteins of gymnosperms, magnoliids, bryophytes, and protists clustered together, while those of eudicots, monocots, algae, nymphaeales, and pteridophytes were all located independently of each other (Fig. 5). A Pearson’s correlation analysis of the acidic pI proteins in the different taxonomic groups revealed that the acidic pI proteins of eudicots and bryophytes (0.515), monocots and protists (0.314), monocots and nymphaeales (0.257), and magnoliids and nymphaeales (0.32) were all positively correlated, while the acidic pI proteins of algae and nymphaeales (−0.473), bryophytes and gymnosperms (−0.356), pteridophytes and nymphaeales (−0.392), and gymnosperms and pteridophytes (−0.162) were all negatively correlated (Fig. 5).

Fig. 5

Statistical analysis of acidic pI proteins in chloroplast proteomes. A Principal component analysis (PCoA) of acidic pI proteins from the chloroplast proteomes of different taxonomic groups of plants. The analysis indicated that monocots, eudicots, algae, and pteridophytes locate independently from each other, suggesting a lack of commonality between them. B Pearson’s (p < 0.05) correlation analysis values of acidic pI proteins in the chloroplast proteome of different taxonomic groups of plants. C Heat map of the correlation values between other basic pI proteins. The analysis indicated that acidic pI proteins are negatively correlated with different taxonomic groups of the plants

The chloroplast proteomes containing the highest percentage of acidic pI chloroplast proteins, in descending order, were gymnosperms (48.680%) > nymphaeales (47.911%) > magnoliids (46.145%) > eudicots (44.699%) > monocots (44.219%) > pteridophytes (40.622%) > bryophytes (40.045%) > algae (37.919%) > protists (36.470%). The chloroplast proteome of Asarum minus had the highest percentage (71.428%) of acidic pI proteins, followed by Cephalotaxus sinensis (54.878%), Pinus tabuliformis (54.054%), and Cymbomonas tetramitiformis (53.94%) (Supplementary file 4). Prototheca stagnorum contained the lowest percentage (3.571%) of acidic pI proteins, followed by Burmannia oblonga (4.545%), Prototheca zopfii (5.263%), and Neottia listeroides (5.714%).

The Molecular Weight and pI of the Chloroplast Proteome Exhibit a Bimodal Distribution

The isoelectric point and molecular mass values vary greatly among different chloroplast proteomes and may exhibit a bimodal distribution (Fig. 6). The calculated mean pI of the overall chloroplast proteome was 7.852, and the mean molecular mass was 32.483 kDa. The variance in pI was 5.613, which is lower than the mean, while the variance in the molecular mass was 1966.947, which is considerably higher than the mean (Supplementary Table 1). The 75th percentile of the calculated pI was 9.736, while the 25th percentile was 5.715 (Supplementary Table 1). The 75th percentile of the calculated molecular mass of chloroplast proteins was 38.95 kDa, while the 25th percentile was 9.18 kDa (Supplementary Table 1). The skewness of the pI and molecular mass of chloroplast proteomes was 0.108 and 3.569, respectively, while the kurtosis of the pI and molecular mass was −1.246 and 15.282, respectively (Supplementary Table 1). The pI exhibited a platykurtic (< 3) distribution, while the molecular mass of chloroplast proteins exhibited a leptokurtic (> 3) distribution. Under a normal distribution, the probabilities of pI for P(X > 12.954), P(X < 2.854), P(X > 7.951), and P(X < 7.951) were 0.0158, 0.0174, 0.484, and 0.516, respectively (Supplementary Table 1). The corresponding probabilities of molecular mass for P(X > 616.334), P(X < 0.448), P(X > 17.669), and P(X < 17.669) were 0, 0.235, 0.629, and 0.370, respectively (Supplementary Table 1). These data indicate that the probability of an encoded chloroplast protein having a pI above 12.954 is very low (0.0158), as is the probability of a pI below 2.854 (0.0174). However, the probability of an encoded protein having a pI > 7.951 is high (0.484). Similarly, the probability of an encoded protein having a molecular mass greater than 616.334 kDa is effectively zero (Supplementary Table 1).

Only 126 (4.35%) of the examined species were found to encode neutral pI proteins (Supplementary file 5). Coeloseira compressa, Lobelia anceps, and Megaleranthis saniculifolia encoded two neutral pI proteins, while the remaining species contained only one neutral pI protein within their chloroplast proteome.
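
The tail probabilities quoted above follow from a normal approximation using the reported mean and variance. The sketch below recomputes them with the standard normal cumulative distribution function; it is an illustration rather than the authors' procedure, and the helper name `normal_cdf` is hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, variance: float) -> float:
    """P(X < x) for a normal distribution with the given mean and variance."""
    z = (x - mean) / math.sqrt(variance)
    # Standard normal CDF expressed through the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Summary statistics reported in the text.
PI_MEAN, PI_VAR = 7.852, 5.613
MASS_MEAN, MASS_VAR = 32.483, 1966.947  # kDa

print("P(pI > 12.954)  =", round(1.0 - normal_cdf(12.954, PI_MEAN, PI_VAR), 4))
print("P(pI < 2.854)   =", round(normal_cdf(2.854, PI_MEAN, PI_VAR), 4))
print("P(pI > 7.951)   =", round(1.0 - normal_cdf(7.951, PI_MEAN, PI_VAR), 4))
print("P(mass < 0.448) =", round(normal_cdf(0.448, MASS_MEAN, MASS_VAR), 4))
print("P(mass > 616.334) =", 1.0 - normal_cdf(616.334, MASS_MEAN, MASS_VAR))
```

The results should agree with the reported probabilities to roughly two or three decimal places. The non-zero P(mass < 0.448) of about 0.235 also shows the limits of the normal approximation for a strongly right-skewed quantity such as molecular mass, which cannot be negative.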

Fig. 6

Virtual 2D map of chloroplast proteomes. The X-axis represents the pI, and Y-axis represents the molecular mass of different chloroplast proteomes. The overall chloroplast proteome exhibits a bimodal distribution. Basic pI proteins are more abundant in chloroplast proteomes than nuclear proteomes; hence the modality shifts towards the basic pI range

Chloroplast Proteomes Lack the Amino Acids Sec and Pyl, and the Abundance of Leu Was Highest and That of Cys Was Lowest

Plastome-wide proteome analysis of amino acid composition revealed that Leu (10.59%) was the most abundant amino acid, while Cys (1.125%) was the least abundant amino acid in the chloroplast proteome (Table 1, Fig. 7, Supplementary file 6). Other highly abundant amino acids in the chloroplast proteome were Ile (8.503%), Ser (7.536%), and Gly (6.807%). Other low-abundance amino acids in the chloroplast proteome were Trp (1.683%), His (2.298%), and Met (2.305%) (Table 1, Supplementary file 6). The chloroplast proteome was found to encode 50.785% non-polar and 49.197% polar amino acids. Notably, Cys accounted for only 0.955% of the amino acids in protist chloroplast proteins and only 0.988% in algal chloroplast proteins. Arg accounted for 4.8% of the amino acids in algal chloroplast proteins and 4.97% in protists, which was considerably lower than in other taxonomic groups (Table 1, Fig. 7). The highest and lowest abundance of various amino acids in different taxonomic groups are indicated by an asterisk (*) and a dagger (†), respectively, in Fig. 7. None of the analyzed chloroplast protein sequences were found to contain selenocysteine (Sec), and a few encoded Xaa (unknown), B (Asx, codes for Asn or Asp), and J (Xle, codes for Leu or Ile) (Supplementary file 1). At least 108 species contained Xaa, six contained Asx, and eight contained Xle amino acids. Pyrrolysine (Pyl) was likewise not found in the chloroplast proteome. The most and least abundant amino acids in many individual species were also determined (Table 2). Most of the species listed in Table 2 were algae or protists and exhibited significant variation in amino acid composition. For example, although the average percentage of Leu in the chloroplast proteome was 10.590% (Table 1), the percentage of Leu was 12.385% in the chloroplast proteome of Codonopsis lanceolata (Table 2). Similarly, the percentage of Ile in the chloroplast proteome was 8.503% (Table 1), while the percentage of Ile in Choreocolax polysiphoniae was 14.555% (Table 2). The chloroplast proteome of Pilostyles aethiopica does not contain Trp and may have lost the genes responsible for encoding this amino acid. A PCA revealed that Leu, Ile, Lys, Asn, and Ser are independent of each other, while Cys, Met, His, and Trp cluster together (Fig. 8). Similarly, Tyr, Gln, Thr, Glu, Asp, Phe, Val, and Gly also cluster together, reflecting their similar percentage of abundance in the proteome. A Pearson’s correlation analysis (p < 0.05) of amino acid composition was conducted to better understand their abundance in the chloroplast proteome. Results indicated that most of the chloroplast-encoded amino acids were positively correlated with each other, with a few exceptions (Fig. 8). The abundances of Cys, Met, His, Tyr, Gln, Thr, Glu, Asp, Phe, Val, Gly, and Trp were found to be correlated (Fig. 8). A few amino acid combinations exhibited a negative correlation, including Lys and His (−0.083), Lys and Trp (−0.128), Lys and Arg (−0.061), Asn and Tyr (−0.004), Asn and Trp (−0.027), Arg and Asn (−0.047), Gln and Arg (−0.066), Tyr and Lys (−0.015), Pro and Tyr (−0.022), and Tyr and Val (−0.06) (Fig. 8).
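
Amino acid composition percentages of the kind reported above are obtained by counting residues across all proteins in a proteome and dividing by the total. The following is a minimal sketch of that calculation; the function name `composition_percent` is hypothetical, and the input is limited to the short peptides quoted earlier in the text, so the output is illustrative only and not comparable to Table 1.

```python
from collections import Counter


def composition_percent(sequences):
    """Percentage of each one-letter residue across a set of protein sequences.

    Hyphen separators are ignored; ambiguous codes (X, B, J) are counted
    as-is so that they can be reported separately.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq.replace("-", "").upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues found")
    # most_common() orders residues from most to least abundant.
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Short peptides quoted in the text, used only as sample input.
peptides = ["M-S-L-V", "M-L-S-E", "M-V-F-S-C-K",
            "M-C-S-K-I-K-I-F", "M-I-L-K-Y-N-I-L-I"]
for aa, pct in composition_percent(peptides).items():
    print(aa, f"{pct:.2f}%")
```

Residues that never appear in the input, such as Trp in the sample above, are simply missing from the returned dictionary, which is how the absence of an amino acid from a whole proteome would surface in such a tally.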

Fig. 7

Amino acid composition in chloroplast proteomes. A Relative abundance (percentage) of amino acids in different chloroplast proteomes. Asterisks indicate the highest abundance in the group, and a dagger indicates the lowest abundance. B Line graph of the amino acid composition of all 20 standard amino acids and the unknown amino acid Xaa. The graph indicates that Leu is the most abundant and Cys is the least abundant amino acid in chloroplast proteomes

Table 2 Highest and lowest percent abundance of amino acids in the chloroplast proteomes of different plant species

Fig. 8

Statistical analysis of amino acid composition in chloroplast proteomes. A Principal component analysis (PCoA) of amino acid composition in chloroplast proteomes. The analysis indicated that Leu, Ile, Asn, Lys, Pro, Gly, Ser, and Arg locate independently from each other, while other amino acids cluster in groups, suggesting the differential composition of Leu, Ile, Asn, Lys, Pro, Gly, Ser, and Arg. B Heat map of the Pearson’s correlation analysis values of the amino acid composition in chloroplast proteomes. All of the amino acids, except for Lys and His, were positively correlated
