Genome-wide computational analysis of the dirigent gene family in Solanum lycopersicum

Confirmation of DIR genes in S. lycopersicum

The DIR sequences from the model organism Arabidopsis thaliana and the Pfam seed file (PF03018) were used as queries in a BLASTp homology search against the S. lycopersicum genome in Phytozome v13 (https://phytozome-next.jgi.doe.gov) to identify all closely related sequences. After redundant sequences were removed, a total of 31 DIR genes, labeled SlDIR1 to SlDIR31, were retrieved from the S. lycopersicum genome. All the SlDIRs (S. lycopersicum DIR genes) were annotated as members of the DIR group in Phytozome v13 and matched their Arabidopsis thaliana orthologs. The presence of a conserved DIR domain in all SlDIRs was then confirmed by SMART and Pfam screening and further verified with TBtools, as shown in Fig. 1.
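The hit-filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the BLASTp results are available in the standard 12-column tabular format (`-outfmt 6`), and the E-value cutoff, the function name `unique_hits`, and the sample identifiers are all hypothetical.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def unique_hits(handle, max_evalue=1e-5):
    """Return the best hit per subject sequence, dropping redundant rows.

    max_evalue is an assumed cutoff; the source does not state one.
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # discard weak hits
        subject = row["sseqid"]
        # Keep only the highest-scoring alignment for each subject.
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (row["qseqid"], bitscore)
    return best


# Hypothetical rows: two queries hit the same tomato protein, one hit is weak.
sample = io.StringIO(
    "AtDIR1\tSolyc_A\t62.5\t180\t60\t2\t1\t180\t5\t184\t1e-70\t210\n"
    "AtDIR2\tSolyc_A\t55.0\t175\t70\t3\t1\t175\t8\t182\t1e-50\t160\n"
    "AtDIR1\tSolyc_B\t30.1\t90\t60\t4\t10\t99\t1\t90\t0.5\t28\n"
)
hits = unique_hits(sample)
print(hits)
```

Collapsing hits by subject identifier is only one way to remove redundancy; candidates retained this way would still need the domain confirmation described above.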

Fig. 1

Confirmation of DIR domains in all the S. lycopersicum retrieved sequences

Gene structural characterization, conserved motif analysis, and phylogenetic tree construction

The amino acid composition of the S. lycopersicum DIR proteins is diverse, and their physiochemical properties vary greatly among subclasses. The predicted proteins ranged in length from 60 (SlDIR23) to 399 (SlDIR11) amino acids, with an average of 189 amino acids. The lowest molecular weight is 6302.19 Da, and the maximum is 41413.18 Da, with an average weight of 20715.22 Da. The mean isoelectric point (pI) is 7.75, with values ranging from 4.47 (SlDIR11) to 9.88 (SlDIR3). The pI is greater than 7 in 58% of the S. lycopersicum DIR gene family members, while it is less than 7 in the remaining members. Consequently, there are more basic DIR proteins than acidic ones. The predicted N-glycosylation sites (Asn) in each DIR sequence and the other physiochemical properties are listed in Table 1.

The genomic and coding sequences (CDS) of the S. lycopersicum DIR genes were compared to examine their intron-exon structures. According to the GSDS 2.0 software program (http://gsds.gao-lab.org/), six of the 31 DIR genes (19%) contained a single intron, while the remaining 25 genes had no introns. The structure of all the genes is shown in Fig. 2. Subcellular localization was predicted with WoLF PSORT, and the results were visualized as a heatmap in TBtools. The highest probability values are highlighted in red, and the lowest probability values are highlighted in light blue, as shown in Fig. 3.
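How intron counts follow from exon coordinates can be illustrated with a short sketch. It does not reproduce GSDS 2.0; the gene names and coordinates are invented for illustration, and `count_introns` is a hypothetical helper.

```python
def count_introns(exons):
    """Count introns as gaps between consecutive, non-overlapping exons.

    exons is a list of (start, end) coordinates for one gene model.
    """
    ordered = sorted(exons)
    return sum(1 for (_, end), (start, _) in zip(ordered, ordered[1:])
               if start > end + 1)


# Hypothetical gene models (coordinates are invented for illustration).
gene_models = {
    "geneA": [(100, 700)],               # single exon, no intron
    "geneB": [(100, 400), (550, 900)],   # two exons, one intron
    "geneC": [(50, 620)],                # single exon, no intron
}

intron_counts = {gene: count_introns(ex) for gene, ex in gene_models.items()}
intronless = [gene for gene, n in intron_counts.items() if n == 0]
share = 100 * len(intronless) / len(gene_models)
print(intron_counts, f"{share:.1f}% intronless")
```

The same tally, applied to real gene models, gives the proportion of intronless genes reported for a family.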

Members of the S. lycopersicum DIR gene family from the same subfamily have comparable motif types and numbers; however, the arrangement of motifs varies among subfamily members. The similar gene structures and conserved domains found within each subfamily support the phylogenetic grouping. Structural variation across subfamilies, on the other hand, implies that the DIR gene family in S. lycopersicum is functionally diverse. The genes and their respective motifs are shown in Fig. 4, which suggests that the function of each gene may depend on the number of motifs present.

ClustalW was used to align the amino acid sequences of 31 S. lycopersicum DIR genes with those of 26 Arabidopsis thaliana DIR genes, both to assess the DIR gene family in the model plant and S. lycopersicum from an evolutionary perspective and to investigate the unique features of the S. lycopersicum DIR genes. Phylogenetic relationships were inferred in MEGA X using the minimum evolution and neighbor-joining (NJ) methods. The S. lycopersicum and Arabidopsis thaliana DIR genes clustered together, suggesting that the S. lycopersicum DIR genes belong to the same groups as their Arabidopsis thaliana counterparts. All these DIR sequences may be categorized into seven subclasses (indicated by different colors) based on their similarity to DIR sequences in Arabidopsis thaliana, as shown in Fig. 5.

Table 1 Physiochemical properties of all the DIR genes in S. lycopersicum

Fig. 2

The intron-exon structure of DIR genes. Exons are shown in light yellow, black curved lines indicate introns, and blue indicates the upstream/downstream regions

Fig. 3

Heatmap interpretation of the subcellular localization of S. lycopersicum DIR genes

Fig. 4

Phylogenetic analysis and conserved motif analysis of S. lycopersicum DIR genes. (A) Phylogeny of DIR genes constructed in MEGA X with the neighbor-joining method. (B) Conserved motifs in all S. lycopersicum DIR genes, with different colors representing different motifs

Fig. 5

Phylogenetic relationship of S. lycopersicum and Arabidopsis thaliana DIR genes

DIR gene promoter analysis

To explore the putative roles of S. lycopersicum DIR genes in signaling, development, and tolerance to abiotic and biotic stress, PlantCARE [26] was used to identify cis-acting elements within the 2 kb and the 200 bp regions upstream of the transcription start site of each S. lycopersicum DIR gene. As shown in Fig. 6a (2 kb) and Fig. 6b (200 bp), each gene appears to carry a diverse range of elements responsive to environmental stress as well as elements related to plant development, growth, and regulation.
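Selecting the upstream regions submitted to such an analysis depends on strand orientation, which the following sketch makes explicit. It is an illustration only: the 1-based inclusive coordinate convention, the function name `upstream_window`, and the example positions are assumptions rather than details given in the source.

```python
def upstream_window(tss, strand, length, chrom_length):
    """Return 1-based inclusive (start, end) of the region upstream of a TSS.

    On the minus strand, "upstream" lies at higher coordinates.
    The window is clipped to the chromosome boundaries.
    """
    if strand == "+":
        start, end = tss - length, tss - 1
    elif strand == "-":
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), min(end, chrom_length)


# Hypothetical transcription start sites on a 1 Mb chromosome.
print(upstream_window(5000, "+", 2000, 1_000_000))  # 2 kb window, plus strand
print(upstream_window(5000, "-", 200, 1_000_000))   # 200 bp window, minus strand
```

For minus-strand genes the extracted sequence would also need to be reverse-complemented before motif scanning; that step is omitted here.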

Fig. 6

(a) Analysis of the promoter regions of all S. lycopersicum DIR genes (up to 2 kb upstream). (b) Analysis of the promoter regions of all S. lycopersicum DIR genes (up to 200 bp upstream)

Chromosomal location and gene duplication analysis

The chromosomal locations of all S. lycopersicum DIR genes were obtained from Phytozome v13, and the genes were mapped to their chromosomes with the PhenoGram tool, as shown in Fig. 7. The 31 DIR genes were unevenly distributed across 11 of the 12 S. lycopersicum chromosomes, with none on chromosome 3, suggesting that the family diversified during evolution. Nine genes were found on chromosome 10 (chr10). On the other hand, chromosomes 5, 8, 9, 11, and 12 contained the fewest DIR genes, with only one each. Two DIR genes were found on each of chromosomes 4, 6, and 7. Furthermore, chromosomes 2 and 8 each contain three DIR genes.

The synonymous substitution rate (Ks), the non-synonymous substitution rate (Ka), and the Ka/Ks ratio were calculated for the duplicated gene pairs, and the values were used to estimate the divergence time of each duplication. A wide variety of duplications was found throughout the genome. Nearly all Ka/Ks ratios were less than one, implying purifying selection.

Ka/Ks = 1 indicates neutral selection, Ka/Ks > 1 indicates positive selection, and Ka/Ks < 1 indicates purifying selection. A Ka/Ks < 1 was found for all duplicated DIR gene pairs except SlDIR2 and SlDIR3, indicating that these pairs evolved under purifying selection and are highly conserved; the SlDIR2 and SlDIR3 pair instead indicates positive selection. In addition, the duplication events were predicted to have occurred between 3.75 and 74.36 million years ago (Table 2).
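The interpretation rule above, together with a commonly used divergence-time estimate, can be sketched as follows. The relation T = Ks / (2λ) and the rate λ = 1.5 × 10^-8 substitutions per synonymous site per year are assumptions made for illustration; the source does not state which formula or rate was used, and the function names are hypothetical.

```python
def selection_mode(ka, ks, tol=1e-9):
    """Classify a duplicated gene pair by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


def divergence_time_mya(ks, rate=1.5e-8):
    """Estimate divergence time in million years as Ks / (2 * rate).

    rate is an assumed substitution rate per synonymous site per year.
    """
    return ks / (2 * rate) / 1e6


# Hypothetical Ka and Ks values for one gene pair.
print(selection_mode(0.10, 0.50))   # Ka/Ks = 0.2
print(divergence_time_mya(0.30))    # roughly 10 million years
```

Because the estimate scales inversely with the assumed rate, reported divergence times are comparable only between studies that use the same λ.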

Fig. 7

Allocation of DIR genes across the S. lycopersicum genome

Table 2 Ka, Ks, and Ka/Ks calculations and divergence times of the duplicated S. lycopersicum DIR gene pairs

Protein-protein linkage association, signal peptide prediction, and coexpression analysis

To assess the functional relationships of S. lycopersicum DIR proteins, protein association data were retrieved from the STRING database, and the coexpression of these proteins across related taxa was examined. STRING revealed significant associations among the proteins. A first shell of interactors is observed throughout the network, as indicated by the colored nodes. Figure 8a depicts the predicted network of the 31 S. lycopersicum DIR proteins and the linkages among them. Figure 8b depicts the conservation and coexpression of these proteins in a set of related taxa; DIR sequences are conserved throughout the related taxa, with black indicating high conservation and light colors indicating low conservation. SignalP 6.0 was used to predict signal peptides, and the results are shown in Table 3. Signal peptides were predicted for 22 of the 31 proteins; the nine exceptions were SlDIR1, SlDIR2, SlDIR4, SlDIR10, SlDIR12, SlDIR22, SlDIR23, SlDIR28, and SlDIR30. The cleavage site positions and marginal probabilities for the signal peptide regions are also given in Table 3.

Fig. 8

(a) STRING database prediction of associations among S. lycopersicum DIR proteins. (b) STRING database depiction of DIR gene coexpression in related taxa. Black shows the highest expression in the S. lycopersicum genome, with lighter colors showing lower levels

Table 3 Signal peptide prediction, cleavage site position, and marginal probabilities for the signal peptide regions of S. lycopersicum DIR genes

Synteny analysis and tertiary structure prediction

DIR proteins were analyzed to find orthologous pairs between S. lycopersicum and Arabidopsis thaliana and thereby further deduce their evolutionary relationship. According to synteny analysis, S. lycopersicum DIR genes and Arabidopsis thaliana DIR genes form collinear gene pairs. A Circos plot [31] indicated that S. lycopersicum DIR genes share a high degree of evolutionary homology with those of Arabidopsis thaliana, suggesting that they could have similar biological activities (Fig. 9). Within the circle, ribbons in four semitransparent colors (blue, green, orange, and red) show the local alignments generated by BLAST. These colors correspond to the four quartiles of the maximum score; for example, a local alignment scoring 80% of the maximum score is red, while one scoring 20% of the maximum score is blue.

Protein tertiary structure was predicted with the Phyre2 online tool (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index), and the results are shown in Fig. 10.
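The quartile coloring of the synteny ribbons described above can be expressed as a small mapping. This is a sketch of the stated rule only; how scores that fall exactly on a quartile boundary are colored is an assumption (here they go to the higher quartile), and `ribbon_color` is a hypothetical name.

```python
QUARTILE_COLORS = ("blue", "green", "orange", "red")


def ribbon_color(score, max_score):
    """Map an alignment score to a color by quartile of the maximum score."""
    if max_score <= 0 or not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and a positive max_score")
    fraction = score / max_score
    # int() floors the scaled fraction; min() keeps the maximum score in red.
    index = min(int(fraction * 4), 3)
    return QUARTILE_COLORS[index]


print(ribbon_color(80, 100))  # top quartile
print(ribbon_color(20, 100))  # bottom quartile
```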

Fig. 9

Synteny map of all the identified S. lycopersicum DIR and Arabidopsis thaliana DIR genes

Fig. 10

Tertiary structure prediction of all S. lycopersicum DIR proteins

Transmembrane potential and sequence identity and similarity analysis

To test whether the studied sequences contain potential transmembrane regions, all S. lycopersicum DIR proteins were analyzed with the online tool TMHMM. Fifteen of the 31 S. lycopersicum DIR proteins were predicted to contain transmembrane regions, suggesting that they are involved in functions of the cellular membrane. SIAS was used to determine sequence identity, similarity, and global similarity, and the results are summarized in Table 4. The identity and similarity values of the S. lycopersicum DIR genes were greater than the global similarity values (Table 4).
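What a pairwise identity score measures can be shown with a minimal sketch. It does not reproduce SIAS: tools differ in the denominator they use, and counting only the aligned columns without gaps is an assumption made here. The two aligned sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 5 identical residues in 6 ungapped columns.
identity = percent_identity("MKT-LLA", "MRTALLA")
print(f"{identity:.1f}%")
```

Similarity scores extend the same idea by also counting substitutions between residues with similar physiochemical properties.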

Table 4 Sequence analysis of S. lycopersicum DIR genes
