ProSeqAProDB: Prosequence assisted protein Database

Attaining an active three-dimensional conformation through folding of the linear polypeptide chain synthesized on the ribosome is crucial to the proper functioning of a protein. The process of protein folding is thought to be spontaneous, driven by thermodynamic considerations. However, a large number of proteins are known to require assistance in folding from extrinsic biomolecules called chaperones [1]. Such chaperones are mostly generic in nature and are expected to direct a variety of proteins down the correct folding pathway. A substantial amount of research has been directed in the past at understanding the role of such chaperones in maintaining cellular proteostasis and preventing protein misfolding diseases [2]. Interestingly, about three decades ago, a few examples of self-chaperoning proteins were reported. These proteins contain an extra stretch of amino acids, later termed prosequences (also prodomains or propeptides), which appears to direct their folding but is later cleaved from the cognate protein at the end of the folding process, locking the cognate protein in the correct conformation [3]. Since removal of such prosequence regions was shown to hinder the folding process, these regions were thought to be essential for folding and were therefore termed Intra-molecular Chaperones (IMCs) by Inouye and his colleagues [4]. Subtilisin E, a protease from Bacillus subtilis, was one of the first examples discovered and long served as a model for studying prosequence-assisted protein folding, with a significant number of studies exploring various aspects of such folding [5, 6]. Later, other proteases such as alpha-lytic protease, carboxypeptidase and other serine and cysteine proteases were also reported to have prosequences [7, 8, 9].
Using these proteases for folding-unfolding-refolding studies, it was proposed that the free energy barrier for refolding may be high, and that interaction of the unfolded protein with the prosequence lowers the free energy of the transition state, thereby increasing the rate of folding [5]. However, these prosequences are susceptible to proteolytic degradation after their cleavage from the cognate protein and were hence considered single-turnover catalysts [4]. More interestingly, a distinctive phenomenon of ‘protein memory’ was reported in the case of subtilisin, where two different prosequences imparted two distinct structures to the same cognate protein; both forms were enzymatically active but differed in thermostability and substrate specificity [10]. This observation promised not only valuable insights into protein folding processes but also practical applications in protein engineering [11, 12].

Table 1.

Several other studies reported that prosequences do not merely function as IMCs but also prevent precocious activation of their cognate proteins and may protect the mature protein from proteolytic degradation [13, 14, 15, 16]. The prosequence was shown to act as a competitive inhibitor of its cognate protease, thereby regulating the timing of protease action [17]. Since then, prosequences have been shown to affect various additional functions such as enzyme targeting, transport, intracellular sorting, cellular localization and directing enzymes through secretory pathways [18, 19, 20]. Owing to this multifunctionality, it was proposed that these pro-regions could be of two distinct types: one that acts as a chaperone, and another that affects other functions and can therefore be considered a post-translational modulator of protein function [21].

Over the years, a large number of examples of prosequence-containing proteins have accumulated in the scientific literature, but there has been no systematic effort to collate such data and further comprehend the ancestral function and mechanism of action of prosequences [11]. We believe that analysing the repertoire of such prosequence-containing proteins in terms of their cellular function, location and sequence features may not only enhance our understanding of protein folding in general, but may also open avenues for tackling protein misfolding diseases from a new perspective. We therefore initiated the study with a compilation of proteins experimentally shown to have prosequence or prodomain regions. To the best of our knowledge, there is no comprehensive database available for prosequences and their respective prosequence-assisted proteins. In order to generate a non-redundant, curated and evidence-based dataset of prosequence-assisted proteins, we turned to the most popular protein repository, UniProtKB [22].
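As an illustration only, the sketch below shows one way propeptide annotations could be pulled from a UniProtKB flat-file entry and checked for experimental support. The text does not describe the actual selection procedure, so everything here is an assumption: the sample entry is hypothetical (invented accession, coordinates and PubMed identifier), the function names are made up for this example, and the filter simply relies on the `PROPEP` feature key and the evidence code `ECO:0000269` (experimental evidence used in manual assertion) as they appear in UniProtKB flat files.

```python
import re

# Hypothetical, abbreviated flat-file entry; not a real UniProtKB record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_PROT            Reviewed;         381 AA.
AC   X00000;
FT   SIGNAL          1..29
FT                   /evidence="ECO:0000255"
FT   PROPEP          30..106
FT                   /evidence="ECO:0000269|PubMed:0000000"
FT   CHAIN           107..381
FT                   /note="Example protease"
//
"""

# A feature line carries a key and a simple start..end range.
FEATURE_RE = re.compile(r"^FT   (\S+)\s+(\d+)\.\.(\d+)\s*$")
# A qualifier line is indented and has the form /name="value".
QUALIFIER_RE = re.compile(r'^FT\s+/(\w+)="(.*)"\s*$')


def propeptide_features(entry_text):
    """Return the PROPEP features of one entry with their qualifiers."""
    features = []
    current = None
    for line in entry_text.splitlines():
        feature = FEATURE_RE.match(line)
        if feature:
            current = {
                "type": feature.group(1),
                "start": int(feature.group(2)),
                "end": int(feature.group(3)),
                "qualifiers": {},
            }
            features.append(current)
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier and current is not None:
            # Attach the qualifier to the most recent feature line.
            current["qualifiers"][qualifier.group(1)] = qualifier.group(2)
    return [f for f in features if f["type"] == "PROPEP"]


def has_experimental_evidence(feature):
    """True if the feature's evidence qualifier cites ECO:0000269."""
    return "ECO:0000269" in feature["qualifiers"].get("evidence", "")


if __name__ == "__main__":
    for feat in propeptide_features(SAMPLE_ENTRY):
        print(feat["start"], feat["end"], has_experimental_evidence(feat))
```

This sketch handles only simple `start..end` ranges with single-line qualifiers; real entries can contain uncertain positions and wrapped qualifier values, which would need additional handling.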

In this study, we have established ProSeqAProDB, a publicly available web-based database containing a curated, evidence-based dataset of prosequence-assisted proteins. The database offers a user-friendly graphical interface and links to other related databases such as UniProtKB, PubMed and Protein Data Bank [22, 23, 24]. It also provides filtering tools and download options for the data, along with advanced search tools.
