Correlation, response and entropy approaches to allosteric behaviors: a critical comparison on the ubiquitin case

In proteins undergoing allosteric regulation [1], the binding of a ligand to the regulatory site affects the catalytic activity of the active site, generally placed at a distant location from the binding region [2]. The term 'allostery' was coined by Monod and Jacob [3] just to define such long-range effects activated across a molecule by the binding event to a specific site.

This sort of biological 'remote switching process' [4] is assumed possible as protein native states are not only structurally stable, but also susceptible enough to transfer signals among far away sites through long-range correlated fluctuations [5–8]. In practice, the release of the binding energy can trigger structural and/or dynamical changes in distant regions of the bio-molecule, thus allowing a fine control of the active site [9]. In this perspective, it is still unclear whether the coordinated motion of amino acids in protein native states is a universal element to interpret such a wide phenomenology, and if allostery is a further manifestation of the structure-function relationship [10]. The theoretical methods generally applied to infer the structural origin of allostery in macromolecules are based on normal mode analysis (NMA) that can be performed either through full-atom simulations [11, 12] or within less expensive coarse-grained approaches, such as the elastic network models (ENM) [13]. Other approaches deal with allostery as a problem of information transport across the network of interactions (contacts) defined by the topology of the native structure [14–16]. A typical scheme assigns Markov-chain-like transition rules for exploring the network and then identifies the allosteric paths connecting regulating and active sites as the most probable ones [17]. This technique has been successfully applied to the study of the electro-mechanical coupling between voltage sensor domain and pore domain in voltage-gated potassium channels [18–20].

A popular tool to analyze the residue-residue coherent dynamics in a molecule with N residues is the equilibrium $3N\times 3N$-covariance matrix [10],

Equation (1)

where $\Delta \mathbf{r}_i(t) = \mathbf{r}_i(t) - \mathbf{R}_i$ indicates the instantaneous displacement of residue $i$ from its native position, $\mathbf{R}_i$, taken as equilibrium position. The direct study of the correlations of the free molecule (apo-structure) and their variations after the binding (holo-structure) can show how the docking of a ligand to the regulatory site produces observable changes to the dynamics of the target site [12].
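As an illustration of how such a covariance matrix could be estimated in practice, the following minimal sketch averages the outer product of the displacements over the frames of a trajectory. The function name `covariance_matrix`, the array layout, and the choice of an equal-time average are assumptions made for this example only; they are not taken from the original work.

```python
import numpy as np

def covariance_matrix(traj, native):
    """Equal-time 3N x 3N covariance of residue displacements.

    traj   : array of shape (T, N, 3), positions along T frames (assumed layout)
    native : array of shape (N, 3), native positions taken as equilibrium
    """
    disp = traj - native                    # displacements from the native positions
    flat = disp.reshape(len(traj), -1)      # flatten each frame to a 3N vector
    return flat.T @ flat / len(traj)        # average of the outer product over frames
```

Comparing the matrices obtained from an apo and a holo trajectory would then amount to a simple element-wise difference.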

Under a physical-like interpretation, allostery can be discussed in terms of the response of proteins and enzymes to a local perturbation generated by the binding of a ligand. The reaction to binding determines the release of the mechanical strain that converts into the transfer of signal at the molecular scale from the source to the target [21]. In other words, according to the ensemble model scenario [22, 23], allostery emerges from a modification of the native free-energy landscape of proteins under different 'effects': ligand binding, mechanical or chemical excitations, and environmental changes in pH or temperature.

More generally, allostery can also be associated with the notion of 'causation' in which regulatory and active sites are linked by a series of cause–effect pathways. There are two possible definitions of cause–effect relationship which correspond either to the interventional view or to the observational view of causation.

In the first, one directly performs local or structural modifications of a system and measures how they affect the behavior of specific system variables. This definition of causality was proposed by Pearl [24]. Conversely, the second consists in determining whether and to what extent the simple observation of certain variables is useful to predict the future of others, without manipulating the system.

The analysis of correlations, equation (1), belongs to the observational approach. However, 'correlations do not imply causation', as they measure only associations among variables without explaining their cause–effect relationship [25]. For instance, two residues i and j can move jointly not because they are in direct interaction, but because they are driven by a shared group of other residues [25–27].

As a consequence, it is reasonable to suspect that a mere investigation of allostery based on correlations only could overlook relevant biological information.

On the contrary, response theory seems not only the natural framework to understand how a static [28] or dynamic perturbation [29, 30] propagates from source to target site in proteins, but also the most reasonable approach to establish the causal influence between source and target. The detection of causation based on response theory recalls the interventional approach, according to which the coordinate xj causally influences the coordinate xi if a perturbation of xj results in a variation of the measured value of xi. In formulae, we will say that xj influences xi, if

Equation (2)

i.e. a small perturbation on $x_j(\tau)$ at time τ results in a non-zero future variation on the average of $x_i(t+\tau)$ over its unperturbed evolution. In equation (2), we assume steady dynamics. If $\delta x_j$ is small enough, it is well known that the quantity (2) can be related to the spontaneous correlations in the unperturbed dynamics by one of the pillars of non-equilibrium statistical mechanics, the fluctuation–response theorem (FRT) [31], also known as fluctuation–dissipation theorem (FDT).

The analysis of allosteric communication beyond correlations has already been suggested in the literature. For instance, some authors looked at the structural properties of molecules to predict their allosteric behaviours [32], while others [33, 34] employed the transfer entropy (TE), a measure of causation borrowed from information theory and introduced by Schreiber [35] and Paluš et al [36] in the context of stochastic processes and dynamical systems. The entropy transfer from the evolution of the coordinate $x_j(t)$ to the evolution of the coordinate $x_i(t)$ determines the information (uncertainty) that we gain (lose) on the future states of xi, if we not only consider the past history of xi, but we also include the past of xj. It quantifies the causal influence of xj on xi. In formulae,

Equation (3)

where $H(a|b)$ indicates the conditional Shannon entropy [37] of the state a given the state b. Equation (3) assumes stationary processes. Notice that TE is by definition asymmetric, thus naturally incorporating a direction of the entropy/information transfer from $x_j \to x_i$, that is generally different from $x_i \to x_j$.
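For jointly Gaussian stationary signals, conditional Shannon entropies depend only on conditional variances, so a transfer entropy of this kind can be estimated from sample covariances. The sketch below assumes a single-lag form, in which the past of each coordinate is represented by one earlier value; this is a simplification chosen for the example and may differ from the exact definition in equation (3). The function names are hypothetical.

```python
import numpy as np

def conditional_variance(cov, target, given):
    """Variance of variable `target` given the variables `given` (Schur complement)."""
    g = list(given)
    s_tg = cov[target, g]
    s_gg = cov[np.ix_(g, g)]
    return cov[target, target] - s_tg @ np.linalg.solve(s_gg, s_tg)

def gaussian_transfer_entropy(xi, xj, lag=1):
    """Single-lag TE from xj to xi, assuming jointly Gaussian stationary series.

    TE = H(xi(t+lag) | xi(t)) - H(xi(t+lag) | xi(t), xj(t)),
    which for Gaussian variables is half the log-ratio of conditional variances.
    """
    future, past_i, past_j = xi[lag:], xi[:-lag], xj[:-lag]
    cov = np.cov(np.vstack([future, past_i, past_j]))
    v_self = conditional_variance(cov, 0, [1])      # knowing only the past of xi
    v_both = conditional_variance(cov, 0, [1, 2])   # knowing also the past of xj
    return 0.5 * np.log(v_self / v_both)
```

Swapping the two arguments gives the estimate in the opposite direction, which makes the asymmetry of the indicator explicit.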

For the sake of completeness, it should be mentioned that allosteric processes are often analyzed also in terms of pairwise mutual information and its high-order generalization, called interaction information, which are useful indicators of causal dependency [38]. In particular, interaction information is appropriate when the complexity of the allosteric process involves clusters of variables at the same time, so that it cannot be decomposed into elementary pairwise dependencies [39]. However, since these quantities are generally used in a purely statistical formulation which does not take into account the temporal evolution, they will not be discussed here.

In this work, motivated by the importance of allosteric mechanisms, we compare the behavior of the three mentioned indicators: correlation, response, and TE when applied to the human ubiquitin (Ub), a simple and very well-studied protein, which, according to recent experiments [40], has shown regulation of allosteric nature.

Since each of the three indicators extracts specific features from the fluctuations of the residues, their combined use should provide a more robust view of those coordinated movements that play a possible role in allosteric mechanism.

Especially from the analysis of the response, we will try to identify sites of Ub that are more susceptible to perturbation, and understand if they are possibly involved in the known allosteric pathways of Ub. Following [29, 30], we employ the dynamic version of the indicators to exploit the information contained in their time dependence and in their propagation across the structure.

The exact comparison of these indicators in realistic all-atom simulations is severely limited by the system size and by the need to collect very long time series (curse of dimensionality). To overcome this limitation, we describe the native state fluctuations of Ub via a coarse-grained approach, portraying the native state as a mechanical system made of nodes, representing the position of the α-carbons (C$_\alpha$) of the amino acids, connected by harmonic springs. This description is referred to as the Gaussian network model (GNM) introduced in [41]. The main advantage of using a Gaussian model relies on its full solvability, allowing the analytical expressions of correlations, responses, and TEs to be worked out.

In the literature, GNM and more generally coarse-grained elastic networks constitute simplified less-expensive alternatives to all-atom NMA, generally used for fast and easy characterization of slow and large-scale dynamics of native structures. They allow the identification of flexible and rigid regions in huge single and multi-domain proteins and are proven to be meaningful in the prediction of functional modes relevant for a comprehension of the structure-function relationship of biopolymers [42–45]. In addition, when the size of the system becomes inaccessible to all-atom NMA, the ENMs represent the only viable approach.

It is important to recall that the GNM presents two crucial limitations. First of all, it does not apply to processes where a molecule changes its shape to visit other conformational states in order to perform its function. Therefore, the approach we follow here describes only 'allostery without conformational changes' proposed by Cooper and Dryden [22], and Ub falls into this category. Actually, there are generalizations of the ENM including also molecule transitions from one state to the other [46] by defining a GNM representation for each state and then assigning the transition rate among such states.

The second limitation of the standard GNM approach arises from neglecting the side chains, a common approximation of many coarse-graining methods. As we shall see in section 3, to partially relieve this crude approximation, we use a variant of GNM which is based on the 'heavy-atom contact map' (briefly heavy-map), that incorporates some effects of the side-chain presence.

Even if the small rearrangements in Ub allostery justify the use of the GNM-like approach improved by heavy-map representation, it should be noted that for allosteric processes occurring via transition among different metastable states, or involving not negligible nonlinear fluctuations [47], full-atomistic molecular dynamics, possibly supported by enhanced sampling techniques (e.g. conformational flooding [48]), remains the leading investigation tool.

The paper is organized as follows. In section 2 we briefly recall some structural properties of Ub and its allosteric control. Section 3 reports the simplified theoretical framework for the description of Ub and the mathematical properties of the indicators (correlation, response, and entropy transfer) that we used to characterize the interplay among Ub fluctuation modes. Sections 4–6 contain the results obtained by the analysis based on these three indicators. In section 7, we show the results on the complex Ub-USP, ubiquitin and ubiquitin-specific peptidase (USP), to see how the Ub internal motion is modified by the interaction with one of its natural substrates. Final discussion and conclusions are drawn in section 8.

In this section, we briefly summarize the principal biological information on Ub that has been used to orient our theoretical analysis. Post-translational modifications are covalent modifications altering the functional state of a protein; typical post-translational modifications involve the attachment of small chemical groups like acetyl, phosphate or methyl groups. Ubiquitylation, the attachment of Ub to its target substrates, can be considered an extreme case where the chemical group attached to the target protein is itself a small protein. Ubiquitylated proteins are normally targeted to degradation in the 26S proteasome, but ubiquitylation can also induce trafficking or endocytosis. Ub is bound to each target protein through the sequential action of three enzymes that ultimately connect the COO− terminal group of Ub with the side chain of a Lysine residue of the target. Since Ub also comprises several Lysine residues, it can become the target of further ubiquitylation reactions that create a linear chain of Ub units bound to the target protein. The geometry and bonding pattern of these chains determines a different fate of the marked molecule.

A recent work by Smith et al [40], based on NMR relaxation dispersion experiments, highlighted a conformational switch of the peptide bond Asp52-Gly53 of allosteric significance. As sketched in panels (a) and (b) of figure 1, which show portions of the structures from PDB files 1UBI [49] and 2IBI respectively, this bond flips between two states referred to as NH-out (1UBI) and NH-in (2IBI).

Figure 1. The conformational switch of peptide bond Asp52-Gly53. Panel (a) shows the NH-out conformation, where the NH group of Gly53 points towards the solvent (direction shown by the arrow). Panel (b) shows the NH-in conformation, where the NH group points towards the protein interior (as shown by the arrow). Panel (c) shows the ubiquitin-ubiquitinase complex (PDB: 2IBI). The USP is shown in the background, while Ub is displayed in red and green; the green region represents the Ub binding interface with the USP (as defined through a distance cutoff of 3.0 Å). The C-terminal tail of Ub is also part of the binding interface with USP. The red spheres represent Ile23 and the Asp52-Gly53 peptide bond, while Glu24 and Gln49 are shown in green, and the blue rod indicates the hydrogen bond Glu24-Gly53. Panel (c) shows that the conformational switch (Asp52-Gly53) and the Ub-USP interface are located on opposite sides of Ub, emphasizing the allosteric nature of the regulation mechanism.


In the NH-out state, the NH group of the peptide bond points towards the bulk where it forms hydrogen bonds with water molecules (black arrow). As a result, the only possible interaction with the neighboring Glu24 residue is a hydrogen bond between the CO group of the Asp52-Gly53 bond and the NH group of Glu24. The sidechain of Glu24 is therefore not involved in this interaction. By contrast, in the NH-in configuration the NH group of the peptide bond points towards the interior of the protein (black arrow), where it can H-bond the side-chain of Glu24, that is also hydrogen bonded by the NH group of Glu24 itself. To assess the functional significance of this conformational switch Smith et al performed a bioinformatics analysis on a database of Ub experimental structures. This analysis suggested a correlation between the NH-in conformation and the binding of the Ub to the ubiquitinase USP (ubiquitin-specific protease). Figure 1(c) shows the structure of the complex Ub-USP. This result was completely unexpected since neither Asp52 nor Gly53 is directly involved in Ub-USP binding. The finding thus led to the hypothesis that the switch of the Asp52-Gly53 bond might induce an allosteric rearrangement of the USP binding region of Ub. Indeed, further analysis showed that the NH-in and NH-out states are respectively associated to the contraction and expansion of the ubiquitinase binding interface. Moreover, it was shown that a contracted binding interface allows fewer steric clashes, energetically promoting the binding of USP. The mechanism can thus be summarized as follows: the NH-in state allosterically induces the contraction of the binding interface reducing the number of clashes and favoring the USP binding. The residues more affected by this interface deformation are Gly35 and Gln49. 
Interestingly, these results agree with an older work by Massi et al [50] that identified chemical exchange processes affecting Ile23, Asn25 (flanking Glu24), Thr55 (close to the Asp52-Gly53 bond) and Val70.

The allosteric regulation of Ub can be seen as the propagation of a perturbation from the pair of amino acids (Glu24, Gly53), which we consider as the source, to the pair (Gly35, Gln49), which instead acts as the target. In the following, for convenience, we shall refer to these sites as the allosteric set, ASet = {Glu24, Gly53, Gly35, Gln49}.

In a coarse-grained approach, the native state of the Ub is described as a mechanical system made of nodes, representing the position of the α-carbons (C$_\alpha$) of the amino acids, connected by harmonic springs. The potential energy of GNM is very simple and reads

Equation (4)

where $\{\Delta\mathbf{r}_1,\Delta\mathbf{r}_2,\ldots,\Delta\mathbf{r}_N\}$ are the instantaneous displacements of the N C$_\alpha$ atoms from their native positions taken as equilibrium states. The quantity g defines the adjustable energy scale that can be set by matching the theoretical mean square displacement of the C$_\alpha$ from their native positions with the experimental crystallographic B-factors [51–53]. The coefficients $K_{ij}$ are the elements of the coupling matrix $\mathbb{K}$, often termed Kirchhoff matrix, which is defined from the elements $A_{ij}$ of the contact matrix (also called the connectivity matrix of the network) through the relation

Equation (5)

where A0 is a factor weighting the strength of the harmonic interaction along the chain (backbone) over the off-chain interaction. $A_0\sim 10$ seems a reasonable value to distinguish the role of the backbone links from the rest of the network.
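Since equation (5) is not reproduced in this text, the sketch below only illustrates one plausible construction, assuming that the Kirchhoff matrix is the weighted graph Laplacian of the contact network, with consecutive residues along the chain coupled with weight A0 and all other contacts with weight 1. Both this functional form and the name `kirchhoff_matrix` are assumptions made for the example.

```python
import numpy as np

def kirchhoff_matrix(contact, a0=10.0):
    """Weighted graph Laplacian built from a binary contact matrix (assumed form).

    contact : (N, N) symmetric 0/1 matrix of native contacts
    a0      : assumed weight of backbone links |i - j| = 1
    """
    a = np.asarray(contact, dtype=float).copy()
    np.fill_diagonal(a, 0.0)                 # no self-contacts
    idx = np.arange(a.shape[0] - 1)
    a[idx, idx + 1] = a0                     # backbone links get weight A0
    a[idx + 1, idx] = a0
    return np.diag(a.sum(axis=1)) - a        # diagonal balances each row to zero
```

With this convention every row sums to zero, which is the algebraic origin of the vanishing eigenvalue associated with translation invariance.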

The effect of the side chains can be partially included by using a GNM approach based on a heavy-atom contact-map [54] that excludes hydrogens. In this scheme, a pair of residues i − j is connected by a spring if they have at least one pair of heavy atoms $a,b$ in contact in the native state of Ub (PDB: 1UBI). In formulae, this means that $A_{ij} = 1$ if the relation

Equation (6)

holds, with a cutoff $r_c = 5$ Å, where $\Theta(u)$ is the unitary step function.
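The heavy-atom criterion can be sketched as follows, assuming that the heavy-atom coordinates of each residue have already been extracted from the PDB file into one array per residue. The function name and the input layout are hypothetical, and the rule implemented here (at least one heavy-atom pair within the cutoff) follows the verbal description above rather than the exact form of equation (6).

```python
import numpy as np

def heavy_atom_contact_map(residues, r_c=5.0):
    """Binary contact map from per-residue heavy-atom coordinates.

    residues : list of arrays, each of shape (n_atoms_k, 3), in angstrom (assumed input)
    r_c      : distance cutoff in angstrom
    """
    n = len(residues)
    contact = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            # all pairwise distances between heavy atoms of residues i and j
            diff = residues[i][:, None, :] - residues[j][None, :, :]
            dist = np.sqrt((diff ** 2).sum(axis=-1))
            if (dist <= r_c).any():          # one close pair is enough
                contact[i, j] = contact[j, i] = 1
    return contact
```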

Panel (a) of figure 2 shows the heavy contact-map representing the native interactions, panel (b) shows the topological diagram of Ub secondary structure that includes five beta-strands S1,..., S5 and two helices H1, H2.

Figure 2. Topology structure of Ub. Panel (a): contact map defined by the heavy-atom contacts using a cutoff of $R_c = 5$ Å; the cutoff identifies 283 native contacts such that $|i-j|\geqslant 2$. Panel (b): topology plot of Ub showing the secondary structure content with five beta-strands and two helices. Red and blue shadowed boxes indicate the separation of Ub in two clusters, CluA and CluB, joined by the hinge residue Glu64 [55]. Reprinted figure with permission from [55]. Copyright 2020 by the American Physical Society.


In the following, we will redefine $\Delta\mathbf{r}_i \to \mathbf{r}_i$ to simplify the notation.

The full NMA of Ub would require the computation of the Hessian matrix, obtained by computing the second partial derivatives of the force-field potential at the equilibrium state. The less complex Hessian of the GNM turns out to decompose into blocks

where $\mathbf{r}_i$ denotes the position of the ith C$_\alpha$ and µ and ν indicate the generic component $x,y,z$. In practice, the three coordinates of a C$_\alpha$ become formally equivalent, $x_i\equiv y_i\equiv z_i$, hence the position vector $\mathbf{r}_i = x_i \mathbf{e}$ is virtually a scalar (with $\mathbf{e} = (1,1,1)$), and $\langle \mathbf{r}_i^2 \rangle = 3 \langle x_i^2 \rangle$. Therefore the GNM approach reduces a system of 3N degrees of freedom to a system with N degrees of freedom only; equivalently, it deals with protein fluctuations as a problem of scalar elasticity.

The equation of motion for each GNM coordinate in the overdamped regime reads

Equation (7)

here, γ denotes the friction, $k_\textrm B$ the Boltzmann constant and ξi is a zero-average and time delta-correlated Gaussian process. Hereafter, we set $\mu = g/\gamma$. From the solution (in vector form) of equation (7)

Equation (8)

correlation and response are straightforward to obtain as

Equation (9)

Equation (10)

where $\mathbb{C}(0) = \langle \mathbf{x}(0)\,\mathbf{x}^{\mathrm T}(0) \rangle$ is the equal-time correlation matrix, also termed equal-time covariance matrix, and the average is over the thermal noise.
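The interventional meaning of the response can be illustrated numerically with a twin-trajectory experiment: two copies of the system are evolved with the same noise realization, one of them starting from an initial condition perturbed on coordinate j, and their difference divided by the perturbation estimates one column of the response matrix. The sketch assumes an overdamped linear Langevin equation of the form dx/dt = −μ K x + noise, integrated with the Euler–Maruyama scheme; the noise amplitude, the time step, and the function names are illustrative choices and are not taken from equations (7)–(10).

```python
import numpy as np

def simulate(K, x0, mu, noise_amp, dt, noise):
    """Euler-Maruyama integration; `noise` holds (steps, N) standard normal draws."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for eta in noise:
        x = x - mu * dt * (K @ x) + noise_amp * np.sqrt(dt) * eta
        traj.append(x.copy())
    return np.array(traj)

def twin_response(K, j, mu=1.0, noise_amp=1.0, dt=1e-3, steps=1000,
                  delta=1e-2, seed=0):
    """Estimate column j of the response matrix from two trajectories sharing the noise."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    noise = rng.normal(size=(steps, n))      # same noise for both copies
    x0 = np.zeros(n)
    x0_pert = x0.copy()
    x0_pert[j] += delta                      # small kick on coordinate j at time 0
    ref = simulate(K, x0, mu, noise_amp, dt, noise)
    pert = simulate(K, x0_pert, mu, noise_amp, dt, noise)
    return (pert - ref) / delta              # shape (steps + 1, N)
```

Because the dynamics is linear, the noise cancels exactly in the difference, and the estimate can be compared with the matrix exponential of −μ K t up to the discretization error of the integrator.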

The advantage of using the GNM relies on its full solvability, since it defines a multivariate Ornstein–Uhlenbeck process (7) whose explicit solution only requires the numerical diagonalization of the sparse matrix $\mathbb{K}$. $\mathbb{K}$ is symmetric and diagonalizable with real, non-negative eigenvalues, but it is not invertible because it has one vanishing eigenvalue, due to the translation invariance of the system. It can be represented in the form

Equation (11)

where Λ is the diagonal matrix containing all the eigenvalues $\{\lambda(1),\ldots,\lambda(N)\}$ of $\mathbb{K}$. $\lambda(k)$ is the kth eigenvalue of $\mathbb{K}$ associated to its kth eigenvector, $\mathbb{K}\,\mathbf{u}(k) = \lambda(k)\,\mathbf{u}(k)$. $\mathbb{U} = \{\mathbf{u}(1),\ldots,\mathbf{u}(N)\}$ is the diagonalizing matrix whose columns are the eigenvectors, $\mathbb{U}^{\mathrm T}$ is its transpose, moreover $\mathbb{U}\,\mathbb{U}^{\mathrm T} = \mathbb{I}$, $\mathbb{I}$ being the identity N × N-matrix.
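The spectral decomposition makes time-dependent quantities cheap to evaluate, because any function of the coupling matrix reduces to the same function applied to its eigenvalues. The sketch below assumes that the response of the linear model is the matrix exponential exp(−μ K t), as expected for an Ornstein–Uhlenbeck process with drift −μ K x; the function names are hypothetical.

```python
import numpy as np

def spectral_response(K, t, mu=1.0):
    """exp(-mu * K * t) from the eigen-decomposition of the symmetric matrix K."""
    lam, U = np.linalg.eigh(K)               # eigenvalues and orthonormal eigenvectors
    return (U * np.exp(-mu * lam * t)) @ U.T # U diag(exp(-mu*lam*t)) U^T

def count_zero_modes(K, tol=1e-9):
    """Number of vanishing eigenvalues (one is expected for a connected network)."""
    lam = np.linalg.eigvalsh(K)
    return int((np.abs(lam) < tol).sum())
```

For a connected network the single zero mode is the uniform translation, so each row of the matrix returned by `spectral_response` sums to one at all times.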

3.1. Correlation

The Gaussian nature of the GNM implies that the equilibrium covariance matrix $\mathbb{C}(0)$ is proportional to the pseudo-inverse $\mathbb{K}^{+}$ of the matrix $\mathbb{K}$, $\mathbb{C}(0) \propto \mathbb{K}^{+}$.

$\mathbb{K}^{+}$ replaces the standard matrix inversion, as $\mathbb{K}$ is not invertible because of its vanishing determinant due to the translation invariance of the system. By definition $\mathbb{K}^{+} = \mathbb{U}\,\mathbb{D}\,\mathbb{U}^{\mathrm T}$, where the diagonal matrix $\mathbb{D}$ is $D_{ij} = \delta_{ij}/\lambda(i)$, if $\lambda(i)\ne 0$ and
