Domain-Based Protein Docking with Extremely Large Conformational Changes

Protein-protein interactions are fundamental to many biological processes in living cells. Modeling the 3D structures of the associated protein complexes is a critical step toward understanding the mechanisms of these processes in detail. Although protein complex structures are steadily being determined experimentally and deposited in the Protein Data Bank (PDB),1 experiments are costly in both time and money, and many complex structures are extremely difficult to determine experimentally. Thus, when the structure of a protein complex has not yet been experimentally determined, computational tools can be used to construct atomic models.2 A protein docking program takes the component proteins, called subunits, as input and assembles them into atomic models of the complex. Many general protein docking methods and specialized variants have been publicly released, such as ZDOCK,3 HADDOCK,4 ClusPro,5 RosettaDock,6 HEX,7 SwarmDock,8 and ATTRACT.9 Even protein structure prediction methods such as AlphaFold10 have been adapted to output multimeric structures.11 In particular, the rigid-body docking method LZerD12, 13, 14, 15 has consistently ranked highly in the server category of CAPRI,16, 17 the blind community-wide assessment of protein docking methods.

A frequent complication for computational complex modeling is that proteins are flexible molecules. Even with state-of-the-art conformational sampling techniques, existing docking methods struggle with conformational changes larger than roughly 2 Å RMSD.18, 19, 20 Large-scale conformational changes of around 10 Å RMSD or more lie far beyond this limit and are thus difficult to model, yet they are quite common and are often related to protein function.21, 22, 23, 24, 25, 26, 27 For example, calmodulin, which is ubiquitous in eukaryotes and participates in many cellular processes, undergoes such a conformational change when it binds its various target proteins; its flexibility, acting in tandem with increases in Ca²⁺ concentration, facilitates recognition of a comparatively large number of targets.28 In the nuclear import cycle, a crucial step is the release of the importin-beta-binding domain of importin-alpha from importin-beta by the GTPase Ran. Binding of Ran to importin-beta induces a large-scale change in the helical conformation of importin-beta, allosterically causing the importin subunits to dissociate.29 Techniques capable of modeling such large conformational changes therefore have the potential to elucidate many processes across diverse cellular contexts.

Computational techniques have been developed that quantitatively predict the location, nature, or degree of conformational change a given protein might undergo,30, 31, 32, 33 including upon complex formation.34 While existing assembly methods generally struggle with conformational changes of even a few angstroms root-mean-square deviation (RMSD),18 experimentalists often observe far more drastic conformational changes.21, 22, 23, 24, 25, 26, 27 Smaller degrees of flexibility can be handled in several ways. The implicitly soft surface representation of LZerD12, 13 can accommodate side-chain flexibility. When the backbone must move, it can be sampled explicitly, e.g., by ClustENM20, 35 using normal modes, by CABS-Dock36 or RosettaDock37 using Monte Carlo simulation, or by ATTRACT using molecular dynamics.38, 39 Many explicit sampling methods require cross-docking of the sampled conformations, which demands either precise sampling or fast docking of each sample to avoid intractable computation times.40, 41 At the extreme end of the flexibility spectrum, intrinsically disordered proteins and protein regions can currently be docked by IDP-LZerD,42, 43, 44 even for disordered ligand regions as long as 69 residues. However, IDP-LZerD is designed for docking a disordered protein with no structured domains; it docks necessarily small peptide fragments to a receptor protein and knits them together, and is therefore not suitable for assembling complexes with large, ordered ligand domains. Despite substantial advances, existing protein docking methods cannot generally model large-scale conformational changes of ordered ligand proteins. Current methods can handle some lesser flexibility20 but have not overcome the barrier posed by larger RMSDs of conformational change. In the regime of conformational changes ≥ 10.0 Å RMSD, even when the domains themselves remain coherent, current methods are simply inadequate.
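The RMSD magnitudes discussed above (roughly 2 Å where docking starts to struggle, and the ≥ 10.0 Å regime) are typically computed after optimally superimposing the two conformations. As a minimal sketch, assuming matched (N, 3) arrays of corresponding atoms such as Cα coordinates from the unbound and bound structures, the standard Kabsch superposition in NumPy gives that value. The function names `kabsch_rmsd` and `in_large_change_regime` are hypothetical and are not part of LZerD or Flex-LZerD.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD (in the input units, e.g. Å) between two matched (N, 3) coordinate
    sets after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def in_large_change_regime(rmsd: float, threshold: float = 10.0) -> bool:
    """Hypothetical helper: True for the >= 10.0 Å regime targeted in this work."""
    return rmsd >= threshold
```

For example, `kabsch_rmsd(unbound_ca, bound_ca)` on matched Cα arrays (hypothetical names) yields the magnitude of the conformational change to compare against the 2 Å and 10.0 Å marks. Superimposing first matters: a pure rigid-body motion of the whole protein gives an RMSD of zero, so only internal rearrangement is counted.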

In this work, we target this coherent-domain regime of ≥ 10.0 Å RMSD and describe a new method called Flex-LZerD. Flex-LZerD consists of a novel method for normal mode-based flexible fitting of an initial structure to docked partial structure fragments (here, domains) and a method for selecting these docked domains. Restricting the expensive rigid-body docking to this initial stage circumvents the cross-docking problem and makes large-scale flexible docking tractable. Previous work by Karaca and Bonvin showed multidomain docking to be a promising route but did not handle large gaps and mainly considered benchmark targets well below the 10.0 Å RMSD regime.45 The novel fitting procedure combines iterative projection along residue-level rigid-block normal modes toward the docked domains with geometry minimization, performed at multiple levels both with and without the receptor structure, and yields all-atom models in the new putative bound states. Loop modeling methods can typically model gaps of up to 12 residues in a protein structure when both endpoints are known.46 However, our benchmark contains domain pairs separated by dozens to hundreds of unmodeled residues. Such gaps are well beyond the range of loop modeling but are handled here by the flexible fitting. We directly show that the examined flexibility regime is inaccessible to rigid-body docking, that deep learning methods such as AlphaFold do not currently handle proteins observed in such different conformations, that flexible fitting can model the conformational differences within this regime, and that Flex-LZerD as a whole can flexibly assemble ligand and receptor proteins and nucleic acids into complex models. Interactions of both DNA and RNA with proteins are handled in this framework.
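Flex-LZerD fits an initial structure to the docked domains by iterative projection along residue-level rigid-block normal modes combined with geometry minimization. The sketch below illustrates only the projection idea, under simplifying assumptions: it swaps in a plain anisotropic network model (ANM) with one bead per residue instead of rigid-block modes, omits geometry minimization and the multi-level scheme, and assumes target positions are known only for the residues in the docked domains. At each iteration it recomputes the lowest-frequency modes at the current structure (including the six near-zero rigid-body modes), solves for mode amplitudes by least squares against the displacement of the targeted residues, and takes a damped step. The function names, the 15 Å cutoff, the step size, and the mode count are hypothetical choices, not the paper's parameters.

```python
import numpy as np


def anm_modes(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0,
              n_modes: int = 20) -> np.ndarray:
    """Lowest-frequency ANM modes as columns of a (3N, n_modes) array.

    One bead per residue; springs connect beads closer than `cutoff`.
    Assumes no two beads share identical coordinates.
    """
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of their row's off-diagonals.
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    # eigh returns eigenvalues in ascending order; for a connected network the
    # first six (near zero) span rigid-body translations and rotations.
    _, vecs = np.linalg.eigh(hess)
    return vecs[:, :n_modes]


def fit_to_partial_targets(coords, target_idx, targets, n_iter: int = 50,
                           step: float = 0.5, n_modes: int = 20,
                           cutoff: float = 15.0) -> np.ndarray:
    """Move `coords` along low-frequency modes so the beads listed in
    `target_idx` approach `targets` (e.g. positions taken from docked domains)."""
    x = np.array(coords, dtype=float)
    idx = np.asarray(target_idx, dtype=int)
    targets = np.asarray(targets, dtype=float)
    # Rows of the (3N, k) mode matrix that belong to the targeted beads.
    rows = (3 * idx[:, None] + np.arange(3)).ravel()
    for _ in range(n_iter):
        modes = anm_modes(x, cutoff=cutoff, n_modes=n_modes)
        residual = (targets - x[idx]).ravel()
        # Least-squares mode amplitudes that best reproduce the target displacement.
        amps, *_ = np.linalg.lstsq(modes[rows], residual, rcond=None)
        # Damped step: every bead moves, including those without targets.
        x += step * (modes @ amps).reshape(-1, 3)
    return x
```

Because beads without targets (for example, residues in a long unmodeled linker) move only through the collective modes, the displacements of the docked domains propagate to the rest of the chain instead of being modeled residue by residue as in loop modeling.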
According to the CAPRI criteria, Flex-LZerD yielded acceptable models within the top 10 for 5 of the 9 unbound docking cases in which the receptor structure was also unbound. On a broader unbound/bound benchmark set, in which the receptor structure was in its bound form, Flex-LZerD likewise yielded acceptable models within the top 10 for 17 of 23 targets. These successes comprise 9 of 15 protein–protein complexes and all 8 protein–nucleic acid complexes. The Flex-LZerD flexible fitting code is available from https://github.com/kiharalab/Flex-LZerD.
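The success counts above rely on CAPRI quality classes and on whether any of the top 10 ranked models reaches acceptable quality. A minimal sketch of that bookkeeping follows, assuming the commonly cited CAPRI thresholds on the fraction of native contacts (fnat), ligand RMSD (L-RMSD), and interface RMSD (I-RMSD), checked hierarchically from the strictest class downward. The official CAPRI definitions treat some boundary cases differently, so this is an approximation. Contacts are represented as hypothetical (receptor residue, ligand residue) identifier pairs, and all function names are illustrative.

```python
from typing import Sequence, Set, Tuple

# Hypothetical contact representation: (receptor residue id, ligand residue id).
Contact = Tuple[str, str]


def fnat(native: Set[Contact], model: Set[Contact]) -> float:
    """Fraction of native interface contacts reproduced by the model."""
    if not native:
        raise ValueError("native contact set is empty")
    return len(native & model) / len(native)


def capri_class(f_nat: float, l_rms: float, i_rms: float) -> str:
    """Approximate CAPRI class from fnat and L-/I-RMSD (Å), strictest first."""
    if f_nat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if f_nat >= 0.3 and (l_rms <= 5.0 or i_rms <= 2.0):
        return "medium"
    if f_nat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0):
        return "acceptable"
    return "incorrect"


def hit_in_top_n(ranked_classes: Sequence[str], n: int = 10) -> bool:
    """True if any of the top-n ranked models is acceptable or better."""
    return any(c != "incorrect" for c in ranked_classes[:n])
```

Usage: `hit_in_top_n(["incorrect"] * 7 + [capri_class(0.25, 8.0, 3.5)])` is `True`, because the eighth-ranked model is classified as acceptable. Counting such hits over a benchmark gives per-target tallies like those reported above.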
