iQDeep: an integrated web server for protein scoring using multiscale deep learning models

Protein scoring is a critical component of protein structure prediction [1, 2, 3, 4, 5]. It has been gaining noticeable attention in the Critical Assessment of Protein Structure Prediction (CASP) experiments under the accuracy estimation category [6, 7, 8, 9]. Promising progress has been made in the CASP14 accuracy estimation category [7], with various deep learning-based methods performing well. Among them is our previously published method QDeep [2], which introduced several advances for the first time, including the incorporation of deep residual neural network (ResNet) architectures for protein scoring, effective integration of predicted inter-residue interactions with other sequential and structural features, and the use of ensemble learning. With rapid new developments in the field of protein structure prediction [10, 11, 12], however, the scoring resolution used in our original QDeep method no longer represents the state of the art. Recent advances in deep learning-based protein structure prediction methods such as AlphaFold2 [10] have enabled computational prediction of protein structures with considerably higher accuracy than was previously achievable. As such, the development of high-resolution protein scoring methods commensurate with the increasing accuracy of structure prediction methods is of critical importance.

While our original QDeep method uses the Global Distance Test Total Score (GDT-TS) [13] as the ground truth metric, the recent advances in structure prediction [11, 12, 14, 15, 16] and refinement [17, 18, 19, 20, 21] make the high-accuracy variant of the Global Distance Test (GDT-HA) [22] more suitable as the ground truth metric. State-of-the-art methods such as AlphaFold2 [10] and RoseTTAFold [11] attain very high accuracy in terms of GDT-TS, but the GDT-HA metric reveals subtle performance differences between the methods (Supplementary Figure S1) because GDT-HA is sensitive to minor structural deviations. As such, a high-resolution scoring function that can estimate the GDT-HA metric with high fidelity can capture minute differences among structural models predicted by state-of-the-art protein structure prediction methods, enabling improved model selection and ranking (Supplementary Text S2). Furthermore, an integrated protein scoring framework that can alternate on demand between the standard and high-accuracy variants of the Global Distance Test to control the scoring resolution can enhance the versatility of protein scoring by covering a broad range of predictive modeling scenarios. Open availability of such a versatile method via a publicly accessible web server has the potential for broad dissemination and field-wide impact.
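To illustrate why GDT-HA separates models that GDT-TS scores similarly, the following minimal Python sketch computes both metrics from per-residue Cα deviations. It assumes the standard distance cutoffs (1, 2, 4, and 8 Å for GDT-TS; 0.5, 1, 2, and 4 Å for GDT-HA) and a single fixed superposition, whereas the full GDT procedure searches over many superpositions for each cutoff. The function name `gdt_score` and the deviation values are hypothetical and chosen only for illustration.

```python
# Simplified GDT sketch: assumes the model is already superimposed on the
# native structure, so each residue has one Calpha deviation in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # standard variant
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)   # high-accuracy variant


def gdt_score(deviations, cutoffs):
    """Average, over the cutoffs, of the fraction of residues within each cutoff.

    Returns a value in [0, 1]; multiply by 100 for the usual percentage scale.
    """
    if not deviations:
        raise ValueError("at least one residue deviation is required")
    n = len(deviations)
    fractions = [sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Hypothetical deviations for an accurate 8-residue model (illustrative only).
deviations = [0.3, 0.4, 0.6, 0.7, 0.9, 1.2, 1.5, 1.8]

gdt_ts = gdt_score(deviations, GDT_TS_CUTOFFS)
gdt_ha = gdt_score(deviations, GDT_HA_CUTOFFS)
print(f"GDT-TS = {gdt_ts:.5f}, GDT-HA = {gdt_ha:.5f}")
```

In this toy case every residue lies within 2 Å, so three of the four GDT-TS cutoffs are saturated, while the 0.5 Å and 1 Å cutoffs of GDT-HA still discriminate among residues.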

Here we present iQDeep, an integrated and fully configurable web server for protein scoring using multiscale deep learning models. iQDeep employs multiscale deep residual neural networks (ResNets) to perform residue-level error classifications at multiple predefined error resolutions, and then probabilistically combines the predictions from the multiscale error classifiers for protein scoring. Building on the original QDeep method, we train a new set of ResNet classifiers to perform residue-level ensemble error classifications at finer-grained error resolutions explicitly targeting the GDT-HA metric. The residue-level error classifications from the multiscale ResNets can then be probabilistically combined to quantify the accuracy of a predicted protein model. By adjusting the resolutions and the probabilistic combination of the multiscale ResNets, our method can reliably estimate either the standard or the high-accuracy variant of the Global Distance Test metric for protein scoring. The interactive and privacy-preserving web interface of iQDeep allows customizable job submission, tracking, and results retrieval with quantitative and visual analysis, along with extensive help information on job processing and results interpretation. The performance of the underlying methods has been rigorously tested and compared against state-of-the-art approaches in multiple rounds of CASP experiments, including benchmark assessment in CASP12 and CASP13 and blind evaluation in CASP14. The iQDeep web server is freely available at http://fusion.cs.vt.edu/iQDeep.
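The idea of combining multiscale residue-level classifications can be sketched as follows. This is a minimal illustration, not the iQDeep implementation: it assumes that each classifier outputs, for every residue, the probability that the residue's error falls within that classifier's threshold, and that the combination is a plain average over thresholds and residues. The text above states only that the predictions are combined probabilistically, so the averaging rule, the function name `combine_multiscale`, the thresholds, and the probability values are all assumptions made for illustration.

```python
# Assumed error thresholds in angstroms, mirroring the GDT-HA cutoffs.
HA_THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def combine_multiscale(probabilities):
    """Combine per-residue, per-threshold probabilities into scores.

    probabilities: one row per residue; each row holds one probability per
    threshold (the likelihood that the residue's error is within it).
    Returns (per_residue_scores, global_score), each in [0, 1].
    """
    if not probabilities:
        raise ValueError("at least one residue is required")
    # Assumed combination rule: mean over thresholds, then mean over residues.
    per_residue = [sum(row) / len(row) for row in probabilities]
    global_score = sum(per_residue) / len(per_residue)
    return per_residue, global_score


# Hypothetical classifier outputs for a 3-residue model (illustrative only).
predicted = [
    [0.90, 0.95, 0.99, 1.00],  # well-modeled residue
    [0.20, 0.50, 0.80, 0.90],  # poorly modeled residue
    [0.60, 0.70, 0.90, 1.00],  # intermediate residue
]

per_residue, global_score = combine_multiscale(predicted)
print([round(s, 3) for s in per_residue], round(global_score, 3))
```

Under this assumption, supplying probabilities for the coarser 1, 2, 4, and 8 Å thresholds instead would yield an estimate on the GDT-TS scale, which is one way to picture switching the scoring resolution on demand.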
