Protein structure prediction in the deep learning era

Current Opinion in Structural Biology (Elsevier), Volume 77, December 2022, 102495

Abstract

Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods over the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach can achieve accuracy similar to that of the state-of-the-art end-to-end approach AlphaFold2, while requiring fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, we point out a few outstanding challenges in function-oriented protein structure prediction for future development.

Introduction

Protein structure prediction aims to predict the 3D structure of a protein from its amino acid sequence and is regarded as one of the grand challenges in computational biology [1]. Progress in protein structure prediction was very slow until the last decade. In particular, the deep learning-based AlphaFold2 system raised the accuracy of protein structure prediction to an unprecedented level [2]. AlphaFold2 is thus widely recognized as one of the milestones in the field [3].

Three essential factors have contributed to the breakthrough in protein structure prediction. The first is the availability of big biological data, including experimentally determined protein sequences and structures. The second is the significant advance in computer hardware, including Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). The last is the rapid progress in deep learning algorithms, such as the residual convolutional network (ResNet) [4] and the attention-based transformer [5].

This article presents a brief review of the representative deep learning-based protein structure prediction methods developed over the past two years. More comprehensive reviews can be found in the studies by Pearce et al. [6,7]. Figure 1 shows the major steps involved in deep learning-based protein structure prediction methods. Starting from the amino acid sequence of a protein, homologous sequences are first collected with sequence search tools such as HHblits [8] and MMseqs2 [9] to construct a multiple sequence alignment (MSA). Either the raw MSA or MSA-derived co-evolution features are then fed into a deep neural network for structure prediction. Homologous structure templates (when available) can also be included as network input. Depending on how the structure is predicted, the methods can be classified into two categories: the two-step approach and the end-to-end approach (Table 1).
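To make "MSA-derived co-evolution features" concrete, here is a minimal sketch of one simple feature of this kind: for every pair of alignment columns, the norm of the covariance between residue identities, followed by the average product correction (APC). It is an illustration under stated assumptions, not the featurization used by any method reviewed here; the alphabet handling and the function name `coevolution_map` are hypothetical, and real pipelines typically add sequence weighting, pseudocounts, and inverse-covariance or learned features.

```python
import math
from collections import Counter

# 20 standard amino acids plus the gap symbol; this alphabet is an assumption.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWY-")


def coevolution_map(msa):
    """Return an L x L matrix of APC-corrected covariance scores for an aligned MSA.

    msa: list of equal-length aligned sequences (hypothetical input format).
    """
    n, length = len(msa), len(msa[0])
    if length < 2 or any(len(seq) != length for seq in msa):
        raise ValueError("need >= 2 columns and equal-length aligned sequences")
    # Alignment columns; non-standard residues (e.g. 'X') are treated as gaps.
    cols = [[seq[i] if seq[i] in ALPHABET else "-" for seq in msa] for i in range(length)]
    single = [Counter(col) for col in cols]  # counts behind f_i(a)

    # Raw score: Frobenius norm of C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b).
    # States absent from a column contribute zero, so only observed states are summed.
    raw = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            pair = Counter(zip(cols[i], cols[j]))  # counts behind f_ij(a, b)
            total = 0.0
            for a, count_a in single[i].items():
                for b, count_b in single[j].items():
                    cov = pair[(a, b)] / n - (count_a / n) * (count_b / n)
                    total += cov * cov
            raw[i][j] = raw[j][i] = math.sqrt(total)

    # Average product correction: subtract mean(i, .) * mean(., j) / mean(., .),
    # with all means taken over off-diagonal entries.
    row_mean = [sum(row) / (length - 1) for row in raw]
    overall = sum(map(sum, raw)) / (length * (length - 1))
    out = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(length):
            if i != j:
                apc = row_mean[i] * row_mean[j] / overall if overall > 0 else 0.0
                out[i][j] = raw[i][j] - apc
    return out


# Toy MSA: columns 0 and 3 co-vary perfectly, column 1 is conserved,
# and column 2 varies independently of the others.
scores = coevolution_map(["AKLA", "AKVA", "CKLC", "CKVC"])
print(round(scores[0][3], 4), round(scores[0][2], 4))  # → 0.1667 0.0
```

In this toy alignment only the co-varying pair (0, 3) keeps a positive score after APC, which is the kind of pairwise signal a network can use as evidence of spatial proximity.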

Two-step approach

The two-step approach divides protein structure prediction into the prediction of inter-residue 2D geometry with deep learning (step 1) and the realization of the 3D structure (step 2). Correspondingly, there are two key design choices: which 2D geometry to predict in the first step, and how to convert the predicted 2D geometry into a 3D structure in the second step. Representative methods include RaptorX-Contact [10,11], AlphaFold [12], trRosetta [13,14], trRosettaX [15], …
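As a toy illustration of how predicted 2D geometry can be turned into 3D coordinates, the sketch below collapses a distance distribution ("distogram") into expected distances and then embeds them with classical multidimensional scaling (MDS). MDS is a swapped-in technique chosen for brevity; it is not the restraint-based folding or minimization that the methods above rely on. The bin centers, function names, and the use of power iteration (only to keep the example dependency-free) are all assumptions.

```python
import math
import random


def expected_distances(distogram, bin_centers):
    """Collapse an L x L x B distogram (per-pair bin probabilities) into L x L expected distances."""
    return [[sum(p * c for p, c in zip(probs, bin_centers)) for probs in row] for row in distogram]


def _matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def classical_mds(dist, dim=3, iters=1000, seed=0):
    """Embed an L x L distance matrix in `dim` dimensions with classical MDS.

    Eigenpairs come from power iteration with deflation, which assumes the distances
    are close to Euclidean (Gram matrix close to positive semidefinite).
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - 11^T / n (D^2 is symmetric).
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = _matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(gram, v)))  # Rayleigh quotient
        if lam <= 0.0:
            break  # no positive eigenvalue left; remaining coordinates stay zero
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass converges to the next-largest eigenpair.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical two-bin distogram: 4 and 8 angstroms with equal probability.
print(expected_distances([[[0.5, 0.5]]], [4.0, 8.0]))  # → [[6.0]]

# Round trip on a known 3D point set: distances -> coordinates -> distances.
points = [(4, 0, 0), (-4, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 1), (0, 0, -1)]
dist = [[math.dist(p, q) for q in points] for p in points]
xyz = classical_mds(dist)
err = max(abs(math.dist(xyz[i], xyz[j]) - dist[i][j]) for i in range(6) for j in range(6))
print(f"max distance error: {err:.1e}")
```

The recovered coordinates match the input only up to rotation, translation, and reflection. Because distances are unchanged by reflection, a distance-only realization cannot distinguish a structure from its mirror image; predicting inter-residue orientations in addition to distances, as trRosetta [13,14] does, supplies information that distances alone cannot.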

End-to-end approach

The end-to-end approach predicts the 3D structure directly, in one step, within a single unified network. Compared with the two-step approach, end-to-end structure prediction is attractive but more difficult to implement. For example, AlQuraishi developed an end-to-end differentiable model, the recurrent geometric network (RGN), which takes the amino acid sequence and a position-specific scoring matrix (PSSM, derived from the MSA) as input and outputs the 3D structure [27]. As shown in a recent study that …
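An end-to-end model needs a differentiable way to turn network outputs into coordinates; RGN, for instance, builds its structure from predicted torsion angles. The sketch below shows one standard construction for that step, the natural extension reference frame (NeRF), which places each new atom from the previous three using a bond length, a bond angle, and a torsion angle. It also includes a distance-based RMSD (dRMSD), a comparison that does not depend on rotation or translation. This is a simplified sketch, not RGN's implementation: it assumes a single-atom-type chain with a fixed bond length and angle (real backbones alternate N, Cα, and C with different geometry), all function names are hypothetical, and in a trainable model these operations would be written in an autodiff framework rather than plain Python.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def _unit(u):
    norm = math.sqrt(_dot(u, u))
    return [a / norm for a in u]


def place_atom(a, b, c, bond, angle, torsion):
    """NeRF step: place D so |CD| = bond, angle(B, C, D) = angle, dihedral(A, B, C, D) = torsion."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)  # completes the local orthonormal frame (bc, m, n)
    local = (-bond * math.cos(angle),
             bond * math.sin(angle) * math.cos(torsion),
             bond * math.sin(angle) * math.sin(torsion))
    return [c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k] for k in range(3)]


def build_chain(torsions, bond=1.5, angle=math.radians(110.0)):
    """Build len(torsions) + 3 atoms with fixed bond length and angle (simplifying assumption)."""
    coords = [[0.0, 0.0, 0.0],
              [bond, 0.0, 0.0],
              [bond - bond * math.cos(angle), bond * math.sin(angle), 0.0]]
    for phi in torsions:
        coords.append(place_atom(coords[-3], coords[-2], coords[-1], bond, angle, phi))
    return coords


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) of four points, used here to check the construction."""
    b0, b1, b2 = _sub(p0, p1), _unit(_sub(p2, p1)), _sub(p3, p2)
    v = [x - _dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - _dot(b2, b1) * y for x, y in zip(b2, b1)]
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def drmsd(x, y):
    """Distance-based RMSD over all atom pairs; invariant to rotation and translation."""
    pairs = [(i, j) for i in range(len(x)) for j in range(i + 1, len(x))]
    return math.sqrt(sum((math.dist(x[i], x[j]) - math.dist(y[i], y[j])) ** 2
                         for i, j in pairs) / len(pairs))


# Usage: build a short chain from hypothetical torsions, then measure them back.
torsions = [math.radians(t) for t in (60.0, -60.0, 120.0, -150.0)]
chain = build_chain(torsions)
print([round(math.degrees(dihedral(*chain[k:k + 4])), 3) for k in range(len(torsions))])
shifted = [[x + 3.0, y - 1.0, z] for x, y, z in chain]
print(drmsd(chain, shifted))  # a rigid shift leaves all pairwise distances unchanged
```

Every step is a smooth function of the input angles, which is what allows a loss computed on the output coordinates to be back-propagated to the network that predicts the angles.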

Remarks on two-step and end-to-end approaches

From the above, the end-to-end approach appears to outperform the two-step approach significantly. Does this mean that we should give up the two-step approach? In our opinion, the answer is ‘no’, based on the following observations.

First, we adapt a few components (i.e., the Evoformer) from AlphaFold2 and incorporate them into the first step of trRosettaX, while the second step of structure realization remains unchanged. We name the new version trRosettaX2. Preliminary tests on the …

Conclusions

With the accumulation of big biological data, progress in deep learning algorithms, and advances in computer hardware, the last decade has witnessed breakthroughs in protein structure prediction. In this article, we reviewed the representative protein structure prediction methods developed over the past two years. Specifically, we classified existing methods into two groups, i.e., the two-step approach and the end-to-end approach. The end-to-end approach AlphaFold2 has substantially increased the accuracy …

Conflict of interest statement

Nothing declared.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (NSFC T2225007, T2222012, 11871290, 61873185, and 61932018).

References

M. Remmert et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods (2012).

R. Pearce et al. Deep learning techniques have significantly impacted protein structure prediction and protein design. Curr Opin Struct Biol (2021).

K.A. Dill et al. The protein-folding problem, 50 years on. Science (2012).

J. Jumper et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021).

E. Callaway. 'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures. Nature (2020).

K. He et al. Deep residual learning for image recognition.

A. Vaswani et al. Attention is all you need. Adv Neural Inf Process Syst (2017).

R. Pearce et al. Toward the solution of the protein structure prediction problem. J Biol Chem (2021).

M. Steinegger et al. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol (2017).

J. Xu. Distance-based protein folding powered by deep learning. Proc Natl Acad Sci USA (2019).

S. Wang et al. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol (2017).

A.W. Senior et al. Improved protein structure prediction using potentials from deep learning. Nature (2020).

J. Yang et al. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA (2020).

Z. Du et al. The trRosetta server for fast and accurate protein structure prediction. Nat Protoc (2021).

H. Su et al. Improved protein structure prediction using a new multi-scale network and homologous templates. Adv Sci (2021).

F. Ju et al. CopulaNet: learning residue co-evolution directly from multiple sequence alignment for protein structure prediction. Nat Commun (2021).

T. Shen et al. When homologous sequences meet structural decoys: accurate contact prediction by tFold in CASP14 (tFold for CASP14 contact prediction). Proteins (2021).

J.G. Greener et al. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat Commun (2019).

W. Zheng et al. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins (2021).

J. Hou et al. The MULTICOM protein structure prediction server empowered by deep learning and contact distance prediction.

Q. Wu et al. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics (2020).

A.T. Brunger. Version 1.2 of the crystallography and NMR system. Nat Protoc (2007).

R.H. Swendsen et al. Replica Monte Carlo simulation of spin glasses. Phys Rev Lett (1986).

D.C. Liu et al. On the limited memory BFGS method for large scale optimization. Math Program (1989).

M. Baek et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science (2021).

J. Xu et al. Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat Mach Intell (2021).

© 2022 Elsevier Ltd. All rights reserved.
