Prot2Prot: a deep learning model for rapid, photorealistic macromolecular visualization

The Prot2Prot machine-learning model effectively renders photorealistic molecular representations via image-to-image translation of much simpler, easy-to-generate, molecular-surface “sketches.” Prot2Prot illustrations are well suited for scientific publication, outreach, and education. CLI Prot2Prot can also generate animations of protein motions (Online Resource 1 and 2).

Description of rendering styles

We trained Prot2Prot models to mimic three distinctive rendering styles, which we call “Simple Surface,” “Chalky,” and “Chalky Shadow.”

Simple surface

In the “Simple Surface” rendering style, carbon, oxygen, nitrogen, sulfur, and hydrogen atoms are light silver, red, blue, yellow, and white, respectively. Color support for other elements is limited. When rendering the photorealistic Blender images used for training, we applied two effects to give the final images a better sense of depth. First, we used Blender’s mist pass to render more distant protein regions in lighter colors, producing a “fade-to-white” fog effect. Second, we used Blender’s depth-of-field effect to focus the virtual camera on the protein surface directly in front of it, such that regions distant from that focal point appear slightly blurred or out of focus.
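The fade-to-white fog described above can be illustrated with a minimal sketch that blends a color toward white as distance from the camera grows. The function name, the linear ramp, and the start and end distances are assumptions chosen for illustration; they are not the settings used to render the training images.

```python
def fade_to_white(color, depth, start, end):
    """Blend an RGB color (floats in 0-1) toward white as depth increases.

    Depths at or below `start` leave the color unchanged; depths at or
    beyond `end` give pure white. The linear ramp is an assumption made
    for this sketch.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # Fraction of the way through the fog, clamped to [0, 1].
    t = min(1.0, max(0.0, (depth - start) / (end - start)))
    return tuple((1.0 - t) * c + t * 1.0 for c in color)


# Hypothetical oxygen-like red at three camera distances.
red = (0.8, 0.1, 0.1)
near = fade_to_white(red, depth=2.0, start=5.0, end=15.0)
middle = fade_to_white(red, depth=10.0, start=5.0, end=15.0)
far = fade_to_white(red, depth=20.0, start=5.0, end=15.0)
```

Here the near color is unchanged, the middle color is halfway to white, and the far color is fully white, which mirrors how distant protein regions fade into the white background.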

We also used several advanced lighting techniques to enhance photorealism. First, we applied a slight subsurface-scattering effect to all surfaces using Blender’s Principled BSDF shader. When light hits many natural materials, it penetrates the surface and is scattered in the object’s interior. After a light ray makes its way back to the surface, it leaves the object at a random angle, not the predictable angle typical of a perfectly reflective (“glossy”) surface. Second, rather than light the scene with a single point or directional light, we used a public-domain, high dynamic range image (HDRI [27]) to surround and light the surfaces. High-dynamic-range (HDR) lighting prevents the darkest and lightest regions of the image from being saturated as perfectly black or white, allowing the viewer to see full detail across the entire image. Third, we applied ambient occlusion to the scene. This non-physical rendering technique approximates global illumination by darkening surfaces that are only partially accessible to the broader environment (e.g., enclosed pockets). After rendering the image using Blender’s Cycles path-tracing render engine, we adjusted the color level using ImageMagick to ensure the background was precisely white, as typically required for publication-quality images.
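The color-level adjustment mentioned at the end of the paragraph above can be thought of as a linear remapping of pixel intensities. The following sketch shows the idea for a single channel value; the white point of 0.95 and the sample pixel values are hypothetical, since the text does not state the settings passed to ImageMagick.

```python
def apply_levels(value, black_point=0.0, white_point=1.0):
    """Linearly remap an intensity so black_point -> 0 and white_point -> 1.

    Values outside the [black_point, white_point] range are clamped,
    so anything at or above white_point becomes exactly 1.0 (white).
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (value - black_point) / (white_point - black_point)
    return min(1.0, max(0.0, scaled))


# A hypothetical almost-white background pixel is pushed to pure white,
# while a mid-tone surface pixel is only slightly brightened.
background = apply_levels(0.97, white_point=0.95)
surface = apply_levels(0.475, white_point=0.95)
```

Lowering the white point slightly below 1.0 is one way to guarantee that a nearly white rendered background ends up precisely white, at the cost of a small brightening of the remaining tones.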

We successfully trained our Prot2Prot models to mimic these Blender-rendered output images given a corresponding input “sketch image.” When converted to the TensorFlow.js graph-model format, the final model takes up roughly 40 MB. Figure 4A, B show how the model has learned to mimic the fade-to-white-fog (*), depth-of-field (†), and ambient-occlusion (‡) effects of the Blender-rendered training images.

Fig. 4

An atomic resolution model of the human apoptosome obtained via electron microscopy (70,189 atoms; PDB ID 3J2T), visualized using Prot2Prot. A, B Simple Surface rendering style. C Chalky rendering style, colorized with a green tint. D Chalky Shadow rendering style. Examples of fade-to-white fog, depth of field, and ambient occlusion are marked with *, †, and ‡, respectively

Chalky

The “Chalky” rendering style also has fade-to-white fog, ambient occlusion, and depth-of-field blur. Unlike Simple Surface, Chalky shows all atoms in the same white material, without subsurface scattering. Instead, we set the “Roughness” and “Clearcoat Roughness” settings on the Principled BSDF shader to their maximum values to give the surface a highly diffuse appearance. Chalky uses a public-domain studio lighting setup obtained from blendswap.com [28] to light the proteins rather than an HDRI. After rendering the training images, we again adjusted the color levels using ImageMagick.

Trained Prot2Prot models successfully mimic these Blender-rendered output images as well. The Chalky models also take up ~40 MB, with similar run times in the browser. Figure 4C shows how Chalky images are particularly well suited to the custom colorization procedure (in this case, with a green tint) described in the Materials and Methods.

Chalky shadow

The “Chalky Shadow” rendering style is the same as the “Chalky” style, except the virtual studio lights are allowed to cast a shadow onto a pure-white floor below. The trained Prot2Prot models successfully mimic the shadows computed using advanced path tracing in Blender (Fig. 4D). Online Resource 2 (bottom row) illustrates how these shadows even convincingly change according to the protein orientation. These models are also roughly 40 MB.

Video rendering via the command line interface

Command-line-interface (CLI) Prot2Prot also accepts multi-frame PDB files as input, allowing users to create animations of molecular dynamics simulations, conformational transitions, etc. Prot2Prot provides four different animation styles via its CLI (Online Resource 1). A “still” animation captures only the frame-by-frame motions of individual atoms without imparting any large-scale rotations to the entire protein. Alternatively, three whole-scene rotation animations can further facilitate visualization: “rock,” “turn table,” and “zoom.”
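One way to picture the four animation styles is as per-frame camera schedules. The sketch below is purely illustrative: the function name, the rock amplitude, the zoom range, and the sinusoidal easing are all assumptions, not the parameters or implementation used by CLI Prot2Prot.

```python
import math


def camera_schedule(style, n_frames, rock_degrees=20.0, zoom_range=(1.0, 1.5)):
    """Return one (rotation_degrees, zoom_factor) pair per frame.

    All numeric defaults are hypothetical values chosen for illustration.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    schedule = []
    for i in range(n_frames):
        phase = i / n_frames  # fraction of the loop completed, in [0, 1)
        angle, zoom = 0.0, 1.0  # "still": no whole-scene motion
        if style == "rock":
            # Swing back and forth about the starting orientation.
            angle = rock_degrees * math.sin(2.0 * math.pi * phase)
        elif style == "turn table":
            # One full revolution over the course of the animation.
            angle = 360.0 * phase
        elif style == "zoom":
            # Ease in toward the protein and back out again.
            low, high = zoom_range
            zoom = low + (high - low) * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
        elif style != "still":
            raise ValueError(f"unknown style: {style}")
        schedule.append((angle, zoom))
    return schedule


turn_table = camera_schedule("turn table", 48)
rock = camera_schedule("rock", 48)
```

In this sketch each frame of the multi-frame PDB file would be rendered with the corresponding rotation and zoom, so the atomic motions and the whole-scene motion advance together.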

To demonstrate these animation styles, we first used UCSF Chimera [2] to generate a multi-frame PDB file of S. cerevisiae hexokinase 2 (ScHxk2). Specifically, we used Chimera’s “Morph Conformations” tool to capture the transition between open and closed ScHxk2 structures extracted from a recent molecular dynamics simulation [29]. We created video animations of this transition from image sequences of 48 Prot2Prot-rendered trajectory frames (Online Resource 1).

These animations convincingly capture the ScHxk2 open-to-close transition, but the protein surfaces appear to “flicker.” This subtle artifact arises because Prot2Prot renders each frame without regard for adjacent frames (i.e., the resulting animations lack temporal coherence). To address this issue, we used Prot2Prot to re-render the ScHxk2 trajectory as only twelve images. We then used the Real-time Intermediate Flow Estimation (RIFE) 3.1 algorithm [30], as implemented in the Flowframes software package [31], to interpolate between these twelve images. The resulting animations capture the same open-to-close transition but without the flicker (Online Resource 2). We had similar success using the commercial frame interpolation algorithm implemented in Adobe After Effects.
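The render-fewer-frames-then-interpolate strategy can be sketched as follows. The sketch assumes the twelve images are evenly spaced along the original 48-frame trajectory, which the text does not specify, and it uses a naive per-pixel cross-fade as a stand-in for RIFE. RIFE estimates optical flow and is considerably more sophisticated, so the cross-fade only illustrates where the in-between frames come from.

```python
def pick_keyframes(n_frames, n_keep):
    """Indices of n_keep roughly evenly spaced frames, including both ends."""
    if not 2 <= n_keep <= n_frames:
        raise ValueError("need 2 <= n_keep <= n_frames")
    step = (n_frames - 1) / (n_keep - 1)
    return [round(i * step) for i in range(n_keep)]


def crossfade(frame_a, frame_b, t):
    """Per-pixel linear blend of two frames (flat lists of intensities).

    This is NOT flow-based interpolation; it is a simple placeholder
    for a dedicated frame-interpolation algorithm.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


# Choose 12 of 48 trajectory frames to render.
keyframes = pick_keyframes(48, 12)

# Blend two tiny hypothetical "images" a quarter of the way from a to b.
in_between = crossfade([0.0, 1.0], [1.0, 0.0], 0.25)
```

Because every in-between frame is derived from its two neighboring keyframes, consecutive frames change smoothly, which is why interpolation suppresses the frame-to-frame flicker of independently rendered images.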

Compatibility and run times

We have tested the Prot2Prot Web App on all major operating systems and web browsers (Table 1), including some mobile devices. The Prot2Prot model is memory intensive, and the web app will crash if run on a device with a less capable graphics processing unit (GPU). Where possible, the app detects any crash and asks the user to (1) select a smaller output-image size or (2) use the central processing unit (CPU) rather than the GPU. Rendering on the CPU is slower but also less memory constrained.

Table 1 Prot2Prot compatibility

Prot2Prot currently runs fastest on Chromium-based browsers (e.g., Google Chrome and Microsoft Edge) because these browsers support OffscreenCanvas. On other browsers (e.g., Firefox), TensorFlow.js must use the CPU to run inference rather than the GPU. Users can already enable OffscreenCanvas in Firefox via the advanced configuration preferences, suggesting future versions will enable it by default.

We tested CLI Prot2Prot on Ubuntu Linux running Node.js 16.13.2. The Node.js runtime environment is available on all major desktop operating systems, so we expect CLI Prot2Prot to be broadly compatible as well.

Beyond its broad compatibility, Prot2Prot also produces high-quality images much faster than dedicated 3D modeling programs such as Blender. Prot2Prot does not require users to set up lights, cameras, materials, etc., setup activities that typically take much longer than rendering the image itself. But beyond eliminating the need for this laborious setup, Prot2Prot also has improved render times. To demonstrate, we rendered a test scene using Blender 3.2.0 on a MacBook Pro with an Apple M1 Max chip. The Blender Cycles path-tracing engine took roughly one minute to generate a 1024 × 1024 image using the GPU Compute device (Apple M1 Max GPU). In contrast, the Prot2Prot web app running on the same machine (Chrome browser) generated a similar image in only 1.2 s once the WebGL shaders had compiled (~6 s). Rendering times vary substantially depending on the available software and hardware (e.g., GPU vs. CPU). For example, older versions of Blender (e.g., 3.0.0) do not support GPU rendering on Apple hardware, and Prot2Prot does not run as quickly when using the CPU version of TensorFlow.js (as required, for example, in Firefox and Safari). But this comparison nevertheless demonstrates that Prot2Prot can dramatically accelerate photorealistic molecular visualization without requiring expertise in 3D modeling.
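The timings reported above imply the following rough speedups. The sketch treats "roughly one minute" as 60 s and assumes that shader compilation is a one-time cost per session, which the text suggests but does not state outright; the function names are illustrative.

```python
BLENDER_SECONDS = 60.0        # "roughly one minute" per 1024 x 1024 image
PROT2PROT_SECONDS = 1.2       # per image once the WebGL shaders have compiled
SHADER_COMPILE_SECONDS = 6.0  # "~6 s", assumed to be paid once per session


def prot2prot_total(n_images):
    """Total Prot2Prot time for n_images, including shader compilation."""
    return SHADER_COMPILE_SECONDS + n_images * PROT2PROT_SECONDS


def speedup(n_images):
    """Ratio of Blender render time to Prot2Prot time for n_images."""
    return (n_images * BLENDER_SECONDS) / prot2prot_total(n_images)


first_image = speedup(1)                       # includes shader compilation
steady_state = BLENDER_SECONDS / PROT2PROT_SECONDS
animation = speedup(48)                        # e.g., a 48-frame animation
```

Under these assumptions the first image is rendered about 8 times faster than in Blender, and the advantage approaches 50-fold as the one-time shader compilation is amortized over more images. These figures exclude scene setup time in Blender, so they understate the practical difference.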

Visual comparison with other software packages

Figure 5 compares a Prot2Prot rendering to renderings produced by other popular molecular-visualization packages. Prot2Prot has learned advanced rendering techniques such as lighting and subsurface scattering, so users need not undertake the laborious process of setting these techniques up themselves. Rendering a Prot2Prot image is thus as simple as loading the protein, rotating and zooming, and pressing the “Prot2Prot” render button. In contrast, other molecular-visualization programs have many settings that users must adjust to modify the presentation. To normalize the effort invested in producing each image, we sought the path of least resistance when creating comparable renderings using other programs. We changed only those settings needed to set the protein representation to surface, to match atom coloring to the extent possible, and to set the background color to white. Figure 5A shows a Prot2Prot image rendered using the Simple Surface style. Figure 5B–D show renderings generated using the popular desktop molecular-visualization programs PyMOL [5], UCSF Chimera [2], and VMD [1], respectively. Figure 5E, F show renderings generated using two popular web-based visualization programs, Mol* [6] and 3Dmol.js [9].

Fig. 5

Renderings produced by select molecular-visualization software packages. A Prot2Prot using the Simple Surface style. B PyMOL, a desktop program. C UCSF Chimera, a desktop program. D VMD, a desktop program. E Mol*, a web-based program. F 3Dmol.js, a web-based program. In all cases, we changed only those settings required to set the protein representation to surface, to match atom coloring to the extent possible, and to set the background color to white

Limitations

Prot2Prot is a powerful, easy-to-use tool for photorealistic protein rendering, but it has several notable limitations. First, it is generally useful only for rendering protein surfaces. We attempted to train a Prot2Prot model to generate a cartoon-like image of protein tertiary structure given a sketch of the protein backbone atoms (Fig. 6A–C). Prot2Prot often correctly identified alpha helices and beta sheets, but misclassifications were frequent. Furthermore, it depicted alpha helices as elongated blobs rather than perfect cylinders.

Fig. 6

Examples of Prot2Prot shortcomings. A–C Prot2Prot is best suited for rendering protein surfaces. It cannot accurately render a cartoon representation given a sketch of the protein backbone atoms. D The Chalky Shadow rendering style sometimes generates shadows that are excessively wavy (†). An artifactual shadow “blob” sometimes appears in the lower-left-hand corner (‡). E Viewing protein surfaces up close can produce artifacts (*). F Viewing protein surfaces at great distances tends to overrepresent carbon atoms (light silver). G On rare occasions protein surfaces may be subtly checkered even at intermediate distances (§)

The shadows rendered when using the Chalky Shadow style are generally impressive. Still, occasionally they appear to be more wavy than appropriate given the actual contours of the protein’s profile (Fig. 6D, marked with †). Prot2Prot also sometimes renders a shadow “blob” in the lower-left-hand corner of its Chalky-Shadow output images (Fig. 6D, marked with ‡). Fortunately, image cropping can easily remove this small artifact.

Prot2Prot also often struggles to correctly render protein surfaces with positioning that differs substantially from that depicted in the training images. Artifacts typically occur when proteins are very close to the virtual camera (Fig. 6E, marked with *) or very distant (Fig. 6F). In the case of distant proteins, Prot2Prot appears to overemphasize the contribution of carbon atoms (Fig. 6F, colored in light silver). In addition, subtle checkered (“waffle”) patterns occasionally appear when rendering proteins even at intermediate distances (Fig. 6G, marked with §). Rotating or scaling the molecule slightly generally eliminates these patterns.

Finally, Prot2Prot is trained to render protein surfaces, which are composed primarily of carbon, oxygen, nitrogen, sulfur, and hydrogen atoms. The model is not trained to render macromolecules containing atoms of other elements (e.g., nucleic acids, which contain phosphorus). In practice, Prot2Prot can successfully render non-proteins when run using the Chalky and Chalky Shadow styles, which depend more on atomic positions than atom types. But running Prot2Prot using the Simple Surface style, which colors atoms by element, is sometimes problematic. Fortunately, in many cases the offending atom is obscured by other less problematic atoms (e.g., oxygen atoms, which often obscure an offending phosphorus).

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
