The effect of midbond functions on interaction energies computed using MP2 and CCSD(T)

1 INTRODUCTION

The role of noncovalent interactions in various areas of natural sciences cannot be underestimated. They play, for example, an important role in the design and construction of supramolecular self-assemblies and materials1, 2 as well as in biochemical research.3-5 The computational description of noncovalent interactions has therefore been a topic of research for several decades.6-13 Compared to covalent bonds, noncovalent interactions are characterized by larger intermolecular distances, smaller changes in the electronic structure of the interacting species and energy differences which are orders of magnitude lower.10 Thus, quantifying noncovalent interaction energies through computations has stricter requirements on the level of theory to obtain meaningful results. For quantitative results of interaction energies, correlated ab initio electronic structure wave function models in combination with high-quality atomic orbital (AO) basis sets are necessary.14 To this end, the coupled cluster hierarchy of wave function models has been shown to be effective.15 Indeed, the coupled cluster model with singles, doubles and perturbative triple excitations16 (CCSD(T)) with a complete basis set (CBS) description (also known as “gold standard” of electronic structure theory)17 provides quantitative accuracy.15, 18-20 This method has two apparent challenges, namely the N7 scaling of the CCSD(T) model (with N being a number of correlated electrons/measure of molecular size) and a very slow convergence of the electron correlation energy to the CBS limit. The scaling issue can be avoided by local approximations,21-23 while with respect to the latter point, extrapolation techniques24-29 towards the CBS limit have proved effective and are therefore widely used.30-32 Alternatively, the convergence towards the CBS limit can be improved by including terms into the wave function, which explicitly depend on the intermolecular distance.33, 34 These so-called explicitly correlated methods are characterized by a much faster convergence to the CBS limit.35, 36

A common approach for computing interaction energies is the supramolecular approach, in which the interaction energy of two weakly bound systems is defined as the difference of the energies of the dimer and then energies of the monomers. In addition to scaling and incomplete basis set challenges, the supramolecular approach (in combination with a finite basis set) introduces errors such as the basis set superposition error (BSSE). The BSSE is an artifact of the supramolecular approach, and it appears since the monomers of the complex calculation will have an improved description over the separate monomers.37 The BSSE results in an artificially strong computed interaction energy of a complex,38 and for noncovalent interactions the BSSE may be on the same order of magnitude as the interaction itself, thus a correction of the error is necessary. The counterpoise (CP) correction method of Boys and Bernardi39 has been used in numerous applications and has proven very efficient. At the same time, the CP approach tends sometimes to overcorrect for BSSE.40-43 Therefore, Sherill and co-workers44 recommended to apply the so-called “half-counterpoise” corrections (average of raw and CP-corrected quantities) for ab initio calculations on noncovalent complexes using the basis set of aug-cc-pVQZ and below. This is also valid at explicitly correlated levels of theory, MP2-F12 and CCSD(T*)-F12, in combination with intermediate basis sets (e.g., cc-pVTZ-F12), while with small basis like cc-pVDZ-F12 uncorrected results are closer to CBS limit due to beneficial error compensation between BSSE (overbinding) and intrinsic basis set incompleteness (underbinding).45 Other alternatives to the classical CP correction method are known—the so-called virtual CP approach,37, 46 atom by atom scheme (CPaa)47 and Same Number Of Optimized Parameters scheme48, 49 to name few.

There is a second effect on the quality of the computed interaction energies which is much less discussed than the BSSE. In electronic structure theory, we commonly choose atom centered basis sets. Since intermolecular distances are large for noncovalent interactions (relative to distances in covalent bonds) the atom centered basis functions have limited flexibility in the interaction energy. And hence, the nucleus centered basis sets favor the description of the molecule region over the interaction region, thus resulting in an imbalanced description of the noncovalent system. This effect will thus also be present in BSSE corrected calculations since the monomer and dimer energies are still just computed using the atom centered basis set.

This effect is seldom discussed in its own right, and the associated error is attributed to the fact that CBS is not reached. However, the CBS errors will be much larger for the interaction region than for the molecules due to basis sets being centered on atoms. The errors associated with not reaching CBS are commonly addressed by using large atom centered basis sets with diffuse functions, for example, aug-cc-pVXZ basis sets with cardinal number X = 5 or higher.19 However, this rapidly ends in unfeasible calculations even for small molecular systems and moreover, this is an inefficient way to progress since the approach requires one to go close to CBS limit both in molecule regions and the interaction region, using atom centered basis functions. It is also worth noting that explicitly correlated models alone cannot remove the error introduced by the imbalanced description of molecules and interaction region.

As an alternative, Tao and Pan50 introduced the use of midbond functions, a set of functions centered in the interaction region to supplement atom centered basis functions. The 3s3p2d midbond functions set with exponents 0.9, 0.3, 0.1 for s- and p-functions and exponents 0.6 and 0.2 for d-functions was one of the first used midbond sets. This set was subsequently implemented by several research groups in the 1990's for calculations of interaction energies and potential energy surfaces of rare gas dimers and rare gas-molecule complexes.51-59 Extension of this midbond set by one f-type function and further by one g-type function resulted in 3s3p2d1f and 3s3p2d1f1g sets (the added f- and g-functions have both the exponent 0.3), respectively. These sets have been applied in calculations of potential energy surfaces60-66 and also dissociation energies.67 Some examples of large sets of midbond functions, for example, 6s6p6d3f3g3h are also known.68 A prominent example of the computational effectiveness of midbond functions for computing noncovalent interaction energies is the recent extensive studies by the Patkowski group,69 demonstrating that at CCSD(T) and CCSD(T)-F12 levels of theory aug-cc-pVDZ basis set supplemented with midbond functions can provide results of high accuracy.17, 69

Midbond functions are generally used on ghost atoms located in the intermolecular region, and displacement along the van der Waals bond has been shown to have only a negligible impact on the accuracy of interaction energies of rare gas complexes.51, 52 The optimal position of midbond functions in small molecular dimers was also investigated in a more recent study by Shaw and Hill,18 where they optimized the position of the midbond functions. However, since moving the midbond center closer to the larger/heaviest monomer is energetically favorable, such an optimization procedure only maps the effect of placement along the bond. The study by Shaw and Hill18 has also challenged the prevalent assumption that the interaction energy is insensitive to the exponents of the midbond functions.70 It was demonstrated that significant improvement in both canonical and explicitly correlated calculations of interaction energies can be achieved by optimization of midbond exponents.18

To this date, systematic studies of effects of midbond functions in calculations of interaction energies for medium sized molecular systems (30 atoms) using midbond functions have been performed on the S22 data set.11, 69 Despite the usefulness and popularity of the S22 data set,71-74 there are some issues associated with it. For example, the S22 data set mainly targets interactions of nucleic acid bases while other interaction categories are under-represented (e.g., single hydrogen bonds) or even missing (aliphatic–aliphatic dispersion interactions).75, 76 The S66 data set was designed to cover a wider range of interaction motifs.75 For the S66 data set, Ma and Werner have included midbond functions in local correlation calculations of interaction energies.77 However, they present no systematic studies on how midbond centers specifically affect the results for the S66 data set.

In this paper we use the A24 and S66 data sets to illustrate how midbond functions affect the computed interaction energies at MP2 and CCSD(T) levels of theory. We show that using midbond functions of the type 3s3p2d1f1g alleviates the requirements on the chosen nucleus centered basis set, and that including a midbond center is highly efficient for small molecular systems, as displayed using the A24 set of molecules. Further, the A24 data set is used to illustrate that midbond functions provide a balanced description of molecules and interaction regions. For the S66 data set we explore the inclusion of two, rather than just one, midbond centers and we show that the accuracy increases upon adding the second midbond center, which, however, is associated with significantly increased computational costs. We also compare computations of interaction energies using the 3s3p2d1f1g midbond set with the use of correlation consistent aug-cc-pVDZ and aug-cc-pVTZ as midbond sets and we see that the midbond set needs to contain functions of high angular momentum (at least f-functions) in order to be effective. Further, by examining two approaches for placing midbond centers we show that results are not particularly sensitive to exact placement as long as it is reasonable.

The paper is organized as follows. Section 2 contains an exposition of the methodology. In Section 3 we use the A24 data set to illustrate the primary objective for using midbond functions. In Section 4 we present results for the S66 data set. In Section 5 we provide a summary and concluding remarks.

2 METHODOLOGY 2.1 Data sets

For the presented study we use the A2478 and S6675 data sets. Whereas the A24 data set consists of small molecular systems, the S66 data set contains larger and more biologically relevant complexes. Both data sets include noncovalent interactions of various kinds, covering hydrogen bonds (referred to as hydrogen group), dispersion (dispersion group) and a mixture of interaction types (others group). The complex geometries are used without further optimization and they can be found in the BEGDB online database.79

2.2 Interaction energy All interaction energies are calculated using the supramolecular approach, in which the interaction energy Eint of two weakly bound atoms or molecules is defined as the difference of the energies of the dimer AB, E(AB), and the energies of the monomers A, E(A), and B, E(B). CP correction39 is used to correct for the BSSE, that is, the monomer energies are calculated in the basis of the dimer. The interaction energy is therefore given by, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0001(1)

The superscript “AB” on monomers indicates that the basis for the dimer is used in the monomer calculations. All results presented in this paper are computed using the CP correction, and for simplicity, the “CP” superscript is made implicit.

2.3 Midbond sets

In this work, we will investigate the performance of the midbond set 3s3p2d1f1g67 and the correlation consistent aug-cc-pVXZ basis functions (X = D, T, Q) for carbon as midbond sets. The exponents of the former set of midbond functions are 0.9, 0.3, 0.1 for s-functions and p-functions, 0.6, 0.2 for d-functions and 0.3 for the f- and g-functions. For carbon atom the contracted composition of correlation consistent basis functions is: [3s,2p,1d] + diffuse (1s,1p,1d) for aug-cc-pVDZ, [4s,3p,2d,1f] + diffuse (1s,1p,1d,1f) for aug-cc-pVTZ and [5s,4p,3d,2f,1g] + diffuse (1s,1p,1d,1f,1g) for aug-cc-pVQZ. We will use a shorthand notation to indicate which type of midbond functions we use, that is, (+33211) for 3s3p2d1f1g and (+aXZ) for aug-cc-pVXZ, respectively.

The standard procedure of adding midbond functions is to place them on the midpoint between the centers of mass of the interacting monomers.50, 67, 68 Patkowski and co-workers stress that this method often leads to a midpoint being located closer to one of monomers69 and thus favoring its description. In order to avoid this, they refer to an algorithm80 of locating a midbond center, where its location is a urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0002 weighted average of intermolecular atom–atom midpoints, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0003(2)where the summation runs over all atoms a in subsystem A and all atoms b in subsystem B.69 Except where stated explicitly otherwise, we will use Equation (2) to place one midbond center (referred to as systematic). In Section 4.3 we explore the effect of placement of midbond functions, and we include results for where simple chemical intuition is used to place midbond centers (referred to as intuitive, see Figure S1). A detailed description of this approach can be found in Section S3.5 in the Supporting Information. In sections where we use two midbond centers for the S66 data set, we only use the 53 out of 66 complexes which have wide enough interaction regions. The two centers are placed manually. For system names of the 53 complexes with two midbond centers see Table S25. Geometries of the complexes (including midbond centers) of the A24 and S66 data sets are available at https://doi.org/10.18710/2FWECY. 2.4 Statistical measures The errors of the interaction energies are calculated in absolute terms urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0004(3)and in relative (%) terms, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0005(4)where urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0006 is the reference value for the interaction energy of complex urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0007, and urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0008 is the computed value of the interaction energy for complex urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0009. Since the magnitude of interaction energies varies significantly with interaction type, using errors in relative terms (%) makes it easier to compare the quality of results for different interaction types. We will present errors in terms of the mean absolute error, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0010, and the relative mean absolute error, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0011, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0012(5) urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0013(6)where urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0014 is the total number of noncovalent complexes. 2.5 Computational details

The interaction energies for the A24 (see Section 3) and S66 (see Section 4) data sets are computed at MP2 and CCSD(T) levels of theory using correlation consistent basis sets (aug)-cc-pVXZ81 with X = D, T, Q for A24 and X = D, T for S66, within the frozen-core approximation and employing CP correction for BSSE. All calculations presented in this work are carried out using the LSDALTON program.82 The Hartree–Fock convergence threshold was tightened from default to 10−7 and the integral threshold was tightened from default 10−8 to 10−9 for all calculations.

For the timing calculations presented in Sections 4.1.1 and 4.2.1 we use the LSDALTON2020.0 wall times from utilizing a single cluster node with two Intel Xeon E5-2630 v2 CPUs summing up to 20 cores. The timing computations are performed on resources provided by the NTNU IDUN/EPIC computing cluster.83 All timings include the dimer calculation as well as the CP-corrected calculations for monomers.

The CCSD(T) reference interaction energies for the A24 data set were obtained from aug-cc-pVQZ(+33211) calculations (see Table S1), where a 3s3p2d1f1g midbond center was placed according to Equation (2). For CCSD(T) calculations in the S66 set, we used the revised benchmark CCSD(T)/CBS interaction energies presented in the publication of Hobza et al.76 as reference values. To obtain MP2 reference values for both data sets we compute interaction energies at MP2/aug-cc-pVQZ level of theory adding one 3s3p2d1f1g midbond center according to Equation (2) (see Tables S1 and S16).

3 THE OBJECTIVE OF MIDBOND FUNCTIONS

In this section we use the A24 set to illustrate why the inclusion of a midbond center significantly improves the interaction energy when using small basis sets, and which is unrelated to reaching the CBS limit. We do so by examining computed monomer and dimer energy contributions compared to reference monomer and dimer energies (all within the CP correction). We present CCSD(T) results for the A24 data set using the basis set cc-pVXZ, X = D, T, Q with and without the midbond sets 3s3p2d1f1g (denoted +33211) and carbon aug-cc-pVXZ, X = D, T, Q (denoted +aXZ). In Table 1 we present the energy differences for monomers (MA and MB) and dimers (D) for each basis set combination (for monomer and dimer energies see Tables S13–S15) with respect to reference values computed using aug-cc-pVQZ(+33211). urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0015 and urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0016 for the interaction energies are presented in Table 2 (interaction energies itself can be found in Tables S7–S9). Note that errors in energy contributions in Table 1 are given in mEh, while errors in interaction energies in Table 2 are given in kcal/mol.

TABLE 1. Mean absolute errors (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0017, mEh) in monomers (MA and MB) and dimer (D) energies of the A24 set of complexes computed using CCSD(T) and basis set combinations cc-pVXZ(+33211), cc-pVXZ(+aXZ), and cc-pVXZ for X = D, T, Q 3s3p2d1f1g aug-cc-pVXZ No midbond Basis MA MB D MA MB D MA MB D DZ 85.0 94.8 180.1 90.5 99.7 190.8 98.6 106.8 206.9 TZ 20.1 23.2 43.5 20.2 23.2 43.5 23.2 26.1 49.8 QZ 1.4 1.8 3.3 1.1 1.4 2.5 2.2 2.5 4.9 Note: Errors are relative to CCSD(T)/aug-cc-pVQZ(+33211) reference values. TABLE 2. Relative mean absolute errors (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0018, in %) and mean absolute errors (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0019, in kcal/mol) for interaction energies of the A24 set of complexes calculated at CCSD(T) level of theory using cc-pVXZ(+33211), cc-pVXZ(+aXZ), and cc-pVXZ for X = D, T, and Q 3s3p2d1f1g aug-cc-pVXZ No midbond Basis urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0020 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0021 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0022 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0023 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0024 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0025 DZ 11.8 0.233 23.9 0.374 68.7 0.920 TZ 6.2 0.113 5.9 0.100 31.5 0.412 QZ 2.5 0.042 1.3 0.021 13.5 0.165 Note: Reference values are computed using CCSD(T)/aug-cc-pVQZ(+33211).

We first consider the energy differences for monomers and dimers separately, presented in Table 1. We see that for cc-pVXZ in combination without and with both types of midbond sets (3s3p2d1f1g and aug-cc-pVXZ), urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0026 in monomer and dimer energies are, as expected, decreasing as the basis set is increased from cc-pVDZ through cc-pVTZ to cc-pVQZ. Further, we note that the errors in MA, MB, and D are similar for results with and without midbond functions but with somewhat larger differences for cc-pVDZ results. However, the errors with and without midbond functions are still on the same order of magnitude even for cc-pVDZ results. In contrast, if we look at urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0027 and urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0028 results in Table 2, we see a significant difference between errors for the interaction energies computed with and without midbond sets. For example, for the cc-pVDZ calculation, urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0029 for cc-pVDZ(+33211), cc-pVDZ(+aDZ), and cc-pVDZ are 11.8%, 23.9% and 68.7%. Increasing the basis set to cc-pVTZ reduces urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0030 to 6.2%, 5.9%, and 31.5% for cc-pVTZ(+33211), cc-pVTZ(+aTZ), and cc-pVTZ, respectively. Hence, we see that although the errors for MA, MB, and D energies for the cc-pVTZ without a midbond set are much smaller than the errors for cc-pVDZ with midbond sets, the error in the interaction energy itself for the cc-pVTZ calculations without a midbond set (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0031 = 31.5%) is larger than the interaction energy error, for example, for cc-pVDZ(+33211) (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0032 = 11.8%). Thus, we see that more accurate results for MA, MB, and D do not necessarily correlate with more accurate interaction energies. The primary role of a midbond set is therefore not to provide more accurate monomer and dimer energies by more rapidly reaching CBS limit, but rather to ensure that a balanced description of monomers and dimer is obtained. That is, a midbond set corrects imbalances in description of molecules and the interaction region occurring due to the use of atom centered basis sets which only have sufficient flexibility in the interaction region when large cardinal numbers and augmentation is included.

We have also performed the same calculations for MP2 to see whether the observed results are artifacts of CCSD(T). A detailed overview of errors for the MP2 calculations can be found in Section S2.2 in the Supporting Information (see Tables S2 and S3). The MP2 interaction energies are shown in Tables S4–S6, whereas the monomer and dimer energies are presented in Tables S10–S12. The MP2 results show the same trends as the CCSD(T) results, with errors generally being somewhat larger. Although urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0033 in monomer and dimer energies are similar with and without midbond functions for each basis set, they do not correlate with the urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0034 and urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0035 of interaction energies. This supports the findings from the CCSD(T) calculations in Tables 1 and 2 that a better agreement of monomer and dimer energies with corresponding reference values does not necessarily mean a higher accuracy in the interaction energies.

4 RESULTS FOR S66 DATA SET

In this section we present results for the S66 data set. In Section 4.1 we discuss the effect on interaction energies using one and two 3s3p2d1f1g midbond centers. In Section 4.2 we compare the results of using two 3s3p2d1f1g centers with using two carbon-type aug-cc-pVDZ midbond centers. We further comment on relative timings for various combinations of basis sets and midbond functions. In Section 4.3 we discuss the effect of using a systematic method for placing midbond function centers (see Section 2.3) versus using an intuitive (manual) method for placing midbond centers (see Section S3.5 in the Supporting Information).

4.1 One versus two 3s3p2d1f1g midbond centers

The S66 data set contains complexes with wide interaction regions, and it therefore is reasonable to investigate whether an increased number of midbond centers will result in improved interaction energies. Further, we investigate whether the increased number of midbond centers can compensate for the deficiencies of using a small atom centered basis set (e.g., cc-pVDZ), as was the case for the A24 data set (see Section 3). Note that two midbond centers are only used for 53 out of 66 complexes,* where the interaction region is reasonably large. Accordingly, we compare results obtained using two midbond centers (see Tables S17 and S21) with results obtained using one midbond center (see Tables S19 and S23) only for the 53 complexes. Results for urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0036 and urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0037 are shown in Table 3 and visualized in Figure 1.

TABLE 3. Relative mean absolute errors (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0038, in %) and mean absolute errors (urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0039, in kcal/mol) of interaction energies for 53 S66 complexes (see Section ) calculated at different levels of theory using cc-pVDZ, cc-pVTZ, and aug-cc-pVDZ in combination with one (33211) or two (2*33211) midbond centers 33211 2*33211 MP2 CCSD(T) MP2 CCSD(T) urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0040 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0041 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0042 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0043 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0044 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0045 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0046 urn:x-wiley:01928651:media:jcc26777:jcc26777-math-0047 DZ Hydrogen 10.1 0.958 12.0 1.189 7.2

留言 (0)

沒有登入
gif