Fragment-based Linear Scaling Computation Methods
QM methods have been widely applied in chemistry because, unlike molecular mechanical (MM) methods, they evaluate electron interactions explicitly. However, because computational cost scales steeply with system size, it is difficult, or even impossible, to perform quantum calculations for large molecular systems, such as biomolecules that contain hundreds or thousands of atoms. The desire to study systems larger than what was computationally feasible led to the development of novel methods, including QM/MM methods, semi-empirical approaches, and reduced-scaling methods.11 Another approach is linear scaling,24 a thriving field of research into the efficient calculation of large molecular systems with only modest memory and CPU-time requirements. Some linear scaling methods rely on screening or approximating electron-repulsion integrals,25-30 such as the fast multipole method25 and linear scaling exchange29; however, they do not split the whole molecule into fragments and can only achieve linear scaling for one-dimensional systems, such as alkane chains. In contrast, fragment-based linear scaling methods (FLSMs), another important class of linear scaling methods, can achieve true linear scaling for 3D systems, such as proteins,10 and significant progress has been made in developing and applying new fragmentation methods.31-40
Li et al.32 grouped FLSMs into density- and energy-based methods. In the present study, we instead divide FLSMs into ‘overlapping’ and ‘disjoint’ methods according to how the fragments are formed. FLSMs such as MFCC, FMO, generalized energy-based fragmentation (GEBF), the molecular-tailoring approach (MTA), the kernel energy method, and X-Pol (previously referred to as MODEL) scale strictly linearly and have received increased attention.10,14,15,32,41-48 For example, Heifetz et al.49,50 used FMO to investigate agonist–orexin-2 receptor interactions and to optimize interleukin-2-inducible T cell kinase inhibitors. Additionally, Liu et al.51 used MFCC for geometry optimization and vibrational spectrum calculation of proteins, and Singh et al.52 used the MTA method to estimate binding energies for large water clusters.
Performing an FLSM calculation usually requires three steps. The first step is to divide the large molecular system into subsystems, which, depending on the method, might or might not contain buffer atoms. Ideally, the fragmentation preserves the local interactions of every fragment. The second step is to prepare the input file(s) for an appropriate quantum chemistry program and run the subsystem calculations. The final step is to assemble the subsystem results to obtain the properties of the original system, such as charge, energy, or energy gradient. Performing FLSM calculations correctly and efficiently is not straightforward, especially the molecule fragmentation step, which is cumbersome. Combining the three steps into a single, automated solution in one platform or software package would significantly lower the barrier to using FLSMs.
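The three steps described above can be sketched as a small pipeline. The following is only an illustrative outline, not the implementation used in this study: the names Subsystem, fragment, make_input, assemble, and run_pipeline are hypothetical, the fragmentation rule (contiguous chunks of atom indices) is a deliberately naive stand-in for a chemically informed scheme, and the input text is a placeholder rather than the format of any real quantum chemistry package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Subsystem:
    """One piece of the parent system (hypothetical container)."""
    label: str
    atoms: Tuple[int, ...]  # indices into the parent system
    weight: int = 1         # coefficient applied when results are assembled


def fragment(n_atoms: int, chunk: int) -> List[Subsystem]:
    """Step 1 (toy rule): cut atom indices 0..n_atoms-1 into contiguous chunks."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    return [
        Subsystem(f"frag{k}", tuple(range(start, min(start + chunk, n_atoms))))
        for k, start in enumerate(range(0, n_atoms, chunk))
    ]


def make_input(sub: Subsystem) -> str:
    """Step 2: render a placeholder input; a real file follows the QM program's format."""
    return f"# {sub.label}\natoms: {' '.join(map(str, sub.atoms))}\n"


def assemble(subs: List[Subsystem], energies: Dict[str, float]) -> float:
    """Step 3: combine subsystem energies as a weighted sum."""
    return sum(s.weight * energies[s.label] for s in subs)


def run_pipeline(n_atoms: int, chunk: int, solver: Callable[[str], float]) -> float:
    """Chain the three steps; `solver` stands in for the external QM calculation."""
    subs = fragment(n_atoms, chunk)
    energies = {s.label: solver(make_input(s)) for s in subs}
    return assemble(subs, energies)
```

In this sketch the weight field is what distinguishes the method families: a scheme with overlapping subsystems would assign negative weights to the doubly counted pieces, whereas the toy rule here leaves every weight at +1.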
We chose two FLSMs from different subclasses, MFCC and FMO, for implementation in this study. MFCC, proposed by Zhang et al.14 in 2003, is an inclusion-exclusion principle-based method that belongs to the ‘overlapping’ FLSM subclass. It is well suited to calculations involving large biomolecules, such as protein–ligand binding energies. Other methods, including generalized (G)MFCC/MM and electrostatically embedded (EE)-GMFCC,14,53,54 have been developed based on the MFCC method; however, no platform or software suite that simplifies the use of MFCC in research is currently available. In this study, we constructed an automated process to make MFCC a useful tool, especially for users unfamiliar with such methods.
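The inclusion-exclusion idea behind MFCC can be illustrated with a short sketch. It assumes the basic energy expression in which a chain cut into M capped fragments at M-1 positions has its energy approximated as the sum of the capped-fragment energies minus the sum of the conjugate-cap (concap) energies, one concap per cut. The function name mfcc_energy and all numerical values are hypothetical, and the toy model below treats fragment energies as strictly additive, which real calculations are not.

```python
from typing import Sequence


def mfcc_energy(capped_fragments: Sequence[float], concaps: Sequence[float]) -> float:
    """Inclusion-exclusion estimate: add capped fragments, subtract concaps.

    A chain cut at M-1 positions yields M capped fragments and M-1 concaps.
    """
    if len(concaps) != len(capped_fragments) - 1:
        raise ValueError("expected exactly one concap per cut (M-1 for M fragments)")
    return sum(capped_fragments) - sum(concaps)


# Toy additive model (all numbers hypothetical, arbitrary units).
core = [-10.0, -12.0, -11.0]   # energies of the uncapped pieces
cap = -1.0                     # energy contributed by a single cap

# Terminal fragments carry one cap, the interior fragment carries two.
capped = [core[0] + cap, core[1] + 2 * cap, core[2] + cap]
# Each concap is the pair of caps introduced at one cut.
concaps = [2 * cap, 2 * cap]

total = mfcc_energy(capped, concaps)
```

In this additive toy model the cap contributions cancel exactly, so total equals the sum of the core energies; in a real calculation the cancellation is only approximate, and its quality depends on how well the caps mimic the local environment.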
Kitaura and co-workers originally proposed the FMO method, which belongs to the ‘disjoint’ FLSM subclass, in 1999 to calculate the energies and other properties of large molecular systems from a many-body expansion over fragment calculations.15,55-59 FMO is a well-established tool for calculating energies and other properties, optimizing structures, studying time evolution with molecular dynamics (MD), and investigating interactions in large molecular systems.10,57,60 Several improved methods have been developed based on FMO, including effective fragment potential FMO,61 FMO/polarizable continuum model (PCM),62 and FMO with long-range corrected density-functional tight binding.63 Additionally, FMO has been implemented in several programs, including General Atomic and Molecular Electronic Structure System [GAMESS (US)], ABINIT-MP, OpenFMO, and parallelized ab initio calculation system (PAICS).64-67 GAMESS (US) incorporates the majority of FMO-related methods and is a widely used package for FMO research.57 OpenFMO is an open-architecture program targeting efficient FMO calculations on massively parallel computers, especially GPU-accelerated computers.67 The availability of graphical user interfaces makes it relatively easy to prepare FMO calculations and visualize their results.34,40,68 For example, FragIt is used to prepare input files for FMO calculations in GAMESS, but it cannot use HPCs to accelerate the computing process, and its result-analysis capabilities are limited.35 Additionally, Facio is used to prepare FMO input files for PC-GAMESS; however, its implementation as a Windows application restricts its use.69 BioStation Viewer and PAICSView are user interfaces for ABINIT-MP and PAICS, respectively, and both have limited abilities to interact with HPCs. In the present study, we implemented the full FMO method of GAMESS in GridMol to allow users to prepare FMO input files easily and use HPCs to accelerate the computation process.
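The many-body expansion used by FMO can be illustrated at the two-body level. The sketch below assumes the commonly quoted two-body form, in which the total energy is the sum of the monomer energies plus, for every fragment pair, the dimer energy minus the two monomer energies. It omits the embedding-potential bookkeeping and the distance-based approximations that production codes apply; the function name fmo2_energy and all numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def fmo2_energy(monomer: Dict[int, float], dimer: Dict[Tuple[int, int], float]) -> float:
    """Two-body expansion: E = sum_I E_I + sum_{I<J} (E_IJ - E_I - E_J).

    `dimer` is keyed by (I, J) with I < J and must contain every fragment pair.
    """
    total = sum(monomer.values())
    for i, j in combinations(sorted(monomer), 2):
        # Pair correction: what the dimer adds beyond its two monomers.
        total += dimer[(i, j)] - monomer[i] - monomer[j]
    return total


# Hypothetical energies for three fragments (arbitrary units).
monomer_energies = {1: -1.0, 2: -2.0, 3: -3.0}
dimer_energies = {(1, 2): -3.5, (1, 3): -4.0, (2, 3): -5.25}

energy = fmo2_energy(monomer_energies, dimer_energies)
```

Here the pair (1, 3) contributes no correction because its dimer energy equals the sum of its monomer energies, which mimics two fragments that do not interact.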