Using the GENERALIZED BORN model to represent solvation effects.

[created by Alexey Onufriev, Feb. 2002]


This tutorial is a short introduction into the Generalized Born (GB) model and its usage within AMBER 7. Here you will find a (very) brief description of the model, and steps necessary to set-up and run a simple MD simulation of a protein (thioredoxin) using the GB to represent solvation effects.


What is GB? The analytic Generalized Born (GB) model efficiently describes electrostatics of molecules in water environment. It represents the solvent implicitly as continuum with the dielectric properties of water, and includes the charge screening effects of salt. Several versions of the GB model are now available in AMBER.

Some attractive features of GB models:


Preliminaries.

Before we proceed, make sure that AMBERHOME shell variable is set properly: it needs to be set to the directory where AMBER resides. If the AMBER source is in /usr/local/src/AMBER/amber7, you would type at the prompt, or include in your .cshrc file:

setenv AMBERHOME /usr/local/src/AMBER/amber7

To add the AMBER binary directory to your path, type at the prompt or add to your .cshrc:

set path = ($path $AMBERHOME/exe)

Preparing the PDB file for input into LEaP.

We will use a 108-residue protein thioredoxin for this tutorial. First, we do some manipulations with its PDB file 2TRX.pdb to prepare it for input into LEaP - a program used to build the necessary topology and coordinate files (novices to the field are encouraged to consult other, more detailed, tutorials and the Manual at this point). This particular PDB file happens to contain two structures, A and B, and we choose A to work with -- retain lines 155 through 1004 and delete the rest. We then screen through the file and remove all possible double entries for atoms within a residue, i.e. whenever we encounter smth. like:

 
ATOM    389  CG AGLU A  48      29.878  16.097   0.001  0.50 23.22   2  2TRX 540 
ATOM    390  CG BGLU A  48      31.372  15.982   1.785  0.50 24.50   2  2TRX 541 
ATOM    391  CD AGLU A  48      30.874  15.856  -1.084  0.50 24.57   2  2TRX 542 
ATOM    392  CD BGLU A  48      32.387  15.377   0.888  0.50 25.93   2  2TRX 543 
we get rid of alternative positions for atoms CG and CD, lines 2 and 4 (if you don't have reasons to prefer one over the other, it does not matter whether you retain all "A" or "B" positions).

Now change ASP A 26 into ASH A 26 at every instance, to reflect the fact that this particular glutamic acid is uncharged (protonated) in thioredoxin. We also change CYS A 32 and CYS A 35 into CYX A 32 and CYX A 35 respectively, to allow LEaP to create a di-sulfate bond between CYS32 and CYS35. The result of all these manipulations is in a new PDB file which we call trx.pdb. We will use it later as input to LEaP.

Generating AMBER topology and coordinate files.

We are now ready to generate the topology and coordinate files necessary to run molecular dynamics simulations. To this end, we have prepared a small script file trx.leap.scrpt to use with tleap - a text version of LEaP. Before we run tleap , let's see what each line of trx.leap.scrpt means:

source leaprc.ff99 /* We use ff99 force-field, which is a good choice for GB. */

set default PBradii bondi /* Use Bondi set of atomic radii, needed for a particular version of GB we will use here (igb=2) */

trx = loadpdb trx.pdb /* Load the (modified) PDB file trx.pdb */

bond trx.35.SG trx.32.SG /* Create a di-sulfate bond between residues 32 and 35 */

saveamberparm trx trx.top trx.crd /* Save the generated topology and coordinate files into trx.top and trx.crd respectively. */

quit /* Wonder what this may stand for... */

Now run tleap by typing on command line:

tleap -f trx.leap.scrpt

Each line of trx.leap.scrpt is executed, and hydrogen atoms are added automatically. On your screen you will see a number of LEaP messages. It is always a good idea to check them to make sure everything went OK. If it did, AMBER topology and coordinate files were generated, trx.top and trx.crd . We will use them as inputs for our next step. Note that no explicit water molecules have been added to the structure - those are represented implicitly by the GB force-field.

Running Molecular Dynamics.

Step 1. Minimization First we remove possible steric clashes or "hot spots`` in the structure by relaxing atomic positions to local energy minima. We will use sander for it, the necessary input parameters are given in the file min.in , which looks like:


cat <mdin
  minimize structure
 &cntrl
   imin=1, maxcyc=100,
   cut=300.0, igb=2, saltcon=0.2, gbsa=1,
   ntpr=10, ntx=1, ntb=0,
 &end
  keep all atoms frozen
  5.0
RES 1 108
END
END
eof
sander   -O -i mdin \
-c trx.crd  -p trx.top  -ref trx.crd -o trx.min.out -r trx.rst

Before we execute this script, let's see what it does. The first line creates a file mdin which contains all lines of min.in from &cntrl to eof (not including); this becomes the part of actual input into sander , specified by: sander -O -i mdin -c trx.crd -p trx.top -ref trx.crd -o trx.min.out -r trx.rst . Other input files: trx.crd , trx.top ; output: trx.min.out and trx.rst .

By setting imin=1 and maxcyc=100 , we tell sander to perform 100 cycles of minimization, while keeping all atoms in each residue from 1 through 108 constrained ( ntr = 1 ) to their original positions (specified in the reference structure trx.crd ) by a harmonic potential U = kX^2 with a force constant k = 5.0 [kcal/mol/A^2].

Control variables igb=2, saltcon=0.2, gbsa=1, are GB specific: igb=2 means that we are using the 2nd (out of 4) GB models available in AMBER-7, it is expected to perform well on proteins and other compounds with large interior regions; saltcon=0.2 sets the surrounding (monovalent) salt concentration to 0.2 [mol/L], and gbsa=1 means that the hydrophobic effect is taken into acount by adding a surface energy term to the total energy. Note that by setting cut =300 we specify a cut-off value for long range interactions, 300 A, larger than the system size, that is there is essentially no cut-off. More details, and explanation of other parameters are found in the manual.

We are now ready to start the minimization. On command line type

min.in &

(You may need to type chmod +x first to make min.in an executable. ) You can watch the progress by

more mdinfo

When everything is finished, look up the results in trx.min.out . The first block of data in the ``RESULTS" section looks like:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1      -2.6695E+03     1.4295E+01     1.6593E+02     HD3      1020

 BOND    =      158.2876  ANGLE   =      333.3177  DIHED      =      926.6942
 VDWAALS =     -735.5748  EEL     =    -8283.6186  EGB        =    -1699.1415
 1-4 VDW =      487.5186  1-4 EEL =     6115.6341  RESTRAINT  =        0.0000
 ESURF   =       27.3360    	
which is a breakdown of the molecule's energy by components, in [Kcal/mol]. Here EGB = -1699.1415 is its solvation energy , and
   
  ENERGY  
-2.6695E+03  
is the total energy of the molecule (in its original configuration), in which the solvent degrees of freedom have been averaged out and taken into account implicitly (within the approximations made, via the continuum solvent description and the inclusion of the ``hydrophobic" surface area term ). Therefore, to obtain a quick estimate of a molecule's energy including solvent effects (both energetic and entropic!) , one can generate the corresponding coordinate and topology files and then use sander to get the energy. More on this so-called ``GB/SA" approach can be found in the Manual, but we will have a small example towards the end. Now let's move on to our next step:

Step 2. Equilibration As was mentioned earlier, long (100ps or more) solvent equilibration is not required when the GB is used. We will use the script equil.in for the job:


cat < mdin
  equilibrate  structure
 &cntrl
   imin=0,
   ntc=2, ntf=2,
   cut=12.0, igb=2, saltcon=0.2, gbsa=1,
   ntpr=50,
   nstlim = 5000, dt=0.002,
   ntt=1, tempi=0.0, temp0=300.0, tautp=1.0,
   ntx=1, irest=0, ntb=0,
   nscm = 1000,
   ntr=1,
 &end
  keep all atoms frozen
  1.0
RES 1 108
END
END
eof
sander   -O -i mdin \
-c trx.rst  -p trx.top  -ref trx.rst -o trx.equil.out -r trx.equil.rst
We will use the energy-minimized structure obtained in the previous step, trx.rst , as the input coordinates. SHAKE will be employed to constrain hydrogen atoms ntc=2, ntf=2 , this will allow us to use a longer integration time-step dt = 0.002 . During the nstlim = 5000 * dt = 0.002 = 10ps of equilibration the temperature will be gradually raised from tempi=0.0 to temp0=300.0 degrees K. The protein will remain constrained ( ntr=1 ) to its energy-minimized structure by a weaker potential of 1.0 Kcal/mol/A^2 -- as all possible steric clashes were already removed, we don't expect anything drastic to happen to our protein. (In principle, one could repeat the equilibration step a number of times starting from a larger force constant and then gradually reducing it ). For the purpose of this tutorial - to reduce computational time - we will use a fairly small value for the electrostatic cut-off, cut=12.0 . For ``real" runs you may want to consider larger values, such as 18 or even 24 A. Ideally, of course, one should use no cut-off (i.e. set cut to a value larger than system size), but then you run into the usual trade off between the speed and accuracy of your computations. Finally, every nscm = 1000 steps the translational and rotational motion will be removed, which is often necessary when running MD simulations with GB. Now run the equilibration:

equil.in &

This will take a while (a few hours on a single processor 400 MHz PC running under Linux), so you might consider filling time by reading more about the GB model. The progress of the equilibration is reported in trx.equil.out ; when everything is finished, you will find the equilibrated thioredoxin structure in trx.equil.rst .

Step 3. Running MD at 300 K. Finally, we are ready to run an unconstrained MD simulation of thioredoxin. The following script runmd.in is used:

 
cat < mdin
  run MD at 300K
 &cntrl
   ntc=2, ntf=2,
   cut=12.0, igb=2, saltcon=0.2, gbsa=1,
   ntpr=50, ntwx=50,
   nstlim = 500, dt=0.002,
   ntt=1, tempi=300.0, temp0=300.0, tautp=2.0,
   ntx=1, irest=0, ntb=0,
   nscm = 1000,
 &end
eof
sander   -O -i mdin \
-c trx.equil.rst  -p trx.top  -o trx.md.out -x trx.md.x -r trx.md.rst
For this tutorial, we will perform a very short run, only nstlim = 500 steps; in reality you will probably want to perform a considerably longer simulation. As the simulation proceeds, protein coordinates will be accumulated in the file trx.md.x , which is appended with the current set every ntwx=50 steps, the simulation progress is reported into trx.md.out every ntpr=50 steps. Now type:

runmd.in &

to start the MD simulation. This short run will finish soon, you can then extract the resulting structures from trx.md.x using AMBER utilities such as ptraj or carnal . For example, create a directory MD_DATA/ and then execute the provided ptraj.sh script:

ptraj.sh

The utility will extract all of the accumulated structures out of trx.md.x , and store them in the specifed directory MD_DATA in PDB format. It will also compute their RMSDs (in Angstroms), over all atoms in the protein, relative to the reference structure trx.crd , and dump the reults into the file trx.rmsd .

That's it! You can analyze the structures in MD_DATA using your favorite tools, for example use rasmol to view them. Note that in the PDB files output by AMBER you may need to change CYX back to CYS, ASH to ASP and HID to HIS to get all residues displayed properly.

Calculating energies using GB

Apart from its use in Molecular Dynamics, the GB can also be employed to obtain quick estimates of energies of structures in solution. This is especially useful when you have a large number of coordinate sets obtained by other methods. For example, you want to find out which snapshot in MD_DATA has lower energy, trx.pdb.10 or trx.pdb.9 ? To answer this question, we will perform two single-point energy calculations using sander . First, we prepare AMBER coordinate files corresponding to the selected PDB files. You can modify the script we have already used, trx.leap.scrpt , into smth. like generate_crd.scrpt . With this, you will generate coordinate file trx.md.10.crd in MD_DATA . We will now run sander using energy.in script:


cat < mdin
  compute single-point energy
 &cntrl
   cut=300.0, igb=2, saltcon=0.2, gbsa=1,
   ntpr=1,
   nstlim = 0, dt=0.002,
   ntt=1, tempi=300.0, temp0=300.0, tautp=2.0,
   ntx=1, irest=0, ntb=0,
 &end
eof
sander   -O -i mdin \
-c MD_DATA/trx.md.10.crd  -p trx.top  -o energy.out
Note that we have switched back to a 300 A cut-off for greater accuracy, and use the same topology file as before since this does not change from snapshot to snapshot. Once energy.in is finished, do

grep "EPtot" < energy.out | awk '{ print $9 }'

to obtain

-2202.4387

- total (potential) energy in [Kcal/mol] of the solvated structure trx.pdb.10 . Repeating the same steps for trx.pdb.9 yields -2243.0907 [Kcal/mol]. Notice that energy fluctuation between these two snapshots is quite large - we would like to emphasize that a much greater number of representative structures is usually needed to obtain meaningful averages.