Loop Prediction
loop predict residue selection &
[many, many options, see below]
Options controlling generation of loop candidate structures
ofac real
This is the “overlap factor”, which defines what we mean by a steric clash. The default value is 0.7. Lower values may be appropriate when using low-resolution structures, or if loop prediction with the default value results in no loops generated.
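The text above does not spell out how the overlap factor is applied. A minimal sketch, assuming the common definition in which two atoms clash when the ratio of their center-to-center distance to the sum of their van der Waals radii falls below ofac. The function name, the radii, and the strict inequality are all illustrative assumptions, not taken from the program.

```python
import math


def is_clash(xyz_a, xyz_b, radius_a, radius_b, ofac=0.7):
    """Return True if two atoms overlap more than the overlap factor allows.

    Assumed definition (not confirmed by the manual):
        distance / (radius_a + radius_b) < ofac
    """
    distance = math.dist(xyz_a, xyz_b)
    return distance / (radius_a + radius_b) < ofac


# Two atoms with hypothetical radii of 1.7 each, placed 2.0 apart.
pair = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
clash_default = is_clash(*pair, 1.7, 1.7)           # ratio is about 0.59
clash_relaxed = is_clash(*pair, 1.7, 1.7, ofac=0.5)
```

Under this assumed definition, lowering ofac tolerates closer contacts, which is consistent with the advice to lower it for low-resolution structures.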
nconf_min integer
This is the minimum number of loops to be generated by the loop build-up algorithm. The default is 2^Nres, where Nres is the number of residues in the loop. It may be necessary to decrease this, particularly for long loops, if the number of loops “blows up” and exceeds the allocated memory (currently set to 250k loops).
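To get a feel for how quickly the default grows, the sketch below tabulates 2^Nres against the memory limit. It assumes “250k” means 250,000 loops; the function names are hypothetical and not part of the program.

```python
# Illustrative arithmetic only; not part of the program.
MAX_LOOPS = 250_000  # assumption: "250k" read as 250,000 loops


def default_nconf_min(nres: int) -> int:
    """Default minimum number of loops for a loop of nres residues (2^Nres)."""
    return 2 ** nres


def first_length_exceeding_cap(cap: int = MAX_LOOPS) -> int:
    """Smallest loop length whose default nconf_min exceeds the cap."""
    nres = 1
    while default_nconf_min(nres) <= cap:
        nres += 1
    return nres


if __name__ == "__main__":
    for n in (8, 12, 17, 18):
        print(n, default_nconf_min(n))
    print("default exceeds the cap from Nres =", first_length_exceeding_cap())
```

Under that assumption the default minimum alone is above the limit from 18 residues onward (2^18 = 262,144), so an explicit, smaller nconf_min is worth setting well before that length.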
ideal[ize] yes/no
If yes, then impose “ideal” bond lengths and angles during the loop build-up. If no, then use the bond lengths/angles from the input structure.
mid_loop residue
The loop build-up procedure splits the loop into 2 parts, and then builds up from both sides. The midpoint of the loop is identified automatically, but if you want to pick it yourself, you can do so with this option. This can occasionally be helpful if one half of the loop is much “floppier” than the other.
mid_move yes/no
By default, the algorithm moves the “break-point” of the loop if the number of conformations on each side becomes grossly imbalanced. But you can turn this behavior off with this option.
cling yes/no
Protein loops very rarely just dangle out in solution (at least when they adopt a well-defined structure at all); typically they form contacts with the body of the protein through side chains. By default, to improve sampling efficiency, the build-up algorithm rejects conformations in which the loop travels away from the body of the protein. In some cases you may want to remove that constraint with this option (i.e., “cling no”), e.g., when predicting a floppy loop involved in ligand binding.
ofac real
After building up loop conformations from both sides, the fragments have to be joined in the middle. The middle residue can wind up adopting highly strained conformations as a result of the closure procedure, and the algorithm weeds these out. This parameter represents the maximum angle deviation (in degrees) from the Ramachandran allowed regions for phi/psi, deviation from ideality for the N-Calpha-C bond-angle, and deviation from planarity for the peptides. Default: 25 degrees.
Options to specify names of output files
rmsdfile file
This file contains energies and RMSDs (to the native or to a reference structure loaded using “load native”). Format:
Rank Model# Energy RMSD1 RMSD2 RMSD3 RMSD4
“Rank” is the energy rank of the loop. There are 2 special loop conformations listed: “-1” is the minimized starting loop structure, and “0” is the side chain optimized and minimized starting loop structure. The energies of these structures are frequently useful points of comparison (did we generate any loops lower in energy than where we started?). “Model#” is an identifier for the loop structure, reflecting the order in which it was generated by the program; it corresponds to the MODEL record number in the “pdbfile”. The four default RMSD values are the global backbone (RMSD1) and global all-heavy-atom (RMSD2) RMSDs, followed by the local backbone (RMSD3) and local all-heavy-atom (RMSD4) RMSDs. (Global refers to aligning the body of the protein; local refers to aligning just the loop itself.)
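A minimal sketch of reading this file and answering the “did we beat the starting structure?” question. It assumes the columns are whitespace-separated, that generated loops have Rank 1 and up, and that any non-numeric line (such as a header) can be skipped; the names LoopRecord, parse_rmsd_lines and beats_start are hypothetical, and the sample rows are made up for illustration.

```python
from typing import NamedTuple


class LoopRecord(NamedTuple):
    rank: int
    model: int
    energy: float
    rmsds: tuple  # RMSD1..RMSD4, in file order


def parse_rmsd_lines(lines):
    """Parse whitespace-separated rows; skip blank and non-numeric rows."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            rank, model = int(fields[0]), int(fields[1])
            energy = float(fields[2])
            rmsds = tuple(float(x) for x in fields[3:])
        except ValueError:
            continue  # e.g. a header line
        records.append(LoopRecord(rank, model, energy, rmsds))
    return records


def beats_start(records):
    """Generated loops (rank >= 1) lower in energy than the rank 0 reference."""
    start = next(r for r in records if r.rank == 0)
    return [r for r in records if r.rank >= 1 and r.energy < start.energy]


# Made-up rows in the documented column order.
sample = [
    "Rank Model# Energy RMSD1 RMSD2 RMSD3 RMSD4",
    "-1 -1 -100.0 0.5 0.9 0.3 0.6",
    "0 0 -110.0 0.5 0.8 0.3 0.5",
    "1 7 -120.5 1.2 2.0 0.8 1.1",
    "2 3 -105.0 3.0 4.1 2.0 2.5",
]
records = parse_rmsd_lines(sample)
better = beats_start(records)
```

Comparing against rank 0 rather than rank -1 is a choice made here because rank 0 has had the same side chain optimization applied as the generated loops.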
pdbfile file
This contains the energy minimized loop structures generated in the course of the prediction, organized by MODEL records, as well as the minimized and side chain optimized starting structures (MODELs “-1” and “0”), and the complete starting structure (MODEL “-2”), for convenience. For all generated loops, only those atoms that are actually “moving” during the simulation are included. This is to reduce the size of the files.
Options to specify side chains to optimize on body of protein
The default behavior of the loop prediction algorithm is to optimize only the loop and the side chains on the loop itself, keeping the remainder of the protein rigid. For many applications, including homology modeling, this is inadequate, and it is necessary to allow side chains on the body of the protein near the loop to be optimized as well (i.e., when you can’t assume that these side chains are in reasonable conformations to begin with). These options allow you to do this:
sidecut distance
Optimize all side chains within a distance cutoff (in Å) from the initial loop conformation. That is, during the side chain optimization and minimization of loop candidates, these side chains will be optimized along with those on the loop itself.
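The selection implied by a distance cutoff can be sketched as follows. The manual does not say which atoms are used to measure the distance, so this sketch assumes “any atom of the residue within the cutoff of any loop atom”; the function name and the data layout are hypothetical.

```python
import math


def residues_within_cutoff(loop_atoms, body_atoms, cutoff):
    """Return ids of body residues with any atom within cutoff of any loop atom.

    loop_atoms: list of (x, y, z) for the initial loop conformation
    body_atoms: dict mapping residue id -> list of (x, y, z)
    cutoff:     distance in angstroms
    """
    selected = []
    for res_id, coords in body_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in coords for b in loop_atoms):
            selected.append(res_id)
    return selected


# Toy coordinates, made up for illustration.
loop = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
body = {
    "A:10": [(4.0, 0.0, 0.0)],    # 2.5 from the nearest loop atom
    "A:42": [(20.0, 0.0, 0.0)],   # far away
}
near = residues_within_cutoff(loop, body, cutoff=5.0)
```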
sideadd [residue selection]
Optimize specific side chains that you specify with the usual side chain selection options.
sidefrz yes/no
This option determines whether side chains specified by "sidecut" or "sideadd" are included during the loop build-up or if they are temporarily deleted. "yes" means the side chains are included and is the default setting. "sidefrz no" is helpful when the side chains surrounding the loop are so far off from where they "should" be that they physically block the loop from adopting the native conformation during the build-up. Note that "sidefrz no" can dramatically increase the sampling space, depending on the loop length and the number of side chains that are listed.
Options to constrain the loop prediction
For many purposes, it is useful to restrict the sampling during the loop build-up, based on either the Cartesian coordinates or dihedral angles.
maxcalpha
Constrains the C-alpha atoms from moving more than some distance (in Å) away from their initial positions. You can apply this to either a single residue or all residues: “maxcalpha 5.0” applies a 5 Å constraint to all C-alpha atoms, while “maxcalpha A:50 5.0” applies it only to residue 50 on chain A.
maxang
Analogous to “maxcalpha”, except that it applies constraints to the dihedral angles (both phi and psi), in degrees, to one or all residues.
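Because dihedral angles are periodic, a deviation test of this kind has to treat -170 and 175 degrees as 15 degrees apart, not 345. The sketch below shows one way to do that. It assumes the constraint is measured against the initial phi/psi values, by analogy with “maxcalpha”, and that the limit is inclusive; the function names are hypothetical.

```python
def dihedral_deviation(angle, reference):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (angle - reference) % 360.0
    return min(diff, 360.0 - diff)


def within_maxang(phi, psi, phi0, psi0, maxang):
    """True if both phi and psi stay within maxang degrees of their references."""
    return (dihedral_deviation(phi, phi0) <= maxang
            and dihedral_deviation(psi, psi0) <= maxang)


# Wrap-around case: -170 and 175 degrees differ by 15 degrees.
wrap = dihedral_deviation(-170.0, 175.0)
ok = within_maxang(-60.0, -45.0, -70.0, -40.0, maxang=20.0)
```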
constrain atom distance x y z
This is similar to “maxcalpha” but more general. It can be applied to any atom in the loop, and it does not depend on the initial structure (the “x y z” parameters specify the center of a sphere, in the Cartesian space of the protein, with a radius “distance”; the atom must be found within that sphere if the loop is to be accepted).
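The acceptance test described here is a point-in-sphere check, which can be sketched as below. The function name is hypothetical, and whether an atom exactly on the surface counts as inside is an assumption.

```python
import math


def within_sphere(atom_xyz, center_xyz, radius):
    """True if the atom lies inside (or on) the sphere around center_xyz."""
    return math.dist(atom_xyz, center_xyz) <= radius


# Made-up coordinates: sphere of radius 3.0 centered at (10, 10, 10).
center = (10.0, 10.0, 10.0)
inside = within_sphere((11.0, 12.0, 10.0), center, 3.0)   # about 2.24 away
outside = within_sphere((14.0, 10.0, 10.0), center, 3.0)  # 4.0 away
```

Unlike the “maxcalpha” test, the reference point here is supplied by the user rather than taken from the starting structure, so the same check works for an atom whose initial position is unreliable.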
helix residue constraint
Constrain the dihedrals (phi/psi) for this residue to be roughly within the alpha-helix portion of the Ramachandran plot. “constraint” is in degrees, and basically specifies how close to an “ideal” helix the residue is required to be.
strand residue constraint
Analogous to the “helix” option, but for the beta-sheet portion of the Ramachandran plot.
Support for nonstandard amino acids
Amino acids with nonstandard side chains and even nonstandard backbones can be included in the loop prediction. It’s just a little complicated, so email me if you need this capability.
Nested options
The loop prediction algorithm calls the side chain optimization code twice for each loop (see the paper for details), and so you can pass options to the side chain code using “side1” or “side2” (for the 2 different optimizations), followed by any of the side chain prediction options. The loop prediction algorithm also uses clustering routines, and you can pass options to them using the “clust” option, as described here.