LOOP DECOY SETS
Several people have asked me to provide the loop structures generated by my
loop prediction algorithm (as implemented in the Protein Local Optimization
Program, PLOP) for use as a "decoy" set for testing scoring functions. I have
not yet compared this set directly with other existing loop decoy sets, but my
guess is that ours is much more "difficult", i.e., it will be more challenging
for a scoring function to rank native or native-like conformations lowest in
energy. There are several reasons for this. First, for each loop there is a
smooth continuum of decoy structures ranging from near-native to low accuracy.
Second, all of the loop structures have been extensively energy-optimized,
including side-chain sampling: very few loops, if any, contain steric clashes,
and hydrogen bonds have been optimized. Third, there are many decoys, several
hundred for each loop.
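Because the set is meant for testing scoring functions, one way to use it is to
ask, for each loop, how close to the native the lowest-scoring decoy is. The
Python sketch below assumes you have already computed a loop RMSD to the native
and a score for every decoy; the `Decoy` class, the function names, the made-up
numbers in the example, and the 2.0 angstrom success cutoff are illustrative
assumptions, not part of the distributed files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Decoy:
    model: int     # MODEL serial number in the .decoys file (hypothetical bookkeeping)
    rmsd: float    # loop RMSD to the native in angstroms, computed by you
    energy: float  # score assigned by the scoring function being tested


def top_ranked_rmsd(decoys: Iterable[Decoy], exclude_models=(0,)) -> float:
    """RMSD of the decoy that the scoring function ranks lowest in energy.

    MODEL 0 (the copied full native) is excluded by default; pass, e.g.,
    exclude_models=(0, 1, 2) to drop the minimized natives as well.
    """
    candidates = [d for d in decoys if d.model not in exclude_models]
    if not candidates:
        raise ValueError("no decoys left to rank")
    return min(candidates, key=lambda d: d.energy).rmsd


def success_fraction(loops: Dict[str, List[Decoy]], cutoff: float = 2.0,
                     exclude_models=(0,)) -> float:
    """Fraction of loops whose lowest-energy decoy lies within `cutoff` of the native.

    The 2.0 angstrom default is an arbitrary illustrative threshold.
    """
    if not loops:
        raise ValueError("no loops given")
    hits = sum(top_ranked_rmsd(d, exclude_models) <= cutoff for d in loops.values())
    return hits / len(loops)


# Made-up example: a non-native decoy (MODEL 3) receives the lowest energy.
decoys = [Decoy(1, 0.3, -105.0), Decoy(2, 0.4, -106.0),
          Decoy(3, 4.2, -110.0), Decoy(4, 1.1, -108.0)]
print(top_ranked_rmsd(decoys))  # 4.2
```

Excluding MODEL 1 and MODEL 2 (`exclude_models=(0, 1, 2)`) tests the scoring
function on predicted loops only; leaving them in also asks whether the
minimized natives score lowest.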
Each gzipped tar file below contains a ".list" file that gives the PDB
identifiers, chain names (if non-blank), and the residue numbers of the
beginning and end of each loop, followed by the number of structures in the
set. There is also one ".decoys" file for each loop; this is an NMR-style PDB
file, i.e., one that uses MODEL records to separate the different loop
structures. For convenience, I have copied the entire "native" structure as
MODEL 0. MODEL 1 contains the energy-minimized native (using OPLS/SGB). It
will typically be very similar to the native, but if you are using an
all-atom energy function, it will probably be lower in energy. MODEL 2 is
similar to MODEL 1, except that side-chain optimization is performed on the
loop region prior to the minimization. The remaining loop structures were
generated at various stages of the hierarchical loop prediction algorithm.
For compactness, every MODEL except MODEL 0 contains only the atoms in the
loop region. All atoms are represented, including side chains and hydrogens.
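The two file types described above can be read with a few lines of Python.
This is a minimal sketch, not a tool distributed with the set: the ".list"
column order (PDB identifier, optional chain, loop start, loop end, number of
structures, whitespace-separated) is assumed from the description rather than
confirmed, the function names are hypothetical, and `loop_rmsd` computes a
backbone RMSD without superposition on the assumption that every model shares
the native's coordinate frame.

```python
import math
from typing import Dict, Iterable, Tuple

AtomKey = Tuple[str, int, str, str]  # (chain, residue number, insertion code, atom name)
Coords = Tuple[float, float, float]


def parse_list_line(line: str) -> dict:
    """Parse one .list line, ASSUMING whitespace-separated fields in the order
    PDB id, [chain], loop start, loop end, number of structures."""
    fields = line.split()
    if len(fields) == 5:
        pdb_id, chain, start, end, count = fields
    elif len(fields) == 4:  # assumed: a blank chain name is simply omitted
        pdb_id, start, end, count = fields
        chain = ""
    else:
        raise ValueError(f"unexpected .list line: {line!r}")
    # Residue numbers stay strings in case they carry insertion codes.
    return {"pdb": pdb_id, "chain": chain, "start": start, "end": end,
            "n_structures": int(count)}


def parse_decoys(lines: Iterable[str]) -> Dict[int, Dict[AtomKey, Coords]]:
    """Split an NMR-style PDB file into {MODEL serial: {atom key: (x, y, z)}},
    reading ATOM/HETATM records by the standard fixed PDB columns."""
    models: Dict[int, Dict[AtomKey, Coords]] = {}
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = {}
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            key = (line[21:22].strip(), int(line[22:26]),
                   line[26:27].strip(), line[12:16].strip())
            models[current][key] = (float(line[30:38]), float(line[38:46]),
                                    float(line[46:54]))
    return models


BACKBONE = ("N", "CA", "C", "O")


def loop_rmsd(native, decoy, atom_names=BACKBONE) -> float:
    """RMSD over the named atoms present in both models, with NO superposition
    (assumes the non-loop framework is in the native's coordinate frame)."""
    shared = [k for k in decoy if k[3] in atom_names and k in native]
    if not shared:
        raise ValueError("no atoms in common")
    sq = sum((a - b) ** 2 for k in shared for a, b in zip(native[k], decoy[k]))
    return math.sqrt(sq / len(shared))
```

For example, after reading a file with `models = parse_decoys(fh)`, the
expression `{m: loop_rmsd(models[0], atoms) for m, atoms in models.items() if m != 0}`
gives a loop backbone RMSD for every decoy, using the full native in MODEL 0 as
the reference.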
Note that not all cases studied in the paper are included here. For the
longer loops, only those from the "filtered" data sets are included, and one
or two others also seem to be missing; I must have accidentally deleted them
at some point. Still, there is an awful lot of data here.
If you use this loop decoy set, please cite:
M. P. Jacobson, D. L. Pincus, C. S. Rapp, T. J. F. Day, B. Honig,
D. E. Shaw, and R. A. Friesner. "A Hierarchical Approach to All-Atom Loop
Prediction", Proteins, 55 (2004), pp. 351-367.
4-residue loops (~8 MB)
5-residue loops (~37 MB)
6-residue loops (~58 MB)
7-residue loops (~68 MB)
8-residue loops (~81 MB)
9-residue loops (~97 MB)
10-residue loops (~69 MB)
11-residue loops (~82 MB)
12-residue loops (~38 MB)
Perturbed loop test sets
Also available are the starting perturbed crystal structures used in the
publication below. This test set contains 80 starting structures: 20 loops
each of lengths 6, 8, 10, and 12 residues. In each case, a single loop in a
crystal structure is perturbed away from its native conformation, and then
all side chains outside the loop are placed in non-native local energy
minima. The backbone outside the loop remains in its native configuration.
B. D. Sellers*, K. Zhu*, S. Zhao, R. A. Friesner, and M. P. Jacobson.
"Towards better refinement of comparative models: predicting loops in
inexact environments", Proteins, 72 (2008), pp. 959-971. (*Joint first
authors)
Download perturbed loop test set (12 MB)
Partial antibody starting comparative model test set
Also available is a test set of partial antibody comparative models. The
models were constructed from native crystal structures with the CDR loops
"modeled in": L1, L2, L3, H1, and H2 were modeled using knowledge-based
methods, while H3 was not refined. These starting models are described in:
B. D. Sellers, J. P. Nilmeier, and M. P. Jacobson. "Antibodies as a model
system for comparative model refinement", Proteins, 78 (2010),
pp. 2490-2505.
Download antibody test set (2.3 MB)