LOOP DECOY SETS
Several people have asked me to provide the loop structures generated by my loop prediction algorithm (as implemented in the Protein Local Optimization Program) to be used as a "decoy" set for testing various scoring functions. I have not yet made any direct comparison to other existing loop decoy sets, but my guess is that our set is much more "difficult", i.e., it will be more challenging to obtain native or native-like conformations as the lowest in energy. There are several reasons for this. First, for each loop, there is a nice continuum of decoy structures ranging from near-native to low accuracy. Second, all of the loop structures have been extensively energy optimized, including side chain sampling. Very few loops, if any, will contain steric clashes, and hydrogen bonds have been optimized. Third, there are lots of decoys, several hundred for each loop.
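As a rough illustration of the kind of test described above, the sketch below checks whether a scoring function ranks a native-like structure lowest in energy for one loop. It assumes you have already computed one energy and one RMSD per decoy with your own tools. The function name, the dictionary keys, and the 1.0 Angstrom "native-like" cutoff are all hypothetical choices for this example, not part of the decoy set or the paper.

```python
def evaluate_ranking(energies, rmsds, native_like_cutoff=1.0):
    """Summarize how well a scoring function ranks the decoys of one loop.

    energies and rmsds are parallel sequences with one entry per decoy.
    native_like_cutoff (in Angstroms) is an arbitrary, hypothetical threshold.
    """
    if len(energies) != len(rmsds) or len(energies) == 0:
        raise ValueError("energies and rmsds must be non-empty and equal in length")

    # Decoy indices sorted from lowest (best) to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    best = order[0]

    # 1-based rank of the best-scoring decoy that is native-like, if any.
    first_native_like_rank = next(
        (rank for rank, i in enumerate(order, start=1)
         if rmsds[i] <= native_like_cutoff),
        None,
    )
    return {
        "lowest_energy_index": best,
        "lowest_energy_rmsd": rmsds[best],
        "success": rmsds[best] <= native_like_cutoff,
        "first_native_like_rank": first_native_like_rank,
    }
```

A scoring function "passes" a loop in this sketch when the lowest-energy structure is within the cutoff. The rank of the first native-like structure gives a softer measure when it does not.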
Each of the gzipped tar files that can be accessed below contains a ".list" file that gives the PDB identifier, chain name (if non-blank), and beginning/end residue numbers of each loop, followed by the number of structures in the set. There is also one ".decoys" file for each loop, which is an NMR-style PDB file, i.e., one that uses MODEL records to separate the different loop structures. For convenience, I have copied the entire "native" structure as MODEL 0. MODEL 1 contains the energy-minimized native (using OPLS/SGB). It will typically be very similar to the native, but if you are using an all-atom energy function, it will probably be lower in energy. MODEL 2 is similar to MODEL 1, but a side chain optimization is performed on the loop region prior to the minimization. The rest of the loop structures are generated at various stages of the hierarchical loop prediction algorithm.
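For readers who want to pull the individual structures out of a ".decoys" file, here is a minimal Python sketch that groups coordinate lines by MODEL record. It relies only on the standard PDB MODEL/ENDMDL convention. The function name split_models is made up for this example, and the sketch has not been checked against every file in the set.

```python
def split_models(lines):
    """Group ATOM/HETATM lines by the number on the enclosing MODEL record.

    Returns a dict mapping model number -> list of coordinate lines.
    Lines outside any MODEL ... ENDMDL block are ignored.
    """
    models = {}
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            # The model number follows the record name, e.g. "MODEL        0".
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            models[current].append(line.rstrip("\n"))
    return models
```

Typical use would be something like `with open(path) as handle: models = split_models(handle)`, where `path` points at one of the ".decoys" files. After that, `models[0]` holds the full native structure, `models[1]` and `models[2]` the minimized natives, and the remaining keys the generated loops.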
Other than MODEL 0, each MODEL record contains only the atoms in the loop region, for compactness. All atoms are represented, including side chains and hydrogens.
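Because only MODEL 0 holds the whole protein, comparing a decoy to the native means picking out the matching loop atoms from MODEL 0. The sketch below does this by matching on chain, residue number, insertion code, and atom name, then computes an RMSD without superposition. This is only an illustration under stated assumptions: that atom names and residue numbering in MODEL 0 agree with those in the other models, and that all models share the same coordinate frame. Restricting the comparison to backbone atoms (N, CA, C, O) is my choice for this example, and all function names are hypothetical.

```python
import math

BACKBONE = frozenset({"N", "CA", "C", "O"})  # atom selection is an assumption


def read_models(lines, atom_names=BACKBONE):
    """Return {model_number: {atom_key: (x, y, z)}} for the selected atoms.

    atom_key is (chain ID, residue number, insertion code, atom name),
    read from the standard fixed PDB columns.
    """
    models, current = {}, None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = models.setdefault(int(line[6:].split()[0]), {})
        elif record == "ENDMDL":
            current = None
        elif record == "ATOM" and current is not None:
            name = line[12:16].strip()
            if name in atom_names:
                key = (line[21], int(line[22:26]), line[26], name)
                current[key] = (float(line[30:38]),
                                float(line[38:46]),
                                float(line[46:54]))
    return models


def loop_rmsd(native, decoy):
    """RMSD over atoms present in both models, with no superposition."""
    shared = [key for key in decoy if key in native]
    if not shared:
        raise ValueError("no atoms in common between native and decoy")
    total = sum(
        (a - b) ** 2
        for key in shared
        for a, b in zip(native[key], decoy[key])
    )
    return math.sqrt(total / len(shared))
```

With `models = read_models(open_file)`, the call `loop_rmsd(models[0], models[n])` gives the backbone RMSD of decoy `n` to the native loop. Only atoms found in both models are counted, so the native residues outside the loop drop out automatically.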
Note that not all cases studied in the paper are included here. For the longer loops, only those from the "filtered" data sets are included. There are one or two others that also seem to be missing; I must have accidentally deleted them at some point. Still, there is an awful lot of data here.
If you use this loop decoy set, please reference
M. P. Jacobson, D. L. Pincus, C. S. Rapp, T. J. F. Day, B. Honig, D. E. Shaw, and R. A. Friesner. "A Hierarchical Approach to All-Atom Loop Prediction", Proteins, 55 (2004), pp. 351-367.
4 residue loops (~8 MB)
5 residue loops (~37 MB)
6 residue loops (~58 MB)
7 residue loops (~68 MB)
8 residue loops (~81 MB)
9 residue loops (~97 MB)
10 residue loops (~69 MB)
11 residue loops (~82 MB)
12 residue loops (~38 MB)
Perturbed loop test sets
Also available are the starting perturbed crystal structures used in the following publication. The test set contains 80 starting structures: 20 loops each of length 6, 8, 10, and 12. In each case, a single loop in a crystal structure is perturbed away from the native loop structure, and then all side chains outside the loop are placed in non-native local energy minima. The backbone outside the loop remains in the native configuration.
B. D. Sellers*, K. Zhu*, S. Zhao, R. A. Friesner, M. P. Jacobson. "Towards better refinement of comparative models: predicting loops in inexact environments", Proteins, 72 (2008) pp. 959-971. (*Joint first authors)
Partial antibody starting comparative models test set
Also available is a test set of partial antibody comparative models. The models were constructed from native crystal structures with the CDR loops "modeled in". The L1, L2, L3, H1, and H2 loops have been modeled using knowledge-based methods, while H3 has not been refined. These starting models are detailed in
B. D. Sellers, J. P. Nilmeier, and M. P. Jacobson. "Antibodies as a model system for comparative model refinement", Proteins, 78 (2010), pp. 2490-2505.
Download antibody test set (2.3 MB)