Interpreting PDB files and relating them to papers
Biochemistry/MCB 568 -- Fall 2007
John W. Little, University of Arizona
When you read a paper and try to use the information to look at a PDB file, you need to be able to relate the residue numbers in the paper to those in the file.
Proteins: Generally, the residues in the PDB file are numbered the same way as in the paper, and these numbers refer to positions in the native protein. In most structures of protein-DNA complexes, the protein used for X-ray crystallography is not the intact protein but a fragment of it (intact proteins usually don't crystallize well; the catch-all explanation for this is that they are "floppy"). Nonetheless, the residues are numbered by their positions in the intact protein, not starting from 1 in the fragment. For instance, the glucocorticoid receptor fragment used for structure determination (1glu.pdb) comprises residues 440 to 525 of the intact protein, and these residues are numbered 440 to 525 in the file. You can find this information by looking at the PDB file with a text editor (see below).
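If you would rather not scroll through the whole file, a few lines of Python can report the first and last residue number of each chain. This is a minimal sketch, assuming standard fixed-column ATOM records (chain ID in column 22, residue number in columns 23-26); it ignores HETATM records, insertion codes and alternate locations. The function name and the toy records are made up for illustration and are not taken from 1glu.pdb.

```python
from collections import defaultdict

def residue_ranges(pdb_text):
    """Return {chain_id: (first_resseq, last_resseq)} from ATOM records.

    Assumes fixed PDB columns: chain ID in column 22 and residue number
    in columns 23-26, i.e. Python slices line[21] and line[22:26].
    """
    numbers = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):          # skips HETATM (water, ligands)
            chain = line[21]
            resseq = int(line[22:26])
            numbers[chain].add(resseq)
    return {chain: (min(nums), max(nums)) for chain, nums in numbers.items()}

# Hypothetical, truncated ATOM records mimicking a fragment numbered 440-525.
toy = """\
ATOM      1  N   ALA A 440
ATOM      2  CA  ALA A 440
ATOM      3  N   LYS A 441
ATOM      4  N   GLY A 525
"""
print(residue_ranges(toy))  # {'A': (440, 525)}
```

With a real file you would pass the file's contents, e.g. `residue_ranges(open("1glu.pdb").read())`, assuming the file is in the current directory.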
If there are several subunits of the same protein, every subunit uses the same numbering scheme, even when different subunits have different residues visible in the structure (as is the case with lambda repressor, for which the arm is visible in only one subunit).
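Because the numbering is shared, the same residue number means the same position in every subunit, so you can compare subunits directly by number. The sketch below lists residue numbers modelled in one chain but missing from another. It assumes fixed-column ATOM records; the function names and the toy records (chains A and B) are hypothetical.

```python
from collections import defaultdict

def residues_by_chain(pdb_text):
    """Map chain ID -> set of residue numbers that have ATOM records."""
    chains = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chains[line[21]].add(int(line[22:26]))   # chain ID, residue number
    return chains

def only_in(chains, a, b):
    """Residue numbers present in chain a but absent from chain b."""
    return sorted(chains[a] - chains[b])

# Hypothetical records: residues 1-3 are visible in chain A, only 3 in chain B.
toy = """\
ATOM      1  CA  SER A   1
ATOM      2  CA  THR A   2
ATOM      3  CA  LYS A   3
ATOM      4  CA  LYS B   3
"""
chains = residues_by_chain(toy)
print(only_in(chains, "A", "B"))  # [1, 2]
```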
DNA: Here the case is more complicated. Co-crystal structures always contain synthetic DNA oligonucleotides. The numbering scheme for these DNA molecules isn't standardized; each paper uses its own nomenclature, and these vary a lot. A typical example is lambda repressor, for which the paper uses a nomenclature like position 1 or 1' to refer to base pairs, not individual bases. Other papers might call a base 1 and its complement 1' or -1. The PDB files generally don't follow these conventions. Moreover, the PDB files themselves number the bases in several different ways. Examples:
For lambda repressor (1lmb.pdb), one strand is numbered 1 to 21 and the other 22 to 42. In this case, you can identify a base uniquely by its number; you don't need to specify the strand (using something like *:2).
For glucocorticoid receptor (1glu.pdb), the bases on one strand are numbered from -10 to 9, and those on the other strand are also numbered from -10 to 9. In this case, select dna and -10 gives you two residues, not one; you would have to say select dna and -10 and *:c to get the one in strand c.
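One way to tell which of these two styles a file uses is to ask whether any DNA residue number occurs on more than one strand. This is a minimal sketch, assuming fixed-column ATOM records and assuming DNA residues are named A, C, G, T or DA, DC, DG, DT (older files may use other names, so check yours). An empty result corresponds to the 1lmb.pdb style, where a number alone is enough; a non-empty result corresponds to the 1glu.pdb style, where you must also give the strand. The toy records and chain IDs are hypothetical.

```python
from collections import defaultdict

# Assumed DNA residue names; extend this set if your file uses others.
DNA_RESIDUE_NAMES = {"A", "C", "G", "T", "DA", "DC", "DG", "DT"}

def shared_dna_numbers(pdb_text):
    """Residue numbers used by DNA residues on more than one chain."""
    owners = defaultdict(set)   # residue number -> chain IDs using it
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[17:20].strip() in DNA_RESIDUE_NAMES:
            owners[int(line[22:26])].add(line[21])
    return sorted(n for n, chains in owners.items() if len(chains) > 1)

# Hypothetical 1lmb-like numbering: strands numbered 1-21 and 22-42.
# The protein residue numbered 1 is ignored because it is not DNA.
lambda_style = """\
ATOM      1  P    DA A   1
ATOM      2  P    DT B  22
ATOM      3  CA  SER L   1
"""
# Hypothetical 1glu-like numbering: both strands numbered -10 to 9.
gr_style = """\
ATOM      1  P    DC C -10
ATOM      2  P    DG D -10
"""
print(shared_dna_numbers(lambda_style))  # []
print(shared_dna_numbers(gr_style))      # [-10]
```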
To determine how the PDB file for your DNA numbers the bases, open the file with a text editor or word processing program and look at it. You don't need to save the file, but if you do, save it as plain text, not in your word processor's own format, because RasMol won't be able to read that format. It might be helpful to print the file out if it's not too long.
To establish the correspondence between the paper's nomenclature and that of the PDB file, compare the DNA sequence given in the paper with the sequence in the PDB file, and make a table showing how the two numbering schemes relate. This table lets you translate the positions named in the paper into the residue numbers (and, if needed, strands) that RasMol needs to identify the bases. Most papers have a figure showing the sequence of the oligonucleotide used; you can photocopy this figure and write on the copy. Don't write in the journal.
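The table-making step can also be sketched in code: locate the paper's sequence on one PDB strand, then pair each of the paper's labels with a chain and residue number. This is a sketch under stated assumptions: you have already chosen the matching strand, both sequences are written 5' to 3', and each PDB base is given as one letter (for example the last character of a residue name such as DC). The function name, labels and residues are hypothetical. If the paper's sequence occurs more than once on the strand, `str.find` returns only the first match, so check the result by eye.

```python
def correspondence_table(paper_labels, paper_seq, pdb_strand):
    """Pair each paper label with (chain, residue number) in the PDB file.

    paper_labels: the paper's names for the bases of one strand, 5'->3'
    paper_seq:    that strand's sequence as printed in the paper, 5'->3'
    pdb_strand:   [(chain, resseq, base), ...] for the same strand, in file order
    """
    if len(paper_labels) != len(paper_seq):
        raise ValueError("need exactly one label per base")
    pdb_seq = "".join(base for _, _, base in pdb_strand)
    offset = pdb_seq.find(paper_seq)          # first match only
    if offset < 0:
        raise ValueError("paper sequence not found on this PDB strand")
    matched = pdb_strand[offset:offset + len(paper_seq)]
    return {label: (chain, resseq)
            for label, (chain, resseq, _) in zip(paper_labels, matched)}

# Hypothetical strand C numbered from -10, and a paper that labels GACA as 1-4.
strand_c = [("C", -10, "T"), ("C", -9, "G"), ("C", -8, "A"),
            ("C", -7, "C"), ("C", -6, "A")]
table = correspondence_table(["1", "2", "3", "4"], "GACA", strand_c)
print(table)  # {'1': ('C', -9), '2': ('C', -8), '3': ('C', -7), '4': ('C', -6)}
```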
http://www.biochem.arizona.edu/classes/bioc568/bioc568.htm
Last modified October 2, 2006
All contents copyright © 2007. All rights reserved.