What are PDB files?
The Protein Data Bank contains coordinates for most of the proteins and nucleic acids whose structures are known. Most such coordinates are experimentally determined using X-ray crystallography or NMR spectroscopy. Chapter 17 of the supplemental text Introduction to Protein Structure (Branden & Tooze) contains an introduction to structure determination methods.
A PDB file contains the position of every atom in the structure as well as some information about the molecule. The position is listed relative to an origin on a three-dimensional grid. For example, an atom with position X = 1.0, Y = 0.0, and Z = 0.0 is 1.0 angstroms from the origin along the X axis. A second atom at position (1.0, 2.0, 2.0) is 3.0 angstroms from the origin (sqrt(1² + 2² + 2²) = sqrt(9)), assuming the angles between the three axes are 90 degrees (not always true in crystal-derived structures), and sqrt(8.0), or about 2.83, angstroms away from the first atom (sqrt(0² + 2² + 2²)). A program like RasMol displays the relative position of each atom and links together those atoms within covalent bonding distance of one another (e.g. 1.54 angstroms for carbon-carbon covalent bonds).
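The distance arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name distance and the variable names are chosen here for illustration, and it assumes orthogonal axes, as the paragraph does.

```python
import math


def distance(a, b):
    """Straight-line distance between two (x, y, z) points, in angstroms.

    Assumes the three axes are at 90 degrees to one another.
    """
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


origin = (0.0, 0.0, 0.0)
atom1 = (1.0, 0.0, 0.0)
atom2 = (1.0, 2.0, 2.0)

print(distance(origin, atom1))  # 1.0
print(distance(origin, atom2))  # 3.0
print(distance(atom1, atom2))   # sqrt(8.0), about 2.83
```

The same function applies to any pair of atoms read from a PDB file, since the coordinates there are also given in angstroms.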
PDB files are text files and can be viewed in any editor. At the top of each file is the name of the compound, information about the compound such as bound ligands and the sequence of the molecule, and references to journal articles discussing the compound. Following this is information about each atom in the structure. Generally, the atoms are arranged in sequence so that the first amino acid in the first chain of a protein is listed first, and the last amino acid in the last chain is listed last. For example, the first two atoms in the PDB file used in the first exercise are:
ATOM 1 N MET 1 40.184 17.101 24.260 1.00 50.62
ATOM 2 CA MET 1 38.989 16.442 23.757 1.00 49.62
In the first line, the first entry (ATOM) says this line contains atom coordinates. The second says it is atom number 1. The third says the atom is a nitrogen. The fourth and fifth indicate the atom belongs to a methionine, which is the first amino acid in the sequence. The next three numbers are the X, Y, and Z coordinates of the atom in angstroms. The last two numbers are the occupancy and the temperature factor. The occupancy can range from 0 to 1.0, that is, the atom is present at this position in 0 to 100% of the copies of the protein that made up the crystal used to determine the structure. The temperature factor is a measure of variability in position for the atom in the crystal. Values below about 20 for the temperature factor indicate little motion for the atom, while values above about 35 indicate a lot of motion. With a temperature factor of 50.62, this atom was not very well ordered in the crystal.
In the second line, we see that the second atom in the structure is part of the first residue, is a carbon (CA stands for carbon alpha), and, with a temperature factor of 49.62, is also poorly ordered in the crystal.
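The field-by-field reading above can be sketched in Python. The names Atom, parse_atom_line, and mobility are hypothetical, chosen for this illustration. The sketch splits on whitespace, which works for the two lines exactly as shown here; real PDB files place each field in fixed columns and often carry extra fields such as a chain identifier, so a robust reader would slice by column position instead. The cutoffs of 20 and 35 are the approximate values given in the text, not hard limits.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int          # atom number
    name: str            # atom name, e.g. N or CA
    residue: str         # residue name, e.g. MET
    residue_number: int  # position of the residue in the sequence
    x: float             # coordinates in angstroms
    y: float
    z: float
    occupancy: float     # 0 to 1.0
    temp_factor: float   # temperature factor


def parse_atom_line(line: str) -> Atom:
    """Parse an ATOM line laid out like the two example lines above.

    Assumption: exactly ten whitespace-separated fields, no chain identifier.
    """
    fields = line.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError("not an ATOM line in the simple layout assumed here")
    return Atom(
        int(fields[1]), fields[2], fields[3], int(fields[4]),
        float(fields[5]), float(fields[6]), float(fields[7]),
        float(fields[8]), float(fields[9]),
    )


def mobility(temp_factor: float) -> str:
    """Rough label using the approximate cutoffs from the text (20 and 35)."""
    if temp_factor < 20:
        return "little motion"
    if temp_factor > 35:
        return "a lot of motion"
    return "intermediate"


n = parse_atom_line("ATOM 1 N MET 1 40.184 17.101 24.260 1.00 50.62")
ca = parse_atom_line("ATOM 2 CA MET 1 38.989 16.442 23.757 1.00 49.62")

for atom in (n, ca):
    print(atom.name, atom.residue, atom.residue_number, mobility(atom.temp_factor))

# Distance between the two atoms, in angstroms (roughly 1.45).
bond = math.sqrt((n.x - ca.x) ** 2 + (n.y - ca.y) ** 2 + (n.z - ca.z) ** 2)
print(round(bond, 2))
```

Both atoms are labeled "a lot of motion", matching the reading above. The two atoms come out about 1.45 angstroms apart, which is within covalent bonding distance, so a viewer such as RasMol would draw a bond between them.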
Most of the PDB files we use in this course will be provided locally. If there is some other structure you would like to see, go to the Protein Data Bank page and grab it!
Bioc/MCB568 Home Page
http://www.biochem.arizona.edu/classes/bioc568/bioc568.htm
Last modified October 2, 2006
All contents copyright © 2007 All rights reserved.