Proteins: Primary Sequence



Reading - Chapter 5
Practice problems - Chapter 5, #15, 17;
simple heptapeptide sequence problem; Proteins extra problems

Levels of Structure 

The function of a protein can only be understood in terms of its structure.  The three dimensional structures of many proteins have been determined and from these structures a few general principles can be derived.  Protein structure is discussed in terms of four levels of organization:

  • Primary Structure is the amino acid sequence of a protein's polypeptide chain(s).  Every protein has a unique amino acid sequence.
  • Secondary Structure is the local spatial arrangement of the polypeptide backbone, giving rise to recurring structural patterns, ignoring the conformation of the individual sidechains (R groups).
  • Tertiary Structure is the three dimensional structure of the entire polypeptide, including conformations of side chains.
  • Quaternary Structure refers (only in proteins that are composed of two or more polypeptide chains, called subunits) to the three dimensional spatial arrangement of the subunits.
  • (See Lehninger Principles of Biochemistry, Fig. 5-16 and related text material.)

 Primary Structure 

 

[Figure: primary structure of bovine insulin]

 

  • This is the primary structure of bovine insulin, which is composed of two polypeptide chains (A and B).  The two chains are joined by two interchain disulfide bonds; the A chain also contains an intrachain disulfide bond.

 

  • Determining the amino acid sequence of a protein used to be a very laborious and time-consuming process involving chemical and enzymatic degradation. 
  • Today, the amino acid sequence of a protein is usually deduced from the nucleotide sequence of its gene, a relatively simple and rapid process.
  • Comparing the amino acid sequences of the same protein from many sources, e.g., cytochrome c, shows that some amino acid residues are conserved in all of the proteins, whereas others are not. 
  • Such an analysis provides valuable information about amino acid residues that may be essential for a protein's function.
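The step from gene to protein sequence mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, reads a single reading frame from the first base, and the DNA string `hypothetical_gene` is made up for the example (it is not a real gene).

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence, invented for this example.
hypothetical_gene = "ATGAAACGTACCTATCAGCCGTAA"
print(translate(hypothetical_gene))
```

A real gene would also require finding the correct reading frame and, in eukaryotes, removing introns, which this sketch ignores.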

 

The importance of amino acid side chains: Real Life Example - sickle cell hemoglobin

  • Hemoglobin is the oxygen transport protein in blood. 
  • It is a tetramer containing two α and two β chains (Hemoglobin). 
  • Hemoglobin exists in two states: an oxy form and a deoxy form. 
  • Several hundred mutant hemoglobins are known to exist.  In most, a single amino acid replacement occurs in either the α or β chain of normal Hb A. 
  • Many of these changes cause no known effect, but several lead to pathologies associated with abnormal O2 transport. 
  • In sickle cell hemoglobin, Hb S, Glu at position 6 of the β chain is replaced by Val. This seemingly innocuous change places a hydrophobic side chain on the surface of the protein.   In the deoxy conformation the Val side chain of a β chain in one Hb molecule binds to the β chain of another Hb molecule.  This leads to polymer formation and precipitation of the deoxy Hb, which in turn leads to red cell lysis and anemia (Hemoglobin S).
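To make the size of this single change concrete, the sketch below applies the Glu to Val replacement to the first eight residues of the mature human β chain and compares the two side chains on the Kyte-Doolittle hydropathy scale. The scale is an outside assumption brought in for illustration (it is not part of these notes), and the helper name `substitute` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the two residues involved
# (more positive = more hydrophobic). Only the needed entries are listed.
KYTE_DOOLITTLE = {"E": -3.5, "V": 4.2}


def substitute(sequence, position, new_residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


# First eight residues of the mature beta chain of normal Hb A.
hb_a = "VHLTPEEK"
hb_s = substitute(hb_a, 6, "V")

change = KYTE_DOOLITTLE[hb_s[5]] - KYTE_DOOLITTLE[hb_a[5]]
print("Hb A:", hb_a)
print("Hb S:", hb_s)
print(f"Hydropathy change at position 6: {change:+.1f}")
```

The large positive shift is one simple way to see why a charged, surface-friendly Glu and a hydrophobic Val behave so differently on the protein surface.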

 

Amino Acid Composition

 
  • The amino acid composition is a fundamental characteristic of any protein.
  • Hydrolysis of the protein in acid releases the amino acids, which are then quantified using ion exchange chromatography in an automated amino acid analyzer. 
  • The amino acid peaks can be detected using ninhydrin, which reacts with the free amino groups of amino acids to produce a purple color, or by reaction with reagents that generate fluorescent derivatives, permitting detection of much smaller amounts of each amino acid.
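As a minimal sketch of what a composition analysis reports, the code below counts each residue in a peptide and expresses it as a mole percent. The peptide is hypothetical, and the function name `composition` is an assumption for this example; a real analyzer measures peak areas rather than reading a known sequence.

```python
from collections import Counter


def composition(sequence):
    """Map each residue (one-letter code) to (count, mole percent)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


# Hypothetical decapeptide, invented for this example.
peptide = "GAVLKGAEFG"
for residue, (count, percent) in composition(peptide).items():
    print(f"{residue}: {count:2d}  {percent:5.1f} mol %")
```

Note that the composition says nothing about the order of the residues, and that the experiment is less informative than the string: acid hydrolysis converts Asn and Gln to Asp and Glu and largely destroys Trp.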

  

  Amino Acid Sequence 

  • The amino acid sequence of each protein is unique, and determination of the amino acid sequence is an important part of characterizing proteins.  Today, most protein amino acid sequences are deduced from the sequences of their genes, because sequencing DNA is much easier than sequencing proteins. 
  • However, determination of protein sequences is still an important tool in Biochemistry. We use an automated process based on the Edman reaction, which removes one residue at a time from the amino terminus, and chromatographic techniques to identify the PTH derivative of each released amino acid. 
  • Although these reactions proceed in > 90% yield at each step, eventually (after about 25-75 cycles) it becomes difficult to detect the newly released product.  Thus a single series of Edman degradation reactions is not able to determine the entire sequence of a protein. 
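The limit on the number of Edman cycles follows from simple arithmetic: if every cycle has the same fractional yield y, the fraction of chains still "in register" after n cycles is y^n. The sketch below tabulates this under that simplifying assumption (identical, independent yield at every step); the function name is hypothetical.

```python
def cumulative_yield(step_yield, cycles):
    """Fraction of chains that have reacted correctly at every one of `cycles` steps."""
    return step_yield ** cycles


for step_yield in (0.90, 0.95, 0.98):
    row = ", ".join(
        f"{cycles} cycles: {cumulative_yield(step_yield, cycles):6.1%}"
        for cycles in (10, 25, 50, 75)
    )
    print(f"step yield {step_yield:.0%} -> {row}")
```

At 90% per step only about 7% of the chains are still in register after 25 cycles, which is why even a small improvement in step yield extends the readable length considerably.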
 

  • What is needed are smaller fragments, with new amino termini, which can be individually purified and sequenced.  This is accomplished by cleaving the protein with a proteolytic enzyme, such as trypsin, or a chemical reagent such as cyanogen bromide, which generates a set of peptides, fragments of the original protein, that can be separated and sequenced. 
    • Trypsin cleaves peptide bonds on the carboxyl side of Lys or Arg residues, as illustrated below.

    • Chymotrypsin cleaves peptide bonds on the carboxyl side of Phe, Trp or Tyr residues, but also sometimes on the carboxyl side of other hydrophobic amino acids, e.g. Val, Leu, Ile, or Met.
    • Other proteases have different specificities.
    • Cyanogen bromide cleaves on the carboxyl side of Met residues, but the chemistry of the cleavage converts the Met residue at the C-terminus of the new peptide to a derivative that is converted by acid hydrolysis to homoserine (R group is -CH2-CH2-OH) rather than Met, so the amino acid composition of the new peptide would show homoserine.
    • There are thus a variety of ways to fragment the protein under investigation to determine the sequences of manageable-size peptides.
  • The problem, of course, is that once the proteolysis has been accomplished and the peptides separated and sequenced, you don't know how they are ordered in the original protein.  Reestablishing the order is the big problem in protein sequencing. The method is like solving a puzzle -- the sequences of the families of peptides obtained from two different cleavage methods are examined for OVERLAPS. For an example, see simple practice problem for sequence of a heptapeptide, and also the strategy for sequencing the B chain of insulin.
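The cleavage rules and the overlap puzzle described above can be sketched as follows. This is a simplified illustration, not a full sequencing strategy: each reagent is reduced to "cut after these residues" (ignoring exceptions such as trypsin's poor cleavage before Pro and chymotrypsin's occasional secondary sites), the heptapeptide is hypothetical, and the ordering is found by brute force, which only works for a handful of peptides.

```python
from itertools import permutations

# Residues after which each reagent cuts (simplified specificities).
TRYPSIN = "KR"
CHYMOTRYPSIN = "FWY"
CYANOGEN_BROMIDE = "M"


def cleave(sequence, residues):
    """Cut on the carboxyl side of every listed residue; return the fragments in order."""
    fragments, start = [], 0
    for i, amino_acid in enumerate(sequence):
        if amino_acid in residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def order_peptides(peptides_a, residues_b, peptides_b):
    """Return each ordering of peptides_a whose digest with reagent B gives peptides_b."""
    target = sorted(peptides_b)
    solutions = set()
    for order in permutations(peptides_a):
        candidate = "".join(order)
        if sorted(cleave(candidate, residues_b)) == target:
            solutions.add(candidate)
    return sorted(solutions)


# Hypothetical heptapeptide, used only to generate the two peptide families.
unknown = "AKFGRWM"
tryptic = cleave(unknown, TRYPSIN)
chymotryptic = cleave(unknown, CHYMOTRYPSIN)

print("tryptic peptides:     ", tryptic)
print("chymotryptic peptides:", chymotryptic)
# In the laboratory the peptides come out unordered, so sort them first.
print("consistent orderings: ", order_peptides(sorted(tryptic), CHYMOTRYPSIN, chymotryptic))
```

Only one arrangement of the tryptic peptides reproduces the chymotryptic set here, which is the overlap argument in miniature: each chymotryptic peptide spans a junction between two tryptic peptides.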

  

Mass Spectrometry 

  • Recently mass spectrometry has become an important technique in peptide/protein chemistry.  Mass spectrometers consist of three basic parts:

 

  • An ion source that creates charged molecules in the gas phase.
  • A mass analyzer that uses a physical property, e.g., time-of-flight (TOF), to separate ions.
  • A detector.
  • Two important methods are used to create protein ions:
    • In matrix-assisted laser desorption ionization (MALDI) ions are created by using a laser to excite proteins in a crystalline matrix.  MALDI is particularly suited for determining the molecular weight of proteins, often to accuracies of a few parts per million.  The spectrum shown above illustrates the molecular masses of several peptides in a mixture. 
    • In electrospray ionization (ESI) ions are created by applying a potential to a flowing liquid.  This causes the liquid to spray and protein ions to be created.  This method can also be used to measure molecular weight, but is most powerful when used in tandem MS/MS.
  • A tandem mass spectrometer combines two mass analyzers with a method to energetically activate ions. In the first mass analyzer a particular ion is isolated from all other ions that enter (as marked above). That ion is then dissociated, and the m/z values of the dissociation products are determined in the second mass analyzer. The dissociation process causes covalent bonds to fragment.  In the case of peptide ions, fragmentation predominates at or around the amide bond, creating a ladder of ions that is indicative of an amino acid sequence, as illustrated below.
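The way a fragment ladder encodes sequence can be sketched numerically. Assuming cleavage at each amide bond and singly charged fragments, the N-terminal fragments (commonly called b ions) and C-terminal fragments (y ions) form two series in which neighboring peaks differ by the mass of one residue. The code below uses standard monoisotopic residue masses; the peptide and the function names are hypothetical, and real spectra contain many additional peaks that this sketch ignores.

```python
# Monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """m/z of singly charged N-terminal fragments b1 .. b(n-1)."""
    total, ions = 0.0, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide):
    """m/z of singly charged C-terminal fragments y1 .. y(n-1)."""
    total, ions = 0.0, []
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total + WATER + PROTON)
    return ions


def read_ladder(mz_values, tolerance=0.01):
    """Assign residues to the mass differences between neighboring ladder peaks."""
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        step = high - low
        matches = sorted(r for r, m in RESIDUE_MASS.items() if abs(m - step) <= tolerance)
        calls.append("/".join(matches) if matches else "?")
    return calls


peptide = "SAVEK"  # hypothetical peptide for illustration
print("b ions:", [round(mz, 3) for mz in b_ions(peptide)])
print("y ions:", [round(mz, 3) for mz in y_ions(peptide)])
print("read from b ladder:", read_ladder(b_ions(peptide)))
print("read from y ladder:", read_ladder(y_ions(peptide)))
```

The b ladder reads the sequence from the N-terminus and the y ladder from the C-terminus. Leu and Ile have identical masses, so a step of 113.084 is reported as "I/L" and cannot be resolved by mass alone.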

Sequence Homology

  • Once the amino acid sequence of a protein has been determined, there are powerful computer programs that can be used to determine whether the sequence is similar to those of other proteins (if you are interested, go to this web site to see some of the tools available for proteomics).  Such a search might give the results shown below.

#1 MKRTYQPNRRKRSKVHGFRARMSTKNGRKVLARRRRKGRKVLSA
#2 MKRTWQPSKLKHARVHGFRARMATKNGRKVIKARRAKGRVRLSA
#3 MKRTYQPSRVKRNRKFGFRARMKTKGGRLILSRRRAKGRMKLTV
#4 MKRTFQPSILKRNRSHGFRTRMATKNGRYILSRRRAKLRTRLTV
#5 MKRTYQPSKQKRNRTHGFRARMATKNGRQVLNRRRAKGRKRLTV
#6 TKRTFQPNNRRRARKHGFRARMRTRAGRAILSARRGKNRAELSA
#7 SKRTFQPNNRRRAKTHGFRLRMRTRAGRAILANRRAKGRASLSA
#8 GKRTFQPNNRRRARVHGFRLRMRTRAGRSIVSDRRRKGRRTLTA
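Conserved residues can be picked out of such a set mechanically. The sketch below takes the eight sequences above, treats them as already aligned (they are all the same length and no gaps are introduced, which is an assumption a real alignment program would not make), and marks the columns where every sequence has the same residue. Function names are hypothetical.

```python
SEQUENCES = [
    "MKRTYQPNRRKRSKVHGFRARMSTKNGRKVLARRRRKGRKVLSA",  # 1
    "MKRTWQPSKLKHARVHGFRARMATKNGRKVIKARRAKGRVRLSA",  # 2
    "MKRTYQPSRVKRNRKFGFRARMKTKGGRLILSRRRAKGRMKLTV",  # 3
    "MKRTFQPSILKRNRSHGFRTRMATKNGRYILSRRRAKLRTRLTV",  # 4
    "MKRTYQPSKQKRNRTHGFRARMATKNGRQVLNRRRAKGRKRLTV",  # 5
    "TKRTFQPNNRRRARKHGFRARMRTRAGRAILSARRGKNRAELSA",  # 6
    "SKRTFQPNNRRRAKTHGFRLRMRTRAGRAILANRRAKGRASLSA",  # 7
    "GKRTFQPNNRRRARVHGFRLRMRTRAGRSIVSDRRRKGRRTLTA",  # 8
]


def conserved_positions(sequences):
    """1-based positions at which every sequence has the same residue."""
    return [
        i + 1
        for i, column in enumerate(zip(*sequences))
        if len(set(column)) == 1
    ]


def conservation_line(sequences):
    """String with '*' under fully conserved columns and ' ' elsewhere."""
    return "".join(
        "*" if len(set(column)) == 1 else " " for column in zip(*sequences)
    )


for number, sequence in enumerate(SEQUENCES, start=1):
    print(f"#{number} {sequence}")
print("   " + conservation_line(SEQUENCES))
print("conserved positions:", conserved_positions(SEQUENCES))
```

Columns marked with an asterisk are candidates for residues essential to the protein's function, in the sense discussed under Primary Structure.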

 

 

  • The degree of identity between the sequences can be used to construct a distance matrix, which indicates how closely related the different sequences are.  Here is one for  cytochrome c from a variety of species.

  • Based on such a distance matrix, one can then construct a phylogenetic tree, as illustrated here for cytochrome c.
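A distance matrix of this kind can be sketched directly from the eight homologous sequences listed earlier in this section. The example assumes the sequences are already aligned without gaps and defines distance as one minus the fraction of identical positions; this is one simple choice among several, and it is not necessarily the measure used for the cytochrome c matrix. Function names are hypothetical.

```python
SEQUENCES = {
    "#1": "MKRTYQPNRRKRSKVHGFRARMSTKNGRKVLARRRRKGRKVLSA",
    "#2": "MKRTWQPSKLKHARVHGFRARMATKNGRKVIKARRAKGRVRLSA",
    "#3": "MKRTYQPSRVKRNRKFGFRARMKTKGGRLILSRRRAKGRMKLTV",
    "#4": "MKRTFQPSILKRNRSHGFRTRMATKNGRYILSRRRAKLRTRLTV",
    "#5": "MKRTYQPSKQKRNRTHGFRARMATKNGRQVLNRRRAKGRKRLTV",
    "#6": "TKRTFQPNNRRRARKHGFRARMRTRAGRAILSARRGKNRAELSA",
    "#7": "SKRTFQPNNRRRAKTHGFRLRMRTRAGRAILANRRAKGRASLSA",
    "#8": "GKRTFQPNNRRRARVHGFRLRMRTRAGRSIVSDRRRKGRRTLTA",
}


def fraction_identical(a, b):
    """Fraction of aligned positions at which two sequences have the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def distance_matrix(sequences):
    """Square matrix of (1 - fraction identical) in the order of the dict keys."""
    names = list(sequences)
    return [
        [1.0 - fraction_identical(sequences[row], sequences[col]) for col in names]
        for row in names
    ]


names = list(SEQUENCES)
matrix = distance_matrix(SEQUENCES)
print("    " + " ".join(f"{name:>5}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>4}" + " ".join(f"{d:5.2f}" for d in row))
```

Tree-building methods then group the sequences with the smallest distances first, so that closely related sequences end up on neighboring branches.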

 

Genomics and Proteomics

  • There has been a great deal of effort directed towards determining the complete sequence of the human genome (genomics) and many other genomes (including yeast, and the fruit fly Drosophila melanogaster). Once the complete sequence is finished, an important issue looms: what to do with the data!  Being able to UNDERSTAND (and ultimately to make use of) the information in the DNA sequence requires figuring out what the proteins encoded by the genome are and what they do (proteomics).  In many cases we can deduce the nature of the protein product of a gene by homology to other proteins already sequenced, but in many other cases (maybe >30%), we have no clue.  We can use biotechnology techniques to produce the protein, which can then be purified and studied in order to try to deduce its function.  One important approach is to determine its three dimensional structure, which may give a clue to its function.  The future of protein biochemistry is indeed exciting!




Biochemistry 462a
http://www.biochem.arizona.edu/classes/bioc462/462a/462a.html
Department of Biochemistry and Molecular Biophysics
The University of Arizona
mawells@email.arizona.edu 
All contents copyright © 1998-2000. All rights reserved.
Last revision spring/summer 2000