Lecture 7 - Genomics and Genome Mapping
Revised at 9:05 AM
Monday, September 17, 2001
- Updated Fall 2001 material
- Added information on E. coli O157:H7




Overview of Genome Organization

The field of molecular life science is changing rapidly because of the genomic revolution. High-throughput DNA sequencing and the development of algorithms to analyze the vast amounts of data have led to the elucidation of entire genome sequences.

Before we look at the methods used to analyze the genome of an organism, let's get an overview of how sequencing the human genome has changed biomedical science.



The public side of the human genome sequencing effort was coordinated by the National Institutes of Health and involved a consortium of international researchers.


Online resources for the public human genome data can be accessed through the National Center for Biotechnology Information (NCBI) web site.


The commercial side of the human genome sequencing effort is based at Celera Genomics in Rockville, Maryland.


Tools for discovering new gene functions through the Celera web site give a taste of the New Biology.





E. coli K-12 Genome

The E. coli K-12 genome sequence was reported several years ago and revealed a compact arrangement of genes separated by only ~120 bp on average.







DNA sequence analysis predicts that the E. coli K-12 genome contains 4,288 genes, of which ~3,200 have predicted functions.
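Gene counts like this come from computational scans of the sequence for open reading frames (ORFs). The sketch below is a toy illustration only, not the method used by the E. coli annotation teams: the function name, the example sequence, and the minimum-length cutoff are all hypothetical, and it looks only at the three forward reading frames.

```python
# Toy ORF finder: scans the three forward reading frames only.
# Real gene prediction also checks the reverse strand, codon usage,
# ribosome binding sites, and similarity to known proteins.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for ATG...stop ORFs (end exclusive).

    min_codons counts every codon from ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ATG
    return sorted(orfs)

# Hypothetical 19-nt sequence containing one short ORF.
example = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

An ORF found this way is only a candidate gene; assigning a predicted function requires further evidence, such as similarity to a protein of known function.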


The E. coli Wisconsin group maintains a web site with current information on the genome analysis.




Microbial Genomes - Human Pathogens

The genomes of numerous microbial organisms, primarily human pathogens, have also been sequenced. Most of these have been done by The Institute for Genomic Research (TIGR). These microbial databases have allowed researchers to compare pathogenic and nonpathogenic organisms to identify targets for disease treatment.



An example of a pathogenic bacterium is Escherichia coli O157:H7, a major food-borne pathogen that causes diarrhea, haemorrhagic colitis, and haemolytic uremic syndrome. This sequence has been completed by the Genome Information Research Center, Osaka University.






What types of genes would you expect to be shared between E. coli K-12 and Helicobacter pylori, and what genes might be unique to H. pylori?


What might explain the observation that bacterial genes with a related function are often clustered together in the genome?


How might knowing the complete sequence of E. coli K-12 be useful in developing a rapid detection test for pathogenic strains of E. coli that have been found in contaminated food products?



Saccharomyces cerevisiae Genome

Sequencing of the yeast genome required a large cooperative effort because the yeast genome is roughly 3 times the size of the E. coli genome. All 12,068 kb of the yeast sequence are available on the internet for gene analysis.
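As a quick check on the size comparison, the ratio can be worked out directly. The yeast figure is the one quoted above; the E. coli K-12 figure of about 4,639 kb is not stated in these notes and is supplied here as an assumption based on the published K-12 sequence.

```python
# Genome sizes in kilobases (kb).
YEAST_GENOME_KB = 12_068      # value quoted in the lecture notes
ECOLI_K12_GENOME_KB = 4_639   # assumption: published K-12 size, not given in these notes

ratio = YEAST_GENOME_KB / ECOLI_K12_GENOME_KB
print(f"Yeast genome is about {ratio:.1f} times the size of the E. coli K-12 genome")
```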



The yeast Saccharomyces cerevisiae contains 16 linear chromosomes, each with a centromere and multiple sites of replication initiation.








The genomic organization of the yeast genome can be accessed through the internet and used to study specific regions of the genome. Here is the region of the yeast genome containing the LEU2 gene. As with the bacterial genome, the yeast genome research teams maintain an Internet site for updates.









What is the difference between using classical genetics and "reverse" genetics in the context of yeast-based studies?


How might the strategy of reverse genetics be used to discover the function of newly discovered yeast ORFs?




The genomes of three other important model organisms have also been completely sequenced: Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana.



Homo sapiens Genome

Human genome sequence analysis reached a first stage of completion in the summer of 2000, as reported by the New York Times:




The complete stories from the two competing groups were published simultaneously in February 2001, with the NIH consortium data published in Nature and Celera's data published in Science.

There is too much data to digest in one sitting, but here are some of the highlights:


1. Timeline of Genome Sequencing Efforts







2. Summary of Genome Analyses to date







3. Sequencing strategies used by the NIH consortium and by Celera



NIH strategy (Top Down)









Celera Strategy (Bottom Up)
Developed by Gene Myers, UA Computer Science Dept.

The key to the strategy is shotgun cloning of random (not mapped) fragments of a defined length (2 kb or 10 kb) so that scaffolds can be assembled from the terminal sequences of each cloned insert.
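The idea of building scaffolds from the two end reads of each insert can be sketched in a few lines. This is a simplified illustration and not Celera's assembler: the function names, the contig labels, and the read placements are hypothetical, and a real assembler also uses read orientation and the known insert length (2 kb or 10 kb) to estimate the gap between contigs.

```python
from collections import Counter

def link_contigs(mate_pairs, min_links=2):
    """Count cloned inserts whose two end reads land in different contigs.

    mate_pairs: list of (contig_of_left_read, contig_of_right_read).
    Returns the contig pairs supported by at least min_links inserts.
    """
    counts = Counter((a, b) for a, b in mate_pairs if a != b)
    return {pair: n for pair, n in counts.items() if n >= min_links}

def build_scaffold(links, start):
    """Follow supported links from `start` to order contigs into a scaffold."""
    next_contig = {a: b for (a, b) in links}
    order, seen = [start], {start}
    while order[-1] in next_contig and next_contig[order[-1]] not in seen:
        nxt = next_contig[order[-1]]
        order.append(nxt)
        seen.add(nxt)
    return order

# Hypothetical data: each tuple is one cloned insert whose two terminal
# sequences were placed in the named contigs.
pairs = [("c1", "c1"), ("c1", "c2"), ("c1", "c2"), ("c2", "c3"),
         ("c2", "c3"), ("c2", "c3"), ("c1", "c3")]
links = link_contigs(pairs)
scaffold = build_scaffold(links, "c1")
print(links)
print(scaffold)
```

Requiring at least two independent inserts per link is one simple way to keep a single chimeric clone or a read misplaced in a repeat from joining unrelated contigs.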









4. High throughput DNA sequencing is entirely automated from colonies to computers.







5. The NIH consortium strategy was dependent on multiple clone overlaps.
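Once overlapping clones have been placed on a physical map, a small set that covers the region can be chosen for sequencing. The sketch below shows one common way to do this, a greedy choice of the clone that extends coverage furthest. It is an illustration under stated assumptions, not the consortium's actual software: the clone names and map coordinates are hypothetical.

```python
def tiling_path(clones, target_end):
    """Greedy tiling path covering map positions 0..target_end.

    clones: dict of clone name -> (start, end) positions in kb.
    Returns the chosen clone names in order, or None if a gap is found.
    """
    path, covered = [], 0
    remaining = dict(clones)
    while covered < target_end:
        # Candidates must overlap the end of the region covered so far.
        candidates = {name: span for name, span in remaining.items()
                      if span[0] <= covered < span[1]}
        if not candidates:
            return None  # no clone bridges this position
        best = max(candidates, key=lambda name: candidates[name][1])
        path.append(best)
        covered = candidates[best][1]
        del remaining[best]
    return path

# Hypothetical mapped clones (positions in kb).
clones = {"cloneA": (0, 150), "cloneB": (100, 260), "cloneC": (120, 300),
          "cloneD": (280, 450), "cloneE": (290, 400)}
path = tiling_path(clones, target_end=450)
print(path)
```

If any position is covered by no clone, the function returns None, which mirrors why this strategy depends on having multiple overlapping clones across every region.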









6. As much as 45% of the 3 billion nucleotides in the human genome are repetitive.








7. The functional diversity of human genes is similar to that of other species; there are just more of them.









8. Human and mouse genes have similar DNA sequence but not chromosomal location.









9. DNA sequence analysis showed that most human genes have orthologs in other species (they are not unique to humans).












In what way does having access to the complete sequence of the human genome change the way we perform experiments in the lab? Give an example of the "old" way and the Y2K way.


How do genomics companies plan to make money with the information they have gathered?



If you were the CEO of a large genomics company such as Celera, what would you tell your stockholders about how the company plans to protect its investment, considering that the NIH has released a public version of the human genome sequence?



Experimental Approaches to Genome Analysis


The human genome contains 23 pairs of chromosomes, which can be visualized by staining mitotic cells.






Mammalian genes are spread across large regions of the genome because of intronic and regulatory sequences. For example, the human glucocorticoid receptor gene is ~100 kb long.






The challenge for researchers in the field of human molecular genetics is to relate genetic data collected from individuals with inherited disease to the physical map of the human genome. This is the classic problem of diseases without genes and genes without function. Two approaches are primarily used: linkage mapping and radiation hybrid mapping. Both methods depend on having a biochemical or molecular genetic assay available to infer the genotype.
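Linkage mapping rests on a simple calculation: the fraction of offspring that carry a recombinant combination of two markers estimates how far apart the markers are. A minimal sketch follows, using hypothetical offspring counts; the function name is illustrative, and the conversion of 1% recombination to about 1 centimorgan (cM) holds only for closely linked markers.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of scored offspring that carry a recombinant marker combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring scored")
    return recombinant / total

# Hypothetical counts from a two-marker cross.
rf = recombination_frequency(parental=180, recombinant=20)

# For closely linked markers, 1% recombination is roughly 1 cM.
map_distance_cm = 100 * rf
print(f"recombination frequency = {rf:.2f}, about {map_distance_cm:.0f} cM")
```

A rare recombinant class therefore indicates markers that lie close together, while a frequency approaching 0.5 is indistinguishable from independent assortment.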








How would you score for these genotypes, i.e., how would you know Abde/aBDE was rare?


What is it about yeast genetic analysis that makes genotyping easier than in Drosophila?


What is the difference between a recessive and dominant phenotype? Which is easier to score?









Alignment of the human genetic map with DNA-level physical maps is very challenging and is most often applied to areas of the human genome suspected to contain disease genes. This requires methods that take advantage of DNA polymorphisms to provide gene-independent markers that can be monitored in human samples using standard biochemical assays.

Two examples of this approach are restriction fragment length polymorphisms (RFLPs) and short tandem repeat polymorphisms (STRPs).


RFLPs can be monitored by Southern blotting. DNA extracted from related individuals is analyzed using a restriction enzyme (EcoRI) that distinguishes between two alleles (EcoRI+ and EcoRI-). Heterozygous individuals contain one copy of each allele (+/-), and homozygous individuals contain only one allele type (+/+ or -/-).
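The band patterns expected for the three genotypes can be worked out by cutting each allele in silico. This is a sketch under stated assumptions: the two 40-bp alleles are hypothetical, the function names are illustrative, and the model assumes the probe detects every fragment, whereas a real Southern blot shows only the fragments that hybridize to the probe.

```python
ECORI_SITE = "GAATTC"  # EcoRI cuts between the G and the AATTC

def digest(seq, site=ECORI_SITE, cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    cuts, pos = [], seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

def band_pattern(allele_1, allele_2):
    """Distinct band sizes expected for an individual carrying both alleles."""
    return sorted(set(digest(allele_1) + digest(allele_2)), reverse=True)

# Hypothetical alleles that differ by one base inside the EcoRI site.
plus = "A" * 14 + "GAATTC" + "T" * 20    # EcoRI+ : site present
minus = "A" * 14 + "GAGTTC" + "T" * 20   # EcoRI- : site destroyed

print("+/+", band_pattern(plus, plus))
print("+/-", band_pattern(plus, minus))
print("-/-", band_pattern(minus, minus))
```

The heterozygote shows the uncut band together with both cut fragments, which is what makes the three genotypes distinguishable on a blot.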





STRPs can easily be identified by PCR using sequence-specific primers that flank the polymorphic region. Since the number of tandem repeats can vary by a single repeat length (e.g., 150 bp), multiple polymorphisms can be detected with the same PCR primers (A, B, C).
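Because the primers sit at fixed positions, the PCR product length depends only on the number of repeats between them, so each band on the gel corresponds to one allele. The sketch below illustrates that relationship; the flank length, repeat unit size, and repeat counts assigned to alleles A, B, and C are hypothetical values chosen for illustration.

```python
def product_length(flank_bp, repeat_unit_bp, n_repeats):
    """PCR product length: fixed flanking sequence plus the repeat array."""
    return flank_bp + repeat_unit_bp * n_repeats

def call_genotype(bands, alleles):
    """Translate observed band lengths (bp) into an allele pair such as 'A/C'."""
    by_length = {length: name for name, length in alleles.items()}
    names = sorted(by_length[b] for b in set(bands))
    if len(names) == 1:
        names = names * 2  # a single band is scored as homozygous
    return "/".join(names)

# Hypothetical locus: 100 bp of primers plus flanking DNA, 2-bp repeat unit.
alleles = {
    "A": product_length(100, 2, 20),
    "B": product_length(100, 2, 25),
    "C": product_length(100, 2, 30),
}
print(alleles)
print(call_genotype([140, 160], alleles))
print(call_genotype([150], alleles))
```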






What is the value of having a molecular marker such as an RFLP or STRP? Is this considered a genotype or a phenotype?


What assumption must you make when analyzing human DNA samples using the STRP approach, i.e., what must be true about the tandem repeat pattern in order to interpret pedigree data using blood samples?


What would you conclude in the above example if the STRP pattern of one of the offspring were A/A?




Department of Biochemistry & Molecular Biophysics
The University of Arizona
Professor Roger L. Miesfeld
RLM@u.arizona.edu
© 2000. All rights reserved.