Lecture 3 - Nucleic Acids Lab Practicums
Isolating Evolutionarily Conserved Gene Sequences
Research Objective
The trypsin gene encodes a protease that is required for digestion of a mosquito's blood meal. If a female mosquito cannot digest the proteins in a blood meal, she cannot obtain the nutrients it provides, which would effectively prevent egg laying.
A graduate student reasons that isolation and characterization of the Aedes aegypti trypsin gene could lead to biochemical assays designed to develop mosquito trypsin inhibitors that might make useful pesticides. The Drosophila (fruit fly) trypsin gene has been cloned and is available. Based on the likely homology between the Fly and Mosquito trypsin genes, the graduate student wants to use a portion of the cloned Drosophila DNA as a molecular probe to isolate the Mosquito trypsin coding sequence from a Mosquito cDNA library.
Fruit Fly (Drosophila)
- lab model of multicellular organism
- genome has been sequenced
- trypsin gene has been cloned

Mosquito (A. aegypti)
- human pest (vector of yellow fever and dengue)
- few genes known or isolated
- trypsin gene structure is unknown
Research Materials
Orthologous genes are those genes that have the same function in two different organisms. For example, the Drosophila trypsin gene is the ortholog of the human trypsin gene, and presumably of the trypsin gene in mosquitoes. In contrast, paralogous genes are related genes in the same organism that perform similar functions and encode proteins with a high degree of amino acid sequence identity. The trypsin and chymotrypsin genes of Drosophila both encode serine proteases and are considered paralogous genes. Sequences from orthologous or paralogous genes can be used as nucleic acid probes to isolate new genes if the hybridization conditions are modified to tolerate some base-pair mismatches.
Amino acid sequence comparisons of Human Trypsin, Drosophila Trypsin and a Drosophila Chymotrypsin:
10 20 30 40 50 60
| | | | | |
HumanTr -----------------------------------------IVGGYNCEENSVPYQVSLN
FlyTryp MRSSIGLTGMAKTILHLFIGGIPPGKSELRSHCKAPTLDGRIVGGQVANIKDIPYQVSLQ
FlyChym -QADQPDLVYPEYYQQRSLYGLQSNFSGRR--------RARVVGGEDGENGEWCWQVALI
:*** : . :**:*
Prim.cons. M2222222222222222222G22222S22RSHCKAPTL22RIVGG333E3333PYQVSL3
70 80 90 100 110 120
| | | | | |
HumanTr SGY--HFCGGSLINEQWVVSAGHCYKS------RIQVRLGEHNIEVLEGN--EQFINAAK
FlyTryp RTY--HFCGGSLIAQGWVLTAAHCTEGSAIL--LSKVRIGSSRTSVG-G----QLVGIKR
FlyChym NSLNQYLCGAALIGTQWVLTAAHCVTNIVRSGDAIYVRVGDYDLTRKYGSPGAQTLRVAT
::**.:** **::*.** . **:*. * * :
Prim.cons. 33YNQHFCGGSLI33QWVLTAAHC3332222GD3I3VR3G33333V32G2PG2Q3333A3
130 140 150 160 170 180
| | | | | |
HumanTr IIRHPQYDRKTLNNDIMLIKLSSRAVINARVSTISLP--TAPPATGTKCLISGWGNTASS
FlyTryp VHRHPKFDAYTIDFDFSLLELEEYSAKNVTQAFVGLPEQDADISDGTPVLVSGWGNTQS-
FlyChym TYIHHNHNSQTLDNDIALLKLHGQAELRDGVCLVCLPARGVSHAAGKRCTVTGYRYMGE-
* :.: *:: *: *::* : . . : ** . : *. ::*: .
Prim.cons. 33RHP33D33TLDNDI3LLKL333A33N33V33V3LP223A33A3GT3CLVSGWGNT3SS
190 200 210 220 230 240
| | | | | |
HumanTr GADYPDELQCLDAPVLSQAKCEASYP--GKITSNMFCVG------FLEGGKDSCQGDSGG
FlyTryp AQETSAVLRSVTVPKVSQTQCTEAYGNFGSITDRMLCV-------ITEGGKDACQGDSGG
FlyChym AGPIPLRVREAEIPIVSDTECIRKVN---AVTEKIFILPASSFCAGGEEGHDACQGDSGG
. . :: * :*:::* :*..:: : * *:*:****.**
Prim.cons. A333P33LR3333P3VSQT3C333Y3NFG3IT33MFCV2ASSFCA33EGGKDACQGDSGG
250 260 270 280 290
| | | | |
HumanTr PVVCNG----QLQGVVSWGDGCAQKNKPGVYTKVYNYVKWIKNTIAANS-
FlyTryp PLAADG----VLWGVVSWGYGCARPNYPGVYSRVSAVRDWISSVSGI---
FlyChym PLVCQDDGFYELAGLVSWGFGCGRQDVPGVYVKTSSFIGWINQIISVNNL
*:..:. * *:**** **.: : **** :. **.. .
Prim.cons. PLVC3GDGFY3L3GVVSWG3GCAR3N3PGVY3KVS3333WI333I33N2L
This comparison was done using "CLUSTALW" alignments with a WWW-based application in Lyon, France. The catalytic triad (His-57, Asp-102, Ser-195) found in all serine proteases is highlighted in bold.
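The degree of conservation in an alignment like the one above is often summarized as "percent identity." The following is a minimal Python sketch, not the method used by CLUSTALW itself. It assumes one common convention: only columns where neither sequence has a gap are counted. Other conventions divide by the full alignment length and give different numbers. The function name `percent_identity` is hypothetical; the two strings are the final HumanTr and FlyTryp alignment block copied from above.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip any column that contains a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Final block of the alignment above (HumanTr and FlyTryp rows).
human = "PVVCNG----QLQGVVSWGDGCAQKNKPGVYTKVYNYVKWIKNTIAANS-"
fly = "PLAADG----VLWGVVSWGYGCARPNYPGVYSRVSAVRDWISSVSGI---"

print(f"HumanTr vs FlyTryp, final block: {percent_identity(human, fly):.1f}% identity")
```

The same function works on nucleotide alignments, since it only compares characters column by column.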
Protein structure of chymotrypsin showing the highly conserved amino acids in the catalytic triad

Gene structure and function studies often include the isolation and characterization of complementary DNA (cDNA), which is synthesized from mRNA using the viral enzyme reverse transcriptase (chapter 4). DNA sequences contained in cDNA clones represent the highly conserved protein coding sequence, also called the open reading frame (ORF) of the gene, and other non-coding sequences representing the 5' leader and 3' untranslated region (UTR) of the mRNA. For these studies, the graduate student has available a full-length Drosophila trypsin cDNA of 6.5 kb that contains both ORF and UTR sequences. Importantly, UTR sequences often contain repetitive sequence elements that are embedded in this "non-functional" region.
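To make the ORF/UTR distinction concrete, here is a minimal Python sketch that locates the longest ATG-to-stop open reading frame on the forward strand of a cDNA and treats whatever flanks it as 5' leader and 3' UTR. It is an illustration under simplifying assumptions (standard stop codons, forward strand only, first ATG in each frame opens the ORF), and the short sequence is made up for the example rather than taken from the Drosophila clone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand, or None."""
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or (end - start) > (best[1] - best[0]):
                    best = (start, end)
                start = None
    return best


# Hypothetical toy cDNA: 2-nt leader, a 3-codon ORF, 2-nt 3' UTR.
toy_cdna = "CCATGAAATAGGG"
orf = longest_orf(toy_cdna)
if orf is not None:
    start, end = orf
    print("5' leader:", toy_cdna[:start])
    print("ORF      :", toy_cdna[start:end])
    print("3' UTR   :", toy_cdna[end:])
```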
Purified mRNA from Drosophila and Aedes aegypti is available for preliminary studies, and a mosquito cDNA library with over a million recombinant phage has been purchased from commercial sources. Based on other paralogous trypsin genes, and the similarity in biochemical functions of the fly and mosquito trypsin genes, the student predicts that there is an 80% identity between the coding sequence (ORF) of the two genes. However, he also thinks that there are repetitive dinucleotide and trinucleotide sequences (ACACACAC and CAGCAGCAG) in his Drosophila trypsin cDNA clone.
Basic Strategy
1. Use restriction enzymes and Southern Blotting to prepare a fragment of the fly cDNA that contains no repetitive sequence.
2. Demonstrate that this fly fragment hybridizes to a unique mosquito mRNA transcript using Northern blotting.
3. Screen the mosquito cDNA library (chapter 4) and purify a candidate trypsin cDNA clone.
4. Determine the DNA sequence of this cloned gene by running Sanger dideoxynucleotide sequencing reactions in the lab using polyacrylamide gel electrophoresis and radioactive labeling.
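Step 1 is a bench experiment, but if the fly cDNA sequence is already in hand, a quick computer scan can suggest where simple repeats such as ACACACAC or CAGCAGCAG lie before restriction fragments are chosen. The Python sketch below is one possible way to do this, not part of the original strategy. The function names, the threshold of three copies, and the toy sequence are all assumptions made for illustration. Note that the pattern also flags long single-base runs as dinucleotide repeats.

```python
import re


def find_simple_repeats(seq: str, min_copies: int = 3):
    """Return (start, end, unit) for runs of a 2- or 3-nt unit repeated >= min_copies times."""
    # One captured unit followed by at least (min_copies - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{2,3})\1{%d,}" % (min_copies - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def longest_repeat_free(seq: str, min_copies: int = 3):
    """Return (start, end) of the longest stretch lying between detected repeat runs."""
    best = (0, 0)
    cursor = 0
    for start, end, _unit in find_simple_repeats(seq, min_copies):
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = end
    if len(seq) - cursor > best[1] - best[0]:
        best = (cursor, len(seq))
    return best


# Hypothetical toy sequence: an AC repeat, a unique stretch, a CAG repeat, a short tail.
toy = "ACACACACAC" + "GATTACAGGCTTAACGTCAGT" + "CAGCAGCAGCAG" + "TTGACC"

for start, end, unit in find_simple_repeats(toy):
    print(f"repeat of {unit} at {start}-{end}")
s, e = longest_repeat_free(toy)
print("longest repeat-free stretch:", toy[s:e])
```

A scan like this only nominates candidate regions. Whether a chosen fragment is actually free of sequences that cross-hybridize would still have to be shown experimentally, as in step 1.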
Comments
Isolation of related gene sequences by hybridization methods relies on sufficient complementarity to permit stable helix formation under the chosen conditions. If the hybridization conditions are too stringent (high temperature and low ionic strength), then related gene sequences with even a few mismatched base pairs may not be identified. In contrast, if the hybridization conditions are not stringent enough, then too many non-specific DNA duplexes will form and thus obscure the bona fide signal. Pretesting the hybridization conditions by Southern (or Northern) Blotting is one way to determine an optimal method. Note that identification of related gene sequences can also be done "in silico" (by computer) using bioinformatics if all required genomes or cDNA clones have been sequenced. Indeed, in the final step of the research plan, the DNA sequence of the candidate cloned gene will need to be compared to the sequence of the known fly trypsin gene using bioinformatic analyses.
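The effect of stringency can be roughed out numerically. The Python sketch below uses a widely quoted empirical approximation for the melting temperature (Tm) of long DNA:DNA hybrids, together with the rule of thumb that each 1% of mismatched bases lowers Tm by roughly 1 °C. Neither the formula nor its coefficients come from this lecture, so treat them as assumptions, and the probe length, GC content, and salt concentration in the example are hypothetical. Only the 20% mismatch figure is tied to the text: it corresponds to the 80% identity the student predicts.

```python
import math


def tm_estimate(na_molar: float, gc_percent: float, length_bp: int,
                formamide_percent: float = 0.0, mismatch_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a long DNA:DNA hybrid (empirical rule, assumed here)."""
    tm = (81.5
          + 16.6 * math.log10(na_molar)   # lower salt -> lower Tm (more stringent)
          + 0.41 * gc_percent             # GC-rich duplexes are more stable
          - 0.61 * formamide_percent      # formamide destabilizes the duplex
          - 500.0 / length_bp)            # short duplexes are less stable
    return tm - 1.0 * mismatch_percent    # rule of thumb: ~1 deg C per 1% mismatch


# Hypothetical probe: 500 bp, 50% GC, hybridized in 0.3 M Na+.
perfect = tm_estimate(0.3, 50.0, 500)
cross_species = tm_estimate(0.3, 50.0, 500, mismatch_percent=20.0)  # 80% identity
print(f"perfectly matched duplex: about {perfect:.0f} C")
print(f"duplex with 20% mismatch: about {cross_species:.0f} C")
```

In this approximation a fly probe paired with an 80% identical mosquito sequence melts about 20 °C below the perfectly matched duplex, which is why hybridization and wash temperatures generally have to be lowered, or salt raised, for a cross-species screen.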
Nucleotide sequence comparison of the fly and mosquito cDNA clones in the region of the catalytic serine. This stretch of cDNA sequence represents the most highly conserved sequences between the two clones.

Amino acid sequence comparison of the Fly and Mosquito (Aedes) trypsin proteins based on the newly isolated cDNA. Note that many regions of similarity are found outside of the most conserved segment. These data would confirm that the candidate Mosquito cDNA clone encodes a protein that is most likely the trypsin ortholog.
10 20 30 40 50 60
| | | | | |
FlyTrp MRSSIGLTGMAKTILHLFIGGIPPGKSELRSHCKAPTLDGRIVGGQVANIKDIPYQVSLQ
AedTrp --------MFTSTVVFASLMALASAFPSLD----N----GRVVNGQTATLGQFPFQVLLK
::.*::. : .:... ..* **:*.**.*.: ::*:** *:
Prim.cons. MRSSIGLT2222T222222222222222L2SHCK2PTLDGR2V2GQ2A22222P2QV2L2
70 80 90 100 110 120
| | | | | |
FlyTrp RTYH----FCGGSLIAQGWVLTAAHCTEGSAILLSKVRIG--SSRTSVGGQLVGIKRVHR
AedTrp VELSQGRALCGGSLLSDQWVLTAGHCTDGAKSFEVTLGAVDFEDTTNDGRVVLTATEYHR
:*****::: *****.***:*: : .: .. *. * :: .. **
Prim.cons. 2222QGRA2CGGSL2222WVLTA2HCT2G22222222222DF222T22G222222222HR
130 140 150 160 170 180
| | | | | |
FlyTrp HPKFDAYTIDFDFSLLELEEYSAKNVTQAFVGLPEQDADISDGTPVLVSGWGNTQSAQET
AedTrp HEKYNPLFATNDVAVVKLPTPVAFNDRVQPVKLPTGSDTFTD-REVVVSGWGLQKNGGNV
* *::. *.::::* * * * ** . ::* *:***** :.. :.
Prim.cons. H2K22222222D22222L2222A2N22222V2LP2222222DG22V2VSGWG22222222
190 200 210 220 230 240
| | | | | |
FlyTrp SAVLRSVTVPKVSQTQCTEAYGNFGSITDRMLCVITEGGKDACQGDSGGPLAADG--VLW
AedTrp ADKLQYAPLTVISNNECSKAYSPL-VIKKTTLCAKGENKESPCQGDSGGPLVLEGENVQV
: *: ..:. :*:.:*::**. : *.. **. *. :..*********. :* *
Prim.cons. 222L22222222S222C22AY222G2I2222LC222E22222CQGDSGGPL222GENV22
250 260 270
| | |
FlyTrp GVVSWGYGCARPN-YPGVYSRVSAVRDWISSVSGI
AedTrp GVVSFGHAVGCEQGYPGAFARLTSFVDWIKQKTGL
****:*:. . : ***.::*:::. ***.. :*:
Prim.cons. GVVS2G2222222GYPG222R22222DWI2222G2
Prospective
It is likely that more than one cDNA clone will be identified in the library screening because of the overlapping nature of library clones. Restriction enzyme mapping (chapter 3) and DNA sequencing would need to be done to determine if a full-length clone had been isolated. If not, then additional library screening would be necessary with a 5' region of the newly isolated mosquito trypsin cDNA. Assuming that the research strategy was eventually successful, biochemical analyses of recombinant mosquito trypsin protein would be the next step toward the overall objective of finding enzyme inhibitors that could be used as target-specific pesticides.
How would restriction enzymes and Southern Blotting be used to prepare a segment of the fly trypsin cDNA that contained no mosquito repetitive sequences?
How might the Northern Blotting parameters need to be optimized if after the first attempt there is no clear indication that the Drosophila cDNA will be a useful probe to screen the Mosquito library?
How are radioisotopes used as a "label" in nucleic acid hybridization methods (Southern/Northern blots, library screening, etc.)? Assuming that multiple candidate mosquito cDNA clones hybridized to the fly probe, what would be the explanation for clones that gave a dark spot (highly radioactive) or a light spot (low amounts of radioactivity) on the x-ray film?
Based on the cDNA sequence comparison shown above for the two cDNA clones in the vicinity of the catalytic serine, what is the degree of sequence similarity expressed in "percent identity?" How do you explain the observation that the amino acid sequence is identical in some parts of this sequence even though there are mismatched nucleotides?
Biochemical studies had previously shown that the enzymatic activity of Mosquito trypsin increases rapidly in the gut of females soon after a blood meal. What experiment could be done using the newly isolated Mosquito trypsin cDNA, and sample material obtained from a laboratory Mosquito colony, to determine if the observed increase in trypsin enzyme activity was due to an increase in transcription of the trypsin gene?
Mapping Transcriptional Start Sites
Research Objective
A molecular endocrinologist is interested in estrogen-regulated gene expression in mammary epithelial cells. She has recently isolated a nearly full-length cDNA clone and the corresponding 5' genomic DNA sequences for a gene that is induced 10-fold by estrogen treatment of human mammary epithelial cells in culture. Her research objective is to identify the 5' end of the gene transcript in order to facilitate future studies aimed at investigating estrogen-regulated expression of this gene in normal and tumorigenic mammary epithelial cells.
Research Materials
1. Based on the cDNA sequence, and the estimated size of the gene transcript from Northern blots, she predicts that the 5' end of the transcript is 50-150 nucleotides further upstream of the sequence in her longest cDNA clone.
2. A plasmid subclone of genomic DNA has been constructed which corresponds to a 2 kb region that overlaps with the most 5' cDNA sequence and therefore is likely to contain the gene promoter. An EcoRI site is located in the exon 1 coding sequence and can be used as a landmark.

3. An antisense oligonucleotide has been designed that is 24 nucleotides long and has a 3' end that is located 10 nucleotides downstream of the 5' terminus of the cloned cDNA fragment.
4. RNA from untreated and estrogen-treated mammary epithelial cells has been isolated and shown by Northern blots to contain a 10-fold difference in steady-state levels of the new gene transcript.
Basic Strategy
There are two methods commonly used to map the 5' end of mRNA transcripts when both the cDNA and genomic DNA corresponding to the 5' region of the gene have been cloned. The first method is called RNase mapping (we will talk about this later when we get to chapter 5). The second method is called primer extension, which utilizes an end-labeled oligonucleotide that serves as a primer for cDNA synthesis using the enzyme reverse transcriptase as outlined below. A third approach for transcript mapping is called S1 nuclease mapping, which is similar to RNase mapping except that a single-strand end-labeled DNA probe is used.

In the presence of this gene-specific primer, dNTPs and cellular RNA, reverse transcriptase synthesizes cDNA from any primer that is annealed to template RNA. The length of the longest end-labeled cDNA product should correspond to the total number of nucleotides between the 5' end of the primer on the antisense strand and the 5' terminus of the mRNA template. For both the RNase mapping and primer extension methods, the pattern of product formation from parallel reactions using RNA from either untreated or estrogen-treated cells would be used to confirm specific mapping of the estrogen-induced gene.
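The length relationship described above is simple arithmetic and can be sketched in a few lines of Python. The helper names and the six-base template below are hypothetical, and positions are counted from the 5' terminus of the mRNA (0-based). The one figure taken from this section is the pairing of a 24-nucleotide primer with a 126-nucleotide extension, which is mentioned in the questions at the end of the section.

```python
# RNA template base -> complementary DNA base
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_primer(mrna: str, start: int, length: int) -> str:
    """DNA oligo, written 5'->3', complementary to mrna[start:start + length]."""
    target = mrna[start:start + length]
    if len(target) != length:
        raise ValueError("primer site runs past the end of the template")
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return target.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def extension_product_length(start: int, length: int) -> int:
    """Full-length product: primer plus every template base 5' of the primer site."""
    return start + length


# Hypothetical six-base template, primed across its whole length.
print(antisense_primer("AUGGCC", 0, 6))

# A 24-nt primer whose 3' end sits 126 nt from the mRNA 5' terminus.
print(extension_product_length(126, 24), "nt expected for the longest labeled product")
```

Because the label is on the 5' end of the primer, every extended molecule carries it, and the longest labeled species reports the distance back to the 5' terminus of the template.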
Comments
The predicted outcome from the primer extension assay is that reverse transcriptase (RNA-dependent DNA polymerase) will use the oligonucleotide as a primer only on specific complementary mRNA transcripts. The enzyme will extend the DNA primer by synthesizing cDNA until the template terminates and the enzyme falls off. Since the oligonucleotide primer was radioactively labeled at the 5' end, the extended products will also be labeled. Following the in vitro reaction, the cDNA products are loaded onto a polyacrylamide gel and resolved by electrophoresis. By using different sources of mRNA for the priming reaction, a single cDNA product should be identified that is specific to the priming event and associated with estrogen treatment of mammary cells. Results from this experiment are shown below.

By determining the length of the extended cDNA product, it is possible to predict the location of the 5' end of the transcript on the genomic DNA segment.

These data suggest that the gene promoter (RNA polymerase initiation site) lies 128 bp upstream of the EcoRI site in the genome.

Prospective
Once the 5' end of a gene transcript is localized within the context of a genomic sequence, it becomes possible to functionally test for gene promoter activity using sequences within the first ~200 nucleotides upstream of the transcriptional start site. In this example, the researcher could construct a reporter plasmid (chapter 4) which contains the putative estrogen-regulated promoter and test its activity in normal and tumorigenic mammary cells that have been treated with estrogens. Subsequent promoter mapping experiments could then be done to determine if this gene is a primary target of estrogen action, which may be important to understanding its regulation in normal and neoplastic mammary cells. In vitro transcription studies could also be performed to map the transcriptional start site. This would be done using truncated versions of the cloned genomic DNA as a template in reactions containing nuclear cell extracts and [α-32P]UTP.
What conditions are needed in the initial primer annealing reaction in order to maximize the specificity of the reaction? What is a limitation of the reverse transcriptase reaction that may affect the specificity of the priming reaction?
Explain why no product is observed in lanes 1 and 2 of the autoradiograph shown above. Why were these reactions performed? What would it mean if there were products in lane 2?
Why is the amount of product in lane 4 greater than in lane 3? Why is there less of the labeled primer in lane 4 as compared to lane 3? What might you conclude if the intensity of the 150 nucleotide band were the same in lanes 3 and 4?
Why is the distance between the promoter and the EcoRI site shown on the summary map as 128 nucleotides even though the primer was only extended 126 nucleotides? Could the extended cDNA product be digested with EcoRI enzyme following the extension reaction? Explain.
Why did the oligonucleotide primer lack the 5' terminal ten nucleotides present in the cDNA clone? Why is this important for follow-up experiments that would include isolation of a double-strand cDNA clone corresponding to the extended product?
Department of Biochemistry & Molecular Biophysics, The University of Arizona. Professor Roger L. Miesfeld, RLM@u.arizona.edu. © 2000. All rights reserved.