Davidson College: WWW Homepage Template for Bio304

Test #2 Answers

Question #7

Part A: You work in a lab that studies Okazaki fragments. Find the Rasmol image for Taq DNA polymerase and put it on your web page so that I can click on it from your main page.

To view a three-dimensional image of DNA polymerase from the species Thermus aquaticus, click below.

Rasmol Image of Taq DNA polymerase (source: National Center for Biotechnology Information. 14 November 1997. Entrez Structure Query. <http://www.ncbi.nlm.nih.gov/Structure> Accessed: 20 March 1998.)

Part B: You work for Monsanto and have obtained a peptide sequence (IEESQFAIVVFSENY) from a plant protein. Search Genbank and tell me two things:

What is the name of the full-length protein and what species is it from?

TMV resistance protein N -- tobacco.
Nicotiana glutinosa

Are there any proteins from other species that have a high degree of sequence similarity? Explain your answer.

Resistance protein RPP5
Arabidopsis thaliana

Explanation:

To answer this question, I ran a blastp search of Genbank using the given peptide sequence (referred to in the rest of this answer as the input: IEESQFAIVVFSENY). The search identified the source of the peptide (or at least a protein whose amino acid sequence contains the input sequence exactly). It also returned a related protein, the resistance protein RPP5 from the organism Arabidopsis thaliana.

The search program (blastp) is quite complicated. For a more detailed description of the algorithm and the rules for searching, click here. Basically, the program compares the input sequence against the protein sequences in Genbank and looks for matches. If a perfect match is found, the input sequence most likely came from the matched protein. Often, however, the overlap between the input sequence and the protein(s) found in the search is not perfect. In that case, the program reports related sequences: sequences that contain most of the input sequence (the same amino acids in essentially the same order) but with some amino acids substituted, missing, or added. The program then scores each match (this is where it gets quite complicated, and I am not sure that I understand all of the details). At a very basic level, the score is based on the amount of overlap, the likelihood that any changes reflect divergence from a common ancestral sequence, and other related factors.
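The idea of looking for the stretch of a database protein that best matches the input can be sketched in a few lines of Python. This is only a brute-force illustration, not how blastp actually works (the real program uses a much faster word-seeding strategy). The function name is hypothetical, the subject segment is the Arabidopsis stretch reported later in this answer, and the flanking residues, which are not given in the search output, are represented by X, the standard code for an unknown amino acid.

```python
def best_ungapped_window(query, subject):
    """Slide the query along the subject and return (offset, identities)
    for the window with the most identical residues. No gaps are allowed."""
    best_offset, best_identities = 0, -1
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        identities = sum(q == s for q, s in zip(query, window))
        if identities > best_identities:
            best_offset, best_identities = offset, identities
    return best_offset, best_identities


QUERY = "IEESQFAIVVFSENY"
# Flanking residues are unknown here, so they are written as X.
SUBJECT = "XXXX" + "IKESSIAVVVFSENY" + "XXXX"

offset, identities = best_ungapped_window(QUERY, SUBJECT)
print(offset, identities)  # 4 11
```

A window with 15 of 15 identities would be a perfect match; here the best window has 11 of 15, which is what makes it a related sequence instead of the source of the peptide.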

The higher the score, the better the match. In the case of the sequence for this test, a perfect match was found between the input sequence and the TMV resistance protein N from the organism Nicotiana glutinosa. In other words, the input sequence appears exactly (same amino acids in the same order) in a portion of this protein.

Another protein, the resistance protein RPP5 from the organism Arabidopsis thaliana, also matched, but not as precisely. This protein had many of the same amino acids in the same order, but with some substitutions. Therefore, there was not a perfect match in this case. A score of 57 for this protein suggests, however, that there is a high degree of sequence similarity between the two proteins in this region.

The following is the information that the Genbank search provided me about the resistance protein from Arabidopsis thaliana. As you can see, the score was 57 (remember, the larger the positive score, the better the match). Also, the amino acids from the input and the resistance protein were almost the same, in the same order. Therefore, this is a good match.

Genbank Information on the Resistance Protein RPP5 from Arabidopsis thaliana

gnl|PID|e1250343 (AL021768) resistance protein RPP5 - like [Arabidopsis thaliana]

Length = 1715

Score = 57 (26.2 bits), Expect = 2.1, P = 0.88
Identities = 11/15 (73%), Positives = 13/15 (86%)

Query:    1 IEESQFAIVVFSENY 15
            I+ES  A+VVFSENY
Sbjct: 1418 IKESSIAVVVFSENY 1432
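The numbers in this report can be reproduced by hand. The sketch below assumes the search used the BLOSUM62 substitution matrix (the report does not say which matrix was used) and includes only the matrix entries needed for this one alignment. The function names are hypothetical. Identical residues are copied to the middle line, substitutions with a positive matrix score are marked "+", and the remaining positions are left blank.

```python
# Subset of BLOSUM62 (assumed matrix) covering only the residue pairs
# that occur in this alignment.
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("E", "K"): 1, ("E", "E"): 5, ("S", "S"): 4,
    ("Q", "S"): 0, ("F", "I"): 0, ("A", "A"): 4, ("I", "V"): 3,
    ("V", "V"): 4, ("F", "F"): 6, ("N", "N"): 6, ("Y", "Y"): 7,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def summarize(query, sbjct):
    """Return (raw score, identities, positives, middle line) for two
    aligned sequences of equal length with no gaps."""
    raw = identities = positives = 0
    midline = []
    for q, s in zip(query, sbjct):
        score = pair_score(q, s)
        raw += score
        if q == s:
            identities += 1
            positives += 1
            midline.append(q)
        elif score > 0:
            positives += 1
            midline.append("+")
        else:
            midline.append(" ")
    return raw, identities, positives, "".join(midline)


QUERY = "IEESQFAIVVFSENY"
SBJCT = "IKESSIAVVVFSENY"
print(summarize(QUERY, SBJCT))  # (57, 11, 13, 'I+ES  A+VVFSENY')
```

Under that assumption the per-position scores add up to 57, with 11 identities and 13 positives out of 15, which agrees with the values in the report.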

This information was obtained through a Genbank search: National Center for Biotechnology Information. 14 November 1997. Entrez Protein Query. <http://www.ncbi.nlm.nih.gov/Web/Search/index.html> Accessed: 20 March 1998.
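The Expect and P values in the report above appear to be linked by the usual Poisson relation, P = 1 - e^(-Expect). The short check below assumes that relation applies to this report; the function name is hypothetical.

```python
import math


def p_from_expect(expect):
    """Probability of seeing at least one chance match, given the
    expected number of chance matches (Poisson assumption)."""
    return 1.0 - math.exp(-expect)


print(f"{p_from_expect(2.1):.2f}")  # 0.88
```

Expect is the number of matches scoring at least this well that would be expected by chance in a database of this size, so it is worth reading alongside the raw score when judging how strong a match is.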

Send comments, questions, and suggestions to:

mjayellis@aol.com