*This website was produced as an assignment for an undergraduate course at Davidson College.*

T-Cell Surface Glycoprotein CD4 Sequence and Orthologs

This web page examines the protein sequence of the T-Cell Surface Glycoprotein CD4 in Humans. Using databases and tools such as NCBI, BLAST, Entrez, OMIM, and PDB, the ortholog sequences of CD4 were found and compared. These CD4 orthologs, which are CD4 genes in different species that evolved from a common ancestor, can be used to understand which protein residues have been conserved across species. This web page illustrates the similarities and differences between the CD4 sequences of different species and shows which residues seem to be the most important for maintaining the structure and function of CD4.

The Sequence of CD4:

When finding and comparing the CD4 ortholog sequences, the Human protein sequence was used as the standard for the comparison. In Humans, the mature cDNA sequence of the T-Cell Glycoprotein CD4 is 1742 base pairs long, and the protein is 450 amino acids long. Figure 1 illustrates the Human CD4 protein sequence from amino acid 1 to 450.

 

Figure 1. Protein sequence of Human T-Cell Glycoprotein CD4. All 450 amino acids can be seen in this figure. Human CD4 was the basis for the crystal structure of the protein. This figure can be found in the NCBI protein database.

 

The Orthologs:

Once the protein sequence of CD4 was determined, a protein-protein BLAST was run to find Human CD4 orthologs that have retained similar protein sequences. The search yielded 553 hits. To compare only the orthologs that are quite similar, only the hits with a score greater than 200 were considered. Figure 2 below shows the hits that were obtained.

Figure 2. Hits from the BLAST analysis. Although there were 553 hits, only those with a score of 200 or higher are displayed. This ensures that the orthologs compared are quite similar to each other in protein sequence. Click here to see all 553 BLAST hits. *To view this web site you must copy and paste the id code 1110353629-1984-159526979426.BLASTQ2.
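
The score cutoff described for Figure 2 can be sketched in a few lines of Python. This is only an illustration: the `BlastHit` tuple, the accession strings, and the scores below are hypothetical placeholders rather than values from the actual BLAST output, and the cutoff is written as "200 or higher" to match the caption.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of a BLAST hit table (hypothetical, simplified)."""
    accession: str
    bit_score: float


def filter_hits(hits, min_score=200.0):
    """Keep hits scoring at least min_score, highest score first."""
    kept = [h for h in hits if h.bit_score >= min_score]
    return sorted(kept, key=lambda h: h.bit_score, reverse=True)


# Placeholder data for illustration only; these are not real BLAST results.
example_hits = [
    BlastHit("HYPOTHETICAL_A", 850.0),
    BlastHit("HYPOTHETICAL_B", 199.5),
    BlastHit("HYPOTHETICAL_C", 200.0),
    BlastHit("HYPOTHETICAL_D", 431.0),
]

for hit in filter_hits(example_hits):
    print(hit.accession, hit.bit_score)
```

Raising `min_score` narrows the comparison to closer relatives, while lowering it admits more distant sequences that may not be true orthologs.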

Now that the BLAST search has shown that many CD4 protein residues are maintained in other species, we can begin to see how closely related the species actually are. To begin, the SwissProt server was used to compare the CD4 sequences of many closely related species. Although the figure is quite extensive, Figure 3 clearly displays the similarities and differences in the protein sequences of the different species. Several of these species, such as Mus musculus (Mouse), Macaca mulatta (Rhesus Macaque, a small primate), and Pan troglodytes (Chimpanzee), will be examined further to see to what extent the protein residues are conserved. It should be interesting to see how similar the protein sequences are between Humans and our close relatives, the other primates. Similarly, the protein differences between Humans and other mammals will be analyzed. A quick glance at Figure 3 shows an extraordinarily large number of similarities between Humans (CD4_Human) and the other primates (CD4_Cerae, CD4_Macmu, etc.), showing that this gene has been conserved among these species.

 

Figure 3. Complete look at the protein sequences of CD4 as they occur in certain species. Human CD4 is the basis for comparison in this figure. One can see that there are many consistencies between humans and other primates, as well as quite a few similarities among mammals. Please Click Here to visit the original website and click on the different species symbols (e.g. AAQ03208) to see the different species evaluated in this figure.

After reviewing Figure 3, we see that there are many similarities between humans and other primates as well as between humans and other mammals. This does not come as a great shock considering how closely related humans are to the other primates, not to mention the fact that humans are mammals. More analysis is still needed, because some questions remain. For example, how similar are CD4 protein sequences in eukaryotes other than mammals? Do prokaryotes even have such a protein, and if so, how similar is it to Human CD4? To answer some of these questions, more BLAST analyses were run comparing Human CD4 against Mouse, C. elegans, Drosophila, Arabidopsis, Yeast, and E. coli, respectively.

Figure 4. Protein-Protein BLAST analysis of Human CD4 and Mouse CD4. The 66% similarity shows that parts of the protein sequence are similar between the two species. The Human CD4 is labeled Query, while the Mouse CD4 is labeled Sbjct. Figure courtesy of the NCBI database.
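
BLAST summarizes how alike two aligned sequences are with percentages. The simplest of these, percent identity, can be sketched as follows. This is a simplified illustration and not the calculation behind the similarity figure quoted in the caption: BLAST's "Positives" value also counts substitutions that score positively in a substitution matrix, which this sketch leaves out. The two aligned strings are made-up examples, not CD4 sequence.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of alignment columns in which both rows hold the same residue.

    Both strings must come from the same pairwise alignment, so they have
    equal length, with '-' marking gaps. Gap columns count toward the
    alignment length but never count as matches.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    if not query:
        raise ValueError("alignment is empty")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


# Made-up aligned fragments for illustration only (Query row, Sbjct row).
query_row = "MKT-LLVAGA"
sbjct_row = "MKSALLV-GA"
print(f"{percent_identity(query_row, sbjct_row):.1f}% identity")  # prints 70.0% identity
```

Here 7 of the 10 alignment columns match exactly, so the sketch reports 70.0%. A similarity percentage for the same alignment would be at least as high, because it also credits conservative substitutions.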

When Protein-Protein BLAST analyses were run between Human CD4 and Arabidopsis thaliana, E. coli, and Yeast, no similarity was found. This suggests that these species do not have orthologs of Human CD4. Of the remaining three species, Mus musculus (Mouse) (Figure 4), C. elegans (Worms) (Figure 5), and Drosophila (Flies) (Figure 6), only the mouse had significant similarities.

Figure 5. Protein-Protein BLAST analysis between Human CD4 and C. elegans. The low percentage of similarities illustrates that the protein residues were not thoroughly conserved. The Human CD4 is labeled Query, while the C. elegans sequence is labeled Sbjct. Figure courtesy of the NCBI database.

Figure 6. Protein-Protein BLAST analysis between Human CD4 and Drosophila. The low percentage of similarities illustrates that the protein residues were not thoroughly conserved. The Human CD4 is labeled Query, while the Drosophila sequence is labeled Sbjct. Figure courtesy of the NCBI database.

Now that the similarities between Human CD4 and other mammalian species have been affirmed and the differences from non-mammalian species illustrated, I thought it would be interesting to see how similar the Human CD4 protein is to that of another primate as well as to that of an arbitrarily chosen mammal, the Domestic Dog. Two more Protein-Protein BLAST analyses were run, one with the Chimpanzee CD4 protein sequence (Figure 7) and one with the Dog CD4 protein sequence (Figure 8).

Figure 7. BLAST analysis of Human CD4 and Chimpanzee CD4. The 91% similarity shows that most of the protein sequence is similar between the two species. The Human CD4 is labeled Query, while the Chimp CD4 is labeled Sbjct. Figure courtesy of the NCBI database.

Figure 8. BLAST analysis of Human CD4 and Dog CD4. The 68% similarity shows that parts of the protein sequence are similar between the two species. The Human CD4 is labeled Query, while the Dog CD4 is labeled Sbjct. Figure courtesy of the NCBI database.

The Conclusion:

With all of the data collected, we were able to determine that the CD4 gene has many mammalian orthologs (ranging from the Bottlenose Dolphin and the Beluga Whale to the Great Apes) but shares very few similarities with sequences from insects, worms, and bacteria. Figure 9 shows the amino acid residues that have been conserved. Each * in Figure 9 represents a conserved residue. Within the CD4 orthologs, the full conservation begins at amino acid 297 and spans about 74 amino acids, to about amino acid 371. Because of the heavy conservation in this relatively small area, we can conclude that these 74 amino acids are likely vital to the structure and function of the CD4 protein in mammals.

Figure 9. Protein sequence alignment showing which residues are conserved across the species. Each * indicates a residue that has been conserved. Each : represents a conserved substitution. Each . represents a semi-conserved substitution. The CD4 protein sequence is shown starting at amino acid 263 and ending at amino acid 383. Figure courtesy of http://us.expasy.org/sprot.
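
The three symbols in Figure 9 can be reproduced from an alignment with a short sketch like the one below. It assumes the figure follows the ClustalW consensus-line convention, and the residue groups in `STRONG_GROUPS` and `WEAK_GROUPS` are taken from that convention as an assumption; the tool that produced Figure 9 may use different groupings. The three aligned rows are made-up fragments, not CD4 sequence.

```python
# Residue groups assumed from the ClustalW consensus-line convention.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # a gap in any row means the column is not marked
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # conserved substitution
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # semi-conserved substitution
    return " "


def consensus_line(aligned_rows):
    """Build the symbol line printed beneath a multiple alignment."""
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned_rows))


# Made-up aligned fragments for illustration only.
rows = ["MKLSAW",
        "MRISGD",
        "MKVSSP"]
for row in rows:
    print(row)
print(consensus_line(rows))
```

A stretch of the consensus line that is almost entirely `*` corresponds to the kind of heavily conserved region described in the conclusion.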

When thinking about why the CD4 gene seems to be relatively conserved in mammals but not in insects, worms, or bacteria, we should look back at the function of CD4 and its purpose in our bodies. The fact that CD4 is a receptor protein on T-Cells is the biggest clue to why it is so conserved in mammals. All mammals have very complex immune systems, while insects, worms, and bacteria do not. The differences in immune systems help explain why primates, our close relatives, have sequences very similar to the Human sequence, while dogs have a much more divergent sequence.

References:

NCBI. 2005. <http://www.ncbi.nlm.nih.gov/> Accessed 8 March 2005.

SwissProt. 2005. Swiss-Prot protein knowledgebase. <http://us.expasy.org/sprot/> Accessed 8 March 2005.



Comments, Questions, Concerns? Email me at oshernandez@davidson.edu.