*This website was produced as an assignment for an undergraduate course at Davidson College.*


Orthologs of Reverse Transcriptase

Orthologs are genetic elements that are conserved across species and have evolved from a common ancestor (Sadava et al., 2008). Examining orthologs can reveal a great deal about the function of a particular protein and the importance of various domains within it (e.g., catalytic sites and DNA binding sites).

Reverse transcriptases belong to a larger family of genetic elements called retroelements. Retroelements are a class of transposable elements: gene sequences that can be transferred within the genome via an RNA intermediate (Nature Glossary Online). These retroelements have been observed in a wide variety of species, and their abundance across different genomes supports the idea that they evolved long ago. Reverse transcription is thought to have been integral to the conversion from RNA to DNA as the primary genetic material in most life forms, which was evolutionarily important because DNA is a much more stable molecule than RNA (Flavell 1995).



Image from Xiong and Eickbush 1990. Permission pending.
Figure 1. This figure depicts the proposed evolutionary history for the origin of various retroelements. The reverse transcriptase portion of each retroelement is depicted as solid shading.



Image from Xiong and Eickbush 1990. Permission pending.

Figure 2. Phylogenetic tree depicting reverse transcriptase-like (RTL) elements found in different species. More than 19 different species that contain RTLs are depicted in this tree (not including the wide diversity of reverse transcriptases among the different types of retroviruses themselves). Of the big seven, human, mouse, Drosophila, Arabidopsis, yeast and E. coli are represented in this tree. It is interesting to note that different types of retroelements are found within the same species (e.g., both LTR and non-LTR retrotransposons are found in D. melanogaster).

Conservation of Reverse Transcriptase in Various Species:

Conservation Among Retroviruses:
Five retroviruses’ reverse transcriptase sequences were compared and 10 residues out of 94 were invariant among all five viruses (Toh et al., 1983). This level of conservation is significant, because the viruses surveyed were very different from each other. In addition, these sequences were compared against polymerases in other species, and the level of conservation observed between the five retrovirus reverse transcriptases compared in this paper was not seen in these latter comparisons. Thus, the homology observed between the reverse transcriptases in these five viruses is unique to the reverse transcriptase protein. In some of the subsequent figures, the sequences from these retroviruses are compared to putative retroelements in other species in order to determine whether or not the protein is a reverse transcriptase.
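The idea of counting invariant residues in an alignment can be illustrated with a minimal Python sketch. The sequences below are short, made-up fragments used only for illustration; they are not the sequences analyzed by Toh et al. (1983), and the function name `invariant_columns` is hypothetical.

```python
def invariant_columns(aligned_seqs):
    """Return 0-based column indices where every aligned sequence has the same residue.

    Columns consisting of gap characters ('-') are not counted as invariant.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned_seqs}
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


# Toy, invented alignment of five fragments (assumption: for illustration only).
toy_alignment = [
    "LPQGYVDDLL",
    "LPQGFVDDIL",
    "MPQGYMDDIL",
    "LPEGYADDLL",
    "LPQGYVDDML",
]

invariant = invariant_columns(toy_alignment)
print(f"{len(invariant)} of {len(toy_alignment[0])} positions are invariant: {invariant}")
```

With a real alignment, the same count would give the "invariant residues out of total positions" figure reported for the five retroviruses.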


Image from Toh et al., 1983. Permission pending.

Figure 3. This figure shows the alignment of putative reverse transcriptase regions in five different viruses. The dots indicate positions that are invariant among the five viruses. Boxes indicate residues that have conservative amino acid substitutions.

Drosophila melanogaster:

D. melanogaster has several components of its genome that have reverse transcriptase-like elements (Figure 2). The copia transposable element is one of the most well-studied RTLs in this organism. One of the most highly conserved regions of the reverse transcriptase enzyme occurs between amino acid residues 1019-1024 in the copia element (Figure 4). A second highly conserved region was found between amino acids 1087-1094 in the copia ORF (Figure 4). This second highly conserved domain contains the YXDD box that is the active site for reverse transcriptase. In addition, the function of this element as a reverse transcriptase is strongly supported because it contains a DNA sequence that could be a tRNA primer-binding site, which is crucial for reverse transcription to take place (Mount and Rubin 1985).


Image from Mount and Rubin 1985. Permission pending.

Figure 4. Homology between the copia element of D. melanogaster and reverse transcriptases found in retroviruses. Stars indicate homology and homology was deemed significant when a particular residue was shared by three or more of the proteins being compared. The red box (not present in the original figure) indicates the YXDD box that is the reverse transcriptase active site.

Saccharomyces cerevisiae:
The yeast genome has a group of transposable elements called Ty elements (Clare and Farabaugh 1985). The Ty912 element contains two open reading frames (ORFs): tya912 and tyb912. Tya912 shows distinct homology to DNA binding proteins (Clare and Farabaugh 1985), which is important for the transposable element to move within the genome. The tyb912 protein shows sequence homology to both reverse transcriptase (amino acid residues 306-359) and DNA polymerase (amino acid residues 835-1046; Clare and Farabaugh 1985). In addition, there is a high level of homology between this Ty element and the copia element found in D. melanogaster (Mount and Rubin 1985; Figure 5). The Ty element has a slightly different RT active site than the one found in D. melanogaster: tyrosine is changed to phenylalanine to give FVDD instead of YVDD. This lack of conservation between the first two amino acids of the YXDD box is seen in other species and will be discussed further later.
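A search for YXDD-like boxes that tolerates the tyrosine-to-phenylalanine change described above can be sketched in Python with a regular expression. This is only an illustrative sketch: the pattern `[YF][A-Z]DD`, the function name `find_yxdd_boxes`, and the toy sequence are assumptions, not the method used in the cited papers.

```python
import re

# Assumed pattern: Y or F, then any residue, then two aspartic acids.
# The lookahead lets overlapping candidate boxes be reported as well.
YXDD_LIKE = re.compile(r"(?=([YF][A-Z]DD))")


def find_yxdd_boxes(protein):
    """Return (1-based start position, matched box) for each YXDD-like box."""
    return [(m.start() + 1, m.group(1)) for m in YXDD_LIKE.finditer(protein.upper())]


# Made-up peptide containing one YVDD box and one FVDD box (not a real protein).
toy_protein = "MKLYVDDAAGFVDDKL"
for position, box in find_yxdd_boxes(toy_protein):
    print(position, box)
```

Restricting the first position to Y or F is a deliberately narrow choice; a broader search would relax that character class.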


Image from Mount and Rubin 1985. Permission pending.

Figure 5. This is a sequence comparison of the copia element found in D. melanogaster and the Ty912 ORF in S. cerevisiae. Stars indicate regions of amino acid homology. The red box (not present in the original figure) depicts the location of the YXDD box.

Chlamydomonas reinhardtii:
Reverse transcriptase-like (RTL) elements are found within rRNA genes in the mitochondrial genome of C. reinhardtii (Boer and Gray 1988). The C. reinhardtii RTL protein was compared to mitochondrial plasmids (discussed further later) found in Neurospora crassa, which are closely related to reverse transcriptases of retroviruses (Boer and Gray 1988). Of the nine amino acid residues that are invariant among viral reverse transcriptases, the RTL protein in C. reinhardtii contains at least five (Figure 6). However, the YXDD box in the RTL element of C. reinhardtii is not highly conserved: the second D is changed to N (Figure 6). Although this is a conservative amino acid substitution (both aspartic acid and asparagine are hydrophilic), the two aspartic acid residues of the YXDD box are rarely substituted with other amino acids in the other species studied. Thus, further characterization of this element is necessary to determine whether it is a true homolog of reverse transcriptase or simply a common RNA binding protein that is the result of convergent evolution (Boer and Gray 1988).


Image from Boer and Gray 1988. Permission pending.

Figure 6. This diagram depicts sequence homology between the RTL element of C. reinhardtii and the Mauriceville plasmid in N. crassa. The residues that are invariant among the five viral reverse transcriptases (Toh et al., 1983) are indicated by filled triangles and deviations from this pattern are indicated by open triangles. The boxed regions are regions of homology that were previously identified via comparisons between mitochondrial ORFs and reverse transcriptases. The red box (not in the original figure) indicates the location of the YXDD box.

Arabidopsis thaliana
Ta1 is a transposable element found in the plant A. thaliana. Although this retrotransposon is present within this organism, it is believed to no longer be functional (Voytas et al., 1990). This element could have become inactive within the genome because a) it failed to propagate itself enough to offset non-conservative mutations and/or b) it was detrimental to the fitness of A. thaliana (Voytas et al., 1990). The figure below shows a sequence comparison between the RNA binding domain of the Ta1 element and reverse transcriptases from retroviruses and from the Tnt1 element of tobacco. Conservation of important residues in the RNA binding domain is crucial, because if the RNA cannot bind to the reverse transcriptase, then this protein cannot do its job. There is a high degree of amino acid conservation at the boxed residues in all of the Ta1 elements except for Ta1-3 (Figure 7), which could have contributed to the loss of function of this genetic element in A. thaliana.


Image from Voytas et al., 1990. Permission pending.

Figure 7. Sequence comparison in the RNA binding domain between the Ta1 elements, HIV-1 reverse transcriptase, other reverse transcriptases from retroviruses (RSV and MMULV), and the Tnt1 element from the N. tabacum plant. Conserved residues are boxed and the non-conserved amino acid substitution at Ta1-3 is shown in bold.

msDNA in bacteria (M. xanthus and E. coli)
There are certain chromosomal loci within bacterial genomes called retrons that produce msDNA (multi-copy single stranded DNA) and reverse transcriptase. The reverse transcriptase portion of the retron is translated first, and this protein is then responsible for creating a DNA copy of the msDNA mRNA (Rice and Lampson 1995). The precise function of this mechanism is not yet known, but some scientists suggest that msDNA plays an important role in mutagenesis and may increase the mutation rate of the host cell, which is important for bacteria to be able to successfully hinder the activity of a host cell (Rice and Lampson 1995). Additionally, msDNA may serve as a primer for reverse transcription of other mRNAs within the bacterial genome (Rice and Lampson 1995). Sequence analysis of the RT genes found in bacteria reveals that bacteria have the highly conserved YXDD box (in bacteria it is YADD; Rice and Lampson 1995; Figure 8).


Image from Rice and Lampson 1995. Permission pending.

Figure 8. This figure shows the comparison between retrons found in different species of bacteria, which are all compared to Cal (which is the RT sequence from the group II bacterial intron) and HIV-1 reverse transcriptase. The boxed regions indicate the seven highly conserved reverse transcriptase domains. Amino acids that are conserved are indicated in bold, with the conserved amino acid indicated at the top of the column. A capital letter indicates that the amino acid is found in all reverse transcriptases and a lower case letter indicates that that amino acid is unique to bacterial reverse transcriptases. The red box (not in the original figure) indicates the location of the YXDD box.

LINE-1 (L1) Retrotransposons in Humans
L1 retrotransposons encode two ORFs, the first of which translates into an RNA-binding protein and the second of which translates into a reverse transcriptase (Han and Boeke 2005). These retrotransposons are an example of non-LTR retrotransposons, meaning that they do not have long terminal repeat (LTR) sequences flanking the retrotransposon (Rice and Lampson 1995). These retrotransposon sequences are widely distributed throughout the human genome. Their precise function is not fully characterized, but it is believed that they are involved in gene expression and regulation (Han and Boeke 2005). For example, L1 is thought to be involved in epigenetic regulation. Silencing of the L1 promoter via methylation leads to formation of heterochromatin (tightly packed DNA), which could influence expression of adjacent gene sequences (Han and Boeke 2005).

In order to make sure that reverse transcriptase sequences were accounted for in all of the “big seven” species, I performed a pBLAST with HIV-1 reverse transcriptase sequence information from PDB. I restricted the search to the "big seven" species. In addition, I entered sequence information from pBLAST into the Conserved Domains Database (CDD) in order to determine the conserved domains found in each sequence. The information is summarized in the table below.

One of the most striking results from the NCBI pBLAST is that an RTL element is found in C. elegans, even though this species was not depicted on the phylogenetic tree in Figure 2. Interestingly, this search did not find any RTLs in E. coli. This could be due to the parameters placed on the search, or because the protein present in E. coli was not in the databases that this search drew on. Another interesting finding from this search was that there was a great deal of variation in the first two amino acid residues of the YXDD box, but the last two aspartic acid residues were highly conserved among all species. It has been shown that there is some variation of the tyrosine position in this catalytic site in various species (Bhattacharya et al., 2002); thus, variations in the first two amino acids of the YXDD box can still result in a functional reverse transcriptase protein. This was further supported by the information found in CDD, as all of the proteins from the NCBI pBLAST results had a reverse transcriptase domain. The pBLAST results also reveal that the particular RTL element found in D. melanogaster does not have an RNase domain (Figure 9). This suggests that this particular RTL may be a truncated version of the full length reverse transcriptase and that the mechanism involved in this RTL does not require that the RNA template be degraded.
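The pattern of variation in the YXDD box across the elements discussed on this page can be summarized with a small Python sketch. The box strings for copia (YVDD), Ty912 (FVDD) and bacterial retrons (YADD) come from the text above; for C. reinhardtii only the D-to-N change at the last position is stated, so the other unstated positions are written as X. The function name `deviations_from_consensus` is hypothetical.

```python
def deviations_from_consensus(box, consensus="YXDD"):
    """List (1-based position, expected residue, observed residue) mismatches.

    'X' in either string means "any or unknown residue" and is never a mismatch.
    """
    if len(box) != len(consensus):
        raise ValueError("box and consensus must have the same length")
    mismatches = []
    for position, (expected, observed) in enumerate(zip(consensus, box), start=1):
        if expected == "X" or observed == "X":
            continue
        if expected != observed:
            mismatches.append((position, expected, observed))
    return mismatches


boxes = {
    "copia (D. melanogaster)": "YVDD",
    "Ty912 (S. cerevisiae)": "FVDD",
    "bacterial retrons": "YADD",
    # Assumption: only the final D-to-N change is stated in the text; X = not stated.
    "RTL (C. reinhardtii)": "XXDN",
}

for element, box in boxes.items():
    print(element, box, deviations_from_consensus(box))
```

In this summary only Ty912 departs from the consensus at the first position, and only the C. reinhardtii element departs at one of the two aspartic acid positions.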


| Species | Protein Name; Accession Number; e-value | CDD Comments |
| --- | --- | --- |
| Homo sapiens | Polymerase; AAC63291.1; 7e-38 | Fully conserved reverse transcriptase domain from retroviruses and RNase domain |
| Mus musculus | Pol protein; BAF81993; 3e-33 | Same as above |
| Caenorhabditis elegans | Retrotransposon-like family member (retr-1); NP_498959; 3e-12 | Conserved reverse transcriptase domain from retrotransposons and retroviruses and RNase domain |
| Drosophila melanogaster | Pol protein; AAF36671; 1e-10 | Conserved reverse transcriptase domain from retrotransposons and retroviruses |
| Saccharomyces cerevisiae | Pol3; AAA98435.1; 6e-10 | Conserved reverse transcriptase domain from retrotransposons and retroviruses and RNase domain |
| Arabidopsis thaliana | Putative retroelement pol protein; AAD24647.1; 7e-10 | Same as above |

Figure 9. pBLAST results with the HIV-1 RT protein sequence (sequence information obtained from PDB). I restricted the search to the six organisms seen in the table, plus E. coli. I then entered the sequences obtained from pBLAST (I chose the sequence with the lowest e-value) into the Conserved Domains Database (CDD) in order to determine the regions of homology found in the various proteins. I also performed a multiple sequence alignment of all of the proteins found and highlighted the portion of the alignment that shows the region of the YXDD box (red box).

Other Mechanisms Involving Reverse Transcriptase:

Some eukaryotes, such as Tetrahymena thermophila, have evolved ways to combat the problem of linear chromosome shortening (due to attack by exonucleases and/or the inability of DNA polymerase to replicate the entirety of the chromosome; Flavell 1995). This species in particular uses telomerase, a specialized reverse transcriptase that uses an RNA template to extend the otherwise truncated terminal ends of linear chromosomes. This allows for increased longevity of a particular cell.

Group II Introns:
These are another group of retroelements found in eukaryotes; a bacterial group II intron sequence also appears in Figure 8. In this mechanism, certain introns are transposed between different RNA segments and are then reverse transcribed into DNA by reverse transcriptase (Flavell 1995). This is seen more commonly in lower eukaryotes (Flavell 1995) and can be an important mechanism for mutation within the genome, which is crucial for the process of evolution.

Mitochondrial Plasmids:
Some fungi, such as Neurospora, are known to contain double stranded plasmids in their mitochondria, which are replicated by reverse transcriptase (Flavell 1995). These mitochondrial plasmids have a secondary structure at their 3’ end that has homology to tRNA and serves as the primer to replicate the RNA template strand into DNA via reverse transcription.


Image from Rice and Lampson 1995. Permission pending.

Figure 10. Domain organization of reverse transcriptases in different species. Panel A shows RT coding elements in prokaryotes and panel B shows RT coding elements in eukaryotes. The complexity greatly increases from bacteria to retroviruses. RT=reverse transcriptase, RH=RNase H, Zn=zinc finger motif, G=gag region, P=protease domain, IN=integrase domain, ENV=envelope domain and LTR=long terminal repeats.

Reverse transcriptases are widespread throughout different species. There is a great deal of variation within the amino acid sequences themselves (even within the most conserved region of the protein, the catalytic YXDD box), but there is a high degree of conservation of function among many different species. Not only is there diversity of these RTLs across species, there can also be a wide variety of RTLs within the same species, as can be seen by the number of times the same species appears in the phylogenetic tree in Figure 2 and by the fact that different RTLs were found in the article search as compared to the NCBI pBLAST search. In addition, the organization of genetic elements that contain reverse transcriptases is widely varied, and the complexity of these genetic elements increases in higher order organisms (Figure 10). The widely varied organization of RTLs in different species makes sense, because these genetic elements have very different functions in different organisms. In some cases the function is clearly defined (e.g., retroviruses), whereas in others the precise function is not yet known (e.g., msDNA in E. coli). The widespread nature of reverse transcriptase across and within species supports the idea that this protein has been around for quite some time.


Nature Glossary Online. Available from http://www.nature.com/nrn/journal/v4/n10/glossary/nrn1219_glossary.html Nature Reviews Glossary [2010 October 23].

Bhattacharya S, Bakre A and Bhattacharya A. 2002. Mobile genetic elements in protozoan parasites. Journal of Genetics 81: 73-86. http://www.ncbi.nlm.nih.gov/pubmed?term=Mobile[Title]%20AND%20genetic[Title]%20AND%20elements[Title]%20AND%20protozoan[Title]%20AND%20parasites[Title]&log$=citationsensor

Boer PH and Gray MW. 1988. Genes encoding a subunit of respiratory NADH dehydrogenase (ND1) and a reverse transcriptase-like protein (RTL) are linked to ribosomal RNA gene pieces in Chlamydomonas reinhardtii mitochondrial DNA. The EMBO Journal 7: 3501-3508.

Clare J and Farabaugh P. 1985. Nucleotide sequence of a yeast Ty element: evidence for an unusual mechanism of gene expression. Proceedings of the National Academy of Sciences 82: 2829-2833.

Flavell AJ. 1995. Retroelements, reverse transcriptase and evolution. Comparative Biochemistry and Physiology 110B: 3-15.

Han JS and Boeke JD. 2005. LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression? BioEssays 27: 775-784.

Mathias SL, Scott AF, Kazazian HH, Boeke JD and Gabriel A. 1991. Reverse transcriptase encoded by a human transposable element. Science 254: 1808-1810.

Mount SM and Rubin GM. 1985. Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Molecular and Cellular Biology 5: 1630-1638.

Rice SA and Lampson BC. 1995. Bacterial reverse transcriptase and msDNA. Virus Genes 11: 95-104.

Saigo K, Wataru K, Matsuo Y, Inouye S, Yoshioka K and Yuki S. 1984. Identification of the coding sequence for a reverse transcriptase-like enzyme in a transposable genetic element in Drosophila melanogaster. Nature 312: 659-661.

Sadava D, Heller CH, Orians GH, Purves WK and Hillis DM. Life: The Science of Biology 8th Edition. Massachusetts and Virginia: Sinauer Associates Inc. and W.H. Freeman and Company, 2008. Print.

Voytas DF, Konieczny A, Cummings MP and Ausubel FM. 1990. The structure, distribution and evolution of the Ta1 retrotransposable element family of Arabidopsis thaliana. Genetics 126: 713-721.

Xiong Y and Eickbush TH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. The EMBO Journal 9: 3353-3362.

Please direct questions or comments to Pallavi Penumetcha