*This website was produced as an assignment for an undergraduate course at Davidson College.*

from Saccharomyces cerevisiae

The Story of Orthologs...

What is an Ortholog?

    An ortholog is a protein with high homology to a protein found in another species.  As species diverge, both new species carry many of the same genes (Figure 1).  These genes often evolve differently, forming two similar but not identical genes.  The proteins encoded by these two genes are considered orthologs because they have high similarity but are found in different species.

Figure 1. Phylogenetic tree of seven sequenced hemiascomycetous yeast genomes based on multiple alignment of 94 single-copy genes conserved in 26 taxonomic groups (see Methods). Numbers next to each branch correspond to the number of families (clusters) specific to a genome or group of genomes leading to this node. (Figure Reproduced with Permission, 1)

Background on Gal4 Orthologs

  Among the Big Seven Genomic Species (consisting of human, mouse, C. elegans, Drosophila, Arabidopsis, yeasts, and E. coli), Gal4 orthologs were found only in yeasts.  An extensive literature search was performed, along with searches using BLAST and the DB Phylome and Ortho DB databases (BLAST Results, DB Phylome Results, and Ortho DB Results).  Of the 15 genes from 10 species returned by the Ortho DB search, only a handful had been characterized, and fewer still had been characterized extensively.
    The literature search confirmed that the Gal4 family of transcription factors is a fungal-specific family (2-4).  Yeasts are eukaryotic organisms and are often used as a model because of their similarities to human cells.  The Gal4 protein, however, is not one of those similarities.  This suggests that this family of proteins developed after fungi diverged from other eukaryotes.
    The remainder of this page reviews the homology between Gal4 and one of its most extensively characterized orthologs, the Lac9 protein in Kluyveromyces lactis.  Comparing these orthologs has proven useful for identifying sequences critical to the function of the Gal4 and Lac9 proteins.  No useful literature was found for the other orthologs reported in the Ortho DB search.

A Comparison: Gal4 vs. Lac9

    The Gal4 protein is necessary for harvesting galactose from the major food source of S. cerevisiae, melibiose (Gal4) (2).  A different species of yeast, K. lactis, is found in milk, where its primary food source is lactose (2).  To harvest galactose from lactose, it uses a slightly different set of proteins than S. cerevisiae does.  Ultimately both species obtain galactose from their respective environments (2).
    The two species share a common ancient ancestor, making it likely that they share many of the same genes (2).  In fact, high conservation between the two species is seen (Figure 1).  Conserved sequences are often the most highly studied because a useful way of understanding the functions of recently identified proteins is comparison with proteins in other organisms that are “similar, yet divergent in both sequence and function” (3-4).  In this case, both species of yeast must accomplish a similar task: obtaining galactose from a food source.  It is probable that studying the similarities between the proteins used in the two yeasts can yield significant information about those proteins.
    The function of Gal4 has been described previously as necessary for regulation of galactose metabolism (Gal4).  In the related yeast K. lactis, this metabolism is regulated by a Gal4 ortholog, the Lac9 protein (3).  Upon analysis of the Lac9 protein in K. lactis, Salmeron and Johnston determined that there is significant functional similarity between the proteins encoded by the GAL4 and LAC9 genes; however, the actual amino acid sequence is quite divergent (2).  Both proteins bind DNA as a “homodimer to specific upstream activating sequences (UASG) in the GAL promoters” (2).  However, the Gal4 and Lac9 proteins share only about 30% of their amino acid sequence (2-3).

Figure 2. Regions of homology (open boxes) between LAC9 and GAL4 Proteins. Thin lines represent regions possessing no more than 17% homology between the two proteins. (Permission Pending, 3)
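
As a rough illustration of how figures such as "about 30%" overall or "no more than 17%" within a region can be computed, the sketch below calculates percent identity over a pair of already-aligned sequences, both overall and in consecutive windows. It is a minimal sketch under stated assumptions, not the method used in references 2-4: the function names are hypothetical, the toy sequences are made up (they are not Gal4 or Lac9 sequence), and columns containing a gap are simply skipped, which is only one of several conventions. Published values depend on the alignment method and on how gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns with the same residue.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: not counted either way
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def windowed_identity(aligned_a: str, aligned_b: str, window: int = 10) -> list:
    """Percent identity in consecutive, non-overlapping alignment windows."""
    return [
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(0, len(aligned_a), window)
    ]


# Toy, made-up aligned fragments (NOT real Gal4 or Lac9 sequence).
toy_a = "ACDEFGHIKLMNPQ--RSTV"
toy_b = "ACDEFGHIKLMAPQWYRSAA"

overall = percent_identity(toy_a, toy_b)
per_window = windowed_identity(toy_a, toy_b, window=10)
print(f"overall identity: {overall:.1f}%")
print("identity per 10-column window:", per_window)
```

In this toy case the first window is fully conserved while the second is not, which is the kind of pattern Figure 2 summarizes: a few highly conserved regions separated by stretches with little similarity.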

    The homology between the Gal4 and Lac9 proteins can be seen in three regions (Figure 2).  Region I contains a series of 76 amino acids with high homology (3).  This region, located at the N-terminus, has been shown to be required for nuclear localization (3).  It has also been suggested that this region is necessary for DNA binding, implying that the two proteins have similar DNA-binding domains (4).  In both studies the researchers were assigning functions to the conserved sequences common to the two proteins.  Because both proteins must carry out similar tasks within the cell, it makes sense that such functions would be contained within the conserved sequences.  For instance, both act as transcription factors and therefore need to localize to the nucleus.  Without a correct signal sequence, this would not be accomplished and the protein would not function.  Both proteins also need to bind DNA, and are expected to do so in a similar manner.  Therefore it is not surprising that the conserved regions are important for the functions the two proteins share.
    Region II and Region III were not completely described, but possible reasons for their conservation were suggested.  Region II, located close to the middle of both proteins, could play a role in oligomerization or other interactions that are common to the two proteins (3).  Others suggested that this region might interact with negative regulators, but also stated that because of its size it probably has more than one function (4).  Region III is a “short (18 amino acid), but almost completely conserved, region located within this C-terminal area” (3).  Salmeron and Johnston suggest that this must be another functional domain because little homology is seen surrounding it, implying that these 18 amino acids were conserved for a specific reason (3).  They did not, however, state what function they thought the sequence carries out.  Another study suggested other functions that are most likely contained within one of the conserved regions: repression, subunit interaction, and transcriptional activation (4).  It is possible that these functions also correspond to one of the regions described.
    In the end, the two proteins share much similarity but also have many differences.  Because certain sequences are conserved, these sequences are expected to contain critical functional domains of the proteins (3-4).

Significance of This Orthology in Other Species

    The lack of Gal4 orthologs in other species, specifically Drosophila and mammals, makes Gal4 the basis of many useful tools.  Because Gal4 is probably one of the most widely characterized eukaryotic transcription factors, many tools using it have been developed (5).  It is often used to identify protein-DNA and protein-protein interactions using yeast one-hybrid and yeast two-hybrid screens, respectively.  Further, when transfected into cells other than yeast, the properties of Gal4 have allowed researchers to identify various transcriptional activators, among other things (5).


(1)  Jeffries TW, Grigoriev IV, Grimwood J, Laplaza JM, Aerts A, Salamov A, Schmutz J, Lindquist E, Dehal P, Shapiro H, Jin Y, Passoth V, Richardson PM. Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis. Nat Biotechnol 2007 MAR;25(3):319-26.
(2)  Rubio-Texeira M. A comparative analysis of the GAL genetic switch between not-so-distant cousins: Saccharomyces cerevisiae versus Kluyveromyces lactis. FEMS Yeast Res 2005 DEC;5(12):1115-28.
(3)  Salmeron JM, Johnston SA. Analysis of the Kluyveromyces lactis positive regulatory gene LAC9 reveals functional homology to, but sequence divergence from, the Saccharomyces cerevisiae GAL4 gene. Nucleic Acids Res 1986 OCT 10;14(19):7767-81.
(4)  Wray LV, Witte MM, Dickson RC, Riley MI. Characterization of a positive regulatory gene, LAC9, that controls induction of the lactose-galactose regulon of Kluyveromyces lactis: structural and functional relationships to GAL4 of Saccharomyces cerevisiae. Mol Cell Biol 1987 MAR;7(3):1111-21.
(5)  Sadowski I. Uses for GAL4 expression in mammalian cells. Genet Eng (NY) 1995;17:119-48.

Please direct questions or comments to my email masurdel@davidson.edu