In my last two web assignments, I investigated the possible function of the unannotated ORF called YKR005C. In the first assignment, “My Favorite Yeast Gene,” I used the ORF’s nucleic acid sequence to develop hypotheses on its possible function. In the second assignment, “My Favorite Yeast Expression,” I used microarray data. From the data compiled during these two investigations, I hypothesized that YKR005C is an enzyme needed by yeast cells to repair telomeres during sporulation.
In this fourth assignment, I will conduct one last investigation into the possible function of ORF YKR005C, this time using protein databases. Specifically, I will search the PDB, ProteinInfo on PROWL, the SWISS-2DPAGE Viewer, DIP, MIPS, TRIPLES, BioGRID, and the Yeast Resource Center (YRC). In addition to these databases, I will look at the protein interaction webs generated by Schwikowski et al. (2000). As in my previous investigation, I will begin by examining DID4, an annotated gene that is needed for endocytosis in yeast cells and is located near YKR005C on chromosome 11, to gauge how useful these databases and protein interaction webs are in pinpointing the function of a specific gene.
In looking through these protein databases, I will be mining information that researchers have only begun to compile. Researchers are still searching for effective, high-throughput methods to identify proteins and protein interactions. A protein's function depends on its three-dimensional shape (currently determined mainly by X-ray crystallography), its location in the cell, the pathways it participates in, the average times and quantities at which it is produced, and its interactions with other proteins. Yeast two-hybrid experiments, mTn insertions, tandem mass spectrometry, and isotope-coded affinity tags are some of the new methods researchers have devised. Most of the databases I will use reveal information only on interactions between proteins.
DID4: an Annotated Gene
No information was found on DID4 in the PDB, which contains proteins whose three-dimensional structures have been experimentally determined.
ProteinInfo (on PROWL)
Analysis of DID4's sequence on ProteinInfo shows the protein to have a theoretical molecular weight of 26274.343 Da and a theoretical pI of 5.2.
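ProteinInfo reports these values from the amino acid sequence alone. The sketch below shows one common way such theoretical values can be computed: summing average residue masses plus one water for the molecular weight, and searching for the pH of zero net charge with the Henderson-Hasselbalch equation for the pI. The residue masses and the textbook pKa set are assumptions; ProteinInfo may use different constants, so this sketch should not be expected to reproduce its exact numbers, and it ignores post-translational modifications entirely.

```python
# Average residue masses in daltons (free amino acid minus one water), as
# commonly tabulated; treated here as assumed constants.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water added back for the free N- and C-termini

# Assumed pKa values (one common textbook set). Tools differ in their sets,
# so pI estimates from different servers can differ by a few tenths.
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def average_mass(seq: str) -> float:
    """Theoretical average molecular weight (Da) of an unmodified chain."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            positive += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection on [0, 14].

    Net charge falls steadily as pH rises, so bisection converges.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(average_mass("GG"), 2))        # → 132.12
print(round(isoelectric_point("GG"), 1))   # → 6.0
```

For the dipeptide glycylglycine the pI falls midway between the assumed N- and C-terminal pKa values. The DID4 sequence itself is not reproduced in this report.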
On the SWISS-2DPAGE Viewer, I found a 2D-PAGE gel of the S. cerevisiae proteome and searched it for a spot that might correspond to DID4. On a 2D-PAGE gel, a protein's position does not always match its theoretical isoelectric point and molecular weight. The protein may be phosphorylated or glycosylated, which can alter its isoelectric point and, less likely, its molecular weight. For this reason, rather than looking only at the spot at 26.3 kD and pH 5.2, I looked for spots in the area around these coordinates (Fig 1). The pink crosses inside the red region of Figure 1, which mark spots where a protein has been identified, were not DID4 but rather TSA1, CYPD, TPIS, VATE, and PMM.
Figure 1. 2D-PAGE gel of the S. cerevisiae proteome. The area around 26.3 kD and pH 5.2 is highlighted by a red rectangle. (SWISS-2DPAGE Viewer, 2006; http://ca.expasy.org/swiss-2dpage/viewer?map=YEAST&ac=all). Accessed 11 Nov 2006.
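Looking around, rather than exactly at, the theoretical coordinates can be sketched as a simple window filter over a list of gel spots. The spot labels and coordinates below are hypothetical placeholders, not values read from the SWISS-2DPAGE map, and the tolerances (15% of the molecular weight, 0.3 pH units) are arbitrary assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    label: str      # spot identifier (hypothetical in this example)
    mw_kda: float   # apparent molecular weight in kD
    pi: float       # apparent isoelectric point


def spots_near(spots, mw_kda, pi, mw_tol_frac=0.15, pi_tol=0.3):
    """Return spots inside a window around the theoretical coordinates.

    The MW tolerance is a fraction of the target MW and the pI tolerance is a
    fixed number of pH units; both are assumptions. Hits are sorted so the
    spot closest to the target (in tolerance-scaled units) comes first.
    """
    mw_window = mw_tol_frac * mw_kda
    hits = [
        s for s in spots
        if abs(s.mw_kda - mw_kda) <= mw_window and abs(s.pi - pi) <= pi_tol
    ]
    hits.sort(key=lambda s: ((s.mw_kda - mw_kda) / mw_window) ** 2
              + ((s.pi - pi) / pi_tol) ** 2)
    return hits


# Hypothetical spots; these are NOT real SWISS-2DPAGE coordinates.
example_gel = [
    Spot("spot_1", 25.9, 5.1),
    Spot("spot_2", 27.5, 5.6),   # pI too far from 5.2
    Spot("spot_3", 40.0, 5.2),   # MW too far from 26.3 kD
    Spot("spot_4", 26.0, 5.4),
]

# Theoretical DID4 coordinates from ProteinInfo: 26.3 kD, pI 5.2.
print([s.label for s in spots_near(example_gel, 26.3, 5.2)])
# → ['spot_1', 'spot_4']
```

Widening `pi_tol` would be one way to allow for the larger pI shifts that phosphorylation or glycosylation can cause.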
The MIPS database shows DID4 to interact with 41 other proteins. These interactions were discovered using high throughput methods. No other methods were used to verify if these interactions truly occur in yeast cells. These 41 proteins are: YGR035c, YCL049c, YER121w, ADK2, PHO88, RTT105, PAU2, STD1, YER084w, MOB2, VMA7, YPS5, ERG28, YDR063w, MRS5, HVG1, YER079w, YDR290w, RPP1B, YDL162c, DAD1, TIR1, JSN1, SPC24, YBR197c, HUA2, YEL074w, YPT31, APG17, ATP14, YEL057c, ADE8, IES5, YGL258w, AME1, PCL2, YEL068c, YBR242w, CDC36, YBR262c, and YLR108c (MIPS, 2006; http://mips.gsf.de/genre/proj/yeast/searchEntryAction.do?text=YKL002w&db=). Accessed 11 Nov 2006.
The large number of proteins that interact with DID4 can be visualized on the DIP database (Fig 2). Clicking haphazardly on some of the orange nodes, I found that they represent the proteins DAD1, MRS5, and CDC36. All three are among the 41 proteins listed on the MIPS database as interacting with DID4. To learn about the functions of these three proteins, I searched the SGD. I learned that DAD1 is a structural constituent of the cytoskeleton that aids in chromosome segregation and is found in the condensed nuclear chromosome kinetochore (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?locus=DAD1). MRS5 plays an essential role in transporting proteins into the mitochondrial inner membrane and is found in the mitochondrial intermembrane space (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000000295). CDC36 is a basal transcription factor that regulates mRNA levels through transcriptional regulation and mRNA destabilization by deadenylation. It is found in both the cytoplasm and the nucleus (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000000295).
Figure 2. Image of protein interactions with DID4 (shown as a red node) as displayed on DIP. Note the number of proteins (orange nodes) that interact with DID4 and the number of proteins (yellow nodes) that interact with these proteins (DIP, http://dip.doe-mbi.ucla.edu/dip/DIPview.cgi?PK=1916). Accessed 11 Nov 2006.
No information was found on DID4 in TRIPLES, a database that stores data from all mTn insertions in S. cerevisiae.
BioGRID shows 40 proteins to interact with DID4. These interactions were found using affinity capture-Western, yeast two-hybrid, and synthetic growth defect experiments (BioGRID, 2006; http://www.thebiogrid.org/SearchResults/summary/34131).
Yeast Resource Center (YRC)
This database shows data from yeast two-hybrid (Y2H) experiments. Only 27 proteins, all used as bait, are shown to interact with DID4 (Fig 3). None of these proteins are listed as DID4 interactors on MIPS. In addition, DID4 is never used as bait (YRC, 2006; http://www.yeastrc.org/pdr/viewProtein.do?id=531790&showDescriptions=false&showSingles=true).
Figure 3. Yeast two-hybrid data for DID4. Note that DID4 appeared only as prey, and that DAD1, MRS5, and CDC36 are not included in this list of 27 proteins. (YRC, 2006; http://www.yeastrc.org/pdr/viewProtein.do?id=531790&showDescriptions=false&showSingles=true). Accessed 11 Nov 2006.
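Checking whether the DIP nodes appear in the MIPS list, and more generally comparing interactor lists between databases, comes down to simple set operations. The sketch below uses the 41 MIPS interactors and the three DIP nodes exactly as listed in the text; upper-casing names is an assumption needed because MIPS mixes cases such as YGR035c. The hypothetical `compare` helper could equally be applied to the YRC list for DID4, which is not reproduced in this report.

```python
# Interactors of DID4 as listed by MIPS (copied from the text).
MIPS_DID4 = """YGR035c YCL049c YER121w ADK2 PHO88 RTT105 PAU2 STD1 YER084w MOB2
VMA7 YPS5 ERG28 YDR063w MRS5 HVG1 YER079w YDR290w RPP1B YDL162c DAD1 TIR1
JSN1 SPC24 YBR197c HUA2 YEL074w YPT31 APG17 ATP14 YEL057c ADE8 IES5 YGL258w
AME1 PCL2 YEL068c YBR242w CDC36 YBR262c YLR108c""".split()

# Orange DIP nodes that were clicked and identified.
DIP_SAMPLED = ["DAD1", "MRS5", "CDC36"]


def normalize(names):
    """Upper-case names so that 'YGR035c' and 'YGR035C' compare equal."""
    return {n.strip().upper() for n in names}


def compare(source_a, source_b):
    """Split two interactor lists into shared and source-specific names."""
    a, b = normalize(source_a), normalize(source_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


result = compare(DIP_SAMPLED, MIPS_DID4)
print(len(normalize(MIPS_DID4)))   # → 41
print(sorted(result["shared"]))    # → ['CDC36', 'DAD1', 'MRS5']
print(result["only_a"])            # → set()
```

An empty `only_a` set confirms that every sampled DIP node is also in the MIPS list; for two databases with no overlap, `shared` would be empty instead.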
Schwikowski et al., 2000:
I could not find DID4 or any of its aliases (GRD7, REN1, VPL2, VPT14, VPS2, or CHM2) on the degradation, kinase tree, membrane, or aging protein interaction webs.
From the 2D-PAGE gel, I see that I will not always be able to find a spot corresponding to the molecular weight and theoretical isoelectric point given by PROWL's ProteinInfo. Perhaps one of the unannotated spots inside the red rectangle of Figure 1 is in fact DID4. It may also be that DID4, like many other proteins, is missing from the gel because it was lost during protein isolation.
MIPS, DIP, and BioGRID all seem to provide the same information, but display it in different ways. Judging by their functions, DAD1, MRS5, and CDC36 do not easily reveal the two known roles of DID4. Perhaps DID4 has other functions that are still unknown. Because these interactions were detected with high-throughput methods, however, some of the proteins reported to interact with DID4 may not do so in living cells: the partners may be separated in time or space, the interaction may be an artifact of protein isolation, the proteins may normally adopt a different conformation because of allosteric modulation or phosphorylation, or the environment in which the interaction is detected (such as the nucleus in yeast two-hybrid experiments) may alter the native form of certain molecules.
Interestingly, the YRC, which considers only data from participating laboratories, shows different and fewer proteins interacting with DID4. In fact, none of the 27 proteins listed in the YRC appear in MIPS, DIP, or BioGRID. Perhaps this is because all of these interactions are unpublished data that have not been peer reviewed, so not all of them are necessarily correct. According to the YRC, these data, though unpublished, may still be useful because they include data "produced during the refinement of protocols to be used in a published article," and "'extra' data for which there was not room in the publication, or data that was not directly relevant to the paper published" (YRC, 2006; http://www.yeastrc.org/pdr/pages/unpublished.js).
Databases with no results
The PDB, though it grows daily, contains structures for only a small fraction of known proteins. I am therefore not surprised that DID4 was not found in the database.
The TRIPLES database contains data on mTn insertion points. mTn is a specially engineered transposon that inserts randomly into the yeast genome. Because of this randomness, it is possible that mTn never inserted into DID4.
It seems strange that DID4, a protein involved in endocytosis, is not found in the membrane protein interaction web. Perhaps it truly does not interact with the membrane, or perhaps it is simply missing from Schwikowski et al.'s protein interaction web.
The proteomic information obtained from these various websites does not easily point to DID4's functions in the cell. In addition, I need to remember that unpublished data found on the YRC may differ from the data displayed in other databases and may be products of poor procedures. Finally, I should note that in the future these same databases may provide more and better information, as well as improved ways to visualize it.
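The argument that mTn may simply never have hit DID4 can be made rough-quantitative under a strong simplifying assumption: that each insertion lands independently and uniformly across the genome. Then the chance that N insertions all miss a gene of length L in a genome of length G is (1 - L/G)^N. Real transposons have insertion hotspots and cold spots, so this is only a sketch. The 700 bp gene length and the insertion count below are hypothetical round numbers; about 12 Mb is the approximate size of the S. cerevisiae genome.

```python
import math


def miss_probability(gene_len_bp: int, genome_len_bp: int, n_insertions: int) -> float:
    """Chance that none of n independent, uniformly placed insertions lands in the gene."""
    p_hit_one = gene_len_bp / genome_len_bp
    return (1.0 - p_hit_one) ** n_insertions


def insertions_needed(gene_len_bp: int, genome_len_bp: int, target_hit_prob: float) -> int:
    """Smallest insertion count giving at least target_hit_prob of hitting the gene once."""
    p_miss_one = 1.0 - gene_len_bp / genome_len_bp
    return math.ceil(math.log(1.0 - target_hit_prob) / math.log(p_miss_one))


# Hypothetical numbers: a ~700 bp gene in a ~12 Mb genome.
GENE_BP, GENOME_BP = 700, 12_000_000
print(round(miss_probability(GENE_BP, GENOME_BP, 10_000), 2))  # → 0.56
print(insertions_needed(GENE_BP, GENOME_BP, 0.95))
```

Under these assumptions, even 10,000 random insertions would leave roughly even odds of missing a gene this size, which is consistent with treating DID4's absence from TRIPLES as uninformative.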
YKR005C: a Putative Gene
No information was found on YKR005C in the PDB.
ProteinInfo (on PROWL)
Analysis of YKR005C's sequence on ProteinInfo shows the protein to have a theoretical molecular weight of 50307.980 Da and a theoretical pI of 5.4.
On the SWISS-2DPAGE Viewer, I found the sample 2D-PAGE gel of the S. cerevisiae proteome and searched it near these coordinates. Close to them, I found a faint spot (Fig 4).
Figure 4. 2D-PAGE gel of the S. cerevisiae proteome. Molecular weight is shown on the y-axis and pH on the x-axis. The area around 50.3 kD and pH 5.4 is highlighted by a red rectangle. Note the spot inside the red rectangle (SWISS-2DPAGE Viewer, 2006; http://ca.expasy.org/swiss-2dpage/viewer?map=YEAST&ac=all). Accessed 11 Nov 2006.
In MIPS, only one protein, VMA6, is shown to interact with YKR005C. This interaction was detected only by high-throughput methods. In addition, the database shows that many other proteins interact with VMA6 (MIPS, 2006; http://mips.gsf.de/genre/proj/yeast/searchEntryAction.do?text=YKR005c&db=).
As in the MIPS database, only VMA6 is shown to interact with YKR005C on DIP (Fig 5). Again, VMA6 is shown to interact with many other proteins (Fig 6).
Figure 5. Image of protein interactions with YKR005C (shown as a red node) as displayed on DIP. Note that there is only one other protein, VMA6 (orange node), that interacts with YKR005C. (DIP, http://dip.doe-mbi.ucla.edu/dip/DIPview.cgi?PK=4606). Accessed 11 Nov 2006.
Figure 6. Image of protein interactions with VMA6 (shown as a red node) as displayed on DIP. Note the complex protein interactions that take place between VMA6, the proteins that it physically interacts with (orange nodes), and the other proteins that these proteins interact with (yellow nodes). (DIP, http://dip.doe-mbi.ucla.edu/dip/DIPview.cgi?PK=1737). Accessed 11 Nov 2006.
Searching for VMA6 on the SGD, I learned that VMA6 is a component of the V0 integral membrane domain of the vacuolar H+-ATPase, an electrogenic proton pump found in the endomembrane system. In addition to stabilizing the V0 subunits, VMA6 is required for V1 domain assembly on the vacuolar membrane. Its molecular function is hydrogen-transporting ATPase activity through a rotational mechanism. Its biological processes are vacuolar acidification and vacuolar transport, and its cellular component is the vacuolar membrane, specifically the hydrogen-transporting ATPase V0 domain. In addition, when VMA6 is disrupted by systematic deletion, the yeast cell is viable but displays growth defects on non-fermentable carbon sources, reduced fitness in rich medium, increased glycogen accumulation, and sensitivity to media buffered at neutral pH or containing 100 mM calcium (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?locus=VMA6).
No information was found on YKR005C in the TRIPLES database.
No new information is shown in the BioGRID database. The display of information, however, is excellent. From the database I can easily learn about the function of VMA6 (Fig 7). Clicking on the VMA6 link, I found that the database also clearly displays the number (104 in all) and names of all the proteins with which VMA6 may interact. This large group includes TFP1, SNF7, and CNB1, three proteins with interesting roles (Table 1). TFP1, for example, is the vacuolar ATPase V1 domain subunit A. It contains catalytic nucleotide binding sites and undergoes self-catalyzed splicing to yield two smaller proteins, one of which is a site-specific endonuclease. SNF7 is one of four subunits of ESCRT-III (endosomal sorting complex required for transport III). It helps the cell sort transmembrane proteins into the multivesicular body pathway and is recruited from the cytoplasm to endosomal membranes. CNB1 encodes calcineurin B, the regulatory subunit of calcineurin, a calcium/calmodulin-regulated protein phosphatase (BioGRID, 2006; http://www.thebiogrid.org/SearchResults/summary/31705).
Figure 7. Display of VMA6 on the BioGRID results page for a query of proteins that interact with YKR005C. Note how much information is provided on the screen (BioGRID, 2006; http://www.thebiogrid.org/SearchResults/summary/3413). Accessed 11 Nov 2006.
Protein | Role in Cell
TFP1 | Vacuolar ATPase V1 domain subunit A containing the catalytic nucleotide binding sites; protein precursor undergoes self-catalyzed splicing to yield the extein Tfp1p and the intein Vde (PI-SceI), which is a site-specific endonuclease
SNF7 | One of four subunits of the endosomal sorting complex required for transport III (ESCRT-III); involved in the sorting of transmembrane proteins into the multivesicular body (MVB) pathway; recruited from the cytoplasm to endosomal membranes
CNB1 | Calcineurin B; the regulatory subunit of calcineurin, a Ca++/calmodulin-regulated protein phosphatase which regulates Crz1p (a stress-response transcription factor); the other calcineurin subunit is encoded by CNA1 and/or CMP1
Table 1. Three proteins, TFP1, SNF7 and CNB1, detected through high throughput methods to interact with VMA6 and their functions. (BioGRID, 2006; http://www.thebiogrid.org/SearchResults/summary/31705). Accessed 11 Nov 2006.
Yeast Resource Center (YRC)
Interestingly, a search on the YRC showed YKR005C to interact with CDC13 (Fig 8). Searching on the SGD, I found that CDC13 is a single-stranded DNA-binding protein found at TG1-3 telomere G-tails that caps telomeres and regulates their replication through the recruitment of specific sub-complexes. CDC13 is involved in the cell division cycle; mutations in CDC13 result in inviable cells that arrest in G2. In addition, inactivation of the gene results in abnormal telomeres and activation of the DNA damage checkpoint (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YDL220C). Searching on BioGRID, I found that 46 proteins interact with CDC13 (BioGRID, 2006; http://www.thebiogrid.org/SearchResults/summary/31825). Most of these 46 proteins are involved in telomeric maintenance.
Figure 8. Information obtained concerning proteins that interact with CDC13 (YRC, 2006; http://www.yeastrc.org/pdr/viewProtein.do?id=532025&showDescriptions=false&showSingles=true). Accessed 11 Nov 2006.
Figure 9. Information about CDC13 as shown on the SGD (SGD, 2006; http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YDL220C). Accessed 11 Nov 2006.
Schwikowski et al., 2000:
YKR005C was not found on degradation, kinase tree, membrane or aging protein interaction webs.
Perhaps the spot inside the red rectangle in Figure 4 contains YKR005C protein, or perhaps the protein is found in another spot close by. As a caveat, even if YKR005C is not on the 2D gel, this alone does not prove that YKR005C does not encode a protein. The gel shows only the average proteome of a population of cells under a particular condition. Perhaps the conditions in which these cells were grown are not conducive to YKR005C transcription, or perhaps YKR005C is produced at such low levels that it would be hard to isolate enough of it to detect on a gel. Also, as explained in DID4's analysis, YKR005C may be a difficult protein to isolate. Finally, even if YKR005C does not encode a protein, its transcribed mRNA may play an important role in the cell.
MIPS, DIP, and BioGRID all show YKR005C to interact only with VMA6. If this interaction truly occurs in yeast cells, it could explain why YKR005C is found only in S. cerevisiae. Because humans have used this yeast for ages to brew beer and leaven bread, yeast may have evolved over time to favor fermentation and to become less efficient at producing energy through respiration. Yeast that cannot respire efficiently may be better at fermenting dough or malted barley. Perhaps YKR005C is a competitive inhibitor of VMA6. Because so many other proteins interact with VMA6, however, YKR005C could never completely prevent VMA6 from functioning. As a caveat, VMA6 was found to interact with YKR005C in a yeast two-hybrid experiment. This type of experiment tests for an interaction between two individual proteins. In the cell, however, VMA6 functions as part of a larger protein complex. In this larger complex, the shape of VMA6 may change enough that YKR005C cannot bind to it. Thus, even if this experiment is accurate, it is very probable that YKR005C and VMA6 do not normally interact with each other.
The YRC database shows YKR005C to interact with CDC13. I do not know how reliable this information is, however, because these data have never been published. If YKR005C does truly interact with CDC13, my hypothesis that YKR005C is involved in telomeric maintenance is greatly strengthened. Data from BioGRID showing that CDC13 interacts mainly with proteins involved in telomeric maintenance provide further support.
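One way to weigh these conflicting reports is to tabulate which databases support each partner of YKR005C. The sketch below encodes only what this report found (VMA6 in MIPS, DIP, and BioGRID; CDC13 in the YRC). An important caveat: MIPS, DIP, and BioGRID may draw on the same underlying high-throughput screens, so three listings are not necessarily three independent confirmations, and a count like this should not be read as a reliability score.

```python
from collections import defaultdict

# YKR005C partners reported by each database, as found in this report.
REPORTED = {
    "MIPS": {"VMA6"},
    "DIP": {"VMA6"},
    "BioGRID": {"VMA6"},
    "YRC": {"CDC13"},  # unpublished, not peer reviewed
}


def support_by_partner(reported):
    """Map each interaction partner to the sorted list of databases reporting it."""
    support = defaultdict(list)
    for database, partners in reported.items():
        for partner in partners:
            support[partner].append(database)
    return {partner: sorted(dbs) for partner, dbs in support.items()}


for partner, dbs in sorted(support_by_partner(REPORTED).items()):
    print(partner, len(dbs), dbs)
# CDC13 1 ['YRC']
# VMA6 3 ['BioGRID', 'DIP', 'MIPS']
```

Recording the detection method alongside each database (for example, yeast two-hybrid versus affinity capture) would be a natural extension if one wanted to count only independent lines of evidence.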
Putting it all together
From my first investigation into YKR005C, I found that YKR005C may contain one hydrophobic region, has a nucleic acid sequence unique to and highly conserved in S. cerevisiae, and may have a secondary structure consisting solely of alpha helices (My Favorite Yeast Gene). In my second investigation, I learned that its expression is repressed during sporulation and induced during depletion of histone H4, and that its expression is normally clustered with that of genes involved in telomeric maintenance, or processes normally clustered with DID4 (My Favorite Yeast Expression). In looking at the thumbnail displays of YKR005C expression in various microarray experiments on the SGD, I see that YKR005C was also induced in studies of ploidy regulation of gene expression, of the response to a single unrepaired DSB created by HO endonuclease in nocodazole-arrested cells, and of expression during the cell cycle (SGD, 2006; http://db.yeastgenome.org/cgi-bin/expression/expressionConnection.pl?type=summary&dbid=S000001713#histone). From this final investigation, I find that YKR005C may interact with either VMA6, a protein involved in vacuolar acidification, or CDC13, a protein involved in telomeric maintenance.
*If YKR005C does interact with CDC13, I maintain that YKR005C functions in telomeric maintenance.
*If, as is more likely the case, YKR005C does not interact with CDC13, I believe it interacts with a protein complex rather than a single protein, and may even affect a yeast cell's ability to produce energy.
First, I would like to perform a yeast two-hybrid experiment using CDC13 as bait and YKR005C as prey to see whether the information found on the YRC is reproducible. In addition, I would like to perform two other yeast two-hybrid experiments, this time using YKR005C as bait and VMA6 or CDC13 as prey. I would like to perform these experiments because they seem relatively simple and would help verify whether YKR005C can truly interact with VMA6 or CDC13.
Because I do not feel confident in my final hypotheses, I would like to perform a new experiment in which the hypothesized promoter region of the gene (including any preceding junk DNA, so long as this sequence does not contain an ORF) is attached to the beginning of the coding sequence of GFP. I would then place this construct into a high-copy plasmid and transform it into yeast cells. Next, I would subject these cells to the conditions under which YKR005C was induced in microarray experiments and take samples at various times to see whether I could detect GFP. My reason for conducting this experiment is to corroborate the microarray data. Specifically, I would like to confirm that YKR005C really is transcribed in yeast cells.
My final experiment would investigate whether YKR005C is involved in telomeric maintenance. Because this function seems important for all organisms, I would expect to see genes in other species with sequences similar to YKR005C. Perhaps, however, YKR005C evolved as a redundant means to control telomeric maintenance. If this is the case, then only by deleting other genes controlling telomeric maintenance could one detect the function of YKR005C. Therefore, I would perform microarray experiments to test for differences in YKR005C's transcription level when known genes involved in telomeric maintenance are knocked out of yeast cells.
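The comparison proposed above is usually expressed as a log2 ratio of YKR005C signal in each knockout versus wild type, so that +1 means twofold induction and -1 means twofold repression. The knockout labels and signal values below are hypothetical placeholders, and the twofold cutoff is a common but arbitrary convention; a real analysis would also need replicate arrays and a statistical test.

```python
import math


def log2_ratio(knockout_signal: float, wild_type_signal: float) -> float:
    """log2(knockout / wild type) for one gene; both signals must be positive."""
    if knockout_signal <= 0 or wild_type_signal <= 0:
        raise ValueError("microarray signals must be positive")
    return math.log2(knockout_signal / wild_type_signal)


def flag_changes(ratios: dict, threshold: float = 1.0) -> dict:
    """Keep knockouts whose |log2 ratio| meets the cutoff (1.0 = twofold change)."""
    return {name: r for name, r in ratios.items() if abs(r) >= threshold}


# Hypothetical YKR005C signals (knockout, wild type) for three unnamed knockouts.
signals = {
    "knockout_1": (800.0, 200.0),
    "knockout_2": (210.0, 200.0),
    "knockout_3": (90.0, 200.0),
}
ratios = {name: log2_ratio(ko, wt) for name, (ko, wt) in signals.items()}
print(flag_changes(ratios))
```

Here knockout_1 (fourfold up) and knockout_3 (a little more than twofold down) pass the cutoff, while knockout_2 does not.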
BioGRID database. 2006. BioGRID home page. <http://www.thebiogrid.org>. Accessed 11 Nov 2006.
[DIP] Database of Interacting Proteins. 2006. Node search. <http://dip.doe-mbi.ucla.edu/dip/Search.cgi?SM=3>. Accessed 11 Nov 2006.
[PDB] Protein Data Bank. 2006. PDB home page. <http://www.rcsb.org/pdb/index.html>. Accessed 11 Nov 2006.
PROWL. 2006. ProteinInfo home page. <http://prowl.rockefeller.edu/prowl/proteininfo.html>. Accessed 11 Nov 2006.
[MIPS] Munich Information Center for Protein Sequences. 2006. Fungal genome analysis. <http://mips.gsf.de/projects/fungi>. Accessed 11 Nov 2006.
Schwikowski, B., P. Uetz, & S. Fields. 2000. A network of protein-protein interactions in yeast. Nature Biotechnology 18: 1257-1261.
Kinase tree: http://media.pearsoncmg.com/bc/bc_campbell_genomics_2/medialib/data/kinasetree.pdf. All accessed 11 Nov 2006.
[SGD] Saccharomyces Genome Database. 2006. SGD home page. <http://www.yeastgenome.org/>. Accessed 11 Nov 2006.
Swiss-2D Page Map Selection. 2006. Swiss 2DPAGE. <http://ca.expasy.org/swiss-2dpage/viewer>. Accessed 11 Nov 2006.
TRIPLES Database. 2006. TRIPLES home page. <http://ygac.med.yale.edu/triples/default.htm>. Accessed 11 Nov 2006.
[YRC] Yeast Resource Center Two-Hybrid Analysis. 2000. An introduction to Yeast Two-Hybrid. <http://depts.washington.edu/~yeastrc/pages/y2h.html>. Accessed 11 Nov 2006.