Mutant TSG101 May Cause Cancer Especially Breast

Erin Zwack

This web page was produced as an assignment for an undergraduate course at Davidson College.


        Aberrant transcripts arising from single nucleotide polymorphisms (SNPs) in the Homo sapiens (human) Tumor Susceptibility Gene 101 (TSG101), as well as abnormal transcriptional expression of TSG101, are found in cancerous but not healthy breast tissue; TSG101 mutations have therefore been linked to breast cancer.  For the Fall 2007 Genomics Laboratory Class at Davidson College, I analyzed the genomic sequence, expression data, and protein interactions of TSG101 to develop a model of the breast cancer phenotype.  Sequence analysis suggests that TSG101 is regulated by TFAP2A and produces a cytoplasmic protein responsible for transport and cell division regulation.  Microarray data from yeast, whose TSG101 ortholog STP22 contains TSG101’s UEV domain, suggested that STP22 is transcribed more during stationary phase and is repressed by MBP1 or NRG1; therefore, TSG101 could also be induced during stationary phase and repressed by similar proteins.  Using this information, I designed a preliminary model: TSG101 transcription is repressed by TFAP2A, an MBP1-like protein, is promoted by the TATA box, and results in normal cell function.  The Human Interactome Map provided interacting proteins, especially androgen receptor, glucocorticoid receptor, and p300, that clarify the black boxes of protein function.  These three regulate transcription and need to be transported to the nucleus; TSG101 probably transports them.  With the normal model more complete, truncated TSG101, caused by a guanine insertion, was introduced to create the disease model.  Truncated TSG101 does not transport these proteins to the nucleus; therefore, the genes they regulate are not transcribed enough.  Under this model, cancer would likely develop in more than just breast tissue, and GEO microarray data support a role for TSG101 in other cancers.  The glucocorticoid receptor may explain the higher occurrence in breast tissue.  As no alternative transport or transcription pathways have been identified, gene therapy appears to be the best treatment.  Designing a model for a disease pathway helps in finding a novel treatment.

TSG101 Genomic Analysis

Genomic Analysis Summary:

        As aberrant transcripts from insertions, deletions, non-synonymous mutations, and abnormal splicing of the Homo sapiens (human) Tumor Susceptibility Gene 101 (TSG101) have been found in cancerous but not healthy breast tissue, mutations in TSG101 have been linked to breast cancer (Li 1997, Steiner 1997, OMIM).  As part of the Fall 2007 Genomics Laboratory Class at Davidson College, I identified single nucleotide polymorphisms (SNPs) using the Ensembl Genome Browser and compared their locations to the secondary structure predicted for those regions by PREDATOR and to the locations of conserved domains in the protein as documented by NCBI.  I also analyzed the normal structure and function of TSG101 by identifying transcription factor binding motifs with JASPAR, the cellular location of the protein through Kyte-Doolittle hydropathy plots, and critical amino acids by aligning TSG101 to orthologs with ClustalW.  Determining normal function helped identify how mutations could result in breast cancer.  My analysis revealed several mutations that could result in cancer as well as a transcriptional motif whose transcription factor is up-regulated in breast cancers.  One SNP changes a phenylalanine to a leucine in a conserved domain related to regulation of cell division.  Phenylalanine contains an aromatic ring that provides stability while leucine does not (The Biology Project Biochemistry).  The loss of stability could have a significant effect on protein function, especially since this region’s secondary structure is not predicted to be one of the typical designations.  The other SNP of interest is an insertion that truncates the protein because of a frameshift.  This SNP might be the cause of the aberrant transcripts that have been found in breast cancers.  These results support a link between mutant TSG101 and breast cancer.  Wet lab experiments based on these results could determine the actual cause of the breast cancers.

Analysis of TSG101:

        The first step in understanding how TSG101 links to breast cancer is to identify which transcription factor binding motifs may regulate TSG101.  Using JASPAR, I discovered that TSG101 has a TFAP2A motif (threshold >= 95%), which ends 39 base pairs before TSG101’s start codon.  TFAP2A has been shown to be upregulated in many breast cancers (Aqeilan 2004).  TSG101 also has a TATA box motif (80% <= threshold < 95%), which ends 365 base pairs before the start codon.  Further research suggested that TFAP2A acts much like the yeast transcriptional repressor MBP1 and most likely represses TSG101, while the TATA box promotes transcription.
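
        As a rough illustration of what a percentage threshold means in a motif search, the sketch below scores every window of a sequence against a position weight matrix and keeps windows whose relative score meets the threshold.  It assumes the relative score is (score - minimum possible) / (maximum possible - minimum possible).  The three-position matrix and the test sequence are hypothetical and are not the real TFAP2A profile or the TSG101 upstream region.

```python
# Hypothetical 3-position log-odds matrix; NOT the real TFAP2A profile.
PWM = [
    {"A": -1.0, "C": 0.5, "G": 1.5, "T": -2.0},
    {"A": -1.5, "C": 2.0, "G": -0.5, "T": -1.0},
    {"A": -2.0, "C": 1.8, "G": 0.2, "T": -1.0},
]


def relative_score(pwm, site):
    """Scale the raw matrix score of `site` to the range 0..1."""
    score = sum(col[base] for col, base in zip(pwm, site))
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    return (score - lowest) / (highest - lowest)


def scan(pwm, seq, threshold=0.95):
    """Return start indices of windows scoring at or above `threshold`."""
    width = len(pwm)
    return [
        i
        for i in range(len(seq) - width + 1)
        if relative_score(pwm, seq[i:i + width]) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical upstream fragment, for illustration only.
    print(scan(PWM, "AAGCCAA", threshold=0.95))
```

        Lowering the threshold (for example to 0.80) admits weaker matches, which is why the TATA box hit is reported with less confidence than the TFAP2A hit.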

        After learning how production of the protein is regulated, I generated a Kyte-Doolittle hydropathy plot (Protein Hydrophobicity Plots 2007) to determine where in the cell the protein functions.  The plot suggested that TSG101 functions in the cytoplasm or outside of the cell but is not a transmembrane protein (Figure 1).
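
        The calculation behind a hydropathy plot can be sketched as a sliding-window average of per-residue Kyte-Doolittle scores.  The code below is a minimal sketch, not the web tool cited above; the window sizes and the 1.6 cutoff for a possible transmembrane segment are commonly quoted rules of thumb that I am assuming here, and the test peptides are hypothetical rather than TSG101 sequence.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score of every full window along `seq`."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_transmembrane_candidate(seq, window=19, cutoff=1.6):
    """True if any window average exceeds the assumed cutoff."""
    return any(v > cutoff for v in hydropathy_profile(seq, window))


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only.
    print(has_transmembrane_candidate("L" * 25))  # strongly hydrophobic
    print(has_transmembrane_candidate("K" * 25))  # strongly hydrophilic
```

        A protein with no window above the cutoff, as the plot for TSG101 suggested, would be read as soluble rather than membrane-spanning.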


        Once a general location for the TSG101 protein had been established, I searched TSG101 for conserved domains. Based on the typical functions related to these domains and the general location of the protein, a basic understanding of the protein's function developed. The Conserved Domain database at NCBI (2007) revealed that TSG101 has a UEV domain and other domains related to the UEV domain (figure 2). These domains are responsible for ubiquitin conjugation. The UEV domain in particular normally has some relation to protein transport. As the protein is most likely located in the cytoplasm, TSG101 could function in that capacity. TSG101 also has several partial conserved domains that are related to cell division, so it is possible that TSG101 also plays some role in cell division (figure 2). Both potential functions could lead to cancer if TSG101 malfunctioned, either by not allowing the appropriate molecules to be transported from the cytoplasm to the nucleus or by allowing uncontrolled growth.

        Using ClustalW (2007), the orthologs for TSG101 in Pan troglodytes, Mus musculus, Drosophila melanogaster, Danio rerio, and Saccharomyces cerevisiae were aligned with the Homo sapiens TSG101 protein. Most of the amino acids that were conserved in all species were located in the regions of TSG101 that contained the conserved domains (figure 2). As yeast (Saccharomyces cerevisiae) are comparatively easy to manipulate, experiments on yeast could lead to greater understanding of TSG101; therefore, a conserved domain search was run on the yeast ortholog. The search revealed that the yeast ortholog also contains a UEV domain (figure 2). As both species contain this domain, experiments in yeast might explain its importance in humans.
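
        Finding the positions conserved in every species can be sketched as a column-by-column check of the alignment.  The code below assumes the aligned sequences have already been produced (for example by ClustalW) and padded with "-" for gaps; the three short sequences are hypothetical and are not the TSG101 alignment.

```python
def conserved_columns(aligned):
    """Return 0-based alignment columns where every sequence has the
    same residue and none has a gap."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    reference = aligned[0]
    return [
        i
        for i in range(length)
        if reference[i] != "-" and all(seq[i] == reference[i] for seq in aligned)
    ]


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    alignment = ["MKV-LA", "MRV-LA", "MKVQLA"]
    print(conserved_columns(alignment))
```

        Comparing the returned columns with the coordinates of the conserved domains shows whether the fully conserved residues cluster inside those domains.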

        PREDATOR (2007) predicted that TSG101 protein contains alpha helices, random coiling, and extended strands. Most of the alpha helical structure is in the conserved domains relating to cell division. The rest of the alpha helical structure is in the conserved domain for ubiquitin conjugation, which is mainly random coiling.

        Now that a general picture of location, structure, and function for the normal form of the protein is established, identifying single nucleotide polymorphisms allows for the identification of mutations that potentially cause the aberrant transcripts found in breast cancer. Through Ensembl, two SNPs of interest were identified (Ensembl 2007). One SNP, an insertion of a guanine, causes a frameshift beginning at amino acid 94 and introduces a stop codon at amino acid 97. The second SNP, a non-synonymous coding change of an A to a G, results in the substitution of a leucine for a phenylalanine at amino acid 343 (figure 3).
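
        How a single inserted guanine can shift the reading frame and create an early stop codon can be sketched with a short translation routine.  The coding sequence below is hypothetical and chosen only so that the shifted frame reaches a stop codon three residues after the insertion, mirroring the pattern described above; it is not the TSG101 sequence.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(dna, position, base="G"):
    """Insert `base` before the 0-based `position`."""
    return dna[:position] + base + dna[position:]


if __name__ == "__main__":
    # Hypothetical coding sequence, for illustration only.
    normal = "ATGGCTCTGAACTTTGACGGTCATTAA"
    mutant = insert_base(normal, 6, "G")
    print(translate(normal))  # full-length product
    print(translate(mutant))  # altered residues, then early termination
```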

         The reduction of stability with the change from a phenylalanine to a leucine could cause a structural change in that area (The Biology Project Biochemistry 2007). As it is at a junction of two alpha helices in a conserved domain, this change could disrupt a sensitive functional structure. The result could be a less efficient, more efficient, or malfunctioning protein. A change in function concerning cell division could cause cancer.

         The truncation of a protein to such a severe extent as caused by this frameshift would not allow the protein to function. Complete loss of normal function of a protein that transports other proteins and is related to cell division could cause cancer.


        Sequence analysis suggests that TSG101 is a cytoplasmic protein that transports proteins and helps regulate cell division. The components of the protein that provide its function suggest that a SNP that changes an amino acid in a conserved domain and a SNP that causes the truncation of the protein could be associated with cancer. The identification of a TFAP2A motif upstream of the coding region suggests regulation by a transcription factor that is often upregulated in breast cancer. Through simple analysis of the sequence, a preliminary link with breast cancer is found.


Aqeilan RI, Palamarchuk A, Weigel RJ, Herrero JJ, Pekarsky Y, Croce CM: Physical and Functional Interactions Between Wwox Tumor Suppressor Protein and AP-2gamma Transcription Factor [Abstract]. Cancer Res. 2004, 64(22):8256-61.

The Biology Project Biochemistry Phenylalanine F (Phe)

The Biology Project Biochemistry Leucine L (Leu)

Conserved Domains


Ensembl Human GeneSNPView (gene ENSG00000074319)

Jaspar Database                                      

Online Mendelian Inheritance in Man

Li, L.; Li, X.; Francke, U.; Cohen, S. N: The TSG101 tumor susceptibility gene is located in chromosome 11 band p15 and is mutated in human breast cancer [abstract]. Cell 1997, 88: 143-154.

Protein Hydrophobicity Plots

Steiner, P.; Barnes, D. M.; Harris, W. H.; Weinberg, R. A: Absence of rearrangements in the tumour susceptibility gene TSG101 in human breast cancer.  (Letter) Nature Genet. 1997, 16: 332-333.




Microarray Analysis of TSG101


Expression Analysis Summary:

        In the last section, TSG101’s gene sequence was analyzed, and single nucleotide polymorphisms (SNPs) were found that could potentially cause breast cancer.  Two of these SNPs were in conserved domains, which significantly match the sequence of the yeast ortholog STP22.  STP22 itself contains the UEV conserved domain.  As microarrays are now being used to classify and study breast cancers in humans, further experiments in yeast that allow for environmental manipulation may provide better understanding of gene regulation, gene function, and treatment effects.  Using MagicTool to generate expression and cluster files, I interpreted STP22’s expression profile for four experiments from the Hoopes laboratory: wild type (wt) stationary vs. wt log, snf1 deletion stationary vs. snf1 deletion log, msn deletion stationary vs. msn deletion log, and snf1 deletion stationary vs. wt stationary.  I then mined the SGD database and SPELL for expression profiles of STP22, the coregulated genes ASH1 and COG8, and STP22’s potential repressors MBP1, NRG1, and NRG2.  Analysis shows that STP22 is induced when the transcription factor snf1 is knocked out and that its expression is the inverse of that of the transcriptional repressors MBP1, NRG1, and NRG2.  MBP1 and NRG1 showed a more consistent inversion of STP22’s expression profile in the examined experiments.  These findings suggest that STP22 is repressed by a snf1-controlled transcriptional repressor, probably MBP1 or NRG1.  Further experiments using MBP1 and NRG1 knockouts could better identify STP22’s repressor.  Future microarray experiments with highly conserved orthologs of identified oncogenes can provide researchers with models of how a treatment affects those genes’ expression.

Expression Data Results:

        In humans, microarrays have been employed to determine what type of cells compose the cancer and to classify the tumors. Once a gene has been identified as being linked to cancer, model organisms can be used to test expression of its ortholog and identify what genes may influence or be influenced by the gene's expression. The yeast ortholog STP22 shares many of its biological processes and molecular functions with TSG101 (figure 4) (Entrez Gene 2007, SGD 2007). It also contains the same UEV conserved domain (Conserved Domains NCBI 2007). These features of STP22, combined with the quick generation time and ease of manipulation of yeast, make yeast a good model organism for trying to better understand TSG101.

        The Hoopes data first provided a general description of STP22 expression. STP22 tends to be slightly more expressed during stationary phase than during log phase (figure 5). This could be because it is not needed until the proteins it transports have already been produced, which would most likely occur during log phase. As induction is slightly higher in stationary phase than in log phase, further comparisons were made between stationary phase of a mutant and stationary phase of wild type. When the transcription factor snf1 was deleted, STP22 showed a high induction compared to wild type. As snf1 is a transcription factor, an induction in its absence suggests that a gene regulated by snf1 represses STP22. Three transcriptional repressors showed repression when snf1 was deleted (MBP1, NRG1, and NRG2); therefore, these three were explored more thoroughly to determine whether or not they were responsible for repressing STP22. Genes that were coregulated with STP22 under these conditions were also identified in case they continued to be coregulated and could provide more information on how exactly STP22, and potentially TSG101, work. In the Hoopes data, two known genes (COG8 and ASH1) were coregulated with STP22 (figure 5).

           To see if repression occurs through these repressors, I looked at time courses that were related to STP22’s function (unfolded protein response) or had a highly repressed point for STP22 (response to arsenic) (SGD 2007).

           In response to arsenic, STP22 was generally repressed.  Apart from ASH1, all genes have correlations of < 0.5 to STP22.  The potential inverse correlations with the repressors continue to suggest repression of STP22 by NRG1, NRG2, or MBP1.  The high correlation with ASH1 (>= 0.8), a transcription factor, could suggest that ASH1 is a transcription factor for STP22 (figure 6).

           Part of STP22’s function in yeast is to transport misfolded protein to be destroyed.  In the unfolded protein response time course, the repressor NRG2 has a correlation of >= 0.5 to STP22, which suggests that NRG2 is not repressing STP22.  MBP1 and NRG1 still show an essentially inverse expression to STP22.  Their slight induction at the later time points may have no detectable effect on STP22 because STP22 is so highly induced by another factor at that point (figure 6).  Finally, as ASH1 has a correlation of < 0.5, it most likely is not the transcription factor regulating STP22 (figure 6).
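
           The correlation values quoted above compare two expression time courses.  A minimal sketch of that comparison, assuming a Pearson correlation of log ratios, is shown below; the expression values are hypothetical and were not taken from SGD or SPELL.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical log2 ratios over five time points, for illustration only.
    stp22 = [0.1, -0.4, -0.9, -1.2, -0.6]
    candidate_repressor = [-0.2, 0.5, 1.0, 1.1, 0.4]
    r = pearson(stp22, candidate_repressor)
    # A strongly negative r is the inverse pattern expected of a repressor.
    print(round(r, 2))
```

           A value near +1 indicates coregulation, a value near -1 indicates the inverse relationship expected between a repressor and its target, and values near 0 indicate no clear relationship.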

Future Experiments :

           While the microarray data available has provided some insight, future research will further answer the question: Is the repressor MBP1 or NRG1 responsible for regulating STP22?  I propose a three part experiment.

           In part 1, cells of the MBP1 disruption mutant strain (TRIPLES 2007) and wild type cells would be grown under normal conditions.  The cells would be sampled at 30 minute intervals and the RNA harvested.  The MBP1 mutant sample would be labeled with Cy5 and the wild type sample with Cy3.  The controls would be a dye swap as well as a test of wild type (Cy5) vs. wild type (Cy3) and mutant (Cy5) vs. mutant (Cy3).

           In part 2, cells of the NRG1 disruption mutant strain and wild type cells would be grown under normal conditions.  The cells would be sampled at 30 minute intervals and the RNA harvested.  The NRG1 mutant sample would be labeled with Cy5 and the wild type sample with Cy3.  The controls would be a dye swap as well as a test of wild type (Cy5) vs. wild type (Cy3) and mutant (Cy5) vs. mutant (Cy3).

           In part 3, wild type and NRG1 or MBP1 mutant cells would be grown in the presence of cycloheximide (0.3 mg/ml), which increases genetic mutation and thus the likelihood of misfolded protein (Cycloheximide 2007).  The cells would be sampled at 30 minute intervals and the RNA harvested.  The NRG1 or MBP1 mutant sample would be labeled with Cy5 and the wild type sample with Cy3.  The controls would be a dye swap as well as a test of wild type + cycloheximide (Cy5) vs. wild type (Cy3) and mutant + cycloheximide (Cy5) vs. mutant (Cy3).

           If STP22 is induced in the mutant compared to wild type, it would suggest that the disrupted gene had been repressing STP22.  By looking at both mutants, it may be possible to eliminate one of the potential repressors of STP22 as the actual repressor.
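
           The purpose of the dye swap control can be sketched numerically.  In the forward array the mutant is Cy5 and wild type is Cy3; in the swapped array the dyes are reversed, so subtracting the two log ratios and halving cancels a dye bias that affects both arrays equally.  The intensities below are hypothetical, and the function names are my own for illustration.

```python
from math import log2


def log_ratio(cy5, cy3):
    """log2(Cy5 / Cy3) for one spot on one array."""
    return log2(cy5 / cy3)


def dye_swap_average(forward, swapped):
    """Estimate log2(mutant / wild type) from a dye-swap pair.

    forward: (cy5, cy3) intensities with the mutant labeled Cy5.
    swapped: (cy5, cy3) intensities with the mutant labeled Cy3.
    """
    m_forward = log_ratio(*forward)   # log2(mutant / wt) plus dye bias
    m_swapped = log_ratio(*swapped)   # log2(wt / mutant) plus dye bias
    return (m_forward - m_swapped) / 2


if __name__ == "__main__":
    # Hypothetical STP22 spot: true induction is 4-fold (log2 = 2),
    # and Cy5 reads twice as bright as Cy3 on both arrays.
    forward = (1600, 200)   # mutant in Cy5
    swapped = (500, 1000)   # mutant in Cy3
    print(dye_swap_average(forward, swapped))
```

           A positive combined value for STP22 in the MBP1 or NRG1 mutant would be the induction described above.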

            Analysis of expression data provides several insights into STP22.  First, the transcription factor snf1 has an inverse relationship with STP22.  The inverse relationship suggests that a transcriptional repressor that is directly controlled by snf1 represses STP22.  Of the three potential repressors originally suggested by the Hoopes data, only MBP1 and NRG1 continued to be supported by other microarray data as potential repressors of STP22.  Finally, future research is needed to determine whether MBP1 or NRG1 actually represses STP22.  The data suggest that TSG101 may be transcribed more during stationary phase and repressed by a protein similar to NRG1 or MBP1.



Conserved Domains

Entrez Gene: TSG101

Hoopes Lab Microarrays

 Saccharomyces Genome Database SGD.

Stanford Breast Cancer Microarray Project



TSG101 Disease Model

    Developing a Model:

          At the start of the genomic analysis, the Online Mendelian Inheritance in Man (OMIM) database linked TSG101 to breast cancer. To begin developing a model for the disease, a model of normal function had to be developed first. Using the data collected through genomic analysis, a simple model took form. In this model, TSG101 is regulated by the transcription factor TFAP2A (JASPAR 2007). This regulation generates normal TSG101 expression. The protein TSG101 then acts to transport the appropriate proteins and regulate cell division; thus, the cell functions normally. The structure and function of TSG101 outline the black boxes of what the protein does to produce normal function, but they do not provide specifics for this preliminary model, as they do not tell what is being transported or how TSG101 regulates cell division.

         In order to better understand the regulatory mechanism of TSG101, the microarray data for the yeast ortholog was incorporated. While TSG101 has a yeast ortholog, none of the repressor genes in yeast had a human ortholog. I then looked up the function of the repressors to see if they had a functional equivalent in humans. The repressor MBP1 is both a transcription factor and a repressor. It is involved in the progression of the cell cycle, especially G1 and S phase (SGD 2007). This function is very similar to that of the transcription factor TFAP2A, which controls progression from G1 to S phase in humans (Li 2006). This information led to the modification of the model to show specifically that TFAP2A represses TSG101 transcription while the TATA box sequence identified in the genomic analysis most likely promotes the transcription (Li 2006).

         A search of the Human Interactome Map (2007) identified proteins that TSG101 directly interacts with and provided several targets for clarifying the black boxes. The four targets with the most potential were CDKN1A (Entrez Gene: CDKN1A 2007), a kinase inhibitor; EP300 (EP300 2007), a transcription co-factor; AR, an androgen receptor (OMIM Androgen Receptor AR 2007); and NR3C1, a glucocorticoid receptor (Entrez Gene: NR3C1 2007). Both the androgen receptor and glucocorticoid receptor are found in the cytoplasm before they bind their ligands. Upon binding their ligands, they are transported into the nucleus where they work as transcription factors for many different genes (OMIM Androgen Receptor AR 2007, Entrez Gene: NR3C1 2007). As TSG101 functions as a protein transport protein and directly interacts with both AR and NR3C1, it most likely is responsible for transporting both of these receptors to the nucleus once they have bound their ligands (figure 7). EP300, normally referred to as p300, is also translated in the cytoplasm but works in the nucleus, connecting polymerase with the transcription factors necessary to transcribe cyclins and polymerase alpha (Kohn 1999). TSG101 is most likely responsible for transporting p300 to the nucleus (figure 7). Finally, CDKN1A most likely binds to TSG101 and prevents a protein kinase from phosphorylating it. As a kinase was not specifically indicated in the interactome, it is possible that the correct kinase has not been found or isolated yet. In the model, I assume that the nonphosphorylated form is active, but this is only a guess (figure 7).

Figure 7: The above circuit diagram models how TSG101 functions normally.

         Now that a model for normal function exists, a disease model can be generated by introducing a SNP to TSG101. Two potential cancer-causing SNPs were identified in the genomics analysis. Because TSG101's activity in the normal circuit is mainly transport, the SNP that causes a frameshift at amino acid 94, which results in truncation at amino acid 97, was chosen for the model rather than the SNP in the cell division conserved domain. As the SNP truncates the protein so early and both alters and cuts off the UEV domain, the most likely result is a totally nonfunctional protein (figure 8). With the protein so small, it is probably not phosphorylated or protected from phosphorylation. Even if it were, TSG101 would still remain inactive; therefore, CDKN1A has been removed from the disease model (figure 8). As TSG101 no longer has the UEV domain and is truncated to the point of not functioning, the androgen receptor complex, the glucocorticoid receptor complex, and p300 are no longer being transported into the nucleus by TSG101; thus, the genes that they regulate are not being transcribed to the proper level, if at all. The cyclins that are regulated by p300 are responsible for cell cycle progression (cyclin 2007); thus, irregular transcription could lead to cancer, as could irregular transcription of the genes controlled by AR and NR3C1.

Figure 8: When a single guanine is added to TSG101, a frameshift starts at amino acid 94 and causes a truncated nonfunctional protein that results in cancer.

         While OMIM links TSG101 mutations to breast cancer, the circuit suggests that other cancers would also occur. Experiments in the GEO database at NCBI (2007) showed that TSG101 is repressed in other cancers such as gastric tubular carcinoma (GDS1792 2007). This experiment, along with others in the GEO database, suggests that the circuit above can cause other cancers and not just breast cancer. The question still remained why OMIM focused on breast cancer. Research into the glucocorticoid receptor revealed that its normal function is essential to the generation and development of mammary cells and tissue. Reduction of glucocorticoid receptor has itself been shown to cause some types of cancer (Lien 2006). It is possible that the nature of the glucocorticoid receptor causes cancer from TSG101 mutations or repression to develop most often in breast tissue. It is also possible that TSG101 simply was not considered the key factor in the other cancers and was pushed aside.

   Future Research:

         The developed model makes three assumptions that can be clarified by future research: 1) the interaction between TSG101 and AR, NR3C1, or p300 is transportation of those proteins to the nucleus by TSG101, 2) the glucocorticoid receptor not being transported into the nucleus results in breast cancer being the cancer predominantly linked to TSG101, and 3) the active version of TSG101 is not phosphorylated. To check these assumptions, I have designed three experiments.

Experiment 1: To see whether TSG101 transports any of the three proteins to the nucleus from the cytoplasm. An experiment would be run for each protein in which TSG101 was tagged with one fluorescent protein and the other protein (AR, NR3C1, or p300) was tagged with a different fluorescent protein. Real time video would be taken of the cells to determine if TSG101 transported the protein to the nucleus. A potential pitfall of this experiment is that the addition of the fluorescent tag to TSG101 or the other protein could change the function and behavior of the protein.

Experiment 2: To check if the glucocorticoid receptor is responsible for the link to breast cancer, transgenic mice with mutant p300 and AR and normal TSG101 and NR3C1 and transgenic mice with mutant p300, AR, and NR3C1 but normal TSG101 would be engineered. The mice would then be studied to see if they develop cancer and where the cancer develops. If the mice with the mutant NR3C1 develop breast cancer significantly more frequently, it would suggest that the failure to transport glucocorticoid receptor to the nucleus causes a slight predominance of cancer in breast tissue.

Experiment 3: Finally to check if there are two forms of TSG101 and if so which is the active form, CDKN1A knockout cells would be engineered. The phosphate would be labeled. The cells would be monitored for TSG101 activity before checking to see if TSG101 was being phosphorylated. If activity of TSG101 decreased or ceased and TSG101 was phosphorylated, the data would suggest that the nonphosphorylated form is the active form.

   Potential Treatment:

        The cancer is caused by a failure of specific proteins to be transported into the nucleus. If there is a redundant transport system for these proteins that normally acts only in a limited capacity, upregulating the gene that does the actual transporting could treat and potentially cure the cancer. Upregulating any alternative way to promote regular transcription of the genes regulated by the AR complex, glucocorticoid receptor complex, or p300, without bringing those proteins and complexes into the nucleus, would also be a viable option. Unfortunately, I have not been able to find any of these alternative pathways; therefore, I would suggest gene therapy as the treatment. The normal TSG101 gene would be put into an adenovirus vector. Through a series of injections, the viral vector and gene would be given to the patient. Adenovirus has been chosen as it has already been approved to treat cancer (Gene Therapy 2007).



Entrez Gene: CDKN1A

Entrez Gene: NR3C1



Gene Therapy


Human Interactome Map

Jaspar Database                                      

Kohn, Molec. Biol. Cell 1999 Aug;10(8):2703-34

Li H, Goswami PC, and Domann FE: AP-2gamma induces p21 expression, arrests cell cycle, and inhibits tumor growth of human carcinoma cells [abstract]. Neoplasia 2006, 8(7):568-77.

Lien H-c, Lu Y-S, Cheng A-L, Chang W-C, Jeng Y-M, Kuo Y-H, Huang C-S, Chang K-J, and Yao Y-T: Differential expression of glucocorticoid receptor in human breast tissues and related neoplasms [abstract]. The Journal of Pathology 2006 209(3):317-27.

Online Mendelian Inheritance in Man

 Saccharomyces Genome Database SGD.