*This website was produced as an assignment for an undergraduate course at Davidson College.*

Sequence Homology in gp120

Introduction and Background:

The AIDS pandemic is arguably the most critical health crisis facing the world today. Greater knowledge of the structure of HIV glycoprotein 120 (gp120) may help elucidate the mechanism the virus uses to bind to human lymphocytes. Although HIV is a retrovirus, gp120 has no sequence homology with the envelope proteins of other retroviruses. According to the authors who deduced the structure of gp120, the protein's sequence "has no precedent" (Kwong, 1998). Because the protein sequence is novel, there are no orthologous protein sequences in unrelated viruses or organisms. Because of the rapid viral mutation rate, the sequence of gp120 also varies widely among immunodeficiency virus isolates, although some stretches of residues are more conserved than others. The inner loop of the protein contains many more conserved residues than the outer loop. There are two primary strains of HIV: HIV-1 and HIV-2. HIV-1 is the primary cause of the HIV pandemic. HIV-2 is found primarily in West Africa and in European countries that border the Mediterranean; it destroys the immune system much more slowly than HIV-1 and is transmitted less effectively (Minnesota Department of Health, 2003). The gp120 sequence of HIV-2 bears a 35% resemblance to that of the HIV-1 isolate HXBc2, whereas the inner loop of HIV-2 has 45% identity with HXBc2.
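Percent identity figures like those quoted above come from counting matching residues in a pairwise alignment. The following is a minimal sketch of that arithmetic, assuming the two sequences have already been aligned and that gapped columns are excluded from the denominator (alignment tools differ on this convention). The function name is hypothetical, and the second fragment is invented for illustration; it is not taken from a real isolate.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy data: the first fragment is the motif discussed in this report,
# the second is a made-up variant used only to exercise the function.
seq_a = "KPCVKLTPLCV"
seq_b = "KPCVRLTP-CI"
print(percent_identity(seq_a, seq_b))
```

Here 8 of the 10 ungapped columns match, so the toy pair is 80% identical.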

 

Figure 1. Sequence homology of immunodeficiency viral envelope proteins. Sequences are taken from HIV-1 (clades C, O, and B), HIV-2, and SIV. Residues marked with an asterisk are in contact with CD4. Arrows and cylinders mark secondary structure elements of the folded protein (by the usual convention, arrows for beta strands and cylinders for alpha helices). Hash marks underneath residues indicate sequence variability: "1, residues conserved among all primate immunodeficiency viruses; 2, conserved among all HIV-1 isolates; 3, moderate variation among HIV-1 isolates; and 4, significant variability among HIV-1 isolates." Red boxed residue chains indicate notably conserved regions of the protein sequence. (Adapted from Kwong et al., 1998. Permission granted by the author. Modified by Will Greendyke.)

 

Therefore, rather than searching for homologous sequences in other genomes, it is perhaps best to examine patterns of divergence and conservation among HIV isolates in order to determine which segments of the protein seem to be necessary for its function and structure. The gp120 sequence from Kwong et al. (1998) was used as the initial query in NCBI BLAST protein/protein searches.
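The searches in this report were run through the NCBI BLAST web page. For readers who would rather script a comparable protein/protein search, the sketch below builds the form body for a submission to the NCBI BLAST URL API without sending it. The parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) follow that API as I understand it; the function name, the choice of the nr database, and the short placeholder query are assumptions made for illustration, not the settings used for the results reported here.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (nothing is sent by this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_submission(query, database="nr", entrez_query=None):
    """Return the URL-encoded form body for a blastp submission request."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein query against protein database
        "DATABASE": database,
        "QUERY": query,
    }
    if entrez_query:
        # Optional filter, e.g. to restrict hits to one organism.
        params["ENTREZ_QUERY"] = entrez_query
    return urlencode(params)


# Placeholder query: a short motif from this report, not the full gp120 sequence.
body = build_blastp_submission(
    "KPCVKLTPLCV",
    entrez_query="Simian immunodeficiency virus[Organism]",
)
print(BLAST_URL)
print(body)
```

The encoded body would be sent as a POST request to the endpoint, and the returned request ID polled for results; both steps are left out here so that the example runs offline.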

Results:

The initial BLAST search returned 624 hits, with E values ranging from 0 to 1 x 10^-120. All results were from HIV-1 and SIV strains. The E value is the number of hits with an equal or better score that would be expected to occur by chance in a database of this size, so values this small mean that chance matches can effectively be ruled out. Nearly every hit showed almost total identity with the query, indicating very little variation between the query and these 624 potential matches. It was therefore impossible to distinguish conserved regions from variable ones.
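To make the meaning of the E value concrete, the sketch below uses the standard relationship between a normalized bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m and n are the query and database lengths, together with the conversion P = 1 - exp(-E) to the probability of seeing at least one such hit. The function names and the numbers in the example are hypothetical, and real BLAST output uses edge-corrected effective lengths, so this is an approximation rather than a reproduction of BLAST's own calculation.

```python
import math


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E value: expected number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    # expm1 keeps precision when e_value is extremely small.
    return -math.expm1(-e_value)


# Hypothetical numbers chosen so the arithmetic is easy to follow:
# a search space of 4 * 256 = 1024 and a bit score of 10 give E = 1.
e = expected_hits(bit_score=10, query_len=4, db_len=256)
print(e, p_value(e))
```

For very small E values, such as those reported above, P and E are numerically almost the same, which is why the two are often used interchangeably even though they are different quantities.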

Figure 2. Graphical representation of the initial NCBI BLAST results, using the amino acid sequence from Kwong et al. (1998) as the query. Red bars indicate the score assigned to each hit. E values ranged from 0 to 1 x 10^-120. Note the high identity between the query and the returned results; this indicates that there is very little variation among these immunodeficiency virus isolates.

In order to select viral strains that did not display total identity, the protein sequence of HIV-1 gp120 was compared with proteins from the simian immunodeficiency virus (SIV) using NCBI BLAST. This time, the E values of the returned results varied much more widely. The results showed regions of strong identity even among hits whose overall identity with the original query differed widely. In other words, as the sequences diverge evolutionarily, some segments remain conserved. Of particular note are residues 7-17, consisting of residues KPCVKLTPLCV, and residues 107-111, consisting of residues YCAP. Also important are the many cysteine residues conserved among all hits. Click here to see screenshots taken from the SIV BLAST results page.

Next, the HIV-1 gp120 sequence was checked against HIV-2 sequences. This search returned fewer hits displaying homology than either the full database or SIV by itself. More divergence was therefore seen than with the envelope proteins of other strains; this lower identity suggests that HIV-2 might be evolutionarily more distantly related to HIV-1 than SIV is. Both the YCAP and KLTPLCV sequences seen in HIV-1 and SIV were often conserved; however, the YCAP sequence disappeared as E values began to approach 0.001. The KLTPLCV sequence was conserved among all strains. Click here to go to screenshots from the HIV-2 BLAST results.
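The motif comparison above was done by eye from the BLAST alignments. One way to check it programmatically would be to collect the short substrings (k-mers) shared by every sequence in a set. The sketch below does this with plain set intersection. The function name is hypothetical, and the three toy sequences are invented so that KLTPLCV occurs in all of them while YCAP is lost in the third; they are not real HIV or SIV sequences.

```python
def shared_kmers(sequences, k):
    """Return the sorted list of k-mers present in every sequence."""
    if not sequences:
        return []

    def kmers(seq):
        # All overlapping substrings of length k.
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(sequences[0])
    for seq in sequences[1:]:
        common &= kmers(seq)
    return sorted(common)


# Toy sequences, invented for illustration only.
toy = [
    "MAKLTPLCVTGYCAPA",
    "QRKLTPLCVSEYCAPG",
    "TTKLTPLCVNNWCAPD",  # YCAP replaced by WCAP in this made-up variant
]

print(shared_kmers(toy, 7))      # 7-mers found in all three
print(shared_kmers(toy[:2], 4))  # 4-mers shared by the first two only
```

An exact-match search like this is stricter than BLAST, which tolerates substitutions and gaps, so it will miss motifs that are conserved only approximately.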

Discussion:

BLAST showed that both the KPCVKLTPLCV and YCAP residue sequences were highly conserved, even among widely variant strains of immunodeficiency viruses. As the viral proteins diverged from the original query, KPCVKLTPLCV and YCAP remained conserved in the overall protein sequence. Perhaps what is most notable about the conservation is the presence of many sulfur-containing cysteine residues, a great many of which are conserved among the various isolates. These cysteines most likely form disulfide bonds with other cysteines (methionine, although it also contains sulfur, does not form disulfide bonds). Because gp120 is an envelope protein that binds to the host cell, the shape of the molecule is critical in determining its ability to bind to its host. More likely than not, these disulfide bonds stabilize the tertiary structure of the protein, and it is perhaps not too much of a stretch to suppose that these amino acids are primary determinants of its overall shape. Without them, the shape of the viral envelope protein would likely change, leaving gp120 unable to bind to its host. Such a virion would be unable to infect its host and reproduce. Therefore, only the virions in which these cysteine residues are conserved would be able to infect host cells.
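The observation about conserved cysteines could be checked by scanning a multiple sequence alignment for columns in which every sequence carries a cysteine. The following is a minimal sketch, assuming the sequences are already aligned to equal length with "-" for gaps. The function name is hypothetical, and the three aligned fragments are invented variants of the KPCVKLTPLCV motif, not real isolates.

```python
def conserved_columns(alignment, residue="C"):
    """Return 0-based column indices where every row has the given residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col
        for col in range(length)
        if all(row[col] == residue for row in alignment)
    ]


# Toy alignment, invented for illustration only.
toy_alignment = [
    "KPCVKLTPLCV",
    "RPCIKLTP-CV",
    "KACVRLSPLCI",
]

print(conserved_columns(toy_alignment))
```

In this toy alignment the cysteines at columns 2 and 9 are conserved even though several neighbouring residues differ. Whether two conserved cysteines actually pair in a disulfide bond cannot be read from sequence alone and would need the solved structure.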

References:

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Kwong PD, Wyatt R, Robinson J, Sweet RW, Sodroski J, Hendrickson WA (1998), "Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing antibody", Nature 393: 648-659.

Minnesota Department of Health. (2003), "Testing for HIV-1/HIV-2 in Minnesota", Disease Control Newsletter 31:3. <http://www.health.state.mn.us/divs/idepc/newsletters/dcn/may03/index.html> Accessed 8 Mar. 2005.

 


Questions? Comments? Concerns? Email me: wigreendyke@davidson.edu