*This website was produced as an assignment for an undergraduate course at Davidson College.*

Orthologs of Human Pepsin A


Figure 1-Human pepsin A and its subfamily. Small numbers indicate percent confidence of tree construction and codes are the proteins' accession numbers.
Click HERE for the full phylogenetic tree
Source: Hughes, et al. 2003

         Pepsin was the first enzyme to be discovered, in 1836, and was soon shown to break down proteins (Foltmann 1981). Because it has been known for so long, its evolution has been studied and reviewed in great detail, and a number of orthologs of human pepsin A have been identified. Orthologs are genes or proteins in different species that descend from a single gene in those species' common ancestor. The phylogenetic tree in Figure 1 displays the pepsin A subfamily, which includes human pepsin A; the link below the figure leads to a more detailed tree. As the figure shows, an analysis of the amino acid sequences places pig pepsin A as human pepsin A's closest ortholog. The next closest relatives are a frog (Xenopus) pepsin, chicken pepsin A, and then a group consisting of cow, rat, and a second chicken pepsin.
         The larger tree shows that human pepsin A has orthologs in most major groups of Eukarya, including Drosophila (fruit flies), yeast, roundworms such as C. elegans, and Arabidopsis. Rawlings and Bateman (2009) went on to compare the sequences of a number of bacteria, including some Shewanella spp., but made no mention of an ortholog in E. coli even though the study tested 960 different prokaryotes. They then added these species' orthologs to a pepsin A phylogenetic tree of their own, which can be seen HERE (Rawlings and Bateman 2009).
         As mentioned in the original protein website, pepsin starts out in an inactive form with additional amino acid residues that must be irreversibly removed before the enzyme becomes active pepsin A. This website will continue to focus on the active form of the protein.
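To make the idea of a "closest" ortholog by amino acid sequence more concrete, here is a minimal Python sketch that ranks sequences by percent identity to a query. It is only a rough illustration: percent identity over a pre-made alignment is a crude stand-in for the phylogenetic methods used to build the tree in Figure 1, and the sequences and names below (toy_query, toy_seq_a, toy_seq_b) are made-up placeholders, not real pepsin sequences.

```python
from typing import Dict, List, Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def rank_by_identity(query: str, others: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sort the other sequences from most to least similar to the query."""
    scores = [(name, percent_identity(query, seq)) for name, seq in others.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical, already-aligned fragments (NOT real pepsin sequences).
toy_query = "ACDEFGHIKL"
toy_others = {
    "toy_seq_a": "ACDEFGHIKV",  # one mismatch
    "toy_seq_b": "ACDQFG-IRL",  # two mismatches and one gap
}

ranking = rank_by_identity(toy_query, toy_others)
for name, score in ranking:
    print(f"{name}: {score:.1f}% identical to the query")
```

In this toy case toy_seq_a ranks first, in the same way that pig pepsin A sits closest to human pepsin A in Figure 1.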

Detailed Sequence Analysis

Figure 2-Conserved domains of a selected number of human pepsin A orthologs. Conserved domains are depicted in red, and amino acid numbers correspond to pig pepsin A, by convention. The thick black line marks the point at which Kageyama hypothesizes that two duplicate proteases joined to form the common ancestor of all pepsin-like gastric proteases. The first and final boxes enclose the catalytic motifs that surround the catalytic residues Asp32 and Asp215, which are indicated by bold, black marks. The box from 71-82 surrounds the flexible loop sequence of the S1 subsite. The box from 134-144 surrounds the residues of the active-site flap.
Source: Edited from original obtained at NCBI Conserved Domains

         Examining the amino acid sequences of six of the seven pepsins in the pepsin A subfamily reveals the sections that are critical to the proper function of these aspartic proteinases. Of particular importance are Asp32 and Asp215, which function as catalytic residues and sit in the center of the active-site cleft. These two residues are so critical that replacing either aspartate with another residue inactivates the enzyme (Kageyama 2002). As Figure 2 shows, both the aspartates themselves and the catalytic motifs surrounding them are conserved across all of the sequences compared.
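The conservation check described above can be sketched in a few lines of Python. One assumption is brought in from outside this page: aspartic proteinases are commonly described as carrying an Asp-Thr-Gly or Asp-Ser-Gly motif around each catalytic aspartate, written here as the pattern D[TS]G. The sequences are made-up placeholders rather than real pepsin sequences, so the positions found do not correspond to Asp32 and Asp215.

```python
import re
from typing import List

# Assumed hallmark pattern: Asp followed by Thr or Ser, then Gly.
CATALYTIC_MOTIF = re.compile(r"D[TS]G")


def find_catalytic_motifs(sequence: str) -> List[int]:
    """Return the 1-based position of the Asp in every D[TS]G match."""
    return [match.start() + 1 for match in CATALYTIC_MOTIF.finditer(sequence)]


def has_both_motifs(sequence: str) -> bool:
    """True if at least two candidate catalytic motifs are present."""
    return len(find_catalytic_motifs(sequence)) >= 2


# Hypothetical fragments (NOT real pepsin sequences).
toy_sequences = {
    "toy_intact_1": "MKVIFDTGSSNLWVPSGGAIVDTGTSLLT",
    "toy_intact_2": "MKVIFDSGSSNLWVPSGGAIVDTGTSLLT",
    "toy_mutant": "MKVIFATGSSNLWVPSGGAIVDTGTSLLT",  # first Asp replaced by Ala
}

for name, seq in toy_sequences.items():
    positions = find_catalytic_motifs(seq)
    print(f"{name}: motif Asp positions {positions}, both present: {has_both_motifs(seq)}")
```

The toy mutant loses one of its two motifs, mirroring the observation that replacing a catalytic aspartate inactivates the enzyme.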
         The active-site cleft also contains at least seven subsites, labeled S4 through S1 and S'1 through S'3. These subsites can accommodate seven residues of a substrate, and their composition explains pepsin's preference for large hydrophobic or aromatic amino acids. The S1 subsite has been shown to be a primary determinant of substrate specificity. As shown in Figure 2, it features a loop (the box from 71-82, also shown in Figure 3) composed primarily of hydrophobic residues, and the relative conservation of the sequence in this section suggests its importance. Because this subsite and its components are hydrophobic, it makes sense that it preferentially binds hydrophobic residues, following the "like attracts like" rule of hydrophobic and hydrophilic interactions. Aromatic rings also have hydrophobic character, so aromatic residues such as Phe are attracted as well. In fact, as indicated in Table 1 below, Phe is the ideal residue for the S1 subsite, which is consistent with 7 of the 10 residues in this subsite being hydrophobic. The importance of the S1 subsite is further suggested by the conservation, consistent with stabilizing selection, of most of the residues listed in Table 1 across the species in Figure 2. Though Table 1 shows 7 subsites, there is some speculation that there may be as many as 4 more substrate-binding pockets (Kageyama 2002).
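The "7 of the 10 residues are hydrophobic" reasoning can be expressed as a small Python calculation. This is a sketch under two stated assumptions: the set of residues counted as hydrophobic (A, V, L, I, M, F, W, Y) is one common choice among several, and toy_subsite is a placeholder string chosen only to reproduce the 7-in-10 ratio, not the actual S1 residues from Table 1.

```python
# Assumed classification: one common choice of hydrophobic one-letter codes.
HYDROPHOBIC = frozenset("AVLIMFWY")


def hydrophobic_fraction(residues: str) -> float:
    """Fraction of residues in the string that fall in the HYDROPHOBIC set."""
    if not residues:
        raise ValueError("residue string must not be empty")
    hits = sum(1 for residue in residues.upper() if residue in HYDROPHOBIC)
    return hits / len(residues)


# Placeholder subsite (NOT the real S1 residues): 7 hydrophobic, 3 not.
toy_subsite = "ILFYVMWDTG"

fraction = hydrophobic_fraction(toy_subsite)
print(f"{fraction:.0%} of the toy subsite residues are hydrophobic")
```

Substituting the residues listed in Table 1 for toy_subsite would give the corresponding figure for each real subsite, subject to the classification chosen.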

Table 1-The makeup of the subsites of pig pepsin A and each subsite's ideal substrate.
Source: Sielecki et al. 1990

Structural Analysis of Commonalities

Figure 3-A 3-dimensional model of the peptide chain of aspartic proteinases, labeling key residues. This model is drawn so the viewer looks into the mouth of the reaction cleft with the outermost sections protruding away from the middle portion and out of the page.
Source: Foltmann 1981

          The importance of the commonalities among these enzymes is best understood after viewing the 3-dimensional figure above. The importance of the aspartate residues at positions 32 and 215 is evident in this figure, since they sit at the center of the two sides of the active-site cleft. Any substrate entering the active site must pass between these two catalytic residues. Aspartate itself is an acidic residue rather than a hydrophobic one, so the enzyme's preference for hydrophobic substrates comes from the hydrophobic subsites lining the cleft, not from the aspartates. The S1 subsite's flexible loop sequence is also visible on the left, protruding toward the active-site cleft. It is easy to see how this loop could be a primary determinant of substrate specificity, since apart from the two catalytic motifs it appears to be the section closest to the cleft. The active-site flap, from amino acids 134-144, appears to contain or consist of an α-helical structure (Kageyama 2002).

Works Cited

1. Hughes A, Green J, Piontkivska H, Roberts R. Aspartic proteinase phylogeny and the origin of pregnancy-associated glycoproteins. Mol. Biol. Evol. 2003; 20(11):1940-1945. Pubmed

2. Foltmann B. Gastric Proteinases - Structure, function, evolution, and mechanism of action. Essays in biochemistry 1981; 17: 52-84. Pubmed

3. Rawlings N, Bateman A. Pepsin homologues in bacteria. BMC Genomics 2009; 10: 437. Pubmed

4. NCBI. NCBI Conserved Domains. http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi Accessed 03/09/2010.

5. Kageyama T. Pepsinogens, progastricsins, and prochymosins: structure, function, evolution, and development. CMLS, Cell. Mol. Life Sci. 2002; 59: 288-306. Pubmed (Obtained through ILL)

Please direct questions or comments to algonzalezstewart@davidson.edu.