*This website was produced as an assignment for an undergraduate course at Davidson College.*


Orthologs are genes in different species that evolved from a common ancestral gene by speciation, and they usually serve similar functions in their respective organisms. By comparing the nucleotide and amino acid sequences of orthologs, scientists can identify regions of a gene or protein that are highly conserved across species. Highly conserved regions often correspond to functionally important sites, such as lipid-binding sites, or are important to the final three-dimensional conformation of the protein. Analyzing ortholog sequences can therefore deepen our understanding of the relationship between structure and function in proteins such as human serum albumin (HSA).

HSA is the human form of serum albumin. The serum albumins belong to a multigene family that also includes α-fetoprotein and vitamin D-binding protein, all of which share a highly conserved intron-exon organization. Evolutionary comparisons strongly support the vitamin D-binding protein gene as the original gene in this group, with subsequent local duplications generating the remaining genes in the albumin cluster. α-Fetoprotein binds various cations, fatty acids, and bilirubin, while vitamin D-binding protein binds vitamin D, its metabolites, and fatty acids (Brown et al, 1982). Many of these binding properties are preserved by similar amino acid sequences in HSA and its orthologs (Figure 1).

Figure 1: A BLAST search with the HSA amino acid sequence identified a conserved 185-amino-acid segment shared with vitamin D-binding protein; this domain is found in all members of the albumin family, including HSA and its orthologs (Marchler-Bauer et al, 2009). The upper sequence is HSA and the lower sequence is vitamin D-binding protein (NCBI).

Serum albumin has been isolated from a range of mammalian species, but no serum albumin ortholog has been found in Drosophila, C. elegans, Arabidopsis, yeast, or E. coli. Using BLASTp, I found that the HSA protein sequence is conserved in six other vertebrate species: the mammals Canis lupus familiaris (dog), Mus musculus (mouse), Rattus norvegicus (rat), Pan troglodytes (chimpanzee), and Bos taurus (cow), and the bird Gallus gallus (chicken). Overall, the sequences are fairly well conserved, which is consistent with the importance of serum albumin's main functions in these animals: transporting fatty acids and steroids and maintaining osmotic pressure in the blood.

Conservation of HSA Amino Acid Sequence in Different Species
Species                       Common name    Identity to HSA
Pan troglodytes               chimpanzee     98.8%
Canis lupus familiaris        dog            80.1%
Bos taurus                    cow            76.6%
Rattus norvegicus             rat            73.5%
Mus musculus                  mouse          72.3%
Gallus gallus                 chicken        47.9%

Table 1: Percent amino acid identity between HSA and each of its six orthologs, as reported by BLASTp.
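To make the percentages in Table 1 concrete, the short Python sketch below shows one way a percent identity can be computed from two sequences that have already been aligned: identical, non-gap positions divided by the total number of alignment columns. This assumes a BLAST-style definition that covers only the aligned region; the function name percent_identity and the two fragments are hypothetical and are not real albumin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length aligned sequences.

    Identical, non-gap columns are divided by the total number of
    alignment columns (gap columns included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical aligned fragments ("-" marks a gap); not real albumin sequence.
seq_1 = "ACDEFG-HIKL"
seq_2 = "ACDQFGWHIKL"
print(f"{percent_identity(seq_1, seq_2):.1f}%")  # 81.8%
```

Here 9 of the 11 columns are identical, and the gap column counts against identity, which is one reason identities for sequences of different lengths (such as those compared above) depend on how the alignment is built.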

Figure 2: A cladogram showing the evolutionary relationships among the species. HSA and its six orthologs descend from a common ancestral gene. In general, the more highly conserved a gene or protein is between two species, the more recently those species diverged; for example, serum albumin is most highly conserved between Homo sapiens and Pan troglodytes (ClustalW2).
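The rule of thumb in the Figure 2 caption can be illustrated with the Table 1 values. The Python sketch below converts each percent identity into a Poisson-corrected distance, d = -ln(q), where q is the fraction of identical sites, as one simple estimate of amino acid substitutions per site, and then ranks the species by distance from human. This is an assumed, simplified approach and not necessarily the method ClustalW2 used: a real cladogram is built from distances between every pair of sequences, whereas Table 1 only gives distances to HSA, and BLASTp identities cover only the aligned region.

```python
import math

# Percent identity to HSA for each ortholog, copied from Table 1 (BLASTp).
IDENTITY_TO_HSA = {
    "Pan troglodytes": 98.8,
    "Canis lupus familiaris": 80.1,
    "Bos taurus": 76.6,
    "Rattus norvegicus": 73.5,
    "Mus musculus": 72.3,
    "Gallus gallus": 47.9,
}


def poisson_distance(percent_identity: float) -> float:
    """Poisson-corrected distance d = -ln(q), where q is the fraction of
    identical sites; a rough estimate of substitutions per site."""
    if not 0.0 < percent_identity <= 100.0:
        raise ValueError("percent identity must be in (0, 100]")
    return -math.log(percent_identity / 100.0)


# Sort species from smallest to largest estimated distance from human.
ranked = sorted(IDENTITY_TO_HSA, key=lambda sp: poisson_distance(IDENTITY_TO_HSA[sp]))
for species in ranked:
    print(f"{species:<24} d = {poisson_distance(IDENTITY_TO_HSA[species]):.3f}")
```

Ranking by this distance reproduces the order of Table 1, with the chimpanzee closest to human and the chicken farthest, which fits the expectation that the bird lineage split from the human lineage earlier than any of the mammals did.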

A multiple sequence alignment of HSA and its six orthologs suggests that this conservation of amino acid sequence underlies the fatty-acid binding and transport function of serum albumin (NCBI). The full-length precursor protein, which includes the signal peptide and propeptide at the N-terminus, differs slightly in length among species: HSA has 609 amino acids, Pan troglodytes 621, Canis lupus familiaris 608, Bos taurus 607, Mus musculus 608, Rattus norvegicus 608, and Gallus gallus 615. Once these N-terminal segments are removed, however, mature HSA is a single polypeptide chain of 585 amino acids folded into three homologous domains held together by 17 disulfide bridges, and the mature mammalian albumins share this organization. Cysteine residues arranged in eight sequential Cys-Cys pairs are conserved in all of the serum albumin orthologs; these pairs allow the formation of the intradomain disulfide bridges, which are essential for forming the three domains and for folding the chain into hydrophobic pockets that bind lipids and steroids (Figure 3; Dockal et al, 1999).
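The difference between the 609-residue precursor and the 585-residue mature chain is simple bookkeeping, sketched below in Python. The signal peptide length (18 residues), the propeptide length (6 residues), and the single unpaired cysteine (Cys34) are taken from the standard annotation of human albumin (UniProt entry P02768) rather than from this page, so treat them as stated assumptions; the function names are hypothetical.

```python
# Lengths for human preproalbumin. The 609-residue precursor length and the
# 17 disulfide bridges come from this page; the 18-residue signal peptide,
# 6-residue propeptide, and single free cysteine (Cys34) are assumed from
# UniProt entry P02768.
PRECURSOR_LENGTH = 609
SIGNAL_PEPTIDE_LENGTH = 18
PROPEPTIDE_LENGTH = 6
DISULFIDE_BRIDGES = 17
FREE_CYSTEINES = 1


def mature_length(precursor: int, signal: int, propeptide: int) -> int:
    """Residues left after both N-terminal segments are cleaved off."""
    return precursor - signal - propeptide


def total_cysteines(bridges: int, free: int) -> int:
    """Each disulfide bridge joins two cysteines; add any unpaired ones."""
    return 2 * bridges + free


print(mature_length(PRECURSOR_LENGTH, SIGNAL_PEPTIDE_LENGTH, PROPEPTIDE_LENGTH))  # 585
print(total_cysteines(DISULFIDE_BRIDGES, FREE_CYSTEINES))  # 35
```

The first result matches the 585-residue mature chain, and the second shows that 17 bridges account for 34 cysteines, leaving one free cysteine in a 35-cysteine mature protein.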

Figure 3: A Jalview image of an amino acid sequence alignment covering a segment of the human, chimpanzee, dog, cow, mouse, rat, and chicken serum albumin proteins. Colors indicate which amino acids are conserved across species. Note in particular the Cys-Cys residues shown in pink, which are essential for forming the characteristic disulfide bridges of serum albumin. Bar graphs indicate the degree of conservation at each position (ClustalW2).
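Spotting conserved Cys-Cys pairs like those highlighted in Figure 3 amounts to scanning the alignment for adjacent columns that are cysteine in every sequence. The Python function below is a minimal sketch of that scan, assuming the input is a gapped alignment whose sequences all have the same length; the function name and the three toy sequences are hypothetical and are not taken from the figure.

```python
def conserved_cys_pairs(alignment: list[str]) -> list[int]:
    """Return 0-based column indices i such that columns i and i + 1 are
    cysteine ("C") in every aligned sequence."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [
        i
        for i in range(width - 1)
        if all(seq[i] == "C" and seq[i + 1] == "C" for seq in alignment)
    ]


# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = [
    "AKCCEAGCCA",
    "SKCCEAGCCT",
    "AKCCQA-DSA",
]
# Columns 7-8 are Cys-Cys in only two of the three sequences, so only
# the pair starting at column 2 is reported.
print(conserved_cys_pairs(toy_alignment))  # [2]
```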

In addition to the conserved disulfide bridges, serum albumin orthologs share other structural features that give rise to the protein's distinctive function. In HSA, the subdomains form binding pockets whose inner walls are lined largely by hydrophobic side chains, together with clusters of polar, mostly positively charged residues such as Arg257, Arg222, Lys199, His242, Arg218, and Lys195. Positively charged residues also surround the entrance of the subdomain IIIA pocket: Arg410 sits at its mouth, and the hydroxyl group of Tyr411 points toward the inside of the pocket (Sugio et al, 1999). The hydrophobic interior allows fatty acids to bind securely to the transporter protein, away from the aqueous environment of the bloodstream. Because serum albumins are known to carry fatty acids, studies have mapped the binding sites of specific fatty acids, and these sites are for the most part conserved among the serum albumins of different species. For example, palmitic acid binds to Lys473, Lys349, and Lys116 in bovine serum albumin, which correspond to Lys475, Lys351, and Arg116 in HSA, respectively (Reed, 1986). Research has also shown that serum albumins tend to have two high-affinity sites for oleic acid in domain III but only one in domain I. Domain III has one more disulfide bond than domain I, which may give it a peptide conformation suited to accommodating a second high-affinity site for long-chain fatty acids. Many ligands besides oleic acid, such as digitoxin, ibuprofen, and tryptophan, bind preferentially to the subdomain IIIA cavity (Hamilton et al, 1991). Trp214, the single tryptophan of HSA, is conserved in mammalian albumins; it plays an important structural role in forming the subdomain IIA binding site by limiting solvent accessibility, and it aids in the binding of warfarin (He et al, 1992).
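Statements such as "Lys473 in bovine serum albumin corresponds to Lys475 in HSA" come from walking along a pairwise alignment and counting residues in each sequence while skipping gaps. A minimal Python sketch of that bookkeeping follows; the function name residue_map, its start_a and start_b parameters, and the toy sequences are hypothetical, and the two-residue gap only mimics the kind of numbering offset seen between the bovine and human proteins.

```python
def residue_map(aligned_a: str, aligned_b: str,
                start_a: int = 1, start_b: int = 1) -> dict[int, int]:
    """Map residue numbers in sequence A to the numbers of the residues
    they are aligned with in sequence B. Residues facing a gap ("-") in
    the other sequence have no counterpart and are left out."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    pos_a, pos_b = start_a, start_b
    for a, b in zip(aligned_a, aligned_b):
        if a != "-" and b != "-":
            mapping[pos_a] = pos_b
        # Only real residues advance a sequence's numbering.
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
    return mapping


# Hypothetical toy alignment: sequence A lacks two residues present in B,
# so every A residue after the gap is numbered 2 lower than its partner.
seq_a = "MK--TLKR"
seq_b = "MKAQTLKR"
print(residue_map(seq_a, seq_b))  # {1: 1, 2: 2, 3: 5, 4: 6, 5: 7, 6: 8}
```

The start_a and start_b parameters let the numbering begin wherever a given convention does, for example at the first residue of the mature chain rather than the precursor.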
Figure 4 provides evidence of conserved amino acid sequences throughout the serum albumin proteins of these species.

Figure 4: A ClustalW2 alignment of HSA and its six orthologs. Conserved residues point to fatty acid and ligand binding sites that are likely shared across the albumin proteins. An asterisk (*) marks a column in which the residues are identical in all sequences, a colon (:) marks a column with conserved substitutions, and a period (.) marks a column with semi-conserved substitutions. From top to bottom, the sequences are from Homo sapiens, Pan troglodytes, Canis lupus familiaris, Bos taurus, Rattus norvegicus, Mus musculus, and Gallus gallus (ClustalW2).
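The symbols in Figure 4 follow rules that can be written down directly. The Python sketch below assigns a Clustal-style mark to each alignment column using the strong and weak residue groups listed in the Clustal documentation, where ":" corresponds to strongly similar groups and "." to weakly similar ones. Treating any column that contains a gap as unmarked is an assumption of this sketch, and the function names and toy alignment are hypothetical.

```python
# Residue groups that Clustal uses for ":" (strongly similar) and "."
# (weakly similar) columns, as listed in the Clustal documentation.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Clustal-style mark for one alignment column (one residue per sequence)."""
    if "-" in column:  # assumption: gapped columns get no mark
        return " "
    residues = set(column.upper())
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(alignment: list[str]) -> str:
    """Build the mark line shown beneath a Clustal alignment."""
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        column_symbol("".join(seq[i] for seq in alignment)) for i in range(width)
    )


# Hypothetical toy alignment, not taken from Figure 4.
toy = ["ACKGLWDE", "SCRALWPE", "ACKGIWW-"]
for seq in toy:
    print(seq)
print(conservation_line(toy))
```

In the toy alignment, the {A, S}, {K, R}, and {L, I} columns fall within strong groups and get ":", the {G, A} column only fits the weak group SAG and gets ".", and the unrelated {D, P, W} column and the gapped final column stay blank.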

Mutations in the serum albumin gene give rise to a variety of variant proteins. Because albumin is a large protein, one might expect it to show more genetic variability than smaller proteins such as hemoglobin; the comparatively small number of known albumin variants might therefore suggest that selection acts relatively strongly against variants of this molecule. Mutations do nevertheless occur, and more than 80 inherited variants of human albumin are known today. Analbuminemia is a rare autosomal recessive disorder in which serum albumin is absent from the bloodstream. In Nagase analbuminemic rats, a 7-bp deletion in an intron appears to interfere with albumin mRNA formation by causing exon skipping during splicing (Shalaby et al, 1990). Even so, albumin variants are generally benign: analbuminemia, which causes edema and hyperlipidemia, does not appear to be life-threatening (Minchiotti et al, 2008). Nonenzymatic glycosylation of several distinct lysine and arginine residues can alter the structure and function of the serum albumin protein. Elevated levels of glycosylated serum albumin, which are observed in diabetes mellitus, may contribute to tissue damage and further DNA mutation. It is not clear why only certain residues are susceptible to this glycosylation (Shaklai et al, 1984).



BLAST. Basic Local Alignment Search Tool. http://blast.ncbi.nlm.nih.gov/Blast.cgi.

Brown JR, Shockley P. In Jost P, Griffith OH (eds), Lipid-Protein Interactions, Vol. 1. Wiley, New York, 1982, pp. 25-68.

ClustalW2. EBI. http://www.ebi.ac.uk/Tools/clustalw2/index.html.

Dockal M, Carter DC, Ruker F. The three recombinant domains of human serum albumin: structural characterization and ligand binding properties. J Biol Chem 1999; 274: 29303-29310.

Hamilton JA, Era S, Bhamidipati SP, Reed RG. Locations of the three primary binding sites for long-chain fatty acids on serum albumin. Proc Natl Acad Sci USA 1991; 88: 2051-2054.

He XM, Carter DC. Atomic structure and chemistry of human serum albumin. Nature 1992; 358: 209-215.

Hutchinson DW, Matejtschuk P, Lord C. Albumin Carlisle: occurrence and properties of a new human albumin variant. IRCS Med Sci 1986; 14: 1095-1096.

Marchler-Bauer A, et al. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 2009; 37: D205-D210.

Minchiotti L, Galliano M, Kragh-Hansen U, Peters T. Mutations and polymorphisms of the gene of the major human blood protein, serum albumin. Hum Mutat 2008; 29: 1007-1016.

Reed RG. Location of long chain fatty acid-binding sites of bovine serum albumin by affinity labeling. J Biol Chem 1986; 261: 15619-15624.

Shaklai N, Garlick R, Bunn H. Nonenzymatic glycosylation of human serum albumin alters its conformation and function. J Biol Chem 1984; 259: 3812-3817.

Shalaby F, Shafritz DA. Exon skipping during splicing of albumin mRNA precursors in Nagase analbuminemic rats. Proc Natl Acad Sci USA 1990; 87: 2652-2656.

Spector AA. Fatty acid binding to plasma albumin. J Lipid Res 1975; 16: 165-179.

Sugio S, Kashima A, Mochizuki S, Noda M, Kobayashi K. Crystal structure of human serum albumin at 2.5 Å resolution. Protein Eng 1999; 12: 439-446.


Please direct questions or comments to Sarah Little