This web page was produced as an assignment for an undergraduate course at Davidson College.

The International HapMap Project

Human Variation

Humans all belong to one species, but differences in DNA make each person unique. Although most of the DNA sequence is identical among individuals, about 0.1% of nucleotides differ between any two genomes (The International HapMap Consortium, 2003). A genome is the haploid set of chromosomes in an organism. Variations in DNA sequence can cause differences in phenotype, the physical manifestation of a genotype, and a specific DNA sequence can result in a disease. Finding the causes of diseases and predicting how people will respond to potential therapies can improve human healthcare and provide information about human variation. Many genetic diseases are caused by mutations in more than one gene, so narrowing a search to one candidate gene is not sufficient. Instead, researchers must find all of the genes that are associated with a particular condition and analyze how environmental factors influence the progression of the disease (The International HapMap Consortium, 2003).
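To put the 0.1% figure in perspective, the short calculation below estimates how many nucleotide positions differ between two genomes. The haploid genome size of roughly 3 billion base pairs is an assumption added for illustration; it does not come from the sources cited above.

```python
# Rough estimate of how many nucleotides differ between two human genomes.
# ASSUMPTION: a haploid genome of about 3 billion base pairs
# (a round figure chosen for illustration, not taken from the cited sources).
GENOME_SIZE_BP = 3_000_000_000
FRACTION_DIFFERENT = 0.001  # 0.1% (The International HapMap Consortium, 2003)


def expected_differences(genome_size_bp: int, fraction: float) -> int:
    """Return the approximate number of differing nucleotide positions."""
    return round(genome_size_bp * fraction)


print(expected_differences(GENOME_SIZE_BP, FRACTION_DIFFERENT))
```

Under that assumption, two genomes would differ at a few million positions, which is why a catalog of common variants is a practical alternative to comparing every nucleotide.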

Haplotypes and SNPs

To study human genetic variation, researchers use haplotypes, a term that combines the words haploid and genotype (Campbell and Heyer, 2006). A haplotype is a group of specific alleles that lie together on a chromosome (National Human Genome Research Institute, 2001). Single nucleotide polymorphisms (SNPs, pronounced “snips”) are nucleotide variants among individuals at a specific location on a chromosome (Campbell and Heyer, 2006; see Fig. 1). Haplotypes contain different combinations of SNPs. Specific SNPs, called tag SNPs, enable researchers to identify which haplotype is present in a particular genome (The International HapMap Consortium, 2003). Haplotypes occur at specific regions along every chromosome, and a person carries two haplotypes at each region, one from each parent (Campbell and Heyer, 2006). Populations across the globe contain almost all haplotypes found in humans, but the frequency of specific haplotypes varies among populations (The International HapMap Consortium, 2003). It is rare to find a haplotype in only one population (National Human Genome Research Institute, 2001). When this does happen, it is because mutations and chromosomal recombination can create new, rare haplotypes that are present in only one population (Campbell and Heyer, 2006; National Human Genome Research Institute, 2001).

Figure 1. The diagram shows the same loci in two different genomes. The red letters signify SNPs. (Adapted from The International HapMap Consortium, 2003.)
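The idea of tag SNPs can be illustrated with a small sketch. The four haplotypes below are made-up toy data, and the brute-force search is only one simple way to find a distinguishing set of SNPs; it is not the method used by the HapMap Consortium. The function and variable names are hypothetical.

```python
from itertools import combinations

# Hypothetical haplotypes for one chromosomal region: each string lists the
# allele at six SNP positions (toy data, not taken from the HapMap database).
HAPLOTYPES = {
    "hap1": "ACGTAG",
    "hap2": "ACGCAG",
    "hap3": "GTGTAC",
    "hap4": "GTATAC",
}


def find_tag_snps(haplotypes):
    """Return the smallest set of SNP positions that tells every haplotype apart."""
    n_sites = len(next(iter(haplotypes.values())))
    # Try sets of increasing size so the first hit is a smallest possible set.
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            signatures = {
                tuple(seq[p] for p in positions) for seq in haplotypes.values()
            }
            if len(signatures) == len(haplotypes):
                return positions
    return None  # only reached if two haplotypes are identical


def identify_haplotype(haplotypes, positions, observed):
    """Look up which haplotype matches the alleles observed at the tag SNPs."""
    for name, seq in haplotypes.items():
        if tuple(seq[p] for p in positions) == tuple(observed):
            return name
    return None


tags = find_tag_snps(HAPLOTYPES)
print(tags)                                         # positions are 0-based
print(identify_haplotype(HAPLOTYPES, tags, "GAT"))  # alleles read at the tag SNPs
```

In this toy region, three of the six SNPs are enough to tell the four haplotypes apart, so genotyping only those three positions identifies the haplotype without reading the other sites.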

The International HapMap Project

The International HapMap Project is an international effort that aims to find the associations between variations in human DNA and phenotypes (The International HapMap Consortium, 2007). Sequencing an individual’s entire genome is expensive and time-consuming, although efforts are underway to develop technology that can sequence a human genome cheaply. Looking for haplotypes is more efficient (The International HapMap Consortium, 2003). The goal of the project is to create a database of the haplotypes that are present in the genomes of individuals from around the world. This information is freely available for scientists to use in their own research (The International HapMap Consortium, 2003).

Creating the Database

To create the International HapMap Database, scientists collected blood samples from people around the world. This extensive project had many sources of funding (The International HapMap Consortium, 2003). The four initial sample populations were people from Nigeria, China, Japan, and Northern/Western Europe. More population groups are being added to the database to improve the accuracy of conclusions. After Phase II, the database contained more than 3.1 million SNPs (The International HapMap Consortium, 2007).

Individuals gave consent for their samples to be used, and the only identifying label was population group. It was important to protect the subjects’ privacy and to ensure that the samples were used only for the purposes explained in the informed consent documents. When comparing populations, researchers must also avoid generalizations that can result in racial stereotypes and “genetic determinism.” Even if a specific population has a high frequency of a haplotype associated with a disease, it is not accurate to say that members of that population always get that disease (The International HapMap Consortium, 2003).

Use of the Database

The HapMap database contains information that is used for association studies. Numerous diseases, such as cancer and psychiatric illnesses, are influenced by more than one gene (The International HapMap Consortium, 2003). Scientists can sequence isolated regions of DNA from people with a genetic disease and from people who are not affected by that disease. If a specific haplotype variant is found more often in people with the disease, the data suggest that the variant is near a gene that contributes to the disease. The haplotype is then said to be associated with the disease. A SNP within the haplotype could contribute to some of the disease symptoms or could simply be a marker of the disease (National Human Genome Research Institute, 2001).
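The comparison at the heart of an association study can be sketched with a 2x2 table of counts. The counts below are invented for illustration, and the odds ratio and Pearson chi-square statistic are standard textbook measures chosen here as an assumption; the cited sources do not prescribe a particular test.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds of carrying the haplotype among cases divided by the odds among controls."""
    return (case_with * control_without) / (case_without * control_with)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts (not real data): people who do or do not carry the haplotype.
cases_with, cases_without = 60, 40        # people with the disease
controls_with, controls_without = 30, 70  # people without the disease

print(odds_ratio(cases_with, cases_without, controls_with, controls_without))
print(chi_square_2x2(cases_with, cases_without, controls_with, controls_without))
```

An odds ratio above 1 means the haplotype is more common among people with the disease. A chi-square value above 3.84 corresponds to p < 0.05 at one degree of freedom, although real studies that test many variants at once need a much stricter threshold.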

Recombination is more likely to occur between two regions of a chromosome the farther apart they are. If two sections of a chromosome rarely recombine, they are probably close together and are said to be linked. When a variant is found that is commonly linked to a second variant, it is likely that the second variant is also present (The International HapMap Consortium, 2003). This logic is used to determine whether an individual may develop a genetic disorder. According to the Common-Disease/Common-Variant theory, if a patient has a haplotype that is frequently found in people with a particular disorder, then the patient may develop that disorder or be predisposed to it (National Human Genome Research Institute, 2001). However, having a variant that is associated with a disease does not guarantee that the individual will develop the disease. Other genetic variants and environmental factors also influence disease progression (The International HapMap Consortium, 2003).
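How strongly two variants travel together can be quantified. The sketch below uses the standard linkage disequilibrium measures D and r², which the cited sources do not spell out, and the haplotype counts are invented for illustration. Alleles A/a sit at the first SNP and B/b at the second.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared, P(B | A)) from counts of the four two-SNP haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of haplotypes carrying both A and B
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total    # frequency of allele B at the second SNP
    d = p_AB - p_A * p_B           # zero when the two SNPs are independent
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    p_B_given_A = p_AB / p_A       # chance that B is present when A is observed
    return d, r_squared, p_B_given_A


# Hypothetical counts of chromosomes carrying each combination (not real data).
d, r2, p_b_given_a = linkage_disequilibrium(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(d, r2, p_b_given_a)
```

With these toy counts, observing allele A makes allele B much more likely than its overall frequency of one half would suggest, which is the reasoning that lets one variant stand in for another.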

Figure 2. This image shows the result when searching for a specific region on human chromosome five. The SNPs are visible as two different nucleotides. The window shows the frequency of SNPs in different populations.
International HapMap Project. 2014. Accessed 2 February 2014.

The HapMap database is not used only for finding genetic variations that lead to diseases. The data can also be used to infer evolutionary relationships. A group in Japan used haplotype data to decipher the origins of the A allele of the ABO blood group. They concluded that the human A allele did not evolve from an ancestral A allele found in chimpanzees but instead arose from a recombination event between the B and O haplotypes (Itou et al., 2013).

Figure 3. This PDB image shows a glycosyltransferase. Differences in the terminal sugar that this enzyme adds to a sugar chain on the cell surface result in the different blood types. Image obtained from RCSB Protein Data Bank.

Future Discoveries

Because the HapMap is a public database, there are restrictions on the patents that scientists can obtain, which ensures that other people retain free access to the data. Scientists can, however, obtain a patent for finding a specific association between a disease and information in the database (The International HapMap Consortium, 2003). Future studies of haplotypes will elucidate the relationships between phenotypes and variations in DNA sequences and will hopefully lead to treatments for many genetic disorders.

To learn more about The International HapMap Project and access the database, visit:




Campbell, A. M., and L. J. Heyer. 2006. Discovering Genomics, Proteomics and Bioinformatics. 2nd ed. Cold Spring Harbor Laboratory Press and Benjamin Cummings.

Itou, Masaya, Mitsuharu Sato, and Takashi Kitano. 2013. Analysis of a Larger SNP Dataset from the HapMap Project Confirmed that the Modern Human A Allele of the ABO Blood Group Genes is a Descendant of a Recombinant between B and O Alleles. International Journal of Evolutionary Biology: 1-10.

National Human Genome Research Institute. 2001. Developing a Haplotype Map of the Human Genome for Finding Genes Related to Health and Disease.

The International HapMap Consortium. 2003. The International HapMap Project. Nature 426: 789-796.

The International HapMap Consortium. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851-862.



© Copyright 2014 Department of Biology, Davidson College, Davidson, NC 28035