This webpage was created as part of an undergraduate assignment at Davidson College

The phenotypic legacy of admixture between modern humans and Neandertals


Simonti et al. published a paper in Science examining the impact of Neandertal (Homo neanderthalensis) DNA found in the modern human (Homo sapiens) genome on human traits (Simonti et al., 2016). This DNA makes up 1.5-4% of the modern Eurasian genome. It is present because, as modern human groups moved out of Africa and into Europe, they interbred with other ancient hominin populations, including Neandertals. At the time, many of the traits that entered the human gene pool were probably advantageous as humans adapted to their new climate, but over time these genes have lost their advantage. The introgressed alleles are still present in the genome, yet no one had provided an in-depth analysis of their impact on modern human phenotypes. Simonti et al. sought to fill this void, and their paper addresses three questions:

1) Is DNA inherited from Neandertals associated with modern human disease?

2) Can current phenotypes be related to specific Neandertal haplotypes?

3) Are electronic health records useful for this type of analysis?

Relevant terms:

Experimental Design

Genotypic and phenotypic data on ~28,000 anatomically modern human (AMH) individuals of European descent (currently residing in the USA) were gathered from the electronic health records (EHRs) of the eMERGE network. The individuals were organized into two cohorts according to the date of release of their information (13,686 individuals in Phase 1, cohort E1, and 14,730 individuals in Phase 2, cohort E2). Phase 1 was used as the "discovery cohort" and Phase 2 as the "replication cohort". Approximately 1000 phenotypes were derived from the EHRs.
A genome-wide map of Neandertal haplotypes was compared to the genotypes of sequenced individuals from the 1000 Genomes Project. Matches were considered "putative introgressed haplotypes." Specific Neandertal SNPs were narrowed down by eliminating those that occurred at a frequency significantly different from overall Neandertal haplotype frequency, as well as eliminating haplotypes with fewer than four Neandertal-derived SNPs. The authors then applied a variety of meta-analysis techniques to identify, characterize, and quantify relationships between Neandertal DNA and AMH phenotypes.
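The SNP-count filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the haplotype and SNP identifiers are hypothetical, the data structure is invented for the example, and the frequency-based filter is not reproduced here. Only the threshold of four Neandertal-derived SNPs comes from the text.

```python
# Threshold from the text: haplotypes with fewer than four
# Neandertal-derived SNPs are eliminated.
MIN_NEANDERTAL_SNPS = 4

# Hypothetical haplotype IDs mapped to hypothetical Neandertal-derived SNP IDs.
haplotypes = {
    "hap_1": ["snp_01", "snp_02", "snp_03", "snp_04", "snp_05"],
    "hap_2": ["snp_06", "snp_07"],
    "hap_3": ["snp_08", "snp_09", "snp_10", "snp_11"],
}


def filter_haplotypes(haps, min_snps=MIN_NEANDERTAL_SNPS):
    """Keep only haplotypes carrying at least `min_snps` Neandertal-derived SNPs."""
    return {name: snps for name, snps in haps.items() if len(snps) >= min_snps}


kept = filter_haplotypes(haplotypes)
print("Haplotypes kept:", sorted(kept))
```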

Examining the Data:
All figures and tables come directly from Simonti et al., 2016.

Figure 1:

Figure 1 is very busy. It depicts both methods and some sample conclusions of the EHR analysis of genotypic and phenotypic data. It is necessary first to break it down one panel at a time. However, each panel contains information to help the reader understand other panels. There are two critical things to note overall: the method of labeling is consistent throughout panels A, B, and D; and each panel represents only one or a few examples of the kind of analysis that was done for the 1689 considered phenotypes.

Panel A:

Panel A shows how the phenotypes derived from EHRs were compared to the genotype of each patient. These data are specific to cohort E1; cohort E2 followed the same procedure. The first column identifies each patient with a P followed by two subscript numbers. The first is a site number, which indicates the site at which the data were collected (for example, the Mayo Clinic). The second identifies patients within the site. So patient P1,1 is patient 1 at site 1.
The second column ("Genotypes") shows three DNA nucleotides that were part of the patient's genotype. Nucleotides that are shaded in red match one of the Neandertal SNPs.
In the rightmost column, the EHR was used to identify whether or not each patient expressed any of the studied phenotypes, identified here as 1, 2...all the way up to M (the last phenotype). A check means the patient expressed the phenotype; an X means the patient did not.
Because phenotype 2 has been hypothesized to be associated with Neandertal variants at the third position shown, a box is drawn around the associated SNP and phenotype. These boxes will be relevant later when we consider the larger figure.

Panel B:

Panel B shows the genotypic and phenotypic similarity of patients. The left side shows the overall genetic similarity of pairs of patients, comparing genotype at all of the Neandertal SNPs. The darker the coloration, the more similar the two individuals (which is why the intersection of P1,1 and P1,1 is darkest). On the right, phenotypes are compared. If both patients express (or don't express) the specific trait, the intersection is dark, while if only one patient expresses it, it is white. This information was used to determine which phenotypes are impacted by the Neandertal SNPs (see Panel C).
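The pairwise comparison in Panel B can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the toy genotype codes (1 = Neandertal allele at a SNP, 0 = not), the phenotype values, and the similarity measure (fraction of matching sites) are all assumptions made for the example. GCTA itself works from a genetic relationship matrix computed differently.

```python
from itertools import combinations_with_replacement

# Hypothetical toy data: 1 = carries the Neandertal allele at that SNP, 0 = does not.
genotypes = {
    "P1,1": [1, 0, 1],
    "P1,2": [1, 0, 0],
    "P2,1": [0, 1, 0],
}
# Hypothetical status for a single phenotype: True = patient has the trait.
phenotype = {"P1,1": True, "P1,2": True, "P2,1": False}


def genotype_similarity(a, b):
    """Fraction of SNP positions at which two patients carry the same allele."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def phenotype_match(a, b):
    """Return 1 if both patients have (or both lack) the trait, else 0."""
    return 1 if a == b else 0


# Compare every pair of patients, including each patient with themselves,
# which is the diagonal of the matrices drawn in Panel B.
for p, q in combinations_with_replacement(genotypes, 2):
    g = genotype_similarity(genotypes[p], genotypes[q])
    m = phenotype_match(phenotype[p], phenotype[q])
    print(f"{p} vs {q}: genotype similarity {g:.2f}, phenotype match {m}")
```

A patient compared with themselves always scores the maximum on both measures, which corresponds to the darkest cells on the diagonal of each matrix.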

Panel C:

Genome-wide complex trait analysis (GCTA) was used to determine which phenotypes in AMHs seem to be impacted by the presence of Neandertal variants in the corresponding parts of the genome. GCTA is used to estimate the proportion of phenotypic variance explained by genome-wide SNPs. Panel C identifies a few of the phenotypes that were found to be explained to a nominally significant extent by Neandertal SNPs.

Panel D:

Panel D is a forest plot which displays an example of the results of a phenome-wide association study (PheWAS), combined across sites by meta-analysis, which was used to test individual Neandertal alleles for trait association. The top shows the first iteration, which used the E1 cohort ("Discovery"). The number of cases (individuals exhibiting the trait) and controls are listed. The P value indicates that the correlation between having the SNP (rs3917862) and the phenotype (hypercoagulable state) is significant. To the far right, a chart shows the odds of having the phenotype with the SNP versus without it. After displaying these data for each individual site in the E1 cohort, the meta-analysis data are shown (numbers for the overall association across sites). The problem with this particular plot is that it requires a lot of deduction. We are never directly told what odds are being calculated, or what the P value is for.
Below the Discovery PheWAS are the data for the Replication PheWAS. The same analysis was conducted using individuals from the E2 cohort. The odds ratio was found to be similar, but more importantly the P value was still much smaller than 0.05, indicating that the significant association between the two is reproducible. The association depicted here is just one of four Neandertal SNP-AMH phenotype associations that were found to be significant in the E1 analysis and reproducible in the E2 analysis.
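To make the quantities in a forest plot concrete, here is a minimal sketch of how a per-site odds ratio and a pooled estimate could be computed. Everything in it is an assumption for illustration: the case and control counts are hypothetical, the odds ratio is taken from a plain 2x2 table, and the sites are pooled with a fixed-effect inverse-variance method. A real PheWAS typically fits regression models with covariates, so this is not a reproduction of the published analysis.

```python
import math


def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Return (odds ratio, standard error of the log odds ratio) for a 2x2 table.

    "with" and "without" refer to carrying the allele of interest.
    """
    ratio = (cases_with * controls_without) / (controls_with * cases_without)
    se_log = math.sqrt(
        1 / cases_with + 1 / controls_with + 1 / cases_without + 1 / controls_without
    )
    return ratio, se_log


def fixed_effect_meta(site_results):
    """Pool per-site (odds ratio, SE) pairs by inverse-variance weighting on the log scale."""
    weights = [1 / se ** 2 for _, se in site_results]
    log_ratios = [math.log(ratio) for ratio, _ in site_results]
    pooled_log = sum(w * l for w, l in zip(weights, log_ratios)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return math.exp(pooled_log), pooled_se


# Hypothetical counts per site:
# (cases with allele, controls with allele, cases without, controls without)
site_tables = {
    "Site 1": (20, 980, 10, 990),
    "Site 2": (30, 1470, 15, 1485),
}

site_results = [odds_ratio(*table) for table in site_tables.values()]
pooled_or, pooled_se = fixed_effect_meta(site_results)

for name, (ratio, se) in zip(site_tables, site_results):
    print(f"{name}: OR = {ratio:.2f}, SE(log OR) = {se:.3f}")
print(f"Pooled: OR = {pooled_or:.2f}, SE(log OR) = {pooled_se:.3f}")
```

An odds ratio above 1 means the phenotype is more common among carriers of the allele. Pooling gives a smaller standard error than any single site, which is why the meta-analysis row in a forest plot usually has the narrowest interval.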

Panel E:

Panel E is a box plot that shows the expression of SELP when the human allele for rs3917862 is present (Human/Human) compared to when the Neandertal allele is present (Neandertal/Human). The y-axis is the relative level of expression compared to the average expression when the human allele is present. SELP codes for a cell adhesion protein that attracts leukocytes to injuries during inflammation, and plays a key role in blood coagulation. The take-away of this figure is simply that there is a significant increase in SELP expression if the Neandertal allele is present, which is consistent with the association of the Neandertal haplotype with hypercoagulation.

Putting it all together: Figure 1

Now that we understand a little more about each panel, we can look at Figure 1 cohesively. First, genotype and phenotype were matched by aligning genome sequencing data with phenotypic data from EHRs. Boxes indicate SNPs associated with phenotype 2. The arrows connecting panels A and D show how the blue shaded Site 1 corresponds to samples from Marshfield, while the green shaded Site 2 corresponds to samples from Mayo.
Panel B demonstrates how similarity of individual genotypes and phenotypes shown in panel A was determined using pair-wise comparisons. From these data, panel C lists some phenotypes associated with Neandertal SNPs, determined via GCTA. The forest plot in panel D demonstrates the association between a Neandertal SNP and an AMH phenotype, in part using data collected in panel A. Finally, panel E is another visual representation of the increased phenotype occurrence from D when the corresponding SNP is present. Overall, this figure displays some methods used throughout the paper and demonstrates the association between Neandertal alleles and AMH phenotypes, as well as relationships between specific SNP-phenotype pairs.

Table 1:

The GCTA in Figure 1 identified 8 traits for which Neandertal alleles explained a reproducible, nominally significant proportion of variance. They are listed here according to P value in the replication (E2) cohort, with actinic keratosis having the smallest P value (and thus the greatest confidence).
Three of these 8 were reproducible when a stricter model that also accounted for non-Neandertal SNPs elsewhere in the genome was used. To further understand this figure and the methods behind it, please refer to the following methods-results flow chart:

Figure by Prudencio, 2016.

Table 2:

Table 2 displays the four Neandertal SNP-phenotype associations that remained reproducibly significant after a Bonferroni correction. From left to right, it includes a description of each phenotype, the chromosomal position of the SNP, the name of the SNP, the associated gene(s), the ratio of phenotype occurrence with Neandertal SNP to occurrence without the SNP and P value in the discovery (E1) cohort, and the ratio and P value in the replication (E2) cohort. The first association, hypercoagulable state and rs3917862, was seen above in Figure 1. To be included, P must be <0.05. The take-away is that specific SNPs can be identified as associated with specific phenotypes.
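The Bonferroni correction mentioned above tightens the significance threshold by dividing it by the number of tests performed. The sketch below shows the idea with hypothetical SNP-phenotype labels and P values; the number of tests and the values themselves are invented for illustration and are not taken from the paper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the associations whose P value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical associations and P values, for illustration only.
p_values = {
    "snp_a/trait_1": 1e-6,
    "snp_b/trait_2": 0.004,
    "snp_c/trait_3": 0.03,
    "snp_d/trait_4": 0.2,
}

survivors = significant_after_bonferroni(p_values)
print("Corrected threshold:", bonferroni_threshold(0.05, len(p_values)))
print("Associations surviving correction:", sorted(survivors))
```

In this toy case, an association with P = 0.03 would pass an uncorrected 0.05 cutoff but fails once the threshold is divided by the four tests performed. With many tests, as in a PheWAS, the corrected threshold becomes far stricter.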

Figure 2:

This figure shows the types of phenotypes most frequently associated with Neandertal SNPs. Zero on the y-axis indicates that the number of Neandertal SNPs associated with phenotypes in the group is equal to the number expected, based on association with non-Neandertal SNPs. The y-axis displays a positive value if more phenotypes were associated than expected (enrichment), and a negative value if fewer were associated (depletion). Along the x-axis, we see the category of phenotype and in parentheses, the number of Neandertal SNPs associated with that group. Asterisks indicate significant enrichment in neurologic phenotype association (11 associations) and significant depletion in digestive phenotype association (zero associations).

Conclusions about Introgression Effects:

In the introduction to this analysis, we considered 3 questions:

1) Is DNA inherited from Neandertals associated with modern human disease?

2) Can current phenotypes be related to specific Neandertal haplotypes?

3) Are electronic health records useful for this type of analysis?

Question 1 was answered in Figure 1 and Table 1, in which the presence of Neandertal SNPs was overall found to explain multiple AMH phenotypes (GCTA quantifies influence of SNPs together, not specific ones). The authors state in the abstract: "Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis)". Based on Figure 2, they also conclude that certain types of traits are more likely to be associated with Neandertal DNA, which could be evidence of environmental selection pressures as AMHs moved out of Africa and into Europe.

Question 2 was answered in Table 2, in which the influence of Neandertal DNA on AMH phenotype was pinpointed to a specific chromosomal position. Again, the abstract summarizes this: "individual Neandertal alleles were associated with specific human phenotypes".

Additionally, Figure 2 suggests that specific types of phenotypes are most influenced by Neandertal loci. The authors suggest that Neandertal introgression influenced AMH brain phenotypes, which may have been advantageous for AMHs as they moved out of Africa into new climates with different environments and sun exposure. However, in modern Western environments many of these traits, such as depression risk, neurologically-based disease, and other psychiatric phenotypes, are no longer advantageous and are even detrimental to AMHs.

Conclusions about Methods:

Question 3 is answered by the study as a whole. In the conclusion, Simonti et al. write: "EHR data, paired with DNA sequencing, hold promise for characterizing the phenotypic impact of regions identified through evolutionary analyses." They showed that their method of overlapping genomic data with phenotypic data can be used, through meta-analysis, to draw conclusions about the effect of introgression on phenotypes of the recipient species.

Comprehensive Critique:

Overall, based on the data presented, I believe the authors are justified in concluding that their method is a useful tool for studying the phenotypic effects of ancient hominin admixture. The methods, while complicated, were well described: the authors clearly stated the kind of analysis they used to produce each type of data. However, because there were so many methods, it was at times confusing. It would probably be helpful to read the Materials and Methods, which are available in the supplementary materials. I had to look up some acronyms, as well as statistical tests, in order to really understand the data. I would have liked to see clearer explanations of precisely what was being analyzed in each study; however, given the space constraints I commend the authors for giving enough information that I could research concepts further on my own. The biggest challenge was the varying statistical tests and P values used; perhaps this was necessary but it was confusing.

They did successfully convince me of the impact of introgression on AMH phenotypes. I thought the hypotheses they developed regarding the categories of phenotype associations were especially fascinating. It's interesting that digestive phenotypes weren't more affected, as I expect the diet likely had to change in the new environment. An interesting follow-up would be to compare frequency of Neandertal SNPs and phenotypic similarity of AMHs of Eurasian descent to that of individuals living in sub-Saharan Africa.

My biggest complaint was the layout of the paper. Figure 1 was incredibly complicated. There was too much going on, and while the legend explained what to get out of the figure, it did not give many clues on how to read it. For example, I had to guess what phenotype "M" meant. I also had to deduce for myself that P1,1 indicated patient 1 at site 1. This was very aggravating to me and it took three read-throughs of the paper and a long time intently studying the figure to fully understand all parts. Another challenge was that panel E was not referenced until two pages after the figure appeared. I would have laid the paper out differently, so most or all of the text describing the data in the figure would appear before the figure. I prefer to have to flip ahead to a figure than struggle to examine it before it has been explained, especially since the panels in this figure were all related. I also don't think those relationships were always made clear.

Thus, I respect the methods used and the written communication of the results. I don't feel they made unsubstantiated claims. However, I would deduct points from these authors for layout and for trying to cram too much information into a single figure with a poor layout and legend.


Simonti CN, Vernot B, Bastarache L, Bottinger E, Carrell DS, Chisholm RL, Crosslin DR, Hebbring SJ, Jarvik GP, Kullo IJ, et al. 2016. The phenotypic legacy of admixture between modern humans and Neandertals. Science [Internet]. [cited 7 Mar 2016]; 351(6274): 737-41. Available from:


Copyright 2016 Department of Biology, Davidson College, Davidson, NC 28035