Programming Update: Resampling procedure

In the last post, I was talking about learning to program in R. I was able to re-program a resampling project that I’d written in Visual Basic, but I could not figure out how to store my resampled test-statistics, so I could plot a histogram of their distribution. Last night, after searching the world wide webs, I stumbled upon an even shorter code for resampling. I was able to tweak that code to the specs of what I wanted it to do, and–voila–I have a program that resamples my comparative sample, takes two specimens and computes a test statistic, compares that to my fossil specimens, and repeats the process as many times as I want. I’ll print the code at the end for readers to mess with if they want.

The basic idea of the project is that the “habiline” cranial fossils–early Homo from 1.9-1.6 million years ago are quite variable. Because of this, researchers have tried to fit these square pegs of fossils into round holes of species–H. habilis and rudolfensis. A simple, univariate trait that has been claimed to be evidence of multiple species is cranial capacity variation. For a long time this idea was propagated by comparing ER 1470 with ER 1813, because the two have very different cranial capacities and were thought to date to 1.9 Ma. Turns out, though, that while ER 1470 is 1.9 Ma, and ER 1813 might be closer to 1.65 Ma (Gathogo and Brown 2006). Gathogo and Brown state that a more geologically apt comparison, then, would be between ER 1813 and ER 3733. For the 1.9 Ma interval, the most disparate comparison is between ER 1470 and OH 24. Here’s the summary of specimens, ages, and their cranial capacities:

1.9 Ma: ER 1470 (752 cc) and OH 24 (590 cc)

1.65 Ma: ER 3733 (848 cc) and ER 1813 (510 cc)

Ratio of ER 1470/OH24 = 1.274

Ratio of ER 3733/ER 1813 = 1.663

So I wrote a program that tests the null hypothesis that early habiline cranial capacity variation is no different from that of extant gorillas–gorillas being one of the most size-dimorphic living relatives of hominins. If I cannot reject the null hypothesis, this would suggest that I cannot reject the idea that the habiline fossils sample a single species, based on cranial capacities. To test this, I took the ratio of the larger cranial capacity to the smallest, rather than male to female–this makes it easier to compare to the max-min ratio of habilines, without having to assume the sex of the specimens. The resampling program randomly selects two gorilla specimens, takes the ratio of the larger to the smaller, and repeats 5000 times. This way, I can assess the probability of randomly sampling two gorilla cranial capacities that are as different as the two sets of habilines.

The results, displayed in the histogram, show that there is a 25% chance of sampling two gorilla cranial capacities as different as the early habilines (1.9 Ma). However, there is only a 0.006% chance of sampling to gorillas as different as ER 1813 and ER 3733. Thus, we reject the null hypothesis for the comparison of ER 3733 and ER 1813. This means it is very unlikely that these two specimens come from a single species with a level of dimorphism/variation similar to modern day gorillas. This could mean that the two represent two different, contemporaneous species, or that they represent a single species with a level of variation greater than in our extant analog. The test cannot distinguish between these alternatives. Histogram of the resampled gorilla cranial capacity ratios. Notice that it is one-tailed. The red-dashed line indicates the 95th percentile. The early habiline ratio (E) is well within the 95 limit, while the later ratio (L) is outside the 95th percentile.

Here’s the code (the first two lines tell the program where to find the data, since I haven’t posted these, in order to run this program you’ll need a) to reset the directory to where you store your data [use the setwd() command] and b) a data file with cranial capacities listed in a single column, the first four of which are the habilines, and the remainder your comparative sample)

setwd(“C:/Users/zacharoo/Documents/Data”)

habs <- get[1:4,2]

OH24 <- habs; ER1470 <- habs; ER1813 <- habs; ER3733 <- habs

early <- ER1470/OH24; late <- ER3733/ER1813

gors <- get[5:105,2]

p = 0 # incremented if G1/G2 >= “early”

q = 0 # incremented if G1/G2 >= “late”

n = 5000 # <– NUMBER OF ITERATIONS !!!

gor.boot <- numeric(n) # x-number vector containing test statistics

for (i in 1:n) {

sub.samp <- gors[sample(101,2, replace = FALSE)] # sub.samp = 2 randomly sampled gorillas

G1 <- sub.samp; G2 <- sub.samp

if (G1 >= G2) {ratio <-G1/G2} else {ratio <-G2/G1}

gor.boot[i] <- ratio

if (gor.boot[i] >= early) {p = p +1} else {p = p} #frequencies

if (gor.boot[i] >= late) {q = q +1} else {q = q}

}

pval <- p/i

qval <- q/i

qntl <- quantile(gor.boot, .95)

hist(gor.boot, col = 3, xlab = “Resampled Gorilla ratio”, ylab = “Frequency”, main = “Frequency Distribution”)

points(early,25, pch = “E”, col = 2, cex = 1.25); points(late,25, pch = “L”, col = 2, cex = 1.25)

abline(v = qntl, col = 2, lty = 6) # line marking the 95% limit

print(pval); print(qval)

summary(gor.boot)

Reference

Gathogo PN, and Brown FH. 2006. Revised stratigraphy of Area 123, Koobi Fora, Kenya, and new age estimates of its fossil mammals, including hominins. Journal of Human Evolution 51(5):471-479.