A new method for analyzing growth in extinct animals (dissertation summary 1)

The last year and a half was a whirlwind, and so I never got around to blogging about the fruits of my dissertation: Mandibular growth in Australopithecus robustus… Sorry! So this post will be the first installment of my description of the project’s outcome. The A. robustus age-series of jaws allowed me to address three questions: [1] Can we statistically analyze patterns of size change in a fossil hominid; [2] how ancient is the human pattern of subadult growth, a key aspect of our life history; and [3] how does postnatal growth contribute to anatomical differences between species? This post will look at question [1] and the “zeta test,” a new method I devised to answer it.

Over a year ago, and exactly one year ago, I described some of the rationale for my dissertation. Basically, in order to address questions [2-3] above, I had to come up with a way to analyze age-related variation in a fossil sample. A dismal fossil record means that fossil samples are small and specimens fragmentary – not ideal for statistical analysis. The A. robustus mandibular series, however, contains a number of individuals across ontogeny, making it better suited than most fossil samples. Still, though, some specimens are rather complete while most are fairly fragmentary, meaning it is impossible to make all the same observations (i.e. take the same measurements) on each individual. How can growth be understood in the face of these challenges to sample size and homology?

Because traditional parametric statistics – basically growth curves – are ill-suited to fossil samples, I devised a new technique based on resampling statistics. This method, which I ended up calling the “zeta test,” rephrases the question of growth from a descriptive to a comparative standpoint: is the amount of age-related size change (growth) in the small fossil sample likely to be found in a larger comparative sample? Because a pair of specimens is likelier to share traits in common than an entire ontogenetic series is, the zeta test randomly grabs a pair of differently-aged specimens from one sample, then a similarly aged pair from the second sample, and compares the two samples’ size change based only on the traits those two pairs share (see subsequent posts). Pairwise comparisons maximize the number of subadults that can be compared, and further address the problem of homology. Then you repeat this random selection process a bajillion times, and you’ve got a distribution of test statistics describing how the two samples differ in size change between different ages. Here’s a schematic:

1. Randomly grab a fossil (A) and a human (B) in one dental stage (‘younger’), then a fossil and a human in a different dental stage (‘older’).
2. Using only traits they all share, calculate relative size change in each species (older/younger): the zeta test statistic describes the difference in size change between species.
3. Calculate as many zetas as you can, creating a distribution giving an idea of how similar/different the species’ growth is.

The zeta statistic is the difference between the two species’ ratios – so positive values mean species A grew more than species B, while negative values mean the opposite. If 0 (zero, no difference) is within the great majority of resampled statistics, you cannot reject the hypothesis that the two species follow the same pattern of growth. During each resampling, the procedure records the identity and age of each specimen, as well as the number of traits they share in common. This allows patterns of similarity and difference to be explored in more detail. It also makes the program run for a very long time. I wrote the program for the zeta test in the statistical computing language R, and the codes are freely available. (Actually, the posted codes are from April, at my University of Michigan website; until we get the Nazarbayev University webpage up and running, you can email me for the updated codes.)
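To give a concrete sense of the mechanics, here is a minimal sketch of one resampling iteration in R. This is an illustration only, not the code from the dissertation: the data structures, the function names (pick, zeta.once), and the averaging of ratios across shared traits are all simplifying assumptions on my part.

# Minimal sketch of one zeta resampling iteration -- an illustration, not the
# published code. Assumes numeric matrices 'fossil' and 'human' (rows =
# specimens, columns = homologous measurements, NA = missing) and vectors
# 'fossil.stage' and 'human.stage' giving each specimen's dental stage.

pick <- function(stages, stage) {
  idx <- which(stages == stage)
  idx[sample(length(idx), 1)]   # safe even when only one specimen is in the stage
}

zeta.once <- function(fossil, fossil.stage, human, human.stage, younger, older) {
  fY <- fossil[pick(fossil.stage, younger), ]   # younger fossil
  fO <- fossil[pick(fossil.stage, older), ]     # older fossil
  hY <- human[pick(human.stage, younger), ]     # younger human
  hO <- human[pick(human.stage, older), ]       # older human
  shared <- !is.na(fY) & !is.na(fO) & !is.na(hY) & !is.na(hO)   # traits all four preserve
  if (!any(shared)) return(NA)                  # no comparable traits this draw
  fossil.change <- mean(fO[shared] / fY[shared])   # relative size change, fossil
  human.change  <- mean(hO[shared] / hY[shared])   # relative size change, human
  fossil.change - human.change   # zeta: positive means the fossil grew more
}

# Resample many times, then ask whether zero falls within the bulk of the distribution:
# zetas <- replicate(10000, zeta.once(fossil, fossil.stage, human, human.stage, 1, 2))
# quantile(zetas, c(0.025, 0.975), na.rm = TRUE)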

The zeta test itself is new, but it’s based on and influenced by other techniques: using resampling to compare samples with missing data was inspired by Gordon et al. (2008), and the calculation of ‘growth’ within one sample, and the comparison between samples, is very similar to Euclidean Distance Matrix Analysis (EDMA), devised in the 1990s by Subhash Lele and Joan Richtsmeier (e.g. Richtsmeier and Lele, 1993). But since this was a new method, I was glad to be able to show that it works!

I used the zeta test to compare mandibular growth in a sample of 13 A. robustus and 122 recent humans. I first showed that the method behaves as expected by using it to compare the human sample with itself, resampling two pairs of humans rather than a pair of humans and a pair of A. robustus. The green distribution in the graph to the left shows zeta statistics for all possible pairwise comparisons of humans. Just as expected, it is strongly centered at zero: only one pattern of growth should be detected in a single sample. (Note, however, the range of variation in the green zetas, the result of individual variation in a cross-sectional sample.)

In blue, the human-A. robustus statistics show a markedly different distribution. They are shifted to the right – positive values – indicating that for a given comparison between pairs of specimens, A. robustus increases in size more than humans do, on average.

We can also examine how zeta statistics are distributed between different age groups (above). I had broken my sample into five age groups based on stage of dental eruption – the plots above show the distribution of zeta statistics between successive eruption stages, the human-only comparison on the left and the human-A. robustus comparison on the right. As expected, the human-only statistics center around zero (red dashed line) across ontogeny, while the human-A. robustus statistics deviate from zero markedly between dental stages 1-2 and 3-4. I’ll explain the significance of this in the next post. What’s important here is that the zeta test seems to be working – it fails to detect a difference when there isn’t one (human-only comparisons). Even better, it detects a difference between humans and A. robustus, which makes sense when you look at the fossils but had never been demonstrated before.
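The stage-by-stage breakdown falls out naturally if each resampled zeta is stored alongside the dental stages being compared. A hedged sketch, assuming a hypothetical data frame res with columns zeta, younger, and older recorded during resampling:

# Sketch only: 'res' is an assumed data frame with one row per resampling
# iteration and columns zeta, younger, and older (the two dental stages compared).
res$stages <- paste(res$younger, res$older, sep = "-")   # e.g. "1-2", "3-4"
boxplot(zeta ~ stages, data = res, xlab = "Dental stages compared", ylab = "zeta")
abline(h = 0, col = "red", lty = 2)   # zero line = no difference in size change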

So there you go, a new statistical method for assessing fossil samples. The next two installments will discuss the results of the zeta test for overall size (important for life history), and for individual traits (measurements; important for evolutionary developmental biology). Stay tuned!

Several years ago, when I first became interested in growth and development, I changed this blog’s header to show this species’ subadult jaws – it was only last year that I realized this would become the focus of my graduate career.

References
Gordon AD, Green DJ, & Richmond BG (2008). Strong postcranial size dimorphism in Australopithecus afarensis: results from two new resampling methods for multivariate data sets with missing data. American Journal of Physical Anthropology 135(3): 311-328. PMID: 18044693

Richtsmeier JT, & Lele S (1993). A coordinate-free approach to the analysis of growth patterns: models and theoretical considerations. Biological Reviews 68(3): 381-411. PMID: 8347767

Programming Update: Resampling procedure

In the last post, I talked about learning to program in R. I was able to re-program a resampling project that I’d written in Visual Basic, but I could not figure out how to store my resampled test statistics so that I could plot a histogram of their distribution. Last night, after searching the world wide webs, I stumbled upon an even shorter bit of code for resampling. I was able to tweak that code to do exactly what I wanted, and–voila–I have a program that resamples my comparative sample, takes two specimens and computes a test statistic, compares that to my fossil specimens, and repeats the process as many times as I want. I’ll print the code at the end for readers to mess with if they want.

The basic idea of the project is that the “habiline” cranial fossils–early Homo from 1.9-1.6 million years ago–are quite variable. Because of this, researchers have tried to fit these square pegs of fossils into the round holes of species–H. habilis and H. rudolfensis. A simple, univariate trait that has been claimed to be evidence of multiple species is cranial capacity variation. For a long time this idea was propagated by comparing ER 1470 with ER 1813, because the two have very different cranial capacities and were both thought to date to 1.9 Ma. Turns out, though, that while ER 1470 is 1.9 Ma, ER 1813 might be closer to 1.65 Ma (Gathogo and Brown 2006). Gathogo and Brown state that a more geologically apt comparison, then, would be between ER 1813 and ER 3733. For the 1.9 Ma interval, the most disparate comparison is between ER 1470 and OH 24. Here’s the summary of specimens, ages, and their cranial capacities:

1.9 Ma: ER 1470 (752 cc) and OH 24 (590 cc)

1.65 Ma: ER 3733 (848 cc) and ER 1813 (510 cc)

Ratio of ER 1470/OH24 = 1.274

Ratio of ER 3733/ER 1813 = 1.663

So I wrote a program that tests the null hypothesis that early habiline cranial capacity variation is no different from that of extant gorillas–gorillas being one of the most size-dimorphic living relatives of hominins. If I cannot reject the null hypothesis, this would suggest that I cannot reject the idea that the habiline fossils sample a single species, based on cranial capacities. To test this, I took the ratio of the larger cranial capacity to the smaller, rather than male to female–this makes it easier to compare to the max/min ratio of the habilines, without having to assume the sex of the specimens. The resampling program randomly selects two gorilla specimens, takes the ratio of the larger to the smaller, and repeats 5,000 times. This way, I can assess the probability of randomly sampling two gorilla cranial capacities that are as different as the two pairs of habilines.

The results, displayed in the histogram, show that there is a 25% chance of sampling two gorilla cranial capacities as different as the early habilines (1.9 Ma). However, there is only a 0.006% chance of sampling two gorillas as different as ER 1813 and ER 3733. Thus, we reject the null hypothesis for the comparison of ER 3733 and ER 1813. This means it is very unlikely that these two specimens come from a single species with a level of dimorphism/variation similar to modern-day gorillas. This could mean that the two represent two different, contemporaneous species, or that they represent a single species with a level of variation greater than in our extant analog. The test cannot distinguish between these alternatives.

Figure: Histogram of the resampled gorilla cranial capacity ratios. Notice that it is one-tailed. The red dashed line indicates the 95th percentile. The early habiline ratio (E) is well within the 95th percentile, while the later ratio (L) falls outside it.

Here’s the code. (The first two lines tell the program where to find the data; since I haven’t posted these, in order to run this program you’ll need (a) to reset the directory to wherever you store your data [use the setwd() command], and (b) a data file whose second column lists the cranial capacities, the first four of which are the habilines (OH 24, ER 1470, ER 1813, ER 3733, in that order) and the remainder your comparative sample.)



setwd("C:/Users/zacharoo/Documents/Data")    # point R to the folder holding the data file
get <- read.csv("CranCaps.csv")              # data frame; cranial capacities in column 2

habs <- get[1:4, 2]                          # first four rows are the habilines
OH24 <- habs[1]; ER1470 <- habs[2]; ER1813 <- habs[3]; ER3733 <- habs[4]
early <- ER1470/OH24; late <- ER3733/ER1813  # observed ratios at 1.9 Ma and 1.65 Ma

gors <- get[5:105, 2]                        # remaining 101 rows are the gorilla sample

p <- 0      # incremented if G1/G2 >= "early"
q <- 0      # incremented if G1/G2 >= "late"
n <- 5000   # <-- NUMBER OF ITERATIONS !!!
gor.boot <- numeric(n)   # n-length vector to hold the resampled test statistics

for (i in 1:n) {
  sub.samp <- gors[sample(101, 2, replace = FALSE)]   # sub.samp = 2 randomly sampled gorillas
  G1 <- sub.samp[1]; G2 <- sub.samp[2]
  if (G1 >= G2) {ratio <- G1/G2} else {ratio <- G2/G1}   # always larger over smaller
  gor.boot[i] <- ratio
  if (gor.boot[i] >= early) {p <- p + 1}   # count ratios at least as extreme as observed
  if (gor.boot[i] >= late)  {q <- q + 1}
}

pval <- p/n
qval <- q/n
qntl <- quantile(gor.boot, .95)   # 95th percentile of the resampled ratios

hist(gor.boot, col = 3, xlab = "Resampled gorilla ratio", ylab = "Frequency", main = "Frequency Distribution")
points(early, 25, pch = "E", col = 2, cex = 1.25); points(late, 25, pch = "L", col = 2, cex = 1.25)
abline(v = qntl, col = 2, lty = 6)   # line marking the 95% limit
print(pval); print(qval)
summary(gor.boot)



Reference

Gathogo PN, & Brown FH (2006). Revised stratigraphy of Area 123, Koobi Fora, Kenya, and new age estimates of its fossil mammals, including hominins. Journal of Human Evolution 51(5): 471-479.

Programming

I’m no wiz when it comes to computers, but lots of really smart people have told me I’d do myself an academic favor by learning to write programs. In Paleobiology last year, Phil Gingerich introduced me to programming in Visual Basic. While I did not become proficient enough to find it as useful as he did, it still got me interested. Then last semester, Milford urged me and other lawnchair anthropologists to take a course in the department of Ecology and Evolutionary Biology taught by George Estabrook, where we learned to write resampling programs for Excel in Visual Basic. It was a bit stressful at times–reading and writing computer code at first was like learning to read and write Japanese with little training. But overall I thought it was a very useful class, and I’ll certainly be writing more resampling programs in the future (i.e. for my Australopithecus robustus projects…).

But the computer language that everybody’s been abuzz about in the past year or so is R. To quote the website (see the link), “R is a language and environment for statistical computing and graphics.” The software and codes for it are available FOR FREE on the website. I’ve been meaning to sit down and figure out how to use it for a year now, and now that I’m all alone at the museum in the evenings, I’ve begun learning to use R.

At first I had quite a difficult time simply loading data into the program. But once I figured it out (I had the right idea, I just wasn’t typing it correctly–“syntax” errors), it was quite easy to run. With some datasets supplied from the Paleobiology class, I was quite easily able to compute and plot statistics like principal components analysis and linear models. It’s really easy! There are even built-in bootstrap (resampling) functions, although I could not figure out how to use them as I wanted.
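To give a flavor of how little code this takes, here’s a minimal sketch using R’s built-in iris data rather than the class datasets (which I can’t post):

# Principal components analysis and a linear model on R's built-in iris data
pca <- prcomp(iris[, 1:4], scale. = TRUE)   # PCA on the four flower measurements
summary(pca)                                # variance explained by each component
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")

fit <- lm(Petal.Length ~ Sepal.Length, data = iris)   # simple linear model
summary(fit)
plot(Petal.Length ~ Sepal.Length, data = iris)
abline(fit, col = 2)                        # add the fitted line to the scatterplot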

But that isn’t programming–I wasn’t making a custom program that would do what I wanted it to do. So I decided to give it a shot. For the programming class last semester (and Milford’s “Evolution of the Genus Homo” course), I wrote a resampling program that gave the probability of randomly sampling two gorilla cranial capacities as different as some early habilines. So I decided to try to rewrite that program in R. It took about two evenings of reading manuals and fiddling with commands (and drinking red wine and watching Roadhouse). The R code is much simpler than the original one I wrote in Visual Basic–this is largely because R has a built-in command that permutes rows/columns. Plus, it runs much, much faster–I was able to resample 100,000 times in a few seconds, whereas the original (VB) program took a few minutes to do 5,000. I think after a few more evenings of tinkering, I can figure out how to also plot the distribution of resampled test statistics (although if anyone can tell me how, I’d love to hear).
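(The trick, as the resampling post above eventually shows, is just to fill a vector with the resampled statistics and hand it to hist(). A toy sketch with made-up data, not my project code:)

# Toy example: resample pairwise max/min ratios from made-up data and plot them
x <- rnorm(100, mean = 500, sd = 50)   # stand-in "cranial capacities"
ratios <- replicate(10000, {
  pair <- sample(x, 2)                 # draw two values at random
  max(pair) / min(pair)                # larger over smaller
})
hist(ratios, xlab = "Resampled ratio", main = "Distribution of resampled test statistics")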

If I can figure out how to do it, I’ll post the code so other people can tinker with it if they want. Lesson: use R!