9 Mile Prairie Worksheet

Introduction

The goal of this week’s lab is to analyze the species area curve data from 9 Mile Prairie, and get more experience making figures and calculating summary statistics in R. Like all labs, I expect you to submit an R Markdown file that I can compile to html. You should assume any data files are in a subdirectory called “data”. Your file should not echo the code used. In the handout, there may be several code blocks making the same graph multiple times with slight tweaks. You should only keep the best and final graph that you chose in your submitted file. Answer the numbered questions in text, referring to graphical and tabular output as needed. You may use base graphics or ggplot2, according to your preference.

We collected species occurence data in nested quadrats in 3 different parts of the original 9 mile Prairie. Each section collected data from a portion burned in 2015, 2016 or earlier this year in 2017. Dr. Wedin’s hypothesis is that the biodiversity will be the same in these three fields, even though they look very different to the eye.

Open RStudio and create a new project. I recommend creating a project for each week of class inside a folder called NRES222 or whatever else works for you.

Download the data from here, saving it into a subdirectory of your project called data. Make sure to keep the filename exactly the same: 9mile_plants_all.csv!

Create a new R Markdown file by clicking the button in the top left of RStudio with a + on it. Save this file into your project directory with a filename yourlastname_ninemile.Rmd.

At the top of the file is some header information that you can leave as is for now.

On line 8 you will see a block of text with ``` on the top and bottom:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Remember that this is a “code chunk”. Everything in between the ``` is treated as instructions to the computer. Everything outside the code chunk is rendered as regular text. Add a line to the code chunk to load the tidyverse package we installed last week.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
```

You can delete everything from the end of that code chunk to the bottom of the file. Click the Knit button at the top. You should get a preview of an HTML file that just has the header information in it. An html file will also appear in the file pane next to the Rmd file.

Bring the data in

We add another line to the setup code to read the data into a dataframe called plants.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
plants <- read_csv("data/9mile_plants_all.csv")
```

Plot the raw data

For the first question, you will calculate the total number of species observed in each size quadrat, for each group and section of the lab.

# make a factor to get colors right
plants$fBurned <- factor(plants$Burned)

## first get numbers of species by quadrat area, grouped by id and day
spparea1 <- group_by(plants, fBurned, id, Quadrat_Area)
spparea2 <- summarise(spparea1, newspp = n())  
spparea3 <- mutate(spparea2, 
                   totspp = cumsum(newspp))

# now make the plot
ggplot(data = spparea3,
       mapping = aes(x=Quadrat_Area, y=totspp)) + 
  geom_point(mapping=aes(color=fBurned),
             position = "jitter")

Adding the argument position = "jitter" to geom_point() just adds a small random number to each data point to avoid points plotting on top of each other. Another way to separate the sections that might be easier to answer Q1 is to make a separate subplot for each unit burned.

ggplot(data = spparea3,
       mapping = aes(x=Quadrat_Area, y=totspp)) + 
  geom_point(mapping=aes(color=fBurned),
             position = "jitter") +
  facet_wrap(~fBurned)
  1. Using the graphs with the raw data, rank the 3 fields (burned in 2015, 2016, or 2017) from most homogeneous (all student data sets similar) to most heterogeneous. You are looking at the spread vertically within a single size of quadrat. Why are some fields more variable (speculate)?

Now calculate the average number of species found in each size plot for each field. Use summarize() again, this time to summarize each quadrat size across all groups. Also calculate the standard error of the mean, a measure of how precise the estimate is.

spparea4 <- group_by(spparea3, fBurned, Quadrat_Area)
spparea4 <- summarize(spparea4,
                      meanspp = mean(totspp),
                      sespp = sd(totspp) / sqrt(n()))

Next, make a graph with the average numbers of species versus plot size for each field. We also add error bars to each point.

ggplot(data = spparea4,
       mapping = aes(x=Quadrat_Area, y=meanspp)) + 
  geom_point() +
  facet_wrap(~fBurned) + 
  geom_errorbar(mapping = aes(ymin = meanspp - qt(0.975, df = 3) * sespp, 
                              ymax = meanspp + qt(0.975, df = 3) * sespp)) + 
  labs(x = "Quadrat Area [m^2]", 
       y = "Cumulative species richness (95% CI)",
       caption = "Figure 1: Cumulative species richness as a function of total quadrat area searched in three fields at 9 Mile Prairie, Nebraska.") + 
  theme(plot.caption = element_text(hjust=0))
  1. Compare the average curves for the 3 fields. Does the species area curve appear to follow a smooth relationship? Is the shape of the curve the same for all four fields?

  2. Which of the 3 fields had the highest plant diversity?

  3. In which size plots do we generally gain the most new species?

  4. Pick the field with the highest diversity. Visually estimate (extrapolate) how many species you would expect to find in a \(100 m^2\) plot in that field?

  5. If you were surveying many grassland types and had to pick a single plot size to use for data collection, what size would you pick and why?


Ecologists often present species-area curves on log10 axes (both the X and Y axes) or transform the data by taking logs before graphing. This generally turns the curve into a straight line (See the discussion of species-area curves in your ecology textbook). It also allows the viewer to compare changes in diversity in small plots equally with changes in large plots. On regular (linear) axes, the changes in big plots dwarf changes seen in small plots.

Mathematically, species-area curves can be fit with the non-linear equation \(S = cA^z\) where S is species number, A is plot area, and c and z are constants. We won’t use this equation in lab; non-linear equations such as this are much more difficult to work with. However, if you transform the data for species number and plot area to log values, the relationship becomes linear. A line fit to the data averages across point to point irregularities to give the best description of the species area relationship in each area. The equation now has the form: \[ Log(S) = log c + z log(A) \]

Create a new figure with log10 – transformed data. We can simply change the axes on the plot, and add some linear trendlines.

ggplot(data = spparea4,
       mapping = aes(x=Quadrat_Area, y=meanspp)) + 
  geom_point() +
  facet_wrap(~fBurned) + 
  geom_errorbar(mapping = aes(ymin = meanspp - qt(0.975, df = 3) * sespp, 
                              ymax = meanspp + qt(0.975, df = 3) * sespp)) + 
  labs(x = "Quadrat Area [m^2]", 
       y = "Cumulative species richness (95% CI)",
       caption = "Figure 1: Cumulative species richness as a function of total quadrat area searched in three fields at 9 Mile Prairie, Nebraska.") + 
  theme(plot.caption = element_text(hjust=0)) + 
    scale_x_log10() + scale_y_log10() + # here is the new bit
      geom_smooth(method="lm")
  1. Do the log10 – transformed data appear to follow a linear relationship? Is the slope of the line similar for all 3 areas?

  2. Which of the 3 areas had the highest plant diversity? Do the differences among the grassland types change between small and large plots? (hint: might be easier if you remove facet_wrap() and get all three lines on same graph.)

  3. Pick the grassland type with the highest diversity. Visually estimate (extrapolate) how many species you would expect to find in a 100m2 quadrat using the log10 – transformed data.


Those lines look fine, but what if we want to know the values of \(c\) and \(z\) actually are for each field? To do that we need the summary of the linear regressions shown in the graphs.

burned2015 <- filter(spparea3, fBurned == "2015")
b2015_lm <- lm(log(totspp) ~ log(Quadrat_Area), data= burned2015)
summary(b2015_lm)

The estimate labeled (intercept) is \(log(c)\), and the estimate labeled log(Quadrat_Area) is \(z\). Repeat for the other two groups. Each one needs to be in a new code chunk.

  1. What are the equations of your log-transformed relationships for the 3 areas? Type the equations with the numbers to 2 decimal places only, and one equation to a line.

  2. Use the equations to predict how many species each habitat would have at a scale of \(1 m^2\)? (Remember that \(log(1) =0\), so your answer is the value of the intercept. BUT, you have to transform it from a log scale back to linear, so take 10 raised to the value of the intercept to give the average number of species in a 1m2 plot).

  3. Use the equations to predict how many species each habitat would have at a scale of 100m2 plot?

You can calculate this using the equations you typed out above.
Example: let’s say your equation is

\(Log(S) = 1.0332 + 0.1642 log(A)\)

If A = 100, then log(A) = 2.

Plug that into the equation and you get Log(S) = 1.362.

Does that mean you expect to find 1.362 species in a 100 m2 plot? NO, the log of the number of species you predict is 1.362. How do you get rid of the log? Raise 10 to the number you have. 10 raised to the 1.263 is 21.9, which is the predicted number of species in a 100 m2 plot given the equation above. We’ll use logs (base 10) and natural logs quite a bit in this course, so it pays to review them. You can do these calculations in R

logS <- 1.0332 + 0.1642 * log10(100)
10^logS
  1. Do you think these relationships (the straight lines on the logS vs logA graph) would increase smoothly indefinitely? What would happen if the study area crossed into a different ecosystem type (for example a stand of trees or a corn field)?

  2. Recall that most studies find \(0.25 <= z <=0.35\). Is our data consistent with these expectations?

  3. Why do you think the various fields we studied differ or do not differ in plant diversity? How does the time since fire affect diversity?

Turn in

In Canvas, you will submit your R markdown file. Be sure your last name is the first part of the filename, and that it says “9mile” somewhere as well. Double check that it will work (necessary for full points) by clicking the “knit” button. If it produces an HTML file with all your figures and answers to the questions, you’re good!

Avatar
Andrew Tyre
Professor of Wildlife Ecology
comments powered by Disqus