---
title: "Community Stability Metrics"
author: "Lauren M. Hallett"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: biblio.bib
vignette: >
  %\VignetteIndexEntry{Community stability metrics}
  %\VignetteEngine{knitr::rmarkdown}
  %\SweaveUTF8
  %\VignetteEncoding{UTF-8}
  %\usepackage[utf8]{inputenc}

---
```{r echo = F}
library(knitr)
options(digits = 3)
```

## Overview
Ecologists have long debated the relationship between species diversity and stability. One key question in this debate is how the individual components of a community (e.g., species in species-rich communities) affect the stability of aggregate properties of the whole community (e.g., biomass production). It is increasingly recognized that unstable species populations may still maintain stable community productivity if a decrease in one species is compensated for by an increase in another. In a time series, this should be reflected by a pattern in which species negatively covary or fluctuate asynchronously while total community stability remained relatively stable. 

`codyn` includes a function to characterize community stability, `community_stability`, and three metrics to characterize species covariance and asynchrony:

- `variance_ratio` characterizes species covariance [@schluter1984; @houlahan2007], and includes a null-modeling approach to test significance [@hallett2014].

- `synchrony` has two options. The first compares the variance of the aggregated community with the variance of individual components [@loreau2008]. The second compares the average correlation of each individual species with the rest of the aggregated community [@gross2014].

## Example dataset
To illustrate each function, `codyn` uses a dataset of plant composition from an annually burned watershed at the Konza Prairie Long-Term Ecological Research site in Manhattan, KS. The `knz_001d` dataset spans 24 years and includes 20 replicate subplots.

```{r echo = F}
options(digits = 3)
```

```{r results='asis'}
library(codyn)
library(knitr)
data(knz_001d)
kable(head(knz_001d))
```


## Community stability
The `community_stability` function aggregates species abundances within replicate and time period, and uses these values to calculate community stability as the temporal mean divided by the temporal standard deviation [@tilman1999]. It includes an optional argument to calculate community stability across multiple replicates, which returns a data frame with the name of the replicate column and the stability value.
```{r results='asis'}
KNZ_stability <- community_stability(knz_001d, 
                                   time.var = "year", 
                                   abundance.var = "abundance", 
                                   replicate.var = "subplot")
kable(head(KNZ_stability))
```

If `replicate.var` is left as `NA`, the function will aggregate all values within a time period and return an integer value.
```{r results='asis'}
KNZ_A1_stability <- community_stability(df = subset(knz_001d, subplot=="A_1"),  
                                      time.var = "year", 
                                      abundance.var = "abundance")
KNZ_A1_stability
```

## Species covariance
### Variance ratio

#### Calculating the variance ratio
The variance ratio was one of the first metrics to characterize patterns of species covariance [@schluter1984] and was used in an early synthesis paper of species covariance in long time series [@houlahan2007]. The metric compares the variance of the community ($C$) as a whole relative to the sum of the individual population ($x_i$) variances:


$$ VR = \frac{Var(C)}{\sum_{i}^{N} Var(x_i)} $$

where:

$$ Var(C)  = \sum_{i = 1}^{N} Var(x_i) + 2\left(\sum_{i = 1}^{N - 1} \sum_{j = i + 1}^{N} Covar(x_i, x_j)\right) $$ 


If species vary independently then the variance ratio will be close to 1. A variance ratio  < 1 indicates predominately negative species covariance, whereas a variance ratio > 1 indicates that species generally positively covary. 

#### Significance testing
The variance ratio remains widely used but has been subject to a number of criticisms. Importantly, early uses of the variance ratio either did not include significance tests, or tested significance by comparing observed values to those returned by scrambling each species' time series. Null models using fully-scrambled species time series can generate spurious null expectations of covariance because the process disrupts within-species autocorrelation. Phase-scrambling [@Grman2010] and cyclic shifts [@hallett2014; adapted from @harms2001] have been used to address this issue. 

`variance_ratio` uses a cyclic shift permutations to conduct null modeling for significance tests. In this method a starting time point is randomly selected for each species' time series. This generates a null community matrix in which species abundances vary independently but within-species autocorrelation is maintained (for each species, the time series is disrupted only once). 

If a replicate column is specified, the default `variance_ratio` setting calculates a null variance ratio for each replicate in the dataset, averages these values, and repeats as many times as specified by `bootnumber`. This vector of averaged, null variance ratios is then sampled for lower and upper confidence intervals, which are returned along with the average observed variance ratio. 
```{r results='asis', eval=FALSE}
KNZ_variance_ratio <- variance_ratio(df = knz_001d, 
                                   species.var = "species", 
                                   time.var = "year",
                                   abundance.var = "abundance", 
                                   bootnumber = 10, 
                                   replicate.var = "subplot")

kable(KNZ_variance_ratio)
```


Alternatively, if a replicate column is specified and  `average.replicates` is set to `FALSE`, the function will return a vector of null variance ratios for each replicate in the dataset, and return the subsequent confidence intervals and observed variance ratios for each replicate.
```{r results='asis', eval=FALSE}
KNZ_variance_ratio_avgrep <- variance_ratio(knz_001d, 
                                          time.var = "year",
                                          species.var = "species",
                                          abundance.var = "abundance",  
                                          bootnumber = 10, 
                                          replicate.var = "subplot", 
                                          average.replicates = FALSE)

kable(head(KNZ_variance_ratio_avgrep))
```


If `replicate.var` is left as `NA` the function assumes that there is a single observation for each species within a given time period.
```{r results='asis', eval=FALSE}
KNZ_A1_variance_ratio <- variance_ratio(df = subset(knz_001d, subplot=="A_1"), 
                                      time.var = "year", 
                                      species.var = "species",  
                                      abundance.var = "abundance",  
                                      bootnumber = 10) 
kable(KNZ_A1_variance_ratio)
```

#### General cyclic shift function
`codyn` also includes the option to apply a cyclic shift permutation on other test statistics:

- `cyclic_shift`  returns an S3 object of test statistics from a user-specified function when applied to a null community generated via a cyclic shift permutation. It requires a dataframe with a `species.var`, `time.var` and `abundance.var` column, and optional `replicate.var` column. The user-specified function should operate on a community matrix (e.g., `cov`).

The length of the "out" parameter in the object is the number of null iterations as specified by `bootnumber`). If multiple replicates are specified, null values are averaged among replicates for each iteration, but a different cyclic shift permutation is applied to each replicate within an iteration. 

- `confint` returns the confidence intervals from the S3 object produced by `cyclic_shift`.


####  Variance ratio and species richness
A second criticism of the variance ratio is that it is sensitive to species richness. This is a particular concern when the metric is used to compare communities that have different levels of species richness. The most conservative approach is to restrict use of the variance ratio to two-species communities [@hector2010]. Comparing the effect size of the observed versus null variance ratio can also account for richness differences between communities. Two alternative metrics that quantify species asynchrony have been developed in part to respond to this issue.

## Species synchrony 
### Synchrony Option 1: "Loreau"
Loreau and de Mazancourt (2008) developed a metric of species synchrony that compares the variance of aggregated species abundances with the summed variances of individual species:

$$ Synchrony = \frac{{\sigma_(x_T)}^{2}}{({\sum_{i} \sigma_(x_i)})^{2}}$$

where:

$$ x_T(t) = {\sum_{i=1}^{N} x_i(t))} $$

This measure of synchrony is standardized between 0 (perfect asynchrony) and 1 (perfect synchrony). A virtue of this metric is that it can be applied across communities of variable species richness. It can also be applied not only to species abundance but also population size and per capita growth rate. However, unlike the variance ratio it does not lend itself to significance testing. In addition, it will return similar values for communities shaped by different processes -- for example, even if species vary independently, the synchrony metric may be affected by the number of species and individual species variances [@gross2014].

In `codyn`, this is the default metric for the `synchrony` function and can be easily calculated for multiple replicates in a dataset. 
```{r results='asis'}
KNZ_synchrony_Loreau <- synchrony(df = knz_001d, 
                         time.var = "year", 
                         species.var = "species", 
                         abundance.var = "abundance", 
                         replicate.var = "subplot")
kable(head(KNZ_synchrony_Loreau))
```


If there are no replicates within the dataset allow the `replicate.var` argument to default to `NA`. 
```{r results='asis'}
KNZ_A1_synchrony_Loreau <- synchrony(df = subset(knz_001d, subplot=="A_1"),
                            time.var = "year",
                            species.var = "species", 
                            abundance.var = "abundance")
KNZ_A1_synchrony_Loreau
```


### Synchrony Option 2: "Gross"
Gross et al. (2014) developed a metric of synchrony that compares the average correlation of each individual species with the rest of the aggregated community:

$$ Synchrony = (1/N){{\sum_{i}Corr(x_i, \sum_{i\neq{j}}{x_j})}}$$

This measure of synchrony is standardized from -1 (perfect asynchrony) to 1 (perfect synchrony) and is centered at 0 when species fluctuate independently. A virtue of this metric is it not sensitive to richness and has the potential for null-model significance testing. It may under-perform on short time series because it is based on correlation, and care should be taken when applying it to communities that contain very stable species (i.e., whose abundances do not change throughout the time series). 


In `codyn`, this metric is calculated with the `Gross` option in the `synchrony` function and can be easily calculated for multiple replicates in a dataset. If a species does not vary over the course of the time series `synchrony` will issue a warning and will remove that species from the calculation.
```{r results='asis'}
KNZ_synchrony_Gross <- synchrony(df = knz_001d, 
                           time.var = "year", 
                           species.var = "species",  
                           abundance.var = "abundance", 
                           metric = "Gross", 
                           replicate.var = "subplot")
kable(head(KNZ_synchrony_Gross))
```


If there are no replicates within the dataset allow the `replicate.var` argument to default to `NA`. 
```{r results='asis'}
KNZ_A1_synchrony_Gross <- synchrony(df = subset(knz_001d, subplot=="A_1"),
                              time.var = "year", 
                              species.var = "species",  
                              abundance.var = "abundance", 
                              metric = "Gross")
KNZ_A1_synchrony_Gross
```

###Comparison between "Loreau" and "Gross"
Qualititively, the degree to which the synchrony metrics calculated by `Loreau` versus `Gross` will differ depends on the abundance distributions of the species in a community. The `Loreau` method and the variance ratio are both based on variances, and are therefore more heavily influenced by abundant species. In contrast,  the `Gross` method is based on correlation and consequently weights species equally.

## Citations