Title: | Bootstrap Trait Values to Calculate Moments |
---|---|
Description: | Calculates trait moments from trait and community data using the methods developed in Maitner et al (2021) <doi:10.22541/au.162196147.76797968/v1>. |
Authors: | Richard James Telford [aut, cre]
|
Maintainer: | Richard James Telford <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-03-12 04:48:42 UTC |
Source: | https://github.com/plant-functional-trait-course/traitstrap |
Function calculates the trait coverage of the community for each level of the sampling hierarchy and makes a barplot.
Shows at which level the data are coming from in each plot.
## S3 method for class 'filled_trait' autoplot(object, other_col_how, ...)
## S3 method for class 'filled_trait' autoplot(object, other_col_how, ...)
object |
output from |
other_col_how |
what to do with the other columns in other data.
Options are to |
... |
optional filters for use with |
a ggplot2 plot
require("ggplot2") data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) autoplot(filled_traits)
require("ggplot2") data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) autoplot(filled_traits)
A dataset containing plant cover in control plots on Svalbard from PFCT4 TraitTrain course.
community
community
A data frame with 110 rows and 4 variables:
species name
cover, in percent
site name
plot name
https://www.uib.no/en/rg/EECRG/114808/plant-functional-traits-course-4
Helper function for bootstrap_traits_multivariate that extracts results from a correlation matrix
cor_to_df(corr)
cor_to_df(corr)
corr |
correlation matrix |
A data.frame of correlations
Modified from https://stackoverflow.com/a/23476844/2055765
x <- matrix(ncol = 5, rnorm(20)) colnames(x) <- letters[1:5] cor(x) |> cor_to_df()
x <- matrix(ncol = 5, rnorm(20)) colnames(x) <- letters[1:5] cor(x) |> cor_to_df()
Calculates at which level the data are coming from in each plot.
fortify_filled_trait(object, other_col_how, ...)
fortify_filled_trait(object, other_col_how, ...)
object |
filled traits from |
other_col_how |
what to do with the other columns in other data.
Options are to |
... |
optional filters for use with |
a tibble
A dataset containing plant traits in control plots on Svalbard from PFCT4 TraitTrain course.
trait
trait
A data frame with 705 rows and 6 variables:
site name
plot name
species name
Unique leaf ID
trait name with unit
trait value
https://www.uib.no/en/rg/EECRG/114808/plant-functional-traits-course-4
A function for trait filling using a hierarchical sampling design, which allows to account for incomplete trait collections, traits from different spatial or temporal levels (i.e. local traits vs. databases) and experimental designs.
trait_fill( comm, traits, scale_hierarchy = c("Country", "Site", "BlockID", "PlotID"), global = TRUE, taxon_col = "taxon", trait_col = "trait", value_col = "Value", abundance_col = "Cover", treatment_col = NULL, treatment_level = NULL, other_col = character(0), keep_all = FALSE, min_n_in_sample = 5, complete_only = FALSE, leaf_id = NULL )
trait_fill( comm, traits, scale_hierarchy = c("Country", "Site", "BlockID", "PlotID"), global = TRUE, taxon_col = "taxon", trait_col = "trait", value_col = "Value", abundance_col = "Cover", treatment_col = NULL, treatment_level = NULL, other_col = character(0), keep_all = FALSE, min_n_in_sample = 5, complete_only = FALSE, leaf_id = NULL )
comm |
a dataframe in long format with community data |
traits |
a dataframe in long format with trait data |
scale_hierarchy |
character vector of the sampling hierarchy from large to small (e.g. site/block/plot) |
global |
logical; calculate traits at global scale. Must not be a column called global in the traits data. |
taxon_col |
character; name of taxon column in comm and traits. Can be a vector (e.g. "species", "genus"), in which case if traits cannot be filled for the first taxon column, subsequent columns will be used in order. |
trait_col |
character; name of trait name column in traits |
value_col |
character; name of trait value column in traits |
abundance_col |
character; name of species abundance column in comm |
treatment_col |
character; optional name of treatment_col in comm and traits. Must refer to a factor where first level is control. |
treatment_level |
character; optional name of scale_hierarchy at which treatment should be filtered |
other_col |
name of other grouping columns in comm |
keep_all |
logical; keep trait data at all available levels or just finest scale available |
min_n_in_sample |
numeric; minimum number in sample with traits to accept before searching for traits higher up the hierarchy. The default is 5. |
complete_only |
logical; use only leaves with a full set of traits.
Set to TRUE when imputing for |
leaf_id |
character; unique leaf identifiers.
Only needed when |
The function uses a hierarchical sampling design, which allows it to account for incomplete trait collections, traits from different spatial or temporal levels (i.e. local traits vs. databases) and/or experimental designs.
With scale_hierarchy you can define the levels at which the traits have been collected and their order starting with the highest level (e.g. global database, region, site, block, plot).
trait_fill()
will choose if available a trait value from the lowest
level, i.e. species X from plot A and if no trait is available from that
level, it will move up the hierarchy and choose a trait from species X from
plot B at the same site.
If there is no trait available from species X in the same site, it
will choose a trait value from another site.
The argument min_n_in_samples
allows users to define the minimum number in
sample at each level for the trait filling.
If the minimum number is not reached,
trait values from the next level will also be selected, to avoid sampling the
same individual several times, which will result in unrealistic variances.
The default value is 5.
In the other_col
argument other grouping variables in the community dataset
can be defined and will be kept after the trait filling step.
Traitstrap also allows to include taxonomy and experimental design in the trait filling step.
With taxon_col a hierarchy for the taxonomy can be defined. If traits for a specific species are not available, traits from next level, e.g. the genus will be selected. For this a list of the taxonomic hierarchy has to be defined (e.g. "taxon", "genus", "family").
The argument treatment_col allows to incorporate an experimental design where traits are selected from the same experimental treatment or the first factor level, which is assumed to be the control. Therefore, it is important to order the levels of a treatment in the right order, i.e. the first level has to be the control. If you have two or more treatments and you want filling to be done only within a treatment, and not from a treatment and the control, then make the first level of the factor a level that is not in the data. The filling step can be defined at certain level using the treatment_level argument. Depending on the experimental design trait filling should occur a certain level, e.g. block or site.
a tibble with extra class filled_trait
.
data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" )
data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" )
Function to fit parametric distributions for each species-by-trait combination at the finest scale of the user-supplied hierarchy. This function returns a tibble containing the fitted parameters.
trait_fit_distributions(filled_traits, distribution_type = "normal")
trait_fit_distributions(filled_traits, distribution_type = "normal")
filled_traits |
output from the trait_fill function. |
distribution_type |
the type of statistical distribution to use. Character or named list. Currently accepts "normal","lognormal", and "beta". |
The distributions can either be a single distribution type which is
used for all traits, or traits can be assigned specific distributions types
by supplying the function with a named vector of traits, e.g.
c(height = "normal", mass = "lognormal"))
.
a tibble containing fitted distribution parameters for each trait in each species for each plot.
library(dplyr) data(community) data(trait) filled_traits <- trait_fill( comm = community |> filter(PlotID %in% c("A", "B")), traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) fitted_distributions <- trait_fit_distributions( filled_traits = filled_traits, distribution_type = "normal" )
library(dplyr) data(community) data(trait) filled_traits <- trait_fill( comm = community |> filter(PlotID %in% c("A", "B")), traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) fitted_distributions <- trait_fit_distributions( filled_traits = filled_traits, distribution_type = "normal" )
Function gives overview of which taxa are missing traits.
trait_missing(filled_trait, comm)
trait_missing(filled_trait, comm)
filled_trait |
output of trait_fill function. |
comm |
community data |
A tibble
with columns
Taxon |
Species names (actual name depends on |
max_abun |
Maximum abundance of that taxa. Be more concerned about taxa missing traits with high abundances. |
n |
Number of occurrences of the taxon. Be more concerned about taxa missing traits with many occurrences. |
n_traits |
Number of traits for each species. Ideally all should equal the number of traits you have measured. |
data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) trait_missing(filled_traits, community)
data(community) data(trait) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) trait_missing(filled_traits, community)
Function for nonparametric bootstrap resampling to calculate community weighted trait correlations, other bivariate or multivariate statistics
trait_multivariate_bootstrap( filled_traits, nrep = 100, sample_size = 200, raw = FALSE, id = "ID", fun )
trait_multivariate_bootstrap( filled_traits, nrep = 100, sample_size = 200, raw = FALSE, id = "ID", fun )
filled_traits |
output from the trait_fill function. |
nrep |
number of bootstrap replicates |
sample_size |
bootstrap size |
raw |
logical; argument to extract the raw data of the trait
distributions.
The default is |
id |
column name of unique identifiers of each leaf |
fun |
bivariate or multivariate function to apply |
The observed and filled leaves are re-sampled in proportion to
their weights, e.g. the abundance of a species or the biomass.
Values across all individuals in a community are
resampled sample_size
times to incorporate the full
spectrum of trait variation, generating nrep
trait distributions.
The function fun
is applied to the trait distribution at the finest level
of the filled trait hierarchy.
Unexpected columns in filled_traits
are deleted with a warning.
Note that due to the flexibility of this function,
the output CAN NOT be summarized using
trait_summarise_boot_moments
.
a tibble with columns for the elements of the scale_hierarchy,
and a list column result which includes the output of fun
.
require(dplyr) require(tidyr) require(ggplot2) require(purrr) data(community) data(trait) filled_traits <- trait_fill( comm = community |> filter( PlotID %in% c("A", "B"), Site == 1 ), traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover", complete_only = TRUE, leaf_id = "ID" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly boot_traits <- trait_multivariate_bootstrap(filled_traits, fun = cor, nrep = 10, sample_size = 100 ) boot_traits_long <- boot_traits |> mutate(correlations = map(result, ~ cor_to_df(.x))) |> select(-result) |> unnest(correlations) boot_traits_long |> ggplot(aes(x = paste(row, "v", col), y = value)) + geom_violin() + facet_grid(Site ~ PlotID) + coord_flip() + labs(y = "Correlation", x = "")
require(dplyr) require(tidyr) require(ggplot2) require(purrr) data(community) data(trait) filled_traits <- trait_fill( comm = community |> filter( PlotID %in% c("A", "B"), Site == 1 ), traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover", complete_only = TRUE, leaf_id = "ID" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly boot_traits <- trait_multivariate_bootstrap(filled_traits, fun = cor, nrep = 10, sample_size = 100 ) boot_traits_long <- boot_traits |> mutate(correlations = map(result, ~ cor_to_df(.x))) |> select(-result) |> unnest(correlations) boot_traits_long |> ggplot(aes(x = paste(row, "v", col), y = value)) + geom_violin() + facet_grid(Site ~ PlotID) + coord_flip() + labs(y = "Correlation", x = "")
Function for nonparametric bootstrap resampling to calculate community weighted trait mean and higher moments.
trait_np_bootstrap(filled_traits, nrep = 100, sample_size = 200, raw = FALSE)
trait_np_bootstrap(filled_traits, nrep = 100, sample_size = 200, raw = FALSE)
filled_traits |
output from the trait_fill function. |
nrep |
number of bootstrap replicates |
sample_size |
bootstrap size |
raw |
logical; argument to extract the raw data of the trait
distributions.
The default is |
The observed traits are re-sampled in proportion to their weights,
e.g. the abundance of a species or the biomass.
Values across all individuals
in a community are resampled sample_size
times to incorporate the full
spectrum of trait variation, generating nrep
trait distributions.
From these distributions the function estimates the mean and the higher
moments including variance, skewness and kurtosis.
#' The output of trait_np_bootstrap()
can be summarized using
trait_summarize_boot_moments()
.
a tibble with columns for each grouping variable of filled_traits
(usually the elements of scale_hierarchy and the traits column),
and the moments mean, variance, skewness, and kurtosis.
library(dplyr) data(community) data(trait) # Filter community data to make example faster community <- community |> filter( PlotID %in% c("A", "B"), Site == 1 ) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) boot_traits <- trait_np_bootstrap(filled_traits, nrep = 20, sample_size = 200 )
library(dplyr) data(community) data(trait) # Filter community data to make example faster community <- community |> filter( PlotID %in% c("A", "B"), Site == 1 ) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) boot_traits <- trait_np_bootstrap(filled_traits, nrep = 20, sample_size = 200 )
Function for parametric bootstrap resampling to calculate community weighted trait mean and higher moments.
trait_parametric_bootstrap( fitted_distributions, nrep = 100, sample_size = 200, raw = FALSE )
trait_parametric_bootstrap( fitted_distributions, nrep = 100, sample_size = 200, raw = FALSE )
fitted_distributions |
Fitted distribution object returned by trait_fit_distributions |
nrep |
number of bootstrap replicates |
sample_size |
bootstrap size |
raw |
logical; argument to extract the raw data of the
trait distributions.
The default is |
trait_parametric_bootstrap()
is a parametric analogue of the
trait_np_bootstrap()
.
It randomly samples from among the fitted distributions
proportionally to species abundance.
The number of samples per replicated are drawn
specified with the parameter sample_size,
and the number of replicates is specified
by the parameter nrep
.
From these distributions the function estimates the mean
and the higher moments including variance, skewness and kurtosis.
The output of trait_parametric_bootstrap()
can be summarized using
trait_summarize_boot_moments()
.
a tibble
library(dplyr) data(community) data(trait) # Filter trait and community data to make example faster community <- community |> filter( PlotID %in% c("A", "B"), Site == 1 ) trait <- trait |> filter(Trait %in% c("Plant_Height_cm")) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) fitted_distributions <- trait_fit_distributions( filled_traits = filled_traits, distribution_type = "normal" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly parametric_distributions <- trait_parametric_bootstrap( fitted_distributions = fitted_distributions, nrep = 5, sample_size = 100 ) moment_summary <- trait_summarise_boot_moments( bootstrap_moments = parametric_distributions, parametric = FALSE )
library(dplyr) data(community) data(trait) # Filter trait and community data to make example faster community <- community |> filter( PlotID %in% c("A", "B"), Site == 1 ) trait <- trait |> filter(Trait %in% c("Plant_Height_cm")) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) fitted_distributions <- trait_fit_distributions( filled_traits = filled_traits, distribution_type = "normal" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly parametric_distributions <- trait_parametric_bootstrap( fitted_distributions = fitted_distributions, nrep = 5, sample_size = 100 ) moment_summary <- trait_summarise_boot_moments( bootstrap_moments = parametric_distributions, parametric = FALSE )
Summarizes the mean and confidence interval for each trait moment.
trait_summarise_boot_moments( bootstrap_moments, parametric = TRUE, sd_mult = 1, ci = 0.95 )
trait_summarise_boot_moments( bootstrap_moments, parametric = TRUE, sd_mult = 1, ci = 0.95 )
bootstrap_moments |
trait moments from trait_np_bootstrap or trait_parametric_bootstrap |
parametric |
logical; default is TRUE. Should Confidence Intervals be calculated parametrically (using the mean and SD) or nonparametrically (using quantiles). |
sd_mult |
Number of standard deviations around each moment, defaults to one |
ci |
Desired confidence level for use when parametric is false. Defaults to 0.95. |
tibble with the grouping variables and the mean of each moment (+/- sd_mult * SD)
library(dplyr) data(community) data(trait) # Filter community data to make example faster community <- community |> filter(PlotID %in% c("A", "B")) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly boot_traits <- trait_np_bootstrap(filled_traits, nrep = 20, sample_size = 100 ) trait_summarise_boot_moments(boot_traits)
library(dplyr) data(community) data(trait) # Filter community data to make example faster community <- community |> filter(PlotID %in% c("A", "B")) filled_traits <- trait_fill( comm = community, traits = trait, scale_hierarchy = c("Site", "PlotID"), taxon_col = "Taxon", value_col = "Value", trait_col = "Trait", abundance_col = "Cover" ) # Note that more replicates and a greater sample size are advisable # Here we set them low to make the example run quickly boot_traits <- trait_np_bootstrap(filled_traits, nrep = 20, sample_size = 100 ) trait_summarise_boot_moments(boot_traits)