Undergraduate student projects

Automating Data Collection and Enhancing Species Distribution Prediction for Climate Change Impact on Wildlife

Project background:

Temperature has strong effects on ectotherms and their diseases, leading to potentially complex effects of a changing climate. A challenge is that pathogens and their hosts each have distinct temperature-dependent physiological responses, such that pathogen growth in or on hosts depends on thermal mismatches between pathogen and host thermal performance curves (TPCs). To investigate the effects of global climate change (GCC) on ectotherm host-parasite systems at many different scales, we have developed the thermal impact of parasite performance (TIPP) database, a resource now containing >1,900 TPCs of parasites and pathogens spanning a wide taxonomic range, including bacteria, fungi, helminths, and arthropods.
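
To make the idea of a thermal mismatch concrete, the following is a minimal sketch rather than the method used in TIPP. It assumes a Gaussian-shaped TPC and defines mismatch as pathogen performance minus host performance at a given temperature; the curve form, the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_tpc(temp_c, t_opt, breadth, peak=1.0):
    """Relative performance at temp_c for a Gaussian-shaped TPC (assumed form)."""
    return peak * math.exp(-((temp_c - t_opt) ** 2) / (2.0 * breadth ** 2))


def thermal_mismatch(temp_c, host, pathogen):
    """Pathogen performance minus host performance at temp_c (assumed definition)."""
    return gaussian_tpc(temp_c, **pathogen) - gaussian_tpc(temp_c, **host)


# Hypothetical parameters: a warm-adapted host and a cool-adapted pathogen.
host = {"t_opt": 25.0, "breadth": 5.0}
pathogen = {"t_opt": 18.0, "breadth": 4.0}

# Scan 0 to 40 degrees C in 0.5 degree steps and find where the pathogen
# most strongly outperforms the host.
temps = [step * 0.5 for step in range(0, 81)]
mismatch = {t: thermal_mismatch(t, host, pathogen) for t in temps}
worst_temp = max(mismatch, key=mismatch.get)

print(f"Largest pathogen advantage at {worst_temp} C")
```

With these made-up parameters the pathogen has the advantage at cool temperatures and the host at warm ones, which is the kind of asymmetry a single shared temperature response would miss.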

We built the TIPP database on the existing ectotherm performance database BioTraits [Dell et al. 2013], which largely describes predator-prey interactions. BioTraits has also informed VecTraits (https://vectorbyte.crc.nd.edu/vectraits-explorer), another database that describes traits of ectothermic disease vectors, such as mosquitoes and ticks, as a function of environmental gradients including, but not limited to, temperature. By combining TIPP with these databases, we can provide a comprehensive resource with the potential to inform management decisions in light of GCC-caused population declines and extinctions. We can use TPCs for species of interest to predict thermal performance under multiple climate change projections, then feed these predictions into mechanistic species distribution models to explore how species distributions may change.
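
As a rough illustration of the last step, the sketch below evaluates one fitted TPC at projected temperatures for several scenario and year combinations. It assumes the Brière curve, a form commonly fitted to thermal performance data. The parameter values, scenario labels, and temperatures are placeholders invented for this example, not values from TIPP or from any climate projection.

```python
import math


def briere_tpc(temp_c, c, t_min, t_max):
    """Briere curve: c * T * (T - t_min) * sqrt(t_max - T), zero outside (t_min, t_max)."""
    if temp_c <= t_min or temp_c >= t_max:
        return 0.0
    return c * temp_c * (temp_c - t_min) * math.sqrt(t_max - temp_c)


# Hypothetical fitted parameters for one species and trait.
params = {"c": 2.0e-4, "t_min": 10.0, "t_max": 35.0}

# Placeholder projected mean temperatures (degrees C), keyed by (scenario, year).
projected_temps = {
    ("scenario_low", 2050): 22.0,
    ("scenario_low", 2090): 23.0,
    ("scenario_high", 2050): 24.5,
    ("scenario_high", 2090): 36.0,
}

# Predicted performance for each scenario and year.
predicted = {
    key: briere_tpc(temp, **params) for key, temp in projected_temps.items()
}

for (scenario, year), value in sorted(predicted.items()):
    print(scenario, year, round(value, 3))
```

In a real workflow these per-scenario values would be computed per grid cell and passed on as inputs to the mechanistic distribution model; here a single temperature per scenario keeps the example short.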

Project tasks:

1) Automated Data Collection for TPCs

Manual extraction of TPCs from the scientific literature is labor-intensive and does not scale with the continuous publication of relevant research. We are seeking an undergraduate student to use the current database to build an automated data collection tool that continues to extract TPCs from the scientific literature as new papers are published. This tool would use large language models (LLMs) to identify and extract TPC-related data from scientific papers.
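
One possible shape for such a tool is sketched below, under stated assumptions. The LLM is represented by a plain callable, `ask_llm`, that takes a prompt string and returns a reply string; no specific model or vendor API is implied. The record fields, the prompt wording, and the `fake_llm` stand-in are all hypothetical. The point of the sketch is the validation step: model output is treated as untrusted and only well-formed records are kept.

```python
import json

# Hypothetical record schema for one point on a thermal performance curve.
REQUIRED_FIELDS = ("species", "trait", "temperature_c", "trait_value")


def build_prompt(paper_text):
    """Ask for a JSON list so the reply can be checked mechanically."""
    return (
        "Extract every thermal performance measurement from the text below. "
        "Reply with a JSON list of objects with the keys "
        + ", ".join(REQUIRED_FIELDS)
        + ".\n\n"
        + paper_text
    )


def parse_records(llm_reply):
    """Keep only records that have every field and numeric measurements."""
    try:
        raw = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    records = []
    for item in raw:
        if not isinstance(item, dict):
            continue
        if not all(field in item for field in REQUIRED_FIELDS):
            continue
        try:
            temperature = float(item["temperature_c"])
            value = float(item["trait_value"])
        except (TypeError, ValueError):
            continue
        records.append(
            {
                "species": str(item["species"]),
                "trait": str(item["trait"]),
                "temperature_c": temperature,
                "trait_value": value,
            }
        )
    return records


def extract_tpc_points(paper_text, ask_llm):
    """Run one paper through the (caller-supplied) LLM and validate the reply."""
    return parse_records(ask_llm(build_prompt(paper_text)))


def fake_llm(prompt):
    """Stand-in for a real model: one valid record and one malformed record."""
    return json.dumps(
        [
            {"species": "Example species", "trait": "growth rate",
             "temperature_c": 20, "trait_value": 0.42},
            {"species": "Example species", "trait": "growth rate",
             "temperature_c": "not reported"},
        ]
    )


points = extract_tpc_points("Example paper text.", fake_llm)
print(points)
```

Records that pass this check would still need review against the source paper before entering the database; the sketch only filters out replies that are structurally unusable.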

2) Enhanced Species Distribution Prediction

We are also seeking assistance with the mechanistic species distribution model (SDM) workflow.

First, we would like to increase the spatial and taxonomic scope of our modeling efforts. The sdm package (https://cran.r-project.org/web/packages/sdm/index.html) implements a number of the algorithms most commonly used by ecologists. However, these open-source implementations can be computationally expensive, especially when comparing and/or averaging outputs from multiple algorithms [Kaky et al. 2020]. Based on the computation time estimates in [Valavi et al. 2021], generating and then ensembling five of the most commonly used SDM techniques takes roughly 25 minutes when projecting at a regional scale on a platform with 16 GB of memory and eight CPU cores. If these models are generated for 4 separate climate change scenarios at 3 future time points, it would take roughly 5.5 hours to generate predictions for a single species at a regional scale. This covers only the time to produce predictions and does not account for precursor steps such as occurrence cleaning. This hurdle could be overcome in several ways, such as automating the workflow so that it runs continuously, using cluster or parallel computing, and/or using higher-performance computers.
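
The time estimate above can be checked with a short back-of-the-envelope calculation. The sketch below uses only the figures quoted in the text (about 25 minutes per ensemble run, 4 scenarios, 3 time points). Two things in it are assumptions: adding one present-day baseline run, which is one way to arrive near the quoted 5.5 hours, and the idealised parallel model, which assumes the runs are independent and that every worker matches the 16 GB, eight-core platform.

```python
import math

MINUTES_PER_ENSEMBLE_RUN = 25  # rough per-run figure quoted in the text
N_SCENARIOS = 4
N_TIME_POINTS = 3


def total_hours(n_runs, minutes_per_run=MINUTES_PER_ENSEMBLE_RUN):
    """Serial time for n_runs ensemble runs, in hours."""
    return n_runs * minutes_per_run / 60.0


def parallel_hours(n_runs, n_workers, minutes_per_run=MINUTES_PER_ENSEMBLE_RUN):
    """Idealised wall-clock time when independent runs are spread over workers."""
    batches = math.ceil(n_runs / n_workers)
    return batches * minutes_per_run / 60.0


future_runs = N_SCENARIOS * N_TIME_POINTS  # 12 scenario-by-year projections
with_baseline = future_runs + 1            # assumption: one present-day run

print(total_hours(future_runs))              # 5.0
print(round(total_hours(with_baseline), 2))  # 5.42
print(parallel_hours(future_runs, n_workers=4))  # 1.25
```

Under these assumptions, spreading one species across four comparable workers brings the projection step down to a little over an hour, which suggests why cluster or parallel computing matters once many species are involved.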

Following this, a pipeline from these models to a web dashboard/app (i.e., an R-Shiny application, https://www.rstudio.com/products/shiny/) is necessary for stakeholders who use these methods in conservation decision-making. Development of this dashboard can proceed in parallel with the other project tasks, using a small subset of data and a limited number of SDMs that can be expanded in the future. R-Shiny applications use R, HTML, CSS, and JavaScript to build web-based interfaces for analyses. Features of the app would include selecting species, choosing between an individual model algorithm and the ensemble model output for display, manipulating the map, and visualizing different climate change scenarios.
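
The choice between an individual algorithm and the ensemble output can be sketched independently of the user interface. The example below is written in Python for illustration only; the actual app would be built in R-Shiny. It assumes each algorithm returns suitability values on the same grid and that the ensemble is an unweighted cell-by-cell mean, which is one simple option among several. The algorithm names and values are hypothetical.

```python
from statistics import fmean


def ensemble_mean(predictions):
    """Unweighted cell-by-cell mean across algorithms (assumed ensemble rule)."""
    grids = list(predictions.values())
    n_cells = len(grids[0])
    if any(len(grid) != n_cells for grid in grids):
        raise ValueError("all algorithms must predict on the same grid")
    return [fmean(cell_values) for cell_values in zip(*grids)]


def layer_to_display(predictions, choice):
    """Return the grid the user selected: one algorithm's output or 'ensemble'."""
    if choice == "ensemble":
        return ensemble_mean(predictions)
    return predictions[choice]


# Hypothetical suitability values (0 to 1) for four grid cells.
predictions = {
    "algo_a": [0.25, 0.5, 0.75, 1.0],
    "algo_b": [0.75, 0.5, 0.25, 0.0],
}

print(layer_to_display(predictions, "algo_a"))    # [0.25, 0.5, 0.75, 1.0]
print(layer_to_display(predictions, "ensemble"))  # [0.5, 0.5, 0.5, 0.5]
```

Keeping this selection logic separate from the map rendering would let the dashboard start with a few algorithms and add more later without changing the display code.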

[Dell et al. 2013] Dell, Anthony I., Samraat Pawar, and Van M. Savage. “The thermal dependence of biological traits.” Ecological Archives E094-108 (2013).

[Kaky et al. 2020] Kaky, Emad, Victoria Nolan, Abdulaziz Alatawi, and Francis Gilbert. “A comparison between Ensemble and MaxEnt species distribution modelling approaches for conservation: A case study with Egyptian medicinal plants.” Ecological Informatics 60 (2020): 101150.

[Valavi et al. 2021] Valavi, Roozbeh, Jane Elith, José J. Lahoz‐Monfort, and Gurutzeta Guillera‐Arroita. “Modelling species presence‐only data with random forests.” Ecography 44, no. 12 (2021): 1731-1742.