occTest: An integrated approach for quality control of species occurrence data

Josep M. Serra-Diaz*, Jeremy Borderieux, Brian Maitner, Coline C.F. Boonman, Daniel Park, Wen Yong Guo, Arnaud Callebaut, Brian J. Enquist, Jens C. Svenning, Cory Merow

*Corresponding author for this work

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review

Abstract

Aim: Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and from citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses.
Innovation: We introduce an R package, occTest, that synthesizes a growing open-source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms that identify potential problems with species occurrence records through a hierarchical organization of multiple tests. The workflow is organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks, each grouping different testTypes (e.g. environmental outlier detection), which in turn may use different testMethods (e.g. Rosner test, jackknife, etc.). Four testBlocks characterize potential problems in the geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user-defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests and allows multiple methods to be associated with each test, so that consensus among data cleaning methods can be explored. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, its hierarchical structure can incorporate tests yet to be developed.
Main conclusions: occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom-built rules. As a result, occTest can better assess each record's appropriateness for its intended application.
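
The hierarchy described in the abstract (testPhases containing testBlocks, which group testTypes, each run with several testMethods) and the idea of exploring consensus among methods can be illustrated with a minimal sketch. This is not occTest's API, which is an R package; it is written in Python purely for illustration. The identifiers `geoOutlier`, `envOutlier`, `distance` and `alphaHull`, the functions `consensus_scores` and `keep_record`, and the 0.5 threshold are all hypothetical. Only the four hierarchy levels and the Rosner and jackknife method names come from the abstract.

```python
# Hypothetical hierarchy: testPhase -> testBlock -> testType -> testMethods.
# Identifiers below are illustrative assumptions, not occTest's actual names.
HIERARCHY = {
    "testing": {
        "geographic": {"geoOutlier": ["distance", "alphaHull"]},
        "environmental": {"envOutlier": ["rosner", "jackknife"]},
    }
}


def consensus_scores(flags):
    """Return, per testType, the fraction of its methods that flagged a record.

    `flags` maps (testType, testMethod) -> bool, where True means "suspect".
    Methods with no result are skipped rather than counted as passes.
    """
    scores = {}
    for blocks in HIERARCHY.values():
        for types in blocks.values():
            for test_type, methods in types.items():
                results = [
                    flags[(test_type, m)]
                    for m in methods
                    if (test_type, m) in flags
                ]
                if results:
                    scores[test_type] = sum(results) / len(results)
    return scores


def keep_record(flags, threshold=0.5):
    """Keep a record unless some testType's consensus score exceeds threshold."""
    return all(s <= threshold for s in consensus_scores(flags).values())


# Two made-up records: one flagged by a single environmental method,
# one flagged by both environmental methods.
rec_a = {
    ("geoOutlier", "distance"): False,
    ("geoOutlier", "alphaHull"): False,
    ("envOutlier", "rosner"): True,
    ("envOutlier", "jackknife"): False,
}
rec_b = {
    ("geoOutlier", "distance"): False,
    ("geoOutlier", "alphaHull"): False,
    ("envOutlier", "rosner"): True,
    ("envOutlier", "jackknife"): True,
}

kept = [name for name, rec in {"a": rec_a, "b": rec_b}.items() if keep_record(rec)]
```

Under these assumptions, a record is dropped only when more than half of the methods within some testType agree that it is suspect; a stricter or looser rule corresponds to changing `threshold`, in the spirit of the predefined versus custom-built filtering rules the abstract mentions.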

Original language: English
Article number: e13847
Journal: Global Ecology and Biogeography
Volume: 33
Issue: 7
ISSN: 1466-822X
Publication status: Published - Jul 2024

Keywords

  • data cleaning
  • outlier
  • quality
  • R
  • species occurrence
