TY - JOUR
T1 - occTest
T2 - An integrated approach for quality control of species occurrence data
AU - Serra-Diaz, Josep M.
AU - Borderieux, Jeremy
AU - Maitner, Brian
AU - Boonman, Coline C.F.
AU - Park, Daniel
AU - Guo, Wen Yong
AU - Callebaut, Arnaud
AU - Enquist, Brian J.
AU - Svenning, Jens C.
AU - Merow, Cory
N1 - Publisher Copyright: © 2024 John Wiley & Sons Ltd.
PY - 2024/7
Y1 - 2024/7
N2 - Aim: Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses. Innovation: We introduce an R package, occTest, that synthesizes a growing open-source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms to identify potential problems with species occurrence records by employing a hierarchical organization of multiple tests. The workflow has a hierarchical structure organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks grouping different testTypes (e.g. environmental outlier detection), which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four different testBlocks characterize potential problems in geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user-defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods associated with each test to explore consensus among data cleaning methods. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, it provides a hierarchical structure to incorporate future tests yet to be developed. Main conclusions: occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom-built rules. As a result, occTest can better assess each record's appropriateness for its intended application.
KW - data cleaning
KW - outlier
KW - quality
KW - R
KW - species occurrence
UR - http://www.scopus.com/inward/record.url?scp=85190790056&partnerID=8YFLogxK
U2 - 10.1111/geb.13847
DO - 10.1111/geb.13847
M3 - Journal article
AN - SCOPUS:85190790056
SN - 1466-822X
VL - 33
JO - Global Ecology and Biogeography
JF - Global Ecology and Biogeography
IS - 7
M1 - e13847
ER -