TY - JOUR
T1 - SentemQC - A novel and cost-efficient method for quality assurance and quality control of high-resolution frequency sensor data in fresh waters
AU - van't Veen, Sofie Gyritia Madsen
AU - Kronvang, Brian
AU - Audet, Joachim
AU - Davidson, Thomas Alexander
AU - Jeppesen, Erik
AU - Kristensen, Esben Astrup
AU - Larsen, Søren Erik
AU - Laugesen, Jane R.
AU - Levi, Eti Ester
AU - Nielsen, Anders
AU - Andersen, Peter Mejlhede
PY - 2024/11/7
Y1 - 2024/11/7
N2 - The growing use of sensors in fresh waters for water quality measurements generates an increasingly large amount of data that requires quality assurance (QA)/quality control (QC) before the results can be exploited. Such a process is often resource-intensive and may not be consistent across users and sensors. SentemQC (QA-QC of high temporal resolution sensor data) is a cost-efficient, open-source Python approach developed to ensure the quality of sensor data by performing data QA and QC on large volumes of high-frequency (HF) sensor data. The SentemQC method is computationally efficient and features a six-step user-friendly setup for anomaly detection. The method marks anomalies in data using five moving windows. These windows connect each data point to neighboring points, including those further away in the moving window. As a result, the method can mark not only individual outliers but also clusters of anomalies. Our analysis shows that the method is robust for detecting anomalies in HF sensor data from multiple water quality sensors measuring nitrate, turbidity, oxygen, and pH. The sensors were installed in three different freshwater ecosystems (two streams and one lake) and experimental lake mesocosms. Sensor data from the stream stations yielded anomaly percentages of 0.1%, 0.1%, and 0.2%, which were lower than the anomaly percentages of 0.5%, 0.6%, and 0.8% for the sensors in the lake and the mesocosms, respectively. While the sensors in this study contained relatively few anomalies (<2%), they may represent a best-case scenario in terms of use and maintenance. SentemQC allows the user to include the individual sensor uncertainty/accuracy when performing QA-QC. However, SentemQC cannot function independently. Additional QA-QC steps are crucial, including calibration of the sensor data to correct for zero offsets and implementation of gap-filling methods prior to the use of the sensor data for determination of final real-time concentrations and load calculations.
AB - The growing use of sensors in fresh waters for water quality measurements generates an increasingly large amount of data that requires quality assurance (QA)/quality control (QC) before the results can be exploited. Such a process is often resource-intensive and may not be consistent across users and sensors. SentemQC (QA-QC of high temporal resolution sensor data) is a cost-efficient, open-source Python approach developed to ensure the quality of sensor data by performing data QA and QC on large volumes of high-frequency (HF) sensor data. The SentemQC method is computationally efficient and features a six-step user-friendly setup for anomaly detection. The method marks anomalies in data using five moving windows. These windows connect each data point to neighboring points, including those further away in the moving window. As a result, the method can mark not only individual outliers but also clusters of anomalies. Our analysis shows that the method is robust for detecting anomalies in HF sensor data from multiple water quality sensors measuring nitrate, turbidity, oxygen, and pH. The sensors were installed in three different freshwater ecosystems (two streams and one lake) and experimental lake mesocosms. Sensor data from the stream stations yielded anomaly percentages of 0.1%, 0.1%, and 0.2%, which were lower than the anomaly percentages of 0.5%, 0.6%, and 0.8% for the sensors in the lake and the mesocosms, respectively. While the sensors in this study contained relatively few anomalies (<2%), they may represent a best-case scenario in terms of use and maintenance. SentemQC allows the user to include the individual sensor uncertainty/accuracy when performing QA-QC. However, SentemQC cannot function independently. Additional QA-QC steps are crucial, including calibration of the sensor data to correct for zero offsets and implementation of gap-filling methods prior to the use of the sensor data for determination of final real-time concentrations and load calculations.
KW - SentemQC
KW - high-frequency data
KW - python tool
KW - quality assurance
KW - quality control
KW - sensor data
UR - http://www.scopus.com/inward/record.url?scp=85217503043&partnerID=8YFLogxK
U2 - 10.12688/openreseurope.18134.1
DO - 10.12688/openreseurope.18134.1
M3 - Journal article
SN - 2732-5121
VL - 4
JO - Open Research Europe
JF - Open Research Europe
IS - 244
M1 - 244
ER -