TY - JOUR
T1 - Associations Between Radiation Oncologist Demographic Factors and Segmentation Similarity Benchmarks
T2 - Insights From a Crowd-Sourced Challenge Using Bayesian Estimation
AU - Wahid, Kareem A.
AU - Sahin, Onur
AU - Kundu, Suprateek
AU - Lin, Diana
AU - Alanis, Anthony
AU - Tehami, Salik
AU - Kamel, Serageldin
AU - Duke, Simon
AU - Sherer, Michael V.
AU - Rasmussen, Mathis
AU - Korreman, Stine
AU - Fuentes, David
AU - Cislo, Michael
AU - Nelms, Benjamin E.
AU - Christodouleas, John P.
AU - Murphy, James D.
AU - Mohamed, Abdallah S.R.
AU - He, Renjie
AU - Naser, Mohammed A.
AU - Gillespie, Erin F.
AU - Fuller, Clifton D.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
AB - PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85196137760&partnerID=8YFLogxK
U2 - 10.1200/CCI.23.00174
DO - 10.1200/CCI.23.00174
M3 - Journal article
C2 - 38870441
AN - SCOPUS:85196137760
SN - 2473-4276
VL - 8
SP - e2300174
JO - JCO Clinical Cancer Informatics
JF - JCO Clinical Cancer Informatics
M1 - e2300174
ER -