TY - JOUR
T1 - Larynx cancer survival model developed through open-source federated learning
AU - Rønn Hansen, Christian
AU - Price, Gareth
AU - Field, Matthew
AU - Sarup, Nis
AU - Zukauskaite, Ruta
AU - Johansen, Jørgen
AU - Eriksen, Jesper Grau
AU - Aly, Farhannah
AU - McPartlin, Andrew
AU - Holloway, Lois
AU - Thwaites, David
AU - Brink, Carsten
N1 - Publisher Copyright: © 2022 The Authors
PY - 2022/11
Y1 - 2022/11
N2 - Introduction: Federated learning has the potential to perform analyses on decentralised data; however, survival analysis poses particular obstacles because of the risk of data leakage. This study demonstrates how to perform a stratified Cox regression survival analysis, specifically designed to avoid data leakage, using federated learning on larynx cancer patients from centres in three different countries. Methods: Data were obtained from 1821 larynx cancer patients treated with radiotherapy in three centres. Tumour volume was available for all 786 included patients. Parameter selection among eleven clinical and radiotherapy parameters was performed using best subset selection and cross-validation through the federated learning system AusCAT. After parameter selection, the β regression coefficients were estimated using bootstrapping. Calibration plots were generated for 2- and 5-year survival, and Kaplan-Meier curves of the inner and outer risk groups were compared with the Cox model predictions. Results: The best-performing Cox model included log(GTV), performance status, age, smoking, haemoglobin and N-classification; however, the simplest model with similar predictive performance included only log(GTV) and performance status. For the simplest model, the Harrell C-indices for Odense, Christie and Liverpool were 0.75 [0.71–0.78], 0.65 [0.59–0.71] and 0.69 [0.59–0.77], respectively. The values were slightly higher for the full model, with C-indices of 0.77 [0.74–0.80], 0.67 [0.62–0.73] and 0.71 [0.61–0.80], respectively. A patient smoking during treatment had the same hazard as a non-smoking patient ten years older. Conclusion: Without any patient-specific data leaving the hospitals, a stratified Cox regression model based on data from centres in three countries was developed without data leakage risk. The overall survival model is primarily driven by tumour volume and performance status.
KW - Cox survival model
KW - Data leakage
KW - Distributed learning
KW - Federated learning
KW - Larynx cancer
KW - Stratified Cox model
UR - http://www.scopus.com/inward/record.url?scp=85140718735&partnerID=8YFLogxK
U2 - 10.1016/j.radonc.2022.09.023
DO - 10.1016/j.radonc.2022.09.023
M3 - Journal article
C2 - 36208652
AN - SCOPUS:85140718735
SN - 0167-8140
VL - 176
SP - 179
EP - 186
JO - Radiotherapy and Oncology
JF - Radiotherapy and Oncology
ER -