Embryo selection within in vitro fertilization (IVF) is the process of evaluating qualities of fertilized oocytes (embryos) and selecting the best embryo(s) available within a patient cohort for subsequent transfer or cryopreservation. In recent years, artificial intelligence (AI) has been used extensively to improve and automate the embryo ranking and selection procedure by extracting relevant information from embryo microscopy images. The AI models are evaluated based on their ability to identify the embryo(s) with the highest chance(s) of achieving a successful pregnancy. Whether such evaluations should be based on ranking performance or pregnancy prediction, however, seems to divide studies. As such, a variety of performance metrics are reported, and comparisons between studies are often made on different outcomes and data foundations. Moreover, superiority of AI methods over manual human evaluation is often claimed based on retrospective data, without any mentions of potential bias. In this paper, we provide a technical view on some of the major topics that divide how current AI models are trained, evaluated and compared. We explain and discuss the most common evaluation metrics and relate them to the two separate evaluation objectives, ranking and prediction. We also discuss when and how to compare AI models across studies and explain in detail how a selection bias is inevitable when comparing AI models against current embryo selection practice in retrospective cohort studies.
- Artificial intelligence
- Embryo selection
- Model evaluation and comparison
- Selection bias