
Can unsupervised learning methods applied to milk recording big data provide new insights into dairy cow health?

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review


  • S. Franceschini, University of Liege, Belgium
  • Clement Grelet, Centre wallon de recherches agronomiques, Belgium
  • J. Leblois, Walloon Breeders Association Group (Elevéo by Awé groupe), 5590 Ciney, Belgium
  • Nicolas Gengler, University of Liege, Belgium
  • H. Soyeurt, University of Liege, Belgium
  • GplusE Consortium, Genotype Plus Environment Consortium (www.gpluse.eu)

Among the dairy sector's current concerns, the assessment of global animal health status is a complex challenge. Its multidimensionality means that global monitoring tools are rarely considered. Instead, specific diseases are often studied separately and, owing to financial and ethical constraints, with small-scale data sets focusing on few biomarkers. Several studies have already used milk Fourier transform mid-infrared (FT-MIR) spectroscopy to detect mastitis and lameness or to quantify health-related biomarkers in milk or blood. Those studies are relevant, but they focus mainly on one biomarker or disease. To address this limitation and the small size of the data sets, we proposed in this study a holistic approach using big data obtained from milk recording, including milk yield, somatic cell count, and 27 FT-MIR–based predictors related to milk composition and animal health status. Using 740,454 records collected from 114,536 first-parity Holstein cows in southern Belgium, we repeatedly applied unsupervised learning algorithms based on Ward's agglomerative hierarchical clustering method to find potentially interesting patterns. A divide-and-conquer approach was used to overcome the limitation of computational resources when clustering a relatively large data set. Five groups of records were identified. Differences observed in the fourth group suggested a relationship with metabolic disorders. The fifth group seemed to be related to mastitis. In a second step, we performed a partial least squares discriminant analysis (PLS-DA) to predict the probability of belonging to those specific groups for the entire data set. The obtained global accuracy was 0.77, and the balanced accuracies (i.e., the mean of sensitivity and specificity) for discriminating the fourth and fifth groups were 0.88 and 0.96, respectively. Then, the interpretation of those groups was validated using 204 milk and blood reference records.
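The divide-and-conquer clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the records were split or how partial results were combined, so the scheme below (cluster each chunk with Ward's method, keep the weighted chunk-level centroids, then run a final Ward step on those centroids) and the parameters `chunk_size` and `local_k` are hypothetical. At each step, the naive Ward routine merges the pair of clusters whose union least increases the within-cluster sum of squares, n_A·n_B/(n_A + n_B)·‖c_A − c_B‖². A `balanced_accuracy` helper computes the one-versus-rest metric defined above (mean of sensitivity and specificity); the PLS-DA model itself is not shown.

```python
from collections.abc import Sequence
from math import fsum


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def ward(points: Sequence[Sequence[float]], sizes: Sequence[int], k: int) -> list[int]:
    """Naive Ward agglomeration of (possibly weighted) clusters down to k groups.

    Each input point is a cluster of size sizes[i] whose centroid is that point.
    Returns one label in 0..k-1 per input point.
    """
    # Active clusters as [centroid, size, indices of the input points they contain].
    clusters = [[list(p), n, [i]] for i, (p, n) in enumerate(zip(points, sizes))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ca, na, _), (cb, nb, _) = clusters[a], clusters[b]
                # Ward's criterion: increase in the within-cluster sum of squares.
                cost = na * nb / (na + nb) * _sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ca, na, ma), (cb, nb, mb) = clusters[a], clusters[b]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        del clusters[b]  # b > a, so index a is still valid afterwards
        clusters[a] = [centroid, na + nb, ma + mb]
    labels = [0] * len(points)
    for label, (_, _, members) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def divide_and_conquer_ward(records, k, chunk_size, local_k):
    """Cluster records chunk by chunk, then cluster the weighted chunk summaries.

    chunk_size and local_k are hypothetical tuning knobs: each chunk is reduced to
    at most local_k centroids, and only those centroids enter the global Ward step.
    """
    centroids, weights, record_to_summary = [], [], []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        m = min(local_k, len(chunk))
        local = ward(chunk, [1] * len(chunk), m)
        offset = len(centroids)
        dim = len(chunk[0])
        sums = [[0.0] * dim for _ in range(m)]
        counts = [0] * m
        for rec, lab in zip(chunk, local):
            counts[lab] += 1
            for d in range(dim):
                sums[lab][d] += rec[d]
        for lab in range(m):  # every local label has at least one member
            centroids.append([s / counts[lab] for s in sums[lab]])
            weights.append(counts[lab])
        record_to_summary.extend(offset + lab for lab in local)
    final = ward(centroids, weights, k)
    # Each record inherits the final label of its chunk-level summary.
    return [final[s] for s in record_to_summary]


def balanced_accuracy(y_true, y_pred, positive) -> float:
    """One-versus-rest balanced accuracy: mean of sensitivity and specificity."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both positive and negative cases are required")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(divide_and_conquer_ward(records, k=2, chunk_size=3, local_k=2))  # [0, 0, 0, 1, 1, 1]
    print(balanced_accuracy([5, 5, 1, 1, 1, 1], [5, 1, 1, 1, 1, 5], positive=5))  # 0.625
```

The naive routine costs O(n³) per call, so in this sketch only small chunks and their summaries are ever clustered directly; that bounded cost is the point of the divide-and-conquer design. A real pipeline would replace `ward` with an optimized library implementation.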
The predicted probability associated with metabolic disorders had significant correlations of 0.54 with blood β-hydroxybutyrate, 0.44 with blood nonesterified fatty acids, −0.32 with blood glucose, −0.23 with milk glucose-6-phosphate, and 0.38 with milk isocitrate. In contrast, the predicted probability of belonging to the mastitis group had correlations of 0.69 with milk lactate dehydrogenase, 0.46 with milk N-acetyl-β-D-glucosaminidase, −0.18 with milk free glucose, and 0.16 with milk glucose-6-phosphate. Consequently, these results suggest that the obtained quantitative traits indirectly reflect some of the main health disorders in dairy farming and could be used to monitor dairy cows on a large scale. By using unsupervised learning on large-scale milk recording data and then validating the patterns with reference laboratory measures, we propose a new approach to quickly assess dairy cow health status.
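The validation step pairs each record's predicted group-membership probability with laboratory reference values and computes a correlation coefficient per biomarker. The abstract does not name the coefficient, so this minimal sketch assumes Pearson's r; the function names, marker names, and toy values are hypothetical and are not data from the study. It does not test significance; converting r into a p-value additionally requires the t-distribution with n − 2 degrees of freedom (for example via `scipy.stats.pearsonr`).

```python
from collections.abc import Mapping, Sequence
from math import fsum, sqrt


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = fsum(x) / len(x), fsum(y) / len(y)
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlate_with_references(
    probabilities: Sequence[float], references: Mapping[str, Sequence[float]]
) -> dict[str, float]:
    """Correlate predicted group probabilities with each reference biomarker."""
    return {name: pearson(probabilities, values) for name, values in references.items()}


if __name__ == "__main__":
    # Toy values for illustration only; not measurements from the study.
    prob = [0.1, 0.2, 0.3, 0.4]
    refs = {"marker_up": [1.0, 2.0, 3.0, 4.0], "marker_down": [4.0, 3.0, 2.0, 1.0]}
    print(correlate_with_references(prob, refs))
```

A positive r means higher predicted membership probability goes with higher reference values, which is how the sign of each correlation above should be read (e.g., positive for blood β-hydroxybutyrate, negative for blood glucose).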

Original language: English
Journal: Journal of Dairy Science
Pages (from-to): 6760-6772
Number of pages: 13
Publication status: Published - Aug 2022

    Research areas

  • big data, animal health, unsupervised learning, milk, mid-infrared, Lactation, Big Data, Cattle Diseases, Pregnancy, Unsupervised Machine Learning, Animals, Cattle, Biomarkers, Mastitis/veterinary, Female, Glucose-6-Phosphate


ID: 279528205