TY - JOUR
T1 - Predicting phenotypes from genetic, environment, management, and historical data using CNNs
AU - Washburn, Jacob D.
AU - Cimen, Emre
AU - Ramstein, Guillaume
AU - Reeves, Timothy
AU - O’Briant, Patrick
AU - McLean, Greg
AU - Cooper, Mark
AU - Hammer, Graeme
AU - Buckler, Edward S.
PY - 2021/12
Y1 - 2021/12
N2 - Key Message: Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Abstract: Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has “learned” to prioritize many factors of known agricultural importance.
AB - Key Message: Convolutional Neural Networks (CNNs) can perform similarly to or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Abstract: Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has “learned” to prioritize many factors of known agricultural importance.
UR - http://www.scopus.com/inward/record.url?scp=85113598473&partnerID=8YFLogxK
U2 - 10.1007/s00122-021-03943-7
DO - 10.1007/s00122-021-03943-7
M3 - Journal article
C2 - 34448888
AN - SCOPUS:85113598473
SN - 0040-5752
VL - 134
SP - 3997
EP - 4011
JO - Theoretical and Applied Genetics
JF - Theoretical and Applied Genetics
IS - 12
ER -