TY - JOUR
T1 - Turning Biases into Hypotheses through Method: A Logic of Scientific Discovery for Machine Learning
AU - Enni, Simon
AU - Bak Herrie, Maja
PY - 2021/01
N2 - Machine learning (ML) systems have shown great potential for performing or supporting inferential reasoning through analyzing large data sets, thereby potentially facilitating more informed decision-making. However, a hindrance to such use of ML systems is that the predictive models created through ML are often complex, opaque, and poorly understood, even when the programs “learning” the models are simple, transparent, and well understood. ML models become difficult to trust, since laypeople, specialists, and even researchers have difficulty gauging the reasonableness, correctness, and reliability of the inferences performed. In this article, we argue that bridging this gap in the understanding of ML models and their reasonableness requires a focus on developing an improved methodology for their creation. This process has been likened to “alchemy” and criticized for involving a large degree of “black art,” owing to its reliance on poorly understood “best practices.” We soften this critique and argue that the seeming arbitrariness is often the result of a lack of explicit hypothesizing, stemming from an empiricist and myopic focus on optimizing for predictive performance, rather than of an occult or mystical process. We present some of the problems resulting from an excessive focus on optimizing generalization performance at the cost of hypothesizing about the selection of data and biases. We suggest embedding ML in a general logic of scientific discovery similar to the one presented by Charles Sanders Peirce, and we present a recontextualized version of Peirce’s scientific hypothesis adjusted to ML.
KW - Charles Sanders Peirce
KW - machine learning
KW - artificial intelligence
KW - hypothesis
KW - inferential reasoning
KW - scientific methodology
UR - http://www.scopus.com/inward/record.url?scp=85106973377&partnerID=8YFLogxK
DO - 10.1177/20539517211020775
M3 - Journal article
SN - 2053-9517
VL - 8
JO - Big Data & Society
JF - Big Data & Society
IS - 1
ER -