Imputation of missing well log data by random forest and its uncertainty analysis

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review

Well logs are commonly used by geoscientists to infer and extrapolate the physical properties of subsurface rocks. However, well log values may be missing at some depth intervals because of operational issues during logging. To overcome this problem, an innovative approach to reconstructing well logs with machine learning methods is proposed. The missing well log values are predicted from other, complete logging features by a data-driven machine learning algorithm, namely random forest. A grid search is applied to find the combination of hyper-parameters that gives the best cross-validation score. During training, the relative importance of the input features is analysed in order to remove weakly sensitive measurements and prioritize data that correlate strongly with the target variables. Principal component analysis is applied to explore multicollinearity in the input features, so that only a few principal components of the new data vector are needed to represent a large fraction of the variance in the original data. To quantify the uncertainty in the predictions, a quantile regression tree is used to determine prediction intervals. Well log data from the Volve Field are used to validate the random forest predictions, and a high correlation coefficient between prediction and reference is achieved. Prediction intervals are estimated for different percentiles, and the predictions are more accurate at depth points where the prediction interval is narrow.
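The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic data in place of the Volve logs, and all variable names, log choices, and hyper-parameter grids are hypothetical. The prediction interval here is approximated from the spread of per-tree predictions, which is a simpler stand-in for the quantile regression method named in the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for well logs (hypothetical, not Volve data):
# three input logs and one target log sampled at n depth points.
n = 400
gamma_ray = rng.normal(60.0, 15.0, n)
density = rng.normal(2.4, 0.1, n)
neutron = 0.45 - 0.12 * density + rng.normal(0.0, 0.01, n)  # collinear with density
X = np.column_stack([gamma_ray, density, neutron])
y = 180.0 - 40.0 * density + 0.1 * gamma_ray + rng.normal(0.0, 1.0, n)

# Pretend the target log is missing over the last depth interval.
missing = np.zeros(n, dtype=bool)
missing[300:] = True

# Grid search over a small, assumed hyper-parameter grid,
# scored by cross-validated R^2 on the depths where the target exists.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "max_depth": [5, None],
        "min_samples_leaf": [1, 5],
    },
    cv=3,
    scoring="r2",
)
grid.fit(X[~missing], y[~missing])
forest = grid.best_estimator_

# Relative importance of each input log (impurity-based, sums to 1).
importances = forest.feature_importances_

# PCA on standardized inputs: cumulative fraction of variance explained.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
explained = np.cumsum(PCA().fit(X_std).explained_variance_ratio_)

# Impute the missing interval; the forest prediction is the mean over trees.
y_hat = forest.predict(X[missing])

# Approximate 5th-95th percentile interval from the per-tree predictions.
per_tree = np.stack([tree.predict(X[missing]) for tree in forest.estimators_])
lower, upper = np.percentile(per_tree, [5, 95], axis=0)

print("best hyper-parameters:", grid.best_params_)
print("feature importances:", importances)
print("cumulative explained variance:", explained)
print("mean interval width:", float(np.mean(upper - lower)))
```

A narrow `upper - lower` range at a depth point indicates that the trees agree there, which is the situation in which the abstract reports more accurate predictions. A full quantile regression forest would instead use the distribution of training responses in each leaf.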

Original language: English
Article number: 104763
Journal: Computers and Geosciences
Volume: 152
ISSN: 0098-3004
DOIs
Publication status: Published - Jul 2021

Bibliographical note

Publisher Copyright:
© 2021 Elsevier Ltd

Research areas

  • Feature importance, Log imputation, Prediction interval, Random forest

ID: 220758240