Developments in machine and deep learning for facies classification
In my last article, I revisited the SEG machine learning contest that was launched in the October 2016 issue of the Leading Edge. Since then, it has been amazing to see the quality and quantity of projects that have used the contest as inspiration for their own work. Students, professionals, researchers and enthusiasts from all over the world have continued to develop the ideas generated during the contest, applying new technologies and geological insight to explore how machine learning can be used to create useful innovative tools for subsurface science. In this article I want to review some of this work. If you are interested in this topic or want to get started in machine learning for subsurface data, the projects described below cover a wide selection of machine learning approaches to facies classification using the same dataset.
Before jumping into the recent work, it’s worthwhile to take a look at the project that originated the data in the first place. The Hugoton Asset Management Project was a consortium run by the folks at the Kansas Geological Survey (KGS). This large project, led by Martin Dubois and Tim Carr at KGS, used machine learning and data-driven techniques to build models of the largest gas field in North America. The final report describes all aspects of this integrated effort. They proposed a data-driven workflow for generating geologic models and identifying lithofacies using machine learning. For example, Bohling and Dubois used machine learning and hidden Markov models to generate realistic realizations of geomodels in this area.
Dubois, Bohling, and Chakrabarti provide a survey of the machine learning approaches that were explored during the project. They also give an overview of the dataset and describe the reasoning behind the original facies classification scheme. They compare Bayesian methods, k-nearest neighbors, fuzzy logic, and an artificial neural network (ANN). The ANN tested was relatively small by today’s standards, consisting of a single hidden layer with 50 nodes. The ANN outperformed the other methods, though, perhaps foreshadowing what was to come.
Fast forward 15 years, and many people have approached the same problem with new tools and techniques. Most of the innovation has focused on two important aspects of machine learning: feature engineering and model design.
Feature engineering uses knowledge of the problem domain (like geology and petrophysics) to generate new features that improve the accuracy of the model’s results. The subsurface doesn’t behave randomly: there are physical processes that cause rocks to have their distinct character. Bestagini, Lipari, and Tubaro used geologic insight to develop a set of augmented features for classifying well log data that were adopted by the top teams in the contest. In addition to including non-linear features, they computed gradients of each of the well log measurements. A gradient indicates how the rock properties are changing with depth, which may suggest the environment of deposition. This feature lets the machine learning algorithm see the neighborhood of a sample, giving it contextual information it can use to discriminate between rock types. You can see the code for computing these augmented features in their contest entry notebook.
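To make the gradient-augmentation idea concrete, here is a minimal sketch (function name and toy values are mine, not taken from their notebook); the only assumption is that each well’s logs are sampled along a depth axis:

```python
import numpy as np

def augment_features(logs, depths):
    """Append depth gradients of each log curve to the feature matrix.

    logs:   (n_samples, n_logs) array of well log measurements for one well
    depths: (n_samples,) array of measured depths
    Returns an (n_samples, 2 * n_logs) array: original logs plus gradients.
    """
    # np.gradient uses central differences and handles uneven depth spacing
    grads = np.gradient(logs, depths, axis=0)
    return np.hstack([logs, grads])

# Toy example: two log curves sampled every 0.5 m
depths = np.arange(0.0, 5.0, 0.5)
logs = np.column_stack([np.linspace(60, 80, 10),   # e.g. GR increasing with depth
                        np.full(10, 0.25)])        # e.g. a constant porosity log
features = augment_features(logs, depths)
print(features.shape)  # (10, 4)
```

Each row now carries both the measurement and how fast it is changing with depth, giving the classifier a sense of local context.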
In a similar vein, Chen and Zeng demonstrated that petrophysical features computed from the base well logs could improve classifier performance. Taking inspiration from Archie’s equation, they added the log ratio of two measurements, resistivity and neutron porosity, to the feature set, which improved the accuracy of their predictions. Marcelo Guarido noticed circular patterns when cross-plotting certain pairs of features. He computed the polar coordinates for these feature pairs, providing new features that are easier to separate into distinct classes. This demonstrates how visualizing the relationships in your data can provide intuition for features that may improve model performance.
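Both ideas are a few lines of NumPy. The sketch below is illustrative (variable names and values are mine, not from either author’s code): a log-ratio feature in the spirit of the Archie-inspired one, and a polar-coordinate transform of a cross-plotted feature pair:

```python
import numpy as np

def logratio_feature(resistivity, neutron_phi, eps=1e-6):
    """Log of the resistivity / neutron-porosity ratio (Archie-inspired).
    eps guards against division by zero or log of zero."""
    return np.log((resistivity + eps) / (neutron_phi + eps))

def polar_features(x, y):
    """Convert a cross-plotted feature pair to polar coordinates (radius, angle).
    Circular patterns in (x, y) become separable bands in (r, theta)."""
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return r, theta

res = np.array([10.0, 100.0])   # ohm-m, toy values
phi = np.array([0.30, 0.05])    # v/v, toy values
ratios = logratio_feature(res, phi)   # larger values flag tight, resistive rock

r, theta = polar_features(np.array([3.0]), np.array([4.0]))
print(r, theta)  # radius 5.0, angle ~0.927 rad
```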
Machine learning model design has undergone a renaissance in recent years. This has largely been driven by the excitement around deep learning, as well as the availability of libraries that make it easier for anyone to experiment and try things out. A number of people have been inspired to report on the results using different models with the contest data.
Lukas Mosser and Alfredo de la Fuente Briceño demonstrated the effectiveness of the XGBoost model (as well as feature engineering) by winning the contest (check out their winning notebook). XGBoost is a decision-tree-based gradient boosting algorithm that has proven very effective on small-to-medium-sized datasets, and it was used by all the top teams in the contest. Licheng Zhang reported on using XGBoost for facies classification at an SEG conference in China. Recently, Mandal and Rezaee compared four different classifiers on the dataset, with an ANN achieving the best result.
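The basic recipe is short. For portability, this sketch uses scikit-learn’s `GradientBoostingClassifier` as a stand-in for XGBoost (same family of tree-based gradient boosting), and synthetic data stands in for the augmented log features and the nine facies labels:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.datasets import make_classification

# Stand-in data: in the contest, X would be the (augmented) log features
# and y the facies labels; the feature and class counts here are illustrative.
X, y = make_classification(n_samples=500, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)

# The contest was scored with an F1 metric; micro averaging is shown here
score = f1_score(y_te, clf.predict(X_te), average="micro")
print(score)
```

Swapping in `xgboost.XGBClassifier` requires only changing the model line; the fit/predict interface is the same.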
A number of groups have investigated how deep learning can be applied to this challenge. Deep models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can learn to recognize important features in the data, reducing the need for manual feature engineering. However, deep models have many parameters and typically require very large datasets for training (there are only nine wells in this dataset). Tschannen et al. describe how to build a CNN that can be used for facies prediction. They modified the architecture using inception modules, and the code used in the paper can be found in their GitHub repo.
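To see what a CNN layer actually does along the depth axis of a well log, here is a toy sliding-window filter in NumPy (values and kernel are illustrative, not from any of the papers above). A learned filter like this one responds to a facies boundary, which is exactly the kind of feature a 1D CNN discovers for itself:

```python
import numpy as np

def conv1d(signal, kernel):
    """'Same'-padded sliding-window filter, the basic operation a 1D CNN
    layer applies along the depth axis of a log (technically this is
    cross-correlation, which is what CNN libraries implement)."""
    pad = len(kernel) // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(signal))])

gr = np.array([60.0, 60.0, 60.0, 120.0, 120.0, 120.0])  # a sharp facies boundary
edge_kernel = np.array([-1.0, 0.0, 1.0])                 # responds to change with depth
response = conv1d(gr, edge_kernel)
print(response)  # [ 0.  0. 60. 60.  0.  0.] — peaks at the boundary
```

In a real CNN the kernel weights are learned from the labeled wells rather than hand-picked, and many such filters are stacked into layers.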
Imamverdiyev and Sukhostat used the dataset as a benchmark to compare a custom 1D CNN with RNN and long short-term memory (LSTM) networks; their CNN design outperformed the other models. Jiajun Jiang explored using LSTMs in more detail. These models treat the well logs as time series and can learn to recognize patterns in the logs (such as cyclical deposition patterns) for improved facies prediction. Ben Lasscock has posted about how LSTMs can provide predictions with less unphysical noise by incorporating more contextual information (see his notebook for the code).
Figure 1. Schematic of the CNN proposed by Tschannen et al (2017).
Model stacking is a technique that combines the predictions of different models into a single, improved prediction. Shashank and Mahpatra showed how different facies classification models could be combined to give improved predictions. Ben Lasscock wrote a post about model stacking and posted the code on GitHub.
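A minimal stacking sketch using scikit-learn’s `StackingClassifier` (this is not the authors’ code; the base models, meta-model, and synthetic data are all illustrative choices). Base models make out-of-fold predictions, and a meta-model learns how to combine them:

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Illustrative data standing in for the log features and facies labels
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000))

# The meta-model (logistic regression) learns which base model to trust where
score = cross_val_score(stack, X, y, cv=3).mean()
print(score)
```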
The projects above are all examples of supervised learning approaches. At SEG 2019, Dunham and Welford, and Gkortsas and Liang, reported on using unsupervised and semi-supervised methods. These techniques are useful when the amount of labeled data is limited, which is often the case in subsurface science. Dunham, Malcolm, and Welford further explored semi-supervised label propagation and self-training, showing that these approaches can exceed the performance of supervised methods when labeled data are scarce.
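Self-training is straightforward to try with scikit-learn’s `SelfTrainingClassifier`: a base classifier is fit on the labeled samples, then confidently predicted unlabeled samples are added to the training set and the process repeats. This sketch uses synthetic data and an illustrative 10% labeled fraction (none of it is from the papers above):

```python
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Stand-in data for log features and facies labels
X, y = make_classification(n_samples=400, n_classes=3, n_informative=5,
                           random_state=0)

# Pretend only ~10% of samples are labeled; the convention for
# unlabeled samples is the label -1
rng = np.random.default_rng(0)
y_semi = y.copy()
y_semi[rng.random(len(y)) > 0.10] = -1

model = SelfTrainingClassifier(RandomForestClassifier(random_state=0))
model.fit(X, y_semi)

# Evaluate against the held-back true labels
acc = (model.predict(X) == y).mean()
print(acc)
```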
An astonishing amount of machine learning innovation has come from only nine well logs with core from a specific region of southwest Kansas. Unintentionally, this dataset has become the de facto benchmark for building and testing facies classification models. We need more datasets from other regions and environments of deposition to investigate the challenges of predicting facies in different reservoir types, such as shales and carbonates. Continued innovation depends on the availability of quality benchmark datasets.
Figure 2. The Kansas dataset and SEG contest continue to drive innovation. The color map used in the original tutorial can be found in papers, hackathons, conferences, product demos, even t-shirts!
Having demonstrated the unreasonable effectiveness of these nine well logs from Kansas, we encourage those in our community who can contribute data like this to do so. Have you built your own machine learning models and continued to make improvements? Have you found some open data and cleaned and prepared it for machine learning? Please share your results and what you discovered. Perhaps you can write a tutorial demonstrating how to use your model, or even better, share your code on GitHub.