Bridging the accuracy gap in material discovery using machine learning

  1. Bridging the accuracy gap in material discovery using machine learning

    20MAT02 / Solid-state physics
    Supervisor(s): S. Cottenier, T. Verstraelen / Advisor(s): M. Sluydts, M. Larmuseau, M. Cools-Ceuppens


    The discovery of new materials is one of the key drivers of technological innovation in many sectors. In the case of energy materials, an intensive search is under way for new materials for applications such as high-capacity solid-state batteries or high-temperature superconductors. Finding these materials experimentally is both time-consuming and expensive, so quantum mechanical simulations have become a crucial tool in their discovery. Using high-throughput screening, large-scale databases of hypothetical materials can be created and explored for new applications. From these databases, rankings of promising candidate materials can be extracted and passed on for experimental synthesis.

    An example of such a database is the Materials Project, which already contains well over 100,000 materials, making it a prime candidate for data mining. There is, however, a catch. To create databases of this size, compromises had to be made in both simulation method and numerical accuracy. For certain applications, the resulting errors can be unacceptable. At the same time, recalculating the entire database, while potentially highly valuable, is a costly endeavor.

    To reduce this computational burden, we can use machine learning. Using new high-quality simulations for only a small percentage of the materials, a machine learning model can be trained that maps the existing low-quality material properties onto their high-quality counterparts. To do this in a data-efficient way we can use ‘active learning’, in which the machine learning model is updated live while the quantum mechanical simulations are running. By quantifying the uncertainty of the model across the Materials Project, we can select the most informative material to simulate next, and so build the best possible upscaling model. Previous research has shown that this can offer a speedup of at least a factor of 10.
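    The active learning loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the in-house software: the function run_high_quality_simulation is a hypothetical stand-in for an expensive quantum mechanical calculation, the pool of 200 two-dimensional points stands in for material descriptors, and the uncertainty is estimated from the spread of a bootstrap ensemble of ridge regressions, which is one common choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_high_quality_simulation(x):
    """Hypothetical stand-in for an expensive quantum mechanical calculation."""
    return np.sin(3.0 * x[0]) + 0.5 * x[1]

def features(X):
    """Simple polynomial features; a real model would use material descriptors."""
    return np.column_stack([np.ones(len(X)), X, X**2, X[:, :1] * X[:, 1:]])

def fit_ridge(Phi, y, alpha=1e-3):
    """Closed-form ridge regression weights."""
    A = Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def ensemble_predict(X_train, y_train, X_query, n_models=20):
    """Mean and standard deviation of predictions over a bootstrap ensemble."""
    Phi_train, Phi_query = features(X_train), features(X_query)
    preds = []
    for _ in range(n_models):
        # Resample the training set with replacement and refit.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        weights = fit_ridge(Phi_train[idx], y_train[idx])
        preds.append(Phi_query @ weights)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def active_learning(pool, n_initial=5, n_queries=15):
    """Repeatedly simulate the pool entry on which the model is least certain."""
    labeled = list(rng.choice(len(pool), size=n_initial, replace=False))
    y = {i: run_high_quality_simulation(pool[i]) for i in labeled}
    for _ in range(n_queries):
        X_train = pool[labeled]
        y_train = np.array([y[i] for i in labeled])
        _, std = ensemble_predict(X_train, y_train, pool)
        std[labeled] = -np.inf  # never select an already simulated entry again
        nxt = int(np.argmax(std))
        y[nxt] = run_high_quality_simulation(pool[nxt])
        labeled.append(nxt)
    return labeled, y

# Synthetic pool of candidate "materials" (assumption: 2 descriptors each).
pool = rng.uniform(-1.0, 1.0, size=(200, 2))
labeled, y = active_learning(pool)
print(f"Simulated {len(labeled)} of {len(pool)} candidates")
```

    In this sketch the model is retrained from scratch after every new simulation, which is affordable for a small linear model; for larger models an incremental update would likely be preferable.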


    In this thesis we will build machine learning models able to convert simulated material properties from low-quality methods into their high-quality equivalents. The first step will be to build a new high-quality dataset using active learning. Using this data, an incrementally improving machine learning model can then be created. One of the challenges will be to create a model generic enough to handle the wide variety of materials in the Materials Project. A second key challenge is to include information on the crystal geometry in an efficient way, for instance using deep learning.
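    As an illustration of what such a conversion model might look like, the sketch below learns the difference between low- and high-quality values from a small labeled subset and applies it to the rest. Everything in it is an assumption made for illustration: the data are synthetic, the three descriptors are placeholders for real material features, and the linear correction fitted with least squares is a deliberately simple substitute for the more generic (possibly deep learning) models the thesis aims at.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 500 "materials" with 3 descriptors each.
n_materials = 500
X = rng.uniform(-1.0, 1.0, size=(n_materials, 3))
y_high = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 2]

# The low-quality method is assumed to have a systematic, descriptor-dependent
# error plus a small amount of numerical noise.
systematic_error = 0.3 + 0.5 * X[:, 0] - 0.2 * X[:, 1]
y_low = y_high - systematic_error + rng.normal(0.0, 0.01, n_materials)

# High-quality values are only "simulated" for 10% of the materials.
train = rng.choice(n_materials, size=n_materials // 10, replace=False)
test = np.setdiff1d(np.arange(n_materials), train)

def design_matrix(X, y_low):
    """Intercept, descriptors, and the low-quality value itself as inputs."""
    return np.column_stack([np.ones(len(X)), X, y_low])

# Fit the correction (high minus low) on the labeled subset.
coef, *_ = np.linalg.lstsq(
    design_matrix(X[train], y_low[train]),
    y_high[train] - y_low[train],
    rcond=None,
)

# Upscale the remaining low-quality values and compare errors.
y_upscaled = y_low[test] + design_matrix(X[test], y_low[test]) @ coef
mae_raw = np.mean(np.abs(y_low[test] - y_high[test]))
mae_upscaled = np.mean(np.abs(y_upscaled - y_high[test]))
print(f"MAE of low-quality data: {mae_raw:.3f}")
print(f"MAE after upscaling:     {mae_upscaled:.3f}")
```

    Learning the correction rather than the high-quality property itself means the model only has to capture what the low-quality method gets wrong, which is usually a smoother target. Real errors will not be linear in a handful of descriptors, which is where geometry-aware models come in.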

    Key points:

    • Create a new high-quality dataset using high-throughput quantum mechanical simulations.
    • Optimize the sampling strategy of your simulations using uncertainty quantification and active learning. In-house software is available.
    • Create upscaling machine learning models to bridge the gap between low- and high-quality simulation methods.
    • Explore the importance of geometrical information to maximize the applicability of the models.

    Interesting literature:

    1. Gu, G. H., Noh, J., Kim, I. & Jung, Y. Machine learning for renewable energy materials. Journal of Materials Chemistry A 7, (2019).
    2. Lookman, T., Balachandran, P. V., Xue, D. & Yuan, R. Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design. npj Comput. Mater. 5, (2019).
  1. Study programme
    Master of Science in Engineering Physics [EMPHYS], Master of Science in Sustainable Materials Engineering [EMMAEN], Master of Science in Physics and Astronomy [CMFYST]
    Keywords: machine learning, quantum mechanical simulations, uncertainty quantification


Stefaan Cottenier
Toon Verstraelen