Overcoming the barriers: selecting collective variables via machine learning to efficiently construct free energy surfaces


    20NANO07 / Nanoporous materials - catalysis
    Supervisor(s): V. Van Speybroeck / Mentor(s): K. Dedecker


    Metal-organic frameworks (MOFs) are an interesting class of nanoporous materials with applications in gas storage, gas separation, catalysis, drug delivery, and more. These hybrid materials consist of metal-oxide bricks interconnected by organic linkers, which results in an ordered framework with nanosized pores. Some of these materials can transform structurally between different crystalline phases, which further increases their potential for new applications. However, to convert the immense potential of MOFs into real-life applications, we need to understand the thermodynamic factors driving this structural flexibility. This understanding follows from the underlying free energy surface of the material. Since such a surface cannot be constructed from experiments, we need to rely on computational modeling.

    For this reason, molecular dynamics (MD) simulations are typically performed to sample the free energy surface. However, the interesting phase transformations most often occur on time scales that are long compared to what is accessible with regular MD. We will therefore apply enhanced sampling MD methods, which improve the sampling of the free energy surface along certain well-chosen directions that drive the transformations: the so-called collective variables. This set of variables should be as small as possible to reduce the computational effort, yet large enough to cover all essential information on the transition. The importance of choosing suitable collective variables is highlighted by the following figure.

    In the figure above we introduce a hypothetical (and unknown) free energy surface on which we want to sample the A-to-B transition. If we start from valley A and bias the sampling solely in the direction of X, which in the analogy corresponds to X driving a car away from A, the final state B is never reached, even though X can discriminate between the initial and final states. This results in an erroneous one-dimensional free energy profile as a function of X. The same holds for Y. To reach valley B, we need to identify a collective variable Q that can sample the transition of interest, which is a non-trivial task. In our analogy, this means that Q should be able to drive the phase transformation, resulting in the only correct free energy profile.
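
    The effect of incomplete sampling on a one-dimensional profile can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the two-valley surface, the reduced units (kT = 1), and the names `model_surface` and `project_on_x` are hypothetical and are not taken from the figure. It projects the surface onto X once over all Y values and once over only those Y values close to valley A, which mimics a simulation that never leaves A in the Y direction.

```python
import numpy as np

KT = 1.0  # assumption: reduced units, energies expressed in kT


def model_surface(x, y, width=0.4):
    """Hypothetical surface with valley A at (-1, -1) and valley B at (+1, +1)."""
    well_a = np.exp(-((x + 1) ** 2 + (y + 1) ** 2) / (2 * width ** 2))
    well_b = np.exp(-((x - 1) ** 2 + (y - 1) ** 2) / (2 * width ** 2))
    return -KT * np.log(well_a + well_b)


def project_on_x(surface, dy, y_mask=None):
    """F(x) = -kT ln( sum_y exp(-F(x, y) / kT) dy ), optionally over a subset of y."""
    weights = np.exp(-surface / KT)
    if y_mask is not None:
        weights = weights[:, y_mask]  # keep only the sampled part of the Y axis
    profile = -KT * np.log(weights.sum(axis=1) * dy)
    return profile - profile.min()  # shift the minimum to zero


x = np.linspace(-2.0, 2.0, 201)
y = np.linspace(-2.0, 2.0, 201)
xx, yy = np.meshgrid(x, y, indexing="ij")
surface = model_surface(xx, yy)
dy = y[1] - y[0]

full_profile = project_on_x(surface, dy)                   # all Y values sampled
trapped_profile = project_on_x(surface, dy, y_mask=y < 0)  # Y never leaves valley A

index_a = np.argmin(np.abs(x + 1.0))
index_b = np.argmin(np.abs(x - 1.0))
print("full:    F(B) - F(A) =", full_profile[index_b] - full_profile[index_a])
print("trapped: F(B) - F(A) =", trapped_profile[index_b] - trapped_profile[index_a])
```

    In this toy model the fully sampled profile treats A and B as equally stable, whereas the profile from the trapped sampling makes B appear several kT less stable than A.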

    However vital the selection of a small yet adequate set of collective variables may be, no clear selection rules are available to date, and collective variables are selected mainly on the basis of physical insight and experimental observations. This thesis aims to tackle this problem through the systematic application of machine learning techniques to identify an essential set of variables for various MOFs with different types of flexibility.


    In a first step, the thesis student will gain insight into the problem by applying dimensionality reduction machine learning techniques to a reference MOF with topological flexibility: CoBDP [1]. The flexibility of this material can be attributed to a single collective variable, the unit cell volume, which makes this material suitable as a benchmark for newly proposed machine learning methods. These machine learning methods require trajectory data that cover the relevant transitions, which can be obtained for instance from computationally expensive molecular dynamics simulations at increased temperatures (to ease the transitions). A first interesting machine learning technique to be tested is the linear time-lagged independent component analysis (tICA) [2], which can be seen as an extension of the well-known principal component analysis technique (see figure below).
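
    To make the difference with principal component analysis concrete, the following is a minimal NumPy sketch of tICA, assuming symmetrised covariance estimates and a hypothetical function name `tica`. It is not the implementation of [2]. The synthetic two-feature trajectory is also an assumption: it mixes a slow, low-variance process with fast, high-variance noise, so that the direction of largest variance differs from the direction of slowest decorrelation.

```python
import numpy as np


def tica(features, lag, n_components=1):
    """Return the leading tICA eigenvalues and modes of a (frames x features) array."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)              # remove the mean of every feature
    x0, xt = x[:-lag], x[lag:]          # frames at time t and at time t + lag
    n_pairs = len(x0)
    # Symmetrised instantaneous and time-lagged covariance matrices.
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n_pairs)
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n_pairs)
    # Whiten with c0, then diagonalise the whitened time-lagged covariance.
    variances, axes = np.linalg.eigh(c0)
    keep = variances > 1e-10 * variances.max()
    whitening = axes[:, keep] / np.sqrt(variances[keep])
    eigenvalues, eigenvectors = np.linalg.eigh(whitening.T @ ct @ whitening)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvalues[order], whitening @ eigenvectors[:, order]


# Synthetic data (assumption): slow AR(1) process hidden under fast, large noise.
rng = np.random.default_rng(1)
n_frames = 20_000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.standard_normal()
fast = 5.0 * rng.standard_normal(n_frames)
features = np.column_stack([slow + fast, fast])

eigenvalues, modes = tica(features, lag=10)
projection = (features - features.mean(axis=0)) @ modes[:, 0]
print("correlation with slow process:", np.corrcoef(projection, slow)[0, 1])
```

    In this example the leading tICA mode recovers the slow process even though almost all of the variance sits in the fast noise, which is the direction principal component analysis would select.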

    The main drawback of this and other linear techniques is that the user has to identify all relevant input variables (features) for the algorithm. For this reason, we will also look into more recently developed non-linear techniques, in which non-linear combinations of the Cartesian input coordinates are selected automatically. Within the class of non-linear unsupervised machine learning techniques, we will focus on state-of-the-art methods such as time-lagged auto-encoders [3] or other neural-network-based approaches. The goal of this thesis is to enable, in an automated fashion, the selection of suitable collective variables for materials that display different types of flexibility. In this respect, we will study materials with topological flexibility, like CoBDP and DMOF-1(Zn), as well as materials with linker flexibility, as observed for instance in DUT-49(Cu). For the latter material, the unit cell volume is presumably not the only important variable, and other collective variables are yet to be identified.
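
    The idea behind a time-lagged auto-encoder can be sketched in a few lines: an encoder compresses the frame at time t into a low-dimensional latent variable, and a decoder is trained to predict the frame at time t + lag from it. The toy version below is an assumption-laden illustration, not the method of [3]: it uses plain NumPy with hand-written gradients, a single tanh latent unit, a linear decoder, full-batch gradient descent, and synthetic data. The name `train_tae` and all hyperparameters are hypothetical, and a real study would more likely rely on a deep learning framework.

```python
import numpy as np


def train_tae(features, lag, n_steps=500, learning_rate=0.05, seed=0):
    """Toy time-lagged auto-encoder with one tanh latent unit and a linear decoder."""
    rng = np.random.default_rng(seed)
    mean, std = features.mean(axis=0), features.std(axis=0)
    x = (features - mean) / std                  # standardise every feature
    x_in, x_out = x[:-lag], x[lag:]              # inputs at t, targets at t + lag
    n_pairs, n_features = x_in.shape
    w_enc = 0.1 * rng.standard_normal((n_features, 1))
    b_enc = np.zeros(1)
    w_dec = 0.1 * rng.standard_normal((1, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(n_steps):
        latent = np.tanh(x_in @ w_enc + b_enc)   # encoder
        error = latent @ w_dec + b_dec - x_out   # decoder prediction minus target
        losses.append(np.mean(np.sum(error ** 2, axis=1)))
        # Backpropagation of the mean squared error.
        grad_pred = 2.0 * error / n_pairs
        grad_pre = (grad_pred @ w_dec.T) * (1.0 - latent ** 2)
        w_dec -= learning_rate * (latent.T @ grad_pred)
        b_dec -= learning_rate * grad_pred.sum(axis=0)
        w_enc -= learning_rate * (x_in.T @ grad_pre)
        b_enc -= learning_rate * grad_pre.sum(axis=0)

    def encode(data):
        """Map frames to the learned one-dimensional latent variable."""
        return np.tanh(((data - mean) / std) @ w_enc + b_enc)[:, 0]

    return encode, np.array(losses)


# Synthetic data (assumption): one noisy slow feature and one purely fast feature.
rng = np.random.default_rng(2)
n_frames = 20_000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.standard_normal()
slow /= slow.std()
features = np.column_stack([slow + 0.5 * rng.standard_normal(n_frames),
                            rng.standard_normal(n_frames)])

encode, losses = train_tae(features, lag=10)
latent = encode(features)
print("loss: first", losses[0], "last", losses[-1])
print("correlation of latent with slow process:", np.corrcoef(latent, slow)[0, 1])
```

    Because only the slow part of the signal is predictable across the lag time, the latent variable is pushed towards the slow process, which is the property that makes such a bottleneck a candidate collective variable.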

    The quality of the selected variables will be checked by error analysis of the resulting free energy surfaces and by comparison with experimental results. The student will be actively coached to become acquainted with the advanced simulation techniques early in the thesis year and to acquire the programming skills needed to perform the research.
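
    The text above does not specify how the error analysis is carried out. As one common option, and purely as an assumption, the sketch below estimates error bars on a one-dimensional free energy profile by block averaging: the trajectory of a collective variable is split into consecutive blocks, a profile F(q) = -kT ln p(q) is computed per block, and the spread between blocks gives a standard error. The function name, the reduced units, and the synthetic samples are hypothetical.

```python
import numpy as np


def free_energy_with_block_errors(cv_samples, edges, n_blocks=5, kT=1.0):
    """Block-averaged free energy profile and its standard error per bin."""
    profiles = []
    for block in np.array_split(np.asarray(cv_samples), n_blocks):
        density, _ = np.histogram(block, bins=edges, density=True)
        with np.errstate(divide="ignore"):
            profile = -kT * np.log(density)      # empty bins give +inf
        profiles.append(profile - profile.min())  # minimum of each block at zero
    profiles = np.array(profiles)
    sampled = np.all(np.isfinite(profiles), axis=0)  # bins visited in every block
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = profiles[:, sampled].mean(axis=0)
    sem = profiles[:, sampled].std(axis=0, ddof=1) / np.sqrt(n_blocks)
    return centers[sampled], mean, sem


# Synthetic data (assumption): uncorrelated samples from a harmonic well, kT = 1.
rng = np.random.default_rng(3)
samples = rng.normal(0.0, 1.0, size=100_000)
edges = np.linspace(-1.0, 1.0, 21)
q, free_energy, error = free_energy_with_block_errors(samples, edges)
print("largest standard error (kT):", error.max())
```

    For real MD data, which are correlated in time, the blocks should be longer than the correlation time of the collective variable; otherwise the error bars are underestimated.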