Specialization Topic

AI School - 2025

October 6th – 9th, AI School in Leipzig by IOM e.V.

1. Machine learning interatomic potentials for atomistic simulations

Omid Shayestehpour
(Universität Paderborn)

This course provides an introduction to machine learning interatomic potentials (MLIPs), aimed at students and researchers interested in combining atomistic simulations with modern machine learning techniques. We begin with an overview of atomistic simulations, setting the stage for understanding how ML models can be used to approximate potential energy surfaces with high accuracy and efficiency.

Participants will then be introduced to the theoretical foundations of MLIPs, including the principles behind their construction and application. We will explore a hands-on toy example, demonstrating how both neural networks and Gaussian process regression can be used to learn a simple potential energy surface from data.
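
As a rough illustration of the kind of toy problem described above, the following sketch fits a small neural network to a one-dimensional pair potential. The Morse potential, its parameters, the network size, and the learning rate are all assumptions chosen for this example and are not taken from the course material.

```python
import numpy as np

def morse(r, depth=1.0, width=1.5, r_eq=1.2):
    """Morse pair potential, used here as the hypothetical reference energy."""
    return depth * (1.0 - np.exp(-width * (r - r_eq))) ** 2 - depth

rng = np.random.default_rng(0)

# Training data: interatomic distances and their reference energies.
r_train = np.linspace(0.8, 3.0, 40).reshape(-1, 1)
e_train = morse(r_train)

# One hidden layer with tanh activation (sizes are illustrative).
n_hidden = 16
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
b2 = np.zeros(1)

def predict(r):
    """Return hidden activations and predicted energies for distances r."""
    hidden = np.tanh(r @ W1 + b1)
    return hidden, hidden @ W2 + b2

def mse(r, e):
    """Mean squared error between predicted and reference energies."""
    return float(np.mean((predict(r)[1] - e) ** 2))

initial_loss = mse(r_train, e_train)

# Full-batch gradient descent with hand-written backpropagation.
learning_rate = 0.01
for _ in range(5000):
    hidden, pred = predict(r_train)
    d_pred = 2.0 * (pred - e_train) / len(r_train)       # dLoss/dpred
    d_hidden = (d_pred @ W2.T) * (1.0 - hidden ** 2)      # back through tanh
    W2 -= learning_rate * (hidden.T @ d_pred)
    b2 -= learning_rate * d_pred.sum(axis=0)
    W1 -= learning_rate * (r_train.T @ d_hidden)
    b1 -= learning_rate * d_hidden.sum(axis=0)

final_loss = mse(r_train, e_train)
print(f"MSE before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

A Gaussian process regression model could be trained on the same `r_train` and `e_train` arrays, which would allow a direct comparison of the two approaches on identical data.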

Building on these fundamentals, the workshop will introduce graph neural networks (GNNs) and their advantages for learning interatomic interactions in complex systems. Finally, we will apply these concepts in a real-world example, where participants will train GNN-based interatomic potentials for a material system and use the trained model to perform molecular dynamics simulations.

2. Chemometrics and Data Science: obtaining information from photonic data

Thomas Bocklitz
(Leibniz IPHT)

Photonic technologies provide powerful, non-destructive measurement capabilities with minimal sample preparation, making them essential for a wide range of research and clinical applications. However, raw data from label-free measurement techniques can be difficult to interpret because of its complexity. In the first part of this course, we will explore how chemometrics and machine learning can be used to extract valuable insights from spectroscopic and imaging data. Using Raman spectroscopy as a key example, we will examine challenges related to experimental design and device-to-device comparability. Beyond chemometrics for spectroscopic data, we will discuss the interpretability of deep learning models used for multimodal nonlinear imaging.

After the introductory section, a hands-on tutorial will offer practical experience with machine learning for analyzing photonic data. Participants will learn how to preprocess data and apply both unsupervised and supervised analysis methods. The practical part will address key issues such as data standardization, reliable model evaluation, and the interpretability and reliability of models when applied to unseen data. The tutorial will be conducted in Python / Jupyter notebooks, with all necessary materials made available through Git.

3. Gaussian processes and Bayesian Optimization

Martin Rudolph
(Leibniz IOM)

Gaussian processes (GPs) offer a flexible, non-parametric approach to modeling a response surface from input data, with the goal of predicting the response for unseen inputs. They inherently quantify the uncertainty of the predicted response, which varies across the input space and naturally increases in regions with little or no data. This makes them especially valuable in fields where collecting data is costly and predictions must be made from sparse data sets. A second key characteristic is that the GP model can be updated as new data is collected. Based on the currently modeled response and its local uncertainty, two strategies exist for collecting new data: in an explorative strategy, data is collected in regions where the uncertainty is high; in an exploitative strategy, data is gathered in regions where the response is predicted to be optimal. Bayesian optimization (BO) is an algorithm that decides, step by step, which strategy to follow, depending on the number of data points that one can afford to collect. The module will introduce the fundamental concepts behind GPs, highlighting their strengths in interpolating data and quantifying uncertainty, and will outline the concepts behind BO. It consists of both lectures (40%) and practical exercises (60%).
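
To make the interplay between prediction, uncertainty, and data collection more concrete, here is a minimal sketch of a GP with a squared-exponential kernel inside a simple Bayesian optimization loop. The objective function, the kernel hyperparameters, and the upper-confidence-bound rule (with weight `kappa`) are assumptions made for illustration; they are one common choice and not necessarily what the module uses.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(x_query, x_query))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def objective(x):
    """Hypothetical stand-in for a costly experiment or simulation."""
    return np.sin(3.0 * x) + 0.5 * x

x_grid = np.linspace(0.0, 2.0, 201)      # candidate inputs
x_obs = np.array([0.1, 1.0, 1.9])        # initial measurements
y_obs = objective(x_obs)

kappa = 2.0  # larger values favor exploration, smaller values exploitation
for _ in range(8):
    mean, std = gp_posterior(x_obs, y_obs, x_grid)
    acquisition = mean + kappa * std     # upper confidence bound
    x_next = x_grid[np.argmax(acquisition)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(f"Best observed input: {best_x:.2f}, response: {y_obs.max():.3f}")
```

The returned standard deviation is close to zero at measured inputs and grows away from them, which is the property the acquisition rule relies on: the `mean` term rewards promising regions, while the `kappa * std` term rewards regions that have not yet been sampled.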
