# Lectures

The Summer School will be offered as a hybrid event. Due to the ongoing COVID-19 pandemic it is not guaranteed that every international participant/lecturer can visit Dortmund. The event will thus be a mixture of local and (possibly some) remote lectures.
All lectures will also be streamed via Zoom and YouTube to the remote audience of participants who cannot travel to Germany.
Lectures will be available on-demand on YouTube during the week of the Summer School.
Each lecture will be accompanied by a dedicated Q&A session.
The schedule for lectures and Q&A sessions will be announced on this webpage soon.

The following lectures will be part of the Summer School.
Note that this is a preliminary list.
More lectures are yet to be added and the list will be updated regularly.

The warm welcome to the summer school comes with an introduction of the collaborative research center SFB 876, which organises REAML 2022. What are the hot topics of resource-aware machine learning? Why and how should we save energy and communication when learning or applying the learned models? We conclude with practical hints.

In the field of data mining, there is one big open problem: optimization subject to binary constraints. Binary constraints make data mining results interpretable and definite. Is this picture showing a cat? Should this movie be recommended to this user? Should the next chess move be this one? Binary results give a definite answer to questions like these. Many methods are able to solve binary constrained problems, but they mostly work under one condition: exclusivity. That is, if a picture shows a cat, then it cannot show a dog; if a movie is recommended to one user, then it cannot be recommended to another; and there should be only one optimal next chess move. Depending on the application, this assumption is more or less justified. Clustering is an area in which optimization subject to binary constraints is explicitly studied. In this talk, we will discuss the broad spectrum of tasks where a matrix factorization approximation error in Frobenius norm is minimized subject to binary constraints. We will unveil under which circumstances this optimization task defines the clustering objectives of k-means, spectral clustering and subspace clustering, and we will also make connections to methods like deep learning. We will also see how bridging those disciplines under the umbrella of matrix factorization establishes novel research ideas and insights, providing inspiration to tackle pending research questions in adversarial learning, computing meaningful embeddings and learning sensible similarity metrics.
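The link between k-means and binary-constrained matrix factorization can be checked numerically. The following is a minimal sketch, not taken from the lecture: the toy data, the fixed cluster assignment and the helper names `kmeans_cost` and `factorization_cost` are assumptions made for illustration. It encodes an exclusive assignment as a binary matrix `Y` with exactly one 1 per row and compares the classical k-means cost with the Frobenius-norm error of `X ≈ Y C`.

```python
import numpy as np

def kmeans_cost(X, labels, centers):
    """Classical k-means objective: squared distance of each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

def factorization_cost(X, Y, C):
    """Squared Frobenius-norm approximation error ||X - Y C||_F^2."""
    return float(np.linalg.norm(X - Y @ C, ord="fro") ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))             # toy data: 6 points in 2 dimensions (assumption)
labels = np.array([0, 0, 1, 1, 2, 2])   # an arbitrary exclusive assignment to k = 3 clusters
k = 3

# Binary assignment matrix with exactly one 1 per row (the exclusivity condition).
Y = np.eye(k)[labels]

# For a fixed assignment, the optimal centers are the cluster means,
# i.e. the least-squares solution of Y C = X.
C = np.linalg.solve(Y.T @ Y, Y.T @ X)

print(kmeans_cost(X, labels, C), factorization_cost(X, Y, C))
```

Both printed values agree, because multiplying the one-hot matrix `Y` by `C` simply selects the assigned center for every point.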

Although cosmic rays were discovered more than 100 years ago, their exact origin and the physical mechanisms involved in their acceleration remain largely unknown. To resolve this mystery, large-scale facilities that target different messenger particles have been set up around the globe. For many of these facilities, the use of machine learning algorithms has become a standard analysis technique. Algorithms and their application differ between individual analyses, but Boosting, Random Forests and Deep Neural Networks in particular are not only popular but also highly successful choices. This lecture will provide an overview of the challenges associated with the detection and analysis of different messenger particles and of how these challenges can be addressed with machine learning and deep learning algorithms.

Data analyses usually entail the application of many scripts, notebooks, and command line tools to transform, filter, aggregate or plot data and results. With ever increasing amounts of data being collected in science, reproducible and scalable automatic workflow management becomes increasingly important. Snakemake is a workflow management system, consisting of a text-based workflow specification language and a scalable execution environment, that allows the parallelized execution of workflows on workstations, compute servers and clusters without modification of the workflow definition. Snakemake thereby puts a particular focus on transparency and human readability, as well as adaptability and modularization of data analyses.
With over 380,000 downloads and on average more than 7 new citations per week in 2021 (>1300 in total), Snakemake is one of the most widely used systems for reproducible data analysis.
This tutorial will introduce the Snakemake workflow definition language and describe how to use the execution environment. Further, it will be shown how Snakemake helps to create reproducible and transparent analyses that can be adapted to new data with little effort.

Hardware implementation is critical to reducing execution time and energy consumption for the training and deployment of deep learning models. The use of field-programmable gate arrays (FPGAs) is a promising approach to achieve a good trade-off between the design cycle and performance for deep learning systems. This lecture on FPGA-based deep learning consists of two parts. The first part gives an overview of state-of-the-art FPGA design for training and inference of deep learning models. Specifically, this part covers the potential benefits, application scenarios, main challenges and design optimisation techniques of FPGA-based deep learning, with examples. The second part discusses a basic FPGA design for feed-forward networks (FFNs). The design accelerates the back-propagation process for FFN training and can be extended to support more complicated network architectures.

The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)?
We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence to generalize well. We formalize this argument with an easy-to-compute, interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization.
Generalization in deep learning is an extremely broad phenomenon, and therefore, it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue that the proposed approach is the most viable one on this basis.
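To make the idea of gradient alignment concrete, the following sketch computes one simple alignment measure: the squared norm of the average per-example gradient divided by the average squared norm of the per-example gradients. This particular ratio, the linear model and the synthetic data are assumptions chosen for illustration and are not necessarily the metric or the setting used in the lecture. The ratio equals 1 when all per-example gradients are identical and is close to 1/m when m gradients point in unrelated directions.

```python
import numpy as np

def coherence(per_example_grads):
    """Ratio ||mean gradient||^2 / mean(||gradient||^2); lies in [0, 1]."""
    g = np.asarray(per_example_grads, dtype=float)
    mean_grad = g.mean(axis=0)
    return float(mean_grad @ mean_grad / np.mean(np.sum(g * g, axis=1)))

def squared_loss_grads(w, X, y):
    """Per-example gradients of 0.5 * (x.w - y)^2 for a linear model."""
    residual = X @ w - y
    return residual[:, None] * X

rng = np.random.default_rng(0)
m, d = 2000, 5
X = rng.normal(size=(m, d))
w_true = rng.normal(size=d)

y_real = X @ w_true                  # labels that depend on the inputs
y_random = rng.permutation(y_real)   # same label values, relationship destroyed

w0 = np.zeros(d)                     # evaluate at the start of training
alpha_real = coherence(squared_loss_grads(w0, X, y_real))
alpha_random = coherence(squared_loss_grads(w0, X, y_random))

print(f"structured labels: {alpha_real:.4f}, shuffled labels: {alpha_random:.4f}")
```

On this toy problem the ratio is markedly larger for the structured labels than for the shuffled ones, which mirrors the qualitative difference between real and random datasets described above.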

This tutorial provides an introduction to a rapidly growing area of new methods for training and evaluating intelligent systems that act autonomously, including applications from recommendation and search engines to automation and robotics. At the core of these methods are counterfactual estimators that enable the use of existing log data to estimate how some new target policy would have performed, if it had been used instead of the policy that logged the data. We say that those estimators work "off-policy", since the policy that logged the data is different from the target policy. In this way, counterfactual estimators enable Off-policy Evaluation (OPE) akin to an unbiased offline A/B test, as well as learning new decision-making policies through Off-policy Learning (OPL). The goal of this tutorial is to summarize the foundations of OPE and OPL, and provide an overview of activity and future directions in this field.
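A common counterfactual estimator is inverse propensity scoring (IPS), which reweights each logged reward by the ratio of target-policy to logging-policy probability. The sketch below is a simplified illustration under assumptions that are not part of the tutorial description: a context-free setting with three actions, known logging probabilities, and made-up reward rates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three actions with Bernoulli rewards.
reward_rates = np.array([0.2, 0.5, 0.8])
logging_policy = np.array([0.5, 0.3, 0.2])   # policy that produced the log
target_policy = np.array([0.1, 0.2, 0.7])    # new policy we want to evaluate

# Simulate the log: actions drawn from the logging policy, then rewards observed.
n = 200_000
actions = rng.choice(3, size=n, p=logging_policy)
rewards = (rng.random(n) < reward_rates[actions]).astype(float)

# IPS: reweight each logged reward by target probability / logging probability.
importance_weights = target_policy[actions] / logging_policy[actions]
ips_estimate = float(np.mean(importance_weights * rewards))

# Reference values, available here only because the simulation is fully known.
true_value = float(target_policy @ reward_rates)
naive_estimate = float(rewards.mean())       # estimates the logging policy instead

print(f"IPS: {ips_estimate:.3f}  true: {true_value:.3f}  naive: {naive_estimate:.3f}")
```

The plain average of the logged rewards reflects the logging policy, whereas the reweighted average is an unbiased estimate of the target policy's value as long as the logging policy gives every action non-zero probability.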

The COVID-19 pandemic has shown the importance of medical testing for the early detection of regional disease hot spots and for monitoring the course of the pandemic. In particular, the coupling of medical biosensors with concepts of machine learning has the potential to meet the requirements for efficient and robust detection of current and future pathogens. The lecture illustrates this with the example of the plasmon-assisted microscopy sensor, which can make nanometer-sized particles (e.g., viruses) visible. The principle of operation of the sensor and the concept for the detection of nanometer-sized particles are explained. The challenge is that the analysis is carried out on data-intensive and very noisy or artefact-afflicted image sequences, and that these image sequences should be processed in (soft) real-time while minimising resource consumption, e.g., of energy and memory. The lecture is thus at the same time an introduction to the hackathon.

The lecture starts with information-theoretic considerations, which show why inverse problems are hard when a measurement is distorted by finite-resolution effects. In most cases this implies that the problem can only be solved by biasing the result, for example by regularisation methods. Different ways to implement this approach and to control the resulting bias are discussed, with a special focus on the proper interpretation of the results.
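The effect of finite resolution can be reproduced with a small numerical experiment. The sketch below uses Tikhonov regularisation as one example of a regularisation method; the Gaussian smearing matrix, the noise level and the regularisation strength `tau` are assumptions chosen for illustration, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
bins = np.arange(n)

# Hypothetical true spectrum: a smooth bump.
f_true = np.exp(-0.5 * ((bins - 20) / 5.0) ** 2)

# Finite-resolution response: each true bin is smeared over neighbouring bins.
A = np.exp(-0.5 * ((bins[:, None] - bins[None, :]) / 1.5) ** 2)
A /= A.sum(axis=1, keepdims=True)

# Measurement = smeared truth plus a small amount of noise.
g = A @ f_true + rng.normal(scale=0.01, size=n)

# Naive inversion: unbiased in principle, but the noise is strongly amplified.
f_naive = np.linalg.solve(A, g)

# Tikhonov regularisation: minimise ||A f - g||^2 + tau * ||f||^2.
tau = 1e-2
f_reg = np.linalg.solve(A.T @ A + tau * np.eye(n), A.T @ g)

err_naive = float(np.linalg.norm(f_naive - f_true))
err_reg = float(np.linalg.norm(f_reg - f_true))
print(f"error of naive inversion: {err_naive:.3g}, regularised: {err_reg:.3g}")
```

The regularised solution is biased towards small values of `f`, yet it is far closer to the truth than the naive inverse, whose error is dominated by amplified noise. Choosing `tau` trades bias against variance.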

Data science and machine learning are taking the world by storm. Almost all theory and methods, however, are inherently flawed in such a basic way that it prevents them from being used in practice. Unlike what most papers assume, in many applications (e.g., autonomous driving, industrial machines, or healthcare) it is impossible or hugely impractical to gather all data in one place. This is due not only to privacy concerns: the sheer size of the data also makes centralizing and processing it infeasible. Federated learning offers a solution: models are trained only locally and combined to create a well-performing joint model - without sharing data. As with many data science techniques, applying federated learning in practice requires a high level of trust. However, giving guarantees on model quality and on training and resource efficiency, bounding the communication, and ensuring data privacy is a huge undertaking. In this talk I will present efficient, theoretically sound, and practically useful methods for federated machine learning, and identify important and exciting open problems.
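The basic pattern of training locally and combining models can be sketched with plain federated averaging on a linear regression task. This is an illustrative baseline under assumed settings (three simulated clients, a shared linear model, hypothetical helper names), not one of the methods presented in the talk. Only model parameters leave a client; the raw data stays local.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=5):
    """A few gradient-descent steps on one client's private data."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def federated_averaging(clients, dim, rounds=30):
    """Server loop: broadcast the model, collect local models, average by data size."""
    w_global = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        local_models = [local_update(w_global, X, y) for X, y in clients]
        w_global = sum(len(y) / total * w for (_, y), w in zip(clients, local_models))
    return w_global

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

# Simulated clients with differently sized local datasets.
clients = []
for n_local in (30, 50, 80):
    X = rng.normal(size=(n_local, 3))
    y = X @ w_true + rng.normal(scale=0.01, size=n_local)
    clients.append((X, y))

w_fed = federated_averaging(clients, dim=3)
print(np.round(w_fed, 3))
```

Communication per round is one model vector per client in each direction, which is why the number of rounds and the size of the model determine the communication cost in this scheme.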

Over the last decade, energy harvesting has seen significant growth as different markets adopt green, sustainable ways to produce electrical energy. Even though costs have fallen, the embedded sensing and Internet of Things communities have not yet widely adopted energy-harvesting-based solutions. This is partly due to a mismatch between the power density of energy harvesters and that of electronic devices, which, until recently, required a battery or super-capacitor to be functional. This mismatch is especially pronounced in indoor environments, where comparably less primary energy is available than in outdoor environments. In this talk, I will present a design methodology that can optimize energy flow in dynamic environments without requiring batteries or super-capacitors. Furthermore, I will discuss the general applicability of this approach by presenting two light-powered batteryless sensing systems, smartcards and cameras, together with optimization techniques to maximize their performance and energy efficiency.

In this talk, I will describe a project that takes advantage of embedded crowdsensing to collect pavement condition data, and discuss how crowdsensing platforms conduct road damage detection using deep neural networks on images captured with smartphones. We will then explore how to properly motivate users to participate in low platform-cost crowdsensing tasks. I will describe how to model the incentive problem for pavement crowdsensing and design new incentive mechanisms based on a platform-driven greedy algorithm. Lastly, I will discuss the performance of the incentive mechanisms in different scenarios in terms of the platform cost and the overall task completion time.

Bayesian methods are often used to solve inverse problems and machine learning tasks. In a Bayesian method, one represents one's state of knowledge about an unknown object of interest using a probability measure, and then iteratively updates this probability measure each time a new data point is obtained, by using a likelihood function and Bayes' formula.
One challenge common to many Bayesian methods is that evaluating the likelihood function for an arbitrary input can be computationally expensive. This motivates the use of cheaper approximations of the likelihood function. Random approximations of the likelihood --- for example, using randomised linear algebra --- have become popular in recent years, because they are often parallelisable. However, since these approximations introduce errors into the probability measure, one must analyse the errors to ensure that they do not 'break' the Bayesian method.
In this lecture, we will present the basic ideas of Bayesian inference, motivate the use of random approximations of the likelihood function using some powerful ideas from mathematics, and analyse the approximation errors of the corresponding randomised Bayesian method.
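The iterative update described above can be written down in a few lines for a one-dimensional example. The sketch below is an illustration under assumed settings (estimating the bias of a coin on a discretised grid, with made-up observations); it uses the exact likelihood and does not include the random approximations analysed in the lecture.

```python
import numpy as np

def bayes_update(belief, theta, observation):
    """One application of Bayes' formula: posterior is proportional to likelihood times prior."""
    likelihood = theta if observation == 1 else 1.0 - theta
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Discretised parameter space and a uniform prior over it.
theta = np.linspace(0.0, 1.0, 1001)
belief = np.full(theta.shape, 1.0 / theta.size)

# Hypothetical data: 1 = heads, 0 = tails (7 heads, 3 tails).
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

# Update the probability measure each time a new data point is obtained.
for observation in data:
    belief = bayes_update(belief, theta, observation)

posterior_mean = float(np.sum(theta * belief))
print(f"posterior mean of the coin bias: {posterior_mean:.4f}")
```

With a uniform prior, the exact posterior for this data is a Beta(8, 4) distribution with mean 8/12, which the grid approximation reproduces closely. Every update evaluates the likelihood on the whole grid, which hints at why cheap likelihood approximations become attractive when a single evaluation is expensive.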

Coresets are arguably the most important paradigm used in the design and analysis of big data algorithms. Succinctly, a coreset compresses the input such that for any candidate query, the query evaluation on the coreset and the query evaluation on the original data are approximately the same. For clustering, this means that a coreset is a small weighted sample of the points such that for any set of centers, the cost on the original point set and the cost on the coreset are equal up to some small multiplicative distortion. In this talk, we will give an in-depth yet simple and basic introduction to coreset algorithms and their analysis.
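The clustering definition can be tried out with a simple importance-sampling construction. The sketch below mixes uniform sampling with sampling proportional to the squared distance from the data mean, in the spirit of so-called lightweight coresets. The construction, the synthetic data and the helper names are assumptions for illustration, not necessarily the algorithms covered in the talk, and the check is made for one candidate set of centers only.

```python
import numpy as np

def clustering_cost(points, centers, weights=None):
    """(Weighted) sum of squared distances from each point to its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)
    if weights is None:
        return float(nearest.sum())
    return float((weights * nearest).sum())

def sample_coreset(points, m, rng):
    """Draw m weighted points; weights make the weighted cost an unbiased estimate."""
    n = len(points)
    d2_mean = ((points - points.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * d2_mean / d2_mean.sum()   # sampling distribution, sums to 1
    idx = rng.choice(n, size=m, replace=True, p=q)
    return points[idx], 1.0 / (m * q[idx])

rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
points = np.concatenate([c + rng.normal(size=(2000, 2)) for c in true_centers])

core_points, core_weights = sample_coreset(points, m=1000, rng=rng)

# Evaluate the same candidate centers on the full data and on the weighted sample.
centers = true_centers
full_cost = clustering_cost(points, centers)
coreset_cost = clustering_cost(core_points, centers, core_weights)
print(f"full: {full_cost:.1f}  coreset: {coreset_cost:.1f}")
```

Here 1000 weighted points stand in for 6000 original points, and the two costs differ only by a small relative error for the centers tested.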