## Introduction to machine learning
Schedule: Tuesday, 04.09., 09.00-15.30
Machine learning is all about building some (educated) common sense from the data jungle where we live.
Many learning criteria have been designed and none is universal: your prior knowledge - about the application domain and/or about the learning algorithms - is what makes the difference. The course will provide you with some general principles (which criteria are sound, which are effective depending on the context) and methodology (rules of good practice - how to conduct an ML application).

## Numerical Optimization in Data Analysis
Schedule: Tuesday, 04.09., 16.00-17.30
Many interesting problems in data analysis can be formulated as mathematical programs for which solutions can be found via numerical optimization. Optimization studies canonical forms of such programs, providing us with useful tools to understand their structure and thereby to design resource-efficient computational algorithms. In this lecture we discuss some fundamental ideas in optimization that are important for efficient data analysis.
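As a small taste of the numerical side, a canonical iterative method is gradient descent; the objective function and step size below are illustrative choices, not material from the lecture:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a differentiable function by repeatedly stepping
    against its gradient with a fixed step size (learning rate)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
# The iterates contract toward the minimizer x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Real data-analysis problems are high-dimensional, and much of optimization theory concerns when and how fast such iterations converge.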

## Data Mining with RapidMiner
Schedule: Tuesday, 04.09., 18.00-19.00
Data Mining can become a lot easier using the right tools. One of the most popular of these tools, voted among the most widely used solutions in the KDnuggets poll, is RapidMiner. Example datasets will be provided for download on this website a few weeks prior to the workshop.

## Data Mining from Ubiquitous Data Streams
Schedule: Wednesday, 05.09., 09.00-10.30 and 11.00-12.30
The lecture discusses the challenges in learning from distributed sources of continuous data generated by dynamic environments. Learning in these environments is faced with new challenges: we need to continuously maintain a decision model consistent with the most recent data. Stream learning algorithms work with limited computational resources. They need to be able to maintain anytime decision models, modify the decision model when new information is available, detect and react to changes in the underlying process generating the data, and forget outdated information. The tutorial will introduce the area of data stream mining using illustrative problems, present state-of-the-art learning algorithms for change detection, clustering, and classification, and discuss current trends and opportunities for research in learning from ubiquitous data streams. The second part includes exercises using the software for massive online analysis (MOA) and other software for stream mining.
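As a toy illustration of the ideas above (maintaining a model with limited memory, detecting change, and forgetting outdated information), here is a minimal Python sketch; the window size and threshold are illustrative assumptions, and real detectors such as those shipped with MOA are considerably more refined:

```python
from collections import deque

class DriftingMean:
    """Track the mean of a stream over a bounded sliding window and
    flag an abrupt change when the recent mean drifts far from a
    frozen reference window. A toy stand-in for stream change detectors."""

    def __init__(self, window=50, threshold=2.0):
        self.reference = deque(maxlen=window)  # frozen once full
        self.recent = deque(maxlen=window)     # always the latest items
        self.threshold = threshold

    def update(self, x):
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(x)
        self.recent.append(x)
        if len(self.recent) == self.recent.maxlen:
            ref = sum(self.reference) / len(self.reference)
            cur = sum(self.recent) / len(self.recent)
            if abs(cur - ref) > self.threshold:
                # Change detected: forget the outdated reference window.
                self.reference = deque(self.recent,
                                       maxlen=self.reference.maxlen)
                return True
        return False

detector = DriftingMean()
calm = [detector.update(0.0) for _ in range(100)]   # stable regime
drift = [detector.update(5.0) for _ in range(100)]  # abrupt shift
```

Note that the detector uses constant memory regardless of stream length, which is the defining resource constraint of stream learning.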

## Statistical methods for model selection
Schedule: Wednesday, 05.09., 14.00-15.30 and 16.00-17.30
Model selection is a central challenge for both regression and classification tasks. For evaluating the quality of a statistical model, many different, partly conflicting criteria are applied: for example the fit of the model in terms of likelihood, the sparsity of the model, the computation time of an algorithm for model fitting, or the interpretability of the resulting model. In the first part, we give a general introduction to model selection criteria, starting with the tradeoff between model fit and complexity in terms of bias-variance decompositions. We explain how, in a linear model, estimating the optimism of the model fit motivates the popular AIC (Akaike information criterion). As alternatives to such an explicit score, methods based on resampling or cross-validation are introduced, which require fewer model assumptions but more computation time. Finally, we discuss variable selection algorithms such as forward or backward selection, or approaches based on regularized estimates. All these approaches are demonstrated in a practical session in the statistical programming language R.

Reasons for time-consuming experiments are the application of resampling algorithms, a large number of potential models, or large data sets. In these situations, comparisons of statistical algorithms are best performed on high-performance computing clusters. We present two new R packages that greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control a batch cluster from within R. It is structured around cluster versions of the well-known higher-order functions Map/Reduce/Filter from functional programming. The package BatchExperiments is tailored to the general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind 'apply algorithm A to problem instance P and store results R'. It is possible to associate statistical designs with the parameters of algorithms and problems and thus to systematically study their influence on an algorithm's performance. A persistent database guarantees reproducible results, even on other systems. Examples for the application of these packages are presented in a practical session in R.
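The practical sessions use R; purely as a language-agnostic illustration of the fit-versus-complexity tradeoff behind the AIC, the following Python sketch compares a constant and a simple linear model on synthetic data (the data-generating line and all names are illustrative assumptions):

```python
import math
import random

def aic_gaussian(rss, n, k):
    # AIC, up to an additive constant, for a linear model with Gaussian
    # errors: a fit term n*log(RSS/n) plus a complexity penalty 2k.
    return n * math.log(rss / n) + 2 * k

def rss_constant(xs, ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys), 1        # one parameter: the mean

def rss_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)), 2

random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2 + 0.5 * x + random.gauss(0, 0.2) for x in xs]  # truly linear data

aics = {name: aic_gaussian(rss, len(xs), k)
        for name, (rss, k) in [("constant", rss_constant(xs, ys)),
                               ("linear", rss_linear(xs, ys))]}
best = min(aics, key=aics.get)  # the extra parameter pays off here
```

Because the data really follow a line, the drop in RSS dominates the penalty for the extra parameter and the linear model attains the lower AIC.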

## Exploitation of memory hierarchies
Schedule: Thursday, 06.09., 09.00-10.30
Burks, Goldstine and von Neumann (1946) already anticipated that one is forced to construct a hierarchy of memories, each of greater capacity but slower access than the preceding one. Modern computers contain such memory hierarchies. Their presence may have a dramatic impact on the performance of applications. Nevertheless, programmers are frequently unaware of this fact. In the talk, we will present examples of memory hierarchy levels and we will also demonstrate how these levels can be exploited. In particular, we will look at techniques for improving the utilization of caches and of scratchpad memories. Corresponding source-to-source code transformations will be presented. We will close with a brief look at secondary memory and the related so-called I/O-algorithms.
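One classic source-to-source transformation of this kind is loop tiling (blocking). The Python sketch below only illustrates the loop structure; the cache benefit materializes in compiled languages, where each small block of reads and writes stays cache-resident instead of streaming through the whole matrix:

```python
def transpose_naive(a):
    # Straight transpose: one of the two arrays is traversed
    # column-wise, which strides across memory in row-major layouts.
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]

def transpose_tiled(a, block=4):
    # Tiled transpose: visit the matrix in block x block sub-blocks,
    # so reads and writes within a block stay close together.
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j][i] = a[i][j]
    return out

a = [[i * 10 + j for j in range(10)] for i in range(10)]
same = transpose_tiled(a) == transpose_naive(a)  # the result is unchanged
```

The transformation changes only the iteration order, never the result, which is exactly what makes it safe for a compiler or preprocessor to apply automatically.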

## Battery capacity models
Schedule: Thursday, 06.09., 11.00-12.30
Batteries are crucial components of all portable electronic devices. Nevertheless, users of batteries are frequently unaware of their characteristics. In this talk, we would like to provide fundamental knowledge in this area. We will start with a look at the expected future of battery technology and we will continue with a presentation of models of the remaining battery charge. We will then briefly present real-time calculus and show how it can be applied to model the remaining battery charge over time.
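A common starting point for such models is the ideal linear battery model, in which the remaining charge is simply the capacity minus the integral of the drawn current. The sketch below is illustrative only; real batteries deviate from it through rate-capacity and recovery effects, which is what more refined models aim to capture:

```python
def remaining_charge(capacity_mAh, profile):
    """Ideal linear battery model: subtract the charge drawn by each
    (current_mA, duration_h) phase of a piecewise-constant load profile.
    The battery is considered empty once the balance reaches zero."""
    charge = capacity_mAh
    for current_mA, hours in profile:
        charge -= current_mA * hours
        if charge <= 0:
            return 0.0
    return charge

# E.g. a 1000 mAh battery: 2 h at 100 mA, then 8 h at 50 mA.
left = remaining_charge(1000.0, [(100.0, 2.0), (50.0, 8.0)])
```

In this linear model the order of the load phases is irrelevant; the fact that it matters for real batteries is one motivation for the more detailed charge models discussed in the talk.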

## Towards Self-Powered Systems
Schedule: Thursday, 06.09., 14.00-15.30 and Friday, 07.09., 09.00-10.30
A Wireless Sensor Network (WSN) is a distributed network in which a large number of computational components (also referred to as "sensor nodes" or simply "nodes") are deployed in a physical environment. Each component collects information about and offers services to its environment - environmental monitoring and control, healthcare monitoring and traffic control, to name a few. The collected information is processed at the component, in the network, at a remote location (e.g. the "cloud"), or in any combination of these. WSNs are typically required to run unattended for very long periods of time, often several years, powered only by standard batteries. This makes energy-awareness a particularly important issue when designing WSNs. With the advances in harvesting technologies, energy harvesting is an attractive new source of energy to power the individual nodes of a WSN. Not only is it possible to extend the lifetime of the WSN; it may eventually be possible to run WSNs without batteries - effectively turning them into self-powered systems. However, this requires that the WSN system is carefully designed to make effective use of adaptive energy management, and hence adds to the complexity of the problem. One of the key challenges is that the amount of energy harvested over a period of time is highly unpredictable. In this lecture I will address trends and challenges of energy efficiency for both single sensor nodes and networks of sensor nodes, when powered by energy harvested from the environment - leading towards self-powered systems.
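As an illustrative sketch of adaptive energy management (an assumption for exposition, not a scheme from the lecture), a node can adjust its duty cycle according to its stored energy relative to a target level, steering itself toward energy-neutral operation despite an unpredictable harvest:

```python
def next_duty_cycle(duty, battery_level, target_level, gain=0.5):
    """Proportional controller for a sensor node's duty cycle.
    battery_level and target_level are fractions of full capacity:
    spend more when energy is plentiful, conserve when it is scarce."""
    duty += gain * (battery_level - target_level)
    # Clamp to a feasible range; even a starving node must wake up
    # occasionally to sense its battery and the environment.
    return min(1.0, max(0.01, duty))

up = next_duty_cycle(0.5, battery_level=0.8, target_level=0.5)
down = next_duty_cycle(0.5, battery_level=0.2, target_level=0.5)
```

The controller spends surplus energy rather than letting a full battery waste incoming harvest, and throttles activity before the store is exhausted - the basic feedback idea behind energy-neutral operation.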

## From Data Taking to Rapid Mining – Analysis of large Data Volumes in Astroparticle Physics
Schedule: Thursday, 06.09., 16.00-17.30
In astroparticle physics, experiments are set up at barely accessible places like the South Pole or the tops of high mountains like the Roque de los Muchachos, La Palma, to explore the most exotic sources of astrophysically accelerated particles. Under the restriction of the resources energy, bandwidth, CPU time and storage, huge amounts of data are pre-analyzed and stored, containing only one signal event per 10

## Use the power of your GPU: Massively parallel programming with CUDA
Schedule: Thursday, 06.09., 18.00-19.00
Today's graphics processing units (GPUs) may be used as highly parallel co-processors alongside the usual central processing unit. They are now an established platform for high-performance scientific computing, and a multitude of general and domain-specific programming environments, libraries and tools have emerged in recent years. We will give an overview of common techniques and libraries which can be used directly to benefit from this high parallelism. The goals of this tutorial are to provide an introduction to GPU computing and to explain how to accelerate data mining and machine learning with GPUs.