News Archive of the SFB 876 Group

This section collects all past news about the Collaborative Research Center SFB 876.

Complex Network Mining on Digital and Physical Information Artefacts

Today, a wide variety of interaction data of humans, services and systems is generated, e.g., by sensors and social media. This enables the observation and capture of digital and physical information artefacts at various levels in offline and online scenarios.
Data science then provides the means for sophisticated analysis of the collected information artefacts and emerging structures.
To that end, this talk focuses on data mining on complex networks and graph structures and presents exemplary methods and results in the context of real-world systems. Specifically, we focus on the grounding and analysis of behavior, interactions and complex structures emerging from heterogeneous data, and on corresponding modeling approaches.


Martin Atzmueller is an assistant professor at Tilburg University as well as a visiting professor at the Université Sorbonne Paris Cité.
He earned his habilitation (Dr. habil.) in 2013 at the University of Kassel, where he was also appointed adjunct professor (Privatdozent).
He received his Ph.D. (Dr. rer. nat.) in Computer Science from the University of Würzburg in 2006. He studied Computer Science at the University of Texas at Austin (USA) and at the University of Würzburg, where he completed his MSc in Computer Science.

Martin Atzmueller conducts fundamental and applied research at the nexus of Data Science, Network Analysis, Ubiquitous Social Media, the Internet of Things, and Big Data. In particular, his research focuses on how to successfully analyze and design information and knowledge processes in complex ubiquitous and social environments. This is implemented by developing corresponding methods and approaches for augmenting human intelligence and assisting the involved actors in all their purposes, both online and in the physical world.

Algorithmic Symmetry Detection and Exploitation

Symmetry is a ubiquitous concept that can both be a blessing and a curse. Symmetry arises naturally in many computational problems and can for example be used for search space compression or pruning. I will talk about algorithmic techniques to find symmetries and application scenarios that exploit them.

Starting with an introduction to the framework that has been established as the de facto standard over the past decades, the talk will highlight the underlying central ideas. I will then discuss several recent results and developments from the area. On the one hand, these results reassert the effectiveness of symmetry detection tools, but, on the other hand, they also show the limitations of the framework that is currently applied in practice. Finally, I will focus on how the central algorithmic ideas find their applications in areas such as machine learning and static program analysis.


Since 2014, Pascal Schweitzer has been a junior professor for the complexity of discrete problems at RWTH Aachen University. Following doctoral studies at the Max-Planck Institute for Computer Science in Saarbrücken, he was first a post-doctoral researcher at the Australian National University and then a laureate of the European Post-Doctoral Institute for Mathematical Sciences. His research interests comprise a wide range of discrete mathematics, including algorithmic and structural graph and group theory, on-line algorithms, and certifying algorithms.

Bashir Al-Hashimi

Runtime management for many core embedded systems: the PRiME approach

PRiME (Power-efficient, Reliable, Manycore Embedded Systems, http://www.prime-project.org) is a national research programme funded by UK EPSRC, which started in 2013. My talk will outline the key scientific challenges in energy efficiency and hardware reliability of many-core embedded systems which PRiME has addressed or is still addressing. I will describe the main theoretical and experimental advances achieved to date. This includes a presentation of learning-based runtime algorithms and an OpenCL-based cross-layer framework for energy optimization.


Bashir M. Al-Hashimi (M’99-SM’01-F’09) is a Professor of Computer Engineering and Dean of the Faculty of Physical Sciences and Engineering at the University of Southampton, UK.

He is ARM Professor of Computer Engineering and Co-Director of the ARM-ECS research centre. His research interests include methods, algorithms and design automation tools for energy efficiency of embedded computing systems. He has published over 300 technical papers, authored or co-authored 5 books and has graduated 33 PhD students.

Real-Time Mobility Data Mining


We live in a digital era. Weather, communications and social interactions start, happen and/or are triggered on some sort of cloud, which represents the ultimate footprint of our existence. Consequently, millions of digital data interactions result from our daily activities. The challenge of transforming such sparse, noisy and incomplete sources of heterogeneous data into valuable information is huge. Nowadays, such information is key to keeping up a high pace of modernization across multiple industries. Transportation is no exception.

GPS traces are among the key sources of insight in mobility data mining. Portable digital devices equipped with GPS antennas are ubiquitous sources of continuous information for location-based decision support systems. The availability of these traces of human mobility patterns is growing explosively, as industrial players modernize their infrastructure and fleets as well as the planning and control of their operations. However, this type of data has unique characteristics that make it hard to mine, such as non-stationarity, recurrent drifts or a high communication rate. These issues rule out the application of traditional off-the-shelf Machine Learning frameworks to these problems.

In this presentation, we approach a series of Transportation problems. Solutions involve near-optimal decision support systems based on straightforward Machine Learning pipelines which can handle the particularities of these problems. The covered applications include Mass Transit Planning (e.g. buses and subways), Operations of On-Demand Transportation Networks (e.g. taxis and car-sharing) and Freeway Congestion Prediction and Categorization. Experimental results on real-world case studies from NORAM, EMEA and APAC illustrate the potential of the proposed methodologies.


Dr. Luis Moreira-Matias received his M.Sc. degree in Informatics Engineering and Ph.D. degree in Machine Learning from the University of Porto, in 2009 and 2015, respectively. During his studies, he won an International Data Mining competition held during a Research Summer School at TU Dortmund (2012). Luis has served on the Program Committee and/or as invited reviewer of multiple high-impact research venues such as KDD, AAAI, IEEE TKDE, ESWA, ECML/PKDD, IEEE ITSC, TRB and TRP-B, among others. Moreover, he has a record of successful real-world deployment of AI-based software products across EMEA and APAC.

Currently, he is Senior Researcher at NEC Laboratories Europe (Heidelberg, Germany), integrated in the Intelligent Transportation Systems group. His research interests include Machine Learning, Data Mining and Predictive Analytics in general, applied to improve Urban Mobility. He has authored 30+ high-impact peer-reviewed publications on related topics.

How to Time a Black Hole: Time Series Analysis for the Multi-Wavelength Future


Virtually all astronomical sources are variable on some time scale, making studies of variability across different wavelengths a major tool in pinning down the underlying physical processes. This is especially true for accretion onto compact objects such as black holes: “spectral-timing”, the simultaneous use of temporal and spectral information, has emerged as the key probe into strong gravity and accretion physics. The new telescopes currently starting operations or coming online in the coming years, including the Square Kilometre Array (SKA), the Large Synoptic Survey Telescope (LSST) and the Cherenkov Telescope Array (CTA), will open up the sky to transient searches, monitoring campaigns and time series studies with an unprecedented coverage and resolution. But at the same time, they collect extraordinarily large data sets of previously unknown complexity, motivating the necessity for new tools and statistical methods. In this talk, I will review the state-of-the-art of astronomical time series analysis, and discuss how recent developments in machine learning and statistics can help us study both black holes and other sources in ever greater detail. I will show possible future directions of research that will help us address the flood of multiwavelength time series data to come.


Daniela Huppenkothen received a Bachelor's degree in Geosciences and Astrophysics from Jacobs University in Bremen in 2008 and the M.Sc. and Ph.D. degrees in Astronomy and Astrophysics from the University of Amsterdam in 2010 and 2014, respectively. Since October 2016 she has been working as a James Arthur Postdoctoral Fellow at New York University. Her interests are time series analysis in astronomy, astrostatistics, X-ray data analysis, and machine learning.

Sketching as a Tool for Geometric Problems


I will give an overview of the technique of sketching, or data dimensionality reduction, and its applications to fundamental geometric problems such as projection (regression) onto flats and more general objects, as well as low rank approximation and clustering applications.

Learning with Knowledge Graphs

In recent years a number of large-scale triple-oriented knowledge graphs have been generated. They are being used in research and in applications to support search, text understanding and question answering. Knowledge graphs pose new challenges for machine learning, and research groups have developed novel statistical models that can be used to compress knowledge graphs, to derive implicit facts, to detect errors, and to support the above-mentioned applications. Some of the most successful statistical models are based on tensor decompositions that use latent representations of the involved generalized entities. In my talk I will introduce knowledge graphs and approaches to learning with knowledge graphs. I will discuss how knowledge graphs can be related to cognitive semantic memory, episodic memory and perception. Finally, I will address the question of whether knowledge graphs and their statistical models might also provide insight into the brain's memory system.

Volker Tresp received a Diploma degree from the University of Goettingen, Germany, in 1984 and the M.Sc. and Ph.D. degrees from Yale University, New Haven, CT, in 1986 and 1989, respectively. Since 1989 he has headed various research teams in machine learning at Siemens, Research and Technology. He has filed more than 70 patent applications and was Siemens inventor of the year in 1996. He has published more than 100 scientific articles and supervised over 20 Ph.D. theses. The company Panoratio is a spin-off from his team. His research focus in recent years has been "Machine Learning in Information Networks" for modeling Knowledge Graphs, medical decision processes and sensor networks. He is the coordinator of one of the first nationally funded Big Data projects for the realization of "Precision Medicine". Since 2011 he has also been a Professor at the Ludwig Maximilian University of Munich, where he teaches an annual course on Machine Learning.

At BTW 2017 in Stuttgart, Jens Teubner received the Best Paper Award for his paper "Efficient Storage and Analysis of Genome Data in Databases". He developed this work together with the University of Magdeburg, Bayer AG, and TU Berlin.

The paper discusses techniques to store genome data efficiently in a relational database. This makes the flexibility and performance of modern relational database engines accessible to the analysis of genome data.

On the same day, Stefan Noll, a Master's student of Jens Teubner, received the Best Student Paper Award at BTW 2017 in Stuttgart. His contribution "Energy Efficiency in Main Memory Databases" reports on the key results of his Master's thesis. The thesis was prepared within the DBIS Group and in the context of the Collaborative Research Center SFB 876, Project A2.

His paper shows how the energy efficiency of a database system can be improved by balancing the compute capacity of the system with the available main memory bandwidth. To this end, he proposes to use Dynamic Voltage and Frequency Scaling (DVFS) as well as the selective shutdown of individual cores.

Abstract: "Efficient Storage and Analysis of Genome Data in Databases"
Genome-analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage overhead to 30%. Moreover, we can exploit genome-data characteristics during query processing allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.


Big data in machine learning is the future. But how to deal with data analysis and limited resources: Computational power, data distribution, energy or memory? From September 25th to 28th, TU Dortmund University, Germany, hosts the 4th summer school on resource-aware machine learning. Further information and online registration at: http://sfb876.tu-dortmund.de/SummerSchool2017

Topics of the lectures include: Machine learning on FPGAs, Deep Learning, Probabilistic Graphical Models and Ultra Low Power Learning.

Exercises help bring the contents of the lectures to life. The PhyNode low power computation platform was developed at the collaborative research center SFB 876. It enables sensing and machine learning for transport and logistics scenarios. These devices provide the background for hands-on experiments with the nodes in the freshly built logistics test lab. Solve prediction tasks under very constrained resources and balance accuracy versus energy.

The summer school is open to advanced graduate and post-graduate students as well as industry professionals from across the globe who are eager to learn about cutting-edge techniques for machine learning with constrained resources.

Excellent students may apply for a student grant supporting travel and accommodation. Deadline for application is July 15th.


Sensors Journal Cover

The most recent B2 project publication "Application of the PAMONO-sensor for Quantification of Microvesicles and Determination of Nano-particle Size Distribution" has been selected by the journal Sensors as the leading article for their current issue. The article is available via Open Access on the journal's website. The article was co-authored by Alexander Schramm, project leader of SFB project C1.


The PAMONO-sensor (plasmon assisted microscopy of nano-objects) demonstrated an ability to detect and quantify individual viruses and virus-like particles. However, another group of biological vesicles—microvesicles (100–1000 nm)—also attracts growing interest as biomarkers of different pathologies and needs development of novel techniques for characterization. This work shows the applicability of a PAMONO-sensor for selective detection of microvesicles in aquatic samples. The sensor permits comparison of relative concentrations of microvesicles between samples. We also study a possibility of repeated use of a sensor chip after elution of the microvesicle capturing layer. Moreover, we improve the detection features of the PAMONO-sensor. The detection process utilizes novel machine learning techniques on the sensor image data to estimate particle size distributions of nano-particles in polydisperse samples. Altogether, our findings expand analytical features and the application field of the PAMONO-sensor. They can also serve for a maturation of diagnostic tools based on the PAMONO-sensor platform.


Learning over high dimensional data streams

High dimensional data streams are collected in many scientific projects, humanities research, business processes, social media and the Web.
The challenges of data stream mining are aggravated in high dimensional data, since we have to decide, with a single look at the data, which dimensions are relevant for the data mining models.
In this talk we will discuss learning over high dimensional (i) numerical and (ii) textual streams. Although both cases refer to high dimensional data streams, in (i) the feature space is fixed, that is, all dimensions are present at each timepoint, whereas in (ii) the feature space is also evolving as new words show up and old words fall out of use.



Eirini Ntoutsi has been an Associate Professor of Intelligent Systems at the Faculty of Electrical Engineering and Computer Science, Leibniz University Hannover, since March 2016. Her research lies in the areas of Data Mining, Machine Learning and Data Science and can be summarized as learning over complex data and data streams.

Prior to joining LUH, she was a postdoctoral researcher at the Ludwig-Maximilians-University (LMU) in Munich, Germany under the supervision of Prof. Hans-Peter Kriegel. She joined LMU in 2010 with an Alexander von Humboldt Foundation fellowship.

She received her PhD in data mining from the University of Piraeus, Greece under the supervision of Prof. Yannis Theodoridis.

Scalable Algorithms for Extreme Multi-class and Multi-label Classification

In the era of big data, large-scale classification involving tens of thousands of target categories is not uncommon. This setting is also referred to as Extreme Classification. It has recently been shown that the machine learning challenges arising in recommendation systems and web advertising can be effectively addressed by reducing them to extreme multi-label classification. In this talk, I will discuss two of my recent works, which have been accepted at SDM 2016 and WSDM 2017, and present the TerseSVM and DiSMEC algorithms for extreme multi-class and multi-label classification. The training process for these algorithms makes use of OpenMP-based distributed architectures, thereby using thousands of cores for computation, and trains models in a few hours that would otherwise take several weeks. The precision@k and nDCG@k results using DiSMEC improve by up to 10% on benchmark datasets over state-of-the-art methods such as SLEEC and FastXML, which are used by Microsoft in Bing Search. Furthermore, the model size is up to three orders of magnitude smaller than that obtained by off-the-shelf solvers.
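The two ranking metrics named in the abstract, precision@k and nDCG@k, can be illustrated with a minimal sketch. It assumes binary relevance (a label is either relevant or not) and the common log2 discount; the function names and the toy labels are hypothetical and are not taken from the TerseSVM or DiSMEC code.

```python
import math


def precision_at_k(ranked_labels, relevant, k):
    """Fraction of the top-k predicted labels that are truly relevant."""
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label in relevant) / k


def ndcg_at_k(ranked_labels, relevant, k):
    """Normalized discounted cumulative gain at k with binary relevance."""
    # Gain of a relevant label at 0-based position `rank` is 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, label in enumerate(ranked_labels[:k])
        if label in relevant
    )
    # Ideal ranking: all relevant labels (at most k of them) come first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy example: labels sorted by predicted score, best first.
ranked = ["sports", "politics", "football", "weather", "finance"]
truth = {"sports", "football"}

if __name__ == "__main__":
    print(precision_at_k(ranked, truth, 3))
    print(ndcg_at_k(ranked, truth, 3))
```

Both metrics only look at the k highest-scored labels, which is why they suit extreme multi-label settings where only a handful of predictions per instance are ever shown.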

Rohit Babbar has been a post-doc in the Empirical Inference group at the Max-Planck Institute Tuebingen since October 2014. His work has primarily focused on large-scale machine learning and Big Data problems. His research interests also include optimization and deep learning. Before that, he completed his PhD at the University of Grenoble in 2014.

Alexander Schramm

The 2016 Fritz-Lampert-Award of the TRANSAID foundation for children with cancer has been awarded to Alexander Schramm (C1), head of the pediatric oncology research lab at the University Clinic Essen. The German-Russian research award recognises excellent researchers and their work in fundamental and clinical research in the field of pediatric hematology and oncology. The award was presented at the semi-annual meeting of the Gesellschaft für Pädiatrische Onkologie und Hämatologie (GPOH) in Frankfurt on 8 November.
The award recognises his work in the publication "Mutational dynamics between primary and relapse neuroblastomas", published together with national and international researchers in the journal Nature Genetics. Besides Prof. Dr. Schramm, the two other C1 project leaders, Prof. Dr. Sven Rahmann and Dr. Sangkyun Lee, also contributed to the publication.
A major concern of doctors is the recurrence of tumors, which often leads to worse treatment results. Novel data analysis techniques can focus on differences between the genetic profiles of primary (at diagnosis) and recurrent neuroblastoma cancer cells. The genetic patterns found provide a chance for upcoming, target-specific therapies.

A year with successful international exchanges is nearing its end.

This year, six of our SFB researchers were (or will be in the near future) between news, space and science. Amongst others, they were at Google, NASA, Stanford and the Wirtschaftswoche. While it was certainly not a walk in the park, it was definitely an experience and a great success.

Following the topical seminar visit by Luca Benini, Mojtaba Masoudinejad (A4) was able to visit his lab at ETH Zurich, complementing the SFB research on energy-efficient systems and energy harvesting. Already around the turn of the last year, Nils Kriege (A6) visited the universities of York and Nottingham, covering graph mining topics. At the beginning of 2017, Kai Brügge (C3) will stay at the French Alternative Energies and Atomic Energy Commission (CEA) to port the concepts and algorithms of the project to the upcoming Cherenkov Telescope Array (CTA).

Elena Erdmann (A6) received a Google News Lab Fellowship and worked for two months at the Wirtschaftswoche. She has developed both journalistic know-how and technical skills to drive innovation in digital and data journalism. Nico Piatkowski (A1) visited Stefano Ermon at Stanford University. Together they worked on techniques for scalable and exact inference in graphical models. He also made a detour to NASA, Netflix and Google. Last but not least, Martin Mladenov (A6/B4) got an internship at Google. Some people say this is more difficult than getting admitted to Stanford or Harvard. Who knows? But this year they accepted about 2% of applicants (1,600 people). What did he work on? We do not know, but he visited Craig Boutilier, so very likely something related to making decisions under uncertainty.

IEEE Outstanding Paper Award

In July, Jian-Jia Chen already received the ECRTS Outstanding Paper Award 2016 for the publication "Partitioned Multiprocessor Fixed-Priority Scheduling of Sporadic Real-Time Tasks".

Now the next award, this time from the IEEE RTSS symposium, has been given to Wen-Hung Huang, Maolin Yang and Jian-Jia Chen for the publication "Resource-Oriented Partitioned Scheduling in Multiprocessor Systems: How to Partition and How to Share?"


When concurrent real-time tasks have to access shared resources, to prevent race conditions, the synchronization and resource access must ensure mutual exclusion, e.g., by using semaphores. That is, no two concurrent accesses to one shared resource are in their critical sections at the same time. For uniprocessor systems, the priority ceiling protocol (PCP) has been widely accepted and supported in real-time operating systems. However, it is still arguable whether there exists a preferable approach for resource sharing in multiprocessor systems. In this paper, we show that the proposed resource-oriented partitioned scheduling using PCP combined with a reasonable allocation algorithm can achieve a non-trivial speedup factor guarantee. Specifically, we prove that our task mapping and resource allocation algorithm has a speedup factor of 11 - 6/(m + 1) on a platform comprising m processors, where a task may request at most one shared resource and the number of requests on any resource by any single job is at most one. Our empirical investigations show that the proposed algorithm is highly effective in terms of task sets deemed schedulable.
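To get a feeling for the speedup factor 11 - 6/(m + 1) stated in the abstract, the following small sketch evaluates it for a few processor counts. Only the formula comes from the abstract; the function name and the chosen values of m are illustrative assumptions.

```python
from fractions import Fraction


def speedup_factor(m):
    """Evaluate the bound 11 - 6/(m + 1) exactly for m >= 1 processors."""
    if m < 1:
        raise ValueError("the platform needs at least one processor")
    return Fraction(11) - Fraction(6, m + 1)


if __name__ == "__main__":
    # Hypothetical processor counts, chosen only for illustration.
    for m in (1, 2, 5, 16, 64):
        print(f"m = {m:3d}: speedup factor = {float(speedup_factor(m)):.3f}")
```

The bound grows with m but always stays below 11, so the guarantee does not degrade without limit as processors are added.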


Opportunities and Challenges in Global Network Cameras

Millions of network cameras have been deployed. Many of these cameras provide publicly available data, continuously streaming live views of national parks, city halls, streets, highways, and shopping malls. A person may see multiple tourist attractions through these cameras without leaving home. Researchers may observe the weather in different cities. Using the data, it is possible to observe natural disasters at a safe distance. News reporters may obtain instant views of an unfolding event. A spectator may watch a celebration parade from multiple locations using street cameras. Despite the many promising applications, the opportunities of using global network cameras for creating multimedia content have not been fully exploited. The opportunities also bring forth many challenges. Managing the large amount of data would require fundamentally new thinking. The data from network cameras are unstructured and have few metadata describing the content. Searching for relevant content would be a challenge. Because network cameras continuously produce data, processing must be able to handle the streaming data. This imposes stringent requirements on performance. In this presentation, I will share the experience of building a software system that aims to explore the opportunities of using the data from global network cameras. This cloud-based system is designed for studying worldwide phenomena using network cameras. It provides an event-based API (application programming interface) and is open to researchers to analyze the data for their studies. The cloud computing engine can scale in response to the needs of analysis programs.


Yung-Hsiang Lu is an associate professor in the School of Electrical and Computer Engineering and (by courtesy) the Department of Computer Science of Purdue University. He is an ACM distinguished scientist and ACM distinguished speaker. He is a member of the organizing committee of the IEEE Rebooting Computing Initiative. He is the lead organizer of the Low-Power Image Recognition Challenge and the chair (2014-2016) of the Multimedia Communication Systems Interest Group in the IEEE Multimedia Communications Technical Committee. He obtained his Ph.D. from the Department of Electrical Engineering at Stanford University and his BSEE from National Taiwan University.

With more than 6,200 employees in research, teaching and administration and its unique profile, TU Dortmund University shapes prospects for the future: The cooperation between engineering and natural sciences as well as social and cultural studies promotes both technological innovations and progress in knowledge and methodology. And it is not only the more than 33,500 students who benefit from that.

The Faculty for Computer Science at TU Dortmund University, Germany, is looking for a

Research Assistant (m/f)

with a strong background in Machine Learning/Data Mining, to start at the next possible date and for the duration of up to three years.

Salary will be paid in accordance with the applicable collective agreement regulations, according to salary group E13 TV-L or, if applicable, according to the transitional regulations of the TVÜ-L. The position is a full-time appointment; it is in principle also suitable for part-time employment. The duration of the contract will be based on the targeted qualification (e.g. PhD).

The Department of Artificial Intelligence at Dortmund is a small team that is involved in international research on Machine Learning and Data Mining, and develops application-oriented theories as well as theoretically well-founded applications. We expect:
• A university master's degree in computer science
• Motivation to push research forward
• Interest in exchanging ideas within the team and with international researchers
• Excellent software development skills
• Ability to supervise and motivate students
• Outstanding performance resulting in publications

Responsibilities include teaching (four hours per week, e.g. tutoring, project groups, supervision of students) and support of research on machine learning. Participation in the collaborative research center SFB 876 is expected.

We offer:
• Participation in an inspiring, highly motivated team
• Support in developing the candidate's specific scientific strengths and qualification
• Opportunity to obtain a Ph.D.

TU Dortmund University aims at increasing the percentage of women in academic positions in the Department of Computer Science and strongly encourages women to apply.

Disabled candidates with equal qualifications will be given preference.


On the Smoothness of Paging Algorithms

We study the smoothness of paging algorithms. How much can the number of page faults increase due to a perturbation of the request sequence? We call a paging algorithm smooth if the maximal increase in page faults is proportional to the number of changes in the request sequence. We also introduce quantitative smoothness notions that measure the smoothness of an algorithm.

We derive lower and upper bounds on the smoothness of deterministic and randomized demand-paging and competitive algorithms. Among strongly-competitive deterministic algorithms LRU matches the lower bound, while FIFO matches the upper bound.

Well-known randomized algorithms like Partition, Equitable, or Mark are shown not to be smooth. We introduce two new randomized algorithms, called Smoothed-LRU and LRU-Random. Smoothed-LRU allows sacrificing competitiveness for smoothness, where the trade-off is controlled by a parameter. LRU-Random is at least as competitive as any deterministic algorithm while being smoother.

This is joint work with Alejandro Salinger.
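The question posed in the abstract, how much the number of page faults can grow when the request sequence is perturbed, can be explored with a small simulation. The sketch below is not taken from the paper: it implements textbook LRU and FIFO demand paging and compares fault counts on a sequence and on a copy with one changed request. The function names, the cache size and the request sequences are illustrative assumptions.

```python
from collections import OrderedDict, deque


def lru_faults(requests, cache_size):
    """Count page faults of LRU (evict the least recently used page)."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # refresh recency on a hit
        else:
            faults += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return faults


def fifo_faults(requests, cache_size):
    """Count page faults of FIFO (evict the page loaded longest ago)."""
    cache = set()
    load_order = deque()  # pages in the order they were loaded
    faults = 0
    for page in requests:
        if page not in cache:
            faults += 1
            if len(cache) >= cache_size:
                cache.discard(load_order.popleft())
            cache.add(page)
            load_order.append(page)
    return faults


def fault_increase(policy, original, perturbed, cache_size):
    """Additional faults caused by perturbing the request sequence."""
    return policy(perturbed, cache_size) - policy(original, cache_size)


# Hypothetical request sequences that differ in exactly one position.
original = [1, 2, 3, 1, 4, 1, 2]
perturbed = [1, 2, 3, 5, 4, 1, 2]

if __name__ == "__main__":
    for name, policy in (("LRU", lru_faults), ("FIFO", fifo_faults)):
        print(name, policy(original, 3), fault_increase(policy, original, perturbed, 3))
```

A single example like this says nothing about worst-case behaviour, which is what the smoothness bounds in the talk concern; it only makes the measured quantity concrete.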


Jan Reineke is an Assistant Professor of Computer Science at Saarland University. He tries to understand what makes systems predictable, and applies his insights in the design of resource-efficient, timing-predictable microarchitectures for real-time systems. Besides design, he is interested in analysis, usually by abstract interpretation, with applications in static timing analysis, quantification of side-channel vulnerabilities, and shape analysis.

Customized OS support for data processing on modern hardware

For decades, data processing systems have found the generic interfaces and policies offered by the operating systems at odds with the need for efficient utilization of hardware resources. As a result, most engines circumvent the OS and manage hardware resources directly. With the growing complexity and heterogeneity of modern machines, data processing engines are now facing a steep increase in the complexity they must absorb to achieve good performance.

In this talk we will focus on the challenge of running concurrent workloads in multi-programming execution environments, as systems' performance often suffers from resource interaction among multiple parallel jobs. In the light of recent advancements in operating system design, such as multi-kernels, we propose two key principles: the separation of compute and control planes on a multi-core machine, and customization of the compute plane as a lightweight OS kernel tailored for data processing. I will present some of our design decisions, and how they help to improve the performance of workloads consisting of common graph algorithms and relational operators.

Short Bio:

Jana Giceva is a final-year PhD student in the Systems Group at ETH Zurich, supervised by Gustavo Alonso, and co-advised by Timothy Roscoe. Her research interests revolve around systems running on modern hardware, with an inclination towards engines for in-memory data processing and operating systems. During her PhD studies she has been exploring various cross-layer optimizations across the systems stack, touching aspects from both hardware/software and database/OS co-design. Some of these projects are part of an industry collaboration with Oracle Labs. She received the European Google PhD Fellowship 2014 in Operating Systems.

Runtime Reconfigurable Computing - from Embedded to HPC

Today, FPGAs are deployed in virtually any application domain, ranging from embedded systems all the way to HPC installations. While FPGAs are commonly used rather statically (basically as ASIC substitutes), this talk will focus on exploiting the reprogrammability of FPGAs to improve the performance, cost and energy efficiency of a system.

For embedded systems and future Internet of Things systems, it will be demonstrated how tiny FPGA fabrics can replace hardened functional blocks in, for example, an ARM A9 processor. Furthermore, a database acceleration system will be presented that uses runtime reconfiguration of FPGAs to compose query-optimized dataflow processing engines. Finally, the talk will introduce the ECOSCALE project that aims at using FPGAs for exascale computing.


Dirk Koch is a lecturer in the Advanced Processor Technologies Group at the University of Manchester. His main research interests are runtime reconfigurable systems based on FPGAs, embedded systems, computer architecture and VLSI. Dirk Koch led a research project at the University of Oslo, Norway, which aimed to make partial reconfiguration of FPGAs more accessible. Current research projects include database acceleration using FPGAs based on stream processing as well as reconfigurable instruction set extensions for CPUs.

Dirk Koch was a program co-chair of the FPL2012 conference and he is a program committee member of several further conferences including FCCM, FPT, DATE, ISCAS, HEART, SPL, RAW, and ReConFig. He is author of the book "Partial Reconfiguration on FPGAs" and co-editor of "FPGAs For Software Programmers". Dirk holds two patents, and he has (co-)authored 80 conference and journal publications.

Festschrift Solving Large Scale Learning Tasks

In celebration of Prof. Dr. Morik's 60th birthday, the Festschrift ''Solving Large Scale Learning Tasks'' covers research areas and researchers Katharina Morik worked with. This Festschrift has now been published in the Springer series Lecture Notes in Artificial Intelligence.

The official presentation of the Festschrift will take place on 20 October in auditorium E23, Otto-Hahn-Str. 14, starting at 16:15.

Articles in this Festschrift volume provide challenges and solutions from theoreticians and practitioners on data preprocessing, modeling, learning and evaluation. Topics include data mining and machine learning algorithms, feature selection and creation, optimization as well as efficiency of energy and communication. Talks for the presentation of the Festschrift are: Bart Goethals: k-Morik: Mining Patterns to Classify Cartified Images of Katharina, Arno Siebes: Sharing Data with Guaranteed Privacy, Nico Piatkowski: Compressible Reparametrization of Time-Variant Linear Dynamical Systems and Marco Stolpe: Distributed Support Vector Machines: An Overview.



Bart Goethals: k-Morik: Mining Patterns to Classify Cartified Images of Katharina

When building a traditional Bag of Visual Words (BOW) for image classification, the k-Means algorithm is usually used on a large set of high dimensional local descriptors to build a visual dictionary. However, it is very likely that, to find a good visual vocabulary, only a sub-part of the descriptor space of each visual word is truly relevant for a given classification problem. In this paper, we explore a novel framework for creating a visual dictionary based on Cartification and Pattern Mining instead of the traditional k-Means algorithm. Preliminary experimental results on face images show that our method is able to successfully differentiate photos of Elisa Fromont and Bart Goethals from Katharina Morik.

Arno Siebes: Sharing Data with Guaranteed Privacy

Big Data is both a curse and a blessing. A blessing because the unprecedented amount of detailed data allows for research in, e.g., social sciences and health on scales that were until recently unimaginable. A curse, e.g., because of the risk that such (often very private) data leaks out through hacks or by other means, causing almost unlimited harm to the individual.
To neutralize the risks while maintaining the benefits, we should be able to randomize the data in such a way that the data at the individual level is random, while statistical models induced from the randomized data are indistinguishable from the same models induced from the original data.
In this paper we first analyse the risks in sharing micro data (as statisticians tend to call it) even if it is anonymized, discretized, grouped, and perturbed. Next we quasi-formalize the kind of randomization we are after and argue why it is safe to share such data. Unfortunately, it is not clear that such randomizations of data sets exist. We briefly discuss why, if they exist at all, they will be hard to find. Next I explain why I think they do exist and can be constructed, by showing that the code tables computed by, e.g., Krimp are already close to what we would like to achieve, thus making privacy-safe sharing of micro data possible.

Nico Piatkowski: Compressible Reparametrization of Time-Variant Linear Dynamical Systems

Linear dynamical systems (LDS) are applied to model data from various domains (including physics, smart cities, medicine, biology, chemistry and social science) as a stochastic dynamic process. Whenever the model dynamics are allowed to change over time, the number of parameters can easily exceed millions. Hence, an estimation of such time-variant dynamics on a training sample that is small compared to the number of variables typically results in dense, overfitted models.

Existing regularization techniques are not able to exploit the temporal structure in the model parameters. We investigate a combined reparametrization and regularization approach which is designed to detect redundancies in the dynamics in order to leverage a new level of sparsity. On the basis of ordinary linear dynamical systems, the new model, called ST-LDS, is derived and a proximal parameter optimization procedure is presented. Differences to l1-regularization-based approaches are discussed and an evaluation on synthetic data is conducted. The results show that the larger the considered system, the more sparsity can be achieved, compared to plain l1-regularization.
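
For readers unfamiliar with proximal optimization, the following minimal Python sketch shows soft-thresholding, the proximal operator of the l1 norm that underlies plain l1-regularization. It is not the ST-LDS procedure from the paper; the function names and the single proximal gradient step are illustrative assumptions.

```python
def soft_threshold(x, threshold):
    """Proximal operator of threshold * |x|: shrink x toward zero."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def proximal_gradient_step(params, grads, step_size, l1_weight):
    """One proximal gradient step: a gradient step followed by soft-thresholding."""
    return [
        soft_threshold(p - step_size * g, step_size * l1_weight)
        for p, g in zip(params, grads)
    ]
```

Parameters whose magnitude falls below the threshold are set exactly to zero, which is how this kind of regularization produces sparse models.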

Marco Stolpe: Distributed Support Vector Machines: An Overview

Support Vector Machines (SVM) have a strong theoretical foundation and a wide variety of applications. However, the underlying optimization problems can be highly demanding in terms of runtime and memory consumption. With ever increasing usage of mobile and embedded systems, energy becomes another limiting factor. Distributed versions of the SVM solve at least parts of the original problem on different networked nodes. Methods trying to reduce the overall running time and memory consumption usually run in high performance compute clusters, assuming high bandwidth connections and an unlimited amount of available energy. In contrast, pervasive systems consisting of battery-powered devices, like wireless sensor networks, usually require algorithms whose main focus is on the preservation of energy. This work elaborates on this distinction and gives an overview of various existing distributed SVM approaches developed in both kinds of scenarios.

Group picture participants RAPP workshop

In fall 2015, the Ruhr Astroparticle and Plasma Physics Center (RAPP center) was founded in order to combine research efforts within the fields of plasma- and particle-astrophysics in the Ruhr area. The three universities Ruhr-Universität Bochum, Technische Universität Dortmund and Universität Duisburg-Essen are located within a radius of 20 kilometers, enabling close collaboration between the universities.

The founding PIs include Prof. Wolfgang Rhode and Prof. Bernhard Spaan, who are also among the project leaders of the SFB projects C3 and C5, respectively. During the Inauguration Workshop, Katharina Morik gave an invited talk on the research impact of Data Mining for astroparticle physics.

In the RAPP center, about 80 researchers, from master’s level up to staff members, join forces to investigate fundamental physics questions and to break new ground by combining knowledge from the fields of plasma-, particle- and astrophysics.


Participants of the SPP 1736-Workshops

From 26th to 28th of September the annual meeting of the DFG-SPP 1736: Algorithms for BIG DATA will be held in Dortmund. SPP members of the TU Dortmund are Johannes Fischer, Oliver Koch and Petra Mutzel. The SFB 876 participates via invited talks of Katharina Morik and Sangkyun Lee.

Focus of the SPP:

Computer systems pervade all parts of human activity and acquire, process, and exchange data at a rapidly increasing pace. As a consequence, we live in a Big Data world where information is accumulating at an exponential rate and often the real problem has shifted from collecting enough data to dealing with its impetuous growth and abundance. In fact, we often face poor scale-up behavior from algorithms that have been designed based on models of computation that are no longer realistic for big data.

While it is getting more and more difficult to build faster processors, the hardware industry keeps on increasing the number of processors/cores per board or graphics card, and also invests in improved storage technologies. However, all these investments are in vain if we lack algorithmic methods that are able to efficiently utilize additional processors or memory features.


In domain adaptation, the goal is to find common ground between two, potentially differently distributed, data sets. By finding common concepts present in two sets of words pertaining to different domains, one could leverage the performance of a classifier for one domain for use on the other domain. We propose a solution to the domain adaptation task, by efficiently solving an optimization problem through Stochastic Gradient Descent. We provide update rules that allow us to run Stochastic Gradient Descent directly on a matrix manifold: the steps compel the solution to stay on the Stiefel manifold. This manifold encompasses projection matrices of word vectors onto low-dimensional latent feature representations, which allows us to interpret the results: the rotation magnitude of the word vector projection for a given word corresponds to the importance of that word towards making the adaptation. Beyond this interpretability benefit, experiments show that the Stiefel manifold method performs better than state-of-the-art methods.

Published at the European Conference for Machine Learning ECML 2016 by Christian Poelitz, Wouter Duivesteijn, Katharina Morik


The Cherenkov Telescope Array (CTA) is the next generation ground-based gamma-ray observatory, aimed at improving the sensitivity of current-generation experiments by an order of magnitude and providing coverage over four decades of energy. The current design consists of two arrays, one in each hemisphere, composed of tens of imaging atmospheric Cherenkov telescopes of different sizes. I will present the current status of the project, focusing on the analysis and simulation work carried out to ensure the best achievable performance, as well as how to use muons for the array calibration.

I received my PhD in Italy working on simulation and analysis for a space-based gamma-ray instrument. As an IFAE postdoc, I am currently working in both MAGIC and CTA, but still dedicating part of my time to gamma-ray satellites. For CTA, I'm part of the Monte Carlo working group, analyzing the simulations of different possible array layouts, and muon simulations for the calibration of the Large Size Telescope (LST).

September  7,  2016

There is a vacancy for a project assistant


You will support the speaker of CRC 876, Prof. Katharina Morik, and the executive office. Your tasks include helping to organize colloquia, meetings, workshops, summer schools and public appearances, as well as monitoring the funding and staff contracts of all subprojects.


  • A degree in journalism, business informatics, STEM or a comparable qualification
  • Good German and English language skills, spoken and written

Ideally you already have

  • solid knowledge of common software (Word, Excel, Outlook, PowerPoint)
  • practical experience in the usage of SAP-applications (SRM/NetWeaver)
  • communication skills
  • a team- and service-oriented attitude
  • good organizational skills and an independent, efficient working style

We offer:

  • Interesting and varying tasks
  • The opportunity for personal development through supporting further education
  • Collaboration in a modern and interconnected collegial team at a family-friendly university

For further information (in German), you can follow the link


Toward zero-power sensing: a transient computing approach

Current and future IoT applications envision huge numbers of smart sensors in deploy-and-forget scenarios. We still design these smart sensing systems based on the assumption of significant energy storage availability, and work on low-power electronics to minimize the depletion rate of the stored energy. In this talk I will take a different perspective: I will look into designing smart sensing systems for operating exclusively from sporadically available environmental energy (zero-power sensing) and extremely limited energy storage. These "unusual" constraints open interesting new opportunities for innovation. I will give several examples of practical "transient computing" systems and I will outline future research and application challenges in this field.


Luca Benini is Full Professor at the University of Bologna. He also holds the chair of digital circuits and systems at ETHZ. He received a Ph.D. degree in electrical engineering from Stanford University in 1997.

Dr. Benini's research interests are in energy-efficient system design and Multi-Core SoC design. He is also active in the area of energy-efficient smart sensors and sensor networks for biomedical and ambient intelligence applications.

He has published more than 700 papers in peer-reviewed international journals and conferences, four books and several book chapters (h-index 86). He has been general chair and program chair of the Design Automation and Test in Europe Conference, the International Symposium on Low Power Electronics and Design, and the Network on Chip Symposium. He is Associate Editor of the IEEE Transactions on Computer-Aided Design of Circuits and Systems and the ACM Transactions on Embedded Computing Systems. He is a Fellow of the IEEE and a member of the Academia Europaea.

Analysing Big Data typically involves developing for or comparing to Hadoop. For researching new algorithms, a personal Hadoop cluster, running independently of other software or other Hadoop clusters, should provide a sealed environment for testing and benchmarking. Easy setup, resizing and stopping enables rapid prototyping on a containerized playground.

DockHa is a project developed at the Artificial Intelligence Group, TU Dortmund University, that aims to simplify and automate the setup of independent Hadoop clusters in the SFB 876 Docker Swarm cluster. The Hadoop properties and setup parameters can be modified to suit the application. More information can be found in the software section (DockHa) and the Bitbucket repository (DockHa-Repository).


As part of the work for project B3, the survey on Opportunities and Challenges for Distributed Data Analysis by Marco Stolpe has now been published at ACM SIGKDD.

This survey motivates how the real-time analysis of data, embedded into the Internet of Things (IoT), enables entirely new kinds of sustainable applications in sectors such as manufacturing, transportation and distribution, energy and utilities, the public sector as well as in healthcare. It presents and discusses the challenges of real-time constraints for state-of-the-art analysis methods. Current research strongly focuses on cloud-based big data analysis. Our survey provides a more balanced view, also taking into account highly communication-constrained scenarios which require research on decentralized analysis algorithms. These must analyse data directly on sensors and small devices. The survey also discusses the vertical partitioning of data common in the IoT, which is particularly challenging since information about observations is assessed at different networked nodes. The paper includes a comprehensive bibliography that should provide readers with a good starting point for their own work.


The publication can now be found online. It compiles profiles of the most important players involved in Big Data in Germany:

60 technology providers (p. 55 ff.: LS11 and LS8 of TU Dortmund Computer Science), 40 cooperation partners and 30 research centers and institutions, including SFB 876 on p. 47.


Applications of Machine Learning: From Brain-Machine-Interfaces to Autonomous Driving

Machine learning methods have become established in many areas and, in some specialized fields, produce better results than humans. Current developments such as Deep Learning strengthen this trend. This presentation gives a short introduction to the state of the art in Machine Learning and presents several examples on which the Department of Computer Engineering, University of Tübingen, focuses in particular. Using Brain-Machine-Interfaces, good results have been achieved in the rehabilitation of stroke patients. Another example is a Brain-Machine-Interface being tested for adaptive learning systems. Further challenges exist in the area of autonomous driving, where, for example, recognizing the state of the driver plays a key role in checking whether he or she is able to take back control of the car in the relevant situations.


Professor Dr. Wolfgang Rosenstiel studied Informatics at the University of Karlsruhe, where he received his Diploma in 1980 and his Ph.D. in 1984. From 1986 to 1990 he led the department "Automatization of Integrated Circuit Design" at the Research Center for Information Technology Karlsruhe (FZI Karlsruhe). Since 1990 he has been Professor (Chair for Computer Engineering) at the Wilhelm-Schickard-Institute for Informatics at the University of Tübingen. Since 1st October 2010 he has been Dean of the Faculty of Science. He was a committee member of the DFG senate for Collaborative Research Centers. He is editor-in-chief of the Springer journal "Design Automation for Embedded Systems". He is active in numerous program and executive committees. He is a member of GI, IEEE, IFIP 10.5 as well as the ITRS-Design-Committee. In 2009 he received an ERC advanced research grant, and he has been a DATE Fellow since 2008. In 2007 he received a Shared University Research Grant from IBM.

Deep Learning for Big Graph Data

Big data can often be represented as graphs. Examples include chemical compounds, communication and traffic networks, and knowledge graphs. Most existing machine learning methods such as graph kernels do not scale and require ad-hoc feature engineering. Inspired by the success of deep learning in the image and speech domains, we have developed neural representation learning approaches for graph data. We will present two approaches to graph representation learning. First, we present Patchy-SAN, a framework for learning convolutional neural networks (CNNs) for graphs. Similar to CNNs for images, the method efficiently constructs locally connected neighborhoods from the input graphs. These neighborhoods serve as the receptive fields of a convolutional architecture, allowing the framework to learn effective graph representations. Second, we will discuss a novel approach to learning knowledge base representations. Both frameworks learn representations of small and locally connected regions of the input graphs, generalize these to representations of more and more global regions, and finally embed the input graphs in a low-dimensional vector space. The resulting embeddings are successfully used in several classification and prediction tasks.

Mathias Niepert is a senior researcher at NEC Labs Europe in Heidelberg. From 2012-2015 he was a research associate at the University of Washington, Seattle, and from 2009-2012 also a member of the Data and Web Science Research Group at the University of Mannheim. Mathias was fortunate enough to win awards at international conferences such as UAI, IJCNLP, and ESWC. He was the principal investigator of a Google faculty and a bilateral DFG-NEH research award. His research interests include tractable machine learning, probabilistic graphical models, statistical relational learning, digital libraries and, more broadly, the large-scale extraction, integration, and analysis of structured data.

Slides from Topical Seminar

The slides from Mathias Niepert's talk can be found here.


Predictable Real-Time Computing in GPU-enabled Systems

Graphics processing units (GPUs) have seen widespread use in several computing domains as they have the power to enable orders of magnitude faster and more energy-efficient execution of many applications. Unfortunately, it is not straightforward to reliably adopt GPUs in many safety-critical embedded systems that require predictable real-time correctness, one of the most important tenets in the certification required for such systems. A key example is the advanced automotive system, where timeliness of computations is an essential requirement of correctness due to the interaction with the physical world. In this talk, I will describe several system-level and algorithmic challenges in ensuring predictable real-time correctness in GPU-enabled systems, as well as our recent research results on using suspension-based approaches to resolve some of the issues.


Cong Liu is currently a tenure-track assistant professor in the Department of Computer Science at the University of Texas at Dallas, after obtaining his Ph.D. in Computer Science from the University of North Carolina at Chapel Hill in summer 2013. His current research focuses on Real-Time and Embedded Systems, Battery-Powered Cyber-Physical Systems, and Mobile and Cloud Computing. He is the author or co-author of over 50 papers in premier journals and conferences such as RTSS, ICCPS, ECRTS, RTAS, EMSOFT, ICNP, and INFOCOM. He received the Best Student Paper Award at the 30th IEEE Real-Time Systems Symposium, the premier real-time and embedded systems conference; he also received the Best Paper Award at the 17th RTCSA.

The Fundamental Theorem of Perfect Simulation

Perfect simulation algorithms give a method for sampling exactly from high dimensional distributions. With applications both in Bayesian and Frequentist Statistics, Computer Science approximation algorithms, and statistical physics, several protocols for creating such algorithms exist. In this talk I will explore the basic principle of probabilistic recursion that underlies these different algorithms, and show how the Fundamental Theorem of Perfect Simulation can be used as a tool for building more complex methods.
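
As a small illustration of probabilistic recursion, one of the simplest perfect simulation schemes, acceptance-rejection, can be written as a recursive procedure. The Python sketch below draws a point exactly uniformly from the unit disk; it is an assumed textbook-style example rather than material from the talk, and the function name is hypothetical.

```python
import random


def sample_unit_disk(rng):
    """Draw a point exactly uniform on the unit disk by recursive acceptance-rejection."""
    # Propose a point uniformly from the enclosing square [-1, 1] x [-1, 1].
    x = rng.uniform(-1.0, 1.0)
    y = rng.uniform(-1.0, 1.0)
    if x * x + y * y <= 1.0:
        return (x, y)  # accepted: the output is exactly uniform on the disk
    return sample_unit_disk(rng)  # rejected: recurse with fresh randomness
```

The output distribution is exact rather than approximate, since a rejected proposal simply restarts the same procedure; each call accepts with probability pi/4, so the recursion is shallow in practice.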

Academic Bio

Mark Huber received his Ph.D. in Operations Research from Cornell University working in the area of perfect simulation. After completing a two-year postdoc with Persi Diaconis at Stanford, he began a stint at Duke, where he received an NSF Early Career Award. Huber then moved to the Department of Mathematical Sciences at Claremont McKenna College, where he is the Fletcher Jones Foundation Associate Professor of Mathematics and Statistics, and Robert S. Day Fellow. Currently he is also the chair of the department.

Reprocessing and analysis of high-throughput data to identify novel therapeutic non-coding targets in cancer

Genome-wide studies have shown that our genome is pervasively transcribed, producing a complex pool of coding and non-coding transcripts that shape the cancer transcriptome. Long non-coding RNAs or lncRNAs dominate the non-coding transcriptome and are emerging as key regulatory factors in human disease and development. Through re-analysis of RNA-sequencing data from 10000 cancer patients across 33 cancer types (The Cancer Genome Atlas), we define a PAN-cancer lncRNA landscape, revealing insights in cancer-specific lncRNAs with therapeutic and diagnostic potential.

Journalism students from TU Dortmund University spoke with various experts on the topic of "Big Data" in the media talk series "Think Big". Prof. Kristian Kersting (projects A6 and B4), Prof. Christian Sohler (projects A2, A6 and C4), Prof. Katharina Morik (projects A1, B3 and C3) and Prof. Michael ten Hompel (project A4) were guests of the students. They discussed large data collections, their analysis, forecasts based on them and more. Questions included how data mining influences our lives, which conclusions can be drawn from our social networks on Facebook, and how data mining influences medicine. They also talked about the risks that come with data mining. Another topic was Industry 4.0: warehousing, for example, could be automated by sensors and data mining, and in the long term there could be self-organizing systems. The format was created under the direction of journalism professor Michael Steinbrecher, whose research also deals with the topic "Big Data".

Broadcast with Prof. Kristian Kersting

Broadcast with Prof. Christian Sohler

Broadcast with Prof. Katharina Morik

Broadcast with Prof. Michael ten Hompel


Katharina Morik at the bestowal of the certificate

Katharina Morik, speaker of the collaborative research center SFB 876, has been appointed as a new member of the North Rhine-Westphalian Academy of Sciences, Humanities and the Arts for the group Engineering and Economic Science. The academy puts its focus on fundamental research. It provides a platform for discussion via regular public events and bridges the gap between research, government and industry. The certificate of appointment will be granted at the yearly academy ceremony on 11th of March 2016.

With the appointment of Katharina Morik, the academy acknowledges her outstanding research profile, her achievements as speaker of the research center SFB 876 and her international reputation and research in machine learning.

When Bits meet Joules: A view from data center operations' perspective

The past decade has witnessed the rapid advancement and great success of information technologies. At the same time, new energy technologies including the smart grid and renewables have gained significant momentum. Now we are in a unique position to enable the two technologies to work together and spark new innovations.

In this talk, we will use data centers as an example to illustrate the importance of the co-design of information technologies and new energy technologies. Specifically, we will focus on how to design cost-saving power management strategies for Internet data center operations. We will conclude the discussion with future work and directions.


Xue (Steve) Liu is a William Dawson Scholar and an Associate Professor in the School of Computer Science at McGill University. He received his Ph.D. in Computer Science (with multiple distinctions) from the University of Illinois at Urbana-Champaign. He has also worked as the Samuel R. Thompson Chaired Associate Professor at the University of Nebraska-Lincoln and at HP Labs in Palo Alto, California. His research interests are in computing systems and communication networks, cyber-physical systems, and smart energy technologies. His research has appeared in top venues including Mobicom, S&P (Oakland), Infocom, ACM Multimedia, ICNP, RTSS, RTAS, ICCPS, KDD, ICDE, etc., and has received several best paper awards.

Dr. Liu's research has been reported by news media including the New York Times, IDG/Computer World, The Register, Business Insider, Huffington Post, CBC, NewScientist, MIT Technology Review's Blog etc. He is a recipient of the Outstanding Young Canadian Computer Science Researcher Prizes from the Canadian Association of Computer Science, and a recipient of the Tomlinson Scientist Award from McGill University.

He has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Transactions on Vehicular Technology (TVT), and IEEE Communications Surveys and Tutorials (COMST).

Analysis and Optimization of Approximate Programs

Many modern applications (such as multimedia processing, machine learning, and big-data analytics) exhibit an inherent tradeoff between the accuracy of the results they produce and the execution time or energy consumption. These applications allow us to investigate new optimization approaches that exploit approximation opportunities at every level of the computing stack and therefore have the potential to provide savings beyond the reach of standard semantics-preserving program optimizations.

In this talk, I will describe a novel approximate optimization framework based on accuracy-aware program transformations. These transformations trade accuracy in return for improved performance, energy efficiency, and/or resilience. The optimization framework includes program analyses that characterize the accuracy of transformed programs and search techniques that navigate the tradeoff space induced by transformations to find approximate programs with profitable tradeoffs. I will particularly focus on how we (i) automatically generate computations that execute on approximate hardware platforms, while ensuring that they satisfy the developer's accuracy specifications and (ii) apply probabilistic reasoning to quantify uncertainty coming from inputs or caused by program transformations, and analyze the accuracy of approximate computations.
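
To make the idea of an accuracy-aware transformation concrete, the Python sketch below uses loop perforation, a well-known transformation that skips a fraction of loop iterations. It is an illustrative assumption and not the framework presented in the talk; the function names and the relative-error metric are hypothetical.

```python
def exact_mean(values):
    """Reference computation: visit every element."""
    return sum(values) / len(values)


def perforated_mean(values, stride):
    """Approximate computation: visit only every `stride`-th element."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


def relative_error(exact, approximate):
    """Accuracy metric used to judge whether the tradeoff is acceptable."""
    return abs(exact - approximate) / abs(exact)


data = [float(i) for i in range(100)]
exact = exact_mean(data)
approx = perforated_mean(data, stride=4)
error = relative_error(exact, approx)
```

Here the perforated version touches a quarter of the elements; whether the resulting error is acceptable is exactly the kind of question an accuracy specification would answer.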


Sasa Misailovic graduated with a Ph.D. from MIT in 2015. He will start as an Assistant Professor in the Computer Science Department at the University of Illinois at Urbana-Champaign in Fall 2016. During this academic year he is visiting the Software Reliability Lab at ETH Zurich. His research interests include programming languages, software engineering, and computer systems, with an emphasis on improving performance, energy efficiency, and resilience in the face of software errors and approximation opportunities.

Discovering Compositions

The goal of exploratory data analysis -- or, data mining -- is making sense of data. We develop theory and algorithms that help you understand your data better, with the lofty goal that this helps in formulating (better) hypotheses. In particular, our methods give detailed insight into how data is structured: characterising distributions in easily understandable terms, showing the most informative patterns, associations, correlations, etc.

My talk will consist of three parts. I will start by explaining what a pattern composition is. Simply put, databases often consist of parts, each best characterised by a different set of patterns. Young parents, for example, exhibit different buying behaviour than elderly couples. Both, however, buy bread and milk. A pattern composition jointly characterises the similarities and differences between such components of a database, without redundancy or noise, by including only patterns that are descriptive for the data, and assigning those patterns only to the relevant components of the data.

In the second part of my talk I will go into the more important question of how to discover the pattern composition of a database when all we have is just a single database that has not yet been split into parts. That is, we are after that partitioning of the data by which we can describe it most succinctly using a pattern composition.

In the third part I will make the connection to causal discovery, as in the end that is our real goal.

On March 7 the panel discussion on Big Data - Small Devices was held in New York. The video including presentations and discussion is now available online. The collaborative research center SFB 876 was represented by Katharina Morik (Resource-Aware Data Science), Wolfgang Rhode (Science for Science) and Kristian Kersting (Not so Fast: Driving into (Mobile) Traffic Jams), while Claudia Perlich (Dstillery), as a local presenter, gave her view on big data analysis. The discussion itself was moderated by Tina Eliassi-Rad (Northeastern University/Rutgers University). The event was organized by the New York German Center for Research and Innovation and co-sponsored by Deutsche Forschungsgemeinschaft (DFG) and University Alliance UA Ruhr.


Graphs, Ellipsoids, and Balls-into-Bins: A linear-time algorithm for constructing linear-sized spectral sparsification

Spectral sparsification is the procedure of approximating a graph by a sparse graph such that many properties of the original graph are preserved. Over the past decade, spectral sparsification has become a standard tool for speeding up algorithms for various combinatorial and learning problems.

In this talk I will present our recent work on constructing a linear-sized spectral sparsification in almost-linear time. In particular, I will discuss some interesting connections among graphs, ellipsoids, and balls-into-bins processes.

This is based on joint work with Yin Tat Lee (MIT). Part of the results appeared at FOCS'15.

On March 7th a panel discussion on topics of the SFB will be held at the German Embassy New York. The event is organized by the University Alliance UA Ruhr and the German Center for Research and Innovation. Speakers from the SFB are Katharina Morik, Wolfgang Rhode and Kristian Kersting. The group of presenters also includes Claudia Perlich, Dstillery New York, and the discussion is moderated by Tina Eliassi-Rad (Northeastern University, currently on leave from Rutgers University).


The amount of digitally recorded information in today’s world is growing exponentially. Massive volumes of user-generated information from smart phones and social media are fueling this Big Data revolution. As data flows throughout every sector of our global economy, questions emerge from commercial, government, and non-profit organizations interested in the vast possibilities of this information. What is Big Data? How does it create value? How can we as digital consumers and producers personally benefit? While Big Data has the potential to transform how we live and work, others see it as an intrusion of their privacy. Data protection concerns aside, the mere task of analyzing and visualizing large, complex, often unstructured data will pose great challenges to future data scientists. We invite you to join us for an exciting discussion on the technological developments and sociological implications of this Big Data revolution.


Kernel-based Machine Learning from Multiple Information Sources

In my talk I will introduce multiple kernel learning, a machine learning framework for integrating multiple types of representation into the learning process. Furthermore I will present an extension called multi-task multiple kernel learning, which can be used for effectively learning from multiple sources of information, even when the relations between the sources are completely unknown. The applicability of the methodology is illustrated by applications taken from the domains of visual object recognition and computational biology.


Since 2014, Marius Kloft has been a junior professor of machine learning at the Department of Computer Science of Humboldt University of Berlin, where he has also led the Emmy-Noether research group on statistical learning from dependent data since 2015. Prior to joining HU Berlin, he was a joint postdoctoral fellow at the Courant Institute of Mathematical Sciences and Memorial Sloan-Kettering Cancer Center, New York, working with Mehryar Mohri, Corinna Cortes, and Gunnar Rätsch. From 2007 to 2011, he was a PhD student in the machine learning program of TU Berlin, headed by Klaus-Robert Müller. He was co-advised by Gilles Blanchard and Peter L. Bartlett, whose learning theory group at UC Berkeley he visited from 10/2009 to 10/2010. In 2006, he received a diploma (MSc equivalent) in mathematics from the University of Marburg with a thesis in algebraic geometry.

Marius Kloft is interested in statistical machine learning methods for analysis of large amounts of data as well as applications, in particular, computational biology. Together with colleagues he has developed learning methods for integrating the information from multiple sensor types (multiple kernel learning) or multiple learning tasks (transfer learning), which have successfully been applied in various application domains, including network intrusion detection (REMIND system), visual image recognition (1st place at ImageCLEF Visual Object Recognition Challenge), computational personalized medicine (1st place at NCI-DREAM Drug Sensitivity Prediction Challenge), and computational genomics (most accurate gene start detector in international comparison of 19 leading models). For his research, Marius Kloft received the Google Most Influential Papers 2013 award.

Peter Marwedel in the German embassy

From the 19th to the 20th of January, the "U.S.-German Workshop on the Internet of Things (IoT)/Cyber-Physical-System (CPS)" took place in Washington. The purpose of the workshop was to prepare an intensified German-American collaboration in the subject area of the workshop. The workshop was organized by the American National Science Foundation (NSF), the Fraunhofer-Institute for Software in Kaiserslautern, Germany, and the CPS-VO. The CPS-VO organizes the CPS programs that are funded by the NSF. The workshop featured a line-up of high-ranking speakers. The first day was hosted at the German Embassy in Washington. On the second day, the workshop was conducted in Arlington, in close proximity to the National Science Foundation.

The workshop made it obvious that industry, research institutes and universities, both in the USA and in Germany, have high expectations of the potential that CPS and IoT systems hold. The participants saw complementary strong points on both sides of the Atlantic. While the USA has its strong suit in the subject area of the Internet, Germany is, from the American point of view, especially strong in the fields of security and confidentiality.

As one of three representatives of German universities, Prof. Peter Marwedel was invited to give a lecture. In his lecture he talked about the possibilities of CPS and IoT systems but also emphasized the necessity to consider efficient resource usage and resource constraints during implementation. This is especially the case in applications with big data volumes and complex algorithms, he said as he referred to the collaborative research center SFB 876. Because of technical difficulties, the lecture was recorded again in an uninterrupted version. The video is available on YouTube.

The workshop also produced some opportunities to add aspects of resource efficiency and big data volumes to future consideration as subject areas.


Giovanni de Micheli

Nano-Tera.ch: Electronic Technology for Health Management

Electronic-health or E-health is a broad area of engineering that leverages transducer, circuit and systems technologies for applications to health management and lifestyle. Scientific challenges relate to the acquisition of accurate medical information from various forms of sensing inside/outside the body and to the processing of this information to support or actuate medical decisions. E-health systems must satisfy safety, security and dependability criteria and their deployment is critical because of the low-power and low-noise requirements of components interacting with human bodies. E-health is motivated by the social and economic goals of achieving better health care at lower costs and will revolutionize medical practice in the years to come. The Nano-Tera.ch program fosters the use of advanced nano and info technologies for health and environment monitoring. Research issues in these domains within Nano-Tera.ch will be shown as well as practical applications that can make a difference in everyday life.


Giovanni De Micheli is Professor and Director of the Institute of Electrical Engineering and of the Integrated Systems Centre at EPF Lausanne, Switzerland. He is program leader of the Nano-Tera.ch program. Previously, he was Professor of Electrical Engineering at Stanford University. He holds a Nuclear Engineer degree (Politecnico di Milano, 1979), an M.S. and a Ph.D. degree in Electrical Engineering and Computer Science (University of California at Berkeley, 1980 and 1983).

Prof. De Micheli is a Fellow of ACM and IEEE and a member of the Academia Europaea. His research interests include several aspects of design technologies for integrated circuits and systems, such as synthesis for emerging technologies, networks on chips and 3D integration. He is also interested in heterogeneous platform design including electrical components and biosensors, as well as in data processing of biomedical information. He is the author of Synthesis and Optimization of Digital Circuits (McGraw-Hill, 1994), co-author and/or co-editor of eight other books, and author of over 600 technical articles. His citation h-index is 85 according to Google Scholar. He is a member of the Scientific Advisory Board of IMEC (Leuven, B), CfAED (Dresden, D) and STMicroelectronics.

Dr. Lee presented his recent research on proximal point algorithms for solving nonsmooth convex penalized regression problems at the 8th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2015), London, UK, Dec 12-14 (http://www.cmstatistics.org/CMStatistics2015/), in the session EO150: Convex optimization in statistics. Dr. Lee was invited by the session organizer, Prof. Keith Knight from the Department of Statistics, University of Toronto.

Accelerated proximal point methods for solving penalized regression problems

Efficient optimization methods to obtain solutions of penalized regression problems, especially in high dimensions, have been studied quite extensively in recent years, with their successful applications in machine learning, image processing, compressed sensing, and bioinformatics, just to name a few. Amongst them, proximal point methods and their accelerated variants have been quite competitive in many cases. These algorithms make use of special structures of problems, e.g. smoothness and separability, endowed by the choices of loss functions and regularizers. We will discuss two types of first-order proximal point algorithms, namely accelerated proximal gradient descent and accelerated proximal extra gradient techniques, focusing on the latter, in the context of Lasso and generalized Dantzig selector.
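As a rough illustration of the first of the two algorithm families named above, the following is a minimal sketch of an accelerated proximal gradient (FISTA-style) iteration for the Lasso. It is not the method presented in the talk: the function names, the fixed step size and the toy data are assumptions made for this example, and the extra-gradient variant is not shown.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_accelerated_proximal_gradient(X, y, lam, step, iterations=200):
    """Minimise 0.5 * ||X w - y||^2 + lam * ||w||_1 (FISTA-style sketch).

    X is a list of rows and y a list of targets. `step` is assumed to be
    at most 1 / L, where L is the largest eigenvalue of X^T X.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    z = list(w)   # extrapolated point used for the gradient step
    t = 1.0       # momentum parameter
    for _ in range(iterations):
        # Gradient of the smooth part 0.5 * ||X z - y||^2 at z.
        residual = [sum(xi * zi for xi, zi in zip(row, z)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Proximal step: soft-thresholding handles the l1 penalty.
        w_next = [soft_threshold(zj - step * gj, step * lam)
                  for zj, gj in zip(z, grad)]
        # Nesterov-style extrapolation.
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [wn + ((t - 1.0) / t_next) * (wn - wo)
             for wn, wo in zip(w_next, w)]
        w, t = w_next, t_next
    return w


# Hypothetical toy problem with an identity design matrix, for which the
# Lasso solution is simply the soft-thresholded target vector.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
w = lasso_accelerated_proximal_gradient(X, y, lam=1.0, step=1.0)
```

The separability mentioned in the abstract shows up in `soft_threshold`: the l1 penalty decomposes over coordinates, so its proximal operator can be applied one coefficient at a time.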


Alexander Schramm

On 8 December, Alexander Schramm becomes an adjunct professor. The dean of the medical faculty, Prof. Dr. Jan Buer, awards him the title for his work on the subject of "Experimental Oncology". This allows him to continue his research on the molecular causes of tumor development in childhood as part of the CRC 876.

Katharina Morik

The National Academy of Science and Engineering (acatech) advises society and governments on all questions regarding the future of technology. Acatech is one of the most important academies for novel technology research. Additionally, acatech provides a platform for the transfer of concepts to applications and enables the dialogue between science and industry. The members work together with external researchers in interdisciplinary projects to ensure the practicability of recent trends. Internationally oriented, acatech wants to provide solutions for global problems and new perspectives for technological value added in Germany.

By appointing Katharina Morik as a member of acatech, the academy recognizes her research profile, her achievements as speaker of the collaborative research center SFB 876, her international reputation and her innovative research in machine learning.

From Average Treatment Effects to Batch Learning from Bandit Feedback

Log data is one of the most ubiquitous forms of data available, as it can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement) at little cost. The interaction logs of such systems (e.g., an online newspaper) typically contain a record of the input to the system (e.g., features describing the user), the prediction made by the system (e.g., a recommended list of news articles) and the feedback (e.g., number of articles the user read). This feedback, however, provides only partial-information feedback -- aka "contextual bandit feedback" -- limited to the particular prediction shown by the system. This is fundamentally different from conventional supervised learning, where "correct" predictions (e.g., the best ranking of news articles for that user) together with a loss function provide full-information feedback.

In this talk, I will explore approaches and methods for batch learning from logged bandit feedback (BLBF). Unlike the well-explored problem of online learning with bandit feedback, batch learning with bandit feedback does not require interactive experimental control of the underlying system, but merely exploits log data collected in the past. The talk explores how Empirical Risk Minimization can be used for BLBF, the suitability of various counterfactual risk estimators in this context, and a new learning method for structured output prediction in the BLBF setting. From this, I will draw connections to methods for causal inference in Statistics and Economics.
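To make the idea of a counterfactual risk estimator concrete, here is a minimal sketch of the standard inverse propensity scoring (IPS) estimator, which reweights each logged loss by how much more (or less) likely a new policy is to choose the logged action than the logging policy was. This is one well-known estimator, not necessarily the one favoured in the talk; the log format, function names and toy data are assumptions made for this example.

```python
def ips_risk(logs, new_policy):
    """Inverse propensity scoring estimate of the expected loss of new_policy.

    logs: list of (context, action, loss, propensity) tuples, where
    propensity is the probability with which the logging policy chose
    `action` in `context` (assumed to be recorded and non-zero).
    new_policy(context, action): probability that the new policy would
    choose `action` in `context`.
    """
    total = 0.0
    for context, action, loss, propensity in logs:
        weight = new_policy(context, action) / propensity
        total += weight * loss
    return total / len(logs)


# Hypothetical log of a news recommender that picked one of two articles
# uniformly at random (propensity 0.5); loss 0.0 means the user read it.
logs = [
    ("sports", "sports_article", 0.0, 0.5),
    ("sports", "politics_article", 1.0, 0.5),
    ("politics", "politics_article", 0.0, 0.5),
    ("politics", "sports_article", 1.0, 0.5),
]


def matching_policy(context, action):
    """Deterministic policy that always shows the article matching the context."""
    return 1.0 if action == context + "_article" else 0.0


def uniform_policy(context, action):
    """The logging policy itself: both articles with equal probability."""
    return 0.5


risk_matching = ips_risk(logs, matching_policy)
risk_uniform = ips_risk(logs, uniform_policy)
```

Both estimates are computed from the same log, without running either policy on live users, which is the sense in which the learning is done in batch.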

Joint work with Adith Swaminathan.

Thorsten Joachims is a Professor in the Department of Computer Science and the Department of Information Science at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in information access, language technology, and recommendation. His past research focused on support vector machines, text classification, structured output prediction, convex optimization, learning to rank, learning with preferences, and learning from implicit feedback. In 2001, he finished his dissertation advised by Prof. Katharina Morik at the University of Dortmund. From 1994 to 1996 he was a visiting scholar with Prof. Tom Mitchell at Carnegie Mellon University. He is an ACM Fellow, AAAI Fellow, and Humboldt Fellow.


Waiting Time Models for Mutual Exclusivity and Order Constraints in Cancer Progression

In recent years, high-throughput sequencing technologies have generated an unprecedented amount of genomic cancer data, opening the way to a more profound understanding of tumorigenesis. In this regard, two fundamental questions have emerged: 1) which alterations drive tumor progression? and 2) what are the evolutionary constraints on the order in which these alterations occur? Answering these questions is crucial for targeted therapeutic decisions, which are often based on the identification of early genetic events. During this talk, I will present two models, TiMEx: a waiting time model for mutually exclusive cancer alterations, and pathTiMEx: a waiting time model for the joint inference of mutually exclusive cancer pathways and their dependencies in tumor progression. We regard tumorigenesis as a dynamic process, and base our model on the temporal interplay between the waiting times to alterations, characteristic for every gene and alteration type, and the observation time. We assume that, in tumor development, alterations can either occur independently, or depend on each other by being part of the same pathway or by following particular progression paths. By inferring these two types of potential dependencies simultaneously, we jointly address the two fundamental questions of identifying important cancer genes and progression, on the basis of the same cancer dataset. On biological cancer datasets, TiMEx identifies gene groups with stronger functional biological relevance than previous methods, while also proposing many new candidates for biological validation. Additionally, the results of pathTiMEx on tumor progression are highly consistent with the literature in the case of colorectal cancer and glioblastoma.



Simona Constantinescu is a graduate student at ETH Zurich, in Switzerland, in Niko Beerenwinkel's group. Her main research interest is the design of models and algorithms with application to cancer genomics data. Particularly, she is working on projects related to inferring the temporal progression and mutual exclusivity in cancer, evolutionary dynamics of cancer, and toxicogenomics. Simona obtained a Master's Degree in Computational Biology and Bioinformatics (Department of Computer Science) from ETH Zurich, and degrees in Mathematics and Economic Informatics from the University of Bucharest. During her Master studies, she was awarded an ETH Excellence Scholarship.

Significant Pattern Mining

Pattern Mining is steadily gaining importance in the life sciences: Fields like Systems Biology, Genetics, or Personalized Medicine try to find patterns, that is combinations of (binary) features, that are associated with the class membership of an individual, e.g. whether the person will respond to a particular medical treatment or not. Finding such combinations is both a computational and a statistical challenge. The computational challenge arises from the fact that a large space of candidate combinations has to be explored. The statistical challenge is due to each of these candidates representing one hypothesis that is to be tested, resulting in an enormous multiple testing problem. While there has been substantial effort in making the search more efficient, the multiple testing problem was deemed intractable for many years. Only recently, new results started to emerge in data mining, which promise to lead to solutions for this multiple testing problem and to important applications in the biomedical domain. In our talk, we will present these recent results, including our own work in this direction.
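The scale of the multiple testing problem can be sketched with the classical Bonferroni correction, which divides the significance level by the number of hypotheses tested. This is only the textbook baseline, not the recent results the talk refers to; the function names and the choice of 20 features are assumptions made for this example.

```python
def num_candidate_patterns(num_features):
    """Number of non-empty combinations of binary features."""
    return 2 ** num_features - 1


def bonferroni_threshold(alpha, num_hypotheses):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_hypotheses


# Hypothetical setting: 20 binary features, family-wise error rate 0.05.
patterns = num_candidate_patterns(20)
threshold = bonferroni_threshold(0.05, patterns)
```

With only 20 features there are already more than a million candidate patterns, so each individual test would need a p-value below roughly 5e-8 to count as significant.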


Prof. Dr. Karsten Borgwardt is Professor of Data Mining at ETH Zürich, at the Department of Biosystems located in Basel. His work has won several awards, including the NIPS 2009 Outstanding Paper Award, the Krupp Award for Young Professors 2013 and a Starting Grant 2014 from the ERC-backup scheme of the Swiss National Science Foundation. Since 2013, he has been heading the Marie Curie Initial Training Network for "Machine Learning for Personalized Medicine" with 12 partner labs in 8 countries. The business magazine "Capital" lists him as one of the "Top 40 under 40" in Science in/from Germany.

Whole Systems Energy Transparency (or: More Power to Software Developers!)

Energy efficiency is now a major (if not the major) concern in electronic systems engineering. While hardware can be designed to save a modest amount of energy, the potential for savings is far greater at the higher levels of abstraction in the system stack. The greatest savings are expected from energy consumption-aware software. This talk emphasizes the importance of energy transparency from hardware to software as a foundation for energy-aware system design. Energy transparency enables a deeper understanding of how algorithms and coding impact the energy consumption of a computation when executed on hardware. It is a key prerequisite for informed design space exploration and helps system designers to find the optimal tradeoff between performance, accuracy and energy consumption of a computation. Promoting energy efficiency to a first class software design goal is therefore an urgent research challenge. In this talk I will outline the first steps towards giving "more power" to software developers. We will cover energy monitoring of software, energy modelling at different abstraction levels, including insights into how data affects the energy consumption of a computation, and static analysis techniques for energy consumption estimation.


Kerstin Eder is a Reader in Design Automation and Verification at the Department of Computer Science of the University of Bristol. She set up the Energy Aware COmputing (EACO) initiative (http://www.cs.bris.ac.uk/Research/eaco/) and leads the Verification and Validation for Safety in Robots research theme at the Bristol Robotics Laboratory (http://www.brl.ac.uk/vv).

Her research is focused on specification, verification and analysis techniques which allow engineers to design a system and to verify/explore its behaviour in terms of functional correctness, performance and energy efficiency. Kerstin has gained extensive expertise in verifying complex microelectronic designs at leading semiconductor design and EDA companies. She seeks novel combinations of formal verification and analysis methods with state-of-the-art simulation/test-based approaches to achieve solutions that make a difference in practice.

Her most recent work includes Coverage-Driven Verification for robots that directly interact with humans, using assertion checks and theorem proving to verify control system designs, energy modelling of software and static analysis to predict the energy consumption of programs. She is particularly interested in safety assurance for learning machines and in software design for low power.

Kerstin has co-authored over 50 internationally refereed publications, was awarded a Royal Academy of Engineering "Excellence in Engineering" prize and manages a portfolio of active research grants valued in excess of £1.7M.

She is currently Principal Investigator on the EPSRC projects "Robust Integrated Verification of Autonomous Systems" and "Trustworthy Robotic Assistants". She also leads the Bristol team working on the EC-funded Future and Emerging Technologies MINECC (Minimizing Energy Consumption of Computing to the Limit) collaborative research project ENTRA (Whole Systems Energy Transparency) which aims to promote energy efficiency to a first class software design goal.

Kerstin holds a PhD in Computational Logic, an MSc in Artificial Intelligence and an MEng in Informatics.

After a successful first edition in Warsaw, Poland, the second workshop on "Algorithmic Challenges of Big Data" (ACBD for short) took place on September 28-30. It was organized by SFB876 and the Department of Computer Science. ACBD focused on information compression/extraction, resource-efficient algorithms, distributed and parallel computing, sublinear algorithms and other questions arising in modern data analysis.

Participants of the international ACBD workshop

Lecture hall during the workshop

Cache-Efficient Aggregation: Hashing Is Sorting

Abstract: For decades researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specifically implemented algorithms, and the data sets used in the experiments, different authors came to different conclusions about which is the better approach. In this paper we argue that in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore we make the similarity of the two approaches obvious by designing an algorithmic framework that allows switching seamlessly between hashing and sorting during execution. The fact that we mix hashing and sorting routines in the same algorithmic framework allows us to leverage the advantages of both approaches and makes their similarity obvious. On a more practical note, we also show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe a complementary dependency of the constant factors of the two routines on the locality of the input, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient (independently and without prior knowledge of input skew and output cardinality), highly parallelizable on modern multi-core systems, and operates at a speed close to the memory bandwidth, thus outperforming the state of the art by up to 3.7x.
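The duality described in the abstract can be illustrated with a minimal sketch that computes the same grouped sum once with a hash table and once by sorting. This is not the paper's cache-efficient algorithm; the function names and the toy input are assumptions made for this example.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_by_hashing(rows):
    """Sum values per key in one pass using a hash table."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def aggregate_by_sorting(rows):
    """Sum values per key by sorting first, then scanning runs of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {key: sum(value for _, value in group)
            for key, group in groupby(ordered, key=itemgetter(0))}


# Hypothetical toy input: (group key, value) pairs.
rows = [("b", 2), ("a", 1), ("b", 5), ("c", 3), ("a", 4)]
hashed = aggregate_by_hashing(rows)
by_sort = aggregate_by_sorting(rows)
```

Both routines bring rows with equal keys together before combining them; they differ only in how that grouping is achieved, which is why a framework can switch between them mid-execution.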

Wen-Hung Huang and Jian-Jia Chen (B2 SFB project) received the Best Paper Award of the IEEE Real-Time Computing Systems and Applications conference (RTCSA), held August 19-21, 2015, in Hong Kong. The awarded paper is "Techniques for Schedulability Analysis in Mode Change Systems under Fixed-Priority Scheduling". The paper explores an essential scheduling property in cyber-physical systems where the execution time, relative deadline, and sampling period can change over time according to different physical conditions. The authors derive a 58.57% utilization bound for a very dynamic environment under mode-level fixed-priority scheduling.


Abstract: With the advent of cyber-physical systems, real-time tasks must run in different modes over time to react to changes in the physical environment. It is preferable to adopt highly expressive models in real-time systems. Because it is simple to implement in kernels, fixed-priority scheduling has been widely adopted in commercial real-time systems. In this work we derive a technique for analyzing schedulability of the system where tasks can undergo mode change under fixed-priority scheduling. We study two types of fixed-priority scheduling in mode change systems: task-level and mode-level fixed-priority scheduling. The proposed tests run in polynomial time. We further show that a utilization of 58.57% can be guaranteed in implicit-deadline multi-mode systems if each mode is prioritized according to rate-monotonic policy. The effectiveness of the proposed tests is also shown via extensive simulation results.
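For context, a utilization bound is used as a simple sufficient test: if the total utilization of a task set stays below the bound, the deadlines are guaranteed. The sketch below compares the 58.57% figure quoted above with the classical Liu and Layland bound for rate-monotonic scheduling of ordinary (single-mode) periodic tasks. It does not reproduce the paper's schedulability tests; the function names and task sets are assumptions, and the example treats each task set as a single mode because the text does not say how utilization is defined across modes.

```python
def total_utilization(tasks):
    """Sum of execution_time / period over all (execution_time, period) tasks."""
    return sum(execution_time / period for execution_time, period in tasks)


def liu_layland_bound(n):
    """Classical rate-monotonic bound for n implicit-deadline periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks, bound):
    """Sufficient (not necessary) test: utilization does not exceed the bound."""
    return total_utilization(tasks) <= bound


# Bound quoted in the abstract for multi-mode systems, used here as a constant.
MULTI_MODE_BOUND = 0.5857

# Hypothetical task sets given as (execution_time, period) pairs.
light_tasks = [(1, 4), (2, 10), (1, 20)]   # utilization about 0.50
heavy_tasks = [(3, 4), (1, 10)]            # utilization about 0.85

light_ok = passes_utilization_test(light_tasks, MULTI_MODE_BOUND)
heavy_ok = passes_utilization_test(heavy_tasks, MULTI_MODE_BOUND)
two_task_bound = liu_layland_bound(2)
```

The classical bound decreases towards ln 2 (about 69.3%) as the number of tasks grows, so the multi-mode guarantee of 58.57% is lower than what is available for tasks that never change mode.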

Summer School 2015


As part of the ECML PKDD, in cooperation with the SFB 876, a summer school was hosted in Porto this year. For further information click here.

The paper "Online Analysis of High-Volume Data Streams in Astroparticle Physics" has won the Best Industrial Paper Award of ECML-PKDD 2015.
The paper will be presented on Thursday, 10 September, in a special session at ECML-PKDD in Porto.

August 17, 2015

The 2nd Workshop on Algorithmic Challenges of Big Data (ACBD 2015)

September 28-30, 2015 in Dortmund, Germany

The Department of Computer Science and the SFB876 are excited to announce the second workshop on Algorithmic Challenges of Big Data. ACBD is focused on information compression/extraction, resource-efficient algorithms, distributed and parallel computing, sublinear algorithms, machine learning, and other questions arising in modern data analysis.

ACBD 2015 will include invited presentations from leading researchers in the field, as well as a forum for discussions.


To register, please send an email to acbd-info@ls2.cs.tu-dortmund.de. The registration deadline is September 15th. There is no registration fee.

Invited speakers

Stephen Alstrup (University of Copenhagen)
Hannah Bast (University of Freiburg)
Jarek Byrka (University of Wroclaw)
Ioannis Caragiannis (University of Patras)
Graham Cormode (University of Warwick)
Artur Czumaj (University of Warwick)
Ilias Diakonikolas (University of Edinburgh)
Guy Even (Tel-Aviv University)
Pierre Fraigniaud (CNRS and University Paris Diderot)
Fabrizio Grandoni (IDSIA)
Giuseppe F. Italiano (University of Rome “Tor Vergata”)
Robert Krauthgamer (The Weizmann Institute of Science)
Stefano Leonardi (University of Rome “Sapienza”)
Yishay Mansour (Microsoft Research and Tel-Aviv University)
Alberto Marchetti-Spaccamela (University of Rome “Sapienza”)
Kurt Mehlhorn (Max Planck Institute for Computer Science)
Friedhelm Meyer auf der Heide (University of Paderborn)
Ulrich Meyer (Goethe University Frankfurt am Main)
Adi Rosen (CNRS and Universite Paris Diderot)
Piotr Sankowski (University of Warsaw)
Ola Svensson (EPFL)
Dorothea Wagner (Karlsruhe Institute of Technology)


TU Dortmund
Otto Hahn Straße 14, 44227 Dortmund, Germany


Christian Sohler
Alexander Munteanu
Chris Schwiegelshohn

For further information, please contact us under



In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy-preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is sufficient for the minimization of many proper losses with linear classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Experiments show that our algorithms outperform the state of the art in LLP, and in many cases compete with the oracle, which learns knowing all labels. In more recent work, we show that the mean operator trick can be generalized, such that it is possible to learn without knowing individual feature vectors either. We can leverage this surprising result to design learning algorithms that do not need any individual example (only their aggregates) for training and for which many privacy guarantees can be proven.

Bio: Giorgio Patrini is a PhD student in Machine Learning at the Australian National University/NICTA. His main research is on understanding how learning is possible when some variables are only known as aggregates; for example, how to learn individual-level models from census-like data. His research naturally touches themes in social sciences, econometrics and privacy. He cofounded and advises Waynaut, an online travel start-up based in Milan, Italy.

Huge progress on understanding neuroblastoma tumors

Treatment of children with cancer has seen many improvements in recent years. A major concern for doctors is the recurrence of tumors, which often leads to worse treatment results. Researchers of the collaborative research center, together with national and international colleagues, have now investigated differences in the gene expression of tumors at several different stages.

A current model of tumorigenesis implies that a collection of cascaded mutational events occur and that it therefore is critical to identify relevant events to better understand mechanisms underlying disease progression. For discovery, integrated analysis of high dimensional data is a key technology, which is however very challenging because of computational and statistical issues. In our work, we developed and applied an integrated data analysis technique, focusing on differences between primary (at diagnosis) and recurrent neuroblastoma cancer patients, profiled with whole-exome sequencing, mRNA expression, array CGH and DNA methylation data. Our analysis discovered characteristics of evolutionary dynamics in neuroblastoma, along with new mutational changes in relapse patients. Our results showed that this type of analysis is a promising approach to detect genetic and epigenetic changes in cancer evolution.

Research on this topic has been funded by the Deutsche Forschungsgemeinschaft (DFG) and supported by the Deutsches Konsortium für translationale Krebsforschung (DKTK) and the Mercator Research Center Ruhr (MERCUR).

An interview with a project leader, Sangkyun Lee, has been published on the TU Dortmund homepage.


On 22 and 23 September the 6th Symposium "Metabolites in Process Exhaust Air and Breath Air" will take place at Reutlingen University. It is a joint event with SFB 876, the new Center of Breath Research at the University of Saarland and the B&S Analytik Dortmund. Participation is free. Prior registration is mandatory and possible until 1st august. More details can be found here.

This year Metabolites, the open access journal of metabolism and metabolomics, announced the recipients of the first Metabolites Best Paper Award for 2015. The paper "Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches", submitted by Anne-Christin Hauschild, Dominik Kopczynski, Marianna D’Addario, Jörg Ingo Baumbach, Sven Rahmann and Jan Baumbach, won this prize. Supported by SFB 876 and DFG, it was published in Metabolites in 2013 and can be found here.


Apache Flink and the Berlin Big Data Center

Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low-entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today’s multi-billion dollar data management market include data independence, separating physical representation and storage from the actual information, and declarative languages, separating the program specification from its intended execution environment. In contrast, today’s big data solutions do not offer data independence and declarative specification.

As a result, big data technologies are mostly employed in newly-established companies with IT-savvy employees or in large well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption, due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major road-block, contributing to the scarcity of available data scientists and limiting the application of big data to the IT-savvy industries. Data scientists currently have to spend a lot of time on tuning their data analysis programs for specific data characteristics and a specific execution environment.

We believe that computer science research needs to bring the powerful concepts of declarative specification, query optimization and automatic parallelization, as well as adaptation to novel hardware, data characteristics and workload, to current data analysis systems, in order to achieve broad adoption of big data technology and effectively deliver on the promise that novel big data technologies offer. We will present the technologies that we have researched and developed in the context of Apache Flink (http://flink.apache.org) and will give an outlook on further research and development that we are conducting at the Database Systems and Information Management Group (DIMA) at TU Berlin and the Berlin Big Data Center (http://bbdc.berlin, http://www.dima.tu-berlin.de) as well as some current research challenges.


Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin (TU Berlin). Volker also holds a position as an adjunct full professor at the University of Toronto and is director of the research group “Intelligent Analysis of Mass Data” at DFKI, the German Research Center for Artificial Intelligence. Earlier in his career, Dr. Markl led a research group at FORWISS, the Bavarian Research Center for Knowledge-based Systems in Munich, Germany, and was a Research Staff member & Project Leader at the IBM Almaden Research Center in San Jose, California, USA. His research interests include new hardware architectures for information management, scalable processing and optimization of declarative data analysis programs, and scalable data science, including graph and text mining, and scalable machine learning. Volker Markl has presented over 200 invited talks in numerous industrial settings and at major conferences and research institutions worldwide.

He has authored and published more than 100 research papers at world-class scientific venues. Volker regularly serves as member and chair for program committees of major international database conferences. He has been a member of the computer science evaluation group of the Natural Sciences and Engineering Research Council of Canada (NSERC). Volker has 18 patent awards, and he has submitted over 20 invention disclosures to date. Over the course of his career, he has garnered many prestigious awards, including the European Information Society and Technology Prize, an IBM Outstanding Technological Achievement Award, an IBM Shared University Research Grant, an HP Open Innovation Award, an IBM Faculty Award, a Trusted-Cloud Award for Information Marketplaces by the German Ministry of Economics and Technology, the Pat Goldberg Memorial Best Paper Award, and a VLDB Best Paper award. He has been speaker and principal investigator of the Stratosphere collaborative research unit funded by the German National Science Foundation (DFG), which resulted in numerous top-tier publications as well as the "Apache Flink" big data analytics system. Apache Flink is available as open source, is currently used in production by several companies, and serves as a basis for teaching and research at several institutions in Germany, Europe and the United States. Volker currently serves as the secretary of the VLDB Endowment, is advising several companies and startups, and in 2014 was elected as one of Germany's leading "digital minds" (Digitale Köpfe) by the German Informatics Society (GI).

B-meson decay observed

As part of the SFB 876, data from the LHCb experiment at CERN is analyzed by project C5. A major challenge is to observe the large variety of events and to detect the most interesting ones, which occur only very rarely. In cooperation with a further experiment at CERN, the CMS experiment, the LHCb group has now succeeded in observing a very rare decay of a B meson. The decay B_s^0 → μ+ μ− was detected, and the 50 observed decays, found in more than 10^14 proton-proton collisions, yield a branching ratio of about 3 · 10^−9. This measurement, which was published in the journal Nature, is highly important because it represents an extremely sensitive test of the standard model of particle physics. The measured value is in excellent agreement with the expectations of the standard model, so that new physics models are severely constrained. Through the collaboration within the SFB 876, the quality of the data analysis and thus the sensitivity of the measurements should be increased even further.


Thermal-Aware Power Budgeting and Transient Peak Computation for Dark Silicon Chip

System designers usually use the thermal design power (TDP) as the power budget. However, using a single and constant value as the power budget is a pessimistic approach for manycore systems.
Therefore, we proposed a new power budget concept, called Thermal Safe Power (TSP), which is an abstraction that provides safe power constraints as a function of the number of active cores. Executing cores at power values below TSP results in a higher system performance than state-of-the-art solutions, while the chip's temperature remains below the critical levels.

Furthermore, runtime decisions (task migration, power gating, DVFS, etc.) are typically used to optimize resource usage. Such decisions change the power consumption, which can result in transient temperatures much higher than in steady-state scenarios. To be thermally safe, it is important to evaluate the transient peaks before making resource management decisions.
Hence, we developed a lightweight method for computing these transient peaks, called MatEx, based on analytically solving the system of thermal differential equations using matrix exponentials and linear algebra, instead of using regular numerical methods.
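To illustrate the idea behind this kind of computation, the following Python sketch evaluates the closed-form solution T(t) = T_ss + exp(A t) (T_0 - T_ss) of a linear thermal model dT/dt = A (T - T_ss) and scans it for the transient peak. It is a minimal sketch and not the MatEx implementation: the two-node matrix A, all temperatures and the helper names (mat_exp, temperature, transient_peak) are assumptions made up for this example, and the peak is found by sampling the analytical solution on a time grid rather than by further analysis.

```python
def mat_mul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(a, terms=20, squarings=10):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    n = len(a)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling
    return result


def temperature(a, t_steady, t_init, time):
    """Closed-form node temperatures at the given time."""
    e = mat_exp([[x * time for x in row] for row in a])
    diff = [ti - ts for ti, ts in zip(t_init, t_steady)]
    return [ts + sum(e[i][j] * diff[j] for j in range(len(diff)))
            for i, ts in enumerate(t_steady)]


def transient_peak(a, t_steady, t_init, node, horizon, steps):
    """Highest temperature of one node on a uniform time grid."""
    return max(temperature(a, t_steady, t_init, horizon * k / steps)[node]
               for k in range(steps + 1))


# Hypothetical two-node model (units: 1/s and degrees Celsius).
A = [[-2.0, 1.0], [1.0, -2.0]]
T_STEADY = [60.0, 55.0]   # steady state after a runtime decision
T_INIT = [80.0, 50.0]     # temperatures at the moment of the decision

peak = transient_peak(A, T_STEADY, T_INIT, node=1, horizon=5.0, steps=500)
print(f"transient peak of node 1: {peak:.2f}")
```

In this made-up example the second node starts at 50 degrees, settles at 55 degrees, and still peaks at roughly 57.2 degrees in between, which is the kind of transient overshoot a steady-state check alone would miss.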

TSP and MatEx (available at http://ces.itec.kit.edu/download) are new steps towards dealing with dark silicon. TSP alleviates the pessimistic dark silicon estimations of TDP, and it enables new avenues for performance improvements. MatEx allows for lightweight transient and peak temperature computations useful to quickly predict the thermal behavior of runtime decisions.

Dusza presents his certificate

The prestigious award for the best PhD thesis was presented by the society "Verein der Freunde und Förderer der ComNets-Einrichtungen" in Aachen on March 13, 2015. The society is concerned with research on future communication networks. The Bernhard-Walke-Award, which is endowed with 1500 Euro, was given to Dr.-Ing. Björn Dusza for his PhD thesis titled "Context-Aware Battery Lifetime Modeling for Next Generation Wireless Networks". He worked on this subject as an employee at the chair for communication networks (Prof. Dr.-Ing. C. Wietfeld) at TU Dortmund. The thesis was a contribution to the collaborative research centre 876 "Providing Information by Resource-Constrained Data Analysis". Analysis and stochastic models were used to study the power consumption of LTE communication processes in end devices. The results of the thesis make it possible, for the first time, for network operators to quantify the influence of network design and resource allocation on battery lifetime. The collaborative research centre 876 uses the results to decide whether data from a sensor is better analyzed locally or transferred to some infrastructure.

Summer School 2015

The next summer school will be hosted at the Faculty of Sciences of the University of Porto from 2nd to 5th of September and is co-located with ECML PKDD 2015. It will be organized by LIAAD-INESC TEC and TU Dortmund.

For the summer school, world-leading researchers in machine learning and data mining will give lectures on recent techniques, for example for dealing with huge amounts of data or spatio-temporal streaming data.

SFB members should register via the internal registration page.


Employees and participating professors of the collaborative research centre 876

The collaborative research centre 876 has built a bridge between data analysis and cyber-physical systems. The second phase was granted by the Deutsche Forschungsgemeinschaft, so the work continues from 2015 to 2018.

In the opening presentation, the coordinator, Prof. Dr. Katharina Morik, reviewed the last four years. She emphasized the collaboration between the different disciplines, which are computer science, statistics, medicine, physics, electrical engineering and mechanical engineering. The characteristic of the collaborative research centre is that different disciplines are paired and influence each other. Only a combined understanding of the set of problems can be the basis for the next four years. The scope of the research ranges from extending the runtime of smartphones to studying galaxies in astrophysics.

Dr. Stefan Michaelis gave a review of the application for the second phase of the collaborative research centre and of the resources that were available. After that, Prof. Dr. Kristian Kersting and Prof. Jian-Jia Chen talked briefly about their fields of research.

Prof. Dr. Kersting introduced the "Democratization Of Optimization", which comprises concepts for scalable and easy-to-use methods. Many problems are so complex that they cannot be solved completely in acceptable time. Methods that exploit symmetries inside the data set or incorporate expert knowledge simplify a problem so that it can be solved.

Prof. Dr. Jian-Jia Chen talked about "Flexible execution models for cyber-physical systems". Computer systems have to provide a result within a predetermined time, which depends on the task. Even in the case of dynamic processes and changing execution times, the worst-case running time has to be predictable. The combination of machine learning and cyber-physical systems will lead to optimal execution models in the future.

Opening the SQL Kingdom to the R-ebels

Databases today appear as isolated kingdoms, inaccessible, with a unique culture and strange languages. To benefit from our field, we expect data and analysis to be brought inside these kingdoms. Meanwhile, actual analysis takes place in more flexible, specialised environments such as Python or R. There, the same data management problems reappear, and are solved by re-inventing core database concepts. We must work towards making our hard-earned results more accessible, by supporting (and re-interpreting) their languages, by opening up internals and by allowing seamless transitions between contexts. In our talk, we present our extensive work on bringing a statistical environment (R) together with an analytical data management system (MonetDB).

Thermal-Aware Design of 2D/3D Multi-Processor System-on-Chip Architectures

The evolution of process technologies has allowed us to design compact high-performance computing servers made of 2D and 3D multi-processor system-on-chip (MPSoC) architectures. However, the increase in power density, especially in 3D-stacked MPSoCs, significantly increases heat densities, which can result in degraded performance if the system overheats or in significant overcooling costs if temperature is not properly managed at all levels of abstraction. In this talk I will first present the latest approaches to capture the transient system-level thermal behavior of 2D/3D MPSoCs including fluidic micro-cooling capabilities, as in the case of the IBM Aquasar (1st chip-level water-cooled) supercomputer. Next, I will detail a new family of model-based temperature controllers for energy-efficient 2D/3D MPSoC management. These new run-time controllers exploit both hardware and software layers to limit the maximum MPSoC temperature. They include a thermal-aware job scheduler and selectively apply dynamic voltage and frequency scaling (DVFS) to also balance the temperature across the chip in order to maximize cooling efficiency. One key feature of this newly proposed family of thermal controllers is their maximum system temperature forecasting capability, which is used to dynamically compensate for the cooling system's delays in reacting to temperature changes. The experiments on modeled 2- and 4-layered 2D/3D MPSoC industrial designs show that this system-level thermal-aware design approach can enable up to 80% energy savings with respect to state-of-the-art computing server designs. Finally, I will outline how the combination of inter-tier liquid cooling technologies and micro-fluidic fuel cells can overcome the problem of dark silicon and energy proportionality deployment in future generations of many-core servers and datacenters.

Short biography

David Atienza is associate professor of EE and director of the Embedded Systems Laboratory (ESL) at EPFL, Switzerland. He received his MSc and PhD degrees in computer science and engineering from UCM, Spain, and IMEC, Belgium, in 2001 and 2005, respectively. His research interests focus on system-level design methodologies for high-performance multi-processor Systems-on-Chip (MPSoC) and low-power embedded systems, including new thermal-aware design for 2D and 3D MPSoCs, design methods and architectures for wireless body sensor networks, and memory management. In these fields, he is co-author of more than 200 publications in prestigious journals and international conferences, several book chapters and seven U.S. patents.

He has earned several best paper awards at top venues in electronic design automation and computer and system engineering in these areas; he received the IEEE CEDA Early Career Award in 2013, the ACM SIGDA Outstanding New Faculty Award in 2012 and a Faculty Award from Sun Labs at Oracle in 2011. He is a Distinguished Lecturer (2014-2015) of the IEEE CASS, and is a Senior Member of IEEE and ACM. He serves as TPC Chair of DATE 2015 and has recently been appointed General Chair of DATE 2017.

On October 14, 2014, Peter Marwedel received the award of the Embedded Systems Week in Delhi. The award honors the scientific work of Peter Marwedel. Prof. Balakrishnan from the Indian Institute of Technology (IIT) in Delhi presented the prize on behalf of ESWEEK (see photo). ESWEEK, a cooperation between the ACM and the IEEE (see www.esweek.org), is one of the major events in the field of embedded systems; each year it takes place on a different continent.

Further Information ...

On 6th of November the 3rd Westfalenkongress in Dortmund was supported by the collaborative research center SFB 876. The head of the SFB, Katharina Morik, provided the research view on Big Data Analysis during the opening panel discussion. Later, several presentations by members of the research center showed the latest research results on social network analysis, mobile network communication and road traffic control as well as efficient processing of data streams.

The video below (German only) provides a review of the congress' topics.


Programme and abstracts for the workshop on 5th December 2014 are online. Registration is still possible.

November 21, 2014

The Deutsche Forschungsgemeinschaft (DFG) approved funding for the next four years of the collaborative research center SFB 876.

Dynamic Resource Scheduling on Graphics Processors

Graphics processors offer tremendous processing power, but do not deliver peak performance if programs cannot be parallelized into thousands of coherently executing threads. This talk focuses on this issue, unlocking the gates of GPU execution for a new class of algorithms.

We present a new processing model enabling fast GPU execution. With our model, dynamic algorithms with various degrees of parallelism at any point during execution are scheduled to be executed efficiently. The core of our processing model is formed by a versatile task scheduler, based on highly efficient queuing strategies. It combines work to be executed by single threads or groups of threads for efficient execution.

Furthermore, it allows different processes to use a single GPU concurrently, dividing the available processing time fairly between them. To assist highly parallel programs, we provide a memory allocator which can serve concurrent requests of tens of thousands of threads. To provide algorithms with the ultimate control over the execution, our execution model supports custom priorities, offering any possible scheduling policy. With this research, we provide the currently fastest queuing mechanisms for the GPU, the fastest dynamic memory allocator for massively parallel architectures, and the only autonomous GPU scheduling framework that can handle different granularities of parallelism efficiently. We show the advantages of our model in comparison to state-of-the-art algorithms in the field of rendering, visualization, and geometric modeling.

The working group "Bayes Methods" and SFB 876 jointly organise the workshop "Algorithms for Bayesian inference for complex problems". The workshop will take place on Friday, 5th of December 2014, at TU Dortmund University.

Presentations on the following topics are particularly welcome:

  • Alternatives to MCMC (INLA, approximate Bayesian computation, ...)
  • MCMC variants (Stan, reversible jump, adaptive, ...)
  • MCMC software implementations (R packages, SAS PROC MCMC, JAGS, …)
  • Applications (meta-analysis, informative missingness, modelling molecular data, …)

For further information please visit http://www.imbei.uni-mainz.de/bayes. Registration via mail including your name and affiliation to Manuela Zucknick (m.zucknick@dkfz-heidelberg.de). There is no registration fee.



The Westfalenkongress presents the SFB 876 in a Forum that is dedicated to knowledge transfer.



Non-parametric Methods for Correlation Analysis in Multivariate Data
Knowledge discovery in multivariate data often involves analyzing the relationship between two or more dimensions. Correlation analysis, with its roots in statistics, is one of the most effective approaches towards addressing the issue.

In this seminar, I will present some non-parametric methods for correlation analysis in multivariate data. I will focus on real-valued data where probability density functions (pdfs) are in general not available at hand. Instead of estimating them, we propose to work with cumulative distribution functions (cdfs) and cumulative entropy - a new concept of entropy for real-valued data.

For the talk, I will first discuss two methods for scalable mining of correlated subspaces in large high dimensional data. Second, I will introduce an efficient and effective non-parametric method for computing total correlation - a well-known correlation measure based on Shannon entropy. This method is based on discretization and hence, can be perceived as a technique for correlation-preserving discretization (compression) of multivariate data. Lastly, I will go beyond correlation analysis and present our ongoing research in multivariate causal inference.
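As a rough illustration of why working with cdfs avoids density estimation, the sketch below computes an empirical cumulative entropy directly from a sorted sample. It assumes the common definition h_CE(X) = -∫ P(X ≤ x) log P(X ≤ x) dx and plugs in the empirical cdf, which is a step function; the function name and the sample values are hypothetical, and the estimators used in the talk may differ in detail.

```python
import math


def cumulative_entropy(sample):
    """Empirical cumulative entropy of a one-dimensional real-valued sample.

    Between the i-th and (i+1)-th smallest values the empirical cdf is the
    constant i/n, so the integral reduces to a finite sum over the gaps.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        p = i / n                      # empirical cdf on [xs[i-1], xs[i])
        gap = xs[i] - xs[i - 1]        # width of that interval
        total -= gap * p * math.log(p)
    return total


# Hypothetical measurements; no histogram or kernel density is needed.
print(cumulative_entropy([0.3, 1.2, 0.7, 2.5, 1.9]))
```

Unlike Shannon entropy on discretized data, this quantity depends on the spacing of the observed values, so rescaling the sample by a constant factor rescales the result by the same factor.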

Hoang-Vu Nguyen is working as a PhD candidate in the Institute for Program Structures and Data Organization (IPD) - Chair Prof. Böhm, Karlsruhe Institute of Technology (KIT). Before joining KIT, he obtained his Master's and Bachelor's degrees from Nanyang Technological University (NTU), Singapore.

His research lies in the junction between theory and practice. Currently, he is focusing on scalable multivariate correlation analysis with applications in data mining. He develops efficient and practical computation methods for correlation measures, and applies them in clustering, outlier detection, mining big data, schema extraction, graph mining, time series analysis, etc.

The Pamono sensor is a joint development of the institutes for graphical and embedded systems of TU Dortmund University together with the ISAS - Institute for Analytical Sciences in Dortmund as part of the collaborative research center SFB 876. The sensor will be shown during the TV show "Große Show der Naturwunder" (Great show of miracles of nature) on 24th of July at 20:15 on ARD. Ranga Yogeshwar and Frank Elstner present the sensor together with members of project B2, Pascal Libuschewski and Alexander Zybin, while analyzing the saliva of Ranga Yogeshwar in search of viruses.

The portable sensor device is based on modern multi-core processors and uses sophisticated methods for CPU-intensive algorithms to detect viruses locally and in real time. The time between taking samples (blood, saliva) and getting analysis results is shortened drastically. The system can therefore be used outside of laboratories where it is needed, e.g. during crisis scenarios.


Algorithmic mechanism design on cloud computing and facility location

Algorithmic mechanism design is now widely studied for various scenarios. In this talk, we discuss two applications: the CPU time auction and the facility location problem. For the CPU time auction, we designed two greedy frameworks which can achieve truthfulness (or approximate truthfulness) from the bidders while at the same time a certain global objective is optimized or nearly optimized. For the facility location problem, we introduce weights to the traditional setting and prove that those mechanisms that ignore weights are the best we can have. Furthermore, we also propose a new threshold-based model where the solution that optimizes the social welfare is incentive compatible.
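As a small illustration of a mechanism that ignores weights, the sketch below implements the classic median rule for placing one facility on a line and checks by brute force that no agent can pull the facility closer to its true location by misreporting. This is a textbook example chosen for illustration only; the weighted setting and the specific mechanisms of the talk are not reproduced here, and the function names and agent locations are hypothetical.

```python
def median_mechanism(reports):
    """Place the facility at the (left) median of the reported locations."""
    ordered = sorted(reports)
    return ordered[(len(ordered) - 1) // 2]


def can_gain_by_lying(true_locations, agent, candidate_reports):
    """Check whether one agent can reduce its distance by misreporting.

    All other agents are assumed to report truthfully.
    """
    own = true_locations[agent]
    honest_cost = abs(median_mechanism(true_locations) - own)
    for lie in candidate_reports:
        reports = list(true_locations)
        reports[agent] = lie
        if abs(median_mechanism(reports) - own) < honest_cost - 1e-12:
            return True
    return False


# Hypothetical agent locations on a line and a grid of possible lies.
locations = [0.0, 2.0, 3.0, 7.0, 10.0]
lies = [x * 0.5 for x in range(-10, 31)]

print("facility at", median_mechanism(locations))
print("someone gains by lying:",
      any(can_gain_by_lying(locations, a, lies) for a in range(len(locations))))
```

The check only covers unilateral deviations on a finite grid of reports, so it is a sanity test of truthfulness for this instance rather than a proof.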

From Web 2.0 to the Ubiquitous Web

Andreas Hotho

Millions of users are active in the Web 2.0 and enjoy services like Flickr, Twitter or Facebook. These services are not only used on the computer at home but more frequently on smartphones, which have become more powerful in recent years. Thus, large amounts of content but also of usage data are collected - partially with location information using GPS in smartphones - which allow for various analyses, e.g. on the social relationships of users. Enriching subjective data like human perceptions with additional low-cost sensor information (not only using smartphones but also virtually every device) is an important next step on the way towards establishing the ubiquitous web. Researchers, especially from machine learning, data mining, and social network analysis, are interested in these kinds of data enhanced by additional sensor information and work on novel methods and new insights into the underlying human relationships and interactions with the environment.

One common phenomenon of the Web 2.0 is tagging, observed in many popular systems. As an example, we will present results on data from our own social tagging system BibSonomy, which allows the management of bookmarks and publications. The system is designed to support researchers in their daily work but it also allows the integration and demonstration of new methods and algorithms. Besides a new ranking approach which was integrated into BibSonomy, we present results investigating the influence of user behaviour on the emergent semantics of tagging systems. Starting from results on simple tagging data, the talk will present results on the combination of user data - again expressed as tags - and sensor data - in this case air quality measurements - as an example of the emergent ubiquitous web. We will discuss the upcoming area of combining these two information sources to gain new insights, in this case on environmental conditions and the perceptions of humans.


Andreas Hotho is professor at the University of Würzburg and the head of the DMIR group. Previously, he was a senior researcher at the University of Kassel. He is working in the area of Data Mining, Semantic Web and Mining of Social Media. He is directing the BibSonomy project at the KDE group of the University of Kassel. Andreas Hotho started his research at the AIFB Institute at the University of Karlsruhe, where he was working on text mining, ontology learning and semantic web related topics.

Big data in machine learning is the future. But how to deal with data analysis and limited resources: Computational power, data distribution, energy or memory? From 29th of September to 2nd of October, the TU Dortmund University, Germany, will host this summer school on resource-aware machine learning. Further information and online registration at: http://sfb876.tu-dortmund.de/SummerSchool2014

Topics of the lectures include: Data stream analysis. Energy efficiency for multi-core embedded processors. Factorising huge matrices for clustering. Using smartphones to detect astro particles.

Exercises help bring the contents of the lectures to life. All participants get the chance to learn how to transform a smartphone into an extraterrestrial particle detector using machine learning.

The summer school is open for international PhD or advanced master students, who want to learn cutting edge techniques for machine learning with constrained resources.

Excellent students may apply for a student grant supporting travel and accommodation. Deadline for application is 30th of June.


The University of Bremen once again invites participants to two summer universities for women in engineering and computer science:

The 6th international Ingenieurinnen-Sommeruni from 11 to 22 August 2014: http://www.ingenieurinnen-sommeruni.de

and the 17th international summer study programme Informatica Feminale from 18 to 29 August 2014: http://www.informatica-feminale.de

The two summer universities are aimed at female students from all types of higher education institutions and all subjects, as well as at women interested in continuing education. The summer universities comprise around 60 courses covering engineering and computer science topics, ranging from introductory courses and fundamentals to specialised topics. Workshops on professional life and careers round off the programme.

The range of topics includes courses on, among others, material and energy flows, data protection, robotics and technical networks, materials and quality management, agile software development, operating systems, electronics in everyday environments, project management, academic English, voice training and intercultural competence.

Gauss-Markov modeling and online crowdsensing for spatio-temporal processes

Francois Schnitzler

This talk will discuss (1) modelling and (2) monitoring of large spatio-temporal processes covering a city or country, with an application to urban traffic. (1) Gauss-Markov models are well suited for such processes. Indeed, they allow for efficient and exact inference and can model continuous variables. I will explain how to learn a discrete-time Gauss-Markov model based on batch historical data using the elastic net and the graphical lasso. (2) Such processes are traditionally monitored by dedicated sensors set up by civil authorities, but sensors deployed by individuals are increasingly used due to their cost-efficiency. This is called crowdsensing. However, the reliability of these sensors is typically unknown and must be estimated. Furthermore, bandwidth, processing or cost constraints may limit the number of sensors queried at each time step. We model this problem as the selection of sensors with unknown variance in a large linear dynamical system. We propose an online solution based on variational inference and Thompson sampling.
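To give a feel for Thompson sampling in the sensor selection setting, here is a heavily simplified sketch. It assumes that each queried sensor yields a zero-mean Gaussian residual whose variance is unknown, keeps an inverse-gamma posterior per sensor, and queries the sensors whose sampled variances are smallest. The linear dynamical system and the variational inference of the talk are deliberately left out, and all names, priors and noise levels are assumptions made for this example.

```python
import random


def thompson_select(alphas, betas, m, rng):
    """Sample a variance per sensor from its posterior, pick the m smallest."""
    # If X ~ Gamma(shape=a, scale=1/b), then 1/X ~ Inverse-Gamma(a, b).
    sampled = [1.0 / rng.gammavariate(a, 1.0 / b)
               for a, b in zip(alphas, betas)]
    return sorted(range(len(sampled)), key=lambda i: sampled[i])[:m]


def update(alphas, betas, sensor, residual):
    """Conjugate update for one zero-mean Gaussian residual."""
    alphas[sensor] += 0.5
    betas[sensor] += 0.5 * residual ** 2


def simulate(true_sigmas, m=1, steps=2000, seed=0):
    """Run the selection loop and count how often each sensor is queried."""
    rng = random.Random(seed)
    k = len(true_sigmas)
    alphas = [1.0] * k   # hypothetical weak prior on every sensor
    betas = [1.0] * k
    counts = [0] * k
    for _ in range(steps):
        for s in thompson_select(alphas, betas, m, rng):
            residual = rng.gauss(0.0, true_sigmas[s])  # simulated sensor error
            update(alphas, betas, s, residual)
            counts[s] += 1
    return counts


# Three hypothetical sensors with noise standard deviations 0.2, 1.0 and 3.0.
print(simulate([0.2, 1.0, 3.0]))
```

Because the choice is driven by posterior samples rather than point estimates, unreliable sensors are still queried occasionally early on, which is how their variance estimates get refined while most queries go to the reliable sensor.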


Francois Schnitzler is a postdoctoral researcher at the Technion, working under the supervision of Professor Shie Mannor. He works on time-series modelling and event detection from heterogeneous data and crowdsourcing. He obtained his PhD in September 2012 from the University of Liege, where he studied probabilistic graphical models for large probability distributions, and in particular ensembles of Markov trees.

"The IEEE International Conference on Data Mining (ICDM) has established itself as the world's premier research conference in data mining. We invite high-quality papers reporting original research on all aspects of data mining, including applications, algorithms, software, and systems."

  • Paper submission: June 24, 2014
  • Acceptance notification: September 24, 2014
  • Conference dates: December 14-17, 2014



Workshop collocated with INFORMATIK 2014, September 22-26, Stuttgart, Germany.

This workshop focuses on the area where two branches of data analysis research meet: data stream mining, and local exceptionality detection.

Local exceptionality detection is an umbrella term describing data analysis methods that strive to find the needle in a haystack: outliers, frequent patterns, subgroups, etc. The common ground is that a subset of the data is sought where something exceptional is going on.

Data stream mining can be seen as a facet of Big Data analysis. Streaming data is not necessarily big in terms of volume per se; instead, it can be big in terms of its high throughput rate. Storing the data for later analysis is infeasible, so the relevant information of a data point has to be extracted when it arrives.


Submissions are possible as either a full paper or extended abstract. Full papers should present original studies that combine aspects of both the following branches of data analysis:

  • stream mining: extracting the relevant information from data that arrives at such a high throughput rate that analysis or even recording of the records in the data is prohibitive;
  • local exceptionality mining: finding subsets of the data where something exceptional is going on.

In addition, extended abstracts may present position statements or results of original studies concerning only one of the aforementioned branches.

Full papers can consist of a maximum of 12 pages; extended abstracts of up to 4 pages, following the LNI formatting guidelines. The only accepted format for submitted papers is PDF. Each paper submission will be reviewed by at least two members of the program committee.


Efficient Cryptography with Provable Security

We survey some recent result on efficient cryptographic protocols with the predicate of provable security, in particular focusing on symmetric authentication protocols. In turns out that in this context mathematical lattices play a crucial role for obtaining practical solutions. No deep knowledge in mathematics will be required for this talk.

On February 25th, the regional competition of Jugend forscht will be held in Dortmund at the DASA exhibition. Jugend forscht provides a platform for young researchers aged 15 to 21 to present their research ideas and projects. The collaborative research center SFB 876 again supports the event by participating in the jury. This year Stefan Michaelis will evaluate the projects in the domains of mathematics and computer science.

ACM SIGDA proudly announces that the 2014 ACM SIGDA Distinguished Service Award will be presented to Dr. Peter Marwedel in recognition of his many years of service maintaining and chairing the DATE PhD Forum.

The award will be presented at the opening ceremony of the DATE 2014, March 25 in Dresden (Germany).

OpenML: Open science in machine learning

Research in machine learning and data mining can be speeded up tremendously by moving empirical research results out of people's heads and labs, onto the network and into tools that help us structure and alter the information. OpenML is a collaborative open science platform for machine learning. Through plugins for the major machine learning environments, OpenML allows researchers to automatically upload all their experiments and organize them online. OpenML automatically links these experiments to all related experiments, and adds meta-information about the used datasets and algorithms. As such, all research results are searchable, comparable and reusable in many different ways. Beyond the traditional publication of results in papers, OpenML offers a much more collaborative, dynamic and faster way of doing research.

Supervised learning of link quality estimates in wireless networks

Eduardo Feo

Systems composed of a large number of relatively simple, and resource-constrained devices can be designed to interact and cooperate with each other in order to jointly solve tasks that are outside their own individual capabilities. However, in many applications, the emergence of the collective behavior of these systems will depend on the possibility and quality of communication among the individuals. In the particular case of wireless data communication, a fundamental and challenging problem is the one of estimating and predicting the quality of wireless links.

In this talk, I will describe our work and experiences in using supervised learning based methods to model the complex interplay among the many different factors that affect the quality of a wireless link. Finally, I will discuss application scenarios in which the prediction models are used by network protocols to derive real-time robust estimates of link qualities, and by mobile robots to perform spatial predictions of wireless links for path planning.


Eduardo Feo received his master's degrees in Software Systems Engineering at RWTH Aachen and in Informatics at the University of Trento, Italy. He is currently a Ph.D. candidate at the Dalle Molle Institute for Artificial Intelligence in Lugano, Switzerland, working on the topic Mission Planning in Heterogeneous Networked Swarms. The work is funded by the project SWARMIX - Synergistic Interactions of Swarms of Heterogeneous Agents.

His research interests include

  • Combinatorial optimization: NP problems, mathematical programming, meta-heuristics.
  • Networking: Sensor Networks, network performance modelling, link quality learning.
  • Swarm robotics: task planning/allocation in heterogeneous systems.


The British magazine "Physics World" has named the first observations of high-energy cosmic neutrinos by the IceCube neutrino telescope the "Breakthrough of the Year 2013". Scientists from Dortmund are involved.


The collaborative research center SFB 876 is back from the two-day fair Wissenswerte in Bremen. During the event the SFB's research was presented under the broader theme of Big Data - small devices. Experiments and demonstrations gave science journalists a clear view of both ends of the spectrum.

Project A4 - Plattform demonstrated the energy wasted by recent mobile network technology, made visible as excess heat. Especially during fairs and conferences, the problem of suboptimal energy management in mobile devices becomes obvious through the need to recharge often. Project B2 - Nano brought its complete system setup to Bremen and showed the full range of research challenges, from camera and detector technology to data analysis.

For big and complex data, the projects C1 - DimRed and C3 - RaumZeit delivered the background. The sheer amount of data points per patient, in contrast to the low number of severe incidences per year, depicted how important a reliable and stable analysis is for neuroblastoma risk prognosis.

The big data analysis results in C3 are also highly relevant, as the detection of high-energy neutrinos has just recently been confirmed by the IceCube collaboration.


MDL for Pattern Mining

Pattern mining is arguably the biggest contribution of data mining to data analysis with scaling to massive volumes as a close contender. There is a big problem, however, at the very heart of pattern mining, i.e., the pattern explosion. Either we get very few – presumably well-known patterns – or we end up with a collection of patterns that dwarfs the original data set. This problem is inherent to pattern mining since patterns are evaluated individually. The only solution is to evaluate sets of patterns simultaneously, i.e., pattern set mining.

In this talk I will introduce one approach to solve this problem, viz., our Minimum Description Length (MDL) based approach with the KRIMP algorithm. After introducing the pattern set problem I will discuss how MDL may help us. Next I introduce the heuristic algorithm called KRIMP. While KRIMP yields very small pattern sets, we have, of course, to validate that the results are characteristic pattern sets. We do so in two ways, by swap randomization and by classification.
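The MDL idea behind this can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions, not the KRIMP implementation: the code table is given as an ordered list of itemsets (KRIMP's candidate generation and ordering heuristics are omitted), each transaction is covered greedily with non-overlapping itemsets, and code lengths are Shannon-optimal with respect to usage. The function names `cover` and `total_size` are hypothetical.

```python
import math
from collections import Counter


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used, remaining


def total_size(database, code_table):
    """Bits for the encoded data plus bits for the code table itself."""
    usage = Counter()
    for t in database:
        used, remaining = cover(t, code_table)
        if remaining:
            raise ValueError("code table must contain every single item")
        usage.update(used)
    total_usage = sum(usage.values())

    # Baseline code lengths for single items, derived from item frequencies.
    item_counts = Counter(item for t in database for item in t)
    total_items = sum(item_counts.values())

    data_bits = 0.0
    table_bits = 0.0
    for itemset, count in usage.items():
        code_len = -math.log2(count / total_usage)
        data_bits += count * code_len
        # A table entry costs its own code plus spelling out its items.
        table_bits += code_len + sum(
            -math.log2(item_counts[i] / total_items) for i in itemset
        )
    return data_bits + table_bits
```

With a database in which the items a and b almost always occur together, a code table that lists the itemset {a, b} ahead of the singletons yields a smaller total size than the singleton-only table; this is the sense in which a pattern set is judged as a whole rather than pattern by pattern.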

Time permitting I will then discuss some of the statistical problems we have used the results of KRIMP for, such as data generation, data imputation, and data smoothing.

Short Biography

Since 2000, Arno has been Chair of Algorithmic Data Analysis at Utrecht University. After doing his PhD and spending some years as a postdoc in database research, he switched his attention to data mining in 1993, and he still hasn’t recovered. His research has been mostly in the area of pattern mining and, for about 8 years now, in pattern set mining. In the second half of the nineties he was a co-founder, chief evangelist and sometimes consultant at Data Distilleries, which by way of SPSS is now a part of IBM. He has acted as PC member, vice chair or even PC chair of many of the major conferences of the field for many years. Currently he is also on the editorial board of DMKD and KAIS.

Brian Niehoefer

With their contribution "Smart Constellation Selection for Precise Vehicle Positioning in Urban Canyons using a Software-Defined Receiver Solution", Brian Niehoefer, Florian Schweikowski and Christian Wietfeld received the coveted Best Student Paper Award at the 20th IEEE Symposium on Communications and Vehicular Technology (SCVT).

The contribution, which originated within the Collaborative Research Center 876 (Sonderforschungsbereich 876), sub-project B4, deals with a resource-efficient accuracy improvement for Global Navigation Satellite Systems (GNSS). The implementation and performance of the so-called SCS were quantified using a purpose-built software-defined GNSS receiver in more than 500 measurements with two geo-reference points on the campus of the Technical University of Dortmund. The main objective is to improve the positioning accuracy of objects in order to increase the performance and the range of possible scenarios for applications that rely on it. Examples would be a more detailed traffic prediction by detecting lane-specific events (e.g. daily road works) or more accurate swarm mobility of Unmanned Aerial Vehicles (UAVs).

The Wissenswerte fair in Bremen is the largest German conference and exhibition for journalists and science. The research center SFB 876 will present the projects A4, B2, C1 and C3 with experiments and results during the fair on 25th and 26th of November 2014.

Would you expect that some amounts of data are transported faster by ship than by satellite? Which algorithms are needed to cope with these quantities of data? And how much energy do they need? Which algorithms heat up computers beyond function, and which keep them cool? Where are the parallels between cancer treatment and astrophysics?

Questions like these will be answered by the project teams during the Wissenswerte.


Indirect Comparison of Interaction Graphs

Motivation: Over the past years, testing for differential coexpression of genes has become more and more important, since it can uncover biological differences where differential expression analysis fails to distinguish between groups. The standard approach is to estimate gene graphs in the two groups of interest by some appropriate algorithm and then to compare these graphs using a measure of choice. However, different graph estimating algorithms often produce very different graphs, and therefore have a great influence on the differential coexpression analysis.

Results: This talk presents three published proposals and introduces an indirect approach for testing the differential conditional independence structures (CIS) in gene networks. The graphs have the same set of nodes and are estimated from data sampled under two different conditions. Our test uses the entire pathplot in a Lasso regression as the information on how a node connects with the remaining nodes in the graph, without estimating the graph explicitly. The test was applied to CLL and AML data from patients with different mutational status in relevant genes. Finally, a permutation test was performed to assess differentially connected genes. Results from simulation studies are also presented.

Discussion: The strategy presented offers an explorative tool to detect nodes in a graph with the potential of a relevant impact on the regulatory process between interacting units in a complex process. The findings introduce a practical algorithm with a theoretical basis. We see our result as the first step on the way to a meta-analysis of graphs. A meta-analysis of graphs is only useful if the graphs available for aggregation are homogeneous. The assessment of homogeneity of graphs needs procedures like the one presented.

Using dynamic chain graphs to model high-dimensional time series: an application to real-time traffic flow forecasting

This seminar will show how the dynamic chain graph model can deal with the ever-increasing problems of inference and forecasting when analysing high-dimensional time series. The dynamic chain graph model is a new class of Bayesian dynamic models suitable for multivariate time series which exhibit symmetries between subsets of series and a causal drive mechanism between these subsets. This model can accommodate non-linear and non-normal time series and simplifies computation by decomposing a multivariate problem into separate, simpler sub-problems of lower dimensions. An example of its application using real-time multivariate traffic flow data, as well as potential applications of the model in other areas, will also be discussed.

The German newspaper "Süddeutsche" reports on breath analysis done in project B1. How can innovative breath analysis support disease identification and treatment? What can we derive from increased levels of acetone or ammonia in human breath?


The slides for the talk by Albert Bifet on Mining Big Data in Real Time are now available for download

Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.


With more than 7,000 employees in research, teaching and administration and its unique profile, TU Dortmund University shapes prospects for the future: the interplay of engineering and natural sciences with social and cultural sciences drives technological innovation as well as progress in knowledge and methods, from which not only the more than 30,000 students benefit.


Mining Big Data in Real Time

Albert Bifet

Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data. In this talk, we will focus on advanced techniques in Big Data mining in real time using evolving data stream techniques:

  1. using a small amount of time and memory resources, and
  2. being able to adapt to changes.

We will present the MOA software framework with classification, regression, and frequent pattern methods, the upcoming SAMOA distributed streaming software, and finally we will discuss some advanced state-of-the-art methodologies in stream mining based on the use of adaptive size sliding windows.
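The idea of an adaptive size sliding window can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the MOA implementation: it assumes values in the range [0, 1], rescans the whole window on every insertion, and uses a Hoeffding-style bound as the cut criterion. The class name `AdaptiveWindow` and the parameters `delta` and `max_size` are hypothetical.

```python
import math
from collections import deque


class AdaptiveWindow:
    """Simplified adaptive-size sliding window for values in [0, 1].

    The window grows while the data looks stationary. When the means of an
    older and a newer part of the window differ by more than a
    Hoeffding-style bound, the older part is dropped.
    """

    def __init__(self, delta=0.01, max_size=1000):
        self.delta = delta
        self.window = deque(maxlen=max_size)

    def add(self, x):
        """Insert x; return True if a change was detected and the window shrank."""
        self.window.append(x)
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head_sum = 0.0
        for split in range(1, n):
            head_sum += values[split - 1]
            n0, n1 = split, n - split
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            # Harmonic-mean sample size and the resulting cut threshold.
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                for _ in range(split):
                    self.window.popleft()
                return True
        return False

    def mean(self):
        return sum(self.window) / len(self.window) if self.window else 0.0
```

Both requirements from the list above show up here: memory is bounded by `max_size`, and after a shift in the data the window forgets the outdated part, so its mean tracks the new regime. A production-grade variant would avoid the linear rescan per element.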


Albert Bifet

Researcher in Big Data stream mining at Yahoo Labs Barcelona. He is the author of a book on Adaptive Stream Mining and Pattern Learning and Mining from Evolving Data Streams. He is one of the project leaders of the MOA software environment for implementing algorithms and running experiments for online learning from evolving data streams at the WEKA Machine Learning group at University of Waikato, New Zealand.

An Experimentation Platform for the Automatic Parallelization of R Programs

The scripting language R is popular among users in science and engineering because of its interactivity and its good libraries. For the fast processing of large amounts of data, as they arise for instance in genome analysis in bioinformatics, the R interpreter is too slow, however. It would be desirable to exploit the high performance of modern multicore processors for R, without requiring users to write parallel programs.

In this talk I show which techniques can be used to parallelize R programs automatically at runtime, transparently for the user. Our experimentation platform ALCHEMY makes it possible to analyze an R program at runtime in composable stages, to parallelize it, and to execute it on parallel backends. Techniques for automatic loop parallelization, which we have implemented as modules in ALCHEMY, serve as an example of the typical trade-offs that must be considered in R parallelization. Our measurements show that for large amounts of data, the runtime overhead of R parallelization already pays off on a commodity multicore processor.


Dr. Frank Padberg heads the research group "Automatic Parallelization" (APART) at KIT, which is jointly funded by KIT and Siemens. Besides parallelization, he does research on techniques for automatic fault detection, methods of software reliability, the mathematical optimization of software processes, and lean development techniques. Dr. Padberg was listed in the Communications of the ACM among the "Top 50 International Software Engineering Scholars".

Several workshops take place on the last day of EDBT/ICDT 2014, 28. March 2014. More information about formatting guidelines and registration can be found here.

Deadline: 7. December


CPSweek is the meeting point for leading researchers in the thriving area of cyber-physical systems. Topics of CPSweek cover a large range of scientific areas, spanning topics from computer science, physics, embedded systems, electrical engineering, control theory, as well as application disciplines such as systems biology, robotics, and medicine, to name just a few.

CPSWeek 2014 will include a workshop and tutorial day on April 14, 2014. Each workshop will provide an arena for presentations and discussions about a special topic of relevance to CPSWeek. Each tutorial will present in-depth content in a mini-course format aimed primarily at students, researchers, or attendees from industry.

Submission deadline for workshop and tutorial proposals: 29. September 2013


The International Conference on Extending Database Technology is a leading international forum for database researchers, practitioners, developers, and users to discuss cutting-edge ideas, and to exchange techniques, tools, and experiences related to data management. Data management is an essential enabling technology for scientific, engineering, business, and social communities. Data management technology is driven by the requirements of applications across many scientific and business communities, and runs on diverse technical platforms associated with the web, enterprises, clouds and mobile devices. The database community has a continuing tradition of contributing with models, algorithms and architectures, to the set of tools and applications enabling day-to-day functioning of our societies. Faced with the broad challenges of today's applications, data management technology constantly broadens its reach, exploiting new hardware and software to achieve innovative results.

EDBT 2014 invites submissions of original research contributions, as well as descriptions of industrial and application achievements, and proposals for tutorials and software demonstrations. We encourage submissions relating to all aspects of data management defined broadly, and particularly encourage work on topics of emerging interest in the research and development communities.

Deadline: 15. October 2013


The paper Spatio-Temporal Random Fields: Compressible Representation and Distributed Estimation by Nico Piatkowski (A1), Sangkyun Lee (C1) and Katharina Morik is the winner of this year's ECMLPKDD 2013 machine learning best student paper award. The ceremony takes place on Monday, September 23rd, in Prague (www.ecmlpkdd2013.org).

Nico Piatkowski Sangkyun Lee Katharina Morik

The Open Source Satellite Simulator (OS³), developed as part of the SFB 876 at the Communication Networks Institute has been officially integrated into the INET framework for Omnet++.

OS³ provides a modular system for addressing satellite-specific communication testing and research. The simulator delivers highly accurate results because it loads recent satellite orbits and the atmospheric parameters influencing signal transmission during startup. Besides the modularity and extensibility of OS³, the graphical user interface provides an easy learning curve for adapting the system to the user's needs.

The inclusion of OS³ into the INET framework is an important milestone for dissemination. Omnet++ is a widely adopted solution for simulating communication networks and, together with INET, forms the de facto research standard for the simulation of mobile networks.


The publication about Gamma-Hadron-Separation in the MAGIC Experiment by Tobias Voigt, Roland Fried, Michael Backes and Wolfgang Rhode (SFB-project C3) has received the Best Application Paper Award at the 36th annual conference of the GfKI (German Classification Society).


The MAGIC telescopes on the Canary Island of La Palma are two of the largest Cherenkov telescopes in the world, operating in stereoscopic mode since 2009. A major step in the analysis of MAGIC data is the classification of observations into a gamma-ray signal and hadronic background.
In this contribution we introduce the data provided by the MAGIC telescopes, which has some distinctive features. These features include high class imbalance, unknown and unequal misclassification costs, and the absence of reliably labeled training data. We introduce a method to deal with some of these features. The method is based on a thresholding approach and aims at minimizing the mean square error of an estimator that is derived from the classification. The method is designed to fit the special requirements of the MAGIC data.

In close collaboration with the Technion (Israel Institute of Technology), a system for the real-time analysis of soccer data was built on top of the *streams* framework for the competition of this year's DEBS conference. The task of the challenge was to compute statistics on the running and playing behavior of the players, who were equipped with motion and positioning sensors of the RedFIR system (Fraunhofer).
As part of the competition, Lehrstuhl 8 (Chair 8) together with the Technion developed the "TechniBall" system based on the *streams* framework by Christian Bockermann. TechniBall is able to process the required statistics considerably faster than real time (more than 250,000 events per second) and was voted the winner of the DEBS Challenge 2013 by the conference audience.


Two papers by SFB authors have been accepted, one of them in the journal track, where only 14 out of 182 submissions made it!

  • "Spatio-Temporal Random Fields: Compressible Representation and Distributed Estimation"
    Nico Piatkowski, Sangkyun Lee, and Katharina Morik
  • "Anomaly Detection in Vertically Partitioned Data by Distributed Core Vector Machines"
    Marco Stolpe, Kanishka Bhaduri, Kamalika Das, and Katharina Morik


The 2013 KDnuggets Software Poll was marked by a battle between RapidMiner and R for the first place. Surprisingly, commercial and free software maintained parity, with about 30% using each exclusively, and 40% using both. Only 10% used their own code - is analytics software maturing? Real Big Data is still done by a minority - only 1 in 7 used Hadoop or similar tools, same as last year.
The 14th annual KDnuggets Software Poll attracted record participation of 1880 voters, more than doubling 2012 numbers.

KDnuggets Annual Software Poll


New Algorithms for Graphs and Small Molecules:

Exploiting Local Structural Graph Neighborhoods and Target Label Dependencies

In the talk, I will present recently developed algorithms for predicting properties of graphs and small molecules: In the first part of the talk, I will present several methods exploiting local structural graph (similarity) neighborhoods: local models based on structural graph clusters, locally weighted learning, and the structural cluster kernel. In the second part, I will discuss methods that exploit label dependencies to improve the prediction of a large number of target labels, where the labels can be just binary (multi-label classification) or can again have a feature vector attached. The methods make use of Boolean matrix factorization and can be used to predict the effect of small molecules on biological systems.

The goal of the International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM 2013, is to bring together researchers from the academia and practitioners from the industry in order to address fundamentals of ubiquitous systems and the new applications related to them. The conference will provide a forum where researchers shall be able to present recent research results and new research problems and directions related to them. The conference seeks contributions presenting novel research in all aspects of ubiquitous techniques and technologies applied to advanced mobile applications.

Deadline: 17. May 2013

April 25, 2013

The IEEE International Conference on Data Mining (ICDM) has established itself as the world's premier research conference in data mining. The 13th ICDM conference (ICDM '13) provides a premier forum for the dissemination of innovative, practical development experiences as well as original research results in data mining, spanning applications, algorithms, software and systems. The conference draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems and high performance computing. By promoting high quality and novel research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state of the art in data mining. As an important part of the conference, the workshops program will focus on new research challenges and initiatives, and the tutorials program will cover emerging data mining technologies and the latest developments in data mining.

Deadline: 21. June 2013


Algorithms and Systems for Analyzing Graph-Structured Data

Data analysis, data mining and machine learning are centrally focused on algorithms and systems for producing structure from data. In recent years, however, it has become obvious that it is just as important to look at the structure already present in the data in order to produce the best possible models. In this talk, we will give an overview of a line of research we have been pursuing towards this goal over the past years, focusing in particular on algorithms for efficient pattern discovery and prediction with graphs, applied to areas such as molecule classification or mobility analysis. Especially for the latter, we will also briefly outline how visual approaches can greatly enhance the utility of algorithmic approaches.

Peter Marwedel receives EDAA award

Good news for collaborative research center 876: Peter Marwedel, vice-chair of SFB 876, received a top award for his work. He was selected as the recipient of the EDAA lifetime achievement award 2013 by the European Design and Automation Association (EDAA). The Lifetime Achievement Award is given to individuals who made outstanding contributions to the state of the art in electronic design, automation and testing of electronic systems in their life. In order to be eligible, candidates must have made innovative contributions which had an impact on the way electronic systems are being designed.

This selection of Peter Marwedel reflects his work on

  • pioneering the synthesis of hardware from algorithms,
  • the introduction of compilers which can be easily retargeted to new processors by using an explicit processor description,
  • the generation of efficient embedded systems (where efficiency metrics include the energy consumption and real-time performance),
  • education in embedded system design, and
  • recent work on cyber-physical systems.

The award was openly announced and handed over at this year’s DATE conference in Grenoble on March 19th. The press release for this announcement is available on the website of EDAA.

EDAA is a professional society supporting electronic design automation in particular in Europe. EDAA is the main sponsor of the successful DATE conference.

The EDAA Lifetime Achievement Award can be considered to be the top scientific award in the area of electronic design automation. Past recipients of the award are Kurt Antreich (TU Munich, 2003), Hugo De Man (IMEC, Leuven, 2004), Jochen Jess (TU Eindhoven, 2005), Robert Brayton (UC Berkeley, 2006), Tom W. Williams (Synopsys Inc., Mountain View, California, 2007), Ernest S. Kuh (UC Berkeley, 2008), Jan M. Rabaey (UC Berkeley, 2009), Daniel D. Gajski (UC Irvine, 2010), Melvin A. Breuer (University of Southern California, Los Angeles, 2011) and Alberto L. Sangiovanni-Vincentelli (UC Berkeley, 2012). This means that, so far, only three scientists working at European institutions have received the award. It also means that the quality of research performed at TU Dortmund is on par with that at top universities in the world.

Our collaborative research center is very proud of this international recognition of our vice chair.

Empirical analysis of statistical algorithms often demands time-consuming experiments which are best performed on high performance computing clusters. For this purpose we developed two R packages which greatly simplify working in batch computing environments.

The package BatchJobs implements the basic objects and procedures to control a batch cluster within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. An important feature is that the state of computation is persistently available in a database. The user can query the status of jobs and then continue working with a desired subset.
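The pattern described here can be illustrated with a small Python analogy. It is not the BatchJobs API (which is an R package); it only mirrors the idea of mapping a function over inputs, persisting each job's state in a database, querying jobs by status, and reducing over a chosen subset. The function names `batch_map`, `find_jobs` and `reduce_results` and the table layout are hypothetical, and jobs run sequentially in-process rather than on a cluster.

```python
import sqlite3
from functools import reduce


def batch_map(conn, func, inputs):
    """Run func on every input and persist each job's status and result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(id INTEGER PRIMARY KEY, status TEXT, result REAL)"
    )
    for job_id, x in enumerate(inputs, start=1):
        try:
            value = func(x)
        except Exception:
            # A failed job is recorded, not fatal for the whole batch.
            conn.execute("INSERT INTO jobs VALUES (?, 'error', NULL)", (job_id,))
        else:
            conn.execute("INSERT INTO jobs VALUES (?, 'done', ?)", (job_id, value))
    conn.commit()


def find_jobs(conn, status):
    """Filter step: return the ids of all jobs with the given status."""
    rows = conn.execute(
        "SELECT id FROM jobs WHERE status = ? ORDER BY id", (status,)
    )
    return [row[0] for row in rows]


def reduce_results(conn, func, job_ids, init):
    """Reduce step: fold func over the stored results of the selected jobs."""
    results = []
    for job_id in job_ids:
        row = conn.execute(
            "SELECT result FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        results.append(row[0])
    return reduce(func, results, init)
```

For example, after `batch_map(conn, lambda x: 1.0 / x, [1, 2, 0, 4])` on a `sqlite3.connect(":memory:")` connection, `find_jobs(conn, "error")` identifies the job that failed, and the remaining jobs can be reduced without rerunning anything, because the state lives in the database rather than in the session.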

The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind "apply algorithm A to problem instance P and store results". It is possible to associate statistical designs with parameters of algorithms and problems and therefore to systematically study their influence on the results.

More details, the source code, installation instructions and much more can be found on the project's web site.


Transactions chasing Instruction Locality on multicores

For several decades, online transaction processing (OLTP) has been one of the main applications that drive innovations in the data management ecosystem and in turn the database and computer architecture communities. Despite fundamentally novel approaches from industry and various research proposals from academia, the fact that OLTP workloads cannot properly exploit the modern micro-architectural features of the commodity hardware has not changed for the last 15 years. OLTP wastes more than half of its execution cycles to memory stalls and, as a result, OLTP performance deteriorates and the underlying modern hardware is largely underutilized. In this talk, I initially present the findings of our recent workload characterization studies, which advocate that the large instruction footprint of the transactions is the dominant factor in the low utilization of the existing micro-architectural resources. However, the worker threads of an OLTP system usually execute similar transactions in parallel, meaning that threads running on different cores share a non-negligible amount of instructions. Then, I show an automated way to exploit the instruction commonality among transactional threads and minimize instruction misses. By spreading the execution of a transaction over multiple cores in an adaptive way through thread migration, we enable both an ample L1 instruction cache capacity and re-use of common instructions by localizing them to cores as threads migrate.

Curriculum Vitae

Pinar Tozun is a fourth-year PhD student at Ecole Polytechnique Federale de Lausanne (EPFL), working under the supervision of Prof. Anastasia Ailamaki in the Data-Intensive Applications and Systems (DIAS) Laboratory. Her research focuses on the scalability and efficiency of transaction processing systems on modern hardware. Pinar interned at the University of Twente (Enschede, The Netherlands) during summer 2008 and at Oracle Labs (Redwood Shores, CA) during summer 2012. Before starting her PhD, she received her BSc degree from the Computer Engineering department of Koc University in 2009 as the top student.

Case-Based Reasoning:

What is it and how can it be used?

As a starting point we begin quite simply: we want to benefit from experience. What are cases here, and how are they used for reasoning? We have questions and expect answers. Earlier situations from experience are almost never identical to current situations. Logic and equality are of little help there; approximation is important. The central concept is rather similarity, of which there is admittedly an infinity of forms, and which we will discuss. Here we examine the semantics of similarity measures and their relation to utility functions.

An essential extension: similarity directly between problems and solutions. Here experiences are no longer used directly, but the techniques remain unchanged. A demo as a short interlude: we want to buy a car.

We answer the question of what qualifies a system as a CBR system by the presence of a process model and of the knowledge containers. These are introduced. In doing so, we have to contend with various difficulties: several forms of uncertainty, large amounts of data, subjectivity, and different forms of representation such as texts, images and spoken language.

R2: Biologist friendly web-based genomics analysis & visualization platform

Making the ends meet

Jan Koster (Dept. Oncogenomics, Academic Medical Center, University of Amsterdam , Amsterdam, the Netherlands)

High throughput datasets, such as microarrays are often analyzed by (bio) informaticians, and not the biologist that performed the experiment(s). With the biologist in mind as the end-user, we have developed the freely accessible online genomics analysis and visualization tool, R2 (http://r2.amc.nl).

Within R2, researchers with little or no bioinformatics skills can start working with mRNA, aCGH, ChIP-seq, methylation, up to whole genome sequence data and form/test their own hypothesis.

R2 consists of a database, storing the genomic information, coupled to an extensive set of tools to analyze/visualize the datasets. Analyses within the software are highly connected, allowing quick navigation between various aspects of the data mining process.

In the upcoming lecture, I will give an overview of the platform, provide some insights into the structure of R2, and show some examples on how we have made the ends meet to provide our users with a biologist friendly experience.

On the 14th of March, Wouter Duivesteijn visited the collaborative research center. Besides the talks with our researchers, he presented his work on Exceptional Model Mining.

Contents of the presentation: Exceptional Model Mining - Identifying Deviations in Data


Patterns that Matter -- MDL for Pattern Mining
by Matthijs van Leeuwen

Pattern mining is one of the best-known concepts in the field of exploratory data mining. A big problem, however, is that humongous amounts of patterns can be mined even from very small datasets. This hinders the knowledge discovery process, as it is impossible for domain experts to manually analyse so many patterns.

In this seminar I will show how compression can be used to address the pattern explosion. We argue that the best pattern set is that set of patterns that compresses the data best. Based on an analysis from MDL (Minimum Description Length) perspective, we introduce a heuristic algorithm, called Krimp, that approximates the best set of patterns. High compression ratios and good classification scores confirm that Krimp constructs pattern-based summaries that are highly characteristic for the data.
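
The compression idea can be illustrated with a minimal sketch of a two-part description length, L(code table) + L(data | code table). This is not Krimp itself: the greedy cover order, the way the code table is costed, and the names `cover` and `total_length` are simplifying assumptions made only for illustration.

```python
import math
from collections import Counter


def cover(transaction, patterns):
    """Greedily cover a transaction with patterns (in the given order), then singletons."""
    remaining = set(transaction)
    used = []
    for pattern in patterns:
        if pattern <= remaining:
            used.append(pattern)
            remaining -= pattern
    used.extend(frozenset([item]) for item in remaining)
    return used


def total_length(data, patterns):
    """Two-part description length in bits for a candidate pattern set (simplified)."""
    patterns = [frozenset(p) for p in patterns]
    usage = Counter(p for t in data for p in cover(t, patterns))
    total_usage = sum(usage.values())
    # Frequently used patterns get short codes (Shannon-optimal code lengths).
    code_len = {p: -math.log2(u / total_usage) for p, u in usage.items()}
    # Baseline item codes, used to spell out each pattern inside the code table.
    item_freq = Counter(item for t in data for item in t)
    n_items = sum(item_freq.values())
    item_len = {i: -math.log2(c / n_items) for i, c in item_freq.items()}
    data_bits = sum(u * code_len[p] for p, u in usage.items())
    table_bits = sum(code_len[p] + sum(item_len[i] for i in p) for p in usage)
    return table_bits + data_bits
```

Under these assumptions, a pattern set is preferred when it yields a smaller `total_length` on the same data, for example when an itemset that occurs in most transactions is added to the pattern list.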

Our MDL approach to pattern mining is very generic and can be used to take on a large number of problems in knowledge discovery. One such example is change detection in data streams. I will show how sudden changes in the underlying data distribution of a data stream can be detected using compression, and argue that this can be generalised to concept drift and other slower forms of change.


Matthijs van Leeuwen is a post-doctoral researcher in the Machine Learning group at the KU Leuven. His main interests are pattern mining and related data mining problems; how can we identify patterns that matter? To this end, the Minimum Description Length (MDL) principle and other information theoretic concepts often prove to be very useful.

Matthijs defended his Ph.D. thesis titled 'Patterns that Matter' in February 2010, which he wrote under the supervision of prof.dr. Arno Siebes in the Algorithmic Data Analysis group (Universiteit Utrecht). He received the ECML PKDD 2009 'Best student paper award', and runner-up best student paper at CIKM 2009. His current position is supported by a personal Rubicon grant from the Netherlands Organisation for Scientific Research (NWO).

He was co-chair of MPS 2010, a Lorentz workshop on Mining Patterns and Subgroups, and IID 2012, the ECML PKDD 2012 workshop on Instant and Interactive Data Mining. Furthermore, he was demo co-chair of ICDM 2012 and is currently poster chair of IDA 2013.


Exceptional Model Mining - Identifying Deviations in Data

Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is an ancient task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (subgroup discovery). These, however, do not encompass all forms of "interesting".

To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these attributes is chosen to be the target concept. Then, subsets are sought on which this model is substantially different from the model on the whole dataset. For instance, we can find parts of the data where:

  • two target attributes have an unusual correlation;
  • a classifier has a deviating predictive performance;
  • a Bayesian network fitted on several target attributes has an exceptional structure.
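
The first case in the list can be sketched as follows. This is a minimal illustration, assuming that the exceptionality of a subgroup is scored as the absolute difference between the correlation inside the subgroup and the correlation on the whole dataset; the function names and the dictionary-based row format are hypothetical and not taken from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_deviation(rows, condition, x, y):
    """Score a subgroup by how far its x-y correlation is from the global one."""
    subgroup = [r for r in rows if condition(r)]
    whole = pearson([r[x] for r in rows], [r[y] for r in rows])
    inside = pearson([r[x] for r in subgroup], [r[y] for r in subgroup])
    return abs(inside - whole)


def most_exceptional(rows, candidates, x, y):
    """Return the name of the candidate description with the largest deviation."""
    return max(
        candidates,
        key=lambda name: correlation_deviation(rows, candidates[name], x, y),
    )
```

Here `candidates` maps a readable description to a predicate over rows. A real EMM search would enumerate descriptions systematically and enforce a minimum subgroup size, which this sketch leaves out.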

We will discuss some fascinating real-world applications of EMM instances, such as using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand. Additionally, we will statistically validate whether the found local patterns are merely caused by random effects. We will simulate such random effects by mining on swap randomized data, which allows us to attach a p-value to each found pattern, indicating whether it is likely to be a false discovery. Finally, we will briefly hint at ways to use EMM for global modeling, enhancing the predictive performance of multi-label classifiers and improving the goodness-of-fit of regression models.

On February 19th, the regional competition of Jugend forscht will be held in Dortmund at the DASA exhibition. Jugend forscht provides a platform for young researchers aged 15-21 to present their research ideas and projects. For the domains mathematics and computer science, Christian Bockermann of SFB project C1 will be a member of the jury.

Peter Marwedel is honored with the EDAA (European Design and Automation Association) lifetime achievement award.

This award is given to individuals who made outstanding contributions to the state of the art in electronic design, automation and testing of electronic systems in their life. In order to be eligible, candidates must have made innovative contributions which had an impact on the way electronic systems are being designed.

The Award will be presented at the plenary session of the 2013 DATE Conference, to be held 18-22 March in Grenoble, France.


Applications of Three-Phase Traffic Theory to Intelligent Traffic Control

After a brief introduction to the research activities of Daimler AG, the talk gives an overview of Kerner's three-phase traffic theory and some of its applications. Based on traffic data measured over many years, the empirical properties of traffic breakdowns and their consequences are presented.

Understanding the spatio-temporal properties of traffic led to applications that were developed all the way to online operation. Current examples from the Car-2-X field trial SIMTD illustrate and confirm the statements and applications of this traffic theory.

Dr. Hubert Rehborn is Manager for Group Research and Advanced Engineering Telematics System Functions and Features in advance development at Daimler AG, Stuttgart.

Solutions to optimization problems in resource constrained systems

This talk explores topics that relate to methods and techniques applicable for solving optimization problems that emerge from resource constrained systems. It addresses both deterministic problems, characterized by crisp decision variables, and stochastic problems, where decisions are described by probability distributions.

The presentation will include an overview of the most popular solution methods and two novel methodologies: Randomized Search method for solving hard non-linear, non-convex combinatorial problems and generalized stochastic Petri net (GSPN) based framework for stochastic problems.

The second part of the talk focuses on solutions of exact problems. First, we address a problem of energy efficient scheduling and allocation in heterogeneous multi-processor systems. The solution uses GSPN framework to address the problem of scheduling and allocating concurrent tasks when execution and arrival times are described by probability distributions. Next, we present a Gaussian mixture model vector quantization technique for estimating power consumption in virtual environments. The technique uses architectural metrics of the physical and virtual machines (VM) collected dynamically to predict both the physical machine and per VM level power consumption.

Curriculum Vitae

Kresimir Mihic is a Senior Researcher in Modeling, Simulation and Optimization group, Oracle Labs. His work is in the area of optimization of complex systems, with specific interest in discrete optimization techniques and applications thereof on non-linear, non-convex multi-objective problems, for static and dynamic cases. Kresimir received D.Engr in Electrical Engineering from Stanford University in 2011.

The book Managing and Mining Sensor Data has been published as an ebook and will be available as hardcover from 28th of February 2013. The collaborative research center contributed to the book through the authors Marco Stolpe (project B3, Artificial Intelligence) and the guest researcher Kanishka Bhaduri, who wrote the chapter on Distributed Data Mining in Sensor Networks.

Sensor networks in particular provide data at different, distributed locations. For an efficient analysis, new technologies need to calculate results even if communication resources are constrained.


Database Joins on Modern Hardware

Computing hardware today provides abundant compute performance. But various I/O bottlenecks—which cannot keep up with the exponential growth of Moore's Law—limit the extent to which this performance can be harvested for data-intensive tasks, database tasks in particular. Modern systems try to hide these limitations with sophisticated techniques such as caching, simultaneous multi-threading, or out-of-order execution.

In the talk I will discuss whether/how database join algorithms can benefit from these sophisticated techniques. As I will show in the talk, hardware alone is not good enough to hide its own limitations. But once database algorithms are made aware of the hardware characteristics, they achieve unprecedented performance, pairing hundreds of millions of database tuples per second.

The work reported in this talk has been conducted in the context of the Avalanche project at ETH Zurich and funded by the Swiss National Science Foundation (SNSF).

Algorithms and Systems for Analyzing Graph-Structured Data

Data analysis, data mining and machine learning are centrally focused on algorithms and systems for producing structure from data. In recent years, however, it has become obvious that it is just as important to look at the structure already present in the data in order to produce the best possible models. In this talk, we will give an overview of a line of research we have been pursuing towards this goal over the past years, focusing in particular on algorithms for efficient pattern discovery and prediction with graphs, applied to areas such as molecule classification or mobility analysis. Especially for the latter, we will also briefly outline how visual approaches can greatly enhance the utility of algorithmic approaches.

OS³, the Open Source Satellite Simulator, was developed as a framework for simulating various kinds of satellite-based communication, based on OMNeT++. The objective is to create a platform that makes evaluating satellite communication protocols as easy as possible. OS³ will also be able to automatically import real satellite tracks and weather data to simulate conditions at a certain point in the past or in the future, and offer powerful visualization.

OS³ will enable a comfortable analysis of complex scenarios which may be infeasible to test in reality. Starting anywhere from calculating attenuation losses for earth-bound receivers up to complex mobility scenarios, the variety of topics is only limited by creativity. For example, users will be able to test new protocols or satellite orbits and evaluate the resulting performance pertaining to SNR, bit error rate, packet loss, round trip time, jitter, reachability, and other measures.

Since OS³ will be released under a public license and will include comprehensive documentation, users always have the possibility to add customizations. Yet another advantage is that users will be able to share their code with the community and improve the overall quality of OS³ even further. Because OS³ is operating system independent, it can also be used by anyone who is restricted to a specific operating system.

January 7, 2013

Resource-aware computing has become an increasingly active research topic. Combined with the growing interest in data mining, particularly mining big data, this puts our research centre on a successful track!

Distributed data usage control is about what happens to data once it is given away ("delete after 30 days;" "notify me if data is forwarded;" "copy at most twice"). In the past, we have considered the problem in terms of policies, enforcement and guarantees from two perspectives:

(a) In order to protect data, it is necessary to distinguish between content (a song by Elvis called "Love me Tender") and representations of that content (song.mp3; song.wav, etc.). This requires data flow-tracking concepts and capabilities in data usage control frameworks.

(b) These representations exist at different layers of abstraction: a picture downloaded from the internet exists as pixmap (window manager), as element in the browser-created DOM tree (application), and as cache file (operating system). This requires the data flow tracking capabilities to transcend the single layers to which they are deployed.

In distributed systems, it has turned out that another system can be seen as another set of abstraction layers, thus generalizing the basic model. Demo videos of this work are available at http://www22.in.tum.de/forschung/distributed-usage-control/.

In this talk, we present recent work on extending our approach to not only protecting entire data items but possibly also fractions of data items. This allows us to specify and enforce policies such as "not more than 20% of the data may leave the system", evidently leading to interesting questions about how to interpret "20%" and about what to do when the structure of data items cannot be exploited. We present a corresponding model, an implementation, and first experimental results.

The German newspaper "Ruhr Nachrichten" has published an article about the Virus-Sensor developed within project B2 of the SFB 876. The full article can be found on their website.


As nowadays massive amounts of data are stored in database systems, it becomes more and more difficult for a database user to exactly retrieve data that are relevant to him: it is not easy to formulate a database query such that, on the one hand, the user retrieves all the answers that interest him, and, on the other hand, the user does not retrieve too much irrelevant data.

A flexible query answering mechanism automatically searches for informative answers: it offers the user information that is close to (but not too far away from) what the user intended. In this talk, we show how to apply generalization operators to queries; this results in a set of logically more general queries which might have more answers than the original query.

A similarity-based or a weight-based strategy can be used to obtain only answers close to the user's interest.
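
One way to picture this mechanism is the sketch below. It assumes a query is a conjunction of attribute = value conditions, uses "drop one condition" as a single illustrative generalization operator, and ranks the additional answers by the summed weights of the original conditions they still satisfy. The operator choice, the weights, and all function names are assumptions for illustration, not details given in the talk abstract.

```python
def answers(rows, query):
    """Rows that satisfy every attribute = value condition of the query."""
    return [r for r in rows if all(r.get(a) == v for a, v in query.items())]


def generalizations(query):
    """Dropping one condition at a time yields logically more general queries."""
    for dropped in query:
        yield {a: v for a, v in query.items() if a != dropped}


def flexible_answers(rows, query, weights):
    """Return exact answers if any; otherwise ranked answers of generalized queries."""
    exact = answers(rows, query)
    if exact:
        return exact
    related = []
    for general in generalizations(query):
        for row in answers(rows, general):
            if row not in related:
                related.append(row)

    def score(row):
        # Weight-based strategy: sum the weights of the original conditions met.
        return sum(weights[a] for a, v in query.items() if row.get(a) == v)

    return sorted(related, key=score, reverse=True)
```

With a query on two attributes and no exact match, rows matching the more heavily weighted condition are listed first, and rows matching neither condition are not returned at all.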

The German newspaper "Westdeutsche Allgemeine Zeitung" has published an article about Katharina Morik. The full article can be found on their website.


Resource-Efficient Processing and Communication in Sensor/Actuator Environments

The future of computer systems will not be dominated by personal computer like hardware platforms but by embedded and cyber-physical systems assisting humans in a hidden but omnipresent manner. These pervasive computing devices can, for example, be utilized in the home automation sector to create sensor/actuator networks supporting the inhabitants of a house in everyday life.

The efficient usage of resources is an important topic at design time and operation time of mobile embedded and cyber-physical systems. Therefore, this thesis presents methods which allow an efficient use of energy and processing resources in sensor/actuator networks. These networks comprise different nodes cooperating for a smart joint control function. Sensor/actuator nodes are typical cyber-physical systems comprising sensors/actuators and processing and communication components. Processing components of today’s sensor nodes can comprise many-core chips.

This thesis introduces new methods for optimizing the code and the application mapping of the aforementioned systems and presents novel results with regard to design space explorations for energy-efficient and embedded many-core systems. The considered many-core systems are graphics processing units. The application code for these graphics processing units is optimized for a particular platform variant with the objectives of minimal energy consumption and/or of minimal runtime. These two objectives are targeted with the utilization of multi-objective optimization techniques. The mapping optimizations are realized by means of multi-objective design space explorations. Furthermore, this thesis introduces new techniques and functions for a resource-efficient middleware design employing service-oriented architectures. Therefore, a service-oriented architecture based middleware framework is presented which comprises a lightweight service orchestration. In addition to that, a flexible resource management mechanism will be introduced. This resource management adapts resource utilization and services to an environmental context and provides methods to reduce the energy consumption of sensor nodes.

The submission deadline for the WESE Workshop on Embedded and Cyber-Physical Systems Education at ESWEEK is now August 7th, 2012. For further information see http://esweek.acm.org.

The Johnson-Lindenstrauss Transform and Applications to Dimensionality Reduction

The Johnson-Lindenstrauss transform is a fundamental dimensionality reduction technique with a wide range of applications in computer science. It is given by a projection matrix that maps vectors in $\mathbb{R}^d$ to $\mathbb{R}^k$, where $k \ll d$, while seeking to approximately preserve their norm and pairwise distances. The classical result states that $k = O\left(\frac{1}{f^2} \log \frac{1}{p}\right)$ dimensions suffice to approximate the norm of any fixed vector in $\mathbb{R}^d$ to within a factor of $1 + f$ with probability at least $1 - p$, where $0 < p, f < 1$. This is a remarkable result because the target dimension is independent of $d$. The projection matrix is itself produced by a random process that is oblivious to the input vectors. We show that the target dimension bound is optimal up to a constant factor, improving upon a previous result due to Noga Alon. This is based on joint work with David Woodruff (SODA 2011).

BIO: Dr. T.S. Jayram is a manager in the Algorithms and Computation group at IBM Almaden Research Center and is currently visiting IBM India Research Lab. He is interested in the theoretical foundations of massive data sets such as data streams, and has worked on both the algorithmic aspects and their limitations. The latter has led to new techniques for proving lower bounds via the information complexity paradigm. For work in this area, he has received a Research Division Accomplishment Award in Science from IBM and was invited to give a survey talk on Information Complexity at PODS 2010.
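
As an illustration of such an oblivious random projection, the sketch below uses a matrix with independent Gaussian entries scaled by 1/sqrt(k). This is one common construction and an assumption here, since the abstract does not say which random process is meant; the function names are likewise hypothetical.

```python
import math
import random


def random_projection(d, k, seed=0):
    """Draw a k x d matrix of N(0, 1) entries scaled by 1/sqrt(k).

    The matrix is generated without looking at any input vector.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def project(matrix, vector):
    """Map a d-dimensional vector to k dimensions."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def norm(vector):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in vector))
```

Projecting two vectors with the same matrix and comparing `norm` of their difference before and after gives an empirical feel for how well pairwise distances are preserved as k grows.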

The textbook "Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems" by Prof. Dr. Peter Marwedel gets very good reviews. Embedded System Design starts with an introduction into the area and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, like real-time operating systems. The book also discusses evaluation and validation techniques for embedded systems. Furthermore, the book presents an overview of techniques for mapping applications to execution platforms. Due to the importance of resource efficiency, the book also contains a selected set of optimization techniques for embedded systems, including special compilation techniques. The book closes with a brief survey on testing.

Here are some comments:

"This is a nice book, structured and orgnized very well. It will give you a clear understanding of design of embedded system along the way. This book is far more clear and better than the "Introduction to Embedded Systems: A Cyber-Physical Systems Approach" which is published by a Berkeley professor. I would hope that my graduate school could use this book as the primary textbook in future semesters on teaching embedded system design, instead of the "Introduction to Embedded Systems: A Cyber-Physical Systems Approach". "

"My grad school class used this book to supplement and get a different type of explanation to specifically tricky concepts. We did not use it as the main book so it was not read in it's entirety. But was very different than our primary book (author is a professor from Berkley), so it served its purpose and I am glad I bought it."


There has been a spectacular advance in our capability to acquire data and in many cases the data may arrive very rapidly. Applications processing this data have caused a renewed focus on efficiency issues of algorithms. Further, many applications can work with approximate answers and/or with probabilistic guarantees. This opens up the area of design of algorithms that are significantly time and space efficient compared to their exact counterparts.

The workshop will be held on the campus of the Technical University of Dortmund, in the Department of Computer Science, as part of the SFB 876. It is planned as a five-day event from the 23rd to the 27th of July and consists only of invited talks from leading experts on the subject.

The workshop aims at bringing together leading international scientists to present and discuss recent advances in the area of streaming algorithms. In the context of the sponsoring collaborative research center on the more general topic of data analysis under resource-restrictions, such algorithms are being developed as well as applied to large-scale data sets. The workshop will give all participants the opportunity to learn from each others' knowledge and to cooperate in further research on interesting theoretical as well as applied topics related to streaming algorithms.


Moving Individually - The Internet of Things and Services in Logistics

The talk "Moving Individually - The Internet of Things and Services" gives a broad overview of the state of research and development in highly decentralized, real-time capable control of intralogistics systems in interplay with the superordinate, cloud-based Internet of Services.

Internet of Things

For logistics, the Internet of Things is first of all associated with the introduction of AutoID technologies and the storage of information on the goods or the load carrier, beyond mere identification. This unites material flow and information flow, bridges interfaces, and enables individual logistical decision-making in real time. The central goal of such developments is to master the constantly growing complexity of logistics networks through a high degree of decentralization and autonomy at the subordinate, near-real-time control level. The connection to SFB 876 arises, among other things, from the need to limit data volumes while still enabling meaningful, decentralized decisions. The Internet of Things finds a physical realization in the swarms of autonomous vehicles of the Cellular Transport Systems, which will also be presented briefly in the talk.

Internet of Services

Normative order control based on service-oriented architectures is the second essential step towards a new, adaptable logistics management. The Internet of Services is meant to ensure flexibility and dynamics beyond rigid process chains while at the same time enabling the standardization of IT and logistics services. The talk outlines some of the basic ideas that led to the Fraunhofer innovation cluster Logistics Mall - Cloud Computing for Logistics, and attempts to draw an overall picture of the Internet of Things and Services for logistics.

The summer school is organized by the PhD students of the Integrated Research Training Group (IRTG), which is part of the University’s Collaborative Research Center (CRC) SFB 944. Within the CRC, several research groups of the biology and physics departments from the Universities of Osnabrück and Münster work closely together with a common interest in studying microcompartments as basic functional units of a variety of cells. The aim of the Summer School is to bring together distinguished scientists from different disciplines for intense scientific discussions on this topic.

Our International Summer School will take place as a conference in the Bohnenkamp-Haus at the Botanical Garden from September 21st to 22nd, 2012. The panel of invited speakers is intended to represent the variety of topics and approaches, but also the common interest in studying the function and dynamics of cellular microcompartments. Interested students and scientists from Osnabrück and elsewhere are cordially invited to join the sessions. For the PhD students of our CRC, it will be a unique opportunity to get into contact with outstanding international scientists to discuss science and share insights.


Privacy Preserving Publishing of Spatio-temporal Data Sets

Spatio-temporal datasets are becoming more and more popular due to the widespread usage of GPS enabled devices, wi-fi location technologies, and location based services that rely on them. However, location, as a highly sensitive data type also raises privacy concerns. This is due to the fact that our location can be used to infer a lot about us. Therefore special attention must be paid when publishing spatio-temporal data sets. In this seminar, I will first make a general introduction to privacy preserving data publishing and then talk about some research issues regarding privacy-preserving publishing of spatio-temporal data sets together with the proposed solutions.

The European Soccer Championship 2012 has begun and everybody wants to know who will win it. A team of graduates of the collaborative research center SFB 876 tries to answer this question before each match.

Using their data mining skills, they predict the outcomes of the matches in a series of blog posts during the championship. Everybody is invited to follow the articles to see the evolution from raw data to prediction. Besides the prediction of the winning team itself, the whole process of retrieving data, training learning models and generating results is covered as well.

Join us on the journey and see whether technology will succeed or soccer stays as unpredictable as before.


The Faculty of Computer Science at Technische Universität Dortmund seeks to fill a W2 university professorship (Practical Computer Science) in Data Mining as soon as possible.

Applicants are expected to focus in research and teaching on the analysis of very large data sets, e.g., with a specialization in relational learning and applications in the life sciences, and to have an outstanding international scientific record in this area. Participation in SFB 876, Providing Information by Resource-Constrained Data Analysis, is expected, as is an appropriate contribution to undergraduate teaching in the degree programs of the Faculty of Computer Science, to the promotion of young researchers, and to the self-administration of Technische Universität Dortmund.

Applications with the usual documents are requested by 7 June 2012, addressed to the Dean of the Faculty of Computer Science,
Prof. Dr. Gabriele Kern-Isberner,
Technische Universität Dortmund,
44221 Dortmund,
Phone: 0231 755-2121,
Fax: 0231 755-2130,
Email: dekan.cs@udo.edu


Route Planning: Energy-efficient, Constraint-respecting, and fast!

While the classical problem of computing shortest paths in a graph is still an area of active research, the growing interest in energy-efficient transportation has created a large number of new and interesting research questions in the context of route planning.

How can I find the energy-optimal path from A to B for my electric vehicle (EV)? Where are the best locations for battery switch stations such that I can get anywhere with my EV? What is the shortest path from A to B which does not exceed a total height difference of 200m? For some of these problems we exhibit their inapproximability, for others we present very efficient algorithms.
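
The third question, a shortest path under a height budget, can be sketched with a label-setting search that tracks both distance and accumulated height change. This is only an illustration and not the speaker's algorithm: it assumes "total height difference" means the sum of absolute elevation changes along the path, and the function name and graph format are hypothetical. In the worst case the number of labels per node can grow large.

```python
import heapq


def constrained_shortest_path(graph, height, source, target, max_climb):
    """Length of the shortest path whose summed |height change| stays within max_climb.

    graph maps a node to a list of (neighbor, length) pairs; height maps a node
    to its elevation. Returns None if no feasible path exists.
    """
    heap = [(0.0, 0.0, source)]  # (distance, accumulated height change, node)
    labels = {}  # node -> list of settled (distance, height change) labels
    while heap:
        dist, climb, node = heapq.heappop(heap)
        if node == target:
            return dist  # labels are popped in order of increasing distance
        # Skip labels dominated by one that is no longer and no steeper.
        if any(d <= dist and c <= climb for d, c in labels.get(node, [])):
            continue
        labels.setdefault(node, []).append((dist, climb))
        for neighbor, length in graph.get(node, []):
            new_climb = climb + abs(height[neighbor] - height[node])
            if new_climb <= max_climb:  # prune paths that exceed the budget
                heapq.heappush(heap, (dist + length, new_climb, neighbor))
    return None
```

Raising `max_climb` can only shorten the returned distance or leave it unchanged, because every path feasible under a tight budget stays feasible under a looser one.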

Every year Informatica Feminale offers compact courses in Informatics (Computer Science) for women students of all types of universities and colleges as well as for women professionals interested in further training. Entering higher education, developing student careers, the transition into the labor market and lifelong academic learning are all equally in focus. National and international lecturers and students meet at the Summer University in Bremen to exchange ideas, experiment and find new concepts for Informatics and related disciplines in higher education.

The 15th International Summer University is held at the University of Bremen from Monday, 20th of August 2012 until Friday, 31st of August 2012.


Algorithmic Tools for Spectral Image Annotation and Registration

Annotating microspectroscopic images by overlaying them with stained microscopic images is an essential task required in many applications of vibrational spectroscopic imaging. This talk introduces two novel tools applicable in this context. First, an image registration approach is presented that makes it possible to locate (register) a spectral image within a larger H+E stained image, which is an essential prerequisite for annotating the spectral image. The second part introduces the interactive Lasagne annotation tool, which allows users to explore spectral images by highlighting regions sharing high spectral similarity using distance geometry.

New Lower Bounds and Algorithms in Distributed Computing

We study several classical graph-problems such as computing all pairs shortest paths, as well as the related problems of computing the diameter, center and girth of a network in a distributed setting. The model of distributed computation we consider is: in each synchronous round, each node can transmit a different (but short) message to each of its neighbors. For the above mentioned problems, the talk will cover algorithms running in time O(n), as well as lower bounds showing that this is essentially optimal. After extending these results to approximation algorithms and according lower bounds, the talk will provide insights into distributed verification problems. That is, we study problems such as verifying that a subgraph H of a graph G is a minimum spanning tree and it will turn out that in our setting this can take much more time than actually computing a minimum spanning tree of G. As an application of these results we derive strong unconditional time lower bounds on the hardness of distributed approximation for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut. Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Our result implies that there can be no distributed approximation algorithm for minimum spanning tree that is significantly faster than the current exact algorithm, for any approximation factor.

We now have access to the journal Foundations and Trends in Machine Learning. Each issue contains a 50 to 100 page tutorial or survey written by research leaders, covering important topics in machine learning.


Leysin, Switzerland, 1-6 July 2012

Deadline for grant application: 25 April, 2012
Deadline for registration: 15 May, 2012

The 2nd Summer School on Mobility, Data Mining, and Privacy is co-organized by the FP7/ICT project MODAP - Mobility, Data Mining and Privacy - and the COST Action IC0903 MOVE - Knowledge Discovery from Moving Objects. It is also supported by the FP7/Marie Curie project SEEK and by CUSO, a coordination body for universities in western Switzerland.

The specific focus of this edition is on privacy-aware social mining, i.e. how to discover the patterns and models of social complexity from the digital traces of our life, in a privacy preserving way.


Modeling User Navigation on the Web

Understanding how users navigate through the Web is essential for improving user experience. In contrast to traditional approaches, we study contextual and session-based models for user interaction and navigation. We devise generative models for sessions which are augmented by context variables such as timestamps, click metadata, and referrer domains. The probabilistic framework groups similar sessions and naturally leads to a clustering of the data. Alternatively, our approach can be viewed as a behavioral clustering where each user belongs to several clusters. We evaluate our approach on click logs sampled from Yahoo! News. We observe that the incorporation of context leads to interpretable clusterings in contrast to classical approaches. Conditioning the model on the context significantly increases the predictive accuracy for the next click. Our approach consistently outperforms traditional baseline methods and personalized user models.

Christoph Borchert, researcher at the Embedded Systems Group of Prof. Olaf Spinczyk and member of the SFB 876 project A4, received the Hans-Uhde-Award for outstanding accomplishments during his academic studies. Among other things, he wrote his master's thesis on the development of an aspect-oriented TCP/IP stack for embedded systems.

The software developed in the thesis enables memory-efficient management of TCP/IP communication sessions. The aspect-oriented approach guarantees easy reconfiguration of the stack to adapt to different application scenarios.

The Hans-Uhde-Foundation has promoted science and education since 1986. Every year, it awards prizes for outstanding academic achievements.

Optimizing Sensing: Theory and Applications

Where should we place sensors to quickly detect contamination in drinking water distribution networks? Which blogs should we read to learn about the biggest stories on the web? These problems share a fundamental challenge: How can we obtain the most useful information about the state of the world, at minimum cost?

Such sensing problems are typically NP-hard, and were commonly addressed using heuristics without theoretical guarantees about the solution quality. In this talk, I will present algorithms which efficiently find provably near-optimal solutions to large, complex sensing problems. Our algorithms exploit submodularity, an intuitive notion of diminishing returns, common to many sensing problems; the more sensors we have already deployed, the less we learn by placing another sensor. To quantify the uncertainty in our predictions, we use probabilistic models, such as Gaussian Processes. In addition to identifying the most informative sensing locations, our algorithms can handle more challenging settings, where sensors need to be able to reliably communicate over lossy links, where mobile robots are used for collecting data or where solutions need to be robust against adversaries, sensor failures and dynamic environments.

I will also present results applying our algorithms to several real-world sensing tasks, including environmental monitoring using robotic sensors, deciding which blogs to read on the web, and detecting earthquakes using community-held accelerometers.
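
The diminishing-returns idea from the abstract can be made concrete with a small example. The sketch below is not the speaker's algorithm: it is the textbook greedy rule for a coverage objective, a standard example of a monotone submodular function, for which greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum under a budget on the number of sensors. The sensor sites, the junction numbers and all function names are hypothetical.

```python
def coverage(selected, covers):
    """Number of distinct locations observed by the selected sensors."""
    observed = set()
    for sensor in selected:
        observed |= covers[sensor]
    return len(observed)


def greedy_placement(covers, budget):
    """Pick up to `budget` sensors, always adding the one with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        base = coverage(selected, covers)
        best, best_gain = None, 0
        for sensor in sorted(covers):  # sorted for a deterministic tie-break
            if sensor in selected:
                continue
            gain = coverage(selected + [sensor], covers) - base
            if gain > best_gain:
                best, best_gain = sensor, gain
        if best is None:  # no remaining sensor adds information
            break
        selected.append(best)
    return selected


# Hypothetical toy instance: candidate sites and the pipe junctions each one observes.
covers = {
    "s1": {1, 2, 3, 4},
    "s2": {3, 4, 5},
    "s3": {5, 6},
    "s4": {1, 6},
}
placement = greedy_placement(covers, budget=2)
```

After "s1" is chosen, the gain of "s2" drops from three junctions to one, which is the diminishing-returns effect: the more sensors are already deployed, the less another one adds.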

Big data in machine learning is the future. But how can data analysis cope with limited resources such as computational power, data distribution, energy or memory?

From 4th to 7th of September, the TU Dortmund University, Germany, will host this summer school on resource-aware machine learning. Further information and online registration at: http://sfb876.tu-dortmund.de/SummerSchool2012

Topics of the lectures include: Mining of ubiquitous data streams, criteria for efficient model selection or dealing with energy constraints... The theoretical lessons are accompanied by exercises and practical introductions: Analysis with RapidMiner and R, massively parallel programming with CUDA. A Data Mining Competition lets you test your machine learning skills on real world smartphone data.

The summer school is open to international PhD and advanced master's students who want to learn cutting-edge techniques for machine learning with constrained resources.

Excellent students may apply for a student grant supporting travel and accommodation. Deadline for application is 1st of June.

February 28, 2012

The IEEE International Conference on Data Mining (ICDM) has established itself as a premier research conference in data mining. It provides a leading forum for the presentation of original research results, as well as exchange and dissemination of innovative ideas, drawing researchers and practitioners from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases, visualization, high performance computing, and so on. By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference will feature invited talks from research and industry leaders, as well as workshops, tutorials, panels, and the ICDM data mining contest.

Deadline: June 18th, 2012


The Ditmarsch Tale of Wonders - the dynamics of lying

We propose a dynamic logic of lying, wherein a lie is an action inducing the transformation of an information structure encoding the uncertainty of agents about their beliefs. We distinguish the treatment of an outside observer who is lying to an agent that is modelled in the system, from the case of one agent who is lying to another agent, and where both are modelled in the system. We also model bluffing, how to incorporate unbelievable lies, and lying about modal formulas. For more information, see http://arxiv.org/abs/1108.2115

The buzzword of our time, “sustainability”, is closely related to a book published 40 years ago, in 1972: “The Limits to Growth” written by an MIT project team involving Donella and Dennis Meadows. Using computer models in an attempt to quantify various aspects of the future, “Limits to Growth” has shaped new modes of thinking. The book became a bestseller and is still frequently cited when it comes to analyzing growth related to finite resources.

Objectives of the Winter School

In order to give fresh impetus to the debate, the Volkswagen Foundation aims to foster new thinking and the development of different models in all areas related to the “Limits to Growth” study at the crossroads of natural and social sciences. The Winter School “Limits to Growth Revisited” is directed specifically at 60 highly talented young scholars from all related disciplines. The Foundation intends to grant this selected group of academics the opportunity to create networks with scholars from other research communities.


Network Design and In-network Data Analysis for Energy-efficient Wireless Sensor Networks of Bridge-Monitoring Applications

In this talk, I will focus on the network design and in-network data analysis issues for energy-efficient wireless sensor networks (WSN) in the context of bridge monitoring applications. First, I will introduce the background of our research, a project funded by the U.S. National Science Foundation. Then I will discuss the history of the critical communication radius problem in wireless sensor network design, and explain our result of determinate upper and lower bounds of the critical radius for the connectivity of bridge-monitoring WSN in detail. Finally I will describe a distributed in-network data analysis algorithm for energy-efficient WSN performing iterative modal identification in bridge-monitoring applications.

Together with Kanishka Bhaduri and Hillol Kargupta, Katharina Morik has edited a special issue of the international journal Data Mining and Knowledge Discovery. The special issue on Data Mining for Sustainability, including a comprehensive introduction, is now online at http://www.springerlink.com/.


The Faculty of Computer Science at TU Dortmund University seeks to fill, as soon as possible, a W3 university professorship (Computer Engineering) in Methodology of Embedded Systems (succeeding Peter Marwedel).

Applicants should focus their research and teaching on computer and system architecture, its optimization (e.g. with respect to energy efficiency) or its application (e.g. in logistics), and should have an outstanding international scientific record in this area. Participation in the SFB 876, Providing Information by Resource-Constrained Data Analysis, is expected, as is an appropriate contribution to undergraduate teaching in the degree programs of the Faculty of Computer Science, to the promotion of young researchers, and to the academic self-administration of TU Dortmund University.

Applications with the usual documents are requested by 16 February 2012, addressed to the Dean of the Faculty of Computer Science,
Prof. Dr. Gabriele Kern-Isberner,
Technische Universität Dortmund,
44221 Dortmund,
Phone: 0231 755-2121,
Fax: 0231 755-2130,
Email: dekan.cs@udo.edu


The Faculty of Computer Science at TU Dortmund University seeks to fill, as soon as possible, a W3 university professorship (Practical Computer Science) in Databases and Information Systems (succeeding Joachim Biskup).

Applicants should represent the field of databases and information systems in research and teaching, ideally with a focus on the management of very large data sets, and should have an outstanding international scientific record in this area. Participation in the SFB 876, Providing Information by Resource-Constrained Data Analysis, is expected, as is an appropriate contribution to undergraduate teaching in the degree programs of the Faculty of Computer Science, to the promotion of young researchers, and to the academic self-administration of TU Dortmund University.

Applications with the usual documents are requested by 16 February 2012, addressed to the Dean of the Faculty of Computer Science,
Prof. Dr. Gabriele Kern-Isberner,
Technische Universität Dortmund,
44221 Dortmund,
Phone: 0231 755-2121,
Fax: 0231 755-2130,
Email: dekan.cs@udo.edu


KI 2012, the 35th German Conference on Artificial Intelligence, taking place in Saarbrücken (Germany) from September 24th to 27th, invites original research papers, as well as workshop and tutorial proposals from all areas of AI, its fundamentals, its algorithms, and its applications. Together with the main conference, it aims at organizing a small number of high-quality workshops suitable for a large percentage of conference participants, including graduate students as well as experienced researchers and practitioners.


The slides of the presentation by Piero Bonatti on "Confidentiality policies on the semantic web: Logic programming vs. Description logics" are now available for download.

Presentation abstract:

An increasing amount of information is being encoded via ontologies and knowledge representation languages of some sort. Some of these knowledge bases are encoded manually, while others are generated automatically by information extraction techniques. In order to protect the confidentiality of this information, a natural choice consists in encoding access control policies with the ontology language itself. This approach led to so-called "semantic web policies".


The first year of SFB 876 ends with a selection of presentations during our Christmas Topical Seminar:

  • One year SFB - Retrospect and future (Katharina Morik)
  • Star Trek 876 (Olaf Spinczyk)
  • Computer Engineers and Christmas Gifts - Like Cats and Dogs (Stefan Michaelis)
  • All around the world: Marshall islands and Micronesia (Peter Marwedel)

The Christmas party of the Faculty of Computer Science starts afterwards in front of the lecture hall.

December 12, 2011

Shortly after its installation on 11th October, FACT (First G-APD Cherenkov Telescope) yielded its first data. These data are used in project C3. FACT was developed in collaboration with TU Dortmund, the University of Wuerzburg, ETH Zurich and others. It is able to take 109 pictures per second. Further details can be found in the article.


The international summer university will take place from August 20th to August 31st 2012 in the Department for Mathematics and Informatics.
Women experts from science and practice may submit their contributions concerning recent or basic topics from the field of Computer Science until January 31st 2012. Proposals from the broad array of Informatics and its interdisciplinary relations are welcome. We are also looking for lecturers with contributions concerning studying, working and career. Informatica Feminale is part of the regular course program at the University of Bremen. Therefore, teaching assignments can be given to lecturers. A program committee will select the contributions. Course languages are German and English.
There will be several opportunities for lectures and presentations during the summer university, for which we are also looking for contributions. Presentations with a length of 30 to 60 minutes from lecturers of all fields are welcome.
We would like to draw the attention of interested human resources representatives to the career fair 'Jobforum' of both Informatica Feminale and Ingenieurinnen-Sommeruni on August 22nd 2012. Furthermore, there will be various chances to talk to graduates during the whole summer university.
Informatica Feminale is a place for experimentation, intended to develop and apply new impulses in Informatics (Computer Science). It also aims at professional networking among students as well as extra-occupational training of women computer scientists on an academic level.
Please forward this Call for Contributions to interested colleagues, co-workers and students.
Further information and the application form can be found here:


All female students who will soon write their theses, women interested in doing a PhD, PhD students and those who are already postdocs are invited to the event female.2.enterprises on December 6th 2011, from 9.30 am to 4 pm, at the TechnologieZentrumDortmund.
This event offers detailed and personal insight into companies and contact with them, talks with experts, and workshops for learning soft skills.


The Cross-Layer Multi-Dimensional Design Space of Power, Reliability, Temperature and Voltage in Highly Scaled Geometries

This talk addresses the notion of error-awareness across multiple abstraction layers – application, architectural platform, and technology – for next generation SoCs. The intent is to allow exploration and evaluation of a large, previously invisible design space exhibiting a wide range of power, performance, and cost attributes. To achieve this one must synergistically bring together expertise at each abstraction layer: in communication/multimedia applications, SoC architectural platforms, and advanced circuits/technology, in order to allow effective co-design across these abstraction layers. As an example, one may investigate methods to achieve acceptable QoS at different abstraction levels as a result of intentionally allowing errors to occur inside the hardware with the aim of trading that off for lower power, higher performance and/or lower cost. Such approaches must be validated and tested in real applications. An ideal context for the convergence of such applications is handheld multimedia communication devices in which a WCDMA modem and an H.264 encoder must co-exist, potentially with other applications such as imaging. These applications have a wide scope, execute in highly dynamic environments and present interesting opportunities for tradeoff analysis and optimization. We also demonstrate how error awareness can be exploited at the architectural platform layer through the implementation of error-tolerant caches that can operate at very low supply voltage.

Fay: Extensible Distributed Software Tracing from OS Kernels to Clusters

In this talk, I present Fay, a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, and Fay can be applied to running applications and operating system kernels without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events.

We have implemented the Fay tracing platform for the Windows operating system and integrated it with two powerful, expressive systems for distributed programming. I will demonstrate the generality of Fay tracing, by showing how a range of existing tracing and data-mining strategies can be specified as Fay trace queries. Next, I will present experimental results using Fay that show that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Finally, I will show how Fay automatically derives optimized query plans and code for safe extensions from high-level trace queries that can equal or even surpass the performance of specialized monitoring tools.

November 18, 2011

On 9 December, a conference of DPPD (Dortmunder politisch-philosophische Diskurse) on the topic "Freedom and Security" will take place at TU Dortmund. Among other things, it will address aspects of data protection. It starts at 10 am and ends around 4 pm. For further details on the schedule, directions and registration, see the flyer.


Confidentiality policies on the semantic web: Logic programming vs. Description logics

An increasing amount of information is being encoded via ontologies and knowledge representation languages of some sort. Some of these knowledge bases are encoded manually, while others are generated automatically by information extraction techniques. In order to protect the confidentiality of this information, a natural choice consists in encoding access control policies with the ontology language itself. This approach led to so-called "semantic web policies". The semantic web is founded on two knowledge representation languages: description logics and logic programs. In this talk we compare their expressive power as *policy* representation languages, and argue that logic programming approaches are currently more mature than description logics, although this picture may change in the near future.


Examination of possible approaches to signal quantification for the PAMONO method

Tim Ruhe will present joint work with Katharina Morik within the IceCube collaboration (member: Wolfgang Rhode) at the International conference Astronomical Data Analysis Software & Systems XXI taking place in Paris, 6-10 November 2011. The title is "Data Mining Ice Cubes".


Time series data arise in diverse applications and their modeling poses several challenges to the data analyst. This track is concerned with the use of time series models and the associated computational methods for estimating them and assessing their fit. Special attention will be given to more recently proposed methods and models whose development made it possible to tackle data structures that cannot be modeled by standard methodology. Examples arise in finance, marketing, medicine, meteorology, etc.


Compressive Sensing (sparse recovery) predicts that sparse vectors can be recovered from what was previously believed to be highly incomplete linear measurements. Efficient algorithms such as convex relaxations and greedy algorithms can be used to perform the reconstruction. Remarkably, all good measurement matrices known so far in this context are based on randomness. Recently, it was observed that similar findings also hold for the recovery of low rank matrices from incomplete information, and for the matrix completion problem in particular. Again, convex relaxations and randomness are crucial ingredients. The talk gives an introduction to and overview of sparse and low rank recovery, with emphasis on results due to the speaker.

Cartification: from Similarities to Itemset Frequencies

Suppose we are given a multi-dimensional dataset. For every point in the dataset, we create a transaction, or cart, in which we store the k-nearest neighbors of that point for one of the given dimensions. The resulting collection of carts can then be used to mine frequent itemsets; that is, sets of points that are frequently seen together in some dimensions. Experimentation shows that finding clusters, outliers, cluster centers, or even subspace clustering becomes easy on the cartified dataset using state-of-the-art techniques in mining interesting itemsets.
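
The cart construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: each cart is assumed to contain the point itself, ties are broken by point index, and the data set and the names cartify and support are hypothetical.

```python
def cartify(points, k):
    """Build one cart per (dimension, point): the indices of the k points closest
    to that point along that single dimension (the point itself is included)."""
    carts = []
    for d in range(len(points[0])):
        for p in points:
            order = sorted(range(len(points)),
                           key=lambda j: (abs(points[j][d] - p[d]), j))
            carts.append(frozenset(order[:k]))
    return carts


def support(itemset, carts):
    """Number of carts that contain every point of the itemset."""
    return sum(1 for cart in carts if itemset <= cart)


# Hypothetical 2-D data: points 0 and 2 are close in both dimensions,
# points 3 and 4 only in the first one.
points = [(0, 0), (1, 50), (2, 1), (50, 51), (51, 2)]
carts = cartify(points, k=3)
support_0_2 = support(frozenset({0, 2}), carts)
support_0_3 = support(frozenset({0, 3}), carts)
```

Points that are neighbors in several dimensions end up together in many carts, so the pair {0, 2} has a high support while {0, 3} never co-occurs; a frequent itemset miner run on the carts would surface such groups.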


The Next Generation of Data Mining (NGDM) Event Series explores emerging issues in the field of data mining by bringing together researchers and practitioners from different fields. NGDM 2011 is co-located with ECML PKDD 2011.


The Maxine Research Virtual Machine

The Maxine project is run at Oracle Labs and aims at providing a JVM that is binary compatible with the standard JVM while being implemented (almost) completely in Java. Since the open source release of the Maxine VM, it has progressed to the point where it can now run application servers such as Eclipse and Glassfish. With the recent addition of a new compiler that leverages the mature design behind the HotSpot server compiler (aka C2), the VM is on track to deliver performance on par with the HotSpot VM. At the same time, its adoption by VM researchers and enthusiasts is increasing. That is, we believe the productivity advantages of system level programming in Java are being realized. This talk will highlight and demonstrate the advantages of both the Maxine architecture and of meta-circular JVM development in general.


The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition. KDD-2011 will run from August 21-24 in San Diego, CA and will feature hundreds of practitioners and academic data miners converging on the one location.


As part of project C1 - Feature selection in high dimensional data for risk prognosis in oncology - several new feature selection algorithms have been developed and publicly released. During his visit at the SFB, Viswanath Sivakumar implemented these algorithms as an extension to RapidMiner. The implementations are available for download on Sourceforge: RM-Featselext

  • Fast Correlation Based Filter (FCBF)
  • Shrunken Centroids – Prediction Analysis for Microarrays (PAM)
  • Backward Elimination via Hilbert-Schmidt Independence Criterion (BAHSIC)
  • Dense Relevance Attribute Group Selector (DRAGS)
  • Consensus Group Stable Feature Selector (CGS)


A report about the SFB's work including presentation of exemplary projects has been published in the newsletter of the MODAP-Project, privacy on the move. MODAP focuses on preserving privacy for mobility data in mobile networks. The newsletter can be found as a PDF on the MODAP website.


Large media collections rapidly evolve on the World Wide Web. In addition to targeted retrieval as performed by search engines, browsing and explorative navigation are important. Since the collections grow fast and authors most often do not annotate their web pages according to a given ontology, automatic structuring is in demand as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigation in a large collection of high-dimensional data. We express desired properties of frequent termset clustering (FTS) in terms of objective functions. In general, these functions are conflicting. This leads to the formulation of FTS clustering as a multi-objective optimization problem. The optimization is solved by a genetic algorithm. The result is a set of Pareto-optimal solutions. Users may choose their favorite type of structure for their navigation through a collection or explore the different views given by the different optimal solutions. We explore the capability of the new approach to produce structures that are well suited for browsing on a social bookmarking data set.
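
To illustrate what a set of Pareto-optimal solutions means when objectives conflict, the following sketch filters a handful of candidate clusterings by dominance. It covers only this final selection step, not the genetic algorithm or the FTS objective functions from the paper; the candidate names and the two objective scores (both to be maximised) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


# Hypothetical scores (objective_1, objective_2) of five candidate clusterings.
candidates = {
    "A": (0.9, 0.2),
    "B": (0.6, 0.6),
    "C": (0.5, 0.5),  # dominated by B
    "D": (0.2, 0.9),
    "E": (0.9, 0.1),  # dominated by A
}
front = pareto_front(candidates)
```

The remaining candidates A, B and D are mutually incomparable: each is best under a different trade-off, which is why the choice among them is left to the user.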


The workshop is about "IT-Applications in the Ion Mobility Spectrometry – State of the technology, challenges and new features". The focus is on TB1 as well as the cooperation with TU Dortmund, B&S Analytik, KIST Europe and MPII / University of Saarbrücken. The workshop starts on 3.8.2011 at 3pm and ends on 4.8.2011 at 1pm. It takes place at KIST Europe, Campus E7 1, 66123 Saarbrücken. For information on the work at KIST Europe and how to get there please visit www.kist-europe.com.


The slides of Gerd Brewka's speech on "Multi-Context Systems: Integrating Heterogeneous Knowledge Bases" are now available.


Prof. Peter Marwedel (leader of the SFB 876 subprojects A3, A4 and B2) runs a tutorial on "Embedded System Foundations of Cyber-Physical Systems" in Beijing on August 8th 2011. For further information see http://www.artist-embedded.org/artist/Schedule,2321.html.


The next workshop on embedded system education will take place in Taipei on Oct. 13th, 2011 (during ESWEEK). The paper submission deadline is approaching. Please submit your paper by July 22nd. Details are enclosed.


The bio.dortmund event on the 28th of September, starting at 10:00, brings together regional players in biotechnology. At the Leibniz-Institut für Analytische Wissenschaften ISAS Dortmund, presentations and posters showcase recent research in biotechnology.
The SFB 876 presents a short introduction to data analysis in biomedical applications.


Energy-Aware COmputing (EACO) Beyond the State of the Art

Purpose: To bring together researchers and engineers with interests in energy-aware computing for discussions to identify intellectual challenges that can be developed into collaborative research projects. We strive to go significantly beyond the state of the art.


This workshop intends to bring together researchers from different research areas such as bioinformatics, biostatistics and systems biology, who are interested in modeling and analysis of biological systems or in the development of statistical methods with applications in biology and medicine.


In October the SFB will hold its internal workshop on the latest research results. The recent advances in resource-constrained data analysis will be presented, along with hands-on sessions on tools and methodology.
(Agenda download SFB876-members only)

Strategies for Scaling Data Mining Algorithms

In today's world, data is collected and generated at an enormous rate in a variety of disciplines, from mechanical systems (e.g. airplanes and cars) and sensor networks to Earth sciences and social networks (e.g. Facebook). Many of the existing data analysis algorithms do not scale to such large datasets. In this talk, I will first discuss a technique for speeding up such algorithms by distributing the workload among the nodes of a cluster of computers or the cores of a multicore computer. Then, I will present a highly scalable distributed regression algorithm that relies on this technique, adapts to changes in the data, and converges to the correct result. If time permits, I also plan to discuss a scalable outlier detection algorithm which is at least an order of magnitude faster than the existing methods. All of the algorithms that I discuss offer provable correctness guarantees compared to a centralized execution of the same algorithm.

Regression Algorithms for Large Scale Earth Science Data

There has been a tremendous increase in the volume of Earth Science data over the last decade. Data is collected from modern satellites, in-situ sensors and different climate models. Extracting information from such rich data sources using advanced data mining and machine learning techniques is a challenging task due to their massive volume. My research focuses on developing highly scalable machine learning algorithms, often using distributed computing setups such as parallel/cluster computing. In this talk I will discuss regression algorithms for very large data sets from the Earth Science domain. Although simple linear regression techniques are based on decomposable computation primitives, and are therefore easily parallelizable, they fail to capture the non-linear relationships in the training data.
In this talk, I will describe Block-GP, a scalable Gaussian Process regression framework for multimodal data that can be an order of magnitude more scalable than existing state-of-the-art nonlinear regression algorithms.
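
To illustrate what "decomposable computation primitives" can mean for simple linear regression, here is a minimal sketch that fits a one-variable least-squares line from per-partition sums. Each partition is summarized independently (for example, on a different node), and the summaries are simply added. This is not Block-GP and not the distributed regression algorithm from the talks; the names `Stats`, `local_stats`, and `fit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for a one-variable least-squares fit."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # The statistics are plain sums, so merging two partitions is addition.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def local_stats(points):
    """Summarize one partition of (x, y) pairs; can run on any node."""
    s = Stats()
    for x, y in points:
        s = s.merge(Stats(1, x, y, x * x, x * y))
    return s


def fit(stats):
    """Closed-form slope and intercept from the merged statistics."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


# Two partitions of points lying on y = 2x + 1 (made-up data).
partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7), (4, 9)]]
merged = Stats()
for part in partitions:
    merged = merged.merge(local_stats(part))
slope, intercept = fit(merged)
```

Because the merge is associative and commutative, the partitions can be combined in any order, and the result matches a fit over the pooled data.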


Multi-Context Systems: A Flexible Approach for Integrating Heterogeneous Knowledge Sources

In this talk we give an overview of multi-context systems (MCS) with a special focus on their recent nonmonotonic extensions. MCS provide a flexible, principled account of integrating heterogeneous knowledge sources, a task that is becoming more and more relevant. By a knowledge source we mean a knowledge base (KB) formulated in any of the typical knowledge representation languages, including classical logic, description logics, modal or temporal logics, but also nonmonotonic formalisms like logic programs under answer set semantics or default logic. The basic idea is to describe the information flow among different KBs declaratively, using so-called bridge rules. The semantics of MCS is based on the definition of an equilibrium. We will motivate the need for such systems, describe what has been achieved in this area, discuss work in progress and introduce generalizations of the existing framework which we consider useful.
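
As a toy illustration of bridge rules and equilibria, the following sketch makes a strong simplifying assumption: every context uses a trivial logic in which the only acceptable belief set of a knowledge base is the set of its atoms. A bridge rule adds its head to a context when its positive body holds and its negative body does not hold in the other contexts, and an equilibrium is a belief state that reproduces itself under the bridge rules. The rule encoding and the names `applicable` and `equilibria` are hypothetical, and the brute-force search is only suitable for tiny examples.

```python
from itertools import combinations


def applicable(rule, state):
    """A rule fires if all positive body atoms hold and no negative one does."""
    _, _, pos, neg = rule
    return (all(atom in state[ctx] for ctx, atom in pos)
            and not any(atom in state[ctx] for ctx, atom in neg))


def equilibria(kbs, rules):
    """Enumerate belief states that are stable under the bridge rules.

    kbs:   dict mapping context name -> set of atoms (trivial context logic)
    rules: list of (context, head, positive_body, negative_body), where each
           body is a list of (context, atom) pairs
    """
    found = []
    for size in range(len(rules) + 1):
        for chosen in combinations(rules, size):
            # Candidate state: each KB plus the heads of the guessed rules.
            state = {ctx: set(kb) for ctx, kb in kbs.items()}
            for ctx, head, _, _ in chosen:
                state[ctx].add(head)
            # Recompute which heads are actually justified in that state.
            derived = {ctx: set(kb) for ctx, kb in kbs.items()}
            for rule in rules:
                if applicable(rule, state):
                    derived[rule[0]].add(rule[1])
            if derived == state and state not in found:
                found.append(state)
    return found


# Context B concludes "flies" if A believes "bird" and C does not believe "penguin".
rules = [("B", "flies", [("A", "bird")], [("C", "penguin")])]
normal = equilibria({"A": {"bird"}, "B": set(), "C": set()}, rules)
exception = equilibria({"A": {"bird"}, "B": set(), "C": {"penguin"}}, rules)
```

The negative body atom is what makes the example nonmonotonic: adding "penguin" to context C withdraws the conclusion "flies" in context B.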


Network Coding for resource-efficient operation of mobile clouds

The mobile communication architecture is changing dramatically: formerly fully centralized systems are giving way to mobile devices that connect to each other, forming so-called mobile clouds. One of the key technologies for mobile clouds is network coding. Network coding changes how mobile communication systems will be designed in the future. In contrast to source or channel coding, network coding is not end-to-end oriented but allows on-the-fly recoding. The talk will advocate the need for network coding in mobile clouds.
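
The idea of recoding at intermediate nodes can be sketched with linear network coding over GF(2), where combining packets is a plain XOR. In the sketch below, a relay mixes coded packets without decoding them first, and the receiver recovers the sources by Gaussian elimination. The packet layout (an integer bitmask of coefficients plus a payload) and the function names are assumptions made for illustration; they are not taken from the talk.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally long payloads."""
    return bytes(x ^ y for x, y in zip(a, b))


def combine(packets, selection):
    """XOR the selected coded packets; their coefficient bitmasks are XORed too."""
    coeffs = 0
    payload = bytes(len(packets[0][1]))
    for i in selection:
        c, p = packets[i]
        coeffs ^= c
        payload = xor_bytes(payload, p)
    return coeffs, payload


def decode(coded, k):
    """Gaussian elimination over GF(2); returns the k sources, or None if rank < k."""
    pivots = {}  # pivot bit -> (coeffs, payload), kept in reduced form
    for coeffs, payload in coded:
        # Reduce the incoming packet by the pivots found so far.
        for bit, (pc, pp) in pivots.items():
            if (coeffs >> bit) & 1:
                coeffs ^= pc
                payload = xor_bytes(payload, pp)
        if coeffs == 0:
            continue  # linearly dependent packet, carries nothing new
        new_bit = coeffs.bit_length() - 1
        # Clear the new pivot bit from the existing pivot rows.
        for bit, (pc, pp) in list(pivots.items()):
            if (pc >> new_bit) & 1:
                pivots[bit] = (pc ^ coeffs, xor_bytes(pp, payload))
        pivots[new_bit] = (coeffs, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]


sources = [b"AAAA", b"BBBB", b"CCCC"]
uncoded = [(1 << i, s) for i, s in enumerate(sources)]
# The sender emits three coded packets.
sent = [combine(uncoded, sel) for sel in ([0, 1], [1, 2], [0])]
# The relay recodes what it received without decoding it first.
relayed = [combine(sent, sel) for sel in ([0, 1], [1, 2], [0])]
recovered = decode(relayed, k=3)
```

Practical systems typically draw the combinations at random and often work over larger finite fields; the fixed selections here only keep the example deterministic.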

Graphics processor (GPU) architectures have evolved rapidly in recent years with increasing performance demanded by 3D graphics applications such as games. However, challenges exist in integrating complex GPUs into mobile devices because of power and energy constraints, motivating the need for energy efficiency in GPUs. While a significant amount of power optimization research effort has concentrated on the CPU system, GPU power efficiency is a relatively new and important area because the power consumed by GPUs is similar in magnitude to CPU power. Power and energy efficiency can be introduced into GPUs at many different levels:
(i) Hardware component level - queue structures, caches, filter arithmetic units, interconnection networks, processor cores, etc., can be optimized for power.
(ii) Algorithm level - the deep and complex graphics processing computation pipeline can be modified to be energy aware. Shader programs written by the user can be transformed to be energy aware.
(iii) System level - co-ordination at the level of task allocation, voltage and frequency scaling, etc., requires knowledge and control of several different GPU system components.
We outline two strategies for applying energy optimizations at different levels of granularity in a GPU.
(1) Texture Filter Memory is an energy-efficient augmentation of the standard GPU texture cache hierarchy. Instead of a regular data cache hierarchy, we employ a small first-level register-based structure that is optimized for the relatively predictable memory access stream in the texture filtering computation. Power is saved by avoiding the expensive tag lookup and comparisons present in regular caches. Further, the texture filter memory is a very small structure, whose access energy is much smaller than that of a data cache of similar performance.
(2) Dynamic Voltage and Frequency Scaling (DVFS), an established energy management technique, can be applied in GPUs by first predicting the workload in a given frame and, where sufficient slack exists, lowering the voltage and frequency levels so as to save energy while still completing the work within the frame rendering deadline. We apply DVFS in a tiled graphics renderer, where the workload prediction and voltage/frequency adjustment are performed at tile-level granularity. This creates opportunities for on-the-fly correction of prediction inaccuracies, ensuring high frame rates while still delivering low power.
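
The tile-level DVFS idea can be sketched as a small simulation: before each tile, pick the lowest frequency at which the predicted remaining work still fits into the remaining frame time, then account for the cycles the tile actually took. This is a simplified sketch and not the predictor or controller from the talk; the frequency levels, cycle counts, deadline, and function names are made-up assumptions.

```python
def pick_frequency(remaining_cycles, remaining_time, levels):
    """Lowest frequency that finishes the predicted work in time, else the highest."""
    for freq in sorted(levels):
        if remaining_cycles / freq <= remaining_time:
            return freq
    return max(levels)


def render_frame(predicted, actual, deadline, levels):
    """Simulate one frame; returns the frequency used per tile and the total time.

    predicted, actual: cycles per tile (prediction vs. what really happened)
    deadline:          frame time budget, in the same time unit as 1 / frequency
    """
    elapsed = 0.0
    chosen = []
    for i, real_cycles in enumerate(actual):
        # Re-plan before every tile, so earlier mispredictions get corrected.
        freq = pick_frequency(sum(predicted[i:]), deadline - elapsed, levels)
        chosen.append(freq)
        elapsed += real_cycles / freq
    return chosen, elapsed


levels = [100, 200, 400]      # hypothetical cycles per millisecond
predicted = [400, 400, 400]   # predicted cycles per tile
actual = [400, 600, 400]      # the second tile turns out to be heavier
chosen, elapsed = render_frame(predicted, actual, deadline=12.0, levels=levels)
```

In this run the underestimated second tile uses up the slack, so the controller raises the frequency for the last tile and the frame still finishes within the 12 ms budget.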


The planned presentation by Prof. Bonatti has to be canceled for personal reasons.


We observe that in diverse applications ranging from stock trading to traffic monitoring, data streams are continuously monitored by multiple analysts for extracting patterns of interest in real time. Such complex pattern mining requests cover a broad range of popular mining query types, including detection of clusters, outliers, nearest neighbors, and top-k requests. These analysts often submit similar pattern mining requests, customized with different parameter settings. In this work, we exploit a classical principle of core database technology, namely multi-query optimization, now in the context of data mining.
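
A very small example of the sharing idea behind multi-query optimization: several top-k requests over the same window that differ only in k can be answered from one computation for the largest k. This is a sketch under that narrow assumption and not the technique presented in the talk; `shared_top_k` and the query names are hypothetical.

```python
import heapq


def shared_top_k(window, ks):
    """Answer several top-k queries over one window with a single selection pass.

    window: iterable of numeric values currently in the stream window
    ks:     dict mapping query name -> requested k
    """
    # Select once for the largest k; the result is sorted in descending order.
    largest = heapq.nlargest(max(ks.values()), window)
    # Every smaller k is a prefix of that shared result.
    return {name: largest[:k] for name, k in ks.items()}


answers = shared_top_k([5, 1, 9, 3, 7], {"analyst_a": 1, "analyst_b": 3})
```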

Emerging and envisioned applications within domains such as indoor navigation, fire-fighting, and precision agriculture still pose challenges for existing positioning solutions to operate accurately, reliably, and robustly in a variety of environments and conditions and under various application-specific constraints. This talk will first give a brief overview of efforts made in a Danish project to address these challenges, and will subsequently focus on addressing the energy constraints imposed by Location-based Services (LBS) running on mobile user devices such as smartphones. A variety of LBS, including services for navigation, location-based search, social networking, games, and health and sports trackers, demand the positioning and trajectory tracking of smartphones. To be useful, such tracking has to be energy-efficient to avoid having a major impact on the battery life of the mobile device, since battery capacity in modern smartphones is a scarce resource and is not increasing at the same pace as new power-demanding features, including various positioning sensors, are added to such devices. We present novel on-device sensor management and trajectory updating strategies which intelligently determine when to sample different on-device positioning sensors (accelerometer, compass and GPS), when data should be sent to a remote server, and to what extent to simplify it beforehand in order to save communication costs. The resulting system is provided as a uniform framework for both position and trajectory tracking and is configurable with regard to accuracy requirements. The effectiveness of our approach and the energy savings achievable are demonstrated both by emulation experiments using real-world data and by real-world deployments.
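
One simple way to trade accuracy for communication cost is a distance-threshold update policy: the device reports a new position only once it has moved further than the configured accuracy bound from the last reported position. The sketch below shows that generic policy only; it is not the sensor management or trajectory updating strategy from the talk, and the function name and the planar coordinates in metres are assumptions.

```python
import math


def updates_to_send(positions, threshold_m):
    """Return the positions that would be reported to the server.

    positions:   list of (x, y) coordinates in metres, in sampling order
    threshold_m: accuracy bound; smaller movements are not reported
    """
    sent = [positions[0]]  # the first fix is always reported
    for point in positions[1:]:
        # Report only if the server's last known position is too far off.
        if math.dist(point, sent[-1]) > threshold_m:
            sent.append(point)
    return sent


track = [(0, 0), (3, 0), (6, 0), (20, 0), (22, 0)]
reported = updates_to_send(track, threshold_m=10)
```

A larger threshold means fewer transmissions and a coarser view on the server, which is the kind of accuracy configuration the abstract refers to.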


The ArtistDesign European Network of Excellence on Embedded Systems Design is organizing the 7th edition of its highly successful "ARTIST Summer School in Europe", September 4-9th 2011 (http://www.artist-embedded.org/artist/-ARTIST-Summer-School-Europe-2011-.html - funded by the European Commission). This is the seventh edition of the yearly schools on embedded systems design, and it is meant to be exceptional in terms of both breadth of coverage and invited speakers. The school brings together some of the best lecturers from Europe, the USA and China in a 6-day programme, and will be a fantastic opportunity for interaction. Past participants are also encouraged to apply!

The ARTIST Summer School 2011 will be held near Grenoble, France, by the magnificent Lac du Bourget and the French Alps in the beautiful, historic city of Aix-les-Bains (see webpage for details and photos). It features a luxury spa with full services, pool, sauna, hammam, tennis courts and open space. The social programme includes ample time for discussion, and a visit to the historic city of Annecy with a gala dinner while touring the lake of Annecy.

Deadline for applications is May 15th 2011. Attendance is limited, so we will be selecting amongst the candidates. Registration fees include the technical and social programmes, 6 days' meals and lodging (2-3 persons/room) from dinner on Saturday, Sept 3rd through lunch on Friday, Sept 9th, and bus transport from/to the St Exupéry or Geneva airports. The registration fee only partially covers the costs incurred. The remaining costs are covered by the European Commission's 7th Framework Programme ICT.

The programme will offer world-class courses and significant opportunities for interaction with leading researchers in the area:

  • Professor Tarek Abdelzaher (University of Illinois at Urbana Champaign - USA) Challenges in Human-centric Sensor Networks
  • Professor Sanjoy Baruah (University of North Carolina at Chapel Hill - USA) Certification-cognizant scheduling in integrated computing environments
  • Professor Luca Benini (University of Bologna - Italy) Managing MPSoCs beyond their Thermal Design Power
  • Professor Rastislav Bodik (UC Berkeley, USA) Automatic Programming Revisited
  • Dr. Fabien Clermidy (CEA - France) Designing Network-on-Chip based multi-core heterogeneous System-on-Chip: the MAGALI experience
  • Professor Peter Druschel (Max Planck Institute for Software Systems - Germany) Trust and Accountability in Social Systems
  • Professor Rolf Ernst (TU Braunschweig - Germany) Mixed safety critical system design and analysis
  • Professor Babak Falsafi (EPFL - Switzerland)
  • Professor Martti Forsell (VTT - Finland) Parallelism, programmability and architectural support for them on multi-core machines
  • Professor Kim Larsen (University of Aalborg - Denmark) Timing and Performance Analysis of Embedded Systems
  • Professor Yunhao Liu (Tsinghua University/HKUST - China) GreenOrbs: Lessons Learned from Extremely Large Scale Sensor Network Deployment
  • Professor Alberto Sangiovanni-Vincentelli (UC Berkeley - USA) Mapping abstract models to architectures: automatic synthesis across layers of abstraction
  • Professor Janos Sztipanovits (Vanderbilt University - USA) Domain Specific Modeling Languages for Cyber Physical Systems: Where are Semantics Coming From?
  • Prof. Dr. Lothar Thiele (ETH Zurich, Switzerland) Temperature-aware Scheduling


Mapping of applications to MPSoCs is one of the hottest topics resulting from the availability of multi-core processors. The ArtistDesign workshop on this topic has become a key event for discussing approaches for solving the problems. This year, the workshop will again be held back-to-back with the SCOPES workshop.
Recent technological trends have led to the introduction of multi-processor systems on a chip (MPSoCs). It can be expected that the number of processors on such chips will continue to increase. Power efficiency is frequently the driving force, with a strong impact on the architectures being used. As a result, heterogeneous architectures incorporating functional units optimized for specific functions are commonly employed. This technological trend has dramatic consequences for design technology. Techniques are required that map sets of applications onto MPSoC architectures.
The deadline for abstract submissions is April 22nd.


The new Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Data Analysis" starts the new year with a kick-off colloquium. The colloquium takes place on January 20th 2011 starting at 4 pm at auditorium E23, Otto-Hahn-Straße 14, TU Dortmund University campus. For further information about the program and speeches please have a look at the attachment.

At this time, no further applications for open positions at the SFB 876 are being accepted.

November 16, 2010

The DFG granted the SFB 876.
