
Research Projects

Large data volumes and small devices: these problems seem contradictory at first glance. They are addressed in twelve projects, each concentrating on a different topic and grouped into three areas. Every project is described in the Project Overview, which includes a list of recent publications.

Projects Part B

The projects of part B focus on the embedded systems side of the data analysis domain. Research follows the complete path, starting at the embedded systems and sensors, continuing through feature extraction and data aggregation, and ending with prognoses that support the decision process. In many scenarios the technical system and the data analysis are tightly coupled. For example, results of the data analysis are fed back into the development of the sensors that deliver the data.

Example project B4: Analysis and Communication for dynamic traffic prognosis

Imagine traffic without congestion. In-car navigation systems should be intelligent enough to prevent bottlenecks before they occur. State-of-the-art GPS navigation systems already provide and use information from mobile networks. A navigation device can upload the car's current state, i.e. speed, position, and direction, via mobile networks, or mobile network operators can enrich this data by detecting the movements of their customers.

Nevertheless, additional factors influence traffic flow, e.g. destination, vehicle acceleration, or even weather conditions. In many cases, information about these conditions can be derived from a modern car's internal control and communication systems. For example, the current weather may be inferred from the status of the windscreen wipers. Each vehicle thus produces a constant local stream of information, which can increase the precision of predictions about the global behavior of the stream of moving objects.

Data from this myriad of sensor sources needs to be distributed and synchronised efficiently to provide a near-real-time view of the flow of all moving objects. Mobile networks are the major resource for transmitting the local sensor data, especially in high-density urban areas. But even the most capable mobile networks cannot naively transmit the complete raw sensor information, and transmitting data that is useless for predicting stream behavior wastes scarce resources. Before local data is distributed, it therefore has to be filtered and aggregated to increase the ratio of information to data size.
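The filter-and-aggregate step can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Sample` record, its fields, and the 10-second window are assumptions chosen for illustration. Raw per-second readings are collapsed into one small summary per time window, so that far fewer records would need to be transmitted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One raw in-car reading (hypothetical fields, for illustration only)."""
    t: float          # seconds since start of the trip
    speed: float      # km/h
    wipers_on: bool   # proxy for rain, as described above


def aggregate(samples, window=10.0):
    """Collapse raw samples into one summary record per time window."""
    buckets = {}
    for s in samples:
        buckets.setdefault(int(s.t // window), []).append(s)
    summaries = []
    for idx in sorted(buckets):
        group = buckets[idx]
        summaries.append({
            "window_start": idx * window,
            "mean_speed": mean(s.speed for s in group),
            "wipers_fraction": sum(s.wipers_on for s in group) / len(group),
            "n": len(group),
        })
    return summaries


# Twenty synthetic one-second readings; the wipers switch on at t = 15 s.
samples = [Sample(t=float(i), speed=50.0 + (i % 5), wipers_on=(i >= 15))
           for i in range(20)]
summaries = aggregate(samples)
```

Here twenty raw samples shrink to two summary records. Which statistics are worth keeping depends on what the downstream prediction actually needs, which is the question the project investigates.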

Projects Part C

The projects of part C cover resource constraints that are imposed not by small devices but by the dimensionality or amount of data. As more and more data is measured, information becomes more difficult to extract, and even large data centers may not provide sufficient computing capacity. The projects of part C feature multiple real-world scenarios, from medical applications to astrophysics. In each, the decision process is to be supported by reliable extraction of information.

Applications in medical research often use patient data as input for information retrieval. Typically, the number of patients in a clinical study is limited and very small compared to the number of measurements per patient, e.g. gene or exon data. Selecting reliable features that yield stable predictions is of major importance in these scenarios.
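One common way to judge whether a selected feature is reliable is to repeat the selection on bootstrap resamples of the patients and count how often each feature is chosen. The following Python sketch illustrates that idea under stated assumptions: the correlation-based ranking, the function names, and the synthetic data (12 patients, 30 features, only feature 0 informative) are illustrative choices and not the method used in the projects.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def selection_frequency(rows, labels, k=3, rounds=50, seed=0):
    """Fraction of bootstrap rounds in which each feature ranks in the top k."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    counts = [0] * d
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        ys = [labels[i] for i in idx]
        scores = [abs_corr([rows[i][j] for i in idx], ys) for j in range(d)]
        top = sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    return {j: counts[j] / rounds for j in range(d)}


# Synthetic data: far more features than patients, one informative feature.
rng = random.Random(1)
labels = [0, 1] * 6
rows = [[y + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(29)]
        for y in labels]
freq = selection_frequency(rows, labels)
```

A feature that is selected in nearly every round is a more trustworthy candidate than one that appears only in a few resamples, which matters when the patient count is too small to hold out a large validation set.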

In astrophysics, by contrast, the problem is inverted: rare events have to be detected in huge streams of data, like looking for a needle in a haystack. Intelligent filters need to be applied to the data stream to detect and mark events of interest.
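As a rough illustration of such a stream filter, the sketch below processes values one at a time, keeps a running estimate of the background mean and variance (Welford's online update), and marks values that deviate strongly from it. The threshold, the warm-up length, and the synthetic stream are assumptions made for this example; real event detection in astrophysics relies on far more elaborate models.

```python
import math


def flag_rare_events(stream, threshold=4.0, warmup=10):
    """Yield (index, value) for values far from the running background mean."""
    n, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(stream):
        if n >= warmup:
            std = math.sqrt(m2 / (n - 1))
            if std > 0 and abs(x - mean) > threshold * std:
                yield i, x
                continue  # keep flagged values out of the background estimate
        # Welford's online update of mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)


# Synthetic background close to 10.0 with one injected event at index 120.
background = [10.0 + 0.1 * ((i * 7) % 5 - 2) for i in range(200)]
background[120] = 25.0
events = list(flag_rare_events(background))
```

The filter needs only constant memory per stream and a single pass over the data, which is the property that makes this style of algorithm suitable for data volumes that cannot be stored in full.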

Projects Part A

Finally, the projects of the third core area, part A, bridge the gap between the two areas of research, from embedded systems to masses of data. General data mining algorithms will be developed and tested in the scenarios from parts B and C. Research covers support vector algorithms, distributed clustering, data stream algorithms, and structural models such as Conditional Random Fields.

Example project A1: Data Mining for Ubiquitous System Software

Nearly everybody carries an embedded system around, often 24 hours a day. Mobile phones, and especially smartphones, have achieved a level of ubiquity that was unthinkable just ten years ago. But the convenience brought by faster CPUs, larger displays, and faster network connections severely reduces battery lifetime. Often this is unnecessary: poorly optimized background processes burn through CPU cycles without need.

The goal of project A1 is to decrease the energy consumption, start-up time, and response time of mobile embedded systems by mining system data. Gathering system data follows the Aspect-Oriented Programming approach. Methods for data reduction are evaluated with respect to minimizing storage requirements and maximizing the benefit for data mining. The learning of usage patterns is investigated both locally and in terms of distributed data mining, considering data streams, structural models, and kernel functions. The learned patterns are used to continually adjust the system software so that its resource demand decreases.

The stream of kernel function call data is analyzed on a Graphics Processing Unit (GPU) using a new, parallel set of algorithms that realize Conditional Random Fields. Future smartphones are expected to be equipped with a suitable GPU. Further learning algorithms will be implemented in massively parallel versions.