
Collaborative Research Center SFB 876 - Providing Information by Resource-Constrained Data Analysis

The collaborative research center SFB 876 brings together data mining and embedded systems. On the one hand, embedded systems can be further improved using machine learning. On the other hand, data mining algorithms can be realized in hardware, e.g. FPGAs, or run on GPGPUs. The restrictions of ubiquitous systems in computing power, memory, and energy demand new algorithms for known learning tasks. These resource-bounded learning algorithms may also be applied to extremely large databases on servers.

  Annual meeting of DFG SPP 1736: Algorithms for BIG DATA in Dortmund

From the 26th to the 28th of September, the annual meeting of the DFG-SPP 1736: Algorithms for BIG DATA will be held in Dortmund. SPP members from TU Dortmund are Johannes Fischer, Oliver Koch and Petra Mutzel. The SFB 876 participates via invited talks by Katharina Morik and Sangkyun Lee.

Focus of the SPP:

Computer systems pervade all parts of human activity and acquire, process, and exchange data at a rapidly increasing pace. As a consequence, we live in a Big Data world where information is accumulating at an exponential rate and often the real problem has shifted from collecting enough data to dealing with its impetuous growth and abundance. In fact, we often face poor scale-up behavior from algorithms that have been designed based on models of computation that are no longer realistic for big data.

While it is getting more and more difficult to build faster processors, the hardware industry keeps on increasing the number of processors/cores per board or graphics card, and also invests in improved storage technologies. However, all these investments are in vain if we lack algorithmic methods that are able to efficiently utilize additional processors or memory features.


  Interpretable Domain Adaptation via Optimization over the Stiefel Manifold

In domain adaptation, the goal is to find common ground between two, potentially differently distributed, data sets. By finding common concepts present in two sets of words pertaining to different domains, one could leverage the performance of a classifier for one domain for use on the other domain. We propose a solution to the domain adaptation task, by efficiently solving an optimization problem through Stochastic Gradient Descent. We provide update rules that allow us to run Stochastic Gradient Descent directly on a matrix manifold: the steps compel the solution to stay on the Stiefel manifold. This manifold encompasses projection matrices of word vectors onto low-dimensional latent feature representations, which allows us to interpret the results: the rotation magnitude of the word vector projection for a given word corresponds to the importance of that word towards making the adaptation. Beyond this interpretability benefit, experiments show that the Stiefel manifold method performs better than state-of-the-art methods.

Published at the European Conference on Machine Learning (ECML) 2016 by Christian Poelitz, Wouter Duivesteijn, and Katharina Morik.
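
To illustrate what running Stochastic Gradient Descent directly on the Stiefel manifold can look like, here is a minimal NumPy sketch. It is not the update rule from the paper: it uses a generic textbook scheme (tangent-space projection followed by a QR retraction) and a toy objective, maximizing the variance of projected data, chosen only for illustration. All function names, the objective, and the parameters `steps`, `batch`, and `lr` are assumptions.

```python
import numpy as np


def project_to_tangent(X, G):
    """Project a Euclidean gradient G onto the tangent space at X,
    where X has orthonormal columns (X.T @ X = I)."""
    XtG = X.T @ G
    sym = 0.5 * (XtG + XtG.T)
    return G - X @ sym


def qr_retraction(Y):
    """Map a full-rank matrix back onto the Stiefel manifold using
    the Q factor of a reduced QR decomposition."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    # Flip column signs so the factorization is unique (diag(R) > 0).
    return Q * signs


def stiefel_sgd(Z, p, steps=500, batch=32, lr=0.05, seed=0):
    """Toy objective (an assumption, not the paper's): minimize
    f(X) = -0.5 * mean(||X.T z||^2) over n x p matrices X with X.T X = I,
    using minibatches drawn from the rows of Z."""
    rng = np.random.default_rng(seed)
    n = Z.shape[1]
    X = qr_retraction(rng.standard_normal((n, p)))  # random feasible start
    for _ in range(steps):
        idx = rng.choice(Z.shape[0], size=batch, replace=False)
        B = Z[idx]
        G = -(B.T @ (B @ X)) / batch          # stochastic Euclidean gradient
        step = project_to_tangent(X, G)       # Riemannian gradient
        X = qr_retraction(X - lr * step)      # step, then return to the manifold
    return X
```

The retraction after every update is what keeps each iterate on the manifold, so `X.T @ X` stays equal to the identity up to floating-point error regardless of the step size.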


DockHa - Personal Hadoop cluster on Docker Swarm in minutes

Analysing Big Data typically involves developing for or comparing to Hadoop. For researching new algorithms, a personal Hadoop cluster, running independently of other software or other Hadoop clusters, should provide a sealed environment for testing and benchmarking. Easy setup, resizing and stopping enable rapid prototyping on a containerized playground.

DockHa is a project developed at the Artificial Intelligence Group, TU Dortmund University, that aims to simplify and automate the setup of independent Hadoop clusters in the SFB 876 Docker Swarm cluster. The Hadoop properties and setup parameters can be modified to suit the application. More information can be found in the software section (DockHa) and the Bitbucket repository (DockHa-Repository).
