
Distributed Analysis

Distributed data mining was already being advanced more than ten years ago. The terms peer-to-peer data analysis and ubiquitous knowledge discovery in data were coined in 2008 as the next generation of data mining. The IoT now puts the subject on the agenda again. Google engineers in India describe federated learning as the collaboration of local models, with the training data remaining on the device and not being transmitted to the cloud. With the new architecture, the mobile device downloads the current model and improves it by learning from data on the device. The device generates a summary of what it has learned and sends it to the cloud, where it is aggregated with the summaries of other users to refine the shared model. This renaissance fits the IoT-related machine learning in CRC 876. In the industrial production project B3, several distributed learning algorithms have been used for learning from distributed sensors in factories or in traffic systems, with special consideration of privacy demands. The resource to be saved is the communication between the nodes. Efficient communication has also been explored in the platform project A4 with respect to large-scale IoT warehouse systems.
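The federated learning round described above (download the shared model, improve it on the device, send back only a summary, aggregate in the cloud) can be illustrated with a minimal sketch. It assumes a simple linear model trained by stochastic gradient descent and a weighted average as the aggregation step; the names `local_update`, `aggregate` and `federated_round`, the learning rate, and the toy data are hypothetical and are not taken from Google's system or from any CRC 876 project.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, target) pairs held on one device


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Vector:
    """Improve a copy of the shared linear model using only on-device data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Stochastic gradient step for the squared error of one example.
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w


def aggregate(updates: List[Tuple[Vector, int]]) -> Vector:
    """Average the device models, weighted by the number of local examples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_round(global_weights: Vector, devices: List[Dataset]) -> Vector:
    """One round: every device trains locally; only weights leave the device."""
    updates = [(local_update(global_weights, d), len(d)) for d in devices]
    return aggregate(updates)


if __name__ == "__main__":
    # Hypothetical toy data: both devices observe the relation y = 2 * x.
    devices = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    weights = [0.0]
    for _ in range(20):
        weights = federated_round(weights, devices)
    print(weights)
```

In this sketch the raw examples never leave `local_update`; the only data exchanged per round is one weight vector and one example count per device, which is the communication cost that the text identifies as the resource to be saved.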

Data summarization or aggregation is necessary in order to learn from distributed sensor streams. The algorithm project A2 succeeded in developing theoretically well-founded sketching and sampling methods for clustering data streams. Its coresets are used to study distributed data analysis with respect to resource constraints. Summaries with a fixed memory size are developed in the data analysis project A1, and synopses of CERN data are studied in the particle physics project C5. Distributed analysis needs to take real-time and communication constraints into account. The federated learning paradigm in particular is supported by the achievements of the current funding phase and will be extended.
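To make the idea of a stream summary with a fixed memory size concrete, the following sketch uses classic reservoir sampling. This is a generic textbook technique chosen for illustration only; it is neither the coreset construction of project A2 nor the summary method of project A1, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    Memory use is bounded by k regardless of the stream length.
    """
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical sensor stream of 10,000 readings summarized by 5 values.
    sample = reservoir_sample(range(10_000), k=5, rng=random.Random(0))
    print(sample)
```

A sensor node could forward such a bounded sample instead of its full stream, trading accuracy for lower communication and storage cost.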

Distributed analysis with data summaries, model merging, and model updates will become a topic in the proposed next phase for A1, A2 and C4.
A4 shifts to heterogeneous IoT networks for distributed logistics systems using distributed multi-radio access points. Similarly, C3 moves to an array of telescopes and hence needs distributed data analysis.
Parallel distributed systems are investigated by A6 in the course of developing graph kernels and by C5 for the distributed storage of massive data streams.
Project B4 will integrate reinforcement learning for routing and signalling into distributed vehicle-to-vehicle communication.
