Distributed data mining was promoted more than ten years ago. The terms peer-to-peer data analysis and ubiquitous knowledge discovery were declared the next generation of data mining in 2008. The IoT now puts the subject back on the agenda. Google engineers in India describe federated learning as the collaboration of local models, where the training data remain on the device and are not transmitted to the cloud. In this architecture, the mobile device downloads the current model and improves it by learning from the data on the device. The device then generates a summary of what it has learned and sends it to the cloud, where it is aggregated with the summaries of other users to refine the shared model. This renaissance fits the IoT-related machine learning in CRC 876. In the industrial production project B3, several distributed learning algorithms have been used for learning from distributed sensors in factories or in traffic systems, with particular attention to privacy demands. The resource to be saved is the communication between the nodes. Efficient communication has also been explored in the platform project A4 with respect to large-scale IoT warehouse systems.
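To illustrate the scheme described above, the following is a minimal sketch of federated averaging: devices refine the downloaded model locally and send back only a summary, which the cloud aggregates into the shared model. It is not the implementation used in the projects; the function names local_update and federated_round, the linear model, and all parameters are illustrative assumptions.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """Device-side step: refine the downloaded model on local data only.

    Only a summary (the updated weights and the local sample count) is
    returned; the raw data X, y never leave the device.
    """
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w = w - lr * grad
    return w, len(y)

def federated_round(global_weights, devices):
    """Cloud-side step: aggregate the device summaries, weighted by sample count."""
    summaries = [local_update(global_weights, X, y) for X, y in devices]
    total = sum(n for _, n in summaries)
    return sum(n * w for w, n in summaries) / total

# Simulate three devices holding private samples of the same linear relation.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    devices.append((X, y))

shared_w = np.zeros(2)
for _ in range(20):
    shared_w = federated_round(shared_w, devices)
print("shared model after 20 rounds:", shared_w)  # approaches [2, -1]
```

In this sketch the summary is simply the locally updated weight vector; in practice the updates would additionally be compressed or privatized, since the communication between the nodes is the resource to be saved.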
Data summarization or aggregation is necessary in order to learn from distributed sensor streams. The algorithm project A2 succeeded in theoretically well-founded sketching and sampling for clustering data streams. Its coresets are used to study distributed data analysis under resource constraints. Summaries with a fixed memory size are developed in the data analysis project A1, and synopses of CERN data are studied in the particle physics project C5. Distributed analysis must respect real-time and communication constraints. Federated learning, as a particular paradigm of such distributed analysis, fits the achievements of the current funding phase and will be extended.
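To make the summarization idea concrete, the following is a hedged sketch of one simple coreset-style construction for k-means on a chunked stream, importance sampling around the data mean in the spirit of lightweight coresets. It is not the specific construction of project A2; the function name lightweight_coreset, the chunk size, and the sample size m are illustrative assumptions.

```python
import numpy as np

def lightweight_coreset(X, m, rng):
    """Return m weighted points whose weighted k-means cost approximates that of X."""
    n = len(X)
    mean = X.mean(axis=0)
    dist_sq = ((X - mean) ** 2).sum(axis=1)
    # Sample half uniformly, half proportional to the squared distance from the mean.
    q = 0.5 / n + 0.5 * dist_sq / dist_sq.sum()
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])  # reweight so the cost estimate stays unbiased
    return X[idx], weights

# Process a stream in fixed-size chunks; each node keeps only its small summary,
# and the union of the per-node summaries can be clustered centrally.
rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(loc=c, size=(1000, 2)) for c in (-5.0, 0.0, 5.0)])
summaries = [lightweight_coreset(chunk, m=50, rng=rng)
             for chunk in np.array_split(stream, 6)]
points = np.vstack([p for p, _ in summaries])
weights = np.concatenate([w for _, w in summaries])
print(points.shape)  # 300 weighted points standing in for the 3000 stream points
```

The design choice here is that such weighted samples can be merged by simple union, so each sensor node ships only a fixed-size summary, which matches the fixed-memory, real-time, and communication constraints mentioned above.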