Machine Learning and Modern Hardware

At the 2017 International Symposium on Physical Design, machine learning and artificial intelligence in general were granted a session in which Pradeep Dubey (Intel) advocated a "quest for the ultimate learning machine". At the International Conference on Computer Design in the same year, FPGAs were claimed to accelerate predictions while, at the same time, requiring a redesign of the overall prediction pipeline. High throughput and low latency matter more for prediction than for training. For the execution of learned models, particularly for inference, modern hardware (such as an FPGA) has become decisive for business applications: when a model is applied massively, day in, day out, the energy and communication savings add up to enormous sums of money.

Fast training that uses less computation and communication motivates the use of FPGAs within the machine learning community. Fast, resource-restricted inference on FPGAs is also in demand in edge computing settings. While research combining machine learning and hardware was rare at the beginning of our Collaborative Research Centre, it has since become a multifaceted field that receives growing attention and is still developing.

It is the natural task of the platform project A4 to measure and analyse the performance of various architectures. The project has now even produced a cyber-physical node, the PhyNode: a slave board with memory, radio communication, several sensors, and a solar cell that is almost energy-neutral thanks to energy harvesting and energy efficiency. The PhyNode is managed through a master board. In the future, this capability will be used for machine learning and inference.

In the particle physics project C5, the acceleration of inference through a combination of FPGA and GPU hardware is important, because the trigger for storing events needs to be very fast. Combinations of different architectures and programming frameworks are being tried. For instance, a cluster of ARM processors for streaming data is combined with Hadoop-based analysis.

In project A1, work on the real-time and low-energy execution of Gaussian processes and decision trees on FPGAs has begun, motivated by the astrophysical telescope array (project C3), where sensor data have to be filtered immediately so that telescopes may react to a perceived source. Enhancing algorithms for the storage and execution of decision tree ensembles such as random forests is ongoing work.
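To illustrate the kind of representation such work concerns, the following is a minimal sketch of ensemble inference over trees stored as flat parallel arrays, a branch-free-friendly layout that maps naturally onto memory-constrained targets like FPGAs. All names and the tiny hand-built example tree are illustrative assumptions, not taken from the A1 project itself.

```python
import numpy as np

def predict_tree(feature_idx, threshold, left, right, value, x):
    """Traverse one tree stored as parallel arrays; feature_idx == -1 marks a leaf."""
    node = 0
    while feature_idx[node] >= 0:                 # internal node: test and descend
        if x[feature_idx[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
    return value[node]                            # leaf prediction

def predict_forest(trees, x):
    """Average the predictions of all trees (regression-style random forest)."""
    return sum(predict_tree(*t, x) for t in trees) / len(trees)

# A single hand-built stump: split on feature 0 at threshold 0.5.
stump = (np.array([0, -1, -1]),                   # feature index per node
         np.array([0.5, 0.0, 0.0]),               # split threshold per node
         np.array([1, -1, -1]),                   # left-child index
         np.array([2, -1, -1]),                   # right-child index
         np.array([0.0, 1.0, 2.0]))               # leaf values

print(predict_forest([stump], np.array([0.3])))   # falls into the left leaf -> 1.0
```

Storing nodes as arrays of indices rather than as linked objects keeps the memory footprint small and predictable, which is one reason this style of layout is attractive for hardware implementations.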

  • In the proposed next phase of the CRC, A1 wants to develop a model of learning for random forests and deep feed-forward networks such that FPGAs become capable of tailoring compute hardware on demand.
  • Learning tasks need to be analysed such that they can be executed on massively parallelised cores without sacrificing the guarantees of the learning model. This is an issue in several projects, especially in A1, C3, and C5.
  • Modern hardware will remain a hot topic, especially but not exclusively for the projects A1, A3, B2, and C5.