Event Date: November 18, 2021 16:15
GraphAttack+MAPLE: Optimizing Data Supply for Graph Applications on In-Order Multicore Architectures
Abstract - Graph structures are a natural representation for data generated by a wide range of sources. While graph applications have significant parallelism, their pointer-indirect accesses to neighbor data hinder scalability. A scalable and efficient system must tolerate latency while leveraging data parallelism across millions of vertices. Existing solutions have shortcomings: modern OoO cores are area- and energy-inefficient, while specialized accelerator and memory hierarchy designs cannot support diverse application demands. In this talk we will describe a full-stack data supply approach, GraphAttack, that accelerates graph applications on in-order multi-core architectures by mitigating latency bottlenecks. GraphAttack's compiler identifies long-latency loads and slices programs along these loads into Producer/Consumer threads to map onto pairs of parallel cores. A specialized hardware unit shared by each core pair, called Memory Access Parallel-Load Engine (MAPLE), allows tracking and buffering of asynchronous loads issued by the Producer whose data are used by the Consumer. In equal-area comparisons via simulation, GraphAttack outperforms OoO cores, do-all parallelism, prefetching, and prior decoupling approaches, achieving a 2.87x speedup and 8.61x gain in energy efficiency across a range of graph applications. These improvements scale; GraphAttack achieves a 3x speedup over 64 parallel cores. Our approach has been further validated on a dual-core FPGA prototype running applications with full SMP Linux, where we have demonstrated speedups of 2.35x and 2.27x over software-based prefetching and decoupling, respectively. Lastly, this approach has been taped out in silicon as part of a manycore chip design.
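To illustrate the Producer/Consumer slicing described in the abstract, here is a minimal software sketch. It is not GraphAttack's compiler output and not the MAPLE interface: the function name `neighbor_sum_decoupled`, the CSR-style arrays `offsets`, `neighbors`, and `values`, and the `queue_depth` parameter are all assumptions made for illustration, and a bounded software queue stands in for the hardware buffer.

```python
import queue
import threading


def neighbor_sum_decoupled(offsets, neighbors, values, queue_depth=8):
    """Sum neighbor values per vertex of a CSR graph with a Producer/Consumer split.

    Hypothetical illustration only: the bounded queue is a software stand-in
    for a hardware load buffer, not the actual MAPLE unit.
    """
    fifo = queue.Queue(maxsize=queue_depth)
    num_vertices = len(offsets) - 1
    result = [0] * num_vertices

    def producer():
        # Access slice: walk the edge list and perform the indirect loads
        # values[neighbors[e]], pushing the loaded data in edge order.
        for e in range(len(neighbors)):
            fifo.put(values[neighbors[e]])

    def consumer():
        # Execute slice: pop data in the same order and do the arithmetic,
        # never touching the indirect addresses itself.
        for v in range(num_vertices):
            total = 0
            for _ in range(offsets[v], offsets[v + 1]):
                total += fifo.get()
            result[v] = total

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Calling `neighbor_sum_decoupled([0, 2, 3, 3], [1, 2, 0], [10, 20, 30])` processes a three-vertex graph in which vertex 0 has neighbors 1 and 2, vertex 1 has neighbor 0, and vertex 2 has none. The sketch shows only the structure of the split; Python threads do not reproduce the latency tolerance that the talk attributes to running the two slices on paired cores with hardware buffering.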
Short bio
Esin Tureci is an Associate Research Scholar in the Department of Computer Science at Princeton University, working with Professor Margaret Martonosi. Tureci works on a range of research problems in computer architecture design and verification, including hardware-software co-design of heterogeneous systems targeting efficient data movement, design of efficient memory consistency model verification tools, and, more recently, optimization of hybrid classical-quantum computing approaches. Tureci has a PhD in Biophysics from Cornell University and worked as a high-frequency algorithmic trader prior to her work in Computer Science.
www.cs.princeton.edu/
Aninda Manocha is currently a Computer Science PhD student at Princeton University advised by Margaret Martonosi. Her broad area of research is computer architecture, with specific interests in data supply techniques across the computing stack for graph and other emerging applications with sparse memory access patterns. These techniques span hardware-software co-designs and memory systems. She received her B.S. degrees in Electrical and Computer Engineering and Computer Science from Duke University in 2018 and is a recipient of the NSF Graduate Research Fellowship.
Marcelo Orenes Vera is a PhD candidate in the Department of Computer Science at Princeton University advised by Margaret Martonosi and David Wentzlaff. He received his BSE from the University of Murcia. Marcelo is interested in hardware innovations that are modular, to make SoC integration practical. His research focuses on Computer Architecture, from hardware RTL design and verification to software programming models of novel architectures. He has previously worked in the hardware industry at Arm, contributing to the design and verification of three GPU projects. At Princeton, he has contributed to two academic chip tapeouts that aim to improve the performance, power and programmability of several emerging workflows in the broad areas of Machine Learning and Graph Analytics.