Event Date: December 1, 2011 16:15
Fay: Extensible Distributed Software Tracing from OS Kernels to Clusters
In this talk, I present Fay, a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives and can be applied to running applications and operating system kernels without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying which events to trace and how those events are aggregated, processed, and analyzed.
We have implemented the Fay tracing platform for the Windows operating system and integrated it with two powerful, expressive systems for distributed programming. I will demonstrate the generality of Fay tracing by showing how a range of existing tracing and data-mining strategies can be specified as Fay trace queries. Next, I will present experimental results showing that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Finally, I will show how Fay automatically derives, from high-level trace queries, optimized query plans and safe extension code that can equal or even surpass the performance of specialized monitoring tools.