Event Date: March 28, 2013 16:15
Patterns that Matter -- MDL for Pattern Mining
by Matthijs van Leeuwen
Pattern mining is one of the best-known concepts in the field of exploratory data mining. A big problem, however, is that huge numbers of patterns can be mined even from very small datasets. This hinders the knowledge discovery process, as it is impossible for domain experts to manually analyse so many patterns.
In this seminar I will show how compression can be used to address the pattern explosion. We argue that the best pattern set is the set of patterns that compresses the data best. Based on an analysis from an MDL (Minimum Description Length) perspective, we introduce a heuristic algorithm, called Krimp, that approximates the best set of patterns. High compression ratios and good classification scores confirm that Krimp constructs pattern-based summaries that are highly characteristic of the data.
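To illustrate what "the set of patterns that compresses the data best" means in practice, here is a minimal, simplified sketch of a two-part MDL score, L(CT) + L(D | CT), for transaction data and a given code table of itemsets. This is not Krimp itself. The greedy cover, the assumption that the code table is pre-sorted with longer itemsets first and contains every singleton, and the simplified cost of storing the table are all illustrative choices. The function names `cover` and `total_length` are hypothetical.

```python
from collections import Counter
from math import log2

def cover(transaction, code_table):
    """Greedily cover a transaction with code-table itemsets, in table order.
    Assumes the table contains every singleton, so the cover is complete."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used

def total_length(data, code_table):
    """Simplified two-part MDL score L(CT) + L(D | CT), in bits."""
    usage = Counter()
    for t in data:
        usage.update(cover(t, code_table))
    total_usage = sum(usage.values())
    # Optimal (Shannon) code length per used itemset, based on its usage
    code_len = {x: -log2(u / total_usage) for x, u in usage.items()}
    # L(D | CT): total bits of the codes used to cover every transaction
    data_bits = sum(u * code_len[x] for x, u in usage.items())
    # L(CT), simplified: each used itemset costs its own code plus its items
    # spelled out with singleton codes derived from item frequencies in the data
    item_freq = Counter(i for t in data for i in t)
    n_items = sum(item_freq.values())
    single = {i: -log2(f / n_items) for i, f in item_freq.items()}
    table_bits = sum(code_len[x] + sum(single[i] for i in x) for x in usage)
    return table_bits + data_bits
```

For example, on the toy data `[frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("ab")]`, a code table that adds the pattern `frozenset("ab")` in front of the singletons yields a smaller total length than the singletons alone. Under MDL, that means the pattern {a, b} is worth keeping. A Krimp-style search would repeatedly propose candidate itemsets and keep only those that reduce this score.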
Our MDL approach to pattern mining is very generic and can be used to take on a large number of problems in knowledge discovery. One such example is change detection in data streams. I will show how sudden changes in the underlying data distribution of a data stream can be detected using compression, and argue that this can be generalised to concept drift and other slower forms of change.
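The compression-based view of change detection can be sketched as follows. Encode each new window of the stream with codes learned on a reference window, and compare the result with the window's own optimal encoding. If the reference codes compress the new window clearly worse, the underlying distribution has likely changed. This is a minimal sketch under strong assumptions, not the method presented in the talk. A simple item-frequency code with add-one smoothing stands in for a Krimp code table, and the fixed window size, the `threshold` value and all function names are hypothetical.

```python
from collections import Counter
from math import log2

def item_codes(window, alphabet):
    """Singleton code lengths (bits) with add-one smoothing over the alphabet."""
    freq = Counter(i for t in window for i in t)
    total = sum(freq.values()) + len(alphabet)
    return {i: -log2((freq[i] + 1) / total) for i in alphabet}

def encoded_bits(window, codes):
    """Total bits needed to encode all items in the window with the given codes."""
    return sum(codes[i] for t in window for i in t)

def detect_changes(stream, window_size, alphabet, threshold=0.1):
    """Return start indices of windows that compress more than `threshold`
    worse under the reference codes than under their own codes."""
    codes = item_codes(stream[:window_size], alphabet)
    changes = []
    for start in range(window_size, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        own = encoded_bits(window, item_codes(window, alphabet))
        ref = encoded_bits(window, codes)
        if ref > (1 + threshold) * own:
            changes.append(start)
            # The new window becomes the reference for subsequent windows
            codes = item_codes(window, alphabet)
    return changes
```

With a stream of 20 transactions `{a, b}` followed by 20 transactions `{c, d}` and a window size of 10, the only flagged window is the one starting at index 20. Replacing the reference after each detected change lets the detector follow successive abrupt shifts. Slower forms of drift would need a different comparison scheme.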
CV
Matthijs van Leeuwen is a post-doctoral researcher in the Machine Learning group at KU Leuven. His main interests are pattern mining and related data mining problems: how can we identify patterns that matter? To this end, the Minimum Description Length (MDL) principle and other information-theoretic concepts often prove to be very useful.
Matthijs defended his Ph.D. thesis titled 'Patterns that Matter' in February 2010, which he wrote under the supervision of prof.dr. Arno Siebes in the Algorithmic Data Analysis group (Universiteit Utrecht). He received the ECML PKDD 2009 'Best student paper award', and was runner-up for the best student paper award at CIKM 2009. His current position is supported by a personal Rubicon grant from the Netherlands Organisation for Scientific Research (NWO).
He was co-chair of MPS 2010, a Lorentz workshop on Mining Patterns and Subgroups, and IID 2012, the ECML PKDD 2012 workshop on Instant and Interactive Data Mining. Furthermore, he was demo co-chair of ICDM 2012 and is currently poster chair of IDA 2013.