
Bibtype: Inproceedings
Bibkey: Buschjaeger/Honysz/2020a
Author: Buschjäger, Sebastian and Honysz, Philipp-Jan and Morik, Katharina
Title: Generalized Isolation Forest: Some Theory and More Applications -- Extended Abstract
Booktitle: Proceedings 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA 2020)
Publisher: IEEE
Abstract: Isolation Forest is a popular outlier detection algorithm that isolates outlier observations from regular observations by building multiple random decision trees. Multiple extensions enhance the original Isolation Forest algorithm including the Extended Isolation Forest which allows for non-rectangular splits and the SCiForest which improves the fitting of individual trees. All these approaches rate the outlierness of an observation by its average path-length. However, we find a lack of theoretical explanation on why these isolation-based algorithms offer such good practical performance. In this paper, we present a theoretical framework that describes the effectiveness of isolation-based approaches from a distributional viewpoint. We show that these algorithms fit a mixture of distributions, where the average path length of an observation can be viewed as a (somewhat crude) approximation of the mixture coefficient. Using this framework, we derive the Generalized Isolation Forest (GIF) which also trains random trees, but combining them moves beyond using the average path-length. In an extensive evaluation of over 350,000 experiments, we show that GIF outperforms the other methods on a variety of datasets while having comparable runtime.
Year: 2020
Project: SFB876-A1
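The abstract notes that Isolation Forest and its extensions rate outlierness by average path length over random trees. The following is a minimal sketch of that baseline scoring only, not of GIF, whose combination rule the abstract does not spell out. It assumes axis-parallel random splits and the usual score normalisation from the original Isolation Forest literature; all function names and parameters (`fit_forest`, `anomaly_score`, `n_trees`, `sample_size`) are hypothetical and chosen for illustration.

```python
import math
import random


def c(n):
    """Expected path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalise path lengths (assumed normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split further along this dimension
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaf points."""
    if "split" not in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Train n_trees random trees, each on a random subsample."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size))
    return [
        build_tree(rng.sample(points, size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def anomaly_score(trees, x, sample_size):
    """Score in (0, 1]; shorter average paths give higher scores."""
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(sample_size))


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
    forest = fit_forest(data, n_trees=100, sample_size=64, seed=1)
    print("central point:", anomaly_score(forest, (0.0, 0.0), 64))
    print("distant point:", anomaly_score(forest, (8.0, 8.0), 64))
```

In this sketch a point far from the bulk of the data is separated after fewer splits, so its average path is shorter and its score is higher. The paper's framework treats this average as only a rough approximation of a mixture coefficient, which is the motivation given for combining the trees differently in GIF.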