Hoang Vu Nguyen, Karlsruhe Institute of Technology, OH 14, E23

Event Date: October 16, 2014 16:15

Non-parametric Methods for Correlation Analysis in Multivariate Data
Knowledge discovery in multivariate data often involves analyzing the relationships among two or more dimensions. Correlation analysis, which has its roots in statistics, is one of the most effective approaches to this problem.

In this seminar, I will present several non-parametric methods for correlation analysis in multivariate data. I will focus on real-valued data, where probability density functions (pdfs) are generally not available. Instead of estimating them, we propose to work with cumulative distribution functions (cdfs) and cumulative entropy - a new concept of entropy for real-valued data.
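To illustrate why working with cdfs avoids density estimation, here is a minimal sketch of an empirical cumulative entropy. It assumes the common definition h(X) = -∫ P(X ≤ x) log P(X ≤ x) dx and plugs in the empirical cdf of a sorted sample; the function name `cumulative_entropy` is hypothetical, and the estimator presented in the talk may differ.

```python
import math

def cumulative_entropy(samples):
    """Empirical cumulative entropy of a 1-D sample (natural log).

    Assumed definition: -integral of F(x) * log F(x) dx, where F is the
    empirical cdf. Between consecutive sorted points F is constant (i / n),
    so the integral reduces to a finite sum over the gaps.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        p = i / n                      # empirical cdf on [xs[i-1], xs[i])
        gap = xs[i] - xs[i - 1]        # width of that interval
        total -= gap * p * math.log(p)
    return total

if __name__ == "__main__":
    print(cumulative_entropy([0.2, 1.5, 0.9, 3.1, 2.4]))
```

No binning or kernel bandwidth is needed: the only inputs are the sorted values and their ranks.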

In the talk, I will first discuss two methods for scalable mining of correlated subspaces in large, high-dimensional data. Second, I will introduce an efficient and effective non-parametric method for computing total correlation - a well-known correlation measure based on Shannon entropy. This method is based on discretization and can hence be viewed as a technique for correlation-preserving discretization (compression) of multivariate data. Lastly, I will go beyond correlation analysis and present our ongoing research in multivariate causal inference.
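For readers unfamiliar with total correlation, the sketch below computes it on discretized data as the sum of the marginal Shannon entropies minus the joint entropy. The equal-width binning used here is only a naive stand-in chosen for illustration, not the correlation-preserving discretization described above; all function names and the `num_bins` parameter are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def equal_width_bins(values, num_bins):
    """Map real values to bin indices 0..num_bins-1 (naive stand-in)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)       # constant column: a single bin
    width = (hi - lo) / num_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

def total_correlation(columns, num_bins=4):
    """Sum of marginal entropies minus joint entropy, after binning.

    `columns` is a list of equally long lists, one per dimension.
    """
    binned = [equal_width_bins(col, num_bins) for col in columns]
    marginal = sum(shannon_entropy(b) for b in binned)
    joint = shannon_entropy(list(zip(*binned)))
    return marginal - joint

if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    print(total_correlation([x, x]))   # identical columns
    print(total_correlation([[0, 0, 1, 1], [0, 1, 0, 1]], num_bins=2))
```

The result depends heavily on how the bins are chosen, which is why a discretization that preserves correlation matters.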

Hoang-Vu Nguyen is a PhD candidate at the Institute for Program Structures and Data Organization (IPD), Chair of Prof. Böhm, Karlsruhe Institute of Technology (KIT). Before joining KIT, he obtained his Master's and Bachelor's degrees from Nanyang Technological University (NTU), Singapore.

His research lies at the junction of theory and practice. Currently, he is focusing on scalable multivariate correlation analysis with applications in data mining. He develops efficient and practical computation methods for correlation measures and applies them to clustering, outlier detection, mining big data, schema extraction, graph mining, time series analysis, etc.
