The main objective of project C4 is the development of highly efficient regression approaches. We want to make modern statistical regression methods scalable to very large and high-dimensional data sets and to settings where computational resources are scarce.
We focus on algorithmic approaches that can be efficiently implemented in streaming as well as in distributed environments. In particular, we develop methods to aggregate data and to reduce the number of observations using, e.g., random linear projections and sampling, as well as methods to reduce the dimensionality of the underlying, possibly Bayesian, model classes.
Sketching and sampling methods for regression approaches on large-scale data are important areas of research with many interesting open questions. Although basic models are well studied, research on complex and modern statistical methods has just begun. We pursue the study of novel data reduction techniques for, e.g., Bayesian generalised linear models, and aim at the challenging objective of unifying their algorithmic treatment to provide blueprints for broad statistical settings.
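To illustrate the idea of reducing the number of observations with a random linear projection, the following is a minimal sketch for ordinary least squares only. It is not the project's own method: the dense Gaussian sketching matrix, the function name `sketched_least_squares`, the synthetic data, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def sketched_least_squares(X, y, sketch_rows, seed=0):
    """Solve min_b ||S X b - S y||_2 for a random Gaussian sketch S.

    Illustrative assumption: S is a dense Gaussian matrix with
    `sketch_rows` rows, scaled so that E[S^T S] is the identity.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # The reduced problem has only `sketch_rows` observations instead of n.
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


# Synthetic data (hypothetical): n observations, d explanatory variables.
rng = np.random.default_rng(42)
n, d = 5000, 5
X = rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Compare the solution on the full data with the solution on the sketch.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = sketched_least_squares(X, y, sketch_rows=500)
rel_err = np.linalg.norm(beta_sketch - beta_full) / np.linalg.norm(beta_full)
print(f"relative deviation from full-data solution: {rel_err:.4f}")
```

Here the regression is fitted on 500 projected rows instead of 5000 original ones. A dense Gaussian projection keeps the example short; structured or sparse projections, or sampling, would be cheaper to apply on very large data.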
Munteanu/etal/2022a | Munteanu, Alexander and Omlor, Simon and Peters, Christian. p-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets. In The 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022. |
Munteanu/etal/2022b | Munteanu, Alexander and Omlor, Simon and Song, Zhao and Woodruff, David P. Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis. In Proceedings of the 39th International Conference on Machine Learning (ICML), 2022. |
Madjar/etal/2021a | Madjar, Katrin and Zucknick, Manuela and Ickstadt, Katja and Rahnenführer, Jörg. Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression. In BMC Bioinform., Vol. 22, No. 1, pages 586, 2021. |
Munteanu/etal/2021a | Munteanu, Alexander and Omlor, Simon and Woodruff, David P. Oblivious Sketching for Logistic Regression. In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021. |
Parry/etal/2021a | Parry, Katharina and Geppert, Leo N. and Munteanu, Alexander and Ickstadt, Katja. Cross-Leverage Scores for Selecting Subsets of Explanatory Variables. In arXiv e-prints, Vol. abs/2109.08399, 2021. |
Geppert/etal/2020a | Geppert, Leo N. and Ickstadt, Katja and Munteanu, Alexander and Sohler, Christian. Streaming statistical models via Merge & Reduce. In International Journal of Data Science and Analytics, Vol. 10, No. 4, pages 331-347, 2020. |
Krivosija/Munteanu/2019a | Krivošija, Amer and Munteanu, Alexander. Probabilistic smallest enclosing ball in high dimensions via subgradient sampling. In Proceedings of the 35th International Symposium on Computational Geometry (SoCG), pages 47:1--47:14, 2019. |
Meintrup/etal/2019a | Meintrup, Stefan and Munteanu, Alexander and Rohde, Dennis. Random projections and sampling algorithms for clustering of high-dimensional polygonal curves. In Advances in Neural Information Processing Systems 32 (NeurIPS), pages 12807--12817, 2019. |
Munteanu/etal/2019a | Munteanu, Alexander and Nayebi, Amin and Poloczek, Matthias. A Framework for Bayesian Optimization in Embedded Subspaces. In Proceedings of the 36th International Conference on Machine Learning (ICML), Vol. 97, pages 4752--4761, Long Beach, California, USA, PMLR, 2019. |
Tietz/etal/2019a | Tietz, Tobias and Selinski, Silvia and Golka, Klaus and Hengstler, Jan G. and Gripp, Stephan and Ickstadt, Katja and Ruczinski, Ingo and Schwender, Holger. Identification of interactions of binary variables associated with survival time using survivalFS. In Archives of Toxicology, Vol. 93, No. 3, pages 585--602, 2019. |
Wigmann/etal/2019a | Wigmann, Claudia and Lange, Laura and Vautz, Wolfgang and Ickstadt, Katja. Modelling and Classification of GC/IMS Breath Gas Measurements for Lozenges of Different Flavours. In Applications in Statistical Computing, pages 31--48, Springer, 2019. |
Ickstadt/etal/2018a | Ickstadt, Katja and Schäfer, Martin and Zucknick, Manuela. Toward Integrative Bayesian Analysis in Molecular Biology. In Annual Review of Statistics and Its Application, Vol. 5, No. 1, pages 141-167, 2018. |
Molina/etal/2018a | Molina, Alejandro and Munteanu, Alexander and Kersting, Kristian. Core Dependency Networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018. |
Munteanu/etal/2018a | Munteanu, Alexander and Schwiegelshohn, Chris and Sohler, Christian and Woodruff, David P. On Coresets for Logistic Regression. In Advances in Neural Information Processing Systems 31 (NeurIPS), 2018. |
Munteanu/Schwiegelshohn/2018a | Munteanu, Alexander and Schwiegelshohn, Chris. Coresets - Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms. In KI - Künstliche Intelligenz, Vol. 32, No. 1, pages 37-53, 2018. |
Weihs/Ickstadt/2018a | Weihs, Claus and Ickstadt, Katja. Data Science: the impact of statistics. In International Journal of Data Science and Analytics, Springer, 2018. |
Geppert/etal/2017a | Geppert, Leo N. and Ickstadt, Katja and Munteanu, Alexander and Quedenfeld, Jens and Sohler, Christian. Random projections for Bayesian regression. In Statistics and Computing, Vol. 27, No. 1, pages 79-101, 2017. |
Schlieker/etal/2017a | Schlieker, Laura and Telaar, Anna and Lueking, Angelika and Schulz-Knappe, Peter and Theek, Carmen and Ickstadt, Katja. Multivariate binary classification of imbalanced datasets - A case study on high-dimensional multiplex autoimmune assay data. In Biometrical Journal, 2017. |
Treppmann/etal/2017a | Treppmann, Tabea and Ickstadt, Katja and Zucknick, Manuela. Integration of multiple genomic data sources in a Bayesian Cox model for variable selection and prediction. In Computational and Mathematical Methods in Medicine, Vol. 2017, pages 1-19, 2017. |
Huels/etal/2016a | Hüls, Anke and Krämer, Ursula and Stolz, Sabine and Hennig, Frauke and Hoffmann, Barbara and Ickstadt, Katja and Vierkötter, Andrea and Schikowski, Tamara. Applicability of the Global Lung Initiative 2012 Reference Values for Spirometry for Longitudinal Data of Elderly Women. In PLOS ONE, Vol. 11, No. 6, pages e0157569, 2016. |
Koellmann/etal/2016a | Köllmann, Claudia and Ickstadt, Katja and Fried, Roland. Beyond unimodal regression: modelling multimodality with piecewise unimodal regression or deconvolution models. arXiv:1606.01666 [stat.AP], 2016. |
Munteanu/Wornowizki/2015a | Munteanu, Alexander and Wornowizki, Max. Correcting statistical models via empirical distribution functions. In Computational Statistics, Vol. 31, No. 2, pages 465-495, Springer, 2016. |
Koellmann/etal/2014a | Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja. Unimodal regression using Bernstein-Schoenberg-splines and penalties. In Biometrics, Vol. 70, No. 4, pages 783-793, 2014. |
Koellmann/etal/2014b | Köllmann, Claudia and Ickstadt, Katja and Fried, Roland. Beyond unimodal regression: modelling multimodality with piecewise unimodal, mixture or additive regression. No. 8, TU Dortmund, 2014. |
Schwiegelshohn/Sohler/2014a | Chris Schwiegelshohn and Christian Sohler. Logistic Regression for Datastreams. No. 1, TU Dortmund, 2014. |
Binder/etal/2012a | Binder, Harald and Müller, Tina and Schwender, Holger and Golka, Klaus and Steffens, Michael and Hengstler, Jan G. and Ickstadt, Katja and Schumacher, Martin. Cluster-localized sparse logistic regression for SNP data. In Statistical Applications in Genetics and Molecular Biology, Vol. 11, No. 4, 2012. |
Canzar/etal/2011c | Canzar, Stefan and Marschall, Tobias and Rahmann, Sven and Schwiegelshohn, Chris. Solving The Minimum String Cover Problem. In David A. Bader and Petra Mutzel (editors), Proceedings of the SIAM Meeting on Algorithm Engineering and Experiments (ALENEX'12), pages 75--83, 2012. |
Koellmann/etal/2012a | Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja. Unimodal regression using Bernstein-Schoenberg-splines and penalties. No. 6, TU Dortmund, 2012. |
Lohr/etal/2012a | Lohr, M. and Köllmann, C. and Freis, E. and Hellwig, B. and Hengstler, J. G. and Ickstadt, K. and Rahnenführer, J. Optimal strategies for sequential validation of significant features from high-dimensional genomic data. In Journal of Toxicology and Environmental Health, Part A, Vol. 75, No. 8-10, pages 447-460, 2012. |
Schwender/etal/2012a | Schwender, Holger and Selinski, Silvia and Blaszkewicz, Meinolf and Marchan, Rosemarie and Ickstadt, Katja and Golka, Klaus and Hengstler, Jan G. Distinct SNP combinations confer susceptibility to urinary bladder cancer in smokers and non-smokers. In PLOS ONE, Vol. 7, No. 12, 2012. |
Ickstadt/etal/2011b | Ickstadt, Katja and Bornkamp, Björn and Grzegorczyk, Marco and Wieczorek, Jakob and Sheriff, M. Rahuman and Grecco, Hernán E. and Zamir, Eli. Nonparametric Bayesian Networks (with discussion). In Bernardo, José M. and Bayarri, M. J. and Berger, James O. and Dawid, A. Philip and Heckerman, David and Smith, Adrian F. M. and West, M. (editors), Bayesian Statistics, Vol. 9, pages 283-316, 2011. |
Schwender/etal/2011a | Schwender, Holger and Ruczinski, Ingo and Ickstadt, Katja. Testing SNPs and sets of SNPs for importance in association studies. In Biostatistics, Vol. 12, No. 1, pages 18-32, 2011. |
Sohler/Woodruff/2011a | Sohler, Christian and Woodruff, David P. Subspace embeddings for the \(L_1\)-norm with applications. In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pages 755-764, ACM, 2011. |
Geppert/2018a | Geppert, Leo Nikolaus. Bayesian and Frequentist Regression Approaches for Very Large Data Sets. TU Dortmund, 2018. |
Munteanu/2018a | Munteanu, Alexander. On large-scale probabilistic and statistical data analysis. TU Dortmund, 2018. |
Koellmann/2016a | Köllmann, Claudia. Unimodal spline regression and its use in various applications with single or multiple modes. TU Dortmund, 2016. |
Bornkamp/etal/2010a | B. Bornkamp and K. Ickstadt and D. B. Dunson. Stochastically ordered multiple regression. In Biostatistics, Vol. 11, No. 3, pages 419-431, 2010. |
Feldman/etal/2010a | Dan Feldman and Morteza Monemizadeh and Christian Sohler and David Woodruff. Coresets and Sketches for High Dimensional Subspace Approximation Problems. In Proceedings 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 630-649, 2010. |
Bornkamp/etal/2009a | B. Bornkamp and A. Fritsch and O. Kuss and K. Ickstadt. Penalty specialists among goalkeepers: A nonparametric Bayesian analysis of 44 years of German Bundesliga. In B. Schipp and W. Krämer (editors), Statistical Inference, Econometric Analysis and Matrix Algebra: Festschrift in Honour of Götz Trenkler, pages 63-76, Physica Verlag, 2009. |
Bornkamp/Ickstadt/2009b | Bornkamp, Björn and Ickstadt, Katja. Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis. In Biometrics, Vol. 65, pages 198 -- 205, 2009. |
Frahling/etal/2008a | Gereon Frahling and Piotr Indyk and Christian Sohler. Sampling in Dynamic Data Streams and Applications. In International Journal of Computational Geometry and Applications (Special Issue with selected papers from the 21st ACM Symposium on Computational Geometry), Vol. 18, No. 1/2, pages 3 -- 28, 2008. |
Schwender/Ickstadt/2008a | Schwender, H. and Ickstadt, K. Identification of SNP interactions using logic regression. In Biostatistics, Vol. 9, pages 187 -- 198, 2008. |
Feldman/etal/2007a | Dan Feldman and Morteza Monemizadeh and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd ACM Symposium on Computational Geometry, pages 11-18, 2007. |
Fritsch/2007a | Fritsch, A. and Ickstadt, K. Comparing logic regression based methods for identifying SNP interactions. In Hochreiter, S. and Wagner, R. (editors), Bioinformatics in Research and Development, Springer, 2007. |
Nunkesser/etal/2007a | Nunkesser, R. and Bernholt, T. and Schwender, H. and Ickstadt, K. and Wegener, I. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. In Bioinformatics, Vol. 23, pages 3280 -- 3288, 2007. |
Ickstadt/Wolpert/99a | K. Ickstadt and R. L. Wolpert. Spatial regression for marked point processes. In J. M. Bernardo and J. O. Berger and A. P. Dawid and A. F. M. Smith (editors), Bayesian Statistics 6, pages 323-341, Oxford, Oxford University Press, 1999. |
Wolpert/Ickstadt/98a | R. L. Wolpert and K. Ickstadt. Poisson/Gamma random field models for spatial statistics. In Biometrika, Vol. 85, pages 251-267, 1998. |