C4  Regression approaches for large-scale high-dimensional data


Prof. Dr. Ickstadt, Katja
Prof. Dr. Sohler, Christian
The scalability of modern regression approaches is often stretched to its limits when they are applied to big data or in embedded systems. The goal of this project is therefore to develop highly efficient regression methods. We design algorithms that reduce the number of observations for generalized linear and Bayesian regression models using, e.g., random linear projections and sampling. Furthermore, we develop methods to solve nonparametric regression models under resource constraints imposed on their description complexity and under structural constraints, e.g., monotonicity.
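As a rough illustration of reducing the number of observations with a random linear projection, the following Python sketch compresses an ordinary least-squares problem from n rows to k rows before solving it. It is a minimal example under stated assumptions, not the project's method and not the interface of the RaProR package: the function name `sketch_least_squares`, the Rademacher sketching matrix, and all sizes are hypothetical choices made for this example.

```python
import numpy as np


def sketch_least_squares(X, y, k, seed=0):
    """Solve least squares on a k-row random projection of (X, y).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Dense Rademacher (+1/-1) sketching matrix, scaled so that E[S^T S] = I.
    S = rng.choice([-1.0, 1.0], size=(k, n)) / np.sqrt(k)
    # The reduced problem has only k observations instead of n.
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


# Synthetic data (assumed sizes): n observations, d covariates.
rng = np.random.default_rng(1)
n, d = 5000, 5
X = rng.normal(size=(n, d))
beta_true = np.arange(1.0, d + 1.0)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Reference solution on the full data and solution on the sketch.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = sketch_least_squares(X, y, k=400)

print("full data:", np.round(beta_full, 3))
print("sketched :", np.round(beta_sketch, 3))
```

A dense sketching matrix keeps the example short; for large or sparse inputs, structured or sparse projections would usually be preferred because forming S explicitly costs k times n memory.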

Project management:

Prof. Dr. Ickstadt, Katja
Prof. Dr. Sohler, Christian

Project members:

Geppert, Leo N.
Dr. Köllmann, Claudia
Munteanu, Alexander

Alumni:

Dr. Driemel, Anne
König, Helena

Software:

RaProR - Random Projections for Bayesian linear regression (R package)

Publications:

Rekowski/etal/2017a Rekowski, Jan and Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja and Scherag, André. Phase II Dose-Response Trials: A Simulation Study to Compare Analysis Method Performance under Design Considerations. In Journal of Biopharmaceutical Statistics, 2017.


Huels/etal/2016a Hüls, Anke and Krämer, Ursula and Stolz, Sabine and Hennig, Frauke and Hoffmann, Barbara and Ickstadt, Katja and Vierkötter, Andrea and Schikowski, Tamara. Applicability of the Global Lung Initiative 2012 Reference Values for Spirometry for Longitudinal Data of Elderly Women. In PLOS ONE, Vol. 11, No. 6, pages e0157569, 2016.


Koellmann/2016a Köllmann, Claudia. Unimodal spline regression and its use in various applications with single or multiple modes. Dissertation, Faculty of Statistics, TU Dortmund University, 2016.


Koellmann/etal/2016a Köllmann, C. and Ickstadt, K. and Fried, R. Beyond unimodal regression: modelling multimodality with piecewise unimodal regression or deconvolution models. In ArXiv e-prints, 2016.


Munteanu/Wornowizki/2015a Alexander Munteanu and Max Wornowizki. Correcting statistical models via empirical distribution functions. In Computational Statistics, Vol. 31, No. 2, pages 465-495, Springer, 2016.


Geppert/etal/2015a Geppert, Leo and Ickstadt, Katja and Munteanu, Alexander and Quedenfeld, Jens and Sohler, Christian. Random projections for Bayesian regression. In Statistics and Computing, 2015.


Geppert/etal/2014a Leo Geppert and Katja Ickstadt and Alexander Munteanu and Christian Sohler. Random projections for Bayesian regression. No. 4, TU Dortmund, 2014.


Koellmann/etal/2014a Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja. Unimodal regression using Bernstein-Schoenberg-splines and penalties. In Biometrics, Vol. 70, No. 4, pages 783-793, 2014.


Koellmann/etal/2014b Köllmann, Claudia and Ickstadt, Katja and Fried, Roland. Beyond unimodal regression: modelling multimodality with piecewise unimodal, mixture or additive regression. No. 8, TU Dortmund, 2014.


Munteanu/Wornowizki/2014a Alexander Munteanu and Max Wornowizki. Demixing empirical distribution functions. No. 2, TU Dortmund, 2014.


Schwiegelshohn/Sohler/2014a Chris Schwiegelshohn and Christian Sohler. Logistic Regression for Datastreams. No. 1, TU Dortmund, 2014.


Binder/etal/2012a Binder, Harald and Müller, Tina and Schwender, Holger and Golka, Klaus and Steffens, Michael and Hengstler, Jan G. and Ickstadt, Katja and Schumacher, Martin. Cluster-localized sparse logistic regression for SNP data. In Statistical Applications in Genetics and Molecular Biology, Vol. 11, No. 4, 2012.


Canzar/etal/2011c Canzar, Stefan and Marschall, Tobias and Rahmann, Sven and Schwiegelshohn, Chris. Solving The Minimum String Cover Problem. In David A. Bader and Petra Mutzel (editors), Proceedings of the SIAM Meeting on Algorithm Engineering and Experiments (ALENEX'12), pages 75-83, 2012.


Koellmann/etal/2012a Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja. Unimodal regression using Bernstein-Schoenberg-splines and penalties. No. 6, TU Dortmund, 2012.


Lohr/etal/2012a Lohr, M. and Köllmann, C. and Freis, E. and Hellwig, B. and Hengstler, J. G. and Ickstadt, K. and Rahnenführer, J. Optimal strategies for sequential validation of significant features from high-dimensional genomic data. In Journal of Toxicology and Environmental Health, Part A, Vol. 75, No. 8-10, pages 447-460, 2012.


Schwender/etal/2012a Schwender, Holger and Selinski, Silvia and Blaszkewicz, Meinolf and Marchan, Rosemarie and Ickstadt, Katja and Golka, Klaus and Hengstler, Jan G. Distinct SNP combinations confer susceptibility to urinary bladder cancer in smokers and non-smokers. In PLOS ONE, Vol. 7, No. 12, 2012.


Ickstadt/etal/2011b Ickstadt, Katja and Bornkamp, Björn and Grzegorczyk, Marco and Wieczorek, Jakob and Sheriff, M. Rahuman and Grecco, Hernán E. and Zamir, Eli. Nonparametric Bayesian Networks (with discussion). In Bernardo, José M. and Bayarri, M. J. and Berger, James O. and Dawid, A. Philip and Heckerman, David and Smith, Adrian F. M. and West, M. (editors), Bayesian Statistics, Vol. 9, pages 283-316, 2011.


Schwender/etal/2011a Schwender, Holger and Ruczinski, Ingo and Ickstadt, Katja. Testing SNPs and sets of SNPs for importance in association studies. In Biostatistics, Vol. 12, No. 1, pages 18-32, 2011.


Sohler/Woodruff/2011a Sohler, Christian and Woodruff, David P. Subspace embeddings for the \(L_1\)-norm with applications. In Lance Fortnow and Salil P. Vadhan (editors), Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pages 755-764, ACM, 2011.



Final Theses:

Lategahn/2016a Lategahn, Niels. Vergleich von Methoden zur Auswahl von Beobachtungen bei Regression mit fehlenden Y-Werten. TU Dortmund, 2016.


Mueller/2016a Müller, Steffen. Untersuchung von Regression auf eingebetteten Datensätzen unter Verwendung von verschiedenen Abstandsnormen und Penalisierungstermen. TU Dortmund, 2016.


Horn/2015a Simon Horn. Analyse von Metabolom-Daten der Arzneipflanze Duboisia: Hauptkomponentenanalyse, Clusterung und Peakidentifizierung. TU Dortmund, 2015.


Lange/2015a Laura Lange. Analyse von GC/IMS-Atemluftmessungen unter Berücksichtigung verschiedener Atemerfrischer. TU Dortmund, 2015.


Rathjens/2015a Rathjens, Jonathan. Hierarchische Bayes-Regression bei Einbettung großer Datensätze. TU Dortmund, 2015.


mueller/2013a Müller, Steffen. Untersuchung der praktischen Anwendbarkeit von unimodaler Regression auf diverse naturwissenschaftliche Datensätze. TU Dortmund, 2013.


Okroy/2013a Okroy, Lena. Untersuchung der praktischen Anwendbarkeit von nichtlinearer Regression auf verschiedene Datensätze. TU Dortmund, 2013.


Quedenfeld/2013a Quedenfeld, Jens. Experimentelle Analyse verschiedener linearer \(\ell_2\)-Einbettungen von dünn besetzten Eingabedaten. TU Dortmund, 2013.


Jabs/2012a Jabs, Verena. Vergleich von Methoden zur Dimensionsreduktion unter Berücksichtigung der Rechenzeit und des Speicherbedarfs. TU Dortmund, 2012.


Rueppert/2012a Rüppert, Andreas. LASSO Regression für große Datenmengen. TU Dortmund, 2012.


Zhu/2012a Qingchui Zhu. Datenstromalgorithmen für Regression. TU Dortmund, 2012.



Preliminary Work:

Bornkamp/etal/2010a B. Bornkamp and K. Ickstadt and D. B. Dunson. Stochastically ordered multiple regression. In Biostatistics, Vol. 11, No. 3, pages 419-431, 2010.


Feldman/etal/2010a Dan Feldman and Morteza Monemizadeh and Christian Sohler and David Woodruff. Coresets and Sketches for High Dimensional Subspace Approximation Problems. In Proceedings 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 630-649, 2010.


Bornkamp/etal/2009a B. Bornkamp and A. Fritsch and O. Kuss and K. Ickstadt. Penalty specialists among goalkeepers: A nonparametric Bayesian analysis of 44 years of German Bundesliga. In B. Schipp and W. Krämer (editors), Statistical Inference, Econometric Analysis and Matrix Algebra: Festschrift in Honour of Götz Trenkler, pages 63-76, Physica Verlag, 2009.


Bornkamp/Ickstadt/2009b Bornkamp, Björn and Ickstadt, Katja. Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis. In Biometrics, Vol. 65, pages 198-205, 2009.


Frahling/etal/2008a Gereon Frahling and Piotr Indyk and Christian Sohler. Sampling in Dynamic Data Streams and Applications. In International Journal of Computational Geometry and Applications (Special Issue with selected papers from the 21st ACM Symposium on Computational Geometry), Vol. 18, No. 1/2, pages 3-28, 2008.


Schwender/Ickstadt/2008a Schwender, H. and Ickstadt, K. Identification of SNP interactions using logic regression. In Biostatistics, Vol. 9, pages 187-198, 2008.


Feldman/etal/2007a Dan Feldman and Morteza Monemizadeh and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd ACM Symposium on Computational Geometry, pages 11-18, 2007.


Fritsch/2007a Fritsch, A. and Ickstadt, K. Comparing logic regression based methods for identifying SNP interactions. In Hochreiter, S. and Wagner, R. (editors), Bioinformatics in Research and Development, Springer, 2007.


Nunkesser/etal/2007a Nunkesser, R. and Bernholt, T. and Schwender, H. and Ickstadt, K. and Wegener, I. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. In Bioinformatics, Vol. 23, pages 3280-3288, 2007.


Frahling/Sohler/2005a Frahling, Gereon and Sohler, Christian. Coresets in dynamic geometric data streams. In Harold N. Gabow and Ronald Fagin (editors), Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pages 209-217, ACM, 2005.


Ickstadt/Wolpert/99a K. Ickstadt and R. L. Wolpert. Spatial regression for marked point processes. In J. M. Bernardo and J. O. Berger and A. P. Dawid and A. F. M. Smith (editors), Bayesian Statistics 6, pages 323-341, Oxford, Oxford University Press, 1999.


Wolpert/Ickstadt/98a R. L. Wolpert and K. Ickstadt. Poisson/Gamma random field models for spatial statistics. In Biometrika, Vol. 85, pages 251-267, 1998.