SFB 876 - News

C4 Regression approaches for large-scale high-dimensional data


A large number of observations and/or variables often stretches the scalability of modern regression approaches to its limits, which makes them difficult to use in embedded systems. The goal of this project is therefore to develop highly efficient regression methods. We pursue two directions: algorithms that reduce the number of observations, e.g., via random linear projections and sampling (streaming algorithms), and methods that reduce the dimensionality of the underlying, possibly Bayesian, model classes by imposing structural constraints, e.g., monotonicity.
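To illustrate the idea of reducing the number of observations with a random linear projection in a single streaming pass, here is a minimal sketch in plain Python. It is not the project's own algorithm: the CountSketch-style bucketing, the function names `sketch_stream` and `solve_least_squares`, and the parameter `k` are assumptions chosen for illustration. Each observation is added with a random sign to one of `k` sketch rows, and ordinary least squares is then solved on the small sketched system instead of the full data.

```python
import random


def sketch_stream(rows, k, seed=0):
    """Compress a stream of (x, y) observations into k sketched rows.

    Hypothetical helper: each observation is assigned to a random bucket
    and added with a random sign, so memory is O(k * d) no matter how
    many observations arrive.
    """
    rng = random.Random(seed)
    SA, Sb = None, [0.0] * k
    for x, y in rows:
        if SA is None:
            SA = [[0.0] * len(x) for _ in range(k)]
        bucket = rng.randrange(k)
        sign = rng.choice((-1.0, 1.0))
        for j, xj in enumerate(x):
            SA[bucket][j] += sign * xj
        Sb[bucket] += sign * y
    return SA, Sb


def solve_least_squares(A, b):
    """Solve min ||A beta - b||_2 via the normal equations.

    Uses Gaussian elimination with partial pivoting; adequate for the
    small, well-conditioned sketched system in this illustration.
    """
    d = len(A[0])
    # Augmented matrix [A^T A | A^T b].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(d)]
        + [sum(r[i] * bi for r, bi in zip(A, b))]
        for i in range(d)
    ]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * d
    for i in reversed(range(d)):
        s = M[i][d] - sum(M[i][j] * beta[j] for j in range(i + 1, d))
        beta[i] = s / M[i][i]
    return beta


if __name__ == "__main__":
    rng = random.Random(42)
    true_beta = [1.0, 3.0]  # intercept and slope (made-up example values)
    data = []
    for _ in range(10000):
        x = [1.0, rng.uniform(-1.0, 1.0)]
        y = sum(b * xi for b, xi in zip(true_beta, x)) + rng.gauss(0.0, 0.1)
        data.append((x, y))
    SA, Sb = sketch_stream(data, k=200)
    print(solve_least_squares(SA, Sb))  # close to true_beta, not exact
```

The sketch stores only `k` rows, so the regression cost after the pass depends on `k` and the number of variables rather than on the number of observations. Larger `k` trades memory for accuracy.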

Project management:

Prof. Dr. Ickstadt, Katja 
Prof. Dr. Sohler, Christian 

Project members:

Geppert, Leo 
König, Helena 
Schmidt, Melanie 
Schwiegelshohn, Chris 

Publications:

Canzar/etal/2011c Canzar, Stefan and Marschall, Tobias and Rahmann, Sven and Schwiegelshohn, Chris. Solving The Minimum String Cover Problem. In ALENEX, 2011.


Ickstadt/etal/2011b Ickstadt, Katja and Bornkamp, Björn and Grzegorczyk, Marco and Wieczorek, Jakob and Sheriff, M. Rahuman and Grecco, Hernán E. and Zamir, Eli. Nonparametric Bayesian Networks. In Bernardo, José M. and Bayarri, M. J. and Berger, James O. and Dawid, A. Philip and Heckerman, David and Smith, Adrian F. M. et al. (editors), Bayesian Statistics, Vol. 9, pages 283-316, 2011.


Lohr/etal/2011a Lohr, M. and Köllmann, C. and Freis, E. and Hellwig, B. and Hengstler, J. G. and Ickstadt, K. and Rahnenführer, J. Optimal strategies for sequential validation of significant features from high-dimensional genomic data. In Journal of Toxicology and Environmental Health, Part A, 2011.


Schwender/etal/2011a Schwender, Holger and Ruczinski, Ingo and Ickstadt, Katja. Testing SNPs and sets of SNPs for importance in association studies. In Biostatistics, Vol. 12, No. 1, pages 18-32, 2011.


Sohler/Woodruff/2011a Sohler, Christian and Woodruff, David P. Subspace embeddings for the $L_1$-norm with applications. In Lance Fortnow and Salil P. Vadhan (editors), STOC, pages 755-764, ACM, 2011.



Preliminary work:

Bornkamp/etal/2010a B. Bornkamp and K. Ickstadt and D. B. Dunson. Stochastically ordered multiple regression. In Biostatistics, 2010.


Feldman/etal/2010a Dan Feldman and Morteza Monemizadeh and Christian Sohler and David Woodruff. Coresets and Sketches for High Dimensional Subspace Approximation Problems. In Proceedings 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 630-649, 2010.


Bornkamp/etal/2009a B. Bornkamp and A. Fritsch and O. Kuss and K. Ickstadt. Penalty specialists among goalkeepers: A nonparametric Bayesian analysis of 44 years of German Bundesliga. In B. Schipp and W. Krämer (editors), Statistical Inference, Econometric Analysis and Matrix Algebra: Festschrift in Honour of Götz Trenkler, pages 63-76, Physica Verlag, 2009.


Bornkamp/Ickstadt/2009b Bornkamp, Björn and Ickstadt, Katja. Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis. In Biometrics, Vol. 65, pages 198-205, 2009.


Frahling/etal/2008a Gereon Frahling and Piotr Indyk and Christian Sohler. Sampling in Dynamic Data Streams and Applications. In International Journal of Computational Geometry and Applications (Special Issue with selected papers from the 21st ACM Symposium on Computational Geometry), Vol. 18, No. 1/2, pages 3-28, 2008.


Schwender/Ickstadt/2008a Schwender, H. and Ickstadt, K. Identification of SNP interactions using logic regression. In Biostatistics, Vol. 9, pages 187-198, 2008.


Feldman/etal/2007a Dan Feldman and Morteza Monemizadeh and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd ACM Symposium on Computational Geometry, pages 11-18, 2007.


Fritsch/2007a Fritsch, A. and Ickstadt, K. Comparing logic regression based methods for identifying SNP interactions. In Hochreiter, S. and Wagner, R. (editors), Bioinformatics in Research and Development, Springer, 2007.


Nunkesser/etal/2007a Nunkesser, R. and Bernholt, T. and Schwender, H. and Ickstadt, K. and Wegener, I. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. In Bioinformatics, Vol. 23, pages 3280-3288, 2007.


Frahling/Sohler/2005a Frahling, Gereon and Sohler, Christian. Coresets in dynamic geometric data streams. In Harold N. Gabow and Ronald Fagin (editors), Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pages 209-217, ACM, 2005.


Ickstadt/Wolpert/99a K. Ickstadt and R. L. Wolpert. Spatial regression for marked point processes. In J. M. Bernardo and J. O. Berger and A. P. Dawid and A. F. M. Smith (editors), Bayesian Statistics 6, pages 323-341, Oxford, Oxford University Press, 1999.


Wolpert/Ickstadt/98a R. L. Wolpert and K. Ickstadt. Poisson/Gamma random field models for spatial statistics. In Biometrika, Vol. 85, pages 251-267, 1998.