SFB 876 - News

C3 Regression approaches for large-scale high-dimensional data

Prof. Dr. Christian Sohler
Prof. Dr. Katja Ickstadt
The scalability of modern regression approaches is often stretched to its limits by a large number of observations and/or variables. This complicates their use in embedded systems. The goal of this project is therefore to develop highly efficient regression methods. We pursue two directions: algorithms that reduce the number of observations, e.g., via random linear projections and sampling (streaming algorithms), and methods that reduce the dimensionality of the underlying, possibly Bayesian, model classes by imposing structural constraints, e.g., monotonicity.
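To illustrate the idea of reducing the number of observations with a random linear projection, the following is a minimal sketch and not the project's actual method. It assumes a simple linear model with an intercept and one covariate, synthetic data, and a random sign projection that compresses n observations into k rows in a single pass over the data. All function names (`solve_normal_equations`, `sketch`) and parameter values are hypothetical choices made for this illustration.

```python
import random


def solve_normal_equations(X, y):
    """Least-squares fit via the normal equations (X^T X) b = X^T y."""
    d = len(X[0])
    # Augmented d x (d+1) system [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


def sketch(X, y, k, rng):
    """Compress n observations into k rows with random +-1/sqrt(k) signs.

    Each observation is read once and added to every sketch row with a
    random sign, so only the k x d sketch has to be kept in memory.
    """
    d = len(X[0])
    scale = 1.0 / k ** 0.5
    SX = [[0.0] * d for _ in range(k)]
    Sy = [0.0] * k
    for xi, yi in zip(X, y):  # single pass over the observations
        for r in range(k):
            s = scale if rng.random() < 0.5 else -scale
            for j in range(d):
                SX[r][j] += s * xi[j]
            Sy[r] += s * yi
    return SX, Sy


# Synthetic data (assumption): y = 2 + 3 x + Gaussian noise.
rng = random.Random(0)
n, k = 2000, 200
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
X = [[1.0, x] for x in xs]
y = [2.0 + 3.0 * x + rng.gauss(0.0, 0.5) for x in xs]

b_full = solve_normal_equations(X, y)      # fit on all n observations
SX, Sy = sketch(X, y, k, rng)
b_sketch = solve_normal_equations(SX, Sy)  # fit on k sketched rows only

print("full data:", b_full)
print("sketch   :", b_sketch)
```

The regression is then solved on the k sketched rows instead of the n original ones. In this toy setting the sketched coefficients should lie close to the full-data fit, with an error that shrinks as k grows; the dense sign matrix used here is only one of several possible projections and is chosen for simplicity rather than speed.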


  • [1] Bornkamp, B., A. Fritsch, O. Kuss and K. Ickstadt:
    Penalty specialists among goalkeepers: A nonparametric Bayesian analysis of 44 years of German Bundesliga. In: Schipp, B. and W. Krämer (eds.): Statistical Inference, Econometric Analysis and Matrix Algebra: Festschrift in Honour of Götz Trenkler, pp. 63-76. Physica Verlag, 2009.
  • [2] Bornkamp, B. and K. Ickstadt:
    Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis. Biometrics, 65:198-205, 2009.
  • [3] Bornkamp, B., K. Ickstadt and D. B. Dunson:
    Stochastically ordered multiple regression. Biostatistics, 2010.
  • [4] Feldman, D., M. Monemizadeh and C. Sohler:
    A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd ACM Symposium on Computational Geometry, pp. 11-18, 2007.
  • [5] Feldman, D., M. Monemizadeh, C. Sohler and D. Woodruff:
    Coresets and sketches for high-dimensional subspace approximation problems. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 630-649, 2010.
  • [6] Frahling, G., P. Indyk and C. Sohler:
    Sampling in dynamic data streams and applications. International Journal of Computational Geometry and Applications (Special Issue with selected papers from the 21st ACM Symposium on Computational Geometry), 18(1/2):3-28, 2008.
  • [7] Frahling, G. and C. Sohler:
    Coresets in dynamic geometric data streams. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pp. 209-217, 2005.
  • [8] Fritsch, A. and K. Ickstadt:
    Comparing logic regression based methods for identifying SNP interactions. In: Hochreiter, S. and R. Wagner (eds.): Bioinformatics in Research and Development. Springer, Berlin, 2007.
  • [9] Ickstadt, K. and R. L. Wolpert:
    Spatial regression for marked point processes. In: Bernardo, J. M., J. O. Berger, A. P. Dawid and A. F. M. Smith (eds.): Bayesian Statistics 6, pp. 323-341. Oxford University Press, Oxford, 1999.
  • [10] Nunkesser, R., T. Bernholt, H. Schwender, K. Ickstadt and I. Wegener:
    Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Bioinformatics, 23:3280-3288, 2007.
  • [11] Schwender, H. and K. Ickstadt:
    Identification of SNP interactions using logic regression. Biostatistics, 9:187-198, 2008.
  • [12] Wolpert, R. L. and K. Ickstadt:
    Poisson/Gamma random field models for spatial statistics. Biometrika, 85:251-267, 1998.