Corpus Linguistics Plugin from KobRA


Corpus linguistics plugin for RapidMiner. Separate operators are available for the different variants of LDA (Latent Dirichlet Allocation). Besides standard LDA with Gibbs sampling and variational inference, supervised versions with Gaussian-, Beta-, uniform- and Gompertz-distributed document labels can be used for diachronic linguistic tasks. An implementation of LDA with word features and word groups, via special Laplace and group-sparsity-inducing priors, is available for integrating word information.
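To illustrate what the standard LDA operator with Gibbs sampling computes, here is a minimal sketch of collapsed Gibbs sampling in plain Python. It is not the plugin's code: the function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions chosen for illustration, and documents are assumed to be lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for standard (unsupervised) LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices.
    Hypothetical helper for illustration, not the plugin's API.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                            # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word
```

The returned count matrices can be normalised (after adding `alpha` and `beta`) to obtain per-document topic proportions and per-topic word distributions. The supervised variants mentioned above would additionally model a document label, which this sketch does not attempt.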

In addition to the latent variable methods, I also implemented a number of interfaces to language resources. Operators are available to execute linguistic queries on the various corpora at the Berlin-Brandenburg Academy of Sciences and Humanities. Besides the standard corpora, we also provide access to the dictionaries and to GermaNet (the German version of WordNet). To access the Wikipedia corpora, a TEI reader is implemented that extends a standard XML stream reader to process the TEI tags. Finally, preprocessing operators provide methods for text transformation and text visualization.
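The idea of a TEI reader built on top of an XML stream reader can be sketched as follows. This is only an illustration in Python using the standard library's `xml.etree.ElementTree.iterparse`, not the plugin's reader: the function name `iter_tei_paragraphs`, the `SAMPLE` document, and the choice to extract only `<p>` elements inside `<text>` are assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Tiny made-up TEI document, used only to demonstrate the reader.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Demo</title></titleStmt>
  <publicationStmt><p>Unpublished</p></publicationStmt></fileDesc></teiHeader>
  <text><body>
    <p>First <hi rend="bold">paragraph</hi>.</p>
    <p>Second paragraph.</p>
  </body></text>
</TEI>"""


def iter_tei_paragraphs(source):
    """Stream a TEI file and yield the plain text of each <p> inside <text>.

    source: a filename or a binary file-like object.
    Hypothetical helper for illustration, not the plugin's API.
    """
    inside_text = 0  # nesting depth of <text> elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag == TEI_NS + "text":
            inside_text += 1 if event == "start" else -1
        elif event == "end" and elem.tag == TEI_NS + "p" and inside_text > 0:
            # Join the text of the paragraph and its inline children,
            # then normalise whitespace.
            yield " ".join("".join(elem.itertext()).split())
            # Free the paragraph's content so large corpora stay streamable.
            elem.clear()
```

Calling `list(iter_tei_paragraphs(io.BytesIO(SAMPLE)))` returns the two body paragraphs and skips the `<p>` in the header. Because paragraphs are cleared as soon as they are emitted, the text of a `<p>` nested inside another `<p>` would be missing from the outer one.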


rapidminer-Kobra-1.2.001.jar (106081 KB)


Pölitz, Christian