Structured real-world data can be represented with graphs whose structure encodes independence assumptions within the data. Due to their statistical advantages over generative graphical models, Conditional Random Fields (CRFs) are used in a wide range of classification tasks on structured data sets. CRFs can be learned from fully or partially supervised data, and can be used to infer labels for fully unlabeled or partially labeled data. However, performing inference in CRFs with an arbitrary graphical structure on a large amount of data is computationally expensive and nearly intractable on a researcher's workstation. Hence, we take advantage of recent developments in computer hardware, namely general-purpose Graphics Processing Units (GPUs). We present a novel framework of parallel algorithms for training linear-chain and general CRFs on very large data sets.
Linear-Chain CRF@GPU is the first parallel implementation of linear-chain CRFs for segmenting and labeling sequential data that runs on GPUs. It relies on highly parallel algorithms written in NVIDIA's CUDA C/C++. Linear-chain CRFs can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction, and Text Chunking.
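To illustrate where the parallelism in a linear-chain CRF comes from, the following is a minimal CPU-side sketch of the standard forward recursion that computes the log-partition function. It is not taken from Linear-Chain CRF@GPU. The function names `log_sum_exp` and `log_partition`, and the score layout (`emit[t][y]` for per-position label scores, `trans[a][b]` for label-to-label transition scores), are assumptions made for this example. At each position, the update for every label `y` is independent of the others, so in a GPU setting each could plausibly be assigned to its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Numerically stable log(sum_i exp(v[i])); returns -inf for an empty vector.
double log_sum_exp(const std::vector<double>& v) {
    double m = -std::numeric_limits<double>::infinity();
    for (double x : v) m = std::max(m, x);
    if (std::isinf(m)) return m;
    double s = 0.0;
    for (double x : v) s += std::exp(x - m);
    return m + std::log(s);
}

// Log-partition function of a linear-chain CRF via the forward recursion.
// emit[t][y]  : score of label y at position t      (hypothetical layout)
// trans[a][b] : score of moving from label a to b   (hypothetical layout)
double log_partition(const std::vector<std::vector<double>>& emit,
                     const std::vector<std::vector<double>>& trans) {
    if (emit.empty()) return 0.0;  // empty sequence: Z = 1
    const std::size_t K = emit[0].size();
    assert(trans.size() == K);

    std::vector<double> alpha = emit[0];  // forward scores at position 0
    std::vector<double> next(K), tmp(K);

    for (std::size_t t = 1; t < emit.size(); ++t) {
        // Each label y is updated independently of the others;
        // this loop is the natural candidate for one GPU thread per label.
        for (std::size_t y = 0; y < K; ++y) {
            for (std::size_t p = 0; p < K; ++p) {
                tmp[p] = alpha[p] + trans[p][y];
            }
            next[y] = emit[t][y] + log_sum_exp(tmp);
        }
        alpha.swap(next);
    }
    return log_sum_exp(alpha);
}
```

With all scores set to zero, every label sequence has weight 1, so for K labels and T positions the function returns T log K. This gives a simple sanity check for any parallel reimplementation.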