
SFB 876 - News

Linear-Chain CRF@GPU

Linear-Chain CRF@GPU is the first parallel implementation of Linear-Chain Conditional Random Fields (CRFs) for segmenting and labeling sequential data that runs on GPUs. Linear-Chain CRFs can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction and Text Chunking. The implementation relies on highly parallel algorithms written in NVIDIA's CUDA C/C++.
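For reference, this is the conditional distribution that a linear-chain CRF defines over a label sequence y = (y_1, ..., y_T) given observations x, in the usual textbook notation: feature functions f_k with weights λ_k, and y_0 a fixed start state by convention. It is the standard formulation, not a statement of how lcrfcuda stores its parameters. In the file format described under Usage, x_t corresponds to the attributes of row t and y_t to its label; training maximizes the sum of log p(y | x) over the training sequences.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```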

Features

  • Written in CUDA C/C++
  • Runs on devices with CUDA Compute Capability 1.0 and above
  • Fast training based on Stochastic Gradient Descent
  • Supports full online learning
  • Less memory usage

Download

    Please let me know if you use this tool for research purposes.

    Binary (Linux 2.6, x86_64)

    • LCRFCUDA-1.0-x86-64.tgz

    Source Code

    • LCRFCUDA-1.0-src.tgz


Usage

Training and Test file formats

Both the training file and the test file must be in a particular format. Each row must either be empty or consist of tab-separated tokens. Depending on its position, every token represents either a hidden node realisation (y) or an observed node realisation (x): the first token in each row is the hidden node realisation (label), and all following tokens are observed node realisations (attributes). Each row may contain any number of attributes. An empty row marks the end of a sequence.

In general, the data consists of tab-separated realisations:


HIDDEN_LABEL1	OBSERVED_REALISATION11	OBSERVED_REALISATION12	OBSERVED_REALISATION13
HIDDEN_LABEL2	OBSERVED_REALISATION21
HIDDEN_LABEL3	OBSERVED_REALISATION31	OBSERVED_REALISATION32	OBSERVED_REALISATION33	OBSERVED_REALISATION34
HIDDEN_LABEL4	OBSERVED_REALISATION41	OBSERVED_REALISATION42

HIDDEN_LABEL1	OBSERVED_REALISATION11
HIDDEN_LABEL2	OBSERVED_REALISATION21
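A minimal Python sketch of reading this format, assuming UTF-8 text and treating whitespace-only rows as empty. read_sequences is a hypothetical helper for preparing or checking data, not part of lcrfcuda. It is a generator, so it can consume a stream such as sys.stdin one sequence at a time.

```python
from typing import Iterable, Iterator, List, Tuple

# One row = (label, attributes); one sequence = the rows between two empty rows.
Row = Tuple[str, List[str]]


def read_sequences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield one sequence at a time from tab-separated rows."""
    current: List[Row] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Empty row: end of the current sequence (repeated empty rows are ignored).
            if current:
                yield current
                current = []
            continue
        tokens = line.split("\t")
        # First token is the hidden label y, the rest are observed attributes x.
        current.append((tokens[0], tokens[1:]))
    if current:
        # The last sequence may not be followed by an empty row.
        yield current
```

For example, `for seq in read_sequences(sys.stdin): ...` processes piped input sequence by sequence without loading the whole file.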



In the case of the CoNLL-2000 shared task (text chunking), the data could look like this:


B-NP	X00=	X01=	X02=Rockwell	X03=International	X04=Corp.	X05=/Rockwell	X06=Rockwell/International	..
I-NP	X00=	X01=Rockwell	X02=International	X03=Corp.	X04='s	X05=Rockwell/International	..
I-NP	X00=Rockwell	X01=International	X02=Corp.	X03='s	X04=Tulsa	X05=International/Corp.	..
B-NP	X00=International	X01=Corp.	X02='s	X03=Tulsa	X04=unit	X05=Corp./'s	X06='s/Tulsa	..
I-NP	X00=Corp.	X01='s	X02=Tulsa	X03=unit	X04=said	X05='s/Tulsa	X06=Tulsa/unit	X10=NNP	..
I-NP	X00='s	X01=Tulsa	X02=unit	X03=said	X04=it	X05=Tulsa/unit	X06=unit/said	X10=POS	X11=NNP	..
B-VP	X00=Tulsa	X01=unit	X02=said	X03=it	X04=signed	X05=unit/said	X06=said/it	X10=NNP	..
B-NP	X00=unit	X01=said	X02=it	X03=signed	X04=a	X05=said/it	X06=it/signed	X10=NN	X11=VBD	..
B-VP	X00=said	X01=it	X02=signed	X03=a	X04=tentative	X05=it/signed	X06=signed/a	X10=VBD	X11=PRP	..
B-NP	X00=it	X01=signed	X02=a	X03=tentative	X04=agreement	X05=signed/a	X06=a/tentative	X10=PRP	X11=VBD	..
I-NP	X00=signed	X01=a	X02=tentative	X03=agreement	X04=extending	X05=a/tentative	X06=tentative/agreement	..
I-NP	X00=a	X01=tentative	X02=agreement	X03=extending	X04=its	X05=tentative/agreement	X06=agreement/extending	..
B-VP	X00=tentative	X01=agreement	X02=extending	X03=its	X04=contract	X05=agreement/extending	X06=extending/its	..
B-NP	X00=agreement	X01=extending	X02=its	X03=contract	X04=with	X05=extending/its	X06=its/contract	..
I-NP	X00=extending	X01=its	X02=contract	X03=with	X04=Boeing	X05=its/contract	X06=contract/with	..
B-PP	X00=its	X01=contract	X02=with	X03=Boeing	X04=Co.	X05=contract/with	X06=with/Boeing	X10=PRP$	..
B-NP	X00=contract	X01=with	X02=Boeing	X03=Co.	X04=to	X05=with/Boeing	X06=Boeing/Co.	X10=NN	X11=IN	X12=NNP	..
I-NP	X00=with	X01=Boeing	X02=Co.	X03=to	X04=provide	X05=Boeing/Co.	X06=Co./to	X10=IN	X11=NNP	X12=NNP	..
B-VP	X00=Boeing	X01=Co.	X02=to	X03=provide	X04=structural	X05=Co./to	X06=to/provide	X10=NNP	X11=NNP	X12=TO	..
I-VP	X00=Co.	X01=to	X02=provide	X03=structural	X04=parts	X05=to/provide	X06=provide/structural	X10=NNP	X11=TO	..
B-NP	X00=to	X01=provide	X02=structural	X03=parts	X04=for	X05=provide/structural	X06=structural/parts	X10=TO	..
I-NP	X00=provide	X01=structural	X02=parts	X03=for	X04=Boeing	X05=structural/parts	X06=parts/for	X10=VB	..
B-PP	X00=structural	X01=parts	X02=for	X03=Boeing	X04='s	X05=parts/for	X06=for/Boeing	X10=JJ	X11=NNS	X12=IN	..
B-NP	X00=parts	X01=for	X02=Boeing	X03='s	X04=747	X05=for/Boeing	X06=Boeing/'s	X10=NNS	X11=IN	X12=NNP	X13=POS	..
B-NP	X00=for	X01=Boeing	X02='s	X03=747	X04=jetliners	X05=Boeing/'s	X06='s/747	X10=IN	X11=NNP	X12=POS	X13=CD	..
I-NP	X00=Boeing	X01='s	X02=747	X03=jetliners	X04=.	X05='s/747	X06=747/jetliners	X10=NNP	X11=POS	X12=CD	..
I-NP	X00='s	X01=747	X02=jetliners	X03=.	X04=	X05=747/jetliners	X06=jetliners/.	X10=POS	X11=CD	X12=NNS	X13=.	..
O	X00=747	X01=jetliners	X02=.	X03=	X04=	X05=jetliners/.	X06=./	X10=CD	X11=NNS	X12=.	X13=	X14=	..

B-NP	X00=	X01=	X02=Rockwell	X03=said	X04=the	X05=/Rockwell	X06=Rockwell/said	X10=	X11=	..
B-VP	X00=	X01=Rockwell	X02=said	X03=the	X04=agreement	X05=Rockwell/said	X06=said/the	X10=	..
..
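The X00 to X06 attributes in these rows appear to follow a fixed five-word window: X00 to X04 are the words at offsets -2 to +2, X05 is the previous/current word bigram and X06 the current/next word bigram, with empty values outside the sentence. The Python sketch below regenerates that part under this inferred template; window_features and to_rows are hypothetical names, and the part-of-speech attributes (X10 and up) are not reproduced.

```python
from typing import List


def window_features(words: List[str]) -> List[List[str]]:
    """Return the X00..X06 attributes for every word of one sentence."""
    # Two empty pads on each side, so out-of-sentence neighbours print as "X00=" etc.
    padded = ["", ""] + words + ["", ""]
    rows = []
    for i in range(len(words)):
        # Five-word window centred on words[i].
        w_m2, w_m1, w_0, w_p1, w_p2 = padded[i:i + 5]
        rows.append([
            f"X00={w_m2}", f"X01={w_m1}", f"X02={w_0}",
            f"X03={w_p1}", f"X04={w_p2}",
            f"X05={w_m1}/{w_0}",  # bigram: previous word / current word
            f"X06={w_0}/{w_p1}",  # bigram: current word / next word
        ])
    return rows


def to_rows(labels: List[str], words: List[str]) -> List[str]:
    """Format one sentence as input rows: label first, then tab-separated attributes."""
    return ["\t".join([y] + feats) for y, feats in zip(labels, window_features(words))]
```

Each sentence produced this way still has to be followed by an empty row in the output file.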



Training

Use the lcrfcuda command:


% cat train_file | lcrfcuda -M model_file


where train_file contains the training data. The trained model is stored in the file model_file.

lcrfcuda then prints a summary like the following:


LINEAR-CHAIN CRF@CUDA, R1
Copyright (C) 2011 Nico Piatkowski, All rights reserved.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.

DEVICE		Tesla C2050 / C2070 (Device 0)
BATCHSIZE	96
ETA		0.1
INITIAL-WEIGHT	0.05
TRAINING	8936
TESTING 	8936/8936
ACCURACY	96.2962
WRITING MODEL	DONE
LABELS		22
ATTRIBUTES	338547
PARAMETERS	1039407
TIME		21s


Three major parameters control training:

  • --eta float: Changes the learning rate (step size).
  • --batchsize int: Changes the batch size. Larger batch sizes decrease training time but increase training error, since fewer weight updates are performed.
  • --iterations int: The number of training iterations. If this equals 1 (the default), training instances are not stored during training, which allows full online learning.
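How --eta, --batchsize and --iterations interact can be sketched with a small Python model of mini-batch stochastic gradient ascent on the CRF conditional log-likelihood. This is an illustration under stated assumptions, not lcrfcuda's GPU code: the weight layout (one weight per attribute/label pair plus one per label transition), zero initialization (lcrfcuda reports an INITIAL-WEIGHT instead) and averaging the gradient over each batch are all assumptions, and gradient and train are hypothetical names. The gradient is the usual empirical minus expected feature counts, with marginals from the forward-backward algorithm. The default eta and batchsize mirror the sample output above, not necessarily lcrfcuda's defaults.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[str, List[str]]  # (label, attributes)
# Hypothetical layout: ("E", attribute, label) emission weights and
# ("T", previous_label, label) transition weights; missing keys count as 0.
Weights = Dict[Tuple[str, str, str], float]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def gradient(w: Weights, seq: List[Row], labels: List[str]):
    """Return (gradient of log p(y|x), log p(y|x)) for one labeled sequence."""
    n, k = len(seq), len(labels)
    idx = {y: j for j, y in enumerate(labels)}
    phi = [[sum(w.get(("E", a, y), 0.0) for a in attrs) for y in labels] for _, attrs in seq]
    tr = [[w.get(("T", p, y), 0.0) for y in labels] for p in labels]
    # Forward (alpha) and backward (beta) recursions in log space.
    alpha = [phi[0][:]]
    for t in range(1, n):
        alpha.append([phi[t][j] + logsumexp([alpha[t - 1][i] + tr[i][j] for i in range(k)])
                      for j in range(k)])
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [logsumexp([tr[i][j] + phi[t + 1][j] + beta[t + 1][j] for j in range(k)])
                   for i in range(k)]
    log_z = logsumexp(alpha[n - 1])
    g: Weights = {}

    def add(key, v):
        g[key] = g.get(key, 0.0) + v

    gold = 0.0  # score of the observed label sequence
    for t, (y, attrs) in enumerate(seq):
        gold += phi[t][idx[y]]
        for a in attrs:                          # empirical emission counts
            add(("E", a, y), 1.0)
        for j in range(k):                       # minus expected emission counts
            p = math.exp(alpha[t][j] + beta[t][j] - log_z)
            for a in attrs:
                add(("E", a, labels[j]), -p)
        if t > 0:
            prev = seq[t - 1][0]
            gold += tr[idx[prev]][idx[y]]
            add(("T", prev, y), 1.0)             # empirical transition count
            for i in range(k):                   # minus expected transition counts
                for j in range(k):
                    p = math.exp(alpha[t - 1][i] + tr[i][j] + phi[t][j] + beta[t][j] - log_z)
                    add(("T", labels[i], labels[j]), -p)
    return g, gold - log_z


def train(data: List[List[Row]], labels: List[str], eta=0.1, batchsize=96, iterations=1):
    """Mini-batch stochastic gradient ascent; returns (weights, number of updates)."""
    w: Weights = {}
    updates = 0
    for _ in range(iterations):
        for start in range(0, len(data), batchsize):
            batch = data[start:start + batchsize]
            step: Weights = {}
            for seq in batch:                    # all gradients use the same w
                g, _ = gradient(w, seq, labels)
                for key, v in g.items():
                    step[key] = step.get(key, 0.0) + v
            for key, v in step.items():          # one weight update per mini-batch
                w[key] = w.get(key, 0.0) + eta * v / len(batch)
            updates += 1
    return w, updates
```

Each pass makes ceil(N / batchsize) weight updates, which is why larger batches train faster but take fewer steps. If TRAINING in the sample output counts sequences, 8936 sequences with BATCHSIZE 96 give ceil(8936 / 96) = 94 updates per pass, assuming lcrfcuda forms batches the same way.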

Testing

Use the lcrfcuda command:


% lcrfcuda -M model_file -T test_file --predict


where model_file is a previously learned model and test_file contains the test set.
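Predicting labels with a linear-chain CRF is normally done with the Viterbi algorithm, which finds the highest-scoring label sequence by dynamic programming over label transitions. A minimal Python sketch, assuming a hypothetical weight layout of emission weights keyed ("E", attribute, label) and transition weights keyed ("T", previous_label, label), with missing entries treated as 0; viterbi is a hypothetical name and this is not claimed to match lcrfcuda's implementation. Since the test file uses the training format, the label in the first column would be dropped before decoding.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str, str], float]


def viterbi(w: Weights, rows: List[List[str]], labels: List[str]) -> List[str]:
    """Most likely label sequence for one non-empty sequence of attribute lists."""
    def emit(attrs: List[str], y: str) -> float:
        return sum(w.get(("E", a, y), 0.0) for a in attrs)

    def trans(p: str, y: str) -> float:
        return w.get(("T", p, y), 0.0)

    # delta[y]: score of the best path that ends in label y at the current row.
    delta = {y: emit(rows[0], y) for y in labels}
    backptrs = []
    for attrs in rows[1:]:
        ptr, new_delta = {}, {}
        for y in labels:
            best = max(labels, key=lambda p: delta[p] + trans(p, y))
            ptr[y] = best
            new_delta[y] = delta[best] + trans(best, y) + emit(attrs, y)
        backptrs.append(ptr)
        delta = new_delta
    # Follow the back-pointers from the best final label.
    y = max(labels, key=lambda q: delta[q])
    path = [y]
    for ptr in reversed(backptrs):
        y = ptr[y]
        path.append(y)
    return path[::-1]
```

The cost is proportional to the sequence length times the square of the number of labels, since every position compares all label pairs.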


Three major parameters control testing:

  • --predict: Enables testing.
  • --prf: Computes precision, recall and F1-score for each class.
  • --iob: Labels are interpreted as IOB labels.
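With IOB labels, per-class evaluation is usually done on chunks rather than single tokens. The Python sketch below extracts chunks from IOB labels (a chunk starts at B-X, or at an I-X that does not continue an open chunk of type X) and computes precision, recall and F1 per chunk type, in the spirit of the CoNLL-2000 evaluation script. Whether --prf combined with --iob counts exactly this way is an assumption; chunks and prf are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Chunk = Tuple[str, int, int]  # (type, start, end), end exclusive


def chunks(labels: List[str]) -> Set[Chunk]:
    """Extract chunks from IOB labels such as B-NP, I-NP, O."""
    result: Set[Chunk] = set()
    ctype, start = None, 0
    for i, lab in enumerate(labels + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, typ = lab.partition("-")
        # An open chunk ends at O, at any B-, or at an I- of a different type.
        if ctype is not None and (prefix != "I" or typ != ctype):
            result.add((ctype, start, i))
            ctype = None
        # A new chunk starts at B-, or at an I- that does not continue an open chunk.
        if prefix in ("B", "I") and ctype is None:
            ctype, start = typ, i
    return result


def prf(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Per chunk type: (precision, recall, F1) of exactly matching chunks."""
    g, p = chunks(gold), chunks(pred)
    scores = {}
    for t in sorted({c[0] for c in g | p}):
        gt = {c for c in g if c[0] == t}
        pt = {c for c in p if c[0] == t}
        tp = len(gt & pt)
        precision = tp / len(pt) if pt else 0.0
        recall = tp / len(gt) if gt else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[t] = (precision, recall, f1)
    return scores
```

For gold labels B-NP I-NP B-VP B-NP O and the prediction B-NP I-NP B-VP I-VP O, NP gets precision 1.0, recall 0.5 and F1 of about 0.67, while VP scores 0 because the predicted VP chunk is one token too long.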




Copyright (C) 2011 Nico Piatkowski, All rights reserved