Comparing Bayesian models of annotation


The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold standard labels, (2) annotator accuracies and biases, and (3) item difficulties and error patterns. Traditionally, majority voting was used for (1), and coefficients of agreement for (2) and (3). Lately, model-based analyses of corpus annotations have proven better at all three tasks. However, there has been relatively little work comparing these models on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.


Hovy, Dirk (Member of the Collaboration Group)

2018



Year: 2018
Date of first online publication: 2018
DOI: https://dx.doi.org/10.1162/tacl_a_00040
Journal: Transactions of the Association for Computational Linguistics
URL: http://dx.doi.org/10.1162/tacl_a_00040
All authors: Paun, Silviu; Carpenter, Bob; Chamberlain, Jon; Hovy, Dirk; Kruschwitz, Udo; Poesio, Massimo
Appears in types: 01 - Article in academic journal

Files in this record:

File	Size	Format
tacl_a_00040.pdf (open access; description: Article; type: publisher's PDF (Publisher's layout); license: Creative Commons)	437.64 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4023239
