Large deviations in the perceptron model and consequences for active learning

Cui, Hugo; Saglietti, Luca; Zdeborová, Lenka

2021

Abstract

Active learning (AL) is a branch of machine learning that deals with problems where unlabeled data is abundant but obtaining labels is expensive. The learning algorithm can query the labels of a limited number of samples, which are then used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground-truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations of the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then provide bounds on the optimal performance achievable by any AL algorithm. We show that this optimal learning performance can be efficiently approached by simple message-passing AL algorithms. We also provide a comparison with the performance of other popular active learning strategies.
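To make the pool-based setting concrete, the toy simulation below draws a fixed Gaussian pool, labels it with a random perceptron teacher, and trains a student on a labeled subset of fixed size. It is a hypothetical illustration only: it does not implement the paper's replica computation or its message-passing AL algorithm. The Gaussian pool and teacher, the classic perceptron student, the budget sizes, and all function names are assumptions made for this sketch. The two selection rules compared, random sampling and uncertainty sampling (query the pool point with the smallest student margin), are generic baselines chosen here, not necessarily the strategies studied in the paper. The error is measured with the standard identity for isotropic Gaussian inputs, where the probability that teacher and student disagree equals the angle between their weight vectors divided by pi.

```python
import math
import random


def gaussian_vector(d, rng):
    """Draw a d-dimensional vector with i.i.d. standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(z):
    # Ties at exactly 0 are broken towards +1.
    return 1 if z >= 0 else -1


def train_perceptron(X, y, epochs=50):
    """Classic perceptron updates (no bias) on the labeled subset."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if sign(dot(w, x)) != label:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # subset is fitted exactly: stop early
            break
    return w


def generalization_error(w, w_star):
    """For isotropic Gaussian inputs, P(student != teacher) = angle / pi."""
    norm = math.sqrt(dot(w, w)) * math.sqrt(dot(w_star, w_star))
    if norm == 0.0:
        return 0.5  # an all-zero student always predicts +1: chance level
    overlap = max(-1.0, min(1.0, dot(w, w_star) / norm))
    return math.acos(overlap) / math.pi


def random_selection(n, budget, rng):
    """Baseline: label a uniformly random subset of the pool."""
    return rng.sample(range(n), budget)


def uncertainty_selection(X, labels, budget, rng, n_init=5):
    """Uncertainty sampling: repeatedly query the pool point with smallest |w.x|."""
    selected = rng.sample(range(len(X)), min(n_init, budget))
    while len(selected) < budget:
        w = train_perceptron([X[i] for i in selected], [labels[i] for i in selected])
        chosen = set(selected)
        candidates = [i for i in range(len(X)) if i not in chosen]
        selected.append(min(candidates, key=lambda i: abs(dot(w, X[i]))))
    return selected


def run(d=20, n=200, budget=40, seed=0):
    rng = random.Random(seed)
    w_star = gaussian_vector(d, rng)                 # hidden teacher weights
    X = [gaussian_vector(d, rng) for _ in range(n)]  # fixed finite pool
    labels = [sign(dot(w_star, x)) for x in X]       # oracle; read only for queried indices
    results = {}
    for name, idx in [("random", random_selection(n, budget, rng)),
                      ("uncertainty", uncertainty_selection(X, labels, budget, rng))]:
        w = train_perceptron([X[i] for i in idx], [labels[i] for i in idx])
        results[name] = (idx, generalization_error(w, w_star))
    return results


if __name__ == "__main__":
    for name, (idx, err) in run().items():
        print(f"{name:12s} labels={len(idx)}  generalization error={err:.3f}")
```

A single seed says little; averaging the error over many seeds (or, as in the paper, studying its full distribution over subset choices) is what reveals how far a given selection rule is from the best achievable subset.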

Year: 2021
Date of first online publication: 2021
DOI: https://dx.doi.org/10.1088/2632-2153/abfbbb
Journal: Machine Learning: Science and Technology
URL: https://iopscience.iop.org/article/10.1088/2632-2153/abfbbb
All authors: Cui, Hugo; Saglietti, Luca; Zdeborová, Lenka
Item type: 01 - Article in academic journal

Files in this item:

Cui_2021_Mach._Learn.__Sci._Technol._2_045001.pdf (open access). Description: article. Type: publisher's PDF (publisher's layout). License: Creative Commons. Size: 821.83 kB. Format: Adobe PDF.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4046563