Convergence Rate of Random Scan Coordinate Ascent Variational Inference Under Log-Concavity
Lavenant, Hugo; Zanella, Giacomo
2024
Abstract
The Coordinate Ascent Variational Inference scheme is a popular algorithm used to compute the mean-field approximation of a probability distribution of interest. We analyze its random scan version under log-concavity assumptions on the target density. Our approach builds on the recent work of Arnese and Lacker [M. Arnese and D. Lacker, Convergence of Coordinate Ascent Variational Inference for Log-Concave Measures via Optimal Transport, preprint, arXiv:2404.08792, 2024], which studies the deterministic scan version of the algorithm, phrasing it as a block-coordinate descent algorithm in the space of probability distributions endowed with the geometry of optimal transport. We obtain tight rates for the random scan version, which imply that the total number of factor updates required to converge scales linearly with the condition number and the number of blocks of the target distribution. By contrast, available bounds for the deterministic scan case scale quadratically in the same quantities, analogously to what happens for the optimization of convex functions in Euclidean spaces.
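To make the random scan scheme concrete, the following is a minimal sketch for the one case where the updates are available in closed form: a Gaussian target N(mu, Q^{-1}) with positive definite precision matrix Q (hence log-concave). For such a target, the optimal i-th mean-field factor is Gaussian with fixed variance 1/Q_ii, so each factor update only moves the factor mean m_i; the random scan consists of picking one coordinate uniformly at random per step. The specific target (Q, mu), step count, and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative log-concave target: Gaussian N(mu, Q^{-1}).
d = 4
A = rng.standard_normal((d, d))
Q = A @ A.T + d * np.eye(d)   # positive definite precision matrix
mu = rng.standard_normal(d)

# For a Gaussian target, the optimal i-th mean-field factor is
# N(m_i, 1/Q_ii): the variances are fixed, and a CAVI update of
# factor i only changes its mean m_i.
m = np.zeros(d)               # initial factor means
for step in range(1000):
    i = rng.integers(d)       # random scan: pick one block uniformly
    # m_i <- mu_i - (1/Q_ii) * sum_{j != i} Q_ij (m_j - mu_j)
    r = Q[i] @ (m - mu) - Q[i, i] * (m[i] - mu[i])
    m[i] = mu[i] - r / Q[i, i]

# The factor means converge to the target mean mu (the factor
# variances 1/Q_ii, by contrast, underestimate the marginal variances,
# a well-known feature of mean-field approximations of Gaussians).
print(np.max(np.abs(m - mu)))
```

In this Gaussian case the iteration coincides with randomized Gauss-Seidel on the quadratic potential, which makes the analogy with coordinate descent of convex functions in Euclidean spaces stated in the abstract explicit.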