While generalized linear mixed models are a fundamental tool in applied statistics, many specifications, such as those involving categorical factors with many levels or interaction terms, can be computationally challenging to estimate due to the need to compute or approximate 15 high-dimensional integrals. Variational inference is a popular way to perform such computations, especially in the Bayesian context. However, naive use of such methods can provide unreliable uncertainty quantification. We show that this is indeed the case for mixed models, proving that standard mean-field variational inference dramatically underestimates posterior uncertainty in high-dimensions. We then show how appropriately relaxing the mean-field assumption leads to 20 methods whose uncertainty quantification does not deteriorate in high-dimensions, and whose total computational cost scales linearly with the number of parameters and observations. Our theoretical and numerical results focus on mixed models with Gaussian or binomial likelihoods, and rely on connections to random graph theory to obtain sharp high-dimensional asymptotic analysis.We also provide generic results, which are of independent interest, relating the accuracy 25 of variational inference to the convergence rate of the corresponding coordinate ascent algorithm that is used to find it. Our proposed methodology is implemented in the R package vglmer. Numerical results with simulated and real data examples illustrate the favourable computation cost versus accuracy trade-off of our approach compared to various alternatives.
Partially factorized variational inference for high-dimensional mixed models
Papaspiliopoulos, Omiros
;Zanella, Giacomo
In corso di stampa
Abstract
While generalized linear mixed models are a fundamental tool in applied statistics, many specifications, such as those involving categorical factors with many levels or interaction terms, can be computationally challenging to estimate due to the need to compute or approximate 15 high-dimensional integrals. Variational inference is a popular way to perform such computations, especially in the Bayesian context. However, naive use of such methods can provide unreliable uncertainty quantification. We show that this is indeed the case for mixed models, proving that standard mean-field variational inference dramatically underestimates posterior uncertainty in high-dimensions. We then show how appropriately relaxing the mean-field assumption leads to 20 methods whose uncertainty quantification does not deteriorate in high-dimensions, and whose total computational cost scales linearly with the number of parameters and observations. Our theoretical and numerical results focus on mixed models with Gaussian or binomial likelihoods, and rely on connections to random graph theory to obtain sharp high-dimensional asymptotic analysis.We also provide generic results, which are of independent interest, relating the accuracy 25 of variational inference to the convergence rate of the corresponding coordinate ascent algorithm that is used to find it. Our proposed methodology is implemented in the R package vglmer. Numerical results with simulated and real data examples illustrate the favourable computation cost versus accuracy trade-off of our approach compared to various alternatives.File | Dimensione | Formato | |
---|---|---|---|
VI_GLMM_biom_arxiv.pdf
non disponibili
Descrizione: article
Tipologia:
Documento in Post-print (Post-print document)
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
1.11 MB
Formato
Adobe PDF
|
1.11 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.