Advances in Bayesian Inference for Binary and Categorical Data

FASANO, AUGUSTO
2021

Abstract

Bayesian binary probit regression and its extensions to time-dependent observations and multi-class responses are popular tools in binary and categorical data regression due to their high interpretability and non-restrictive assumptions. Although the theory is well established in the frequentist literature, such models are still the subject of active research in the Bayesian framework. This is mostly because state-of-the-art methods for Bayesian inference in such settings are either computationally impractical or inaccurate in high dimensions, and in many cases a closed-form expression for the posterior distribution of the model parameters is apparently lacking. The development of improved computational methods and theoretical results for inference with this vast class of models is therefore of utmost importance. To overcome these computational issues, we develop a novel variational approximation for the posterior of the coefficients in high-dimensional probit regression with binary responses and Gaussian priors, resulting in a unified skew-normal (SUN) approximating distribution that converges to the exact posterior as the number of predictors p increases. Moreover, we show that closed-form expressions are actually available for posterior distributions arising from models that account for correlated binary time series and multi-class responses. In the former case, we prove that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available and belong to a class of SUN distributions whose parameters can be updated recursively in time via analytical expressions, which allows the development of an i.i.d. sampler together with an optimal sequential Monte Carlo procedure. As for the latter case, i.e. multi-class probit models, we show that many different formulations developed separately in the literature admit a unified view and a closed-form SUN posterior distribution under a SUN prior distribution (thus including the Gaussian case). This makes it possible to implement computational methods that outperform state-of-the-art routines in high-dimensional settings by leveraging SUN properties and the variational methods introduced for the binary probit. Finally, motivated also by the possible link between some of the above-mentioned models and the Bayesian nonparametrics literature, a novel species-sampling model for partially exchangeable observations is introduced, with the double goal of predicting the class (or species) of future observations and testing for homogeneity among the different available populations. Such a model arises from a combination of Pitman-Yor processes and leverages the appealing features of both hierarchical and nested structures developed in the Bayesian nonparametrics literature. Posterior inference is feasible thanks to the implementation of a marginal Gibbs sampler, whose pseudo-code is given in full detail.
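
As a reader's aid, the basic object behind the first part of the abstract, the posterior of the coefficients in a binary probit regression with a Gaussian prior, can be sketched as follows. This is only a minimal illustration and not the thesis's method: it evaluates the unnormalized log posterior, and the function names, the zero-mean independent prior with variance `prior_var`, and the toy data are all assumptions introduced here.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_posterior_kernel(beta, X, y, prior_var=1.0):
    """Unnormalized log posterior of beta in binary probit regression.

    Assumes y_i | beta ~ Bernoulli(Phi(x_i' beta)) and independent
    N(0, prior_var) priors on the coefficients (an illustrative choice).
    Not numerically robust for extreme linear predictors, where the
    CDF underflows to zero.
    """
    # Gaussian log prior, up to an additive constant.
    log_prior = -sum(b * b for b in beta) / (2.0 * prior_var)
    log_lik = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        sign = 2 * y_i - 1  # +1 when y_i = 1, -1 when y_i = 0
        log_lik += math.log(std_normal_cdf(sign * eta))
    return log_prior + log_lik


# Hypothetical toy data: an intercept column plus one predictor.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
print(log_posterior_kernel([0.0, 0.0], X, y))
print(log_posterior_kernel([0.0, 1.0], X, y))
```

At `beta = [0.0, 0.0]` every observation contributes log(0.5), so the first value equals 3 log(0.5) up to rounding. The product of a Gaussian density and normal CDF terms seen here is the structure that the abstract's closed-form SUN results refer to.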
1 Feb 2021
English
32
2019/2020
STATISTICS
Sector SECS-S/01 - Statistics
DURANTE, DANIELE
PRUENSTER, IGOR
Files in this item:
Augusto_Fasano_Thesis.pdf (open access)
Description: Thesis_Fasano_Augusto
Type: Doctoral thesis
Size: 2.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4035709