Bayesian Inference for Complex Data Structures: Theoretical and Computational Advances

Rebaudo, Giovanni

In Bayesian statistics, data with complex dependence structures are often modeled by composing simple dependence assumptions. Such representations facilitate probabilistic assessment and ease the derivation of analytical and computational results in complex models. In this thesis we derive novel theoretical and computational results on Bayesian inference for probabilistic clustering and flexible dependence models for complex data structures. We focus on models arising from hierarchical specifications in both parametric and nonparametric frameworks. More precisely, we derive novel conjugacy results for one of the most widely applied dynamic regression models for binary time series: the dynamic probit model. Exploiting these theoretical results, we derive new efficient sampling schemes that improve on state-of-the-art approximate or sequential Monte Carlo inference. Motivated by an issue with the well-known nested Dirichlet process, we also introduce a novel model, arising from the composition of Dirichlet processes, to cluster populations and observations across populations simultaneously. We derive a closed-form expression for the induced distribution of the random partition, which allows us to gain a deeper understanding of the theoretical properties and inferential implications of the model, and we propose a conditional Markov chain Monte Carlo (MCMC) algorithm to perform inference effectively. Moreover, we generalize the previous composition of discrete random probabilities by defining a novel, wide class of species sampling priors that makes it possible to predict future observations in different groups and to test for homogeneity among sub-populations. Posterior inference is feasible thanks to a marginal MCMC routine and urn schemes that allow the evaluation of posterior and predictive functionals of interest.
Finally, we prove a surprising consistency result for the number of clusters in a popular nonparametric clustering model, namely the Dirichlet process mixture model. In this way we partially answer an open question in the literature.