Advances in Bayesian modelling of array structured data

PAVONE, FEDERICO
2024

Abstract

Data organized in array structures arise in various domains. Each entry of the array serves as a statistical unit, while the dimensions correspond to indexing attributes. The inherent dependence among statistical units along the indexing attributes makes the array representation more suitable than the usual tabular format. Models for this type of data typically employ probabilistic low-rank factorizations, in which the latent factors attempt to capture the patterns within the indexing attributes that are responsible for the values of the outcome. It is therefore of primary importance to model the dependence within the latent factors correctly, by eliciting the structural information available from the data. Our contribution consists of novel structured Bayesian factorization models for array data, with applications to mortality forecasting and network analysis. We first address the problem of accurately forecasting future death-rate patterns for different age groups and time horizons for a country of interest. This type of data exhibits smooth structures of different natures across ages and years, which we flexibly account for in our model. We propose a novel B-spline process with locally-adaptive dynamic coefficients that outperforms state-of-the-art forecasting strategies by explicitly incorporating the core structures of period mortality trajectories within an interpretable formulation. Next, we consider the problem of learning the underlying structure responsible for the connectivity patterns in the human brain. We analyze a population of networks representing the connections between brain regions for a set of subjects. These networks are characterized by a hierarchical, or multiresolution, organization of the nodes responsible for the connectivity. We propose a phylogenetic latent position model that effectively learns this multiresolution structure. The model reveals a tree organization of the brain regions that is consistent with known hemisphere and lobe partitions. This result also uncovers new possible clusterings of the brain regions at different levels of resolution. Finally, we explore the potential of incorporating additional covariates to inform the tree structure that governs the latent positions. Overall, we consider two settings of array data that exhibit distinct structural properties. Through Bayesian modelling, we leverage this structural information in the form of prior specification. Our results highlight the importance of incorporating these structures appropriately, which leads to improved outcomes in both inferential and forecasting problems.
23 January 2024
English
35
2022/2023
STATISTICS
Sector SECS-S/01 - Statistics
DURANTE, DANIELE
Files in this item:
File: THESIS_last_submission.pdf
open access
Description: PhD Thesis
Type: Doctoral thesis
Size: 7.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4062462