Parametric modelling of interpoint distance distributions, with an application to a mixture model for biosurveillance data
Bonetti, Marco
2008
Abstract
The interpoint distance distribution can be used to analyze data consisting of inter-observation distances, i.e. all the pairwise distances arising from a random sample of $n$ multivariate observations. Methods for the study of such distributions exist in the literature, with applications to genetics, disease clustering, and biosurveillance. So far, these techniques have been limited to nonparametric analyses. Here we illustrate how this set of tools can be expanded to include parametric models. We assume a parametric model $f_D(d;\theta)$ for the random variable $D = d(X_1, X_2)$, where $d$ is a dissimilarity measure and $X_1$, $X_2$ are two i.i.d. observations from a multivariate distribution. We describe the properties of a proposed estimator for $\theta \in \mathbb{R}^k$, noting in particular its asymptotic normality. We compare the proposed estimator with two alternative estimators, both in general and within an analytically tractable case. We discuss the application of the methods to the construction of a parametric mixture model, and illustrate the use of that model in a preliminary analysis of data arising from a biosurveillance system. Parametric models for interpoint distance distributions can be a valuable tool for the analysis of multivariate data ranging from geographic coordinates to high-dimensional vectors.
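To make the setup in the abstract concrete, here is a minimal Python sketch of one analytically tractable case. The case is an illustrative assumption, not necessarily the one or the estimator studied in the paper: observations are drawn from $N(\mu, \sigma^2 I_p)$ and $d$ is the squared Euclidean distance, so that $D = 2\sigma^2 \chi^2_p$, i.e. a gamma variable with shape $p/2$ and scale $4\sigma^2$. Maximizing the pairwise (composite) log-likelihood that treats all $n(n-1)/2$ distances as if they were independent then gives $\hat\sigma^2 = \bar D/(2p)$, which here coincides with the moment estimator. The function names `interpoint_distances`, `fit_sigma2` and `scaled_chi2_logpdf` are hypothetical.

```python
import itertools
import math
import random


def interpoint_distances(sample):
    """Return the n(n-1)/2 squared Euclidean distances between distinct observations."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, y))
        for x, y in itertools.combinations(sample, 2)
    ]


def scaled_chi2_logpdf(d, sigma2, p):
    """Log-density of D = 2*sigma2 * chi2_p, i.e. Gamma(shape=p/2, scale=4*sigma2).

    Assumes X ~ N(mu, sigma2 * I_p) and squared Euclidean dissimilarity.
    """
    k, s = p / 2, 4 * sigma2
    return (k - 1) * math.log(d) - d / s - math.lgamma(k) - k * math.log(s)


def fit_sigma2(distances, p):
    """Maximizer of the pairwise log-likelihood sum(scaled_chi2_logpdf(d, sigma2, p)).

    Setting the derivative in the scale 4*sigma2 to zero gives
    sigma2_hat = mean(D) / (2p); the distances are treated as if independent.
    """
    return sum(distances) / (2 * p * len(distances))


# Usage: simulate n = 200 points in R^3 with sigma = 2 (so sigma^2 = 4).
random.seed(1)
p, sigma = 3, 2.0
sample = [[random.gauss(0.0, sigma) for _ in range(p)] for _ in range(200)]
d = interpoint_distances(sample)
print(len(d))             # 19900 = 200 * 199 / 2
print(fit_sigma2(d, p))   # an estimate of sigma^2 = 4.0
```

Because each observation enters $n-1$ of the distances, the distances are dependent and their sample variance does not give a valid standard error. The mean $\bar D$ is a U-statistic of degree 2, so asymptotic normality arguments for estimators of this kind typically rely on U-statistic theory rather than on i.i.d. results.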