Parametric modelling of interpoint distance distributions, with an application to a mixture model for biosurveillance data

Bonetti, Marco
2008

Abstract

The interpoint distance distribution can be used to analyze data consisting of inter-observation distances, i.e. all the pairwise distances arising from a random sample of $n$ multivariate observations. Methods for the study of such distributions exist in the literature, with applications to genetics, disease clustering, and biosurveillance problems. So far, these techniques have been limited to nonparametric analyses. Here we illustrate how this set of tools can be expanded to include parametric models. We assume a parametric model $f_D(d;\theta)$ for the random variable $D = d(X_1, X_2)$, where $d$ is a dissimilarity measure and $X_1$, $X_2$ are two i.i.d. observations from a multivariate distribution. We describe the properties of a proposed estimator for $\theta \in \mathbb{R}^k$, noting in particular its asymptotic normality. We compare the proposed estimator with two alternative estimators, both in general and within an analytically tractable case. We discuss the application of the methods to the construction of a parametric mixture model, and illustrate the use of that model for a preliminary analysis of data arising from a biosurveillance system. Parametric models for interpoint distance distributions can be a valuable tool for the analysis of multivariate data ranging from geographic coordinates to high-dimensional vectors.
Bonetti, Marco; Olson, K. L.; Mandl, K. D.; Pagano, M.
Use this identifier to cite or link to this document: https://hdl.handle.net/11565/3715679