Parametric modelling of interpoint distance distributions, with an application to a mixture model for biosurveillance data

Bonetti, Marco; Olson, K. L.; Mandl, K. D.; Pagano, M.

The interpoint distance distribution can be used to analyze data consisting of inter-observation distances, i.e. all the pairwise distances arising from a random sample of $n$ multivariate observations. Methods for the study of such distributions exist in the literature, with applications to genetics, disease clustering, and biosurveillance. So far, however, these techniques have been limited to nonparametric analyses. Here we illustrate how this set of tools can be expanded to include parametric models. We assume a parametric model $f_D(d;\theta)$ for the random variable $D = d(X_1, X_2)$, where $d$ is a dissimilarity measure and $X_1$, $X_2$ are two i.i.d. observations from a multivariate distribution. We describe the properties of a proposed estimator of $\theta \in \mathbb{R}^k$, noting in particular its asymptotic normality. We compare the proposed estimator with two alternative estimators, both in general and within an analytically tractable case. We discuss the application of these methods to the construction of a parametric mixture model, and illustrate the use of that model in a preliminary analysis of data arising from a biosurveillance system. Parametric models for interpoint distance distributions can be a valuable tool for the analysis of multivariate data ranging from geographic coordinates to high-dimensional vectors.
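The abstract does not spell out the proposed estimator, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It fits $f_D(d;\theta)$ from all $\binom{n}{2}$ interpoint distances by maximizing the pairwise (composite) log-likelihood $\sum_{i<j}\log f_D(d_{ij};\theta)$, as if the distances were independent. Purely for illustration, it assumes Euclidean $d$ and $X \sim N_p(\mu, \sigma^2 I)$. Under that model $D^2/(2\sigma^2) \sim \chi^2_p$ and $\theta = \sigma^2$; this need not be the analytically tractable case treated in the paper. The function names `interpoint_distances`, `pairwise_loglik` and `fit_sigma2_pairwise` are hypothetical.

```python
import itertools
import math
import random


def interpoint_distances(sample):
    """All n*(n-1)/2 Euclidean distances d(X_i, X_j) with i < j."""
    return [math.dist(x, y) for x, y in itertools.combinations(sample, 2)]


def pairwise_loglik(sample, sigma2):
    """Pairwise log-likelihood sum_{i<j} log f_D(d_ij; sigma^2).

    Illustrative model (an assumption): X ~ N_p(mu, sigma^2 I), so
    D^2 ~ Gamma(shape=p/2, scale=4*sigma^2) and f_D(d) = 2 d g(d^2),
    with g the gamma density. Assumes all distances are > 0.
    """
    p = len(sample[0])
    k = p / 2.0          # gamma shape of D^2
    s = 4.0 * sigma2     # gamma scale of D^2
    total = 0.0
    for d in interpoint_distances(sample):
        total += (math.log(2.0 * d) + (k - 1.0) * math.log(d * d)
                  - d * d / s - k * math.log(s) - math.lgamma(k))
    return total


def fit_sigma2_pairwise(sample):
    """Closed-form maximizer of pairwise_loglik in sigma^2.

    With the gamma shape k = p/2 known, the scale MLE is mean(D^2)/k,
    so 4*sigma^2 = mean(D^2)/(p/2), i.e. sigma^2 = mean(D^2)/(2p).
    """
    p = len(sample[0])
    sq = [d * d for d in interpoint_distances(sample)]
    return sum(sq) / len(sq) / (2 * p)


if __name__ == "__main__":
    random.seed(1)
    # 200 simulated points in R^3 with true sigma = 2 (sigma^2 = 4)
    data = [[random.gauss(0.0, 2.0) for _ in range(3)] for _ in range(200)]
    print(fit_sigma2_pairwise(data))  # should land near 4
```

In this toy model the maximizer has a closed form and coincides with the pooled unbiased sample variance, because $\sum_{i<j}\lVert X_i - X_j\rVert^2 = n\sum_i \lVert X_i - \bar X\rVert^2$. Each observation enters $n-1$ pairs, so the distances are dependent. Standard errors read off the naive likelihood curvature would therefore be too small. Here $\hat\sigma^2$ is a U-statistic, and U-statistic theory is one standard route to its asymptotic normality.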