The choice of the number of bins for the M statistic

Forsberg White, L.; Bonetti, Marco; Pagano, M.

doi:10.1016/j.csda.2009.03.005

Methods to monitor spatial patterns of disease in populations are of interest in public health practice. The M statistic uses interpoint distances between cases to detect abnormalities in the spatial patterns of diseases. This statistic compares the observed distribution of interpoint distances with the distribution expected when no unusual spatial patterns exist. We show the relationship of M to Pearson's chi-square statistic, χ². Both statistics require the discretization of continuous data into bins, and both are quadratic forms scaled by an appropriate variance-covariance matrix. We seek to choose the number and type of these bins for the M statistic so as to maximize the power to detect spatial anomalies. By showing the relationship between M and χ², we argue for extending to M the theory that has been developed for selecting the number and type of bins for χ². We further show, through examples with simulated data and spatial data from a health care provider, that spatial data provide a unique insight into the problem. In the spatial setting, these examples indicate that the optimal number of bins depends on the size of the cluster. For large clusters, a smaller number of bins appears to be preferable; for small clusters, however, having many bins increases the power. Further, the results indicate that the optimal number of bins does not appear to vary with m, the number of spatial locations. We discuss the implications of this result for further work.
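
To make the construction described in the abstract concrete, here is a minimal Python sketch of a binned interpoint-distance statistic in quadratic form. It is an illustration under stated assumptions, not the authors' implementation: the null population (uniform on the unit square), the Monte Carlo estimate of the expected bin proportions and their covariance, the equiprobable bins, the Moore-Penrose pseudoinverse, and all function names (`interpoint_distances`, `bin_proportions`, `m_statistic`, `uniform_square`) are hypothetical choices made for this example.

```python
import numpy as np


def interpoint_distances(points):
    """Return the m*(m-1)/2 pairwise Euclidean distances between rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dist[upper]


def bin_proportions(distances, cut_points):
    """Proportion of distances falling in each of len(cut_points) + 1 bins."""
    idx = np.searchsorted(cut_points, distances, side="right")
    counts = np.bincount(idx, minlength=len(cut_points) + 1)
    return counts / counts.sum()


def m_statistic(observed, null_sampler, n_bins, n_sim, rng):
    """Quadratic form (o - e)' S^- (o - e) on binned interpoint distances.

    Assumptions of this sketch: e and S are estimated from `n_sim` Monte Carlo
    samples drawn by `null_sampler`, bins are equiprobable under the null,
    and S^- is the Moore-Penrose pseudoinverse.
    """
    m = len(observed)
    null_dists = [interpoint_distances(null_sampler(m, rng)) for _ in range(n_sim)]
    # Equiprobable bins: interior cut points are quantiles of the pooled null distances.
    pooled = np.concatenate(null_dists)
    cut_points = np.quantile(pooled, np.arange(1, n_bins) / n_bins)
    null_props = np.array([bin_proportions(d, cut_points) for d in null_dists])
    expected = null_props.mean(axis=0)
    # Bin proportions sum to one, so the covariance is singular; use a pseudoinverse.
    cov = np.cov(null_props, rowvar=False)
    cov_pinv = np.linalg.pinv(cov, rcond=1e-10, hermitian=True)
    diff = bin_proportions(interpoint_distances(observed), cut_points) - expected
    return float(diff @ cov_pinv @ diff)


def uniform_square(m, rng):
    """Hypothetical null population: m locations uniform on the unit square."""
    return rng.uniform(0.0, 1.0, size=(m, 2))


rng = np.random.default_rng(0)
m = 30
background = uniform_square(m, rng)
# Replace 10 of the 30 locations with a tight cluster near the centre.
clustered = background.copy()
clustered[:10] = 0.5 + 0.02 * rng.standard_normal((10, 2))

for k in (3, 10):
    m_null = m_statistic(background, uniform_square, k, 200, rng)
    m_clus = m_statistic(clustered, uniform_square, k, 200, rng)
    print(f"bins={k:2d}  M(no cluster)={m_null:.2f}  M(cluster)={m_clus:.2f}")
```

Repeating the loop over many simulated data sets, and over different cluster sizes and values of `k`, would give a rough power comparison of the kind the abstract describes. A single run like this one only shows how the statistic is assembled.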