Models for the Natural History of Breast Cancer

BONDI, LAURA
2022

Abstract

This PhD thesis is composed of three projects on different, yet related, topics. Chapter 1, which corresponds to the first project, is strongly applied in scope but also poses methodological challenges. We develop multi-state models for the natural history of breast cancer, where the main events of interest are the start of asymptomatic detectability of the disease (through screening) and the start of symptomatic detectability (through symptoms). The time interval between these two events represents the latent phase of the tumour, known as the sojourn time. The aim of this work is to reconstruct the underlying latent process through a probabilistic description of the occurrence and development of breast cancer. We develop several parametric specifications, which share a cure rate structure to take into account that only a proportion of women experience breast cancer in their lifetime. We present the results of the analysis of data collected as part of a motivating study from Milan. We first present a tractable model, for which we derive the likelihood contributions of the observed trajectories and perform maximum likelihood inference on the latent process. Likelihood-based inference is not feasible for more flexible models, so for these we rely on Approximate Bayesian Computation (ABC). We discuss issues that arise from the use of ABC for model choice and parameter estimation, including the problem of choosing appropriate summary statistics. The estimated parameters of the underlying disease process make it possible to study the effect of different examination schedules (age range and frequency of screening examinations) on a population of asymptomatic subjects, in terms of the number and type of diagnoses.

In Chapter 2, we report on exploratory work motivated by our first project: we study the use of pairwise dissimilarities among observations to define a measure of the distance between two datasets. This problem is particularly relevant in the context of ABC, where observed and model-generated data need to be compared. We start by considering relatively simple models, where we can investigate the ability of the dissimilarity approach to recover the true parameter values in simulation studies and compare its performance to that of other estimation techniques, such as maximum likelihood. As part of this study, we propose a new likelihood-free estimation procedure. The new estimator is based on calibration ideas and makes more complete use of the datasets that are routinely generated when one performs ABC inference.

Chapter 3 is devoted to a purely methodological study of results and algorithms for computing the optimal estimator in a size-biased sampling problem. Size bias can occur in a variety of contexts, whenever the sampling unit is the individual and the population consists of clusters of individuals. For example, in the study of the family history of cancer, larger families have a higher probability of manifesting at least one case of cancer and, therefore, of being included in the Cancer Registry. In this chapter, we obtain the uniformly minimum variance unbiased estimator (UMVUE) for the sparsity index in size-biased Poisson sampling. We first propose two exact algorithms based on the enumeration of cases, the second being a refined and faster version of the first. Despite their exact nature, these algorithms become infeasible even for quite small sample sizes. As an alternative, we develop a third, approximate algorithm based on the inverse fast Fourier transform. An exact confidence interval based on the UMVUE is also built by inversion of the associated test. The performance of the estimation procedure is compared to classical maximum likelihood inference in terms of the mean squared errors of the two estimators, as well as the average coverage and width of the confidence intervals.
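As an illustration of the Chapter 2 idea of comparing observed and model-generated datasets through pairwise dissimilarities, the following is a minimal sketch of rejection ABC on a toy model. Everything in it is an assumption made for illustration and is not taken from the thesis: the exponential model, the uniform prior, the use of the energy distance as the dataset-level distance, and the function names `energy_distance` and `abc_rejection`.

```python
import numpy as np


def energy_distance(x, y):
    """Energy distance between two 1-D samples (V-statistic form).

    Built only from pairwise absolute differences between observations;
    chosen here as one possible dissimilarity-based distance (an assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_xy = np.abs(x[:, None] - y[None, :]).mean()  # between-sample dissimilarities
    d_xx = np.abs(x[:, None] - x[None, :]).mean()  # within-sample, observed
    d_yy = np.abs(y[:, None] - y[None, :]).mean()  # within-sample, simulated
    return 2.0 * d_xy - d_xx - d_yy


def abc_rejection(observed, simulate, prior_sample, n_draws, keep_fraction, rng):
    """Rejection ABC: keep the prior draws whose simulated data are closest."""
    thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
    dists = np.array(
        [energy_distance(observed, simulate(t, observed.size, rng)) for t in thetas]
    )
    n_keep = max(1, int(keep_fraction * n_draws))
    keep = np.argsort(dists)[:n_keep]  # indices of the smallest distances
    return thetas[keep], dists[keep]


# Toy setting (hypothetical): exponential data with unknown rate.
rng = np.random.default_rng(0)
true_rate = 2.0
observed = rng.exponential(scale=1.0 / true_rate, size=200)

accepted, accepted_dists = abc_rejection(
    observed,
    simulate=lambda rate, n, r: r.exponential(scale=1.0 / rate, size=n),
    prior_sample=lambda r: r.uniform(0.1, 10.0),
    n_draws=2000,
    keep_fraction=0.02,
    rng=rng,
)

abc_estimate = accepted.mean()   # mean of the accepted parameter draws
mle = 1.0 / observed.mean()      # maximum likelihood estimate, for comparison
print(f"ABC estimate: {abc_estimate:.3f}, MLE: {mle:.3f}")
```

In a simple model like this one, the ABC point estimate can be checked against the maximum likelihood estimate, which mirrors the kind of comparison described above. Using a distance on the full datasets avoids choosing summary statistics, at the cost of a quadratic number of pairwise comparisons per simulated dataset.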
4 Feb 2022
English
33
2020/2021
STATISTICS
Sector SECS-S/01 - Statistics
BONETTI, MARCO
Files in this item:
File: PhD_Thesis_FINAL.pdf
Access: open access
Description: PhD_Thesis_Laura_Bondi
Type: Doctoral thesis
Size: 1.83 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4058503