
Missing at random assumption made more plausible: evidence from the 1958 British birth cohort

Benedetta Pongiglione
2021

Abstract

Objective: Non-response is unavoidable in longitudinal surveys. The consequences are lower statistical power and the potential for bias. We implemented a systematic data-driven approach to identify predictors of non-response in the National Child Development Study (NCDS; 1958 British birth cohort). Such variables can help make the missing at random assumption more plausible, which has implications for the handling of missing data.

Study Design and Setting: We identified predictors of non-response using data from the 11 sweeps (birth to age 55) of the NCDS (n = 17,415), employing parametric regressions and the LASSO for variable selection.

Results: Disadvantaged socio-economic background in childhood, worse mental health and lower cognitive ability in early life, and lack of civic and social participation in adulthood were consistently associated with non-response. Using this information, along with other data from NCDS, we were able to replicate the “population distribution” of educational attainment and marital status (derived from external data), and the original distributions of key early life characteristics.

Conclusion: The identified predictors of non-response have the potential to improve the plausibility of the missing at random assumption. They can be straightforwardly used as “auxiliary variables” in analyses with principled methods to reduce bias due to missing data.
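The variable-selection step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary non-response indicator, uses synthetic data rather than NCDS data, fits an L1-penalised (LASSO) logistic regression by plain proximal gradient descent, and fixes the penalty at an arbitrary value instead of tuning it. All names (`fit_l1_logistic`, `rows`, `non_response`, `selected`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink value toward zero; this is the proximal step for an L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, outcomes, penalty, step=1.0, n_iter=300):
    """Fit an L1-penalised logistic regression by proximal gradient descent.

    The intercept is left unpenalised. Returns (intercept, coefficients).
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_intercept, grad = 0.0, [0.0] * p
        for x, y in zip(rows, outcomes):
            error = sigmoid(intercept + sum(c * v for c, v in zip(coefs, x))) - y
            grad_intercept += error / n
            for j in range(p):
                grad[j] += error * x[j] / n
        intercept -= step * grad_intercept
        coefs = [soft_threshold(c - step * g, step * penalty)
                 for c, g in zip(coefs, grad)]
    return intercept, coefs


# Synthetic stand-in for cohort data (NOT NCDS data): six standardised
# early-life variables, of which only the first two drive non-response.
random.seed(0)
n_members, n_vars = 500, 6
rows = [[random.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_members)]
non_response = [
    1 if random.random() < sigmoid(-0.5 + 2.0 * x[0] - 2.0 * x[1]) else 0
    for x in rows
]

# The penalty value is an arbitrary assumption; in practice it would be tuned.
intercept, coefs = fit_l1_logistic(rows, non_response, penalty=0.1)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print("variables retained as predictors of non-response:", selected)
```

In a setting like the one the abstract describes, variables retained this way would be candidates for use as auxiliary variables in a principled missing-data method such as multiple imputation. The sketch covers only the selection step.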
Mostafa, Tarek; Narayanan, Martina; Pongiglione, Benedetta; Dodgeon, Brian; Goodman, Alissa; Silverwood, Richard; Ploubidis, George


Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4037100