Clustering longitudinal life-course sequences using mixtures of exponential-distance models

Murphy, Keefe; Murphy, T. Brendan; Piccarreta, Raffaella; Gormley, Claire I.

2021

Abstract

Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
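As a rough illustration of the building block named in the abstract, the sketch below computes the plain (unweighted) Hamming distance between categorical sequences and the log-density of a single exponential-distance component centred on a reference sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the state labels, the single precision parameter, and the unweighted distance are all illustrative choices, and the weighted variants, covariates, and sampling weights mentioned in the abstract are not covered.

```python
from itertools import product
from math import exp, log


def hamming(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def log_density(seq, central, precision, n_states):
    """Log-probability of `seq` under an exponential-distance model.

    Assumed form: p(seq) is proportional to exp(-precision * hamming(seq, central)).
    For the unweighted Hamming distance the normalising constant factorises
    over positions as (1 + (n_states - 1) * exp(-precision)) ** length.
    """
    length = len(central)
    log_norm = length * log(1.0 + (n_states - 1) * exp(-precision))
    return -precision * hamming(seq, central) - log_norm


# Hypothetical state labels and central sequence, for illustration only.
states = ("school", "training", "employment")
central = ("school", "school", "employment", "employment")

# Sanity check: the probabilities of all possible sequences sum to one.
total = sum(
    exp(log_density(seq, central, 1.5, len(states)))
    for seq in product(states, repeat=len(central))
)
```

Because the normalising constant has this product form, the density can be evaluated without enumerating every possible sequence, which is consistent with the abstract's remark that Hamming-based models permit closed-form expressions.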

Year: 2021
Date of first online publication: 2021
DOI: https://dx.doi.org/10.1111/rssa.12712
Journal: Journal of the Royal Statistical Society. Series A. Statistics in Society
URL: https://doi.org/10.1111/rssa.12712
All authors: Murphy, Keefe; Murphy, T. Brendan; Piccarreta, Raffaella; Gormley, Claire I.
Appears in types: 01 - Article in academic journal

Files in this record:

File	Description	Type	Licence	Access	Size	Format
MAIL DI ACCETTAZIONE.pdf	Acceptance email	Attachment for Bocconi evaluation	Not public (private/restricted access)	Not available	89.54 kB	Adobe PDF
RJ_2021_JRSSA_MURPHY ET AL.pdf	Main article	Publisher's layout PDF	Creative Commons	Open access	1.07 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11565/4038748
