We introduce the ratio-cut polytope defined as the convex hull of ratio-cut vectors corresponding to all partitions of n points in R-m into at most K clusters. This polytope is closely related to the convex hull of the feasible region of a number of clustering problems such as K-means clustering and spectral clustering. We study the facial structure of the ratio-cut polytope and derive several types of facet-defining inequalities. We then consider the problem of K-means clustering and introduce a novel linear programming (LP) relaxation for it. Subsequently, we focus on the case of two clusters and derive sufficient condition under which the proposed LP relaxation recovers the underlying clusters exactly. Namely, we consider the stochastic ball model, a popular generative model for K-means clustering, and we show that if the separation distance between cluster centers satisfies Delta > 1 + root 3, then the LP relaxation recovers the planted clusters with high probability. This is a major improvement over the only existing recovery guarantee for an LP relaxation of K-means clustering stating that recovery is possible with high probability if and only if Delta > 4. Our numerical experiments indicate that the proposed LP relaxation significantly outperforms a popular semidefinite programming relaxation in recovering the planted clusters.

The ratio-cut polytope and K-means clustering

De Rosa, Antonio;
2022

Abstract

We introduce the ratio-cut polytope defined as the convex hull of ratio-cut vectors corresponding to all partitions of n points in R-m into at most K clusters. This polytope is closely related to the convex hull of the feasible region of a number of clustering problems such as K-means clustering and spectral clustering. We study the facial structure of the ratio-cut polytope and derive several types of facet-defining inequalities. We then consider the problem of K-means clustering and introduce a novel linear programming (LP) relaxation for it. Subsequently, we focus on the case of two clusters and derive sufficient condition under which the proposed LP relaxation recovers the underlying clusters exactly. Namely, we consider the stochastic ball model, a popular generative model for K-means clustering, and we show that if the separation distance between cluster centers satisfies Delta > 1 + root 3, then the LP relaxation recovers the planted clusters with high probability. This is a major improvement over the only existing recovery guarantee for an LP relaxation of K-means clustering stating that recovery is possible with high probability if and only if Delta > 4. Our numerical experiments indicate that the proposed LP relaxation significantly outperforms a popular semidefinite programming relaxation in recovering the planted clusters.
2022
2022
De Rosa, Antonio; Khajavirad, Aida
File in questo prodotto:
File Dimensione Formato  
de-rosa-khajavirad-2022-the-ratio-cut-polytope-and-k-means-clustering.pdf

non disponibili

Descrizione: article
Tipologia: Pdf editoriale (Publisher's layout)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 835.47 kB
Formato Adobe PDF
835.47 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11565/4059677
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 10
social impact