Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common of such methods. There is a variety of approximate algorithms for the k-means problem, but only a few methods compute the globally optimal solution, as it is in general NP-hard. In this paper, we consider the k-means problem for instances with low-dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low-dimensional structure and solve the problem to global optimality for very large data sets with several clusters, complementing and outperforming state-of-the-art for this class of problems. The method builds on iteratively solving a small concave problem and a large linear programming or assignment problem. This gives a sequence of feasible solutions along with bounds, which we show converges to a zero optimality gap. The paper combines methods from global optimization to accelerate the procedure, and we provide numerical results on synthetic data and real-world data.
QC 20250917
Not duplicate with DiVA 1935628