1 K-Means

In k-means clustering, we are given a sequence of data points $x_i ∈ ℝ^m$, which we want to partition into $k ∈ ℤ$ clusters. First, we initialize the cluster centers $c_i ∈ ℝ^m$ arbitrarily. Then we iteratively update the cluster centers. Each updated center $c_i$ is the point that minimizes the sum of squared distances to all points $y_i$ that are closer to $c_i$ than to any other cluster center $c_{j \neq i}$.

$$\min_{c \in \mathbb{R}^{m}} \quad \sum_{i} \left\| y_i - c \right\|^{2} \tag{1}\label{1}$$
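The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation. It relies on the standard fact that, for squared Euclidean distances, the minimizer of the update objective is the coordinate-wise mean of the assigned points. The function names (`k_means`, `mean`, `squared_distance`), the `iterations` and `seed` parameters, the choice of initial centers from the data, and the handling of empty clusters are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(xs, k, iterations=100, seed=0):
    """Alternate between assigning points to centers and recentering."""
    rng = random.Random(seed)
    # Arbitrary initialization (assumption): use k of the data points.
    centers = [list(x) for x in rng.sample(xs, k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: squared_distance(x, centers[j]))
            clusters[nearest].append(x)
        # Update step: the mean minimizes the sum of squared distances.
        # An empty cluster keeps its previous center (assumption of this sketch).
        new_centers = [mean(ys) if ys else c for ys, c in zip(clusters, centers)]
        if new_centers == centers:
            break  # No center moved, so the assignment can no longer change.
        centers = new_centers
    return centers
```

Calling `k_means(data, k=2)` on a list of points such as `[[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]` returns one center per cluster. Because each center is a mean, a single distant point pulls its center toward itself, which is the outlier sensitivity noted for the L2 norm.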

K-Means with $k=4$. Cluster centers are shown in black. Clusters are strongly affected by outliers with the L2 norm.