Jump to content

Talk:Lloyd's algorithm

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Lloyd's method is identical to the k-means algorithm. Oddly, the k-means article says the method "converges very quickly in practice", while this one says "the algorithm converges slowly". — Jeff Erickson 04:30, 6 April 2007 (UTC)[reply]

The principal difference between Lloyd's method and the k-means algorithm is that k-means applies to a finite set of prescribed discrete entities, whereas the method described on this page applies to a continuum with a prescribed density function. The 'points' referred to on this page correspond to the 'centroids' of k-means clusters, in each case iteratively optimised to model the distribution of the prescribed data.

Mathematically these are indeed equivalent. Computationally, however, they are very different - the difference between a discrete and a continuous problem. This explains the difference in performance, and the fact that of the two methods only k-means can guarantee convergence in finitely many iterations. - Robert Stanforth 14:50, 15 May 2007 (UTC)[reply]

I think the two are different enough to remove the merge tag. Weston.pace 19:50, 12 June 2007 (UTC)[reply]

LLoyd's algorithm is the most popular method to find an approximate solution to the k-means problem. --178.2.54.39 (talk) 06:26, 13 October 2011 (UTC)[reply]

Performance

[edit]

Coming from the article on Smoothed analysis, I was quite surprised that Lloyd's algorithm is given as an example there, but the result cannot be found here. As I am no expert in either k-means nor smoothed analysis, I don't feel comfortable to judge whether the result is in fact worth mentioning (or still up-to-date). Maybe someone with more expertise can make this decision and throw some light on the complexity of Lloyd's algorithm? 2A02:8109:1A3F:F5B4:0:0:0:215A (talk) 09:55, 13 December 2021 (UTC)[reply]

I think the smoothed analysis reference is inaccurate and that the reference doesn't really refer to Lloyd's algorithm, as described here, but rather to the algorithm at k-means clustering#Standard algorithm (naive k-means), which is very similar and which the linked article also calls Lloyd's algorithm. The difference is that the clusters here are geometric regions and that the algorithm repeatedly replaces each center by the centroid of its region, while the clusters at the linked article and the JACM reference are sets of points and that the algorithm repeatedly replaces each center by the average of its assigned points. In the context of smoothed analysis this makes an important difference, because the smoothing part is something that can be applied to sets of points but not really directly to geometric regions. So probably the link to Lloyd's algorithm at the smoothed analysis article should be changed to point to the k-means version of Lloyd's algorithm, not to the continuous version described at this article. —David Eppstein (talk) 17:25, 13 December 2021 (UTC)[reply]
Good catch, that was an error on my part. I changed it. Shuiberts (talk) 13:37, 14 December 2021 (UTC)[reply]

Distinction: Lloyd's algorithm vs. Generalized Lloyd Algorithm

[edit]

The original method of Lloyd from a 1957 draft report (officially published only in 1982 [1]) is a method formulated only for the one-dimensional case and was developed for the problem of pulse code modulation. In this seminal paper Lloyd described a "method 1" which consists of the well-known iteration of

a) assigning data points to nearest quantization level (1-D) and 
b) computing new quantization levels as centroid of the associated data points (1-D). 

It was Linde, Buzo and Gray in 1980 which proposed a generalization of Lloyd's method for vector quantization in multi-dimensional spaces:

a) assigning data points to nearest multi-dimensional codebook vector 
b) computing new codebook vectors as centroid of the associated multidimensional data points.

This extended method is often called "Generalized Lloyd Algorithm". Other names used are "LBG"-Algorithm (deducted from the first letters of the three author's names) and also quite often (and slightly imprecise IMO) just the K-means algorithm.

To better clarify this terminology (it is mentioned in the current article but the distinction is blurred) I would suggest to have the following pages:

  • Lloyd's algorithm (shortly describing the 1D-case, giving the original LLoyd reference and linking to the page on the generalized algorithm)
  • Generalized Lloyd Algorithm a.k.a. LBG-Algorithm (replacing the current stub Linde–Buzo–Gray algorithm

References

[edit]
  1. ^ Cite error: The named reference l82 was invoked but never defined (see the help page).

[1]

Linde, Y.; Buzo, A.; Gray, R. (1980). "An Algorithm for Vector Quantizer Design". IEEE Transactions on Communications. 28: 84–95. doi:10.1109/TCOM.1980.1094577. Tauboss (talk) 12:15, 7 August 2022 (UTC)[reply]

  1. ^ Lloyd, Stuart P. (1982), "Least squares quantization in PCM", IEEE Transactions on Information Theory, 28 (2): 129–137, doi:10.1109/TIT.1982.1056489.