A group of similar data points is called a “cluster”: a collection of data points grouped together because of certain similarities in the data set.
Example:
K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. The goal of clustering is to create groups of data points such that points in different clusters are dissimilar while points within a cluster are similar.
The k-means algorithm takes as input the number of clusters to generate, “k”, and a set of observation vectors to cluster. It returns a set of centroids (also called code vectors), one for each of the k clusters. The centroids are like the soul of the cluster: each one attracts the points closest to it, and those points are added to its cluster.
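To make the role of the centroids concrete, here is a minimal sketch in plain Python of how each observation can be matched to its closest code vector. The function name `assign_to_nearest` and the sample values are hypothetical and chosen only for illustration. Euclidean distance is assumed as the measure of closeness.

```python
import math

def assign_to_nearest(observations, code_vectors):
    """Return, for each observation, the index of its closest code vector."""
    labels = []
    for obs in observations:
        # Euclidean distance from this observation to every code vector.
        distances = [math.dist(obs, cv) for cv in code_vectors]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data: two code vectors and four observations.
code_vectors = [(0.0, 0.0), (10.0, 10.0)]
observations = [(1.0, 2.0), (9.0, 8.0), (0.5, -1.0), (11.0, 12.0)]
labels = assign_to_nearest(observations, code_vectors)
print(labels)  # [0, 1, 0, 1]
```

Each label is the position of the winning code vector in the list, so observations that share a label belong to the same cluster.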
With the k-means clustering algorithm, we partition our data points into k groups. A larger “k” creates smaller groups with more granularity, whereas a smaller “k” creates larger groups with less granularity.
Steps for K-means clustering:
1. Place K code vectors randomly in the data.
2. Divide the data into K partitions according to the closest code vector.
3. Move each code vector to the mean of its partition, then repeat step 2. Continue until the code vectors stop moving.
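The three steps can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation. It makes several assumptions that the text above does not state: the initial code vectors are drawn from the data points themselves, closeness is Euclidean distance, a partition that ends up empty keeps its previous code vector, and the loop stops when the partition no longer changes (which means the code vectors have stopped moving). The names `kmeans`, `seed`, and `max_iter` are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of tuples) into k groups; return (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial code vectors (assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest code vector.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Same partition as last time means the code vectors did not move: stop.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each code vector to the mean of its partition.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty partition keeps its previous code vector
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Which group receives label 0 and which receives label 1 depends on the random starting points, so only the grouping itself is meaningful, not the label numbers.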
Note: K-means may produce different clusterings when run multiple times on the same data, because the initial code vectors are placed randomly.
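A small deterministic sketch can show why the starting positions matter. Instead of random placement, the hypothetical helper `lloyd` below takes the initial code vectors explicitly, so the two runs differ only in where they start. The data (four corners of a wide rectangle) and the use of the total squared distance to the assigned code vector as a quality score are assumptions made for illustration.

```python
import math

def lloyd(points, centroids, max_iter=100):
    """Run k-means from explicit starting centroids; return (labels, total squared error)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest code vector.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # partition unchanged: converged
            break
        labels = new_labels
        # Move each code vector to the mean of its partition.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse

# Hypothetical data: the four corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one code vector near each short side, giving a left/right split.
labels_a, sse_a = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])
# Start B: one code vector on each long side, giving a top/bottom split.
labels_b, sse_b = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs have converged, yet the second is a much poorer fit. A common way to deal with this is to run the algorithm several times from different random starts and keep the result with the lowest total squared error.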