Mathematics of Artificial Intelligence
Lecture 8
Alexander Balinsky
Cardiff School of Mathematics
Contents

1. Clustering
   - Introduction to Clustering Techniques
   - Hierarchical Clustering
   - K-means Algorithm
Clustering

Clustering is the process of examining a collection of "points" and grouping the points into "clusters" according to some distance measure. The goal is that points in the same cluster have a small distance from one another, while points in different clusters are at a large distance from one another.
Example

The example shows height and weight measurements of dogs of several varieties. Without knowing which dog is of which variety, we can see just by looking at the diagram that the dogs fall into three clusters, and those clusters happen to correspond to three varieties.
Introduction to Clustering Techniques

Our goal in this section is to offer methods for discovering clusters in data. We are particularly interested in situations where the data is very large, and/or where the space either is high-dimensional, or is not Euclidean at all. We begin with the basics: the two general approaches to clustering and the methods for dealing with clusters in a non-Euclidean space.
Clustering Strategies

We can divide clustering algorithms into two groups that follow two fundamentally different strategies.

Hierarchical or agglomerative algorithms
Hierarchical or agglomerative algorithms start with each point in its own cluster. Clusters are combined based on their "closeness", using one of many possible definitions of "close". Combination stops when further combination leads to clusters that are undesirable for one of several reasons. For example, we may stop when we have a predetermined number of clusters, or we may use a measure of compactness for clusters, and refuse to construct a cluster by combining two smaller clusters if the resulting cluster has points that are spread out over too large a region.
Point Assignment

The other class of algorithms involves point assignment. Points are considered in some order, and each one is assigned to the cluster into which it best fits. This process is normally preceded by a short phase in which initial clusters are estimated. Variations allow occasional combining or splitting of clusters, or may allow points to be unassigned if they are outliers (points too far from any of the current clusters).
Algorithms for clustering can also be distinguished by whether the algorithm assumes a Euclidean space, or whether the algorithm works for an arbitrary distance measure. We shall see that a key distinction is that in a Euclidean space it is possible to summarize a collection of points by their centroid, the average of the points. In a non-Euclidean space there is no notion of a centroid, and we are forced to develop another way to summarize clusters.
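For instance, the centroid of a cluster of points is simply their coordinate-wise average, as in this minimal numpy sketch (the data is made up purely for illustration):

import numpy as np

# A small cluster of 2-D points (illustrative data).
points = np.array([
    [1.0, 2.0],
    [2.0, 1.5],
    [1.5, 2.5],
])

# The centroid is the coordinate-wise average of the points.
centroid = points.mean(axis=0)
print(centroid)  # [1.5, 2.0]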
Hierarchical Clustering

Any hierarchical clustering algorithm works as follows. We begin with every point in its own cluster. As time goes on, larger clusters will be constructed by combining two smaller clusters, and we have to decide in advance:
- How will clusters be represented?
- How will we choose which two clusters to merge?
- When will we stop combining clusters?

Once we have answers to these questions, the algorithm can be described succinctly as:

WHILE it is not time to stop DO
    pick the best two clusters to merge;
    combine those two clusters into one cluster;
END;
Hierarchical Clustering in a Euclidean Space

To begin, we shall assume the space is Euclidean. That allows us to represent a cluster by its centroid, or average of the points in the cluster. Note that in a cluster of one point, that point is the centroid, so we can initialize the clusters straightforwardly. We can then use the merging rule that the distance between any two clusters is the Euclidean distance between their centroids, and we should pick the two clusters at the shortest distance.
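One way this centroid-merging loop might be written from scratch, assuming small numeric data and a target number of clusters as the stopping rule (function and variable names are illustrative, not from the lecture):

import numpy as np

def hierarchical_centroid_clustering(points, num_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest, until num_clusters clusters remain."""
    # Start with every point in its own cluster; each point is its own centroid.
    clusters = [[i] for i in range(len(points))]
    centroids = [points[i].astype(float) for i in range(len(points))]

    while len(clusters) > num_clusters:
        # Find the pair of clusters whose centroids are closest.
        best, best_dist = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if d < best_dist:
                    best_dist, best = d, (a, b)
        a, b = best
        # Merge cluster b into cluster a and recompute the centroid.
        clusters[a] = clusters[a] + clusters[b]
        centroids[a] = points[clusters[a]].mean(axis=0)
        del clusters[b], centroids[b]
    return clusters

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]])
print(hierarchical_centroid_clustering(points, num_clusters=3))  # [[0, 1], [2, 3], [4]]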
Alternative Rules for Controlling Hierarchical Clustering

We have seen one rule for picking the best clusters to merge: find the pair with the smallest distance between their centroids. Other ways to define inter-cluster distance are possible, and we can also pick the best pair of clusters on a basis other than their distance. Some other options are:
- Take the distance between two clusters to be the minimum of the distances between any two points, one chosen from each cluster.
- Take the distance between two clusters to be the average distance of all pairs of points, one from each cluster.
- The radius of a cluster is the maximum distance between all the points and the centroid. Combine the two clusters whose resulting cluster has the lowest radius.
- The diameter of a cluster is the maximum distance between any two points of the cluster. We may choose to merge those clusters whose resulting cluster has the smallest diameter.
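The following numpy sketch illustrates these measures for two small candidate clusters (function names and data are illustrative):

import numpy as np

def min_link(A, B):
    # Minimum distance over all pairs, one point from each cluster.
    return min(np.linalg.norm(a - b) for a in A for b in B)

def avg_link(A, B):
    # Average distance over all pairs, one point from each cluster.
    return float(np.mean([np.linalg.norm(a - b) for a in A for b in B]))

def radius(C):
    # Maximum distance from any point of the cluster to its centroid.
    centroid = C.mean(axis=0)
    return max(np.linalg.norm(p - centroid) for p in C)

def diameter(C):
    # Maximum distance between any two points of the cluster.
    return max(np.linalg.norm(p - q) for p in C for q in C)

A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])
merged = np.vstack([A, B])
print(min_link(A, B), avg_link(A, B), radius(merged), diameter(merged))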
Hierarchical Clustering in Non-Euclidean Spaces

Given that we cannot combine points in a cluster when the space is non-Euclidean, our only choice is to pick one of the points of the cluster itself to represent the cluster. Ideally, this point is close to all the points of the cluster, so it in some sense lies in the "center". We call the representative point the clustroid. We can select the clustroid in various ways, each designed to, in some sense, minimize the distances between the clustroid and the other points in the cluster.
Common choices include selecting as the clustroid the point that minimizes:
- The sum of the distances to the other points in the cluster.
- The maximum distance to another point in the cluster.
- The sum of the squares of the distances to the other points in the cluster.
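A short sketch of clustroid selection by the first criterion, using Jaccard distance between sets as an illustrative non-Euclidean distance (all names and data here are assumptions for the example):

def clustroid(cluster, dist):
    # Pick the point whose total distance to the other points is smallest.
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))

def jaccard_dist(a, b):
    # Jaccard distance between two sets: 1 minus the size of the intersection
    # divided by the size of the union.
    return 1.0 - len(a & b) / len(a | b)

cluster = [frozenset("abc"), frozenset("abd"), frozenset("abcd"), frozenset("xyz")]
print(clustroid(cluster, jaccard_dist))  # frozenset({'a', 'b', 'c', 'd'})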
K-means Algorithm

In this section we begin the study of point-assignment algorithms. The best-known family of clustering algorithms of this type is called k-means. They assume a Euclidean space, and they also assume the number of clusters, k, is known in advance. It is, however, possible to deduce k by trial and error.

Partitioning Algorithm
Formally, given a data set D of n objects, and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k ≤ n), where each partition represents a cluster. The clusters are formed to optimize an objective partitioning criterion.
k-Means: A Centroid-Based Technique

Suppose a data set D contains n objects in Euclidean space. Partitioning methods distribute the objects in D into k clusters C_1, C_2, ..., C_k, that is, C_i ⊂ D and C_i ∩ C_j = ∅ for i ≠ j.
- An objective function is used to assess the partitioning quality, so that objects within a cluster are similar to one another but dissimilar to objects in other clusters.
- A centroid-based partitioning technique uses the centroid of a cluster, C_i, to represent that cluster. Conceptually, the centroid of a cluster is its center point.
Objective Function: The within-cluster variation is the sum of squared errors between all objects in C_i and the centroid c_i, defined as

E = \sum_{i=1}^{k} \sum_{p \in C_i} \operatorname{dist}(p, c_i)^2 .

Optimizing the within-cluster variation is computationally challenging. In the worst case, we would have to enumerate a number of possible partitionings that is exponential in the number of clusters, and check the within-cluster variation values.
- It has been shown that the problem is NP-hard in general Euclidean space even for two clusters (i.e., k = 2).
- Moreover, the problem is NP-hard for a general number of clusters even in 2-D Euclidean space.
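To make the objective concrete, a small numpy sketch that evaluates E for a given assignment of points to clusters (array and function names are illustrative):

import numpy as np

def within_cluster_variation(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to the centroid
    of the cluster it is assigned to."""
    return sum(
        np.sum((points[labels == i] - centroids[i]) ** 2)
        for i in range(len(centroids))
    )

points = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
labels = np.array([0, 0, 1])
centroids = np.array([[0.0, 1.0], [5.0, 5.0]])
print(within_cluster_variation(points, labels, centroids))  # 1.0 + 1.0 + 0.0 = 2.0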
k-Means: Greedy Approach

To overcome the prohibitive computational cost of the exact solution, greedy approaches are often used in practice. A prime example is the k-means algorithm, which is simple and commonly used.
k-Means

The k-Means Clustering Algorithm
1. Choose a value of k.
2. Select k objects in an arbitrary fashion. Use these as the initial set of k centroids.
3. Assign each of the objects to the cluster whose centroid it is nearest to.
4. Recalculate the centroids of the k clusters.
5. Repeat steps 3 and 4 until the centroids no longer move.
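One way these five steps might look in numpy, as a from-scratch sketch (the random initialization, the iteration cap and all names are illustrative choices):

import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k of the objects as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step 5: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [9.0, 0.2]])
labels, centroids = k_means(points, k=3)
print(labels)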
It can be proved that the k-means algorithm will always terminate, but it does not necessarily find the best set of clusters, corresponding to minimising the value of the objective function.

The initial selection of centroids can significantly affect the result. To overcome this, the algorithm can be run several times for a given value of k, each time with a different choice of the initial k centroids, the set of clusters with the smallest value of the objective function then being taken.
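A brief sketch of this restart strategy, reusing the k_means and within_cluster_variation sketches given earlier (assumed to be defined; the number of restarts is an arbitrary choice):

def k_means_with_restarts(points, k, n_restarts=10):
    # Run k-means several times from different random initial centroids
    # and keep the clustering with the smallest objective value E.
    best_labels, best_centroids, best_E = None, None, float("inf")
    for seed in range(n_restarts):
        labels, centroids = k_means(points, k, seed=seed)
        E = within_cluster_variation(points, labels, centroids)
        if E < best_E:
            best_labels, best_centroids, best_E = labels, centroids, E
    return best_labels, best_centroids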
Initializing Clusters for K-Means

We want to pick points that have a good chance of lying in different clusters. There are two approaches.
1. Pick points that are as far away from one another as possible (see the sketch below).
2. Cluster a sample of the data, perhaps hierarchically, so there are k clusters. Pick a point from each cluster, perhaps the point closest to the centroid of the cluster.
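A minimal sketch of approach 1 (farthest-point initialization), assuming numeric data; the first centre is picked at random and all names are illustrative:

import numpy as np

def farthest_point_init(points, k, seed=0):
    rng = np.random.default_rng(seed)
    # Start from one randomly chosen point.
    chosen = [rng.integers(len(points))]
    while len(chosen) < k:
        # For every point, compute the distance to its nearest already-chosen point.
        dists = np.min(
            np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2),
            axis=1,
        )
        # Pick the point that is farthest from all points chosen so far.
        chosen.append(int(dists.argmax()))
    return points[chosen]

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [9.0, 0.2]])
print(farthest_point_init(points, k=3))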