1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
#include <iostream> #include <vector> #include <cmath> #include <limits> #include <cstdlib> #include <ctime> struct Point { double x, y; int clusterId; // Cluster assignment Point(double x = 0, double y = 0) : x(x), y(y), clusterId(-1) {} }; class KMeans { public: KMeans(int k, int maxIterations) : k(k), maxIterations(maxIterations) { std::srand(std::time(0)); } void fit(std::vector<Point>& points) { // Initialize centroids randomly initializeCentroids(points); for (int iteration = 0; iteration < maxIterations; ++iteration) { bool changed = false; // Assign points to the nearest centroid for (auto& point : points) { int newClusterId = findClosestCentroid(point); if (point.clusterId != newClusterId) { point.clusterId = newClusterId; changed = true; } } // Recalculate centroids recalculateCentroids(points); if (!changed) break; // Stop if no change in cluster assignments } } void printResults(const std::vector<Point>& points) const { for (int i = 0; i < k; ++i) { std::cout << "Cluster " << i + 1 << " (Centroid: " << centroids[i].x << ", " << centroids[i].y << "):\n"; for (const auto& point : points) { if (point.clusterId == i) { std::cout << " (" << point.x << ", " << point.y << ")\n"; } } } } private: int k; int maxIterations; std::vector<Point> centroids; void initializeCentroids(const std::vector<Point>& points) { centroids.clear(); std::vector<int> indices(points.size()); for (int i = 0; i < points.size(); ++i) indices[i] = i; std::random_shuffle(indices.begin(), indices.end()); for (int i = 0; i < k; ++i) { centroids.push_back(points[indices[i]]); } } int findClosestCentroid(const Point& point) const { int closestId = 0; double minDistance = std::numeric_limits<double>::max(); for (int i = 0; i < k; ++i) { double distance = euclideanDistance(point, centroids[i]); if (distance < minDistance) { minDistance = distance; closestId = i; } } return closestId; } void recalculateCentroids(const std::vector<Point>& points) { std::vector<Point> newCentroids(k, Point()); std::vector<int> counts(k, 0); for (const auto& point : points) { int id = point.clusterId; newCentroids[id].x += point.x; newCentroids[id].y += point.y; counts[id]++; } for (int i = 0; i < k; ++i) { if (counts[i] > 0) { newCentroids[i].x /= counts[i]; newCentroids[i].y /= counts[i]; } centroids[i] = newCentroids[i]; } } double euclideanDistance(const Point& a, const Point& b) const { return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y)); } }; int main() { std::vector<Point> points = { {1.0, 2.0}, {1.5, 1.8}, {5.0, 8.0}, {8.0, 8.0}, {1.0, 0.6}, {9.0, 11.0}, {8.0, 2.0}, {10.0, 2.0}, {9.0, 3.0} }; int k = 3; // Number of clusters int maxIterations = 100; KMeans kmeans(k, maxIterations); kmeans.fit(points); kmeans.printResults(points); return 0; } |
Explanation
- Header Files:
<iostream>
: For input and output operations.<vector>
: For storing and manipulating collections ofPoint
objects.<cmath>
: For mathematical functions likestd::sqrt()
.<limits>
: To usestd::numeric_limits
for initializing the minimum distance.<cstdlib>
: For random number generation.<ctime>
: For seeding the random number generator.
- Point Struct:
- Represents a data point with
x
andy
coordinates and an assignedclusterId
.
- Represents a data point with
- KMeans Class:
- Constructor: Initializes the number of clusters
k
and the maximum number of iterations. Seeds the random number generator. - fit(): Main method to perform clustering. Initializes centroids, assigns points to the nearest centroid, and recalculates centroids until convergence or max iterations.
- printResults(): Outputs the clustered points along with their centroids.
- initializeCentroids(): Randomly selects initial centroids from the data points.
- findClosestCentroid(): Finds the index of the closest centroid to a given point using Euclidean distance.
- recalculateCentroids(): Computes new centroids based on the mean of points assigned to each cluster.
- euclideanDistance(): Calculates the Euclidean distance between two points.
- Constructor: Initializes the number of clusters
- main() Function:
- Initializes a vector of
Point
objects with sample data. - Creates a
KMeans
object with 3 clusters and 100 iterations. - Fits the K-Means algorithm to the data and prints the results.
- Initializes a vector of