Clustering
Module for clustering data. These algorithms are unsupervised: they do not require labeled data, only the input data (and the number of clusters in the case of KMeans). They ONLY work well when the data actually forms distinct groups, hence the name of the module.
- class mapyl.clustering.KMeans(k=2, tol=0.001)
KMeans instance
- Parameters:
k (int): The number of clusters. Defaults to 2
tol (float): The tolerance for the cost. Defaults to 0.001
Usage:
>>> import numpy as np
>>> from mapyl.clustering import KMeans
>>> X = np.array([[4, -3],
...               [9, -6],
...               [3, -5],
...               [4, -3],
...               [9, -5]])
>>> km = KMeans(k=2, tol=0.001)
>>> km.fit(X, iters=100)
>>> print(km.predict(np.array([[3, -2]])))
0
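To make the roles of k, tol and iters concrete, here is a minimal pure-Python sketch of the standard k-means (Lloyd's) loop. It is not mapyl's implementation: the names kmeans_sketch and predict_sketch are hypothetical, the centroids are initialized from the first k distinct points for reproducibility, and tol is assumed to bound the change in cost between iterations.

```python
import math

def kmeans_sketch(points, k=2, tol=0.001, iters=100):
    """Illustrative Lloyd's algorithm; NOT mapyl's implementation."""
    # Assumption: initialize centroids from the first k distinct points.
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    prev_cost = math.inf
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        # Cost: sum of squared distances to the assigned centroid.
        cost = sum(math.dist(p, centroids[lab]) ** 2
                   for p, lab in zip(points, labels))
        # Assumption: stop once the cost changes by less than tol.
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return centroids, labels

def predict_sketch(centroids, point):
    """Return the index of the centroid nearest to point."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

X = [[4, -3], [9, -6], [3, -5], [4, -3], [9, -5]]
centroids, labels = kmeans_sketch(X, k=2, tol=0.001, iters=100)
print(labels)                              # [0, 1, 0, 0, 1]
print(predict_sketch(centroids, [3, -2]))  # 0
```

The cluster numbering depends on the initialization, so a different starting choice could swap the labels 0 and 1 without changing the grouping.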
- class mapyl.clustering.DBSCAN(eps=2, minpoints=5)
DBSCAN instance. This class does NOT have methods that return values or predict, so the computed values must be read from its attributes.
- Parameters:
eps (float): The maximum distance between two instances for them to count as neighbors. Defaults to 2
minpoints (int): The minimum number of points within eps of an instance for it to become a core sample. Defaults to 5
- Attributes:
labels (ndarray): The cluster label assigned to each sample.
core_sample_indices (ndarray): The indices of the core samples
noncore_sample_indices (list): The indices of the noncore samples (includes outliers)
cl (int): The number of clusters.
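How eps and minpoints give rise to these attributes can be illustrated with a small pure-Python sketch of the textbook DBSCAN procedure. It is not mapyl's implementation: dbscan_sketch is a hypothetical name, each point is assumed to count itself as a neighbor, and noise is marked with -1, a common convention that mapyl may or may not follow.

```python
import math

def dbscan_sketch(points, eps=2, minpoints=5):
    """Illustrative textbook DBSCAN; NOT mapyl's implementation."""
    n = len(points)
    # Neighborhood of each point: all points within eps (itself included).
    neighbors = [[j for j in range(n)
                  if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    # A core sample has at least minpoints points in its neighborhood.
    core = [i for i in range(n) if len(neighbors[i]) >= minpoints]
    core_set = set(core)

    labels = [-1] * n  # assumption: -1 marks noise / unassigned
    cl = 0             # number of clusters found so far
    for i in core:
        if labels[i] != -1:
            continue  # already absorbed into an earlier cluster
        # Grow a new cluster outward from this core sample.
        labels[i] = cl
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cl
                    if q in core_set:  # only core samples keep expanding
                        stack.append(q)
        cl += 1

    noncore = [i for i in range(n) if i not in core_set]
    return labels, core, noncore, cl

X = [[4, -3], [9, -6], [3, -5], [4, -3], [9, -5]]
labels, core, noncore, cl = dbscan_sketch(X, eps=2, minpoints=2)
print(labels)   # [0, 1, -1, 0, 1]
print(core)     # [0, 1, 3, 4]
print(noncore)  # [2]
print(cl)       # 2
```

In this sketch the point [3, -5] has no other point within eps, so it ends up in the noncore list as an outlier, matching the note that noncore samples include outliers.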
To predict labels for new points, the core sample indices and labels of a fitted DBSCAN instance can be used to train another algorithm, such as KNN:
>>> import numpy as np
>>> from mapyl.clustering import DBSCAN
>>> X = np.array([[4, -3],
...               [9, -6],
...               [3, -5],
...               [4, -3],
...               [9, -5]])
>>> db = DBSCAN(eps=2, minpoints=5)
>>> db.fit(X)
>>> knn = KNearestNeighbors(K=5)
>>> knn.fit(X, db.labels)
>>> print(knn.predict(np.array([3, -2])))
[0]
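The idea of predicting from clustering output can also be sketched without any library: keep only the core samples and let the nearest ones vote on the label of a new point. Everything below is illustrative: knn_predict_sketch is a hypothetical helper, and the labels and core indices are hard-coded example values rather than the output of a mapyl DBSCAN instance.

```python
import math
from collections import Counter

def knn_predict_sketch(train_points, train_labels, query, k=3):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

X = [[4, -3], [9, -6], [3, -5], [4, -3], [9, -5]]
# Hypothetical clustering output: one label per sample, -1 for noise.
example_labels = [0, 1, -1, 0, 1]
example_core_indices = [0, 1, 3, 4]

# Train only on core samples so that noise never votes.
core_points = [X[i] for i in example_core_indices]
core_labels = [example_labels[i] for i in example_core_indices]

predicted = knn_predict_sketch(core_points, core_labels, [3, -2], k=3)
print(predicted)  # 0
```

Restricting the training set to core samples is a design choice of this sketch; training on all labeled samples, as in the usage above, is equally possible.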