Clustering

Module for clustering data. These algorithms are unsupervised: they do not require labeled data, only the input data (and, for KMeans, the number of clusters). They work well only on data that actually forms groups, hence the name of the module.

class mapyl.clustering.KMeans(k=2, tol=0.001)[source]

KMeans instance

Parameters:

k (int): The number of clusters. Defaults to 2

tol (float): The tolerance for the cost. Defaults to 0.001

fit(X, iters=300)[source]

Fits the instance

Parameters:

X (ndarray): The X values to be fitted

iters (int): The number of iterations. Defaults to 300

Returns: None

predict(X)[source]

Predicts the class of an X value

Parameters:

X (ndarray): The X values to be predicted

Returns: The index of the class of the supplied X
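
The fit and predict steps described above can be pictured with a minimal NumPy sketch of Lloyd's algorithm, the standard k-means procedure. This is not mapyl's implementation: the names `kmeans_sketch` and `assign`, the `seed` argument, and the choice to treat `tol` as a threshold on centroid movement are all assumptions made for illustration.

```python
import numpy as np

def kmeans_sketch(X, k=2, tol=0.001, iters=300, seed=0):
    """Illustrative Lloyd's algorithm; NOT mapyl's actual implementation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k samples chosen at random (assumed initialization).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every sample to every centroid, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Assumed meaning of tol: stop once no centroid moved more than tol.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids

def assign(X, centroids):
    """Return the index of the nearest centroid for each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

In this sketch, which cluster receives index 0 depends on the random initialization, so the index is only meaningful relative to the other samples.
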

Usage:

>>> import numpy as np
>>> from mapyl.clustering import KMeans
>>> X = np.array([[4, -3],
...               [9, -6],
...               [3, -5],
...               [4, -3],
...               [9, -5]])
>>> km = KMeans(k=2, tol=0.001)
>>> km.fit(X, iters=100)
>>> print(km.predict(np.array([[3, -2]])))
0

class mapyl.clustering.DBSCAN(eps=2, minpoints=5)[source]

DBSCAN instance. This class has no predict method, and its methods do not return the clustering results, so the computed values must be read from the attributes listed below.

Parameters:

eps (float): The maximum distance between two points for them to count as neighbors. Defaults to 2

minpoints (int): The minimum number of neighboring points required for a point to become a core point. Defaults to 5

Attributes:

labels (ndarray): The cluster label assigned to each sample.

core_sample_indices (ndarray): The indices of the core samples

noncore_sample_indices (list): The indices of the noncore samples (includes outliers)

cl (int): The number of clusters.

fit(X)[source]

Fits the instance

Parameters:

X (ndarray): ndarray of the X values

Returns:

self: The fitted instance
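
The roles of eps and minpoints can be illustrated with a short NumPy sketch that flags core samples. It is a sketch under stated assumptions, not mapyl's code: the function name `core_sample_mask` is hypothetical, and it assumes Euclidean distance and that a sample counts as its own neighbor, neither of which is confirmed for mapyl.

```python
import numpy as np

def core_sample_mask(X, eps=2.0, minpoints=5):
    """Flag samples that have at least `minpoints` samples within `eps`.

    Assumptions (not confirmed for mapyl): Euclidean distance, and each
    sample is counted as one of its own neighbors.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_samples).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Number of samples within eps of each sample (including itself).
    neighbor_counts = (dists <= eps).sum(axis=1)
    return neighbor_counts >= minpoints

X = np.array([[4, -3], [9, -6], [3, -5], [4, -3], [9, -5]])
mask_loose = core_sample_mask(X, eps=2, minpoints=2)
mask_strict = core_sample_mask(X, eps=2, minpoints=5)
```

Under these assumptions, `mask_loose` marks every sample except `[3, -5]` as a core sample, while `mask_strict` marks none: with only five samples, minpoints=5 would require all of them to lie within eps of one another.
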

The DBSCAN instance cannot predict by itself. To predict values, the core sample indices and labels can be used to train another algorithm, such as KNN:

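>>> import numpy as np
>>> from mapyl.clustering import DBSCAN
>>> # KNearestNeighbors must also be imported from its mapyl module
>>> X = np.array([[4, -3],
...               [9, -6],
...               [3, -5],
...               [4, -3],
...               [9, -5]])
>>> db = DBSCAN(eps=2, minpoints=5)
>>> db.fit(X)
>>> knn = KNearestNeighbors(K=5)
>>> knn.fit(X, db.labels)
>>> print(knn.predict(np.array([3, -2])))
[0]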
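
The idea of handing cluster labels to a neighbor-based predictor can also be sketched without any library class. The function below is a hypothetical 1-nearest-neighbor helper written in plain NumPy; the name `nearest_label` and the example label array are illustrative assumptions and do not come from mapyl.

```python
import numpy as np

def nearest_label(X_train, labels, X_new):
    """Give each new sample the label of its closest training sample (1-NN)."""
    X_train = np.asarray(X_train, dtype=float)
    labels = np.asarray(labels)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Distances from each new sample to each training sample.
    dists = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

X = np.array([[4, -3], [9, -6], [3, -5], [4, -3], [9, -5]])
# Hypothetical labels standing in for the output of a clustering step.
example_labels = np.array([0, 1, 0, 0, 1])
predicted = nearest_label(X, example_labels, [3, -2])
```

Here `predicted` holds a single label, the one attached to `[4, -3]`, which is the training sample closest to `[3, -2]`.
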