Trying to plot outliers using DBSCAN

Problem description

I have never been great with Python plotting concepts, and I am apparently still missing something.

Here is my code.

import pandas as pd
import matplotlib.pyplot as plt
import sys
from numpy import genfromtxt
from sklearn.cluster import DBSCAN

data = pd.read_csv('C:\\Users\\path_here\\wine.csv')
data

# Fit DBSCAN on all columns of the data
model = DBSCAN(eps=0.9, min_samples=10).fit(data)

# Select the two columns to plot
array_flavanoids = data.iloc[:, 2]
array_colorintensity = data.iloc[:, 3]

# Scatter plot, colored by DBSCAN cluster label
colors = model.labels_
plt.scatter(array_flavanoids, array_colorintensity, c=colors, marker='o')
plt.xlabel('Concentration of flavanoids', fontsize=16)
plt.ylabel('Color intensity', fontsize=16)
plt.title('Concentration of flavanoids vs Color intensity', fontsize=20)
plt.show()

Here is my result.

[image]

I am expecting the outliers to be in a different color than the non-outliers. So, something like this.

[image]

Maybe one color for outliers and another for non-outliers. I am just trying to learn the concept in this exercise. I am trying to follow the example from this link.

https://towardsdatascience.com/outlier-detection-python-cd22e6a12098

I am using this data source.

https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009

Tags: python, python-3.x, matplotlib, dbscan

Solution


I tested this on a few different data sets and got the following to work. It standardizes the features before running DBSCAN.

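import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def dbscan(X, eps, min_samples):
    # Standardize the features so that eps is comparable across columns
    X = StandardScaler().fit_transform(X)
    # fit_predict fits the model and returns the labels; noise points get -1
    y_pred = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # Plot the first two (scaled) columns, colored by cluster label
    plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='Paired')
    plt.title("DBSCAN")
    plt.show()

dbscan(data, eps=.5, min_samples=5)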


I found this to be a great resource.

https://medium.com/@plog397/functions-to-plot-kmeans-hierarchical-and-dbscan-clustering-c4146ed69744
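The question asked for one color for outliers and another for non-outliers. DBSCAN assigns the label -1 to points it treats as noise, so a boolean mask on that label is enough to split the plot in two. Below is a minimal sketch of that idea, not a definitive implementation. The function names (find_outliers, plot_outliers, make_demo_data) and the synthetic data are hypothetical stand-ins, and the eps and min_samples defaults are simply the values used above, not tuned ones.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def find_outliers(X, eps=0.5, min_samples=5):
    """Return a boolean mask that is True where DBSCAN labels a row as noise."""
    X_scaled = StandardScaler().fit_transform(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    return labels == -1  # DBSCAN marks noise points with the label -1


def plot_outliers(x, y, is_outlier):
    """Scatter x against y with non-outliers in blue and outliers in red."""
    plt.scatter(x[~is_outlier], y[~is_outlier], c="tab:blue", label="non-outliers")
    plt.scatter(x[is_outlier], y[is_outlier], c="tab:red", label="outliers")
    plt.legend()
    plt.show()


def make_demo_data(seed=0):
    # Hypothetical stand-in data: one dense blob plus three far-away points
    rng = np.random.default_rng(seed)
    blob = rng.normal(loc=0.0, scale=0.3, size=(200, 2))
    far = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0]])
    return np.vstack([blob, far])


if __name__ == "__main__":
    X = make_demo_data()
    mask = find_outliers(X, eps=0.5, min_samples=5)
    plot_outliers(X[:, 0], X[:, 1], mask)
```

For the data in the question, the same two calls should work with find_outliers(data) and the two columns selected with data.iloc, assuming every column in the file is numeric. If all points come out as outliers, or none do, eps and min_samples probably need adjusting for that data set.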

