Showing only the maximum and minimum fliers in a boxplot raises TypeError: 'AxesSubplot' object is not subscriptable

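Problem description

I am preparing box plots with a whisker interval of [2, 98]. The problem is that I am working with air quality data and have a large number of data points, so the outliers take up the whole figure and overshadow the box plot. I only want to plot the maximum and minimum outliers. I tried the approach from "Matplotlib boxplot show only max and min fliers", but I get an error message that says TypeError: 'AxesSubplot' object is not subscriptable

Here is my code: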

fig,ax = plt.subplots(1, figsize=(8,6))
g = sns.boxplot(data=mda8, orient='v', width = 0.7, whis = (2,98))
fliers = g['fliers']
for fly in fliers:
    fdata=fly.get_data
    fly.set_data([fdata[0][0],fdata[0][-1],fdata[1][0],fdata[1][-1]])
xvalues = ['Niland', 'El Centro', 'Calexico']
plt.xticks(np.arange(3), xvalues, fontsize=12)
ax.set_ylabel('Ozone MDA8 (ppb)',fontsize=15)
ax.set_ylim(0,105)
plt.show()

Here is some sample data:

mda8 = pd.DataFrame({
'T1':[35.000000, 32.125000, 32.000000, 35.250000, 28.875000, 28.500000, 29.375000, 25.125000, 34.166667, 35.250000],
'T2':[28.375, 30.750, 33.250, 34.000, 32.875, 30.250, 29.875, 100.409, 29.625, 1.232],
'T3':[34.250, 102.232, 28.250, 33.000, 27.625, 21.500, 28.375, 30.250, 3.454, 33.750]})

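I only need help plotting the maximum and minimum outliers, and I am open to an approach other than the one I tried here.

Edit: here is a link to my CSV file: https://drive.google.com/file/d/1E3A0UAYCbSN53JXtfsbrA4i_Phci_JWf/view?usp=sharing

Tags: python, pandas, matplotlib, plot, seaborn

Solution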


One possible approach is:

  • Hide the outliers drawn by seaborn.boxplot by passing the showfliers = False argument:

    sns.boxplot(data=mda8, orient='v', width = 0.7, whis = (2,98), showfliers = False)
    
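  • Get the list of outliers for each column, find the maximum and minimum values, and plot only those:

    # Use the same whis as the box plot so the fliers match the drawn whiskers
    outliers = {col: list(stat['fliers']) for col in mda8.columns for stat in boxplot_stats(mda8[col], whis = (2, 98))}
    # Keep only the smallest and largest flier of each column (empty list if there are none)
    min_max_outliers = {key: [np.min(value), np.max(value)] if value != [] else [] for key, value in outliers.items()}
    
    # Boxes are drawn at x = 0, 1, 2, ... in column order
    i = 0
    for key, value in min_max_outliers.items():
        if value != []:
            ax.scatter([i, i], value, marker = 'd', facecolor = 'black')
        i += 1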
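
As a quick check of what boxplot_stats returns, the sketch below runs it on the T2 sample column with the same (2, 98) percentiles as the plot. It assumes a Matplotlib version that accepts a pair of percentiles for whis; the variable names are only illustrative.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# T2 column from the sample data in the question
t2 = np.array([28.375, 30.750, 33.250, 34.000, 32.875,
               30.250, 29.875, 100.409, 29.625, 1.232])

# boxplot_stats returns one dict per dataset; a 1-D input gives a single dict
stats = boxplot_stats(t2, whis=(2, 98))[0]

# Whiskers end at the most extreme data points inside the 2nd-98th percentile range
whisker_low, whisker_high = stats['whislo'], stats['whishi']

# Every value beyond the whiskers is reported as a flier
fliers = sorted(np.asarray(stats['fliers'], dtype=float).tolist())

print(whisker_low, whisker_high, fliers)
```

The 'fliers' entry is what the dictionary comprehension collects for each column before it is reduced to a minimum and a maximum.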
    

Complete code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from matplotlib.cbook import boxplot_stats


mda8 = pd.DataFrame({'T1': [35.000000, 32.125000, 32.000000, 35.250000, 28.875000, 28.500000, 29.375000, 25.125000, 34.166667, 35.250000],
                     'T2': [28.375, 30.750, 33.250, 34.000, 32.875, 30.250, 29.875, 100.409, 29.625, 1.232],
                     'T3': [34.250, 102.232, 28.250, 33.000, 27.625, 21.500, 28.375, 30.250, 3.454, 33.750]})


fig,ax = plt.subplots(1, figsize=(8,6))

sns.boxplot(data=mda8, orient='v', width = 0.7, whis = (2,98), showfliers = False)

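# Use the same whis as the box plot so the fliers match the drawn whiskers
outliers = {col: list(stat['fliers']) for col in mda8.columns for stat in boxplot_stats(mda8[col], whis = (2, 98))}
# Keep only the smallest and largest flier of each column (empty list if there are none)
min_max_outliers = {key: [np.min(value), np.max(value)] if value != [] else [] for key, value in outliers.items()}

# Boxes are drawn at x = 0, 1, 2, ... in column order
i = 0
for key, value in min_max_outliers.items():
    if value != []:
        ax.scatter([i, i], value, marker = 'd', facecolor = 'black')
    i += 1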

xvalues = ['Niland', 'El Centro', 'Calexico']
plt.xticks(np.arange(3), xvalues, fontsize=12)
ax.set_ylabel('Ozone MDA8 (ppb)',fontsize=15)
ax.set_ylim(0,105)

plt.show()
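
For context on the original error: sns.boxplot returns the Matplotlib Axes it drew on, so indexing the result with ['fliers'] fails, whereas Matplotlib's own Axes.boxplot returns a dict of artists that does have a 'fliers' key. The following is a minimal sketch of the trimming approach from the question using Axes.boxplot directly. It is an alternative to the seaborn code above, not a drop-in replacement: boxes are placed at x = 1, 2, 3 by default and the styling differs.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mda8 = pd.DataFrame({
    'T1': [35.000000, 32.125000, 32.000000, 35.250000, 28.875000,
           28.500000, 29.375000, 25.125000, 34.166667, 35.250000],
    'T2': [28.375, 30.750, 33.250, 34.000, 32.875,
           30.250, 29.875, 100.409, 29.625, 1.232],
    'T3': [34.250, 102.232, 28.250, 33.000, 27.625,
           21.500, 28.375, 30.250, 3.454, 33.750]})

fig, ax = plt.subplots(figsize=(8, 6))

# Axes.boxplot returns a dict of artists, including one Line2D of fliers per box
bp = ax.boxplot([mda8[col].dropna().to_numpy() for col in mda8.columns],
                whis=(2, 98), widths=0.7)

for fly in bp['fliers']:
    x, y = (np.asarray(a, dtype=float) for a in fly.get_data())
    if y.size == 0:
        continue  # this box has no fliers
    # Keep only the lowest and the highest flier of this box
    keep = [int(np.argmin(y)), int(np.argmax(y))]
    fly.set_data(x[keep], y[keep])

ax.set_ylabel('Ozone MDA8 (ppb)', fontsize=15)
```

Call plt.show() afterwards to display the figure.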



Edit

Working with the data you provided, if I plot it as is:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


mda8 = pd.read_csv(r'data/MDA8_allregions.csv')
mda8 = mda8.drop(['date', 'date.1', 'date.2'], axis = 1)


fig, ax = plt.subplots(1, figsize = (8, 6))

sns.boxplot(data = mda8, orient = 'v', width = 0.7, whis = (2, 98), showfliers = True)

plt.show()

I get a box plot in which every flier is drawn.

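Starting from the code above, I change the parameter to showfliers = False in order to hide the outliers.
Then, as JohanC suggested in the comments, a simpler way to plot the outliers is to plot the minimum and maximum of each column: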

for i, col in enumerate(mda8.columns, 0):
    ax.scatter([i, i], [mda8[col].min(), mda8[col].max()], marker = 'd', facecolor = 'black')
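
One caveat of plotting the plain minimum and maximum is that a marker is drawn even when that value is the whisker end rather than a flier. The sketch below shows one way to guard against that. The helper extreme_fliers is hypothetical (it is not part of pandas, seaborn, or Matplotlib) and assumes the whiskers are defined by the (2, 98) percentiles, as in the plot.

```python
import numpy as np
import pandas as pd


def extreme_fliers(series, whis=(2, 98)):
    """Return the minimum and/or maximum only if they lie beyond the whisker percentiles."""
    values = series.dropna().to_numpy()
    if values.size == 0:
        return []
    low, high = np.percentile(values, whis)
    extremes = []
    if values.min() < low:    # the minimum is below the lower whisker, so it is a flier
        extremes.append(float(values.min()))
    if values.max() > high:   # the maximum is above the upper whisker, so it is a flier
        extremes.append(float(values.max()))
    return extremes


mda8 = pd.DataFrame({
    'T1': [35.000000, 32.125000, 32.000000, 35.250000, 28.875000,
           28.500000, 29.375000, 25.125000, 34.166667, 35.250000],
    'T2': [28.375, 30.750, 33.250, 34.000, 32.875,
           30.250, 29.875, 100.409, 29.625, 1.232],
    'T3': [34.250, 102.232, 28.250, 33.000, 27.625,
           21.500, 28.375, 30.250, 3.454, 33.750]})

for i, col in enumerate(mda8.columns):
    print(i, col, extreme_fliers(mda8[col]))
```

The returned points can then be drawn with ax.scatter([i] * len(points), points, marker = 'd', facecolor = 'black') in place of the unconditional min/max call.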

Complete code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


mda8 = pd.read_csv(r'data/MDA8_allregions.csv')
mda8 = mda8.drop(['date', 'date.1', 'date.2'], axis = 1)


fig, ax = plt.subplots(1, figsize = (8, 6))

sns.boxplot(data = mda8, orient = 'v', width = 0.7, whis = (2, 98), showfliers = False)

for i, col in enumerate(mda8.columns, 0):
    ax.scatter([i, i], [mda8[col].min(), mda8[col].max()], marker = 'd', facecolor = 'black')

plt.show()


