Highlight the mean-value cell of each column in a heatmap with a custom color, using a pandas df

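Problem description

I am trying to highlight, in a heatmap built from a pandas DataFrame, the cell holding each column's mean value, or the nearest lower value when the mean itself is not present, but my attempts keep failing. That is, I want to highlight the cell that exactly equals the column mean; if no cell has that exact value, the closest value below the mean should be highlighted instead.

For example: the mean is 17.522, but that value is not in the df, so 15.499 should be highlighted (see

Below I share a screenshot of what I tried and of what I expect, for your reference.

Geniuses are always welcome! Thanks in advance.

The mean of each column is: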

array([17.60950419, 33.73034387, 46.63401871, 56.27580645, 52.62956452,
       63.70669355, 71.75735484, 67.788     , 83.62327419, 75.41342   ])

I tried to highlight the individual cells with the following code:

df_Mean=np.array(df.mean())

fig, ax = plt.subplots(figsize=(18,8))

cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["#f9f9f9","#B6DBF2","#327AD9","#3068D9"])
color = ["#f9f9f9",'#3068D9',"#f9f9f9","#f9f9f9","#B6DBF2","#327AD9","#3068D9"]

ax = sns.heatmap(df, annot=True, fmt=".5g", linewidths=.02, 
                 cmap=cmap, vmin=0, vmax=300,cbar_kws={'label': 'Si.No'}, 
                linecolor='#CBDBD7',
                ax = ax,
                xticklabels=1, yticklabels=1,
                )

ax = sns.heatmap(df.round(),mask=(df > df_Mean),
             vmin=10, vmax=80, cmap=color, cbar=False)

ax.invert_yaxis()
ax.yaxis.set_label_position("right")
ax.yaxis.tick_right()

ax.set_xticklabels(
    ax.get_xticklabels(), color = 'darkgreen',
    horizontalalignment='center');

ax.set_yticklabels(
    ax.get_yticklabels(), color = 'darkblue',
    horizontalalignment='right',
    size = 10,);

ax.xaxis.set_tick_params(pad=10)
ax.yaxis.set_tick_params(pad=20)  

plt.xlabel('Month', color = 'Maroon', size= 15)
plt.title('Testing_HeatMap', color = 'darkgreen', size = 20)
plt.show()

This is the output I get: (image)

Expected output: (image)

Tags: python, pandas, matplotlib, seaborn

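Solution


Here I demonstrate a solution on some random data (it should still illustrate the approach for your setup).

For each column, I find the row number of the value that is closest to the column mean while being less than or equal to it: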

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import pandas as pd
import seaborn as sns

# Generate some random data (25 rows, 3 columns)
df = pd.DataFrame(np.random.rand(25, 3))

# Compute the mean of each column
df_mean = df.mean()

# Find the difference between each value and the column mean
diff = df - df_mean

# We are only interested in values less than or equal to the mean
mask = diff <= 0

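# The row numbers of the closest values to the column mean
# which are less than or equal to the column mean.
# df[mask] turns every value above the mean into NaN, so the largest
# remaining value in each column is the closest one from below
# (nanargmin here would pick the column minimum instead).
highlight_row = np.nanargmax(df[mask].to_numpy(), axis=0)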
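To check the selection rule by hand, here is a minimal sketch on a tiny hand-made frame. The helper name `closest_row_at_or_below_mean` and the values in `toy` are illustrative assumptions, not the asker's data. Column `a` has no cell equal to its mean, so the nearest lower value is chosen; column `b` contains its mean exactly.

```python
import numpy as np
import pandas as pd


def closest_row_at_or_below_mean(df):
    """Per column, return the row position of the largest value <= the column mean.

    Hypothetical helper for illustration only.
    """
    # Values above the column mean become NaN and are ignored by nanargmax
    at_or_below = df.where(df <= df.mean())
    return np.nanargmax(at_or_below.to_numpy(), axis=0)


# Illustrative values (assumed, not taken from the question)
toy = pd.DataFrame({
    "a": [10.0, 15.0, 20.0, 25.0],  # mean 17.5 is absent -> 15.0 in row 1
    "b": [1.0, 2.0, 3.0, 6.0],      # mean 3.0 is present -> row 2
})

rows = closest_row_at_or_below_mean(toy)
print(rows)  # [1 2]
```

Note that `np.nanargmax` raises a `ValueError` for a column that is entirely NaN, so such columns would need to be handled separately.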

Once you have these row numbers, all that remains is to draw them. One way is to use a Rectangle patch (although there are no doubt other ways to do this):

# Plotting
fig, ax = plt.subplots()
ax = sns.heatmap(df, ax=ax)

# Loop over the columns
for col in range(df.shape[1]):
    # Add a rectangle which highlights our cell of interest
    ax.add_patch(Rectangle((col, highlight_row[col]), 1, 1))

plt.show()
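As an alternative to the Rectangle patch, here is a minimal sketch closer to the approach tried in the question: a second heatmap drawn over the first with a single custom color. The overlay in the question used `mask=(df > df_Mean)`, which leaves every cell at or below the mean visible rather than one cell per column. The sketch instead builds a boolean array that is True for exactly one cell in each column. The random data, the color `#FFD54F`, the name `highlight`, and the headless `Agg` backend are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data (6 rows, 3 columns), not the asker's values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 3)) * 100)

# Row position, per column, of the largest value <= the column mean
at_or_below = df.where(df <= df.mean())
highlight_row = np.nanargmax(at_or_below.to_numpy(), axis=0)

# Boolean array: True only for the one chosen cell in each column
highlight = np.zeros(df.shape, dtype=bool)
highlight[highlight_row, np.arange(df.shape[1])] = True

fig, ax = plt.subplots()

# Base heatmap with annotations
sns.heatmap(df, annot=True, fmt=".5g", cmap="Blues", ax=ax)

# seaborn hides cells where mask is True, so mask everything
# except the chosen cells and paint those with one flat color
sns.heatmap(df, mask=~highlight, cmap=ListedColormap(["#FFD54F"]),
            cbar=False, ax=ax)

# Call plt.show() or fig.savefig(...) here to view the result
```

Compared with patches, the overlay keeps everything inside seaborn, at the cost of drawing the data twice.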
