Plotting survey data as a bar chart

Question

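I have a dataframe with columns such as "City", "Gender", "Education level", and "How satisfied are you with something".

I am trying to plot it as a bar chart: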

# Here I select the neighbourhood as "X",
# then I group it by gender and try to plot it against the question "How satisfied are you with something".

This is what I get:

But I would like to get something like this:

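I can't figure out how to make the bar colors correspond to the answers to the "How satisfied are you with something" question.

I would also like to add percentages on top of the bars. I would be very grateful if someone could guide me. Thank you.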

Tags: matplotlib, plot, bar-chart

Answer


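You can create a Seaborn countplot() as follows. Gender is passed as x so that it appears on the x-axis. Using Satisfied? as hue splits each gender's bar into smaller bars and creates an accompanying legend. If you want to fix a specific order for these values, you can use hue_order, or you can make the column categorical.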

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
sns.countplot(data=data[data['City'] == 'Test City'], x='Gender', palette='plasma',
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])
plt.show()

Example plot
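As an alternative to hue_order, the column can be made categorical. The following is a minimal sketch, assuming the five answer labels from the example above; the names `levels` and `answers` are made up for illustration. Seaborn generally follows the category order of a categorical column, but it is worth checking against the version you have installed.

```python
import numpy as np
import pandas as pd

# The answer labels in the order they should appear (taken from the example above)
levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']

rng = np.random.default_rng(0)
answers = pd.DataFrame({'Satisfied?': rng.choice(levels, 100)})  # hypothetical test data

# An ordered categorical stores the desired order in the column itself
answers['Satisfied?'] = pd.Categorical(answers['Satisfied?'], categories=levels, ordered=True)

# With sort=False the counts are reported in category order, not by frequency
counts = answers['Satisfied?'].value_counts(sort=False)
print(counts)
```

With the column converted this way, the order is defined once on the data instead of being repeated in every plotting call.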

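From here, further improvements are possible:

  • Change the bar heights so that they sum to 1 for each gender. This converts the heights into percentages.
  • Change the formatting of the y-axis to show percentages.
  • While changing the heights, also change the bar widths to leave a small gap between them.
  • Put the legend at the bottom, without a frame and with square markers.
  • Add the percentages as text above the bars.
  • Add horizontal grid lines.
  • Hide the spines.
  • ...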
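The first item, making the heights sum to 1 per gender, can also be checked directly on the data before touching any bars. This is a rough sketch using pandas' crosstab with normalize='index'; the dataframe `survey` and its random contents are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']
rng = np.random.default_rng(0)
survey = pd.DataFrame({'Gender': rng.choice(['Male', 'Female'], 200),
                       'Satisfied?': rng.choice(levels, 200)})  # hypothetical test data

# normalize='index' divides every row by its row total,
# so the shares within each gender add up to 1
shares = pd.crosstab(survey['Gender'], survey['Satisfied?'], normalize='index')

print((shares * 100).round(1))  # the same numbers expressed as percentages
```

These shares are the values the rescaled bars should end up showing, which makes the table a convenient cross-check for the plot.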

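Seaborn has a myriad of ways to select colors. The simplest is to give a list of named colors. Note, however, that the existing palettes have been studied so that their colors go well together. The Colorbrewer website can be used to experiment and find colors for many situations.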
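As one possible way to use an existing palette instead of hand-picked color names, the sketch below samples five colors from 'RdYlGn', a diverging Colorbrewer scheme that also ships with matplotlib. The choice of that scheme and the names `levels` and `palette` are assumptions made for illustration.

```python
import seaborn as sns

levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']

# Sample one color per answer from a red-yellow-green diverging palette
colors = sns.color_palette('RdYlGn', n_colors=len(levels))

# Map each answer to a fixed color, so the colors do not depend on plotting order
palette = dict(zip(levels, colors))

for level, color in palette.items():
    print(level, tuple(round(c, 2) for c in color))
```

A dictionary like this can be passed as palette= to sns.countplot() when hue is set to the same column.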

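The variable width_scale in the code can be used to set the gap. In the earlier version it was set to 0.8, leaving a gap of 0.2. The new example has a gap of 1.0 - 0.6 = 0.4.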
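The narrowing and recentering arithmetic can be checked in isolation. In this small sketch the helper `shrink_bar` is a hypothetical name, not part of Matplotlib or Seaborn; it only mirrors the two formulas applied to each bar in the full example.

```python
def shrink_bar(x, width, width_scale):
    """Return (new_x, new_width) for a bar narrowed around its own center."""
    new_width = width * width_scale
    # Shift the left edge by half of the width that was removed
    new_x = x + width * (1 - width_scale) / 2
    return new_x, new_width


x, width = 0.0, 0.16  # left edge and width of an example bar
new_x, new_width = shrink_bar(x, width, width_scale=0.6)

old_center = x + width / 2
new_center = new_x + new_width / 2
print(old_center, new_center)  # the center stays where it was
```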

Here is an example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.ticker import PercentFormatter

N = 500
data = pd.DataFrame({'City': np.random.choice(['Test City', 'Other City'], N),
                     'Gender': np.random.choice(['Male', 'Female'], N, p=[0.3, 0.7]),
                     'Satisfied?': np.random.choice(['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'], N)})
city_data = data[data['City'] == 'Test City']
fig, ax = plt.subplots(figsize=(14, 4))
sns.countplot(data=city_data, x='Gender', order=['Male', 'Female'], ax=ax,
              palette=['turquoise', 'tomato', 'deepskyblue', 'gold', 'limegreen'],
              hue='Satisfied?', hue_order=['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good'])

width_scale = 0.6  # the relative width of the bars, 1.0 means bars touching; the gap will be 1-width_scale
for bars in ax.containers:
    for bar, total_per_gender in zip(bars, [sum(city_data['Gender'] == 'Male'), sum(city_data['Gender'] == 'Female')]):
        new_height = bar.get_height() / total_per_gender
        bar.set_height(new_height)
        width = bar.get_width()
        x = bar.get_x()
        bar.set_width(width * width_scale)
        bar.set_x(x + width * (1 - width_scale) / 2)  # recenter
        if np.isnan(new_height):
            new_height = 0
        ax.text(x + width / 2, new_height, f' {new_height * 100:.1f}%\n', ha='center', va='bottom', rotation=90)
ax.set_xlabel('')  # remove superfluous x-label
ax.set_ylabel('')
ax.tick_params(axis='x', length=0, labelsize=14)  # remove tick marks, larger text
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.grid(axis='y', ls=':', clip_on=False)
sns.despine(fig, ax, top=True, right=True, left=True, bottom=True)
ax.legend(ncol=5, bbox_to_anchor=(0.5, -0.1), loc='upper center', frameon=False, handlelength=1, handleheight=1)
ax.relim()  # recompute the data limits from the resized bars
ax.autoscale()  # then rescale the axes to the new limits
ax.margins(y=0.15, x=0.02)  # some space for the text on top of the bars
plt.tight_layout()
plt.show()

Example percentage plot
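A different route to a similar chart is to compute the shares with pandas first and let Matplotlib place the labels. The sketch below is an alternative under stated assumptions, not the method used above: it relies on Axes.bar_label, which requires Matplotlib 3.4 or newer, and the `survey` data is randomly generated for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

levels = ['1 - very bad', '2 - bad', '3 - neutral', '4 - good', '5 - very good']
rng = np.random.default_rng(0)
survey = pd.DataFrame({'Gender': rng.choice(['Male', 'Female'], 200),
                       'Satisfied?': rng.choice(levels, 200)})  # hypothetical test data

# Share of each answer within each gender (every row sums to 1)
shares = pd.crosstab(survey['Gender'], survey['Satisfied?'], normalize='index')

fig, ax = plt.subplots(figsize=(10, 4))
shares.plot.bar(ax=ax, width=0.8, rot=0)  # one group of bars per gender

# One container per answer; label each bar with its height as a percentage
for container in ax.containers:
    labels = [f'{bar.get_height() * 100:.1f}%' for bar in container]
    ax.bar_label(container, labels=labels, padding=2)

ax.yaxis.set_major_formatter(PercentFormatter(1))  # 0.25 is shown as 25%
ax.margins(y=0.15)  # leave room for the labels above the bars
# Call plt.show() or fig.savefig(...) to view the result.
```

Because the heights are already fractions when the bars are drawn, no bar needs to be resized afterwards, at the cost of less control over the spacing than in the example above.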

