Bar chart from a pivot table, with totals and per-group aggregated percentages

Problem description

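Here is the challenge: create a DataFrame from the shipwreck.csv file. From this DataFrame, build a pivot table that shows the average fare for males/females in each class and the number of males/females who survived in each class. The row index should be the class values. Use margins to include the averages for all males, all females, and all passengers in each class. Print the whole frame. Then create a bar chart showing the survival percentage of males, females, and all passengers in each class. Use the data from the pivot table in the previous question. The bars should have a width of 0.25.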

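My problem is that I have built the DataFrame using only the specified columns, but I don't understand how to turn it into a pivot table and find the average fare for males/females so that I can set up the chart.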

Here is my code so far:

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
matplotlib.rcParams['figure.figsize'] = (10.0, 4.0)

df = pd.read_csv("shipwreck.csv",
                 usecols=['survived', 'sex', 'fare', 'class'])
df.set_index('survived')  # no effect: the returned DataFrame is not assigned
print(df)
# pivot table to get average fares for male/female, then plot it
# use a bar graph with a width of 0.25 for the bars

This is what the DataFrame loaded from the .csv displays:

             survived     sex      fare   class
        0           0    male    7.2500   Third
        1           1  female   71.2833   First
        2           1  female    7.9250   Third
        3           1  female   53.1000   First
        4           0    male    8.0500   Third
        5           0    male    8.4583   Third
        6           0    male   51.8625   First
        7           0    male   21.0750   Third
        8           1  female   11.1333   Third
        9           1  female   30.0708  Second
        10          1  female   16.7000   Third
        11          1  female   26.5500   First
        12          0    male    8.0500   Third
        13          0    male   31.2750   Third
        14          0  female    7.8542   Third
        15          1  female   16.0000  Second
        16          0    male   29.1250   Third
        17          1    male   13.0000  Second
        18          0  female   18.0000   Third
        19          1  female    7.2250   Third
        20          0    male   26.0000  Second
        21          1    male   13.0000  Second
        22          1  female    8.0292   Third
        23          1    male   35.5000   First
        24          0  female   21.0750   Third
        25          1  female   31.3875   Third
        26          0    male    7.2250   Third
        27          0    male  263.0000   First
        28          1  female    7.8792   Third
        29          0    male    7.8958   Third
        ..        ...     ...       ...     ...
        861         0    male   11.5000  Second
        862         1  female   25.9292   First
        863         0  female   69.5500   Third
        864         0    male   13.0000  Second
        865         1  female   13.0000  Second
        866         1  female   13.8583  Second
        867         0    male   50.4958   First
        868         0    male    9.5000   Third
        869         1    male   11.1333   Third
        870         0    male    7.8958   Third
        871         1  female   52.5542   First
        872         0    male    5.0000   First
        873         0    male    9.0000   Third
        874         1  female   24.0000  Second
        875         1  female    7.2250   Third
        876         0    male    9.8458   Third
        877         0    male    7.8958   Third
        878         0    male    7.8958   Third
        879         1  female   83.1583   First
        880         1  female   26.0000  Second
        881         0    male    7.8958   Third
        882         0  female   10.5167   Third
        883         0    male   10.5000  Second
        884         0    male    7.0500   Third
        885         0  female   29.1250   Third
        886         0    male   13.0000  Second
        887         1  female   30.0000   First
        888         0  female   23.4500   Third
        889         1    male   30.0000   First
        890         0    male    7.7500   Third

        [891 rows x 4 columns]

This is what the bar chart should look like:

[image: expected bar chart]

Tags: python, pandas, matplotlib

Solution


Here is what you can do:

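import pandas as pd

df = pd.read_csv('shipwreck.csv', usecols=['survived', 'sex', 'class'])
# 'survived' is the only column left to aggregate, so it becomes the values
df_piv = pd.pivot_table(df,
                        index='class',
                        columns='sex',
                        aggfunc=lambda x: 100*x.sum()/x.count(), # % per group
                        margins=True,
                        margins_name='Combined')
# drop the outer 'survived' level so the columns are just the sex labels
df_piv.columns = df_piv.columns.droplevel()
df_piv.plot.bar(rot='horizontal');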

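The code in this answer covers only the survival percentages. For the first part of the exercise (mean fare and number of survivors per class and sex, with margins), a minimal sketch could look like the following. It assumes that `pivot_table` accepts a per-column `aggfunc` dictionary together with `margins=True`. The helper `build_pivot` and the six-row `sample` frame are made up for illustration and stand in for the real `shipwreck.csv` data.

```python
import pandas as pd


def build_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """Mean fare and number of survivors per class and sex, with margins."""
    return pd.pivot_table(
        df,
        index="class",                  # row index: the class values
        columns="sex",
        values=["fare", "survived"],
        # 'survived' holds 0/1, so its sum is the number of survivors
        aggfunc={"fare": "mean", "survived": "sum"},
        margins=True,                   # adds an 'All' row and 'All' columns
    )


# Hypothetical sample data, only to show the shape of the result.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "sex": ["male", "female", "female", "male", "male", "female"],
    "fare": [10.0, 80.0, 30.0, 20.0, 60.0, 8.0],
    "class": ["Third", "First", "Second", "Second", "First", "Third"],
})

pivot = build_pivot(sample)
print(pivot)
```

With the real file, the same function would be called on the frame returned by `pd.read_csv`, provided the `fare` column is included in `usecols`.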


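The exercise also asks for bars that are 0.25 wide, which the plotting call in this answer does not set. `DataFrame.plot.bar` does take a `width` argument, but as far as I can tell it applies to the whole group of bars rather than to each bar, so one option is to draw the bars directly with matplotlib. The sketch below is one way to do that, not the only one. The helpers `survival_pct` and `plot_survival` and the `sample` frame are hypothetical names and data used for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def survival_pct(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of survivors per class and sex, including 'All' margins."""
    # the mean of a 0/1 column is the fraction of survivors in each group
    return pd.pivot_table(df, index="class", columns="sex",
                          values="survived", aggfunc="mean",
                          margins=True) * 100


def plot_survival(pct: pd.DataFrame, width: float = 0.25):
    """Grouped bar chart: one bar per column of pct, each `width` wide."""
    x = np.arange(len(pct.index))
    fig, ax = plt.subplots()
    for i, col in enumerate(pct.columns):
        # shift each series by one bar width so the bars sit side by side
        ax.bar(x + i * width, pct[col], width=width, label=str(col))
    # centre the tick labels under each group of bars
    ax.set_xticks(x + width * (len(pct.columns) - 1) / 2)
    ax.set_xticklabels(pct.index)
    ax.set_ylabel("Survived (%)")
    ax.legend(title="sex")
    return ax


# Hypothetical sample data standing in for shipwreck.csv.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "sex": ["male", "female", "female", "male", "male", "female"],
    "class": ["Third", "First", "Second", "Second", "First", "Third"],
})

pct = survival_pct(sample)
ax = plot_survival(pct)
```

Because the margins are computed from the raw rows, the 'All' column is the survival rate over every passenger in the class, not the average of the male and female rates.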