Summarizing categorical questionnaire data with pandas

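Problem description

I'm just starting out with Python and have run into a problem that must be very common, but I can't find a direct solution. I have some made-up questionnaire data that I want to describe in a meaningful way. Specifically, for each question, I want to know how many times each response ("Yes"/"Maybe"/"No") was given.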

Input:

         Question1   Question2   Question3
Answer1  Maybe       Yes         Yes
Answer2  No          Maybe       Yes
Answer3  Maybe       Maybe       No
Answer4  No          Yes         Maybe

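Now I'd like a clear overview of how many times each answer was given to each question. The preferred output looks like this:

(Preferred) output: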

           Yes     Maybe    No
Question1  0       2        2
Question2  2       2        0
Question3  2       1        1

My own guess is that the solution lies in the "groupby" command. So far I haven't managed to get any meaningful output:

df.groupby(['Question1']).sum()
      Question2 Question3
Question1                    
Maybe      YesMaybe     YesNo
No         MaybeYes  YesMaybe

I generated the dummy data like this:

import numpy as np
import pandas as pd

# Generate data: first row holds the column names, first column the row labels
data = np.array([['','Question1','Question2','Question3'],['Answer1',"Maybe","Yes","Yes"],['Answer2',"No","Maybe","Yes"],['Answer3',"Maybe","Maybe","No"],['Answer4',"No","Yes","Maybe"]])

# Convert to a pandas DataFrame, using the first row and column as labels
df = pd.DataFrame(data=data[1:,1:],index=data[1:,0],columns=data[0,1:])

I know this must be a simple challenge, but any help would be greatly appreciated.

Tags: python, pandas

Solution

Simply:

df.apply(pd.value_counts).fillna(0)


            Question1   Question2   Question3
Maybe       2.0         2.0         1.0
No          2.0         0.0         1.0
Yes         0.0         2.0         2.0
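
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's dummy data from a dict (my own shortcut, not the original construction) and passes `pd.Series.value_counts` to `apply`, since I believe recent pandas releases deprecate the top-level `pd.value_counts`. The `astype(int)` step is an addition to get whole-number counts instead of floats.

```python
import pandas as pd

# Same dummy data as in the question, built from a dict for brevity.
df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Count each response per column. A response that never occurs in a
# column comes back as NaN, so fill with 0 and cast back to integers.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

The floats in the output above appear because NaN forces a float column; casting after `fillna(0)` is safe here because every remaining value is a whole number.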

If you like, you can transpose it: df.apply(pd.value_counts).fillna(0).T

            Maybe   No    Yes
Question1   2.0     2.0   0.0
Question2   2.0     0.0   2.0
Question3   1.0     1.0   2.0
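
Since the question guessed at `groupby`, here is a sketch of how that route could work as an alternative (it is not the approach above). The idea is to reshape the data to long format first, so there is one row per question/answer pair to group on. The names `long_df`, `question`, `answer` and `summary` are my own, and the final `reindex` is only there to match the preferred Yes/Maybe/No column order.

```python
import pandas as pd

# Same dummy data as in the question, built from a dict for brevity.
df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Reshape to long format: one row per (question, answer) pair.
long_df = df.melt(var_name="question", value_name="answer")

# Count rows per pair, pivot the answers into columns, and put the
# columns in the order asked for in the question.
summary = (
    long_df.groupby(["question", "answer"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=["Yes", "Maybe", "No"], fill_value=0)
)
print(summary)
```

Because `unstack(fill_value=0)` fills the missing combinations directly, the counts stay integers and no `fillna` is needed.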
