Finding the most frequent value across multiple columns (e.g. two) in a DataFrame

Problem description

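I created a DataFrame (df1) with the columns HostCity1 and HostCity2, and I want to know which city occurs most often across these two columns. In this case it is London, but how can I identify it and assign it to an object (for example, city_max)?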


import pandas as pd
olympic_data_list = {'HostCity1': ['London', 'Beijing', 'Athens'], 'Year1': [2012, 2008, 2004],
                     'HostCity2': ['London', 'Sydney', 'Atlanta'], 'Year2': [1948, 2000, 1996]}
df1 = pd.DataFrame(olympic_data_list)
print(df1)

The output is:

    HostCity1   Year1   HostCity2   Year2
0   London      2012    London      1948
1   Beijing     2008    Sydney      2000
2   Athens      2004    Atlanta     1996

Tags: python, python-3.x, pandas, dataframe

Solution


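Use DataFrame.filter to select the columns whose names contain HostCity, reshape them into a single Series with DataFrame.stack, and count the values with Series.value_counts. The counts are sorted in descending order by default, so the most frequent value is the first index label, selected with .index[0]: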

city_max = df1.filter(like='HostCity').stack().value_counts().index[0]
print (city_max)
London
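
If several cities share the highest count, .index[0] returns only one of them. The following is a minimal sketch, not part of the original answer, of how the same counts could be used to collect every tied city; the name top_cities is introduced here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'HostCity1': ['London', 'Beijing', 'Athens'],
    'Year1': [2012, 2008, 2004],
    'HostCity2': ['London', 'Sydney', 'Atlanta'],
    'Year2': [1948, 2000, 1996],
})

# Stack the HostCity columns into one Series, then count each city.
counts = df1.filter(like='HostCity').stack().value_counts()

# idxmax returns the index label that has the largest count.
city_max = counts.idxmax()

# Hypothetical helper variable: all cities whose count equals the maximum,
# which keeps every tied city instead of just the first one.
top_cities = counts[counts == counts.max()].index.tolist()

print(city_max)
print(top_cities)
```

With the sample data only London appears twice, so both results refer to London alone.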

Details

print (df1.filter(like='HostCity'))
  HostCity1 HostCity2
0    London    London
1   Beijing    Sydney
2    Athens   Atlanta

print (df1.filter(like='HostCity').stack())
0  HostCity1     London
   HostCity2     London
1  HostCity1    Beijing
   HostCity2     Sydney
2  HostCity1     Athens
   HostCity2    Atlanta
dtype: object

print (df1.filter(like='HostCity').stack().value_counts())
London     2
Beijing    1
Athens     1
Atlanta    1
Sydney     1
dtype: int64
 

Another solution uses DataFrame.melt to unpivot the columns:

city_max = df1.filter(like='HostCity').melt()['value'].value_counts().index[0]
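
To make the melt route easier to follow, here is a small sketch that shows the intermediate long-format frame. It swaps in Series.mode for the final step, which is an assumption of this example rather than the original answer's method; the variable name melted is also introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'HostCity1': ['London', 'Beijing', 'Athens'],
    'Year1': [2012, 2008, 2004],
    'HostCity2': ['London', 'Sydney', 'Atlanta'],
    'Year2': [1948, 2000, 1996],
})

# melt() unpivots the selected columns into two columns:
# 'variable' (the original column name) and 'value' (the city).
melted = df1.filter(like='HostCity').melt()
print(melted)

# Series.mode returns the most frequent value(s) in sorted order;
# iloc[0] takes the first of them.
city_max = melted['value'].mode().iloc[0]
print(city_max)
```

Series.mode returns all tied values, so taking iloc[0] picks the alphabetically first one when there is a tie.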
