python - Finding the most frequent value across multiple columns (say, two) in a DataFrame

Problem description

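I created a DataFrame (df1) with the columns HostCity1 and HostCity2, and I want to know which city occurs most often across these two columns. In this case it is London, but how can I identify it programmatically and assign it to an object (for example, city_max)?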

import pandas as pd
olympic_data_list = {'HostCity1': ['London', 'Beijing', 'Athens'], 'Year1': [2012, 2008, 2004], 'HostCity2': ['London', 'Sydney', 'Atlanta'], 'Year2': [1948, 2000, 1996]}
df1 = pd.DataFrame(olympic_data_list)
print(df1)

The output is:

    HostCity1   Year1   HostCity2   Year2
0   London      2012    London      1948
1   Beijing     2008    Sydney      2000
2   Athens      2004    Atlanta     1996

Tags: python, python-3.x, pandas, dataframe

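Use DataFrame.filter to select the columns whose names contain HostCity, then reshape them into a single Series with DataFrame.stack and count the values with Series.value_counts. The output is sorted in descending order by default, so the top value is the first index label, selected with index[0]: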

city_max = df1.filter(like='HostCity').stack().value_counts().index[0]
print (city_max)
London
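
If the count itself is also needed, the same pipeline can be kept in an intermediate Series. The following is only a minimal sketch built on the question's data; the names counts and max_count are illustrative and not part of the original answer, and idxmax is used here as an alternative to index[0]:

```python
import pandas as pd

df1 = pd.DataFrame({
    'HostCity1': ['London', 'Beijing', 'Athens'],
    'Year1': [2012, 2008, 2004],
    'HostCity2': ['London', 'Sydney', 'Atlanta'],
    'Year2': [1948, 2000, 1996],
})

# Count every city across all columns whose name contains 'HostCity'.
counts = df1.filter(like='HostCity').stack().value_counts()

city_max = counts.idxmax()  # index label with the highest count
max_count = counts.max()    # number of times that city occurs

print(city_max, max_count)  # London 2
```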

Details:

print (df1.filter(like='HostCity'))
  HostCity1 HostCity2
0    London    London
1   Beijing    Sydney
2    Athens   Atlanta

print (df1.filter(like='HostCity').stack())
0  HostCity1     London
   HostCity2     London
1  HostCity1    Beijing
   HostCity2     Sydney
2  HostCity1     Athens
   HostCity2    Atlanta
dtype: object

print (df1.filter(like='HostCity').stack().value_counts())
London     2
Beijing    1
Athens     1
Atlanta    1
Sydney     1
dtype: int64
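
Note that index[0] returns only one label, so when several cities share the highest count the others are silently dropped. A possible way to keep all of them is sketched below; most_frequent_values is a hypothetical helper and the tied DataFrame is made-up data, neither comes from the original answer:

```python
import pandas as pd

def most_frequent_values(df, like):
    """Hypothetical helper: return every value tied for the highest count."""
    counts = df.filter(like=like).stack().value_counts()
    # Keep all labels whose count equals the maximum, sorted for a stable order.
    return sorted(counts[counts == counts.max()].index)

df1 = pd.DataFrame({
    'HostCity1': ['London', 'Beijing', 'Athens'],
    'Year1': [2012, 2008, 2004],
    'HostCity2': ['London', 'Sydney', 'Atlanta'],
    'Year2': [1948, 2000, 1996],
})

# Made-up data in which two cities are tied.
tied = pd.DataFrame({
    'HostCity1': ['London', 'Sydney'],
    'HostCity2': ['Sydney', 'London'],
})

print(most_frequent_values(df1, 'HostCity'))   # ['London']
print(most_frequent_values(tied, 'HostCity'))  # ['London', 'Sydney']
```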

Another solution uses DataFrame.melt to unpivot the columns:

city_max = df1.filter(like='HostCity').melt()['value'].value_counts().index[0]
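
For completeness, here is a self-contained sketch of the melt variant on the question's data. It assumes the column names are known in advance and passes them through value_vars instead of filtering by name; the variable melted is illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({
    'HostCity1': ['London', 'Beijing', 'Athens'],
    'Year1': [2012, 2008, 2004],
    'HostCity2': ['London', 'Sydney', 'Atlanta'],
    'Year2': [1948, 2000, 1996],
})

# melt unpivots the chosen columns into two columns: 'variable' and 'value'.
melted = df1.melt(value_vars=['HostCity1', 'HostCity2'])

# value_counts sorts by count in descending order, so the first label is the top city.
city_max = melted['value'].value_counts().index[0]

print(city_max)  # London
```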
