Combining the unique elements of a DataFrame into lists

Question

I will try to ask my question as clearly as I can.

I have a DataFrame that looks like this:

import pandas as pd
data = {'player' : ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
       'game' : ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis', 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player','game'])

  player        game
0      A      Soccer
1      A  Basketball
2      A   Ping pong
3      B      Soccer
4      B      Tennis
5      B      Tennis
6      C    Baseball
7      C  Volleyball
8      C   Dodgeball

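Now I want to keep each value only once, for a single player. Ideally the result would be in a list, but that is not a big deal.

For example, players A and B both play Soccer, so I don't want Soccer to show up in the output. Tennis appears twice, but both times for player B, so it should be in the output.

I would like the output to be:

  player        game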
0      A  Basketball
1      A   Ping pong
2      B      Soccer
3      B      Tennis
4      C    Baseball
5      C  Volleyball
6      C   Dodgeball

Or like this:

  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]

Thanks for your help!

Tags: python, pandas, dataframe

Answer


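It looks like you need to remove duplicates in the game column with DataFrame.drop_duplicates, keeping the last occurrence of each game, and then, if you want lists, aggregate each player's games with list. Keeping the last occurrence is what leaves Soccer with player B rather than player A, as in the expected output: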

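# Drop repeated games, keeping only the last row in which each game appears,
# then collect each player's remaining games into a list.
df = (df.drop_duplicates('game', keep='last')
        .groupby('player')['game']
        .agg(list)
        .reset_index())
print(df)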
  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]
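
The same idea also covers the first output format from the question, and it can be tightened if the wording "I don't want Soccer in the output" is taken literally. The sketch below rebuilds the sample data so it runs on its own. The names `flat`, `players_per_game` and `exclusive` are made up for illustration, and the stricter variant is only one possible reading of the question, not something the asker confirmed.

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# Flat form: one row per game, keeping the last row in which each game appears.
flat = df.drop_duplicates('game', keep='last').reset_index(drop=True)
print(flat)

# Stricter variant (an assumed reading of the question): keep only games
# played by exactly one player, then drop repeats within that player.
players_per_game = df.groupby('game')['player'].transform('nunique')
exclusive = (df[players_per_game == 1]
             .drop_duplicates(['player', 'game'])
             .groupby('player')['game']
             .agg(list)
             .reset_index())
print(exclusive)
```

With the sample data, `flat` has seven rows and still lists Soccer under player B, while `exclusive` drops Soccer entirely and leaves player B with only Tennis.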
