How do I compare substrings in a Python DataFrame to create a new column?

Question

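I have a DataFrame of sports data that I am analyzing. One column, "Team", holds the team each player belongs to, and another column, "Game Info", holds information about the game. The Game Info column looks like this: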

SAC@HOU 12/09/2019 08:00PM ET

The Team column can contain either "SAC" or "HOU". I am trying to create a new column that contains the opponent. What I have tried so far is:

df.insert(7, "Opp", '', True)
df["Opp"][df['Game Info'].str[:3].str.contains(df['Team'])] = df['Game Info'].str[4:7]
df["Opp"][df['Opp'].empty] = df['Team']

This gives me the following error:

'Series' objects are mutable, thus they cannot be hashed

I have also tried:

df['Opp'] = np.where(df['Team'].str != df['Game Info'].str[:3]), df['Game Info'].str[:3], df['Game Info'].str[4:7])

df['Opp'] = df['Game Info'].str[:3] if df['Team'].str != df['Game Info'].str[:3] else df['Game Info'].str[4:7]

But both give me the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

How can I compare these substrings correctly?

Tags: python, pandas, dataframe

Solution


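Use np.where with an element-wise comparison. Where Team equals the first three characters of Game Info, take characters 4 to 6 (the other team); otherwise take the first three: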

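import numpy as np
import pandas as pd
df = pd.DataFrame({'Team': ['SAC', 'HOU'], 'Game Info': ['SAC@HOU 12/09/2019 08:00PM ET', 'SAC@HOU 12/09/2019 08:00PM ET']})
# If Team matches the first code, the opponent is the second code, and vice versa
df['Opp'] = np.where(df['Team'] == df['Game Info'].str[:3], df['Game Info'].str[4:7], df['Game Info'].str[:3])
df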
  Team                      Game Info  Opp
0  SAC  SAC@HOU 12/09/2019 08:00PM ET  HOU
1  HOU  SAC@HOU 12/09/2019 08:00PM ET  SAC
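
The fixed slices [:3] and [4:7] rely on every team code being exactly three characters long. If that is not guaranteed, one possible variation is to pull both codes out with a regular expression first. The following is only a sketch: the pattern assumes the "CODE@CODE ..." layout seen in the single example from the question, and the group names away and home are hypothetical labels chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Team": ["SAC", "HOU"],
    "Game Info": ["SAC@HOU 12/09/2019 08:00PM ET"] * 2,
})

# Extract the two team codes on either side of "@".
# Assumption: codes consist of uppercase letters only, as in the example.
sides = df["Game Info"].str.extract(r"^(?P<away>[A-Z]+)@(?P<home>[A-Z]+)")

# Element-wise choice: if the player's team is the first code,
# the opponent is the second code; otherwise it is the first.
df["Opp"] = np.where(df["Team"] == sides["away"], sides["home"], sides["away"])

print(df[["Team", "Opp"]])
```

As in the answer above, the comparison is done on whole columns at once, so no Python-level if statement is ever applied to a Series.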
