python - Replace words in a DataFrame's strings with words from a separate DataFrame
Question
I have the following dataset:
Date User comments
9/20/2019 user1 My car model is 600.
9/21/2019 user2 My car model is viper.
9/23/2019 user3 I have a car. The model is civic.
9/23/2019 user4 Washington is name of the city.
9/23/2019 user5 I like the freedom I feel when I drive my chevy.
These are sample scraped comments. I am trying to use this DataFrame:
Brand Model
ford 600
chevrolet chevy
dodge viper
honda civic
pontiac gto
honda freed
I am trying to replace each model mentioned in the comments DataFrame with its brand.
Here is my code:
import pandas as pd

file = pd.read_csv('test_dataset.csv')
file['comments'] = file['comments'].astype(str)
file["comments"] = file["comments"].str.lower()
brandconverter = pd.read_csv("brandconverter.csv")

def replacemodel(comment):
    # Model names are used as regex patterns, so they match anywhere in the text
    return pd.Series(comment).replace(brandconverter.set_index('Model')['Brand'], regex=True)[0]

file['test'] = file['comments'].apply(replacemodel)
My expected output is:
Date User comments test
9/20/2019 user1 My car model is 600. My car model is ford.
9/21/2019 user2 My car model is viper. My car model is dodge.
9/23/2019 user3 I have a car. The model is civic. I have a car. The model is honda.
9/23/2019 user4 Washington is name of the city. Washington is name of the city.
But the output I get is:
Date User comments test
9/20/2019 user1 My car model is 600. My car model is ford.
9/21/2019 user2 My car model is viper. My car model is dodge.
9/23/2019 user3 I have a car. The model is civic. I have a car. The model is honda.
9/23/2019 user4 Washington is name of the city. Washinpontiacn is name of the city.
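When a car model appears inside another word, such as "gto" in "Washington", I want my function to ignore it. Currently it replaces every occurrence of a model in a comment, even when that occurrence is part of a longer word. I also want to apply this function to other comments; this is just an example.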
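The unwanted match can be reproduced on a single string with the standard `re` module. This is only a minimal sketch of the symptom; the variable names `comment` and `unanchored` are illustrative and not part of the code above.

```python
import re

comment = "washington is name of the city."

# A bare pattern matches "gto" anywhere, including inside "washington".
unanchored = re.sub(r"gto", "pontiac", comment)

print(unanchored)  # washinpontiacn is name of the city.
```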
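Answer
You can use Series.replace with the optional parameter regex=True to replace the models in comments with the corresponding brands from brandconverter. Wrapping each model in \b word boundaries keeps it from matching inside a longer word: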
s = brandconverter.set_index('Model')['Brand']
s.index = r'\b' + s.index + r'\b' # Takes care of word boundary condition
file['test'] = file['comments'].replace(s, regex=True)
Result:
Date User comments test
0 9/20/2019 user1 My car model is 600. My car model is ford.
1 9/21/2019 user2 My car model is viper. My car model is dodge.
2 9/23/2019 user3 I have a car. The model is civic. I have a car. The model is honda.
3 9/23/2019 user4 Washington is name of the city. Washington is name of the city.
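If a model name could contain regex metacharacters (for example a dot or a plus sign), it may be safer to escape each name before adding the word boundaries. The following is a sketch of that variation using only the standard `re` module, not the method from the answer above. The names `model_to_brand`, `pattern`, and `replace_models` are hypothetical, and the comments are the lowercased examples from the question.

```python
import re

# Model -> brand pairs copied from the brandconverter table in the question.
model_to_brand = {
    "600": "ford",
    "chevy": "chevrolet",
    "viper": "dodge",
    "civic": "honda",
    "gto": "pontiac",
    "freed": "honda",
}

# One alternation of escaped model names, wrapped in word boundaries.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(model) for model in model_to_brand) + r")\b"
)

def replace_models(comment):
    """Replace each whole-word model name in comment with its brand."""
    return pattern.sub(lambda match: model_to_brand[match.group(0)], comment)

comments = [
    "my car model is 600.",
    "washington is name of the city.",
    "i like the freedom i feel when i drive my chevy.",
]
replaced = [replace_models(comment) for comment in comments]
for line in replaced:
    print(line)
# my car model is ford.
# washington is name of the city.
# i like the freedom i feel when i drive my chevrolet.
```

With the DataFrame from the question, the same function could be applied as file['test'] = file['comments'].map(replace_models). Note that "freedom" is left alone because "freed" is followed by another word character, so the trailing \b does not match.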