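python - How do I specify pandas .replace() rather than str.replace() when defining a function?
Question
I want to filter certain words out of a pandas DataFrame column and create a new column holding the filtered text. I tried the solution from here, but I think the problem is that Python is calling str.replace() rather than df.replace(). I'm not sure how to specify the latter when I call it inside a function.
df: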
id old_text
0 my favorite color is blue
1 you have a dog
2 we built the house ourselves
3 i will visit you
def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    txt = txt.replace('|'.join(words), '', regex=True)
    return txt

df['new_text'] = df['old_text'].apply(removeWords)
Error:
TypeError: replace() takes no keyword arguments
Desired output:
id old_text new_text
0 my favorite color is blue favorite color is blue
1 you have a dog have a dog
2 we built the house ourselves built the house
3 i will visit you will visit you
Other things I tried:
def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    txt = [word for word in txt.split() if word not in words]
    return txt

df['new_text'] = df['old_text'].apply(removeWords)
This returns:
id old_text new_text
0 my favorite color is blue favorite, color, is, blue
1 you have a dog have, a, dog
2 we built the house ourselves built, the, house
3 i will visit you will, visit, you
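Answer
From this line:
txt = txt.replace('|'.join(words), '', regex=True)
it looks like you intend to call pd.Series.replace, whose signature accepts regex=True, so your function expects a Series as input. On the other hand,
df['old_text'].apply(removeWords)
applies the function to each cell of df['old_text']. That means txt is just a plain string, and str.replace does not accept a regex keyword argument, hence the TypeError.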
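To see the difference concretely, here is a minimal sketch. The two-row sample is taken from the question, while `record_type` and `seen_types` are hypothetical names used only for illustration. It records the type of the object the function receives in each calling style, and shows that a plain string rejects `regex=True`.

```python
import pandas as pd

df = pd.DataFrame({"old_text": ["my favorite color is blue", "you have a dog"]})

seen_types = []  # hypothetical helper list, only for illustration

def record_type(txt):
    # Remember what kind of object we were handed
    seen_types.append(type(txt).__name__)
    return txt

# Series.apply calls the function once per cell, so it receives plain strings
df["old_text"].apply(record_type)
print(seen_types)

# Calling the function directly hands it the whole column as a Series
record_type(df["old_text"])
print(seen_types[-1])

# A plain str has no `regex` keyword, which is where the TypeError comes from
try:
    "my dog".replace("my", "", regex=True)
    raised = False
except TypeError:
    raised = True
print(raised)
```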
TL;DR, you want to pass the whole column to the function:
df['new_text'] = removeWords(df['old_text'])
Output:
id old_text new_text
0 0 my favorite color is blue favorte color s blue
1 1 you have a dog have a dog
2 2 we built the house ourselves bult the house selves
3 3 i will visit you wll vst
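But as you can see, your function also replaces i inside other words. You probably want to modify the pattern so that it only replaces whole words, using the word-boundary indicator \b:
def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    # note the `\b` here
    return txt.replace(rf"\b({'|'.join(words)})\b", '', regex=True)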
Output:
id old_text new_text
0 0 my favorite color is blue favorite color is blue
1 1 you have a dog have a dog
2 2 we built the house ourselves built the house
3 3 i will visit you will visit
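Putting it together, the following is a self-contained sketch rather than a definitive implementation. It rebuilds the sample frame from the question, applies the `\b` pattern column-wise, and then tidies the whitespace the removed words leave behind. The whitespace cleanup, the `re.escape` call, and the names `WORDS`, `PATTERN`, and `remove_words_cell` are additions for illustration and do not appear in the original answer. A cell-wise variant based on `re.sub` is included in case you prefer to keep `apply`.

```python
import re
import pandas as pd

WORDS = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
         'you', 'your', 'yours', 'yourself']
# re.escape is a precaution in case a word ever contains regex metacharacters
PATTERN = rf"\b({'|'.join(map(re.escape, WORDS))})\b"

df = pd.DataFrame({"old_text": [
    "my favorite color is blue",
    "you have a dog",
    "we built the house ourselves",
    "i will visit you",
]})

# Column-wise: Series.replace accepts regex=True
cleaned = df["old_text"].replace(PATTERN, "", regex=True)
# Collapse runs of whitespace and trim the ends (assumed cleanup step)
df["new_text"] = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

# Cell-wise alternative: re.sub works on plain strings, so it is safe in apply
def remove_words_cell(txt):
    return " ".join(re.sub(PATTERN, "", txt).split())

df["new_text_apply"] = df["old_text"].apply(remove_words_cell)
print(df)
```

Both columns should hold the same text. The column-wise version avoids calling a Python function per row, which tends to matter on larger frames.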