How do I specify pandas .replace() rather than str.replace() when defining a function?

Problem description

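I want to filter certain words out of a pandas DataFrame column and create a new column for the filtered text. I tried the solution from here, but I think the problem is that Python assumes I want to call str.replace() instead of df.replace(). I'm not sure how to specify the latter when I call it inside a function.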

df:

id     old_text 
0      my favorite color is blue
1      you have a dog
2      we built the house ourselves
3      i will visit you

def removeWords(txt):
     words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
     txt = txt.replace('|'.join(words), '', regex=True)
     return txt

df['new_text'] = df['old_text'].apply(removeWords)

Error:

TypeError: replace() takes no keyword arguments

Desired output:

id     old_text                         new_text
0      my favorite color is blue        favorite color is blue
1      you have a dog                   have a dog
2      we built the house ourselves     built the house 
3      i will visit you                 will visit you

Other things I tried:

def removeWords(txt):
     words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
     txt = [word for word in txt.split() if word not in words]
     return txt

df['new_text'] = df['old_text'].apply(removeWords)

This returns:

id     old_text                         new_text
0      my favorite color is blue        favorite, color, is, blue
1      you have a dog                   have, a, dog
2      we built the house ourselves     built, the, house 
3      i will visit you                 will, visit, you
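
The lists in new_text appear because the function returns the list comprehension directly. A minimal per-cell sketch, assuming a plain string result is wanted, joins the kept words back together. The names STOP_WORDS and remove_words are illustrative and not from the original post.

```python
# Words to drop; stored as a set for fast membership tests.
STOP_WORDS = {'i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
              'you', 'your', 'yours', 'yourself'}


def remove_words(txt: str) -> str:
    """Drop stop words from one string and return a string, not a list."""
    kept = [word for word in txt.split() if word not in STOP_WORDS]
    # Joining turns the list of words back into a single string.
    return " ".join(kept)


examples = [
    "my favorite color is blue",
    "you have a dog",
    "we built the house ourselves",
    "i will visit you",
]
results = [remove_words(text) for text in examples]
for text in results:
    print(text)
```

Because this function works on one string at a time, it would fit the per-cell call df['old_text'].apply(remove_words).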

Tags: python, regex, pandas

Solution


From this line:

txt = txt.replace('|'.join(words), '', regex=True)

This is the signature of pd.Series.replace, so your function takes a Series as input. On the other hand,

df['old_text'].apply(removeWords)

applies the function to each cell of df['old_text']. This means txt is just a string, and in that case the signature of str.replace has no keyword argument regex=True.
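
The difference can be checked directly. The following is a small sketch, assuming pandas is installed and using a shortened two-row sample of the data. The exact wording of the TypeError message depends on the Python version.

```python
import pandas as pd

df = pd.DataFrame({"old_text": ["my favorite color is blue", "you have a dog"]})

# Series.apply calls the function once per cell, so each argument is a plain str.
seen_types = df["old_text"].apply(lambda txt: type(txt).__name__).tolist()
print(seen_types)

# str.replace has no regex parameter, so passing regex=True raises TypeError.
try:
    "you have a dog".replace("you", "", regex=True)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Series.replace accepts regex=True and operates on the whole column at once.
replaced = df["old_text"].replace("you", "", regex=True)
print(replaced.tolist())
```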

TL;DR, you want to do:

df['new_text'] = removeWords(df['old_text'])

Output:

   id                      old_text                new_text
0   0     my favorite color is blue    favorte color s blue
1   1                you have a dog              have a dog
2   2  we built the house ourselves   bult the house selves
3   3              i will visit you                wll vst 

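But as you can see, your function also replaces the i inside other words. You may want to modify the pattern so that it only replaces whole words, using the word-boundary indicator \b: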

def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    
    # note the `\b` here
    return txt.replace(rf"\b({'|'.join(words)})\b", '', regex=True)

Output:

   id                      old_text                 new_text
0   0     my favorite color is blue   favorite color is blue
1   1                you have a dog               have a dog
2   2  we built the house ourselves         built the house 
3   3              i will visit you              will visit 
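
The output above still carries the spaces left behind by the removed words. A self-contained sketch of the same Series-level approach, with an extra whitespace cleanup step that is an addition here and not part of the answer, could look like this. The names WORDS and remove_words are illustrative, and re.escape is used only as a precaution in case the word list ever contains regex metacharacters.

```python
import re

import pandas as pd

WORDS = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
         'you', 'your', 'yours', 'yourself']


def remove_words(series: pd.Series) -> pd.Series:
    """Remove whole-word matches from every string in a Series."""
    # \b on both sides restricts matches to whole words.
    pattern = rf"\b(?:{'|'.join(map(re.escape, WORDS))})\b"
    cleaned = series.replace(pattern, "", regex=True)
    # Extra step (assumption): collapse repeated spaces and trim the ends.
    return cleaned.str.replace(r"\s+", " ", regex=True).str.strip()


df = pd.DataFrame({"old_text": [
    "my favorite color is blue",
    "you have a dog",
    "we built the house ourselves",
    "i will visit you",
]})

# Pass the whole column to the function instead of using .apply.
df["new_text"] = remove_words(df["old_text"])
print(df)
```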
