Words in context - pandas

Problem description

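I have a list, and every time a word from that list appears in a text, I want to replace the two words that follow it.

For example: list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

phrase = "I'm sorry, but the lady is at home."

resultat = "I'm sorry, but the lady <next_words> home."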

I am trying to do this in a dataframe.

I tried:

def words_contexte(df):

    titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

    data_frame_split = df['C'].str.split()
    words_index = df['C'].str.data_frame_split[data_frame_split.index(titres_list) + 2]
    df['C'] = df['C'].str.replace(words_index, '<next_words>')

    return df

My dataframe:

       A          B                                     C
  French      house                      Are you at home?
 English      house   I'm sorry, but the lady is at home.
  French  apartment          His name is Sir Ringo Starr.
  French      house      I'm Mrs. Carla and I have a dog.
 English  apartment                  Hi Miss how are you?

Desired output:

       A          B                                     C
  French      house                      Are you at home?
 English      house   I'm sorry, but the lady <next_words> home.
  French  apartment          His name is Sir <next_words>.
  French      house      I'm Mrs. <next_words> I have a dog.
 English  apartment                  Hi Miss <next_words> you?

Tags: python, pandas, dataframe

Solution


Here is one approach that avoids an explicit loop over each list of words:

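import numpy as np

list_ = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def fun(x, y):
    words = x.split(' ')
    # True at every position whose word is one of the titles in y
    in1d = np.isin(words, y)
    # np.roll shifts circularly: mark the 2nd and the 1st word after each title
    in1d_drop = np.roll(in1d, 2)
    in1d_replace = np.roll(in1d, 1)
    # blank out the 2nd following word, then put the placeholder on the 1st
    l = np.where(in1d_drop, '', words)
    l = np.where(in1d_replace, '<next_words>', l)
    return ' '.join(l)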

Then simply apply fun to every row of column C:

df['C'] = df['C'].apply(fun, y=list_)

print(df)
      A          B                                            C
0   French      House                             Are you at home?
1  English      House  I'm sorry, but the lady <next_words>  home.
2   French  Apartment                His name is Sir <next_words> 
3   French      House          I'm Mrs. <next_words>  I have a dog
4  English  Apartment                   Hi Miss <next_words>  you?
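
One caveat with the approach above: np.roll shifts circularly, so a title that is the last or second-to-last word of a sentence wraps around and alters the start of the sentence, and each blanked-out word leaves a double space behind. Below is a minimal plain-Python sketch of a variant that avoids both. The names replace_next_two and TITLES and the placeholder parameter are illustrative assumptions, not part of the original answer.

```python
# Hypothetical helper: same idea as fun(), but without circular shifting.
TITLES = {'Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord'}

def replace_next_two(text, titles=TITLES, placeholder='<next_words>'):
    """Replace the (up to) two words after each title with one placeholder."""
    # split() with no argument also collapses runs of whitespace
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        out.append(tokens[i])
        # Only substitute when at least one word follows the title
        if tokens[i] in titles and i + 1 < len(tokens):
            out.append(placeholder)
            i += 3  # skip the title and the next two words
        else:
            i += 1
    return ' '.join(out)

print(replace_next_two("I'm sorry, but the lady is at home."))
# I'm sorry, but the lady <next_words> home.
print(replace_next_two("Hi Sir"))
# Hi Sir
```

With the dataframe from the question this could be applied as df['C'] = df['C'].apply(replace_next_two). As in the original answer, punctuation attached to a replaced word (the period in "Starr.") is replaced along with it.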
