python - Words in context - pandas
Question
I have a list, and every time a word from that list appears in a text, I want to replace the two words that follow it.
For example: list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
phrase = "I'm sorry, but the lady is at home."
resultat = "I'm sorry, but the lady <next_words> home."
I am trying to do this in a data frame.
I tried:
def words_contexte(df):
    titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
    data_frame_split = df['C'].str.split()
    words_index = df['C'].str.data_frame_split[data_frame_split.index(titres_list) + 2]
    df['C'] = df['C'].str.replace(words_index, '<next_words>')
    return df
My data frame:
A B C
French house Are you at home?
English house I'm sorry, but the lady is at home.
French apartment His name is Sir Ringo Starr.
French house I'm Mrs. Carla and I have a dog.
English apartment Hi Miss how are you?
Desired output:
A B C
French house Are you at home?
English house I'm sorry, but the lady <next_words> home.
French apartment His name is Sir <next_words>.
French house I'm Mrs. <next_words> I have a dog.
English apartment Hi Miss <next_words> you?
Solution
Here is one way to do it without looping over each list:
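import numpy as np

list_ = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def fun(x, y):
    words = x.split(' ')
    # True at every position holding a word from y (np.isin supersedes the deprecated np.in1d)
    in1d = np.isin(words, y)
    # Shift the mask right: by 1 for the first following word, by 2 for the second
    in1d_drop = np.roll(in1d, 2)
    in1d_replace = np.roll(in1d, 1)
    # Blank out the second following word, then put the placeholder on the first
    l = np.where(in1d_drop, '', words)
    l = np.where(in1d_replace, '<next_words>', l)
    return ' '.join(l)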
Then simply apply fun to every row of column C:
df['C'] = df['C'].apply(fun, y=list_)
print(df)
A B C
0 French House Are you at home?
1 English House I'm sorry, but the lady <next_words> home.
2 French Apartment His name is Sir <next_words>
3 French House I'm Mrs. <next_words> I have a dog
4 English Apartment Hi Miss <next_words> you?
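One thing to watch with the approach above: np.roll wraps around, so a title in the last or second-to-last position of a sentence would mark words at the start of that sentence, and blanking the second following word also drops any punctuation attached to it. The following is only a minimal plain-Python sketch of a variant that avoids both effects, not the answer's method. The names replace_next_two, TITLES and PLACEHOLDER are hypothetical, and the set of trailing punctuation characters is an assumption.

```python
TITLES = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']
PLACEHOLDER = '<next_words>'


def replace_next_two(text, titles=TITLES, placeholder=PLACEHOLDER):
    """Replace up to two words following each title with one placeholder."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        out.append(words[i])
        if words[i] in titles:
            # Slicing never wraps around: near the end of the sentence
            # this simply yields fewer than two words (or none).
            following = words[i + 1:i + 3]
            if following:
                # Assumption: keep punctuation attached to the last replaced word.
                last = following[-1]
                stripped = last.rstrip('.,;:!?')
                out.append(placeholder + last[len(stripped):])
            # Skip the title and the words that were replaced.
            i += 1 + len(following)
        else:
            i += 1
    return ' '.join(out)
```

It can be applied the same way as fun, for example df['C'] = df['C'].apply(replace_next_two). Note that matching is exact and case-sensitive, so 'lady' and 'Lady' are different entries, and a title that falls inside the two replaced words is consumed rather than matched again.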