How to match text against strings in a list and extract sections in Python?

Problem description

I am trying to build a structure from earnings call text similar to the following example:

"Operator

Ladies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]
I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**

Person1

Hello everyone, blablablablabla. Now let's see what Person2 has to say.

Person2

Thank you and hello everyone. Blablablabla

Person3

I have no further remarks....thank you once again"

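From this I generated a list named list1 = ['Person1', 'Person2', 'Person3']. I also created an empty DataFrame whose column names are Person1, Person2, and Person3. I now need to extract the text that follows Person1, Person2, and Person3, based on the values in the list, and fill the DataFrame with it. Is that possible?

Tags: python, regex, nlp

Solution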


text="""OperatorLadies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**Person1Hello everyone, blablablablabla. Now let's see what Person2 has to say.Person2Thank you and hello everyone. BlablablablaPerson3I have no further remarks....thank you once again"""

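import pandas as pd

# Index [2] skips the first, inline mention of a name (e.g. "today,Person1, Head of ...").
# These indices depend on how often each name occurs in this particular sample.
say1 = text.split('Person1')[2].rsplit('Person2', 1)[0]  # Person1's text; rsplit keeps the inline "Person2" mention
say2 = text.split('Person2')[2].split('Person3')[0]  # Person2's text
say3 = text.split('Person3')[1]  # Person3's text

# Convert to a one-row DataFrame
df = pd.DataFrame({'Person1': say1, 'Person2': say2, 'Person3': say3}, index=[1])
print(df)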
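
The hard-coded split indices above only fit this one transcript. A more general approach could be driven by list1 itself. The following is a minimal sketch, not the original answer's method. It assumes each speaker name appears alone on its own line as a header, as in the question's example, so inline mentions such as "Person1, Head of Investor Relations" are ignored. The function name extract_sections and the abbreviated sample text are made up for illustration.

```python
import re

# Abbreviated version of the transcript from the question (assumed layout:
# each speaker header sits alone on its own line).
sample = """Operator

Ladies and gentlemen, thank you for standing by. I would now like to hand the conference to your speaker today, Person1, Head of Investor Relations.

Person1

Hello everyone, blablablablabla. Now let's see what Person2 has to say.

Person2

Thank you and hello everyone. Blablablabla

Person3

I have no further remarks....thank you once again"""

list1 = ['Person1', 'Person2', 'Person3']


def extract_sections(text, speakers):
    """Map each speaker to the text that follows their header line(s)."""
    names = '|'.join(re.escape(s) for s in speakers)
    # Match a name only when it is the sole content of a line.
    header = re.compile(rf'^[ \t]*({names})[ \t]*$', re.MULTILINE)
    # With a capturing group, split() returns
    # [preamble, name, body, name, body, ...].
    parts = header.split(text)
    sections = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        # A speaker may talk more than once, so collect every turn.
        sections.setdefault(name, []).append(body.strip())
    return {name: ' '.join(chunks) for name, chunks in sections.items()}


sections = extract_sections(sample, list1)
for name, said in sections.items():
    print(f'{name}: {said}')
```

The resulting dictionary can then be turned into a one-row DataFrame with pd.DataFrame([sections], columns=list1); speakers who never appear as a header would get NaN in their column. If the transcript has no line breaks around the headers, this sketch will not find them and a different delimiter rule would be needed.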

