Regex-based splitting in Python

Problem description

Tags: regex, python-3.x

Solution


One option is to use re.findall with the following pattern:

‘‘(.*?)’’ (.*?)(?= ‘‘|$)

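For every match found in the input, this captures the company name and its description in separate groups. Note the lookahead (?= ‘‘|$), which marks the end of the current description: it ends either where the next entry begins or at the end of the input. Because a lookahead does not consume characters, the opening quotes of the next entry remain available for the next match.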
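To see why the lookahead matters, here is a minimal sketch comparing the pattern with and without it. The `sample` string and both variable names are made up for illustration; they are not part of the original question.

```python
import re

# Hypothetical two-entry sample, for illustration only.
sample = "‘‘A’’ first text ‘‘B’’ second text"

# Without the lookahead, the lazy (.*?) is satisfied by the empty string,
# so every description comes back empty.
without_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)', sample)

# With the lookahead, (.*?) has to keep expanding until " ‘‘" or the end
# of the input follows, so it captures the whole description.
with_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)', sample)

print(without_lookahead)
print(with_lookahead)
```

A lazy quantifier matches as little as it can, so it needs something after it to force it to grow. Here the lookahead plays that role.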

import re

# Sample input: each entry is ‘‘Company’’ followed by its description.
inp = "‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"
# With two capture groups, findall returns one (company, description) tuple per entry.
matches = re.findall(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)
companyList = [row[0] for row in matches]
descriptionList = [row[1] for row in matches]
print(companyList)
print(descriptionList)

This prints:

['Apple', 'Microsoft', 'Oracle']
['It is create by Steve Jobs (He was fired and get hired)',
 'Bill Gates was the richest man in the world', 'It is a database company']
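If a single mapping is more convenient than two parallel lists, the same pattern can be used with re.finditer and named groups. This is only a sketch: the names `ENTRY`, `parse_entries`, and `sample`, and the group names `company` and `description`, are assumptions chosen for illustration. It also assumes the input is a single line, since `.` does not match newlines unless re.DOTALL is passed.

```python
import re

# Same pattern as above, with named groups (the names are illustrative).
ENTRY = re.compile(r'‘‘(?P<company>.*?)’’ (?P<description>.*?)(?= ‘‘|$)')


def parse_entries(text):
    """Return a dict mapping each company name to its description."""
    return {m.group('company'): m.group('description')
            for m in ENTRY.finditer(text)}


# Hypothetical shortened input in the same format as the original.
sample = "‘‘Oracle’’ It is a database company ‘‘Microsoft’’ Bill Gates was the richest man in the world"
entries = parse_entries(sample)
print(entries)
```

A dict keeps each description attached to its company, but if a company name appears more than once, later entries overwrite earlier ones. In that case the two-list form is the safer choice.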
