Extracting the text between two strings in a DataFrame column in Python

Problem description

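I am new to Python and don't know much yet, so I need help with a problem I am facing. I want to extract the text between 'to notify' and 'accordingly' from a DataFrame column. I tried the code below, but it gives me blank output: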

start = 'to notify'
end = 'accordingly'
data_1['match'] = data_1['Issue'].apply(lambda x: "".join(x for x in x.split() if re.search(('%s(.*)%s' % (start, end)),x)))

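I also tried re.findall, but it complained that it expected a string or bytes-like object. I tried converting the column from object to string, but that did not work either. Any help with this would be much appreciated.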

Tags: python, string, extract

Solution


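I had a little trouble reading your code, but this snippet should match what I understood you want (getting the text between the start and end strings):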

import pandas as pd
import re

start = 'to notify'
end = 'accordingly'

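# I created an auxiliary function to better handle the cases
# where the pattern start - text - end is not found
def extract_between(x, start, end):
    # Non-string cells (e.g. NaN) make re raise a TypeError
    # ("expected string or bytes-like object"), so return None for them
    if not isinstance(x, str):
        return None
    # re.escape treats start and end as literal text, not regex syntax
    pattern = r'.*{}(.*){}.*'.format(re.escape(start), re.escape(end))
    try:
        return re.match(pattern, x).group(1)
    except AttributeError:
        # re.match returned None: the pattern does not occur in x
        return None

# This is just an example; if it does not work for your purpose, please share some data
df = pd.DataFrame(['to notify TEXT accordingly', 'this should not match'], columns=['issue'])
# Store the result in a new column so the original text is kept
df['match'] = df['issue'].apply(extract_between, start=start, end=end)

print(df['match'])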
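
As a possible alternative to the apply-based helper, the same extraction can probably be done with the pandas string accessor. The following is only a sketch under assumptions: the sample rows are made up for illustration, and only the names data_1, 'Issue' and 'match' come from the question. It uses Series.str.extract with a single capture group, which also leaves missing values as NaN instead of raising an error.

```python
import re

import pandas as pd

start = 'to notify'
end = 'accordingly'

# Hypothetical sample data; only the column name 'Issue' comes from the question
data_1 = pd.DataFrame({'Issue': [
    'Please reach out to notify the team accordingly.',
    'this should not match',
    None,  # a missing value, to show that it does not raise an error
]})

# One capture group: the shortest text between start and end,
# with the surrounding whitespace left outside the group
pattern = r'{}\s*(.*?)\s*{}'.format(re.escape(start), re.escape(end))

# expand=False returns a Series; rows with no match or a missing value become NaN
data_1['match'] = data_1['Issue'].str.extract(pattern, expand=False)

print(data_1['match'])
```

Note that the attempt in the question splits each cell on whitespace first and then searches every single word for a pattern that itself contains a space ('to notify'), so no word can ever match and the joined result is an empty string. Searching the whole cell, as above, avoids that.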
