Extracting the words between two strings in ProperCase text with newlines

Problem description

I have the following DataFrame. Sorry for the mess (it was scraped from a website):

df = pd.DataFrame({'TEXT': ['Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Shape:\n \n \n Sliced\n \n \n \n \n Part:\n \n \n Fillet\n \n \n','Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Freezing Process:\n \n \n IQF\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n Part:\n \n \n Body\n \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n']})

I want to extract the individual parameters. For example, I would like the output to be

df['ProductType']="Fish"

I tried this:

df['ProductType']=df['TEXT'].str.extract("(?=Type\:)(.*)(?=Variety\:)").astype(str)

But it only outputs NaN. Sorry if this is too obvious; I only started with regular expressions today.

Tags: python, regex, pandas

Solution
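A likely cause of the NaN is that `.` does not match newline characters by default, so `(.*)` cannot span the line breaks between "Type:" and "Variety:". In addition, `(?=Type\:)` is a lookahead and consumes nothing, so even with a DOTALL flag the captured text would start at "Type:" itself. The following is a minimal sketch rather than a definitive answer. It assumes every field has the shape "Label:" followed by whitespace and then a single-line value, as in the sample data. The names `PAIR`, `parse_fields`, and `parsed` are hypothetical helpers introduced only for illustration.

```python
import re

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'TEXT': [
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Shape:\n \n \n Sliced\n \n \n \n \n Part:\n \n \n Fillet\n \n \n',
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Freezing Process:\n \n \n IQF\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n Part:\n \n \n Body\n \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n',
]})

# Single field: \s* skips the blank lines after the label (it matches
# newlines), and (.+) then captures the rest of the first non-blank line.
df['ProductType'] = df['TEXT'].str.extract(r'Product Type:\s*(.+)', expand=False)

# All fields at once (assumption: labels contain no colon or newline,
# and each value fits on one line).
PAIR = re.compile(r'(\S[^:\n]*):\s*(\S[^\n]*)')

def parse_fields(text):
    """Return a {label: value} dict for every 'Label: value' pair in text."""
    return {key.strip(): value.strip() for key, value in PAIR.findall(text)}

# One column per label; labels missing from a row become NaN.
parsed = pd.DataFrame([parse_fields(t) for t in df['TEXT']], index=df.index)

print(df['ProductType'])
print(parsed)
```

If only one or two fields are needed, the `str.extract` line is enough. The dictionary-based variant is convenient when rows carry different sets of labels, such as "Freezing Process" and "Certification" in the second row.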

