Extracting a date from a cell in a pandas DataFrame using a regular expression

Problem description

I have the following DataFrame:

column 1   Description                          Extracted Data
date       January 15,2020 is important day

I would like to get the following result:

column 1   Description                          Extracted Data
date       January 15,2020 is important day     January 15,2020

df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')

But I do not get the desired result. Instead, I get a DataFrame containing only NaN values. How can I fix this?

Tags: python, regex, pandas

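Solution

Use a single capture group: .* matches everything up to the comma, and \d{4} matches the four-digit year that follows it.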

import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15,2020
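Because .* is greedy, the pattern above also captures any text that precedes the date. The sketch below shows a stricter alternative, under the assumption that dates are written as a full English month name, a day, a comma, and a four-digit year. The names MONTHS, DATE_PATTERN, mask and attempt, and the two extra sample rows, are illustrative and not part of the original question. It also checks that the pattern from the question contains three capture groups, so str.extract returns a three-column DataFrame rather than a single column. That is one likely contributor to the failed assignment, though the exact behavior may depend on the pandas version.

```python
import pandas as pd

# Assumed date format: full English month name, day, comma, four-digit year.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# One capture group only; the month alternation is non-capturing.
DATE_PATTERN = rf"((?:{MONTHS})\s+\d{{1,2}},\s*\d{{4}})"

# Sample data; the second and third rows are made up for illustration.
df = pd.DataFrame({
    "column 1": ["date", "date", "note"],
    "Description": [
        "January 15,2020 is important day",
        "Report, due March 3, 2021, is late",
        "no date here",
    ],
})

# The pattern from the question has three capture groups, so
# str.extract returns a DataFrame with three columns.
attempt = df["Description"].str.extract(
    r"((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})"
)
print(attempt.shape)

# Extract only for rows where 'column 1' equals 'date'.
# expand=False makes str.extract return a Series for a single group.
mask = df["column 1"] == "date"
df.loc[mask, "Extracted Data"] = (
    df.loc[mask, "Description"].str.extract(DATE_PATTERN, expand=False)
)
print(df)
```

Rows that do not satisfy the mask, or whose text contains no matching date, are left as NaN in the new column.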
