Parse multiple date formats in pandas

Problem description

I'm stuck with a date column in the following mixed formats:

0   2001-12-25  
1   2002-9-27   
2   2001-2-24   
3   2001-5-3    
4   200510
5   20078

What I need is the date in the format %Y-%m.

What I tried was:

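def parse(date):
    if len(date) <= 5:
        # compact strings such as "20078": year, then a one-digit month
        return "{}-{}".format(date[:4], date[4:])
    else:
        # every other value ("2001-12-25", and also "200510") ends up here,
        # so the function returns None for it
        pass

# apply() calls parse once per value instead of on the whole column
df['Date'] = df['Date'].apply(parse)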

However, I only succeeded in parsing 20078 to 2007-8; dates in formats like 2001-12-25 came out as None. How can I do it? Thank you!
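In the attempt above, the dashed dates come back as None because the else branch only passes. If you want to stay with plain string handling, a minimal sketch could look like the following. It assumes every value is either dash-separated year-month-day or a 4-digit year followed directly by the month; to_year_month is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def to_year_month(date: str) -> str:
    """Hypothetical helper: turn '2001-12-25' or '200510' into 'YYYY-MM'."""
    date = date.strip()
    if "-" in date:
        # Dashed form: keep the year and month parts
        year, month = date.split("-")[:2]
    else:
        # Compact form: assume a 4-digit year followed by a 1- or 2-digit month
        year, month = date[:4], date[4:]
    # Zero-pad the month so '9' becomes '09'
    return f"{year}-{int(month):02d}"

df = pd.DataFrame({"Date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})
# apply() calls the function once per value rather than on the whole Series
df["Date"] = df["Date"].apply(to_year_month)
print(df["Date"].tolist())
```

This produces strings, not datetimes, and it will silently mis-read any value that does not follow the two assumed shapes.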

Tags: python pandas date

Solution


We can use pd.to_datetime with errors='coerce' to parse the dates in steps.

Assuming your column is called date:

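import pandas as pd

# first pass: full year-month-day strings; anything else becomes NaT
s = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d')

# second pass: year+month strings, used only where s is still NaT
s = s.fillna(pd.to_datetime(df['date'], format='%Y%m', errors='coerce'))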

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01
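For anyone who wants to run the whole answer at once, here is a self-contained version that rebuilds the sample data from the question. The column name date is the same assumption as above.

```python
import pandas as pd

# Sample data from the question, in a column assumed to be called "date"
df = pd.DataFrame({"date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})

# Pass 1: full year-month-day strings; everything else becomes NaT
s = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Pass 2: year+month strings; only used where pass 1 produced NaT
s = s.fillna(pd.to_datetime(df["date"], format="%Y%m", errors="coerce"))

df["date_fixed"] = s
print(df)
```

Values that only have a year and month get the first day of that month, which is why 200510 becomes 2005-10-01.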

Step by step:

First, we parse the regular year-month-day strings into a new Series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4          NaT
5          NaT
Name: date, dtype: datetime64[ns]

As you can see, there are two NaT values (null datetimes) in the Series; they correspond to your dates that have no day.
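If you want to see which original strings the first pass could not parse, a small sketch (same sample data and assumed column name date) is to select the rows where s is null:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})
s = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# isna() is True exactly where the first pass produced NaT
unparsed = df.loc[s.isna(), "date"]
print(unparsed.tolist())
```

These are the rows the second pass has to fill.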

We then call pd.to_datetime again, this time with the year-month format '%Y%m', and use the result to fill the missing values of s:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4   2005-10-01
5   2007-08-01
Name: date, dtype: datetime64[ns]
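The same fill-the-gaps idea extends to any number of formats: try them in order and let each later format fill only what is still NaT. A sketch under that assumption follows; parse_formats is a hypothetical helper, and the day/month/year value 25/12/2001 is a made-up extra input used only to show a third format.

```python
import pandas as pd

def parse_formats(values: pd.Series, formats: list[str]) -> pd.Series:
    """Hypothetical helper: try each format in order; earlier formats win."""
    result = pd.Series(pd.NaT, index=values.index, dtype="datetime64[ns]")
    for fmt in formats:
        # Only rows still missing after the previous passes get filled
        result = result.fillna(pd.to_datetime(values, format=fmt, errors="coerce"))
    return result

# The last value is a made-up example of a third format
dates = pd.Series(["2001-12-25", "2002-9-27", "200510", "20078", "25/12/2001"])
parsed = parse_formats(dates, ["%Y-%m-%d", "%Y%m", "%d/%m/%Y"])
print(parsed)
```

Order matters: if a string could match more than one format, the first format in the list decides how it is read, so put the most specific formats first.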

Finally, we assign s to a new column of your DataFrame, date_fixed.
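The question asks for %Y-%m, while date_fixed holds full datetimes. One way to get there, sketched under the same assumptions (column named date, sample data from the question), is to format the parsed values with .dt.strftime, or to keep a monthly Period with .dt.to_period. The column names year_month and month_period are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})
s = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
s = s.fillna(pd.to_datetime(df["date"], format="%Y%m", errors="coerce"))

# 'YYYY-MM' strings, which is the format the question asks for
df["year_month"] = s.dt.strftime("%Y-%m")

# Alternatively keep a proper monthly type instead of strings
df["month_period"] = s.dt.to_period("M")
print(df)
```

Strings are convenient for display or export; the Period column still sorts and compares as dates.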

