python - parse multiple date format pandas
Problem description
I'm stuck with dates in the following mixed formats:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in the format %Y-%m.
What I tried was:
def parse(date):
    if len(date)<=5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date']= parse(df['Date'])
However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came out as None. So, how can I do it? Thank you!
Solution
We can use pd.to_datetime with errors='coerce' to parse the dates in steps, assuming your column is called date:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01
Step by step: first we parse the full dates (year, month and day) into a new series called s:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
As you can see, we have two NaT values (null datetimes) in the series. These correspond to the entries that are missing a day. Because of errors='coerce', any value that does not match the format becomes NaT instead of raising an error.
We then call pd.to_datetime again with the other format, '%Y%m', and use fillna to fill only the missing values of s:
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
Finally, we assign s back to the dataframe as a new column (df['date_fixed'] = s).
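Since the question asks for %Y-%m, the parsed datetimes can be formatted back to strings as a last step. The following is a minimal sketch, assuming pandas is installed and the column is named date as above; to_year_month and the year_month column are hypothetical names introduced here for illustration only:

```python
import pandas as pd

# Sample data copied from the question; the column name 'date' follows the answer.
df = pd.DataFrame({
    "date": ["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"]
})


def to_year_month(col):
    """Hypothetical helper: parse mixed 'Y-m-d' / 'Ym' strings into 'Y-m' strings."""
    # First pass: full dates; anything that does not match becomes NaT.
    full = pd.to_datetime(col, format="%Y-%m-%d", errors="coerce")
    # Second pass: year+month only; anything that does not match becomes NaT.
    partial = pd.to_datetime(col, format="%Y%m", errors="coerce")
    # Fill the gaps left by the first pass, then format as zero-padded 'YYYY-MM'.
    return full.fillna(partial).dt.strftime("%Y-%m")


df["year_month"] = to_year_month(df["date"])
print(df)
```

Any value that matches neither format stays NaT after both passes and ends up as a missing value in the string column, so it may be worth checking the result with isna() before relying on it.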