dataframe - dataframe to_datetime not reading dates correctly
Problem description
Part of an Excel file is shown below.
Action Date1             Action Date2
15.06.2018 - 06:06:30    17.06.2018 - 15:52:35
09.07.2018 - 10:12:13    09.07.2018 - 11:39:42
09.08.2018 - 15:21:45
10.07.2018 - 10:00:13    00.00.0000 - 00:00:00
......
I want to extract the latest action date in each row, and I have the following code:
dates = df.fillna(axis=1, method='ffill')
df['Latest date'] = dates[dates.columns[-1]]
But this code returns incorrect dates (day and month are swapped), as shown below.
2018-06-17 15:52:35
2018-09-07 11:39:42
2018-09-08 15:21:45
2018-10-07 10:00:13
.....
I tried
df['Latest date']=pd.to_datetime(df['Latest date'],format="%d%m%Y")
but it still gives me the same outcome.
Solution
Use the format parameter (see http://strftime.org/ for the available directives):
df['Latest date']=pd.to_datetime(df['Latest date'],format="%d.%m.%Y - %H:%M:%S")
Or use the parameter dayfirst=True:
df['Latest date']=pd.to_datetime(df['Latest date'], dayfirst=True)
print (df)
Latest date
0 2018-06-15 06:06:30
1 2018-07-16 08:53:49
2 2018-07-09 10:12:13
3 2018-08-09 15:21:45
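As a quick check of the explicit format, here is a minimal sketch that parses two of the sample strings from the question. The names `raw` and `parsed` are illustrative only, and the sketch assumes the column still holds the raw strings rather than values that were already converted to datetimes on import.

```python
import pandas as pd

# Two sample strings from the question, in day.month.year order.
raw = pd.Series(["15.06.2018 - 06:06:30", "09.07.2018 - 10:12:13"])

# Every literal in the strings (dots, spaces, the dash) must appear in the format.
parsed = pd.to_datetime(raw, format="%d.%m.%Y - %H:%M:%S")

# The second value should be 9 July, not 7 September.
print(parsed.dt.day.tolist(), parsed.dt.month.tolist())
```

If the month printed for the second value is 7, the day and month are being read in the intended order.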
Edit: Add the parameter errors='coerce' to convert unparseable values to NaT:
df = df.apply(lambda x: pd.to_datetime(x,format="%d.%m.%Y - %H:%M:%S", errors='coerce'))
dates = df.ffill(axis=1)
df['Latest date'] = dates.iloc[:, -1]
print (df)
         Action Date1        Action Date2         Latest date
0 2018-06-15 06:06:30 2018-06-17 15:52:35 2018-06-17 15:52:35
1 2018-07-09 10:12:13 2018-07-09 11:39:42 2018-07-09 11:39:42
2 2018-08-09 15:21:45                 NaT 2018-08-09 15:21:45
3 2018-07-10 10:00:13                 NaT 2018-07-10 10:00:13
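The steps above can be put together in a self-contained sketch. The DataFrame here is rebuilt by hand from the sample rows in the question (an assumption, since the real data comes from an Excel file), and the names `fmt`, `parsed` and `latest` are illustrative.

```python
import pandas as pd

# Sample rows from the question; the missing cell is represented as None.
df = pd.DataFrame({
    "Action Date1": ["15.06.2018 - 06:06:30", "09.07.2018 - 10:12:13",
                     "09.08.2018 - 15:21:45", "10.07.2018 - 10:00:13"],
    "Action Date2": ["17.06.2018 - 15:52:35", "09.07.2018 - 11:39:42",
                     None, "00.00.0000 - 00:00:00"],
})

fmt = "%d.%m.%Y - %H:%M:%S"

# Parse each column; invalid strings such as "00.00.0000 - 00:00:00" become NaT.
parsed = df.apply(lambda col: pd.to_datetime(col, format=fmt, errors="coerce"))

# Forward-fill across columns, then take the last column as the latest date.
latest = parsed.ffill(axis=1).iloc[:, -1]
print(latest)
```

Forward-filling picks the last non-missing value in each row, so this gives the latest date only when the columns are already in chronological order from left to right, as they appear to be in the sample.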