python-3.x - Sort pandas dataframe by date in day/month/year format
Problem description
I am trying to parse data from a csv file, sort them by date and write the sorted dataframe in a new csv file.
Say we have a very simple csv file with date entries following the pattern day/month/year:
Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
I read the csv into a Pandas dataframe. When I attempt to order the dataframe based on the dates in ascending order I expect the data to be ordered as follows:
23/10/2020,'005'
02/11/2020,'002'
10/11/2020,'003'
15/11/2020,'001'
26/11/2020,'004'
Sadly, this is not what I get.
If I attempt to convert the date column to datetime and then sort, some entries are parsed as month/day/year (e.g. 02/11/2020 becomes 2020-02-11 instead of 2020-11-02), which messes up the ordering:
date reference
2020-02-11 '002'
2020-10-11 '003'
2020-10-23 '005'
2020-11-15 '001'
2020-11-26 '004'
If I sort without converting to datetime, then the ordering is also wrong:
date reference
02/11/2020 '002'
10/11/2020 '003'
15/11/2020 '001'
23/10/2020 '005'
26/11/2020 '004'
Here is my code:
import pandas as pd
df = pd.read_csv('order_dates.csv',
                 header=0,
                 names=['date', 'reference'],
                 dayfirst=True)
df.reset_index(drop=True, inplace=True)
# df.date = pd.to_datetime(df.date)
df.sort_values(by='date', ascending=True, inplace=True)
print(df)
df.to_csv('sorted.csv')
Why is sorting by date so hard? Can someone explain why the above sorting attempts fail?
Ideally, I would like sorted.csv to have the date entries in the day/month/year format.
Solution
What you can do is tell pandas how to interpret the dates as datetime while it reads the csv file. To do so, try:
>>> df = pd.read_csv('filename.csv', parse_dates=['Date'], dayfirst=True).sort_values(by='Date')
This parses the dates while reading the csv and gives you output sorted by date.
Date Reference
4 2020-10-23 '005'
1 2020-11-02 '002'
2 2020-11-10 '003'
0 2020-11-15 '001'
3 2020-11-26 '004'
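As for why the attempts in the question fail: `dayfirst=True` only affects columns that `read_csv` is asked to parse as dates, and the question's call has no `parse_dates`, so the column stays text and is sorted character by character. Calling `pd.to_datetime` without `dayfirst` or `format` leaves ambiguous entries such as 02/11/2020 to pandas' guess, which in the question's output came out month first (the exact behaviour may differ between pandas versions). The following is a minimal sketch of both effects, assuming the sample data sits in an in-memory string; `CSV_TEXT` is a hypothetical stand-in for the file.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample file from the question.
CSV_TEXT = """Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
"""

# Without parse_dates the Date column is read as plain text.
raw = pd.read_csv(io.StringIO(CSV_TEXT))

# Sorting text compares characters left to right, so the day decides the order.
text_sorted = raw.sort_values(by="Date")

# An explicit format removes any guessing about day versus month.
parsed = raw.assign(Date=pd.to_datetime(raw["Date"], format="%d/%m/%Y"))
sorted_df = parsed.sort_values(by="Date")

print(text_sorted)
print(sorted_df)
```

Passing an explicit `format` is stricter than `dayfirst=True`: it should raise on entries that do not match the pattern instead of falling back to another interpretation.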
Now all that is left is to change the format back to the desired one:
>>> df['Date'] = df['Date'].dt.strftime('%d/%m/%Y')
Keep in mind, however, that this turns Date back into a string (object) column:
>>> df
Date Reference
4 23/10/2020 '005'
1 02/11/2020 '002'
2 10/11/2020 '003'
0 15/11/2020 '001'
3 26/11/2020 '004'
>>> df.dtypes
Date object
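If the column should stay a datetime in memory and only the written file needs the day/month/year layout, `to_csv` accepts a `date_format` argument. The sketch below assumes in-memory buffers in place of real files (`CSV_TEXT` and `buffer` are hypothetical stand-ins), and `index=False` is an illustrative choice to leave the index out of the output; it is not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample file from the question.
CSV_TEXT = """Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
"""

# Parse the dates day-first while reading, then sort chronologically.
df = pd.read_csv(
    io.StringIO(CSV_TEXT), parse_dates=["Date"], dayfirst=True
).sort_values(by="Date")

# date_format only changes how datetimes are written; the column keeps its dtype.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%d/%m/%Y")
output = buffer.getvalue()

print(output)
print(df.dtypes)
```

To write an actual file, a path such as 'sorted.csv' would take the place of `buffer`.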