首页 > 解决方案 > Sort pandas dataframe by date in day/month/year format

问题描述

I am trying to parse data from a csv file, sort them by date and write the sorted dataframe in a new csv file.

Say we have a very simple csv file with date entries following the pattern day/month/year:

Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'

I read the csv into a Pandas dataframe. When I attempt to order the dataframe based on the dates in ascending order I expect the data to be ordered as follows:

23/10/2020,'005'
02/11/2020,'002'
10/11/2020,'003'
15/11/2020,'001'
26/11/2020,'004'

Sadly, this is not what I get.

If I attempt to convert the date to datetime and then sort, then some date entries are converted to the month/day/year (e.g. 2020-10-23 instead of 2020-23-10) which messes up the ordering:

      date     reference
2020-02-11         '002'
2020-10-11         '003'
2020-10-23         '005'
2020-11-15         '001'
2020-11-26         '004'

If I sort without converting to datetime, then the ordering is also wrong:

      date     reference
02/11/2020         '002'
10/11/2020         '003'
15/11/2020         '001'
23/10/2020         '005'
26/11/2020         '004'

Here is my code:

import pandas as pd

df = pd.read_csv('order_dates.csv',
                 header=0,
                 names=['date', 'reference'],
                 dayfirst=True)

df.reset_index(drop=True, inplace=True)
# df.date = pd.to_datetime(df.date)
df.sort_val

df.sort_values(by='date', ascending=True, inplace=True)
print(df)
df.to_csv('sorted.csv')

Why is sorting by date so hard? Can someone explain why the above sorting attempts fail? Ideally, I would like the sorted.csv to have the date entries in the day/month/year format.

标签: python-3.xpandasdataframe

解决方案


您可以做的是datetime在读取 csv 文件时指定格式。为此,请尝试:

>>> df = pd.read_csv('filename.csv', parse_dates=['Date'],infer_datetime_format='%d/%m/%Y').sort_values(by='Date')

这将从 csv 读取您的日期,并为您提供对日期进行排序的输出。

        Date Reference
4 2020-10-23      '005
1 2020-11-02     '002'
2 2020-11-10     '003'
0 2020-11-15     '001'
3 2020-11-26     '004'

现在剩下的就是简单地将格式更改为所需的格式

>>> df['Date'] = df['Date'].dt.strftime('%d/%m/%Y')

但是请记住,这会将Date背面更改为string(object)

>>> df
         Date Reference
4  23/10/2020      '005
1  02/11/2020     '002'
2  10/11/2020     '003'
0  15/11/2020     '001'
3  26/11/2020     '004'

>>> df.dtypes
Date         object

推荐阅读