sort dataframe with dates as column headers in pandas

Question

My dates have to be in water years, and I want the columns to start with the date 09/30/1899_24:00 and end with 9/30/1999_24:00.


Initially I had it like this (picture below), but when I pivoted the dataframe the order got messed up.

Here is a snippet of my code:

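    import pandas as pd

    # headout (the CSV file) and sta (a table with NodeID and GSE columns) are defined elsewhere
    sim = pd.read_csv(headout, parse_dates=True, index_col='date')
    # number the rows within each date: L1, L2, ...
    sim['Layer'] = sim.groupby('date').cumcount() + 1
    sim['Layer'] = 'L' + sim['Layer'].astype(str)
    # omitting index pivots on the existing date index; .T turns the dates into column headers
    sim = sim.pivot(columns='Layer').T
    sim = sim.reset_index()
    sim = sim.rename(columns={"level_0": "NodeID"})
    sim["NodeID"] = sim['NodeID'].astype('int64')
    # look up each node's GSE value from sta
    sim['gse'] = sim['NodeID'].map(sta.set_index('NodeID')['GSE'])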
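To see where the order breaks, here is a minimal reproduction of the layering and pivot steps on a hypothetical frame (the node column "1", its values, and the three dates are made up). The input rows are in chronological order, but because the dates are still strings, `pivot` sorts them as text, so the 1898 date ends up last.

```python
import pandas as pd

# Hypothetical long-format data: two layers per date, dates still plain strings,
# rows deliberately given in chronological order
sim = pd.DataFrame(
    {
        "date": ["11/30/1898_24:00"] * 2 + ["09/30/1899_24:00"] * 2 + ["10/31/1899_24:00"] * 2,
        "1": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],  # made-up values for node "1"
    }
).set_index("date")

# Same layering and pivot steps as in the question
sim["Layer"] = "L" + (sim.groupby("date").cumcount() + 1).astype(str)
wide = sim.pivot(columns="Layer").T

# The string dates come out in text order, so 1898 lands after 1899
print(list(wide.columns))  # ['09/30/1899_24:00', '10/31/1899_24:00', '11/30/1898_24:00']
```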

Tags: python, pandas, sorting, date, header

Answer


The problem is that 24:00 is not a valid time for Python/pandas datetime parsing: hours run from 00 to 23.
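A quick check of that claim: parsing one of the raw strings with an explicit format fails on the hour. The pandas `read_csv` documentation states that a column or index that cannot be parsed as datetimes is returned unaltered as object dtype, which would explain why `parse_dates=True` in the question silently left the dates as strings instead of raising.

```python
import pandas as pd

raw = "09/30/1899_24:00"

try:
    pd.to_datetime(raw, format="%m/%d/%Y_%H:%M")
    parsed = True
except ValueError as exc:  # %H only accepts hours 00 through 23
    parsed = False
    print("ValueError:", exc)
```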

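  • If you don't convert the date column to valid datetimes, pandas treats the dates as strings.
    • That makes any kind of time-based analysis very difficult.
    • The columns are then sorted as text (character by character, month first), not chronologically, for example: '09/30/1899_24:00', '10/31/1899_24:00', '11/30/1898_24:00', '11/30/1899_24:00'
    • Note that 11/30/1898 sorts after 10/31/1899, even though it is a year earlier.
  • Replace 24:00 with 23:59, which keeps every timestamp on its original calendar date.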
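The difference between the two orderings described above can be shown on the four example labels: a plain string sort reproduces the order in the list, while replacing 24:00 and parsing first gives the chronological order.

```python
import pandas as pd

labels = ["11/30/1899_24:00", "09/30/1899_24:00", "11/30/1898_24:00", "10/31/1899_24:00"]

# Plain string sort: compares characters left to right, so the month decides first
as_text = sorted(labels)

# Replace 24:00, parse, then sort: compares actual points in time
parsed = pd.to_datetime([s.replace("24:00", "23:59") for s in labels], format="%m/%d/%Y_%H:%M")
as_dates = [str(ts.date()) for ts in parsed.sort_values()]

print(as_text)   # ['09/30/1899_24:00', '10/31/1899_24:00', '11/30/1898_24:00', '11/30/1899_24:00']
print(as_dates)  # ['1898-11-30', '1899-09-30', '1899-10-31', '1899-11-30']
```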
    import pandas as pd

    # dataframe
    df = pd.DataFrame({'date': ['09/30/1899_24:00', '09/30/1899_24:00', '09/30/1899_24:00', '09/30/1899_24:00', '10/31/1899_24:00',
                                '10/31/1899_24:00', '10/31/1899_24:00', '10/31/1899_24:00', '11/30/1899_24:00', '11/30/1899_24:00']})

|    | date             |
|---:|:-----------------|
|  0 | 09/30/1899_24:00 |
|  1 | 09/30/1899_24:00 |
|  2 | 09/30/1899_24:00 |
|  3 | 09/30/1899_24:00 |
|  4 | 10/31/1899_24:00 |
|  5 | 10/31/1899_24:00 |
|  6 | 10/31/1899_24:00 |
|  7 | 10/31/1899_24:00 |
|  8 | 11/30/1899_24:00 |
|  9 | 11/30/1899_24:00 |

    # replace 24:00 with 23:59 so each timestamp keeps its calendar date
    df['date'] = df['date'].str.replace('24:00', '23:59', regex=False)

    # format as datetime
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y_%H:%M')

Final result:

                     date
    0 1899-09-30 23:59:00
    1 1899-09-30 23:59:00
    2 1899-09-30 23:59:00
    3 1899-09-30 23:59:00
    4 1899-10-31 23:59:00
    5 1899-10-31 23:59:00
    6 1899-10-31 23:59:00
    7 1899-10-31 23:59:00
    8 1899-11-30 23:59:00
    9 1899-11-30 23:59:00
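Applied to the question's pipeline, a sketch might convert the dates before the pivot, so the transposed frame ends up with Timestamp column headers that sort chronologically. The data here is hypothetical (one made-up node column "1" and three dates given out of order), and placing the conversion before the pivot is an assumption about how the fix slots into the question's code; `sort_index(axis=1)` just makes the column ordering explicit.

```python
import pandas as pd

# Hypothetical long-format frame shaped like the question's, dates out of order
sim = pd.DataFrame(
    {
        "date": ["10/31/1899_24:00", "10/31/1899_24:00",
                 "11/30/1898_24:00", "11/30/1898_24:00",
                 "09/30/1899_24:00", "09/30/1899_24:00"],
        "1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # made-up values for node "1"
    }
)

# Fix the 24:00 timestamps and parse them before pivoting
sim["date"] = pd.to_datetime(
    sim["date"].str.replace("24:00", "23:59", regex=False), format="%m/%d/%Y_%H:%M"
)
sim = sim.set_index("date")

# Same layering and pivot steps as in the question
sim["Layer"] = "L" + (sim.groupby("date").cumcount() + 1).astype(str)
wide = sim.pivot(columns="Layer").T

# Column headers are Timestamps now, so sorting them is chronological
wide = wide.sort_index(axis=1)
print([str(c.date()) for c in wide.columns])  # ['1898-11-30', '1899-09-30', '1899-10-31']
```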

Alternatively, starting again from the original string column, remove the time component entirely:

    df['date'] = df['date'].str.replace('_24:00', '', regex=False)
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

            date
    0 1899-09-30
    1 1899-09-30
    2 1899-09-30
    3 1899-09-30
    4 1899-10-31
    5 1899-10-31
    6 1899-10-31
    7 1899-10-31
    8 1899-11-30
    9 1899-11-30
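The question also asks for columns running from 09/30/1899 to 9/30/1999 in water years. A hedged sketch on a hypothetical wide frame: parse the headers with the date-only format above, sort them, slice the window with `.loc`, and label each column with its water year. The water-year rule is an assumption based on the US convention (Oct 1 through Sep 30, named for the calendar year in which it ends).

```python
import pandas as pd

# Hypothetical wide frame with string date headers, deliberately out of order
raw_cols = ["10/31/1899_24:00", "09/30/1899_24:00", "09/30/1999_24:00", "10/31/1999_24:00"]
wide = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=raw_cols)

# Drop the time part, parse the headers, then sort the columns chronologically
wide.columns = pd.to_datetime(wide.columns.str.replace("_24:00", "", regex=False), format="%m/%d/%Y")
wide = wide.sort_index(axis=1)

# Keep 1899-09-30 through 1999-09-30 (both ends included on a sorted DatetimeIndex)
window = wide.loc[:, "1899-09-30":"1999-09-30"]

# Assumed US water year: Oct 1 - Sep 30, labelled by the calendar year in which it ends
water_year = window.columns.year + (window.columns.month >= 10).astype(int)

print([str(c.date()) for c in window.columns])  # ['1899-09-30', '1899-10-31', '1999-09-30']
print(water_year.tolist())                       # [1899, 1900, 1999]
```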
