首页 > 解决方案 > 如何在数据透视图中对工作日进行排序?

问题描述

在此处输入图像描述

你好,

我正在尝试对上述数据透视表数据框中的工作日进行排序。最初,我使用以下代码对工作日列进行排序:

df4 = df3.sort_values(by = "day_of_week")

但我得到的结果如下:

在此处输入图像描述

然后我使用“有序分类”进行排序。但是通过使用有序分类我得到了一个关键错误:'day_of_week'

有序分类代码:

c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df3['day_of_week'] = pd.Categorical(df3['day_of_week'], categories=c_s, ordered=True)
df4 = df3.sort_values(by= "day_of_week")

你能建议任何功能吗?因此,可以按正确的顺序对这些天进行排序,而无需删除任何一天。

数据框:

,,,mean,amin,amax,median,count_nonzero
,,,ride_length_min,ride_length_min,ride_length_min,ride_length_min,ride_id
member_casual,rideable_type,day_of_week,,,,,
casual,docked_bike,Friday,48.62283921068028,0.0,24703.0,22.0,18801
casual,docked_bike,Monday,52.296237045280655,0.0,37127.0,24.0,14377
casual,docked_bike,Saturday,50.72260119602127,0.0,36082.0,25.0,33277
casual,docked_bike,Sunday,55.521421616358325,0.0,32521.0,27.0,32864
casual,docked_bike,Thursday,59.29000334656021,0.0,38537.0,24.0,20917
casual,docked_bike,Tuesday,45.6941467880028,0.0,12181.0,24.0,18571
casual,docked_bike,Wednesday,45.113569228835395,0.0,9936.0,23.0,15911
member,docked_bike,Friday,17.73414485696896,0.0,1548.0,13.0,24645
member,docked_bike,Monday,17.079518285663685,0.0,1500.0,13.0,22586
member,docked_bike,Saturday,20.22856960896701,0.0,9922.0,16.0,31404
member,docked_bike,Sunday,20.08236931950015,0.0,1500.0,16.0,30169
member,docked_bike,Thursday,18.38059068748675,0.0,1500.0,14.0,28306
member,docked_bike,Tuesday,17.577247344793605,0.0,1500.0,14.0,27399
member,docked_bike,Wednesday,19.15993775759105,0.0,41271.0,14.0,23778

提前致谢

标签: pythonpandasdataframepivot-table

解决方案


注意:这个过程在pivot. 但是,我们可以按照Sorting pandas dataframe by weekdays 的建议设置一周中的几天的分类,并将索引级别更改为新的 CategoricalDtype pandas: convert index type in multiindex dataframe

# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
       'Sunday']
cat_dtype = pd.CategoricalDtype(categories=c_s, ordered=True)
# Update Index Level Type
df3.index = df3.index.set_levels(
    df3.index.levels[-1].astype(cat_dtype), level=-1
)
# Sort values
df4 = df3.sort_values('day_of_week')

df4

                                                   mean            amin            amax          median count_nonzero
                                        ride_length_min ride_length_min ride_length_min ride_length_min       ride_id
member_casual rideable_type day_of_week                                                                              
casual        docked_bike   Monday            52.296237             0.0         37127.0            24.0         14377
member        docked_bike   Monday            17.079518             0.0          1500.0            13.0         22586
casual        docked_bike   Tuesday           45.694147             0.0         12181.0            24.0         18571
member        docked_bike   Tuesday           17.577247             0.0          1500.0            14.0         27399
casual        docked_bike   Wednesday         45.113569             0.0          9936.0            23.0         15911
member        docked_bike   Wednesday         19.159938             0.0         41271.0            14.0         23778
casual        docked_bike   Thursday          59.290003             0.0         38537.0            24.0         20917
member        docked_bike   Thursday          18.380591             0.0          1500.0            14.0         28306
casual        docked_bike   Friday            48.622839             0.0         24703.0            22.0         18801
member        docked_bike   Friday            17.734145             0.0          1548.0            13.0         24645
casual        docked_bike   Saturday          50.722601             0.0         36082.0            25.0         33277
member        docked_bike   Saturday          20.228570             0.0          9922.0            16.0         31404
casual        docked_bike   Sunday            55.521422             0.0         32521.0            27.0         32864
member        docked_bike   Sunday            20.082369             0.0          1500.0            16.0         30169

请注意,设置CategoricalDtype 枢轴要容易得多,因为与 MultiIndex dtype 的特定级别相比,对更改列 dtype 的支持更多:

import pandas as pd

# Some Small Sample Data
df3 = pd.DataFrame({'member_casual': ['casual', 'member', 'member'],
                    'rideable_type': 'docked_bike',
                    'day_of_week': ['Wednesday', 'Tuesday', 'Monday'],
                    'a': 'mean',
                    'b': 'ride_length_min',
                    'c': 120})
# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
       'Sunday']
# Change ColumnDtype
df3['day_of_week'] = pd.Categorical(df3['day_of_week'],
                                    categories=c_s,
                                    ordered=True)
# Pivot and Sort
df4 = (
    df3.pivot(index=['member_casual', 'rideable_type', 'day_of_week'],
              columns=['a', 'b'],
              values='c')
        .sort_values('day_of_week')
        .rename_axis(columns=[None, None])
)
print(df4)

样品df4

                                                   mean
                                        ride_length_min
member_casual rideable_type day_of_week                
member        docked_bike   Monday                  120
                            Tuesday                 120
casual        docked_bike   Wednesday               120

DataFrame 设置和导入:

import pandas as pd

df3 = pd.DataFrame([
    ['casual', 'docked_bike', 'Friday', 48.62283921068028, 0.0, 24703.0, 22.0,
     18801],
    ['casual', 'docked_bike', 'Monday', 52.296237045280655, 0.0, 37127.0, 24.0,
     14377],
    ['casual', 'docked_bike', 'Saturday', 50.72260119602127, 0.0, 36082.0, 25.0,
     33277],
    ['casual', 'docked_bike', 'Sunday', 55.521421616358325, 0.0, 32521.0, 27.0,
     32864],
    ['casual', 'docked_bike', 'Thursday', 59.29000334656021, 0.0, 38537.0, 24.0,
     20917],
    ['casual', 'docked_bike', 'Tuesday', 45.6941467880028, 0.0, 12181.0, 24.0,
     18571],
    ['casual', 'docked_bike', 'Wednesday', 45.1135692288354, 0.0, 9936.0, 23.0,
     15911],
    ['member', 'docked_bike', 'Friday', 17.73414485696896, 0.0, 1548.0, 13.0,
     24645],
    ['member', 'docked_bike', 'Monday', 17.079518285663685, 0.0, 1500.0, 13.0,
     22586],
    ['member', 'docked_bike', 'Saturday', 20.22856960896701, 0.0, 9922.0, 16.0,
     31404],
    ['member', 'docked_bike', 'Sunday', 20.08236931950015, 0.0, 1500.0, 16.0,
     30169],
    ['member', 'docked_bike', 'Thursday', 18.38059068748675, 0.0, 1500.0, 14.0,
     28306],
    ['member', 'docked_bike', 'Tuesday', 17.577247344793605, 0.0, 1500.0, 14.0,
     27399],
    ['member', 'docked_bike', 'Wednesday', 19.15993775759105, 0.0, 41271.0,
     14.0, 23778]
]).set_index([0, 1, 2])
df3.index.names = ['member_casual', 'rideable_type', 'day_of_week']
df3.columns = pd.MultiIndex.from_arrays([
    ['mean', 'amin', 'amax', 'median', 'count_nonzero'],
    ['ride_length_min', 'ride_length_min', 'ride_length_min', 'ride_length_min',
     'ride_id']
])

推荐阅读