python - Sort each group of a pandas DataFrame while keeping a desired order
Problem description
I have a DataFrame like the one below:
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East', 'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5]
})
+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a | West | 1 |
| a | West | 3 |
| a | West | 4 |
| a | West | 2 |
| a | East | 4 |
| a | East | 2 |
| b | East | 5 |
| b | South | 7 |
| b | South | 9 |
| c | South | 7 |
| c | North | 5 |
| c | North | 9 |
| c | North | 5 |
+------+--------+-------+
I am trying to do two things:
- Sort the DataFrame by Sales within each region. I can do that with the following code:
df.sort_values(by=['Region', 'Sales'])
+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
| b | South | 7 |
| c | South | 7 |
| b | South | 9 |
| a | West | 1 |
| a | West | 2 |
| a | West | 3 |
| a | West | 4 |
+------+--------+-------+
But I want to keep a specific order for the Region column: West should come first, then East, then South, then North.
Desired output:
+--------+----------+---------+
| Junk | Region | Sales |
+--------+----------+---------+
| a | West | 1 |
| a | West | 2 |
| a | West | 3 |
| a | West | 4 |
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| b | South | 7 |
| c | South | 7 |
| b | South | 9 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
+--------+----------+---------+
- Sort only the rows where Region = East or Region = North; the remaining regions should stay as they are.
Desired output:
+--------+----------+---------+
| Junk | Region | Sales |
+--------+----------+---------+
| a | West | 1 |
| a | West | 3 |
| a | West | 4 |
| a | West | 2 |
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| b | South | 7 |
| b | South | 9 |
| c | South | 7 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
+--------+----------+---------+
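Solution
For the first part, create an ordered categorical column, then sort:
order = ['West', 'East', 'South', 'North']
# An ordered categorical sorts by each value's position in `order`, not alphabetically
df['Region'] = pd.Categorical(df['Region'], categories=order, ordered=True)
df = df.sort_values(by=['Region', 'Sales'])
print(df)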
Junk Region Sales
0 a West 1
3 a West 2
1 a West 3
2 a West 4
5 a East 2
4 a East 4
6 b East 5
7 b South 7
9 c South 7
8 b South 9
10 c North 5
12 c North 5
11 c North 9
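An alternative with a dictionary: use map to create a helper column, sort by it, then drop the helper column:
# starting again from the original df
order = {'West': 1, 'East': 2, 'South': 3, 'North': 4}
df = df.assign(tmp=df['Region'].map(order)).sort_values(by=['tmp', 'Sales']).drop(columns='tmp')
print(df)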
Junk Region Sales
0 a West 1
3 a West 2
1 a West 3
2 a West 4
5 a East 2
4 a East 4
6 b East 5
7 b South 7
9 c South 7
8 b South 9
10 c North 5
12 c North 5
11 c North 9
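On pandas 1.1 or later, `sort_values` also accepts a `key` callable, which avoids both the dtype change and the helper column. A minimal sketch, assuming pandas 1.1+; `rank` and `sort_key` are illustrative names, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

# Illustrative ranking of regions: lower numbers sort first.
rank = {'West': 0, 'East': 1, 'South': 2, 'North': 3}

def sort_key(col):
    # pandas calls `key` once per column listed in `by`, passing it as a Series;
    # only Region is replaced by its rank, Sales is sorted by its own values.
    return col.map(rank) if col.name == 'Region' else col

result = df.sort_values(by=['Region', 'Sales'], key=sort_key)
print(result)
```

The Region column keeps its original values and dtype; the ranks are used only for the comparison.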
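For the second part, sort only the filtered rows and assign the result back as a NumPy array, so that pandas does not realign the rows by index:
# starting again from the original df
order = ['West', 'East', 'South', 'North']
df['Region'] = pd.Categorical(df['Region'], categories=order, ordered=True)
mask = df['Region'].isin(['North', 'East'])
# .to_numpy() strips the index; assigning a DataFrame would realign on the index and undo the sort
df[mask] = df[mask].sort_values(['Region', 'Sales']).to_numpy()
print(df)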
Junk Region Sales
0 a West 1
1 a West 3
2 a West 4
3 a West 2
4 a East 2
5 a East 4
6 b East 5
7 b South 7
8 b South 9
9 c South 7
10 c North 5
11 c North 5
12 c North 9
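The single-mask assignment works here because every East row comes before every North row in the original frame: the sorted East+North block is written back into the masked positions in order, so if a North row sat above an East row, rows would be moved into slots previously held by the other region. A hedged sketch that sorts each selected region separately, only among its own positions, avoids that assumption; `sort_within_regions` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

def sort_within_regions(frame, regions, by='Sales'):
    """Return a copy where each listed region's rows are sorted by `by`.

    Each region is sorted only among the row positions it already occupies,
    so rows of other regions, and where each selected region sits, are unchanged.
    """
    out = frame.copy()
    for region in regions:
        m = out['Region'] == region
        # mergesort is stable, so tied values keep their original order;
        # .to_numpy() drops the index so the assignment is positional.
        out.loc[m] = out.loc[m].sort_values(by, kind='mergesort').to_numpy()
    return out

result = sort_within_regions(df, ['East', 'North'])
print(result)
```

Like the answer's version, this leaves the index labels 0 to 12 in place and moves only the values.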
An alternative with map:
# starting again from the original df
order = {'East': 1, 'North': 2}
df = df.assign(tmp=df['Region'].map(order))
mask = df['Region'].isin(['North', 'East'])
df[mask] = df[mask].sort_values(['tmp', 'Sales']).to_numpy()
df = df.drop(columns='tmp')
print(df)
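The assignment approaches overwrite values in place, so the index labels stay at 0 to 12 while the data moves between them. If you would rather move whole rows, labels included, one option is to build a positional order with NumPy and apply it with `iloc`. This is a sketch, not the answer's method; `reorder_within_regions` is a hypothetical name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

def reorder_within_regions(frame, regions, by='Sales'):
    # Start from the identity order: every row stays where it is.
    order = np.arange(len(frame))
    for region in regions:
        # Positions (not labels) of this region's rows, top to bottom.
        pos = np.flatnonzero((frame['Region'] == region).to_numpy())
        # Stable argsort of the sort column restricted to those positions.
        ranks = np.argsort(frame[by].to_numpy()[pos], kind='stable')
        # Refill this region's slots with its own rows in sorted order.
        order[pos] = pos[ranks]
    return frame.iloc[order]

result = reorder_within_regions(df, ['East', 'North'])
print(result)
```

Here the result's index reads 0, 1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 12, 11, so you can see which original rows were moved.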