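python - Replace a pandas column with a sorted index
Problem description
I have a sample DF and am trying to replace the values of a list of columns with an ascending sorted index:
DF: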
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10,size=(7,3)),columns=["a","b","c"])
df["d1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple"]
df["d2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
a b c d1 d2 date
0 2 7 9 Apple Orange 2002-01-01
1 6 0 9 Mango lemon 2002-01-01
2 8 0 0 Apple lemon 2002-01-01
3 4 4 4 Mango Orange 2002-01-01
4 5 0 8 Mango lemon 2002-02-01
5 6 1 6 Mango Orange 2002-02-01
6 7 2 7 Apple lemon 2002-02-01
Step 1:
Group the DF by the "date" column. Sample group for "2002-01-01":
a b c d1 d2 date
0 2 7 9 Apple Orange 2002-01-01
1 6 0 9 Mango lemon 2002-01-01
2 8 0 0 Apple lemon 2002-01-01
3 4 4 4 Mango Orange 2002-01-01
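Step 2:
Within that group, replace the values of the columns ["d1","d2"] with an index (not the DF index) based on the sorted mean of c.
For example, in the group above, mean(c, d1="Apple") = (9+0)/2 = 4.5 and mean(c, d1="Mango") = (9+4)/2 = 6.5.
So the ascending sorted index is Apple:0 and Mango:1, and the values of column d1 will be replaced as follows: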
a b c d1 d2 date
0 2 7 9 0 Orange 2002-01-01
1 6 0 9 1 lemon 2002-01-01
2 8 0 0 0 lemon 2002-01-01
3 4 4 4 1 Orange 2002-01-01
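To double-check the arithmetic in Step 2, here is a minimal sketch that rebuilds the 2002-01-01 group from the table and ranks the per-label means of c. The variable names (group, means, codes) are illustrative and do not come from the original post.

```python
import pandas as pd

# The 2002-01-01 group, with c and d1 copied from the table in Step 1.
group = pd.DataFrame({
    "c": [9, 9, 0, 4],
    "d1": ["Apple", "Mango", "Apple", "Mango"],
})

# Mean of c per d1 label: Apple -> 4.5, Mango -> 6.5.
means = group.groupby("d1")["c"].mean()

# Dense rank of the means in ascending order, shifted to start at 0.
codes = (means.rank(method="dense") - 1).astype(int)

# Replace each row's d1 label with the rank of its mean.
group["d1"] = group["d1"].map(codes)
print(group["d1"].tolist())  # [0, 1, 0, 1]
```

The printed list matches the d1 column shown in the expected result.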
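Apply this to the whole df. I have a brute-force approach that iterates over the groups and over every row; any suggestions for a more pandas-based solution would help improve efficiency.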
Solution
Is this what you are looking for in column d1? You can apply a similar technique to d2 as well. It is not the most elegant solution, though.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(7,3)),columns=["a","b","c"])
df["d1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple"]
df["d2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
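# Mean of c for every (date, d1) pair, broadcast back to each row.
df['mean_value'] = df.groupby(['date', 'd1'])['c'].transform('mean')
# Dense rank of those means within each date, shifted so the smallest mean maps to 0.
df['rank_value'] = (df.groupby(['date'])['mean_value'].rank(ascending=True, method='dense') - 1).astype(int)
df['d1'] = df['rank_value']
# Drop the helper columns.
df.drop(labels=['rank_value', 'mean_value'], axis=1, inplace=True)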
df
a b c d1 d2 date
0 3 1 4 1 Orange 2002-01-01
1 9 7 5 0 lemon 2002-01-01
2 9 9 5 1 lemon 2002-01-01
3 8 1 2 0 Orange 2002-01-01
4 8 0 1 0 lemon 2002-02-01
5 1 8 3 0 Orange 2002-02-01
6 8 0 4 1 lemon 2002-02-01
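To cover both d1 and d2 without repeating the code, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the function name rank_by_group_mean and its parameters are hypothetical, and the c values are fixed to those in the question's table so that the result is reproducible.

```python
import pandas as pd

def rank_by_group_mean(df, group_col, value_col, label_cols):
    """Hypothetical helper: replace each label column with the 0-based
    dense rank of its mean of value_col within each group_col group."""
    out = df.copy()
    for col in label_cols:
        # Mean of value_col per (group, label), broadcast back to each row.
        means = df.groupby([group_col, col])[value_col].transform("mean")
        # Rank those means inside each group; equal means share a rank.
        ranks = means.groupby(df[group_col]).rank(method="dense", ascending=True)
        out[col] = (ranks - 1).astype(int)
    return out

df = pd.DataFrame({
    "c": [9, 9, 0, 4, 8, 6, 7],
    "d1": ["Apple", "Mango", "Apple", "Mango", "Mango", "Mango", "Apple"],
    "d2": ["Orange", "lemon", "lemon", "Orange", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 4 + ["2002-02-01"] * 3),
})

result = rank_by_group_mean(df, "date", "c", ["d1", "d2"])
print(result["d1"].tolist())  # [0, 1, 0, 1, 0, 0, 0]
print(result["d2"].tolist())  # [1, 0, 0, 1, 1, 0, 1]
```

Note that in the 2002-02-01 group both Apple and Mango have a mean of 7.0, so with method="dense" they share rank 0. If ties need to be broken, a different rank method or a secondary sort key would have to be chosen.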