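python - Compare DataFrame group row values with the previous row's values by index
Problem description
Just wondering whether there is a simple solution to the following problem. Take the following setup: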
import datetime
import pandas
data = [
{"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
{"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": "T2", "score_1": 30.0, "score_2": 20.0},
]
df = pandas.DataFrame(data)
df = df.set_index(["date", "ticker"])
df['product'] = df.index.get_level_values('ticker')
df['date'] = df.index.get_level_values('date')
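I need to compare each value in certain columns (internal_id, score_1, score_2) with the value in the previous row for the same ticker. If the value differs from the previous row's value, output it; otherwise show None / NaN.
For example, following the setup above, this is the output I want: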
output = [
{"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
{"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": None, "score_1": 20.0, "score_2": None},
{"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": None, "score_1": None, "score_2": None},
{"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
{"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": None, "score_2": 20.0},
{"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": None, "score_1": None, "score_2": None},
{"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": None, "score_1": 30.0, "score_2": None},
]
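As you can see, I need to group by ticker and then compare each row with the column values from the previous date. This needs to work for strings as well as integers/floats.
Solution
Use DataFrame.mask with the values compared by DataFrameGroupBy.shift and DataFrame.eq: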
df = pandas.DataFrame(data)
# Shift each ticker's rows down by one, then blank out cells equal to the previous row's value
df = df.mask(df.groupby('ticker').shift().eq(df))
print(df)
         date    ticker internal_id  score_1  score_2
0  2020-01-01  ticker-1          T1     10.0     20.0
1  2020-01-05  ticker-1         NaN     20.0      NaN
2  2020-01-08  ticker-1         NaN      NaN      NaN
3  2020-01-10  ticker-1        T1-A     10.0     30.0
4  2020-01-02  ticker-2          T2     10.0     20.0
5  2020-01-04  ticker-2         NaN      NaN      NaN
6  2020-01-09  ticker-2         NaN     30.0      NaN
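The one-liner above starts from a flat DataFrame, whereas the setup in the question keeps date and ticker in a MultiIndex. A minimal sketch of the same shift / eq / mask idea for that case, split into steps, might look like the following. The names cols, previous, unchanged and result are illustrative, not part of the original answer, and the sketch assumes the rows are already sorted by date within each ticker.

```python
import datetime
import pandas

data = [
    {"date": datetime.date(2020, 1, 1), "ticker": "ticker-1", "internal_id": "T1", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 5), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 8), "ticker": "ticker-1", "internal_id": "T1", "score_1": 20.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 10), "ticker": "ticker-1", "internal_id": "T1-A", "score_1": 10.0, "score_2": 30.0},
    {"date": datetime.date(2020, 1, 2), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 4), "ticker": "ticker-2", "internal_id": "T2", "score_1": 10.0, "score_2": 20.0},
    {"date": datetime.date(2020, 1, 9), "ticker": "ticker-2", "internal_id": "T2", "score_1": 30.0, "score_2": 20.0},
]
df = pandas.DataFrame(data).set_index(["date", "ticker"])

# Columns to compare (illustrative name; adjust to your own data).
cols = ["internal_id", "score_1", "score_2"]

# Previous row's values within each ticker; the first row of each group gets NaN.
previous = df.groupby(level="ticker")[cols].shift()

# True where a cell equals the previous row's value for the same ticker.
# A comparison against NaN is False, so the first row of each group is kept.
unchanged = df[cols].eq(previous)

# Replace unchanged cells with NaN and keep everything else.
result = df[cols].mask(unchanged)
print(result)
```

Restricting the comparison to cols keeps the index levels and any other columns out of the equality check. If the rows were not already in date order, calling df.sort_index() before the groupby would probably be needed.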