Pandas - group by ID and select the item bought after a given one

Problem description

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({"ID": [123214, 123214, 321455, 321455, 234325, 234325, 234325, 234325, 132134, 132134, 132134],
        "DATETIME": ["2020-05-28", "2020-06-12", "2020-01-06", "2020-01-10", "2020-01-11", "2020-02-06", "2020-07-24", "2020-10-14", "2020-03-04", "2020-09-11", "2020-10-17"],
        "CATEGORY": ["computer technology", "early childhood", "early childhood", "shoes and bags", "early childhood", "garden and gardening", "musical instruments", "handmade products", "musical instruments", "early childhood", "beauty"]})

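I want to group by ID and, for each ID, select the item bought right after an "early childhood" purchase.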

The result should be:

321455  "2020-01-10"    "shoes and bags"
234325  "2020-02-06"    "garden and gardening"
132134  "2020-10-17"    "beauty"

I think pandas' shift function is what I need, but I can't get it to work with grouping. Thanks!

Tags: python, pandas

Solution


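You can create a mask by testing CATEGORY with Series.eq and then shifting the result within each ID group with a grouped shift (DataFrameGroupBy.shift in the docs; here it is applied to a grouped Series). Replace the missing value on each group's first row with False, then pass the mask to boolean indexing: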

# If necessary, convert DATETIME to datetimes and sort by ID and date first
# df['DATETIME'] = pd.to_datetime(df['DATETIME'])
# df = df.sort_values(['ID', 'DATETIME'])

# True on each row whose previous row within the same ID is 'early childhood'
mask = df['CATEGORY'].eq('early childhood').groupby(df['ID']).shift(fill_value=False)
df = df[mask]
print(df)
        ID    DATETIME              CATEGORY
3   321455  2020-01-10        shoes and bags
5   234325  2020-02-06  garden and gardening
10  132134  2020-10-17                beauty
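
For readers who want to see the intermediate steps, here is a minimal self-contained sketch of the same idea with the optional sort applied. The variable names `is_target`, `after_target`, and `result` are illustrative only and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [123214, 123214, 321455, 321455, 234325, 234325,
           234325, 234325, 132134, 132134, 132134],
    "DATETIME": ["2020-05-28", "2020-06-12", "2020-01-06", "2020-01-10",
                 "2020-01-11", "2020-02-06", "2020-07-24", "2020-10-14",
                 "2020-03-04", "2020-09-11", "2020-10-17"],
    "CATEGORY": ["computer technology", "early childhood", "early childhood",
                 "shoes and bags", "early childhood", "garden and gardening",
                 "musical instruments", "handmade products",
                 "musical instruments", "early childhood", "beauty"],
})

# Sort so that "previous row" means "previous purchase" within each ID.
df["DATETIME"] = pd.to_datetime(df["DATETIME"])
df = df.sort_values(["ID", "DATETIME"])

# Step 1: flag the rows whose own category is the one we are looking for.
is_target = df["CATEGORY"].eq("early childhood")

# Step 2: move each flag down one row within its ID group. The first row of
# every group has no predecessor, so it is filled with False instead of NaN.
after_target = is_target.groupby(df["ID"]).shift(fill_value=False)

# Step 3: keep only the rows that directly follow a target purchase.
result = df[after_target]
print(result)
```

Because the frame is sorted by ID here, the same three rows (index 3, 5, and 10) come back ordered by ID rather than by their original position. An ID whose "early childhood" purchase is its last one, such as 123214, contributes no row.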
