python - Sorting and extracting from a pandas DataFrame based on a custom hierarchy

Question

Suppose I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({
    'brand': ['Yum_Yum', 'Yum_Yum', 'Indomie', 'Indomie', 'Indomie', 'Boom_Boom', 'Boom_Boom'],
    'style': ['cup', 'box', 'cup', 'pack', 'pack', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5, 2.3, 0]
})

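I define the hierarchy as `# hierarchy --> 1 = pack, 2 = cup, 3 = box`, where pack has the highest priority and box the lowest. I want to keep only one row for each unique value in the brand column, and that row should be the one whose style ranks highest in my hierarchy. If there is a tie, it can be broken at random.

So the resulting DataFrame would look like this: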

brand      style  rating
Yum_Yum    cup    4.0
Indomie    pack   5.0
Boom_Boom  pack   2.3

Tags: python, pandas, dataframe

Try mapping the styles to priorities, sorting, and dropping duplicates:

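# Lower number = higher priority
priority = {'cup': 2, 'box': 3, 'pack': 1}
df['style_rank'] = df['style'].map(priority)
# Sort by priority, then keep the first (highest-priority) row per brand
df.sort_values('style_rank').drop_duplicates('brand')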

Output:

       brand style  rating  style_rank
3    Indomie  pack    15.0           1
5  Boom_Boom  pack     2.3           1
0    Yum_Yum   cup     4.0           2
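
The answer above keeps whichever tied row happens to come first after sorting, and it leaves a `style_rank` helper column behind. As a rough sketch of a variant, the version below shuffles the rows before a stable sort so that ties are broken at random, and uses an ordered categorical instead of a helper column. The function name `pick_top_style`, the `style_order` list, and the `seed` parameter are hypothetical names chosen for illustration; they do not appear in the original question or answer.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum_Yum', 'Yum_Yum', 'Indomie', 'Indomie', 'Indomie', 'Boom_Boom', 'Boom_Boom'],
    'style': ['cup', 'box', 'cup', 'pack', 'pack', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5, 2.3, 0],
})

# Highest priority first, as stated in the question (hypothetical variable name).
style_order = ['pack', 'cup', 'box']


def pick_top_style(frame, order, seed=None):
    """Keep one row per brand, preferring styles that appear earlier in `order`."""
    # An ordered categorical sorts by its category order, not alphabetically.
    # Note: styles missing from `order` would become NaN here.
    ranked = frame.assign(
        style=pd.Categorical(frame['style'], categories=order, ordered=True)
    )
    return (
        ranked.sample(frac=1, random_state=seed)       # shuffle so ties are broken at random
              .sort_values('style', kind='mergesort')  # stable sort keeps the shuffled order within ties
              .drop_duplicates('brand')                # keeps the first row seen for each brand
    )


result = pick_top_style(df, style_order, seed=0)
print(result)
```

With this data, each brand's chosen style is fixed (pack for Indomie and Boom_Boom, cup for Yum_Yum), but which of the tied pack rows is kept can change with `seed`.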
