Assigning a new column that classifies item values into categories

Problem description

I import data from Excel into a DataFrame using:

import pandas as pd
data = pd.read_csv('transaction.csv')

and I have a DataFrame that looks like this:

         Date      Time  Transaction           Item
0  2016-10-30  09:58:11            1          water
1  2016-10-30  10:05:34            2   french fries
2  2016-10-30  10:05:34            2       Icecream
3  2016-10-30  10:07:57            3      chocolate
4  2016-10-30  10:07:57            3        Cookies

I created a dictionary that assigns each item to a Food or Drink category, as follows:

Food = ('french fries', 'Icecream', 'chocolate', 'Cookies')
Drink = ('water')
Category = {Food : "Food", Drink : "Drink"}

I want to assign the category to another column, but it shows up as NaN. I used this code:

data['Classification'] = data['Item'].map(Category)


         Date      Time  Transaction           Item Food or Drink
0  2016-10-30  09:58:11            1          water           NaN
1  2016-10-30  10:05:34            2   french fries           NaN
2  2016-10-30  10:05:34            2       icecream           NaN
3  2016-10-30  10:07:57            3      chocolate           NaN
4  2016-10-30  10:07:57            3        cookies           NaN

What is the best way to solve this?

Tags: python, pandas, dictionary

Solution
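
Before the fix, it may help to see why the dictionary from the question cannot match individual food items. The sketch below uses plain Python only and reuses the names Food, Drink and Category from the question; food_hit, per_char and per_item are hypothetical helper names introduced for illustration. It relies on the fact that Series.map with a dict looks up each value as a key and yields NaN when the key is missing.

```python
Food = ('french fries', 'Icecream', 'chocolate', 'Cookies')
Drink = ('water')   # parentheses alone do not make a tuple: this is the str 'water'

Category = {Food: "Food", Drink: "Drink"}

# Only two keys exist: the whole Food tuple and the string 'water'
print(list(Category))

# map looks up each Item value as a key, so a single food name is not found
food_hit = Category.get('chocolate')
print(food_hit)   # None here; Series.map would report the missing key as NaN

# The trailing comma matters for dict.fromkeys: a bare string is iterated per character
per_char = dict.fromkeys(Drink, "Drink")
per_item = dict.fromkeys(('water',), "Drink")
print(per_char)
print(per_item)
```

So every item name has to be its own key, and a one-element tuple needs the trailing comma.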


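Create a dictionary for each category with dict.fromkeys and merge them together, so that every individual item name becomes its own key: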

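# One tuple of item names per category; the trailing comma makes Drink a real tuple
Food = ('french fries', 'Icecream', 'chocolate', 'Cookies')
Drink = ('water',)

# Build one item -> category dict per tuple, then merge them by unpacking
Category = {**dict.fromkeys(Food, "Food"), **dict.fromkeys(Drink, "Drink")}
print(Category)
{'french fries': 'Food', 'Icecream': 'Food', 
 'chocolate': 'Food', 'Cookies': 'Food', 'water': 'Drink'}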

data['Classification'] = data['Item'].map(Category)
print(data)
         Date      Time  Transaction          Item Classification
0  2016-10-30  09:58:11            1         water          Drink
1  2016-10-30  10:05:34            2  french fries           Food
2  2016-10-30  10:05:34            2      Icecream           Food
3  2016-10-30  10:07:57            3     chocolate           Food
4  2016-10-30  10:07:57            3       Cookies           Food
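
The second table in the question shows some items in lower case (icecream, cookies) while the tuples use Icecream and Cookies, so exact-match lookups could still miss rows in the real file. As a possible extension, and only a sketch under assumptions, the mapping can be made case- and whitespace-insensitive and unmapped items can be given a fallback label. The sample rows, the extra 'napkin' item, the 'Unknown' label and the variable name normalised are all hypothetical and not from the original post.

```python
import pandas as pd

# Sample rows modelled on the question; mixed casing, the stray space and
# the unmapped 'napkin' row are assumptions added for illustration
data = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 3, 4],
    "Item": ["water", "french fries", "icecream", " chocolate", "Cookies", "napkin"],
})

Food = ("french fries", "Icecream", "chocolate", "Cookies")
Drink = ("water",)

# Lower-case the keys so lookups do not depend on capitalisation
Category = {
    **dict.fromkeys((name.lower() for name in Food), "Food"),
    **dict.fromkeys((name.lower() for name in Drink), "Drink"),
}

# Normalise the column the same way before mapping, then label anything unmapped
normalised = data["Item"].str.strip().str.lower()
data["Classification"] = normalised.map(Category).fillna("Unknown")

print(data)
```

The Item column itself is left untouched; only the temporary normalised Series is used for the lookup.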
