Trying to get the average spend per store category in a given town

Problem description

I have a DataFrame of transactions:

   customer_id         town  amount     category
0           n1     New York   12.50  Book Stores
1           n2     New York    4.49  Book Stores
2           n3     New York   11.70  Book Stores
3           n4     New York   15.00     Cable TV
4           n5     New York    7.00     Cable TV
5           n6     New York    6.00     Cable TV
6           n7     New York   15.00     Cable TV
7           n8     New York    7.00     Cable TV
8           n9     New York    7.00     Cable TV
9          la1  Los Angeles   15.00  Book Stores
10         la2  Los Angeles   15.99  Book Stores
11         la3  Los Angeles   15.00  Book Stores
12         la4  Los Angeles    7.00     Cable TV
13         la5  Los Angeles   15.99     Cable TV
14         la6  Los Angeles   15.00     Cable TV
15         la7  Los Angeles    7.00     Cable TV

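Two towns: New York and Los Angeles.

Two spending categories: Book Stores and Cable TV.

As you can see, New York has 9 people.

New York has only three Book Stores transactions, so I expect the average Book Stores spend to be: ($12.50 + $4.49 + $11.70) / 9 people ≈ $3.19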
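
As a quick check of that arithmetic, here is a minimal sketch in plain Python. The variable names are illustrative and not part of the original post; the amounts and the head count of 9 come from the table above.

```python
# Book Stores transactions in New York, copied from the table above
ny_book_amounts = [12.50, 4.49, 11.70]
ny_customers = 9  # every New York row, not only the book buyers

# Average over the three buyers only
per_buyer = sum(ny_book_amounts) / len(ny_book_amounts)

# Average over the whole New York population, which is the number wanted here
per_resident = sum(ny_book_amounts) / ny_customers

print(per_buyer)     # about 9.5633
print(per_resident)  # about 3.1878
```

The only difference between the two figures is the denominator: 3 buyers versus 9 residents.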

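I am trying to turn this into a pandas operation. I have tried many different groupbys and pivots, but the data is always treated as if only 3 people live in New York.

Here are some of the things I tried: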

import numpy as np
import pandas as pd

# new_df is the transactions DataFrame shown above
print('\nAttempt 1...\n')
print(new_df.groupby(['town','category'])['amount'].mean())
print('\nAttempt 2...\n')
print(new_df.groupby(['category','town'])['amount'].mean())
print('\nAttempt 3...\n')
print(pd.pivot_table(new_df,index=["category"],values=["amount"],columns=["town"],aggfunc=[np.mean],fill_value=0))


Attempt 1...

town         category   
Los Angeles  Book Stores   15.33
             Cable TV      11.25
New York     Book Stores    9.56
             Cable TV       9.50
Name: amount, dtype: float64

Attempt 2...

category     town       
Book Stores  Los Angeles   15.33
             New York       9.56
Cable TV     Los Angeles   11.25
             New York       9.50
Name: amount, dtype: float64

Attempt 3...

                   mean         
                 amount         
town        Los Angeles New York
category                        
Book Stores       15.33     9.56
Cable TV          11.25     9.50

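Every time, I get an average of $9.56 for Book Stores in New York.

How can I get the average over New York's entire population, not just the three people who bought books?

If you want to rebuild the data with pd.DataFrame.from_dict(), here it is as a dict: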


{'customer_id': {0: 'n1',
  1: 'n2',
  2: 'n3',
  3: 'n4',
  4: 'n5',
  5: 'n6',
  6: 'n7',
  7: 'n8',
  8: 'n9',
  9: 'la1',
  10: 'la2',
  11: 'la3',
  12: 'la4',
  13: 'la5',
  14: 'la6',
  15: 'la7'},
 'town': {0: 'New York',
  1: 'New York',
  2: 'New York',
  3: 'New York',
  4: 'New York',
  5: 'New York',
  6: 'New York',
  7: 'New York',
  8: 'New York',
  9: 'Los Angeles',
  10: 'Los Angeles',
  11: 'Los Angeles',
  12: 'Los Angeles',
  13: 'Los Angeles',
  14: 'Los Angeles',
  15: 'Los Angeles'},
 'amount': {0: 12.5,
  1: 4.49,
  2: 11.7,
  3: 15.0,
  4: 7.0,
  5: 6.0,
  6: 15.0,
  7: 7.0,
  8: 7.0,
  9: 15.0,
  10: 15.99,
  11: 15.0,
  12: 7.0,
  13: 15.99,
  14: 15.0,
  15: 7.0},
 'category': {0: 'Book Stores',
  1: 'Book Stores',
  2: 'Book Stores',
  3: 'Cable TV',
  4: 'Cable TV',
  5: 'Cable TV',
  6: 'Cable TV',
  7: 'Cable TV',
  8: 'Cable TV',
  9: 'Book Stores',
  10: 'Book Stores',
  11: 'Book Stores',
  12: 'Cable TV',
  13: 'Cable TV',
  14: 'Cable TV',
  15: 'Cable TV'}}
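
A dict in this shape (column name mapped to `{row index: value}`) can be loaded as sketched below. To keep the example short it uses an illustrative four-row subset of the data; the name `data` is an assumption, while `new_df` matches the name used in the attempts above.

```python
import pandas as pd

# Illustrative subset: the first two rows of each town from the dict above.
data = {
    "customer_id": {0: "n1", 1: "n2", 9: "la1", 10: "la2"},
    "town": {0: "New York", 1: "New York", 9: "Los Angeles", 10: "Los Angeles"},
    "amount": {0: 12.5, 1: 4.49, 9: 15.0, 10: 15.99},
    "category": {0: "Book Stores", 1: "Book Stores", 9: "Book Stores", 10: "Book Stores"},
}

# With the default orient="columns", outer keys become columns
# and inner keys become the row index.
new_df = pd.DataFrame.from_dict(data)
print(new_df)
```

Passing the full dict instead of the subset should give the 16-row frame shown at the top of the question.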

Tags: python, pandas

Solution


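Try computing each aggregate yourself: sum the amounts per town and category, then divide by the number of customers in each town, i.e.: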

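# df is the transactions DataFrame from the question (called new_df there)
# Total spend per (town, category)
amounts = df.groupby(['town', 'category']).amount.sum()
# Rows per town; each customer appears once here, so this is the head count
ncustomers = df.groupby(['town']).size()
# Towns become columns, then each column is divided by that town's head count
o = amounts.unstack(level=0) / ncustomers
print(o)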
town         Los Angeles  New York
category                          
Book Stores     6.570000  3.187778
Cable TV        6.427143  6.333333
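
The same idea can be written as a self-contained sketch. This is one possible variant, not the answer's exact code: it rebuilds the question's 16 rows compactly, uses `pivot_table` for the totals, and counts distinct `customer_id` values with `nunique()` instead of `size()`. The `nunique()` choice is an assumption about the intent: it only matters if a customer could appear in more than one row, which is not the case in the sample data. Helper names such as `ny`, `la`, `totals` and `per_capita` are illustrative.

```python
import pandas as pd

# Rebuild the question's data compactly (same 16 rows as the table above).
ny = [("Book Stores", a) for a in (12.50, 4.49, 11.70)] + \
     [("Cable TV", a) for a in (15.00, 7.00, 6.00, 15.00, 7.00, 7.00)]
la = [("Book Stores", a) for a in (15.00, 15.99, 15.00)] + \
     [("Cable TV", a) for a in (7.00, 15.99, 15.00, 7.00)]
rows = (
    [(f"n{i}", "New York", amt, cat) for i, (cat, amt) in enumerate(ny, start=1)]
    + [(f"la{i}", "Los Angeles", amt, cat) for i, (cat, amt) in enumerate(la, start=1)]
)
df = pd.DataFrame(rows, columns=["customer_id", "town", "amount", "category"])

# Total spend per category (rows) and town (columns).
totals = df.pivot_table(index="category", columns="town",
                        values="amount", aggfunc="sum", fill_value=0)

# Distinct customers per town, regardless of what they bought.
customers_per_town = df.groupby("town")["customer_id"].nunique()

# Divide each town's column by that town's head count.
per_capita = totals.div(customers_per_town, axis="columns")
print(per_capita.round(2))
```

The denominator is computed from the whole town, so a category with few buyers is still averaged over everyone living there, which is what the question asks for.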
