python - Trying to get the average spend at a given type of store in a given town
Problem description
I have a dataframe of transactions:
    customer_id         town  amount     category
 0           n1     New York   12.50  Book Stores
 1           n2     New York    4.49  Book Stores
 2           n3     New York   11.70  Book Stores
 3           n4     New York   15.00     Cable TV
 4           n5     New York    7.00     Cable TV
 5           n6     New York    6.00     Cable TV
 6           n7     New York   15.00     Cable TV
 7           n8     New York    7.00     Cable TV
 8           n9     New York    7.00     Cable TV
 9          la1  Los Angeles   15.00  Book Stores
10          la2  Los Angeles   15.99  Book Stores
11          la3  Los Angeles   15.00  Book Stores
12          la4  Los Angeles    7.00     Cable TV
13          la5  Los Angeles   15.99     Cable TV
14          la6  Los Angeles   15.00     Cable TV
15          la7  Los Angeles    7.00     Cable TV
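There are two towns: New York and Los Angeles.
There are two spending categories: Book Stores and Cable TV.
As you can see, there are 9 people in New York.
New York has only three Book Stores transactions, so I expect the average Book Stores spend to be: ($12.50 + $4.49 + $11.70) / 9 people ≈ $3.19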
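The arithmetic can be checked by hand. This is only a sketch of the expected result, not a pandas solution; the variable names are made up for illustration, and the amounts are copied from the table above.

```python
# Amounts of the three New York "Book Stores" transactions from the table.
ny_book_amounts = [12.50, 4.49, 11.70]
ny_customers = 9  # customers n1 .. n9

# Per-capita spend: divide by everyone in town, buyers or not.
per_capita = sum(ny_book_amounts) / ny_customers

# Per-buyer spend: divide only by the rows in the category.
per_buyer = sum(ny_book_amounts) / len(ny_book_amounts)

print(round(per_capita, 2), round(per_buyer, 2))
```

The first number is the one I want. The second is what you get when the denominator is only the book buyers.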
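I'm trying to turn this into a pandas function. I've tried many different groupbys and pivots, but each one treats the data as if only 3 people lived in New York.
Here are some of the things I tried: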
print('\nAttempt 1...\n')
print(new_df.groupby(['town','category'])['amount'].mean())
print('\nAttempt 2...\n')
print(new_df.groupby(['category','town'])['amount'].mean())
print('\nAttempt 2...\n')
print(pd.pivot_table(new_df,index=["category"],values=["amount"],columns=["town"],aggfunc=[np.mean],fill_value=0))
Attempt 1...
town category
Los Angeles Book Stores 15.33
Cable TV 11.25
New York Book Stores 9.56
Cable TV 9.50
Name: amount, dtype: float64
Attempt 2...
category town
Book Stores Los Angeles 15.33
New York 9.56
Cable TV Los Angeles 11.25
New York 9.50
Name: amount, dtype: float64
Attempt 2...
mean
amount
town Los Angeles New York
category
Book Stores 15.33 9.56
Cable TV 11.25 9.50
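Every time, I get $9.56 as the average for Book Stores in New York.
How can I get the average over the whole population of New York, not just the three people who bought books?
If you want to rebuild the data with pd.DataFrame.from_dict():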
{'customer_id': {0: 'n1',
1: 'n2',
2: 'n3',
3: 'n4',
4: 'n5',
5: 'n6',
6: 'n7',
7: 'n8',
8: 'n9',
9: 'la1',
10: 'la2',
11: 'la3',
12: 'la4',
13: 'la5',
14: 'la6',
15: 'la7'},
'town': {0: 'New York',
1: 'New York',
2: 'New York',
3: 'New York',
4: 'New York',
5: 'New York',
6: 'New York',
7: 'New York',
8: 'New York',
9: 'Los Angeles',
10: 'Los Angeles',
11: 'Los Angeles',
12: 'Los Angeles',
13: 'Los Angeles',
14: 'Los Angeles',
15: 'Los Angeles'},
'amount': {0: 12.5,
1: 4.49,
2: 11.7,
3: 15.0,
4: 7.0,
5: 6.0,
6: 15.0,
7: 7.0,
8: 7.0,
9: 15.0,
10: 15.99,
11: 15.0,
12: 7.0,
13: 15.99,
14: 15.0,
15: 7.0},
'category': {0: 'Book Stores',
1: 'Book Stores',
2: 'Book Stores',
3: 'Cable TV',
4: 'Cable TV',
5: 'Cable TV',
6: 'Cable TV',
7: 'Cable TV',
8: 'Cable TV',
9: 'Book Stores',
10: 'Book Stores',
11: 'Book Stores',
12: 'Cable TV',
13: 'Cable TV',
14: 'Cable TV',
15: 'Cable TV'}}
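Solution
Try computing each aggregate yourself: sum the amounts per town and category, then divide by the number of customers in each town, i.e.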
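# Total spend per (town, category)
amounts = df.groupby(['town', 'category']).amount.sum()
# Number of rows per town (each customer has exactly one row in this data)
ncustomers = df.groupby(['town']).size()
# Towns become columns; each column is divided by that town's headcount
o = amounts.unstack(level=0) / ncustomers
print(o)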
town Los Angeles New York
category
Book Stores 6.570000 3.187778
Cable TV 6.427143 6.333333
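A self-contained variant of the same idea, as a sketch. It assumes the denominator should be the number of distinct customer_id values per town rather than the row count. The two agree for this data, but they would differ if a customer had more than one transaction. The names spend, customers and per_capita are illustrative only.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "customer_id": [f"n{i}" for i in range(1, 10)] + [f"la{i}" for i in range(1, 8)],
    "town": ["New York"] * 9 + ["Los Angeles"] * 7,
    "amount": [12.50, 4.49, 11.70, 15.00, 7.00, 6.00, 15.00, 7.00, 7.00,
               15.00, 15.99, 15.00, 7.00, 15.99, 15.00, 7.00],
    "category": ["Book Stores"] * 3 + ["Cable TV"] * 6
                + ["Book Stores"] * 3 + ["Cable TV"] * 4,
})

# Total spend per category (rows) and town (columns).
spend = df.pivot_table(index="category", columns="town",
                       values="amount", aggfunc="sum", fill_value=0)

# Distinct customers per town (assumption: this is the "population").
customers = df.groupby("town")["customer_id"].nunique()

# Divide each town's column by that town's customer count.
per_capita = spend.div(customers, axis="columns")
print(per_capita.round(2))
```

Using fill_value=0 means a category with no transactions in a town shows up as 0 per capita rather than NaN.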