python - Merge 2 dataframes on column A, spreading values evenly across the count of A_label
Problem description
I have 2 dataframes:
df1:
date | product | amount_daily
2020-01-2020 | "a" | 2
2020-01-2020 | "b" | 3
2020-01-2020 | "c" | 5
2020-02-2020 | "a" | 1
2020-02-2020 | "c" | 3
2020-02-2020 | "d" | 4
df2:
product | product_label | factor
"a" | 1 | 10
"a" | 2 | 20
"b" | 3 | 2
"b" | 1 | 4
"b" | 2 | 6
"b" | 4 | 8
"c" | 1 | 5
"c" | 2 | 100
"c" | 3 | 200
"d" | 1 | 18
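I want to merge the two dataframes on "product" and split each "amount_daily" value evenly across every product + product_label combination for that day.
The final result should be: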
date | product | product_label | factor | amount_daily
2020-01-2020 | "a" | 1 | 10 | 1 # a has 2 labels = 2/2
2020-01-2020 | "a" | 2 | 20 | 1
2020-01-2020 | "b" | 1 | 2 | 0.75 # b has 4 labels = 3/4
2020-01-2020 | "b" | 2 | 4 | 0.75
2020-01-2020 | "b" | 3 | 6 | 0.75
2020-01-2020 | "b" | 4 | 8 | 0.75
2020-01-2020 | "c" | 1 | 5 | 1.666 # c has 3 labels = 5/3
2020-01-2020 | "c" | 2 | 100 | 1.666
2020-01-2020 | "c" | 3 | 200 | 1.666
2020-02-2020 | "a" | 1 | 10 | 0.5 # a has 2 labels = 1/2
2020-02-2020 | "a" | 2 | 20 | 0.5
2020-02-2020 | "c" | 1 | 5 | 1 # c has 3 labels = 3/3
2020-02-2020 | "c" | 2 | 100 | 1
2020-02-2020 | "c" | 3 | 200 | 1
2020-02-2020 | "d" | 1 | 18 | 4 # d has 1 label = 4/1
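Solution
One option is to merge first, then use groupby().transform('size') to get the number of labels for each date and product, and divide by it: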
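# Inner merge: each df1 row is repeated once per matching label in df2
ret = df1.merge(df2, on='product')
# Divide by the number of rows (labels) in each (date, product) group
ret['amount_daily'] /= ret.groupby(['date','product'])['product'].transform('size')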
Output:
date product amount_daily product_label factor
0 2020-01-2020 "a" 1.000000 1 10
1 2020-01-2020 "a" 1.000000 2 20
2 2020-02-2020 "a" 0.500000 1 10
3 2020-02-2020 "a" 0.500000 2 20
4 2020-01-2020 "b" 0.750000 3 2
5 2020-01-2020 "b" 0.750000 1 4
6 2020-01-2020 "b" 0.750000 2 6
7 2020-01-2020 "b" 0.750000 4 8
8 2020-01-2020 "c" 1.666667 1 5
9 2020-01-2020 "c" 1.666667 2 100
10 2020-01-2020 "c" 1.666667 3 200
11 2020-02-2020 "c" 1.000000 1 5
12 2020-02-2020 "c" 1.000000 2 100
13 2020-02-2020 "c" 1.000000 3 200
14 2020-02-2020 "d" 4.000000 1 18
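For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same merge-then-divide approach. The frames are rebuilt from the sample data in the question (date strings kept exactly as posted). The names n_labels and totals, the explicit sort, and the column reordering are additions for illustration only and are not part of the original answer. Row order after merge can differ between pandas versions, so the sketch sorts explicitly before comparing anything.

```python
import pandas as pd

# Sample data copied from the question (dates kept as the original strings).
df1 = pd.DataFrame({
    "date": ["2020-01-2020"] * 3 + ["2020-02-2020"] * 3,
    "product": ["a", "b", "c", "a", "c", "d"],
    "amount_daily": [2, 3, 5, 1, 3, 4],
})
df2 = pd.DataFrame({
    "product": ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"],
    "product_label": [1, 2, 3, 1, 2, 4, 1, 2, 3, 1],
    "factor": [10, 20, 2, 4, 6, 8, 5, 100, 200, 18],
})

# Inner merge: each df1 row is repeated once per matching label in df2.
ret = df1.merge(df2, on="product")

# Number of rows (labels) per (date, product), broadcast back to every row.
n_labels = ret.groupby(["date", "product"])["product"].transform("size")
ret["amount_daily"] = ret["amount_daily"] / n_labels

# Optional: sort and reorder columns so the result is easy to compare by eye.
ret = ret.sort_values(["date", "product", "product_label"]).reset_index(drop=True)
ret = ret[["date", "product", "product_label", "factor", "amount_daily"]]

# Sanity check: the split amounts should add back up to the daily amounts.
totals = ret.groupby(["date", "product"])["amount_daily"].sum()

print(ret)
print(totals)
```

Note that the default inner merge silently drops any product in df1 that has no row in df2. If that can happen in real data, passing how="left" to merge and checking for missing product_label values is one way to notice it.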