Merge 2 dataframes on column A, spreading the values evenly over the number of A_label rows

Question

I have 2 dataframes:

df1:

date         | product | amount_daily
2020-01-2020 | "a"     | 2
2020-01-2020 | "b"     | 3
2020-01-2020 | "c"     | 5
2020-02-2020 | "a"     | 1
2020-02-2020 | "c"     | 3
2020-02-2020 | "d"     | 4

df2:

product | product_label | factor
"a"     | 1             | 10
"a"     | 2             | 20
"b"     | 3             | 2
"b"     | 1             | 4
"b"     | 2             | 6
"b"     | 4             | 8
"c"     | 1             | 5
"c"     | 2             | 100
"c"     | 3             | 200
"d"     | 1             | 18

I want to merge the two dataframes on "product" and, for each day, spread the value of "amount_daily" evenly across every product + product_label combination.

The final result should be:

date         | product | product_label | factor | amount_daily
2020-01-2020 | "a"     | 1             | 10     | 1 # a has 2 labels = 2/2
2020-01-2020 | "a"     | 2             | 20     | 1
2020-01-2020 | "b"     | 1             | 4      | 0.75 # b has 4 labels = 3/4
2020-01-2020 | "b"     | 2             | 6      | 0.75
2020-01-2020 | "b"     | 3             | 2      | 0.75
2020-01-2020 | "b"     | 4             | 8      | 0.75
2020-01-2020 | "c"     | 1             | 5      | 1.666 # c has 3 labels = 5/3
2020-01-2020 | "c"     | 2             | 100    | 1.666
2020-01-2020 | "c"     | 3             | 200    | 1.666

2020-02-2020 | "a"     | 1             | 10     | 0.5 # a has 2 labels = 1/2
2020-02-2020 | "a"     | 2             | 20     | 0.5
2020-02-2020 | "c"     | 1             | 5      | 1 # c has 3 labels = 3/3
2020-02-2020 | "c"     | 2             | 100    | 1
2020-02-2020 | "c"     | 3             | 200    | 1
2020-02-2020 | "d"     | 1             | 18     | 4 # d has 1 label = 4/1

Tags: python, pandas, dataframe, merge

Answer


One option is to merge first, then use groupby(...).transform('size') to get the total number of labels for each date and product:

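# merge on product: one row per (date, product, product_label)
ret = df1.merge(df2, on='product')

# divide each day's amount by the number of labels that product has on that date
ret['amount_daily'] /= ret.groupby(['date','product'])['product'].transform('size')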
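To run the answer end to end, here is a self-contained sketch. The DataFrame constructors are an assumption: they simply re-type the question's tables, with the product names as plain strings and the dates kept as the literal strings shown. transform('size') returns a Series aligned to ret's index, so every row is divided by the size of its own (date, product) group. The division is written as an explicit assignment instead of /=.

```python
import pandas as pd

# Re-typed copies of the question's tables (assumed; dates kept verbatim as strings)
df1 = pd.DataFrame({
    "date": ["2020-01-2020"] * 3 + ["2020-02-2020"] * 3,
    "product": ["a", "b", "c", "a", "c", "d"],
    "amount_daily": [2, 3, 5, 1, 3, 4],
})
df2 = pd.DataFrame({
    "product": ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"],
    "product_label": [1, 2, 3, 1, 2, 4, 1, 2, 3, 1],
    "factor": [10, 20, 2, 4, 6, 8, 5, 100, 200, 18],
})

# Inner merge on product: one row per (date, product, product_label)
ret = df1.merge(df2, on="product")

# Group size = number of labels of that product on that date, broadcast back to each row
group_size = ret.groupby(["date", "product"])["product"].transform("size")
ret["amount_daily"] = ret["amount_daily"] / group_size

print(ret.sort_values(["date", "product", "product_label"]).to_string(index=False))
```

Sorting before printing only makes the result easier to compare with the expected table. The row order that merge returns can differ between pandas versions.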

Output:

            date product  amount_daily  product_label  factor
0   2020-01-2020     "a"      1.000000              1      10
1   2020-01-2020     "a"      1.000000              2      20
2   2020-02-2020     "a"      0.500000              1      10
3   2020-02-2020     "a"      0.500000              2      20
4   2020-01-2020     "b"      0.750000              3       2
5   2020-01-2020     "b"      0.750000              1       4
6   2020-01-2020     "b"      0.750000              2       6
7   2020-01-2020     "b"      0.750000              4       8
8   2020-01-2020     "c"      1.666667              1       5
9   2020-01-2020     "c"      1.666667              2     100
10  2020-01-2020     "c"      1.666667              3     200
11  2020-02-2020     "c"      1.000000              1       5
12  2020-02-2020     "c"      1.000000              2     100
13  2020-02-2020     "c"      1.000000              3     200
14  2020-02-2020     "d"      4.000000              1      18
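A different technique, not the one used in the answer above: count each product's labels in df2 once with value_counts(), scale the daily amounts in df1 by that count, and only then merge. This is a sketch under the same assumption that the question's tables are re-typed as DataFrames. It also checks that spreading leaves each day's total unchanged.

```python
import pandas as pd

# Re-typed copies of the question's tables (assumed; dates kept verbatim as strings)
df1 = pd.DataFrame({
    "date": ["2020-01-2020"] * 3 + ["2020-02-2020"] * 3,
    "product": ["a", "b", "c", "a", "c", "d"],
    "amount_daily": [2, 3, 5, 1, 3, 4],
})
df2 = pd.DataFrame({
    "product": ["a", "a", "b", "b", "b", "b", "c", "c", "c", "d"],
    "product_label": [1, 2, 3, 1, 2, 4, 1, 2, 3, 1],
    "factor": [10, 20, 2, 4, 6, 8, 5, 100, 200, 18],
})

# Number of labels per product, counted once in df2 (index = product)
labels_per_product = df2["product"].value_counts()

# Scale the daily amounts before merging, so no groupby is needed afterwards
scaled = df1.assign(
    amount_daily=df1["amount_daily"] / df1["product"].map(labels_per_product)
)
result = scaled.merge(df2, on="product")

# Spreading should not change the total amount per day
totals = result.groupby("date")["amount_daily"].sum()
original = df1.groupby("date")["amount_daily"].sum()
assert ((totals - original).abs() < 1e-9).all()
```

The two versions agree on the question's data. They diverge only if df1 could hold the same product twice on one date: the answer's group size would then count both rows, while this version always divides by the label count from df2.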
