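python-3.x - Join two dataframes based on the closest combination that sums to a target value
Problem description
I am trying to join the two dataframes below based on the closest combination of rows from the df2 column Sales that sums to the target value in the df1 column Total Sales. The Name and Date columns must be the same in both dataframes when joining (as shown in the expected output).
For example: row 0 of df1 should be matched only with rows 0 and 1 of df2, because the Name and Date columns are the same, i.e. Name: John and Date: 2021-10-01.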
df1:
import pandas as pd

df1 = pd.DataFrame({"Name":{"0":"John","1":"John","2":"Jack","3":"Nancy","4":"Ahmed"},
"Date":{"0":"2021-10-01","1":"2021-11-01","2":"2021-10-10","3":"2021-10-12","4":"2021-10-30"},
"Total Sales":{"0":15500,"1":5500,"2":17600,"3":20700,"4":12000}})
    Name        Date  Total Sales
0   John  2021-10-01        15500
1   John  2021-11-01         5500
2   Jack  2021-10-10        17600
3  Nancy  2021-10-12        20700
4  Ahmed  2021-10-30        12000
df2:
df2 = pd.DataFrame({"ID":{"0":"JO1","1":"JO2","2":"JO3","3":"JO4","4":"JA1","5":"JA2","6":"NA1",
"7":"NA2","8":"NA3","9":"NA4","10":"AH1","11":"AH2","12":"AH3","13":"AH3"},
"Name":{"0":"John","1":"John","2":"John","3":"John","4":"Jack","5":"Jack","6":"Nancy","7":"Nancy",
"8":"Nancy","9":"Nancy","10":"Ahmed","11":"Ahmed","12":"Ahmed","13":"Ahmed"},
"Date":{"0":"2021-10-01","1":"2021-10-01","2":"2021-11-01","3":"2021-11-01","4":"2021-10-10","5":"2021-10-10","6":"2021-10-12","7":"2021-10-12",
"8":"2021-10-12","9":"2021-10-12","10":"2021-10-30","11":"2021-10-30","12":"2021-10-30","13":"2021-10-29"},
"Sales":{"0":10000,"1":5000,"2":1000,"3":5500,"4":10000,"5":7000,"6":20000,
"7":100,"8":500,"9":100,"10":5000,"11":7000,"12":10000,"13":12000}})
     ID   Name        Date  Sales
0   JO1   John  2021-10-01  10000
1   JO2   John  2021-10-01   5000
2   JO3   John  2021-11-01   1000
3   JO4   John  2021-11-01   5500
4   JA1   Jack  2021-10-10  10000
5   JA2   Jack  2021-10-10   7000
6   NA1  Nancy  2021-10-12  20000
7   NA2  Nancy  2021-10-12    100
8   NA3  Nancy  2021-10-12    500
9   NA4  Nancy  2021-10-12    100
10  AH1  Ahmed  2021-10-30   5000
11  AH2  Ahmed  2021-10-30   7000
12  AH3  Ahmed  2021-10-30  10000
13  AH3  Ahmed  2021-10-29  12000
Expected output:
    Name        Date  Total Sales            Comb IDs  Comb Total
0   John  2021-10-01        15500            JO1, JO2     15000.0
1   John  2021-11-01         5500                 JO4      5500.0
2   Jack  2021-10-10        17600            JA1, JA2     17000.0
3  Nancy  2021-10-12        20700  NA1, NA2, NA3, NA4     20700.0
4  Ahmed  2021-10-30        12000            AH1, AH2     12000.0
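What I have tried below works for only one row at a time, and I am not sure how to apply it to pandas dataframes to get the expected output.
In the script below, the variable numbers represents the Sales column in df2, and the variable target represents the Total Sales column in df1.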
import itertools
import math

numbers = [1000, 5000, 3000]
target = 6000

best_combination = (None,)
best_result = math.inf
best_sum = 0

# Try every combination size, from the empty combination up to all numbers.
for L in range(0, len(numbers) + 1):
    for combination in itertools.combinations(numbers, L):
        total = sum(combination)  # named total to avoid shadowing the built-in sum()
        result = target - total
        # Keep the combination whose sum is closest to the target.
        if abs(result) < abs(best_result):
            best_result = result
            best_combination = combination
            best_sum = total

print("\nbest sum{} = {}".format(best_combination, best_sum))
[Out] best sum(1000, 5000) = 6000
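Solution
Take the code you wrote to find the best sum and turn it into a function (let's call it opt) with parameters for the target and a dataframe (which will be a subset of df2). It needs to return the list of IDs corresponding to the best combination.
Write another function (let's call it calc) that takes 3 arguments: name, date and target. This function filters df2 by name and date, passes the result along with the target to opt, and returns what opt returns.
Finally, iterate over the rows of df1 and call calc with the values from each row (or use pandas.DataFrame.apply).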
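The steps above could be sketched roughly as follows. This is one possible reading of the answer, not the answerer's own code: the names opt and calc come from the answer, while join_closest, the extra df2 argument to calc, and the choice to return the combination total alongside the IDs are assumptions made here so that both output columns can be filled.

```python
import itertools

import pandas as pd


def opt(target, subset):
    """Brute-force search over all row combinations of `subset`.

    Returns (ids, total) for the combination whose Sales sum is closest
    to `target`. Ties keep the first combination found (smallest size first).
    """
    rows = list(zip(subset["ID"], subset["Sales"]))
    best_ids, best_total, best_diff = [], 0, float("inf")
    # Size 0 (the empty combination) is included, as in the question's script.
    for size in range(len(rows) + 1):
        for combo in itertools.combinations(rows, size):
            total = sum(sales for _, sales in combo)
            diff = abs(target - total)
            if diff < best_diff:
                best_ids = [row_id for row_id, _ in combo]
                best_total, best_diff = total, diff
    return best_ids, best_total


def calc(df2, name, date, target):
    """Filter df2 to the rows matching name and date, then delegate to opt."""
    subset = df2[(df2["Name"] == name) & (df2["Date"] == date)]
    return opt(target, subset)


def join_closest(df1, df2):
    """Hypothetical helper: add 'Comb IDs' and 'Comb Total' columns to a copy of df1."""
    results = [
        calc(df2, name, date, target)
        for name, date, target in zip(df1["Name"], df1["Date"], df1["Total Sales"])
    ]
    out = df1.copy()
    out["Comb IDs"] = [", ".join(ids) for ids, _ in results]
    out["Comb Total"] = [total for _, total in results]
    return out
```

Calling join_closest(df1, df2) on the dataframes from the question should give the same IDs and totals as the expected output, although the totals stay integers here rather than floats. The search tries every subset of the matching rows, so its cost doubles with each extra row per Name and Date group; that is fine for a handful of rows per group but would need a different approach for large groups.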