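python - Python Pandas DataFrame: select entries based on conditions (date range and condition) in a second DataFrame
Problem description
I am trying to use Python pandas DataFrames to select entries from one DataFrame based on conditions stored in another DataFrame.
The first DataFrame gives the priority date range for each person: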
import pandas as pd
df_priority = pd.DataFrame({'Person': ['Alfred', 'Bob', 'Charles'], 'Start Date': ['2018-01-01', '2018-03-01', '2018-05-01'] , 'End Date': ['2018-02-01', '2018-04-01', '2018-06-01']})
df_priority.head()
   Start Date    End Date   Person
0  2018-01-01  2018-02-01   Alfred
1  2018-03-01  2018-04-01      Bob
2  2018-05-01  2018-06-01  Charles
The second DataFrame gives the sales for each person and each month:
df_sales = pd.DataFrame({'Person': ['Alfred', 'Alfred', 'Alfred','Bob','Bob','Bob','Bob','Bob','Bob','Charles','Charles','Charles','Charles','Charles','Charles'],'Date': ['2018-01-01', '2018-02-01', '2018-03-01', '2018-01-01', '2018-02-01', '2018-03-01','2018-04-01', '2018-05-01', '2018-06-01', '2018-01-01', '2018-02-01', '2018-03-01','2018-04-01', '2018-05-01', '2018-06-01'], 'Sales': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df_sales.head(15)
          Date   Person  Sales
0   2018-01-01   Alfred      1
1   2018-02-01   Alfred      2
2   2018-03-01   Alfred      3
3   2018-01-01      Bob      4
4   2018-02-01      Bob      5
5   2018-03-01      Bob      6
6   2018-04-01      Bob      7
7   2018-05-01      Bob      8
8   2018-06-01      Bob      9
9   2018-01-01  Charles     10
10  2018-02-01  Charles     11
11  2018-03-01  Charles     12
12  2018-04-01  Charles     13
13  2018-05-01  Charles     14
14  2018-06-01  Charles     15
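Both frames store their dates as plain strings. A minimal sketch (not part of the original question) that rebuilds the sample data and converts those columns to real datetimes with `pd.to_datetime`, so that any later range comparison is chronological instead of relying on the ISO string format happening to sort correctly. The helper list `months` is only a shorthand introduced here to rebuild the data:

```python
import pandas as pd

# sample data from the question, with dates stored as strings
df_priority = pd.DataFrame({'Person': ['Alfred', 'Bob', 'Charles'],
                            'Start Date': ['2018-01-01', '2018-03-01', '2018-05-01'],
                            'End Date': ['2018-02-01', '2018-04-01', '2018-06-01']})

# hypothetical shorthand: the six month-start dates used in df_sales
months = ['2018-01-01', '2018-02-01', '2018-03-01',
          '2018-04-01', '2018-05-01', '2018-06-01']
df_sales = pd.DataFrame({'Person': ['Alfred'] * 3 + ['Bob'] * 6 + ['Charles'] * 6,
                         'Date': months[:3] + months * 2,
                         'Sales': list(range(1, 16))})

# convert the string columns to datetime64 values
for col in ['Start Date', 'End Date']:
    df_priority[col] = pd.to_datetime(df_priority[col])
df_sales['Date'] = pd.to_datetime(df_sales['Date'])

print(df_priority.dtypes)
print(df_sales.dtypes)
```

After the conversion, comparisons such as `df_sales['Date'] >= df_priority.loc[0, 'Start Date']` compare actual dates rather than text.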
Now I want each person's sales figures that fall within his priority date range (both ends included), i.e. the result should be:
          Date   Person  Sales
0   2018-01-01   Alfred      1
1   2018-02-01   Alfred      2
5   2018-03-01      Bob      6
6   2018-04-01      Bob      7
13  2018-05-01  Charles     14
14  2018-06-01  Charles     15
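Any help would be appreciated.
Solution
You can apply a lambda row-wise (axis=1) so that it can read several columns at once and produce the desired result: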
# helper: returns the daily priority date range for the given person
# (despite its name, it returns dates, not sales)
def salesByNameAndDate(name):
    row = df_priority[df_priority['Person'] == name]
    start_date = row['Start Date'].values[0]
    end_date = row['End Date'].values[0]
    date_range = pd.date_range(start=start_date, end=end_date)
    return date_range

# keep the sales value if the date lies inside this person's date range, otherwise mark it "nothing"
df_sales['new_sales'] = df_sales.apply(lambda x: x['Sales'] if x['Date'] in salesByNameAndDate(x['Person']) else 'nothing', axis=1)

# then drop the "nothing" rows and the helper column "new_sales"
new_df = df_sales[df_sales['new_sales'] != 'nothing'].drop('new_sales', axis=1)[['Date', 'Person', 'Sales']]
print(new_df)
# output
          Date   Person  Sales
0   2018-01-01   Alfred      1
1   2018-02-01   Alfred      2
5   2018-03-01      Bob      6
6   2018-04-01      Bob      7
13  2018-05-01  Charles     14
14  2018-06-01  Charles     15
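The `apply` version above builds a daily `date_range` for every sales row, which gets slow on large frames. A vectorized alternative (a common pandas idiom, not the answerer's method) is to merge each person's window onto the sales rows and filter with `Series.between`, which is inclusive on both ends by default. A minimal sketch, assuming each person has exactly one row in `df_priority`; `merged`, `in_window` and `result` are illustrative names:

```python
import pandas as pd

df_priority = pd.DataFrame({'Person': ['Alfred', 'Bob', 'Charles'],
                            'Start Date': ['2018-01-01', '2018-03-01', '2018-05-01'],
                            'End Date': ['2018-02-01', '2018-04-01', '2018-06-01']})
months = ['2018-01-01', '2018-02-01', '2018-03-01',
          '2018-04-01', '2018-05-01', '2018-06-01']
df_sales = pd.DataFrame({'Person': ['Alfred'] * 3 + ['Bob'] * 6 + ['Charles'] * 6,
                         'Date': months[:3] + months * 2,
                         'Sales': list(range(1, 16))})

# attach each person's priority window to every one of their sales rows;
# reset_index() keeps the original row labels in a column named 'index'
merged = df_sales.reset_index().merge(df_priority, on='Person', how='left')

# compare as datetimes; between() includes both endpoints by default
date = pd.to_datetime(merged['Date'])
in_window = date.between(pd.to_datetime(merged['Start Date']),
                         pd.to_datetime(merged['End Date']))

# keep matching rows, restore the original labels, drop the window columns
result = merged.loc[in_window].set_index('index')[['Date', 'Person', 'Sales']]
result.index.name = None
print(result)
```

If a person could have several overlapping windows, the merge would yield more than one copy of some sales rows, and the result would then need deduplicating on its index.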