Filtering a pandas DataFrame by date and group

Problem description

I have the following DataFrame:

Date    group   File1   File2   Begin Date  End Date
4/28/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/29/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/30/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/25/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/26/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/27/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/28/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015

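It is actually a much larger DataFrame with many more groups; I have shortened it for display. I am trying to filter the DataFrame on the Date column like this: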

df = df.loc[df.groupby(['group','File1', 'File2']).df['Date'] >= df.groupby(['group', 'File1', 'File2'])['Begin Date']

The output should look like this:

Date    group   File1   File2   Begin Date  End Date
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015

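Bonus question: I would also like to filter by both the begin and end dates, i.e. keep the rows in each group that meet this criterion: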

(df['Date'] >= df['Begin Date']) & (df['Date'] <= df['End Date'])

Thanks in advance for any help or suggestions.

Tags: pandas, filter, pandas-groupby

Solution


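I don't think groupby is necessary here, because you are not aggregating anything per group (min, max, sum, count, etc.). Every row already carries its own group's Begin Date and End Date, so a row-wise comparison is enough.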
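As a rough sketch of the first part of the question (keeping rows on or after each row's Begin Date), something like the following should work. The sample rows are a small hypothetical subset of the data above, and the sketch assumes the date columns arrive as month/day/year strings, so they are parsed first; the name `on_or_after_begin` is made up for illustration.

```python
import pandas as pd

# Hypothetical subset of the question's data (File1/File2 omitted for brevity).
df = pd.DataFrame({
    "Date": ["4/30/2014", "5/1/2014", "2/3/2015",
             "8/28/2014", "8/29/2014", "8/17/2015"],
    "group": ["A", "A", "A", "B", "B", "B"],
    "Begin Date": ["5/1/2014"] * 3 + ["8/29/2014"] * 3,
    "End Date": ["2/2/2015"] * 3 + ["8/14/2015"] * 3,
})

# Assumption: the columns are strings. Parse them so that comparisons are
# chronological rather than lexicographic.
for col in ["Date", "Begin Date", "End Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Row-wise comparison: each row is checked against its own Begin Date,
# so no groupby is involved.
on_or_after_begin = df[df["Date"] >= df["Begin Date"]]
print(on_or_after_begin)
```

Because Begin Date is repeated on every row of a group, the boolean mask lines up with the DataFrame's index and the filter is effectively applied per group.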

For the bonus question, Series.between is what you are looking for:

df[df['Date'].between(df['Begin Date'], df['End Date'])]
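The one-liner above only compares dates correctly if the three columns hold datetimes rather than strings. Here is a minimal, self-contained sketch under that assumption; the sample rows are a hypothetical subset of the question's data, and `in_window` and `explicit` are illustrative names.

```python
import pandas as pd

# Hypothetical subset of the question's data (File1/File2 omitted for brevity).
df = pd.DataFrame({
    "Date": ["4/30/2014", "5/1/2014", "2/2/2015", "2/3/2015",
             "8/28/2014", "8/29/2014", "8/14/2015", "8/17/2015"],
    "group": ["A"] * 4 + ["B"] * 4,
    "Begin Date": ["5/1/2014"] * 4 + ["8/29/2014"] * 4,
    "End Date": ["2/2/2015"] * 4 + ["8/14/2015"] * 4,
})

# Assumption: the columns are month/day/year strings and need parsing.
for col in ["Date", "Begin Date", "End Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# between() includes both endpoints by default and compares element-wise
# when given Series as bounds.
in_window = df[df["Date"].between(df["Begin Date"], df["End Date"])]

# Equivalent explicit mask; the parentheses are required because `&`
# binds more tightly than `>=` and `<=`.
explicit = df[(df["Date"] >= df["Begin Date"]) & (df["Date"] <= df["End Date"])]

print(in_window)
```

Both forms should select the same rows; between is simply the more compact spelling when both bounds are inclusive.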
