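pandas - Calculate the difference between the sum of hours for a group of timestamps and a fixed number
Problem description
I have the following dataframe. What I want to achieve is that, for each "Date_worked", the hours in the "Time_spent" column sum to 7. For example, on 6/10/2019 the hours already sum to 7, so nothing needs to be adjusted. On 6/12/2019 the hours sum to 4.25, so I need to insert a row with "Tab_description" set to "Difference" that shows the difference of 2.75 under "Time_spent". 6/13/2019 and 6/14/2019 already reach 7, so nothing needs to be done there. For 6/19/2019 I need to do the same as for 6/12/2019: insert a row with 6 so that the total becomes 7. Thanks for your help.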
    Date_worked  Tab_description                        Time_spent
0   6/10/2019    Perform planning procedures            7.0
1   6/11/2019    Perform planning procedures            7.0
2   6/12/2019    Time off (away from the office)        2.25
3   6/12/2019    Staff meeting                          1.0
4   6/12/2019    Accounting & Risk Management Luncheon  1.0
5   6/13/2019    Perform planning procedures            7.0
6   6/14/2019    Time off (away from the office)        2.0
7   6/14/2019    Review policies and procedures         5.0
8   6/17/2019    Time off (away from the office)        7.0
9   6/18/2019    Perform planning procedures            7.0
10  6/19/2019    Staff meeting                          1.0
11  6/20/2019    Time off (away from the office)        2.0
12  6/21/2019    Time off (away from the office)        1.0
13  6/24/2019    Staff meeting (FY 20 planning)         7.0
14  6/25/2019    FCR Kick-off meeting                   1.0
15  6/26/2019    Time off (away from the office)        1.5
16  6/26/2019    Staff meeting                          1.0
17  6/28/2019    Time off (away from the office)        1.0
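Solution
There are many ways to do this; I'll show you one using groupby and concat.
First, let's work out the total time and the difference. Here is the starting frame (in this copy the first 6/12/2019 entry is 0.25, so that date totals 2.25 rather than the 4.25 mentioned in the question):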
print(df)
    Date_worked  Tab_description                        Time_spent
0   6/10/2019    Perform planning procedures            7.00
1   6/11/2019    Perform planning procedures            7.00
2   6/12/2019    Time off (away from the office)        0.25
3   6/12/2019    Staff meeting                          1.00
4   6/12/2019    Accounting & Risk Management Luncheon  1.00
5   6/13/2019    Perform planning procedures            7.00
6   6/14/2019    Time off (away from the office)        2.00
7   6/14/2019    Review policies and procedures         5.00
8   6/17/2019    Time off (away from the office)        7.00
9   6/18/2019    Perform planning procedures            7.00
10  6/19/2019    Staff meeting                          1.00
11  6/20/2019    Time off (away from the office)        2.00
12  6/21/2019    Time off (away from the office)        1.00
13  6/24/2019    Staff meeting (FY                      7.00
14  6/25/2019    FCR Kick-off meeting                   1.00
15  6/26/2019    Time off (away from the office)        1.50
16  6/26/2019    Staff meeting                          1.00
17  6/28/2019    Time off (away from the office)        1.00
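To follow along, the frame printed above can be rebuilt by hand. This is a minimal sketch and not part of the original answer: the values are copied from the printout (including 0.25 for the first 6/12/2019 entry), and the full label for row 13 is taken from the question because the printout cuts it short.

```python
import pandas as pd

# Rows copied from the printout above; (date, description, hours).
rows = [
    ("6/10/2019", "Perform planning procedures", 7.0),
    ("6/11/2019", "Perform planning procedures", 7.0),
    ("6/12/2019", "Time off (away from the office)", 0.25),
    ("6/12/2019", "Staff meeting", 1.0),
    ("6/12/2019", "Accounting & Risk Management Luncheon", 1.0),
    ("6/13/2019", "Perform planning procedures", 7.0),
    ("6/14/2019", "Time off (away from the office)", 2.0),
    ("6/14/2019", "Review policies and procedures", 5.0),
    ("6/17/2019", "Time off (away from the office)", 7.0),
    ("6/18/2019", "Perform planning procedures", 7.0),
    ("6/19/2019", "Staff meeting", 1.0),
    ("6/20/2019", "Time off (away from the office)", 2.0),
    ("6/21/2019", "Time off (away from the office)", 1.0),
    # Label as written in the question; the printout truncates it.
    ("6/24/2019", "Staff meeting (FY 20 planning)", 7.0),
    ("6/25/2019", "FCR Kick-off meeting", 1.0),
    ("6/26/2019", "Time off (away from the office)", 1.5),
    ("6/26/2019", "Staff meeting", 1.0),
    ("6/28/2019", "Time off (away from the office)", 1.0),
]

df = pd.DataFrame(rows, columns=["Date_worked", "Tab_description", "Time_spent"])
print(df)
```

Date_worked is kept as plain strings here, matching how the dates appear in the printout.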
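We start with a simple groupby sum and a difference, which we assign to a new variable called df2. The variance is the day's total minus 7, so a negative value means the day is short of 7 hours.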
df2 = df.groupby('Date_worked')['Time_spent'].sum().reset_index()
df2['variance'] = df2['Time_spent'] - 7.00
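To see what df2 holds at this point, here is a small sketch on a three-date subset of the data. The subset is chosen only for illustration; the two statements that build df2 are the same as above.

```python
import pandas as pd

# Illustrative subset: one full day, one day with three entries, one short day.
df = pd.DataFrame(
    {
        "Date_worked": ["6/10/2019", "6/12/2019", "6/12/2019", "6/12/2019", "6/19/2019"],
        "Tab_description": [
            "Perform planning procedures",
            "Time off (away from the office)",
            "Staff meeting",
            "Accounting & Risk Management Luncheon",
            "Staff meeting",
        ],
        "Time_spent": [7.0, 0.25, 1.0, 1.0, 1.0],
    }
)

# One row per date with the total hours, then the gap to the 7-hour target.
df2 = df.groupby("Date_worked")["Time_spent"].sum().reset_index()
df2["variance"] = df2["Time_spent"] - 7.00
print(df2)
```

With this subset the variance comes out as 0 for 6/10/2019, -4.75 for 6/12/2019 and -6 for 6/19/2019. Note that groupby sorts the date strings as text, which happens to match calendar order for this data but would not for a mix such as 6/9/2019 and 6/10/2019.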
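Now we create the Tab_description column and set it to 'Difference' wherever the variance is non-zero; rows with no variance are left as NaN.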
df2.loc[df2['variance'] != 0, 'Tab_description'] = 'Difference'
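One caveat with `!= 0`: if the hours are floats that carry rounding noise, a day that should total exactly 7 can end up with a tiny non-zero variance. The sketch below is a suggested alternative, not something the original answer does; it swaps the exact comparison for `numpy.isclose`, and the 1e-12 offset is an artificial stand-in for such noise.

```python
import numpy as np
import pandas as pd

# Per-date totals; the third value simulates a total of 7 with float noise.
df2 = pd.DataFrame(
    {
        "Date_worked": ["6/10/2019", "6/12/2019", "6/14/2019"],
        "Time_spent": [7.0, 2.25, 7.0 + 1e-12],
    }
)
df2["variance"] = df2["Time_spent"] - 7.00

# Exact comparison flags the noisy row as well.
exact_mask = df2["variance"] != 0

# Tolerance-based comparison (default atol=1e-8) treats it as zero.
needs_row = ~np.isclose(df2["variance"], 0.0)
df2.loc[needs_row, "Tab_description"] = "Difference"
print(df2)
```

For hours entered in quarter-hour steps, as in the sample data, the sums are exact and the plain `!= 0` test behaves the same way.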
Then we drop the NaN rows, drop the 'Time_spent' column, rename the 'variance' column to 'Time_spent', and concat the result onto df.
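import pandas as pd

# concat returns a new frame, so assign it back to df before printing
df = pd.concat(
    [
        df,
        # keep only the dates flagged 'Difference' and reuse variance as Time_spent
        df2.dropna()
        .drop("Time_spent", axis=1)
        .rename(columns={"variance": "Time_spent"}),
    ],
    sort=False,
)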
print(df)
    Date_worked  Tab_description                        Time_spent
0   6/10/2019    Perform planning procedures            7.00
1   6/11/2019    Perform planning procedures            7.00
2   6/12/2019    Time off (away from the office)        0.25
3   6/12/2019    Staff meeting                          1.00
4   6/12/2019    Accounting & Risk Management Luncheon  1.00
5   6/13/2019    Perform planning procedures            7.00
6   6/14/2019    Time off (away from the office)        2.00
7   6/14/2019    Review policies and procedures         5.00
8   6/17/2019    Time off (away from the office)        7.00
9   6/18/2019    Perform planning procedures            7.00
10  6/19/2019    Staff meeting                          1.00
11  6/20/2019    Time off (away from the office)        2.00
12  6/21/2019    Time off (away from the office)        1.00
13  6/24/2019    Staff meeting (FY                      7.00
14  6/25/2019    FCR Kick-off meeting                   1.00
15  6/26/2019    Time off (away from the office)        1.50
16  6/26/2019    Staff meeting                          1.00
17  6/28/2019    Time off (away from the office)        1.00
2   6/12/2019    Difference                             -4.75
7   6/19/2019    Difference                             -6.00
8   6/20/2019    Difference                             -5.00
9   6/21/2019    Difference                             -6.00
11  6/25/2019    Difference                             -6.00
12  6/26/2019    Difference                             -4.50
13  6/28/2019    Difference                             -6.00
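The question asks for a positive difference (2.75 for 6/12/2019 with the question's data), ideally sitting next to that day's rows, whereas the output above has negative values appended at the end. One possible variant is sketched below under some assumptions: the difference is computed as 7 minus the daily total, the dates are in month/day/year form, and `add_difference_rows` and `TARGET_HOURS` are hypothetical names introduced here for illustration.

```python
import pandas as pd

TARGET_HOURS = 7.0  # daily target taken from the question


def add_difference_rows(df, target=TARGET_HOURS):
    """Append a 'Difference' row for every date whose hours do not sum to target."""
    totals = df.groupby("Date_worked", sort=False)["Time_spent"].sum()
    # Positive when the day is short of the target, negative when it is over.
    shortfall = (target - totals).round(2)
    shortfall = shortfall[shortfall != 0]

    extra = pd.DataFrame(
        {
            "Date_worked": shortfall.index,
            "Tab_description": "Difference",
            "Time_spent": shortfall.to_numpy(),
        }
    )
    combined = pd.concat([df, extra], ignore_index=True)

    # Stable sort on the parsed dates keeps each Difference row
    # directly after the original entries for the same day.
    return combined.sort_values(
        "Date_worked",
        key=lambda s: pd.to_datetime(s, format="%m/%d/%Y"),
        kind="mergesort",
        ignore_index=True,
    )


# Usage on a subset of the question's data (6/12/2019 totals 4.25 here).
df = pd.DataFrame(
    {
        "Date_worked": ["6/10/2019", "6/12/2019", "6/12/2019", "6/12/2019", "6/19/2019"],
        "Tab_description": [
            "Perform planning procedures",
            "Time off (away from the office)",
            "Staff meeting",
            "Accounting & Risk Management Luncheon",
            "Staff meeting",
        ],
        "Time_spent": [7.0, 2.25, 1.0, 1.0, 1.0],
    }
)
result = add_difference_rows(df)
print(result)
```

In this subset the inserted rows carry 2.75 for 6/12/2019 and 6.0 for 6/19/2019, so every date sums to 7. A day that exceeds 7 hours would get a negative Difference row; whether that is wanted depends on the use case.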