sql - 计算分工工时?
问题描述
我需要计算一些报告的拆分工时,并且一直在努力解决这个问题。我们的 Oracle 数据库目前使用包中的存储过程来执行此操作,但我正在为这个项目在 Snowflake/dbt 的范围内工作,虽然我可以将 Oracle 中的视图中的数据带到 Snowflake,但我想要在重新计算分段时间之前对时间进行一些转换,因为我们经常有人忘记打卡,我想在他们轮班结束时纠正这些。所以我必须在调整错误的时钟后重新计算分段时间。
我的数据如下所示:
emp_id | task_id | time_in | 超时 |
---|---|---|---|
43 | 159221 | 2021-09-30 09:00:00 | 2021-09-30 14:30:00 |
43 | 188239 | 2021-09-30 09:16:03 | 2021-09-30 10:44:51 |
43 | 177741 | 2021-09-30 13:12:49 | 2021-09-30 14:14:32 |
在这里停下来总结一下时间得到:
emp_id | task_id | 小时 |
---|---|---|
43 | 159221 | 5.5 |
43 | 188239 | 1.47 |
43 | 177741 | 1.03 |
这是错误的,因为花费的一些时间与其他任务共享,应该分开。
我需要做的是,如果我们计算第一项任务的总小时数,这将是公式:
.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 小时在 task_id 159221 上工作。
有什么方法可以在 SQL 中处理这个问题,还是我必须学习 JavaScript 并编写存储过程?
额外细节:
- 可以重叠的任务数是可变的,它不仅最多为 2。
- 员工可以在一天中多次进入同一任务
- 多个不同的员工可能同时在同一任务中打卡。
解决方案
您可以尝试以下方法。
在汇总结果以从实际任务持续时间中减去总重叠持续时间之前,使用 LEFT JOIN 确定重叠持续时间。
SELECT
emp_id,
task_id,
MAX(task_duration) - SUM(overlap_duration/2) as hours
FROM (
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
)
) t3
GROUP BY
emp_id, task_id;
emp_id | task_id | 小时 |
---|---|---|
43 | 159221 | 4.245694444444444 |
43 | 177741 | 1.0286111111111111 |
43 | 188239 | 1.48 |
我还包括了下面子查询的输出供您阅读
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
);
emp_id | task_id | time_in | 超时 | 任务持续时间 | 重叠时间 | 重叠任务 |
---|---|---|---|---|---|---|
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.0286111111111111 | 177741 |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:44:51.000Z | 1.48 | 0 | |
43 | 177741 | 2021-09-30T13:12:49.000Z | 2021-09-30T14:14:32.000Z | 1.0286111111111111 | 0 |
编辑 1
回应问题更新
额外细节:
- 可以重叠的任务数是可变的,它不仅最多为 2。
- 员工可以在一天中多次进入同一任务
我添加了更多示例数据并提供了更新的小提琴。我无法提供雪花的演示小提琴,但我们可以将其更改EXTRACT
为DATEDIFF
如前所示。
CREATE TABLE task_times (
emp_id INTEGER,
task_id INTEGER,
time_in TIMESTAMP,
time_out TIMESTAMP
);
INSERT INTO task_times
(emp_id, task_id, time_in, time_out)
VALUES
-- add repetition of the same tasks with overlaps
('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
--- add overlap of different tasks
('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
-- add multiple overalapping tasks with a difference of 10 minutes
('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
-- add multiple log in and log out times with no overlaps
('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
;
确定重叠的子查询 (用于调试目的)
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
);
emp_id | task_id | time_in | 超时 | 任务持续时间 | 重叠时间 | 重叠任务 |
---|---|---|---|---|---|---|
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:00:00.000Z | 2021-09-30T14:30:00.000Z | 5.5 | 1.0286111111111111 | 177741 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:27:00.000Z | 5.4 | 1.0286111111111111 | 177741 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.48 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.4633333333333334 | 188239 |
43 | 159221 | 2021-09-30T09:03:00.000Z | 2021-09-30T14:38:00.000Z | 5.583333333333333 | 1.0286111111111111 | 177741 |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:44:51.000Z | 1.48 | 0 | |
43 | 188239 | 2021-09-30T09:16:03.000Z | 2021-09-30T10:43:51.000Z | 1.4633333333333334 | 0 | |
43 | 177741 | 2021-09-30T13:12:49.000Z | 2021-09-30T14:14:32.000Z | 1.0286111111111111 | 0 | |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.5 | 333334 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.3333333333333335 | 333335 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3.1666666666666665 | 333336 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 3 | 333337 |
43 | 333333 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:40:00.000Z | 3.6666666666666665 | 0.5 | 333337 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3.3333333333333335 | 333335 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3.1666666666666665 | 333336 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 3 | 333337 |
43 | 333334 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:30:00.000Z | 3.5 | 0.5 | 333337 |
43 | 333335 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:20:00.000Z | 3.3333333333333335 | 3.1666666666666665 | 333336 |
43 | 333335 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:20:00.000Z | 3.3333333333333335 | 3 | 333337 |
43 | 333336 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:10:00.000Z | 3.1666666666666665 | 3 | 333337 |
43 | 333337 | 2021-10-30T09:00:00.000Z | 2021-10-30T12:00:00.000Z | 3 | 0 | |
43 | 333337 | 2021-10-30T12:30:00.000Z | 2021-10-30T13:00:00.000Z | 0.5 | 0 | |
43 | 333337 | 2021-10-30T13:30:00.000Z | 2021-10-30T14:00:00.000Z | 0.5 | 0 |
实际查询 (我只包含mins
用于调试目的的列)
SELECT
emp_id,
task_id,
MAX(CASE
WHEN overlap_duration>0 THEN task_duration
ELSE 0
END ) + SUM(CASE
WHEN overlap_duration=0 THEN task_duration
ELSE 0
END) - AVG(overlap_duration) as hrs,
(MAX(CASE
WHEN overlap_duration>0 THEN task_duration
ELSE 0
END ) + SUM(CASE
WHEN overlap_duration=0 THEN task_duration
ELSE 0
END) - AVG(overlap_duration))*60 as mins
FROM (
SELECT
t1.emp_id,
t1.task_id,
t1.time_in,
t1.time_out,
(EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
CASE
WHEN t2.task_id IS NULL THEN 0
ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
END as overlap_duration,
t2.task_id as overlap_task
FROM
task_times t1
LEFT JOIN
task_times t2 ON t1.emp_id = t2.emp_id AND
t1.task_id < t2.task_id AND
(
(t1.time_in between t2.time_in and t2.time_out) or
(t1.time_out between t2.time_in and t2.time_out) or
(t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
(t1.time_in > t2.time_in and t1.time_out < t2.time_out)
)
) t3
GROUP BY
emp_id, task_id
ORDER BY
emp_id, task_id;
emp_id | task_id | 小时 | 分钟 |
---|---|---|---|
43 | 159221 | 4.259351851851852 | 255.5611111111111 |
43 | 177741 | 1.0286111111111111 | 61.71666666666667 |
43 | 188239 | 2.9433333333333334 | 176.6 |
43 | 333333 | 0.9666666666666663 | 57.99999999999998 |
43 | 333334 | 1 | 60 |
43 | 333335 | 0.25000000000000044 | 15.000000000000027 |
43 | 333336 | 0.16666666666666652 | 9.999999999999991 |
43 | 333337 | 4 | 240 |
让我知道这是否适合您。
推荐阅读
- java - 将验证 InternetAddress 对象上的方法删除 CRLF 注入问题
- python - 重新排序 df.columns alpha 数字
- php - 我如何将功能添加到已经存在的功能
- c# - 替换部分 asp 标签值而不替换标签中的全部内容
- python - 如何在 Tensorflow 中使用多个输入应用 tf.map_fn
- mongodb - 为什么我只得到 json 数组的最后一条记录,它是从带有颤振的 graphql 接收的?
- flutter - Flutter:从 Flutter 应用程序启动我的其他应用程序并取回一些数据
- angular - 使用 Jest 的不可理解的单元测试效果错误
- docker - Windows 兼容性:权限?
- python - 避免类实例覆盖默认值