首页 > 解决方案 > 计算分工工时?

问题描述

我需要计算一些报告的拆分工时,并且一直在努力解决这个问题。我们的 Oracle 数据库目前使用包中的存储过程来执行此操作,但我正在为这个项目在 Snowflake/dbt 的范围内工作,虽然我可以将 Oracle 中的视图中的数据带到 Snowflake,但我想要在重新计算分段时间之前对时间进行一些转换,因为我们经常有人忘记打卡,我想在他们轮班结束时纠正这些。所以我必须在调整错误的时钟后重新计算分段时间。

我的数据如下所示:

emp_id task_id time_in 超时
43 159221 2021-09-30 09:00:00 2021-09-30 14:30:00
43 188239 2021-09-30 09:16:03 2021-09-30 10:44:51
43 177741 2021-09-30 13:12:49 2021-09-30 14:14:32

在这里停下来总结一下时间得到:

emp_id task_id 小时
43 159221 5.5
43 188239 1.47
43 177741 1.03

这是错误的,因为花费的一些时间与其他任务共享,应该分开。

我需要做的是,如果我们计算第一项任务的总小时数,这将是公式:

.267 + (1.47 / 2) + 2.47 + (1.03 / 2) + .27 = 4.257 小时在 task_id 159221 上工作。

有什么方法可以在 SQL 中处理这个问题,还是我必须学习 JavaScript 并编写存储过程?

额外细节:

标签: sqlsnowflake-cloud-data-platformdbt

解决方案


您可以尝试以下方法。

在汇总结果以从实际任务持续时间中减去总重叠持续时间之前,使用 LEFT JOIN 确定重叠持续时间。

SELECT
     emp_id,
     task_id,
     MAX(task_duration) - SUM(overlap_duration/2) as hours
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id;
emp_id task_id 小时
43 159221 4.245694444444444
43 177741 1.0286111111111111
43 188239 1.48

在 DB Fiddle 上查看工作演示

我还包括了下面子查询的输出供您阅读

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (datediff(seconds, t1.time_in, t1.time_out)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (datediff(seconds,t2.time_in,t2.time_out)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in 超时 任务持续时间 重叠时间 重叠任务
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0

在 DB Fiddle 上查看演示

编辑 1

回应问题更新

额外细节:

  1. 可以重叠的任务数是可变的,它不仅最多为 2。
  2. 员工可以在一天中多次进入同一任务

我添加了更多示例数据并提供了更新的小提琴。我无法提供雪花的演示小提琴,但我们可以将其更改EXTRACTDATEDIFF如前所示。

CREATE TABLE task_times (
  emp_id INTEGER,
  task_id INTEGER,
  time_in TIMESTAMP,
  time_out TIMESTAMP
);

INSERT INTO task_times
  (emp_id, task_id, time_in, time_out)
VALUES
-- add repetition of the same tasks with overlaps
  ('43', '159221', '2021-09-30 09:00:00', '2021-09-30 14:30:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:27:00'),
  ('43', '159221', '2021-09-30 09:03:00', '2021-09-30 14:38:00'),
--- add overlap of different tasks
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:44:51'),
  ('43', '188239', '2021-09-30 09:16:03', '2021-09-30 10:43:51'),
  ('43', '177741', '2021-09-30 13:12:49', '2021-09-30 14:14:32'),
-- add multiple overalapping tasks with a difference of 10 minutes
  ('43', '333333', '2021-10-30 09:00:00', '2021-10-30 12:40:00'),
  ('43', '333334', '2021-10-30 09:00:00', '2021-10-30 12:30:00'),
  ('43', '333335', '2021-10-30 09:00:00', '2021-10-30 12:20:00'),
  ('43', '333336', '2021-10-30 09:00:00', '2021-10-30 12:10:00'),
-- add multiple log in and log out times with no overlaps
  ('43', '333337', '2021-10-30 09:00:00', '2021-10-30 12:00:00'),
  ('43', '333337', '2021-10-30 12:30:00', '2021-10-30 13:00:00'),
  ('43', '333337', '2021-10-30 13:30:00', '2021-10-30 14:00:00')
  ;

确定重叠的子查询 (用于调试目的)

SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      );
emp_id task_id time_in 超时 任务持续时间 重叠时间 重叠任务
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.48 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.4633333333333334 188239
43 159221 2021-09-30T09:00:00.000Z 2021-09-30T14:30:00.000Z 5.5 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:27:00.000Z 5.4 1.0286111111111111 177741
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.48 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.4633333333333334 188239
43 159221 2021-09-30T09:03:00.000Z 2021-09-30T14:38:00.000Z 5.583333333333333 1.0286111111111111 177741
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:44:51.000Z 1.48 0
43 188239 2021-09-30T09:16:03.000Z 2021-09-30T10:43:51.000Z 1.4633333333333334 0
43 177741 2021-09-30T13:12:49.000Z 2021-09-30T14:14:32.000Z 1.0286111111111111 0
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.5 333334
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.3333333333333335 333335
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3.1666666666666665 333336
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 3 333337
43 333333 2021-10-30T09:00:00.000Z 2021-10-30T12:40:00.000Z 3.6666666666666665 0.5 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.3333333333333335 333335
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3.1666666666666665 333336
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 3 333337
43 333334 2021-10-30T09:00:00.000Z 2021-10-30T12:30:00.000Z 3.5 0.5 333337
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3.1666666666666665 333336
43 333335 2021-10-30T09:00:00.000Z 2021-10-30T12:20:00.000Z 3.3333333333333335 3 333337
43 333336 2021-10-30T09:00:00.000Z 2021-10-30T12:10:00.000Z 3.1666666666666665 3 333337
43 333337 2021-10-30T09:00:00.000Z 2021-10-30T12:00:00.000Z 3 0
43 333337 2021-10-30T12:30:00.000Z 2021-10-30T13:00:00.000Z 0.5 0
43 333337 2021-10-30T13:30:00.000Z 2021-10-30T14:00:00.000Z 0.5 0

实际查询 (我只包含mins用于调试目的的列)

SELECT
     emp_id,
     task_id,
     MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration) as hrs,
     (MAX(CASE
         WHEN overlap_duration>0 THEN task_duration
         ELSE 0
     END ) + SUM(CASE
         WHEN overlap_duration=0 THEN task_duration
         ELSE 0
     END) - AVG(overlap_duration))*60 as mins
FROM (
    SELECT
        t1.emp_id,
        t1.task_id,
        t1.time_in,
        t1.time_out,
        (EXTRACT(epoch FROM t1.time_out-t1.time_in)/3600) as task_duration,
         CASE
             WHEN t2.task_id IS NULL THEN 0
             ELSE (EXTRACT(epoch FROM t2.time_out-t2.time_in)/3600)
         END as overlap_duration,
         t2.task_id as overlap_task
    FROM
        task_times t1
    LEFT JOIN
        task_times t2 ON t1.emp_id = t2.emp_id AND
                         t1.task_id < t2.task_id AND
                         (
                      (t1.time_in between t2.time_in and t2.time_out) or
  (t1.time_out between t2.time_in and t2.time_out) or
  (t1.time_in < t2.time_in and t1.time_out > t2.time_out) or
  (t1.time_in > t2.time_in and t1.time_out < t2.time_out)
                      )
                      
) t3
GROUP BY
    emp_id, task_id
ORDER BY
    emp_id, task_id;
emp_id task_id 小时 分钟
43 159221 4.259351851851852 255.5611111111111
43 177741 1.0286111111111111 61.71666666666667
43 188239 2.9433333333333334 176.6
43 333333 0.9666666666666663 57.99999999999998
43 333334 1 60
43 333335 0.25000000000000044 15.000000000000027
43 333336 0.16666666666666652 9.999999999999991
43 333337 4 240

在 DB Fiddle 上查看

让我知道这是否适合您。


推荐阅读