python - Compare two datetime columns in a DataFrame and put the difference in another column
Question
I have a DataFrame like this:
datetime1 datetime2
0 2021-05-09 19:52:14 2021-05-09 20:52:14
1 2021-05-09 19:52:14 2021-05-09 21:52:14
I want to compare the two columns and create a new column holding the difference between them.
The ideal output looks like this:
datetime1 datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00
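Edit:
@Andrej, the solution you gave me works very well when both datetime1 and datetime2 contain timestamps. It fails on a df like the one below, because some rows have nothing to compare: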
df1:
datetime1 datetime2
0 2021-05-09 19:52:14 2021-05-09 20:52:14
1 2021-05-09 19:52:14 2021-05-09 21:52:14
2 NaN NaN
3 2021-05-09 16:30:14 NaN
4 NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14
df2 (ideal output):
datetime1 datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00 NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00 NaN
2 NaN NaN NaN NaN
3 2021-05-09 16:30:14 NaN NaN e.g(04:00:00)
4 NaN NaN NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14 02:00:00 NaN
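In the real scenario, I have rows where neither datetime1 nor datetime2 has a value, and rows where datetime1 has a value but datetime2 does not. Is it possible to get NaN in the "Difference" column when a timestamp is missing, and, when only datetime1 has a timestamp, to get the difference compared with datetime.now() and put it in another column?
Answer
Try: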
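import pandas as pd


def strfdelta(tdelta, fmt):
    # Split the timedelta into fields that str.format() can use.
    # Note: tdelta.seconds excludes whole days, so "hours" stays below 24;
    # add {days} to the format string if a difference can exceed one day.
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


# if datetime1/datetime2 aren't already datetime, apply `pd.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

# row-wise difference, formatted as HH:MM:SS
df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(
        x["datetime2"] - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    ),
    axis=1,
)
print(df)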
Prints:
datetime1 datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00
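As a possible variation, the subtraction can be done on whole columns and only the formatting applied per element. The sketch below is not part of the original answer: `format_hms` and the `demo` frame are hypothetical names, and the second row is deliberately changed to span more than a day to show whole days being folded into the hours field.

```python
import numpy as np
import pandas as pd


def format_hms(delta):
    """Format a Timedelta as HH:MM:SS, folding whole days into the hours."""
    if pd.isna(delta):
        return np.nan  # nothing to format for a missing difference
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else ""
    hours, rem = divmod(abs(total), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical sample data; the second difference is 1 day and 2 hours.
demo = pd.DataFrame(
    {
        "datetime1": ["2021-05-09 19:52:14", "2021-05-09 19:52:14"],
        "datetime2": ["2021-05-09 20:52:14", "2021-05-10 21:52:14"],
    }
)
demo["datetime1"] = pd.to_datetime(demo["datetime1"])
demo["datetime2"] = pd.to_datetime(demo["datetime2"])

# Column-wise subtraction yields a timedelta Series; format each element.
demo["Difference in H:m:s"] = (demo["datetime2"] - demo["datetime1"]).map(format_hms)
print(demo)
```

With this helper the second row shows 26:00:00 rather than wrapping back to 02:00:00, which may or may not be what you want for multi-day gaps.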
Edit: handling NaNs:
import numpy as np  # needed for np.nan; strfdelta is the helper defined above

# if datetime1/datetime2 aren't already datetime, apply `pd.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

# difference only where both timestamps are present, NaN otherwise
df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(
        x["datetime2"] - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    )
    if pd.notna(x["datetime1"]) and pd.notna(x["datetime2"])
    else np.nan,
    axis=1,
)

# time elapsed since datetime1 where only datetime1 is present, NaN otherwise
df["Compared with datetime.now()"] = df.apply(
    lambda x: strfdelta(
        pd.Timestamp.now() - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    )
    if pd.notna(x["datetime1"]) and pd.isna(x["datetime2"])
    else np.nan,
    axis=1,
)
print(df)
Prints (the value in row 3 depends on when the code is run):
datetime1 datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00 NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00 NaN
2 NaT NaT NaN NaN
3 2021-05-09 16:30:14 NaT NaN 03:00:20
4 NaT NaT NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14 02:00:00 NaN
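Because the last column depends on the current time, the result is hard to reproduce. One way around that, sketched here under assumptions rather than taken from the answer, is to wrap the logic in a function that accepts the reference time as a parameter. The names `add_differences`, `to_hms`, and `sample` are hypothetical, and the fixed `now` value is chosen only so that row 3 matches the 04:00:00 example from the question.

```python
import numpy as np
import pandas as pd


def to_hms(delta):
    """Format a Timedelta as HH:MM:SS; return NaN for a missing value."""
    if pd.isna(delta):
        return np.nan
    hours, rem = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def add_differences(frame, now=None):
    """Return a copy of `frame` with the two difference columns added."""
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    out = frame.copy()
    d1 = pd.to_datetime(out["datetime1"])
    d2 = pd.to_datetime(out["datetime2"])
    out["datetime1"], out["datetime2"] = d1, d2
    # NaT in either column propagates to NaT in the difference.
    out["Difference in H:m:s"] = (d2 - d1).map(to_hms)
    # Keep the elapsed time only where datetime1 exists and datetime2 is missing.
    only_first = d1.notna() & d2.isna()
    out["Compared with datetime.now()"] = (now - d1).where(only_first).map(to_hms)
    return out


# The six rows from the question's df1.
sample = pd.DataFrame(
    {
        "datetime1": ["2021-05-09 19:52:14", "2021-05-09 19:52:14", np.nan,
                      "2021-05-09 16:30:14", np.nan, "2021-05-09 12:30:14"],
        "datetime2": ["2021-05-09 20:52:14", "2021-05-09 21:52:14", np.nan,
                      np.nan, np.nan, "2021-05-09 14:30:14"],
    }
)
result = add_differences(sample, now="2021-05-09 20:30:14")
print(result)
```

Calling `add_differences(sample)` without `now` falls back to `pd.Timestamp.now()`, which matches the behavior of the code above.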