python - Pandas DF - Splitting the time between two timestamps into hourly bins
Problem description
Suppose I have data in this format in a df:
id sta end dur
40433 2020-01-08 05:06:01 2020-01-08 05:08:14 133
40433 2020-09-22 12:01:26 2020-09-22 12:31:34 1808
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 3600
Either in the same df or in a new one, I want to add records like the following:
id sta end h1 dur
40433 2020-01-08 05:06:01 2020-01-08 05:08:14 05 133
40433 2020-09-22 12:01:26 2020-09-22 12:31:34 12 1808
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 12 3300
40433 2020-09-22 12:05:00 2020-09-22 13:05:00 13 300
dur is in seconds.
I want to groupby id, then day (extracted from sta), and then aggregate dur for each specific hour (h1, h2, etc.).
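To make the intended aggregation concrete, here is a minimal pandas sketch that starts from the desired records shown above and sums dur per id, day and hour. The names expected and per_hour are illustrative only and do not appear in the question.

```python
import pandas as pd

# Desired records copied from the question; h1 is the hour bin, dur is in seconds.
expected = pd.DataFrame({
    "id": [40433, 40433, 40433, 40433],
    "sta": pd.to_datetime([
        "2020-01-08 05:06:01", "2020-09-22 12:01:26",
        "2020-09-22 12:05:00", "2020-09-22 12:05:00",
    ]),
    "h1": [5, 12, 12, 13],
    "dur": [133, 1808, 3300, 300],
})

# Group by id, calendar day (taken from sta) and hour bin, then sum the seconds.
per_hour = (
    expected
    .assign(day=expected["sta"].dt.normalize())
    .groupby(["id", "day", "h1"], as_index=False)["dur"]
    .sum()
)
print(per_hour)
```

With the sample data, the two records that fall into hour 12 on 2020-09-22 are added together, while hours 5 and 13 keep their single values.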
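Solution
Answer revised based on your comments. To get you something quickly, and after trying a few other approaches, I did the calculation as array math with a few conversions. There may be a more efficient way, and I am not sure how this performs at scale, but it works. One caveat: if a duration totals more than 24 hours, every hour column would be a full 60 minutes, so I left that case unhandled so that you can deal with it as you need: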
import cudf
import cupy as cp
#If your duration goes over 24 hours total, ALL hour column values will be all 60 minutes.
sta = ['2020-01-08 05:06:01', '2020-09-22 12:01:26', '2020-09-22 12:05:00', '2020-09-22 01:15:00', '2020-09-22 21:05:00']
end = ['2020-01-08 05:08:14', '2020-09-22 12:31:34', '2020-09-22 13:05:00', '2020-09-22 08:05:00', '2020-09-23 01:05:00']
#put it in a dataframe
df = cudf.DataFrame({'sta': sta, 'end':end})
print(df.head())
#the object is a string, so let's convert it to date time
df['sta'] = df['sta'].astype('datetime64[s]')
df['end'] = df['end'].astype('datetime64[s]')
df['dur'] = (df['end']-df['sta']).astype('int64')
#create new df of same type to convert to cupy (to preserve datetime values)
df2 = cudf.DataFrame()
df2['dur'] = (df['end']-df['sta']).astype('int64')
df2['min_sta'] = df['sta'].dt.minute.astype('int64')
df2['min_end'] = df['end'].dt.minute.astype('int64')
df2['h_sta'] = df['sta'].dt.hour.astype('int64')
df2['h_end'] = df['end'].dt.hour.astype('int64')
df2['day'] = df['sta'].dt.day.astype('int64')
print(df2)
#convert df2's values from df to cupy array (you can use numpy if on pandas)
a = cp.fromDlpack(df2.to_dlpack())
print(a)
#create new temp cupy array b to contain minute duration per hour. This algo will work with numpy by using numpy instead of cupy
#columns of a: 0=dur (seconds), 1=min_sta, 2=min_end, 3=h_sta, 4=h_end, 5=day
b = cp.zeros((len(a),24))
for j in range(0,len(a)):
    #number of hour boundaries crossed, counted from the start hour
    hours = int((a[j][0]/3600)+(a[j][1]/60))
    if(hours==0): # within same hour
        b[j][a[j][3]] = int(a[j][0]/60)
    elif(hours==1): #you could probably delete this condition.
        b[j][a[j][3]] = 60-a[j][1]
        b[j][a[j][4]] = a[j][2]
    else:
        b[j][a[j][3]] = 60-a[j][1]
        if(hours<24): #all array elements will be all 60 minutes if duration is over 24 hours
            if(a[j][3]+hours<24):
                #interval ends on the same day: fill the full hours in between
                b[j][a[j][3]+1:a[j][3]+hours]=60
                b[j][a[j][4]] = a[j][2]
            else:
                #interval wraps past midnight: fill to hour 23, then from hour 0
                b[j][a[j][3]+1:24]=60
                b[j][0:(a[j][3]+1+hours)%24]=60
                b[j][a[j][4]] = a[j][2]
# bring cupy array b back to a df.
reshaped_arr = cp.asfortranarray(b)
cpdf = cudf.from_dlpack(reshaped_arr.toDlpack())
print(cpdf.head())
#concat the original and cupy df
df = cudf.concat([df, cpdf], axis=1)
print(df.head())
#you can rename the columns with "h" as you wish
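For readers on plain pandas, the following is a rough sketch of a different approach from the array math above, not a port of it: it walks each interval and cuts it at every clock-hour boundary, producing one row per hour bin with the overlap in seconds, which is the long format the question asks for. The function name split_into_hour_bins and the extra day column are assumptions made for illustration. The row-wise loop is easy to follow but may be slow on large frames.

```python
import pandas as pd

def split_into_hour_bins(df):
    """Return one row per (interval, clock hour) with the overlap in seconds."""
    rows = []
    for rec in df.itertuples(index=False):
        cursor = rec.sta
        while cursor < rec.end:
            # Start of the clock hour that contains `cursor`.
            hour_start = pd.Timestamp(cursor.year, cursor.month, cursor.day, cursor.hour)
            # Cut at the next hour boundary or at the interval end, whichever is first.
            stop = min(hour_start + pd.Timedelta(hours=1), rec.end)
            rows.append({
                "id": rec.id,
                "sta": rec.sta,
                "end": rec.end,
                "day": cursor.normalize(),
                "h1": cursor.hour,
                "dur": int((stop - cursor).total_seconds()),
            })
            cursor = stop
    return pd.DataFrame(rows)

# Sample data from the question.
df = pd.DataFrame({
    "id": [40433, 40433, 40433],
    "sta": pd.to_datetime(["2020-01-08 05:06:01", "2020-09-22 12:01:26", "2020-09-22 12:05:00"]),
    "end": pd.to_datetime(["2020-01-08 05:08:14", "2020-09-22 12:31:34", "2020-09-22 13:05:00"]),
})

bins = split_into_hour_bins(df)
print(bins[["id", "sta", "end", "h1", "dur"]])

# Aggregate seconds per id, day and hour, as described in the question.
totals = bins.groupby(["id", "day", "h1"], as_index=False)["dur"].sum()
print(totals)
```

Because each piece carries its own day, intervals that cross midnight or run longer than 24 hours end up as separate (day, hour) rows rather than overwriting one another.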