How to split one row of a Pandas DataFrame into two rows under a specific condition

Problem description

Suppose I have data like this:

    id   Date        Time_Start  Time_End  start                stop                 split
0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False
1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 19:12:00  True
2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False

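The cutoff time is 18:00 every day. So, for example, "012" should be split into two rows: the stop column of the first row should be updated to 2017-08-01 17:59:00, and the start of the second row should become 2017-08-01 18:00:00. All other times stay unchanged.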

    id   Date        Time_Start  Time_End  start                stop                 split  birth_date
0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False  2017-08-01
1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 17:59:00  True   2017-08-01
1   012  2017-08-01  17:15       19:12     2017-08-01 18:00:00  2017-08-01 19:12:00  True   2017-08-02
2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False  2017-08-01

Note that if the stop time is before 18:00, "birth_date" is the same day as "Date", but if it falls after the cutoff, "birth_date" is the next day.
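The rule described above can be sketched for a single interval as follows. This is only an illustration: `split_at_cutoff` and `CUTOFF_HOUR` are hypothetical names that do not appear in the original code. The sketch assumes, as in the expected output above, that only the part of a split interval lying after the cutoff is assigned to the next day, and that an interval is split only when the cutoff falls strictly between its start and stop.

```python
import pandas as pd

CUTOFF_HOUR = 18  # daily cutoff from the question (18:00); hypothetical constant name


def split_at_cutoff(start, stop):
    """Return a list of (start, stop, birth_date) tuples for one interval.

    Assumption: only the second half of a split interval moves to the
    next day; unsplit intervals keep the date of their start.
    """
    day = start.normalize()  # midnight of the start date
    cutoff = day + pd.Timedelta(hours=CUTOFF_HOUR)
    if start < cutoff < stop:
        return [
            # first half ends one minute before the cutoff
            (start, cutoff - pd.Timedelta(minutes=1), day),
            # second half starts at the cutoff and belongs to the next day
            (cutoff, stop, day + pd.Timedelta(days=1)),
        ]
    return [(start, stop, day)]


pieces = split_at_cutoff(
    pd.Timestamp("2017-08-01 17:15"), pd.Timestamp("2017-08-01 19:12")
)
for piece in pieces:
    print(piece)
```

For row "012" this yields two pieces, one ending at 17:59:00 and one starting at 18:00:00, while a row such as "011" comes back unchanged as a single piece.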

Below is the code I have been using so far; I am stuck at the step I am asking about, so any help would be greatly appreciated.

import pandas as pd    
from datetime import datetime, time

def make_date_time(df):
    df["start"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_Start"])
    df["stop"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_End"])
    def check_date_time(df):
        if df["start"] > df["stop"]:
            df["stop"] += pd.Timedelta(days=1)
        return df["stop"]
    df["stop"] = df.apply(check_date_time, axis=1)
    return df

def in_cut(df):
    reference = df["start"].replace(hour=18, minute=0, second=0)
    if df["start"] <= df["stop"]:
        return df["start"] <= reference < df["stop"]
    else:
        return df["start"] <= reference or reference < df["stop"]



data = {"id":["011","012","013"], "Date": ["2017-08-01", "2017-08-01", "2017-08-01"], "Time_Start":["20:20", "17:15", "15:46"], "Time_End":["21:40", "19:12", "16:20"]}
df = pd.DataFrame.from_dict(data)
# make_date_time already rolls "stop" over to the next day when needed;
# check_date_time is local to it and cannot be called from here
df = make_date_time(df)
df["split"] = False
df["split"] = df.apply(in_cut, axis=1)
df

Tags: python, pandas, datetime, dataframe

Solution


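As I understand the question, you want to update the stop and start times when split is True. Here is my approach (I suspect it could use some improvement :)). Hope this helps.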

import datetime

next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        # copy the row before its stop time is changed
        temp = list(df.iloc[i])
        # first half ends one minute before the 18:00 cutoff
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)

        temp_list = list(df.loc[pos_old])
        # birth date of the first half is the row's own Date
        temp_list.append(df.loc[pos_old, 'Date'])

        next_df.loc[pos_new] = temp_list
        pos_old+=1
        pos_new+=1

        # second half starts at the cutoff (index 4 is the 'start' column)
        temp[4] = temp[4].replace(hour=18, minute=0, second=0)
        #conversion of date
        temp_date = datetime.datetime.strptime(df.loc[i, 'Date'], "%Y-%m-%d")
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp

        pos_new+=1
    else:
        # unsplit rows are copied as-is, with Date as the birth date
        temp_list = list(df.loc[pos_old])
        temp_list.append(df.loc[pos_old, 'Date'])
        next_df.loc[pos_new] = temp_list
        pos_old+=1
        pos_new+=1

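Edit

I had to make a few changes to your code to produce exactly what I wanted. Since this solution does what I need, even though it is not very Pythonic, I will accept it as the correct answer.

Here is my updated code.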

import datetime

next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        # get one row at a time and convert it to a list
        temp = list(df.iloc[i])

        # update stop time to 17:59:00
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)

        temp_list = list(df.loc[pos_old])
        # append birth date to the list
        temp_list.append(temp[4].date().strftime("%Y-%m-%d"))
        # add this row to new df
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old+=1
        pos_new+=1
        # the second half starts at the cutoff (index 4 is the 'start' column)
        temp[4] = temp[4].replace(hour=18, minute=0, second=0)

        # conversion of date
        temp_date = temp[4].date()
        # add one day, since this part is considered to fall on the following day
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        # add the new row to df
        next_df.loc[pos_new] = temp
        # update the pointer of new df for the split row
        pos_new+=1
    else:
        temp_list = list(df.loc[pos_old])
        # use this row's own start date (temp may be undefined or stale here)
        temp_list.append(temp_list[4].date().strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old+=1
        pos_new+=1

next_df
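Since the loop above was described as not very Pythonic, a loop-free variant might look like the sketch below. It is not the accepted answer's method, only an alternative under stated assumptions: the column name `birth_date` follows the expected output in the question, the names `first`, `second`, and `result` are made up for illustration, the `split` flag is recomputed with a strict comparison against the 18:00 cutoff, and intervals that run past midnight are not handled.

```python
import pandas as pd

# Rebuild the question's sample frame so the sketch runs on its own.
df = pd.DataFrame({
    "id": ["011", "012", "013"],
    "Date": ["2017-08-01", "2017-08-01", "2017-08-01"],
    "Time_Start": ["20:20", "17:15", "15:46"],
    "Time_End": ["21:40", "19:12", "16:20"],
})
df["start"] = pd.to_datetime(df["Date"] + " " + df["Time_Start"])
df["stop"] = pd.to_datetime(df["Date"] + " " + df["Time_End"])

# 18:00 on each row's start date; a row is split when the cutoff lies inside it.
cutoff = df["start"].dt.normalize() + pd.Timedelta(hours=18)
df["split"] = (df["start"] < cutoff) & (cutoff < df["stop"])

# Every row appears once; split rows end one minute before the cutoff.
first = df.copy()
first.loc[first["split"], "stop"] = cutoff[first["split"]] - pd.Timedelta(minutes=1)
first["birth_date"] = first["start"].dt.strftime("%Y-%m-%d")

# Split rows appear a second time, starting at the cutoff, with the next day as birth date.
second = df[df["split"]].copy()
second["start"] = cutoff[df["split"]]
second["birth_date"] = (second["start"] + pd.Timedelta(days=1)).dt.strftime("%Y-%m-%d")

# A stable sort on the original index keeps each second half right after its first half.
result = pd.concat([first, second]).sort_index(kind="mergesort")
print(result)
```

The two halves are built as whole frames and interleaved by sorting on the original index, which is why the split row keeps the duplicated index 1, as in the expected output.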
