python - 按字符串列和 datetime64[ns] 列分组
问题描述
我有以下数据,我想知道:每位司机每天接载的第一个和最后一个客户是谁?
这就是我刚刚得到的:
#Import libraries
import pandas as pd
import numpy as np
#Open and clean the data
df = pd.read_csv('Data.csv')
df = df.drop(['Cod'], axis=1)
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
#the following code is to respond the following question:
#Who was the first and last customer that each Driver picked-up for each day?
#link to access the data: https://drive.google.com/file/d/194byxNkgr2e9r-IOEmSuu9gpZyw27G7j/view?usp=sharing
unique_drivers = df['Driver'].value_counts()
for driver in unique_drivers:
d= vdf.groupby('Driver').get_group(driver)
time = d['Start'][0]
first_customer = d['Customer'][0]
end = d['End'][0]
last_customer = d['Customer'][-1]
解决方案
您可以首先按包含小时和分钟的列Start
进行排序,以确保对多个同一天的事件进行正确排序以进行下一步。将框架分组Driver
以查找每天接载的司机。
使用drop_duplicates
标志删除重复值keep="first"
在评估期间仅保留第一个值,类似地用于keep="last"
仅保留最后一个(来自重复值的集群)。这将为每个司机生成唯一的日期,每天的第一个接载日期和最后一个接载日期,然后使用Customer
列上那些日子的索引来获取客户的姓名。
import pandas as pd
df = pd.read_csv("data.csv")
print(df)
# sort including HH:MM
df = df.sort_values("Start")
drivers_df = []
for gname, group in df.groupby("Driver"):
dn = pd.DataFrame()
# split to get date and time in two columns
ts = group["Start"].str.split(expand=True)
# remove duplicate days keeping the first occurance
t_first = ts.drop_duplicates(subset=[0], keep="first")
# remove duplicate days keeping the last occurance
t_last = ts.drop_duplicates(subset=[0], keep="last")
dn["Date"] = t_first[0]
dn["Driver"] = gname
dn["Num_Customers"] = ts[0].groupby(ts[0]).count().values
# use the previous obtained indices over the "Customer" column
dn["First_Customer"] = df.loc[t_first.index, "Customer"].values
dn["Last_Customer"] = df.loc[t_last.index, "Customer"].values
drivers_df.append(dn)
dn = pd.concat(drivers_df)
# remove to sort by driver's name
dn = dn.sort_values("Date")
dn = dn.reset_index(drop=True)
print(dn)
来自dn的输出
Date Driver Num_Customers First_Customer Last_Customer
0 5/10/2020 Javier Pulgar 1 100998 - MARA MIRIAN BEATRIZ 100998 - MARA MIRIAN BEATRIZ
1 5/10/2020 Santiago Muruaga 1 103055 - ZANOTTO VALERIA 103055 - ZANOTTO VALERIA
2 5/10/2020 Martín Gutierrez 1 105645 - PAWLIW MARTA SOFI... 105645 - PAWLIW MARTA SOFI...
3 5/10/2020 Pablo Aguilar 2 102737 - GONZALVE DE ACEVEDO 102737 - GONZALVE DE ACEVEDO
4 5/10/2020 Carlos Medina 1 102750 - COOP.DE TRABAJO 102750 - COOP.DE TRABAJO
5 5/11/2020 Facundo Papaleo 6 101209 - FARMACIA NAZCA 2602 105093 - BIO HERLPER
6 5/11/2020 Franco Chiarrappa 15 100288 - SAVINI LUCIANA MARIA 102690 - GIOIA ELIZABETH
7 5/11/2020 Hernán Navarro 14 106367 - FARMACIA BERAPHAR... 102631 - SPALVIERI MARINA
8 5/11/2020 Pablo Aguilar 9 102510 - CAZADORFARM SCS 101482 - JOAQUIN MARCIAL
9 5/11/2020 Daniel Godino 7 103572 - GIRALDEZ ALICIA OLGA 103363 - CADELLI ROBERTO JOSE
10 5/11/2020 Hernán Urquiza 1 105323 - GARCIA GERMAN REI... 105323 - GARCIA GERMAN REI...
11 5/11/2020 Héctor Naselli 19 103545 - FARMACIA DESANTI 102257 - FARMA NUOVA S.C.S.
12 5/11/2020 Santiago Muruaga 12 101735 - ALEGRE LEONARDO 500014 - Drogueria DIMEC
13 5/11/2020 Javier Pulgar 2 101009 - MIGUEL ANGEL MARA 103462 - DRAGONE CARLOS AL...
14 5/11/2020 Atilano Aguilera 1 104003 - FARMACIA SANTA 104003 - FARMACIA SANTA
15 5/11/2020 Muletto 3 101359 - FARMACIA COSENTINO 105886 - NEGRI GREGORIO
16 5/11/2020 Martín Venturino 8 102587 - JANISZEWSKI MATIL... 102672 - BORSOTTI GUSTAVO
17 5/11/2020 Martín Gutierrez 1 105645 - PAWLIW MARTA SOFI... 105645 - PAWLIW MARTA SOFI...
18 5/11/2020 José Vallejos 13 102229 - LANDRIEL MARIA LO... 105721 - SOSA NANCY EDITH ...
19 5/11/2020 Edgardo Andrade 9 101524 - FARMACIA M Y A 101217 - MARISA TESORO
20 5/11/2020 Carlos Medina 14 105126 - QUISPE MURILLO RODY 100538 - MAXIMILIANO CAMPO...
21 5/11/2020 Javier Torales 1 200666 - CLINICA BOEDO SRL 200666 - CLINICA BOEDO SRL
22 5/12/2020 Hernán Urquiza 8 105293 - BENSAK MARIA EUGENIA 103005 - BONVISSUTO SANDRA
23 5/12/2020 Miguel Quilici 17 102918 - BRITO NICOLAS 102533 - SAMPEDRO PURA
...
...
推荐阅读
- r - 如何从数据框中删除列组合并分配给新的数据框
- testing - 查询中的空手道graphql变量
- javascript - 按用户 ID 查找所有帖子
- c# - 比较枚举为 0 默认值
- r - 使用 Shiny 在反应式环境中更新绘图的最佳方法
- correlation - R psych::statsBy() 错误:“'x' 必须是数字”
- javascript - 数组推送方法是重复元素,新的 Set 方法不删除重复的元素
- c# - 带有子选择的EF Core 5.0 Union Linq查询不起作用
- java - 有没有办法通过java代码在draft-07 json模式中获取所需的字段数组
- c# - 亚马逊 MWS:计算的签名与将工作 PHP 解决方案转换为 c# 时提供的不匹配