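python - Pandas: "returning a view versus a copy" warning when building a new DataFrame in a loop
Question
Suppose I have a DataFrame with two datetime columns and I want to analyze the difference between them: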
import pandas as pd
csv = [
['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5]
]
df = pd.DataFrame(csv, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = (df['delivery_date']-df['dispatch_date'])
df = df.set_index(['delivery_date','transit_time'])
OK, now we have something like this:
                                    dispatch_date  order_size
delivery_date transit_time
2019-08-03    1 days 09:00:00 2019-08-01 15:00:00           4
              1 days 14:00:00 2019-08-01 10:00:00           6
              1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
              1 days 11:00:00 2019-08-02 13:00:00           4
              1 days 13:00:00 2019-08-02 11:00:00           5
For each delivery date, I want to know which delivery was the fastest (shortest transit time). I want to save the result to a new DataFrame that contains all the columns of the original one. So I iterate like this:
delivery_dates = df.index.get_level_values(0).unique()
df_ouput = pd.DataFrame()
for date in delivery_dates:
    df_analyzed = df.loc[(date, )].sort_index()
    df_result = df_analyzed.iloc[[df_analyzed.index.get_loc(0, method='nearest')]]
    df_result.loc[:,'delivery_date'] = date
    df_ouput = df_ouput.append(df_result)
df_ouput = df_ouput.reset_index().set_index(['delivery_date'])
The result is correct:
                 transit_time       dispatch_date  order_size
delivery_date
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
But I get a warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
And I don't know why, because I am already using the ".loc" method for the assignment:
df_result.loc[:,'delivery_date'] = date
But I could not get rid of the warning, so I ended up with this odd workaround:
delivery_dates = df.index.get_level_values(0).unique()
df_ouput = pd.DataFrame()
for date in delivery_dates:
    df_analyzed = df.loc[(date, )].sort_index()
    df_result = df_analyzed.iloc[[df_analyzed.index.get_loc(0, method='nearest')]]
    df_result_2 = df_result.copy()
    df_result_2.loc[:,'delivery_date'] = date
    df_ouput = df_ouput.append(df_result_2)
df_ouput = df_ouput.reset_index().set_index(['delivery_date'])
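If I make a copy, the warning is not shown. But why? Is there a better way to do what I want?
Answer
Your solution should be changed to call copy on the filtered result: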
delivery_dates = df.index.get_level_values(0).unique()
df_ouput = pd.DataFrame()
for date in delivery_dates:
    df_analyzed = df.loc[date].sort_index()
    df_result = df_analyzed.iloc[[df_analyzed.index.get_loc(0, method='nearest')]].copy()
    df_result['delivery_date'] = date
    df_ouput = df_ouput.append(df_result)
df_ouput = df_ouput.reset_index().set_index(['delivery_date'])
print (df_ouput)
                 transit_time       dispatch_date  order_size
delivery_date
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
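As for why the copy helps: df_result is obtained by indexing df_analyzed, so pandas versions that emit this warning treat it as a possible view of another frame and flag any assignment to it, including one made through .loc. Calling .copy() turns it into an independent object. Note also that DataFrame.append and the method argument of Index.get_loc were removed in pandas 2.0. Below is a minimal sketch of the same loop for newer versions. The names pieces and df_output are mine, and taking the first row after sorting stands in for get_loc(0, method='nearest') on the assumption that no transit time is negative.

```python
import pandas as pd

rows = [
    ['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
    ['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
    ['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
    ['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
    ['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
    ['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5],
]
df = pd.DataFrame(rows, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = df['delivery_date'] - df['dispatch_date']
df = df.set_index(['delivery_date', 'transit_time'])

pieces = []  # hypothetical name: one single-row frame per delivery date
for date in df.index.get_level_values(0).unique():
    # Rows for one delivery date, indexed by transit_time, shortest first.
    df_analyzed = df.loc[date].sort_index()
    # Assumption: transit times are non-negative, so the first row after
    # sorting is the one nearest to zero. The explicit copy makes df_result
    # independent of df_analyzed before a column is added to it.
    df_result = df_analyzed.iloc[[0]].copy()
    df_result['delivery_date'] = date
    pieces.append(df_result)

# Concatenate once instead of calling the removed DataFrame.append in the loop.
df_output = pd.concat(pieces).reset_index().set_index('delivery_date')
print(df_output)
```

Collecting the pieces in a list and concatenating once also avoids rebuilding the growing frame on every iteration.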
A better solution is a custom function with GroupBy.apply:
def f(x):
    x = x.sort_index(level=1)
    s = x.iloc[[x.index.get_level_values(1).get_loc(0, method='nearest')]]
    return s
df = df.groupby(level=0).apply(f).reset_index(level=0, drop=True)
print (df)
                                    dispatch_date  order_size
delivery_date transit_time
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
Or:
def f(x):
    x = x.sort_index(level=1)
    s = x.iloc[[x.index.get_level_values(1).get_loc(0, method='nearest')]]
    return s
df = df.groupby(level=0, group_keys=False).apply(f)
print (df)
                                    dispatch_date  order_size
delivery_date transit_time
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
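On pandas 2.0 and later, where get_loc no longer accepts method, the same per-group selection can be sketched without a custom function. This assumes that transit times are never negative, so the row nearest to zero is simply the first row of each group after sorting. The name fastest is hypothetical.

```python
import pandas as pd

rows = [
    ['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
    ['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
    ['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
    ['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
    ['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
    ['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5],
]
df = pd.DataFrame(rows, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = df['delivery_date'] - df['dispatch_date']
df = df.set_index(['delivery_date', 'transit_time'])

# Sort by (delivery_date, transit_time) so the shortest transit time comes
# first within each delivery date, then keep the first row of every group.
# Assumption: no negative transit times, so "first" equals "nearest to zero".
fastest = df.sort_index().groupby(level=0).head(1)
print(fastest)
```

The result keeps both index levels, like the GroupBy.apply versions above.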
Or, if I understand the goal correctly:
df = df.sort_index()
df = df[~df.index.get_level_values(0).duplicated()]
print (df)
                                    dispatch_date  order_size
delivery_date transit_time
2019-08-03    1 days 08:00:00 2019-08-01 16:00:00           8
2019-08-04    1 days 05:00:00 2019-08-02 19:00:00           3
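To see why this last version works, it can help to look at the boolean mask on its own. The sketch below rebuilds the sample frame and prints the mask before applying it. The names sorted_df and mask are hypothetical.

```python
import pandas as pd

rows = [
    ['2019-08-03 00:00:00', '2019-08-01 15:00:00', 4],
    ['2019-08-03 00:00:00', '2019-08-01 10:00:00', 6],
    ['2019-08-03 00:00:00', '2019-08-01 16:00:00', 8],
    ['2019-08-04 00:00:00', '2019-08-02 19:00:00', 3],
    ['2019-08-04 00:00:00', '2019-08-02 13:00:00', 4],
    ['2019-08-04 00:00:00', '2019-08-02 11:00:00', 5],
]
df = pd.DataFrame(rows, columns=['delivery_date', 'dispatch_date', 'order_size'])
df['delivery_date'] = pd.to_datetime(df['delivery_date'])
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'])
df['transit_time'] = df['delivery_date'] - df['dispatch_date']
df = df.set_index(['delivery_date', 'transit_time'])

# After sorting, the shortest transit time is the first row of each date.
sorted_df = df.sort_index()

# duplicated() flags every repeat of a delivery date after its first
# occurrence; negating it leaves True only on the first row per date.
mask = ~sorted_df.index.get_level_values(0).duplicated()
print(mask)             # one boolean per row of sorted_df
print(sorted_df[mask])  # the first (shortest-transit) row per delivery date
```

Because the selection is a plain boolean filter on the sorted frame, no loop, append, or assignment to a slice is involved, so the warning cannot arise.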