python - Removing empty DataFrames from a pandas.core.groupby.generic.DataFrameGroupBy
Question
How do I remove empty DataFrames from a pandas.core.groupby.generic.DataFrameGroupBy?
My aggregation code:
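import numpy as np
import pandas as pd

cols = ["col1", "col2", "col3", "col4"]
joined = pd.concat(df.reset_index() for df in collectData)
# Replace missing values and zeros with 1
joined = joined.replace({np.nan: 1, 0: 1})
# Replace negative values with 1
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)
# Group into daily bins on the 'sensor' timestamp index
df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))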
The grouped data:
list(df)
[(Timestamp('2020-02-04 00:00:00+0000', tz='UTC', freq='D'),
col1 col2 col3 col4
sensor
2020-02-04 00:00:00+00:00 2.586569 0.015321 0.000149 0.884470
2020-02-04 00:00:00+00:00 4.429571 4.049798 1.820845 2.882445
2020-02-04 00:00:00+00:00 12.883314 6.900607 1.002138 3.613021
... ... ... ... ...
2020-02-04 23:45:00+00:00 3.798017 1.605979 0.176515 2.400820
2020-02-04 23:45:00+00:00 5.546771 2.232437 0.233292 3.750547
2020-02-04 23:45:00+00:00 4.910360 3.730932 0.985459 1.238469
[48945 rows x 4 columns]),
(Timestamp('2020-02-05 00:00:00+0000', tz='UTC', freq='D'),
Empty DataFrame
Columns: [col1, col2, col3, col4]
Index: []),
(Timestamp('2020-02-06 00:00:00+0000', tz='UTC', freq='D'),
Empty DataFrame
Columns: [col1, col2, col3, col4]
Index: []),
(Timestamp('2020-02-07 00:00:00+0000', tz='UTC', freq='D'),
col1 col2 col3 col4
sensor
2020-02-07 00:00:00+00:00 17.065174 3.065422 0.171053 9.048574
2020-02-07 00:00:00+00:00 30.181997 20.651204 4.413567 15.200674
2020-02-07 00:00:00+00:00 1.864378 1.726365 0.819459 1.441588
... ... ... ... ...
2020-02-07 23:45:00+00:00 39.644320 0.234830 0.002289 13.642480
2020-02-07 23:45:00+00:00 30.778517 10.540318 0.944788 13.165241
2020-02-07 23:45:00+00:00 34.610439 25.342142 6.184292 22.725937
[50112 rows x 4 columns]),]
The group sizes, from df.size():
sensor
2020-02-02 00:00:00+00:00 47574
2020-02-03 00:00:00+00:00 49353
2020-02-04 00:00:00+00:00 48945
2020-02-05 00:00:00+00:00 0
2020-02-06 00:00:00+00:00 0
...
2020-09-26 00:00:00+00:00 83680
2020-09-27 00:00:00+00:00 84293
2020-09-28 00:00:00+00:00 84873
2020-09-29 00:00:00+00:00 84306
2020-09-30 00:00:00+00:00 84875
Freq: D, Length: 242, dtype: int64
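I need to remove the empty DataFrames before applying std = df.apply(gstd). I do not know in advance where the empty DataFrames are.
The approaches in https://stackoverflow.com/a/51052536/14338086 and https://stackoverflow.com/a/16916611/14338086 raise errors. Using df.filter(lambda x: x.size() != 0) also fails, with TypeError: 'numpy.int64' object is not callable. dropna() is not available on the grouped object.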
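The TypeError is consistent with how pandas exposes size: on a GroupBy object size() is a method, but inside filter each group is a plain DataFrame, where size is an integer attribute and cannot be called. A minimal sketch of this, and of one way to keep only the non-empty groups, follows. The sample data and the names frame, grouped, sizes, and non_empty are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical sample data: readings on 2020-02-04 and 2020-02-07 only,
# so the daily bins for 2020-02-05 and 2020-02-06 are empty.
index = pd.DatetimeIndex(
    ["2020-02-04 00:00", "2020-02-04 23:45",
     "2020-02-07 00:00", "2020-02-07 23:45"],
    tz="UTC",
    name="sensor",
)
frame = pd.DataFrame({"col1": [2.0, 4.0, 17.0, 39.0]}, index=index)
grouped = frame.groupby(pd.Grouper(freq="D"))

# GroupBy.size() is a method and returns one row count per daily bin.
sizes = grouped.size()

# DataFrame.size is an integer attribute, so calling it raises TypeError.
try:
    frame.size()
    size_call_failed = False
except TypeError:
    size_call_failed = True

# Keep only the groups that actually contain rows.
non_empty = {day: group for day, group in grouped if not group.empty}

# The same days can be read off the size Series.
non_empty_days = sizes[sizes > 0].index
```

Checking group.empty (or len(group) > 0) avoids the call to size altogether.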
Solution
I solved this with the following code; maybe it will help someone.
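import numpy as np
import pandas as pd
# gmean and gstd are not defined in the post; assumed here to come from scipy.stats
from scipy.stats import gmean, gstd

cols = ["col1", "col2", "col3", "col4"]
joined = pd.concat(df.reset_index() for df in collectData)
joined = joined.replace({np.nan: 1, 0: 1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)
df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))
# Concatenate the per-day groups back into one frame; empty groups add no rows
dff = pd.concat(map(lambda x: x[1], df))
# Group by the floored timestamp so that only days present in the data form groups
means = dff.groupby(dff.index.floor('d')).agg(gmean)
std = dff.groupby(dff.index.floor('d')).agg(gstd)
df_result = pd.merge(left=means, right=std, how='left', on='sensor')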
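The key step above is grouping by dff.index.floor('d') instead of pd.Grouper(freq='D'): grouping by the floored timestamps creates a group only for days that actually occur in the index, so no empty DataFrames reach the aggregation. The sketch below illustrates this on hypothetical sample data. It substitutes the built-in "mean" and "std" aggregations for gmean and gstd to stay dependency-free, and uses DataFrame.join with explicit suffixes rather than pd.merge; both are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data with a two-day gap (2020-02-05 and 2020-02-06).
index = pd.DatetimeIndex(
    ["2020-02-04 00:00", "2020-02-04 23:45",
     "2020-02-07 00:00", "2020-02-07 23:45"],
    tz="UTC",
    name="sensor",
)
frame = pd.DataFrame(
    {"col1": [2.0, 4.0, 17.0, 39.0], "col2": [1.0, 3.0, 5.0, 7.0]},
    index=index,
)

# Flooring to the day yields group keys only for days present in the data.
by_day = frame.groupby(frame.index.floor("D"))

# Stand-ins for gmean and gstd (assumption: any per-column reducer fits here).
means = by_day.agg("mean")
stds = by_day.agg("std")

# Join on the shared daily index; suffixes keep the column names distinct.
result = means.join(stds, lsuffix="_mean", rsuffix="_std")
```

With this grouping the result has one row per day that has data, and the days in the gap do not appear at all.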