python - Slicing within the groups of a DataFrameGroupBy object
Problem description
Python version: 3.7.3
Something similar was asked here, but it's not quite the same.
Based on a condition, I would like to keep only a subset of each group of a DataFrameGroupBy object. Basically, if a group starts with rows that contain only NaNs, I want to delete those rows. If that is not the case, I want the group to stay intact. To accomplish this, I wrote a function delete_rows:
Grouped_object = df.groupby(['col1', 'col2'])

def delete_rows(group):
    pos_min_notna = group[group['cumsum'].notna()].index[0]
    return group[pos_min_notna:]

new_df = Grouped_object.apply(delete_rows)
However, this function seems to do the "job" only for the first group in the DataFrameGroupBy object. What am I missing so that it does this for all the groups and "glues" the subsets together?
The function delete_rows was edited according to the logic provided by Laurens Koppenol.
Solution
In pandas you have to be very careful to distinguish index labels (loc) from integer positions (iloc). It is always a good idea to make explicit which of the two you mean.
This answer has a great overview of the differences.
Grouped_object = df.groupby(['col1', 'col2'])

def delete_rows(group):
    # .index[0] returns an index label (a loc value), not an integer position
    pos_min_notna = group[group['cumsum'].notna()].index[0]
    return group.loc[pos_min_notna:]  # slice by label, making loc explicit

new_df = Grouped_object.apply(delete_rows)  # this DataFrame has a messed-up index :)
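The messed-up index most likely comes from apply adding the group keys as extra index levels. A minimal sketch of one way around that, under some assumptions: the sample values are invented for illustration, group_keys=False and the [["cumsum"]] column selection are additions not found in the original answer, and dropping a group that has no valid value at all is a guess at the desired behavior. Selecting the column first also sidesteps differences between pandas versions in whether the grouping columns are passed to the applied function.

```python
import numpy as np
import pandas as pd

# Invented sample data: two groups, each starting with NaN rows in 'cumsum'.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["x", "x", "x", "y", "y", "y"],
    "cumsum": [np.nan, 1.0, 2.0, np.nan, np.nan, 5.0],
})

def delete_rows(group):
    valid = group["cumsum"].notna()
    if not valid.any():
        # Assumption: a group without any valid value is dropped entirely.
        return group.iloc[0:0]
    first_label = valid.idxmax()    # index label of the first non-NaN row
    return group.loc[first_label:]  # label-based slice

# group_keys=False keeps the original row labels instead of adding the keys.
# Selecting [["cumsum"]] passes only that column to delete_rows.
kept = df.groupby(["col1", "col2"], group_keys=False)[["cumsum"]].apply(delete_rows)

# Use the surviving labels to pick the full rows from the original frame.
new_df = df.loc[kept.index]
print(new_df)
```

With this data, rows 1, 2 and 5 survive. With the plain slice group[pos_min_notna:] from the question, the second group would come back empty, because its first valid label (5) is read as a position within a group that has only three rows.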
Minimal example showing the unwanted behavior:
import pandas as pd

df = pd.DataFrame([[1,2,3], [2,4,6], [2,4,6]], columns=['a', 'b', 'c'])

# Drop the first row of every group (by position)
df.groupby('a').apply(lambda g: g.iloc[1:])

# Identical result, because plain slicing with integers is positional:
df.groupby('a').apply(lambda g: g[1:])

# Return every row of every group whose index label is 1 or higher.
# This is a contrived filter for a static index in a sorted df, but it shows the difference.
df.groupby('a').apply(lambda g: g.loc[1:])
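To make the difference concrete, the three variants can be compared by looking at which original row labels survive. This is only a sketch: group_keys=False and the [["b", "c"]] column selection are additions made here so that the result keeps the plain row labels and behaves the same across pandas versions.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6], [2, 4, 6]], columns=["a", "b", "c"])

# Group a == 1 holds row label 0; group a == 2 holds row labels 1 and 2.
grouped = df.groupby("a", group_keys=False)[["b", "c"]]

by_position = grouped.apply(lambda g: g.iloc[1:])  # skip first row of each group
by_plain_slice = grouped.apply(lambda g: g[1:])    # positional as well
by_label = grouped.apply(lambda g: g.loc[1:])      # keep labels >= 1

print(list(by_position.index))     # [2]
print(list(by_plain_slice.index))  # [2]
print(list(by_label.index))        # [1, 2]
```

The positional variants drop one row per group, while the label-based variant drops only the row labelled 0, regardless of which group it belongs to.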