Slicing within the groups of a DataFrameGroupBy object

Problem description

Python version: 3.7.3

Something similar was asked here, but it's not quite the same.

Based on a condition, I would like to retrieve only a subset of each group of the DataFrameGroupBy object. Basically, if a group's DataFrame starts with rows that contain only NaNs, I want to delete those rows. Otherwise, I want to keep the whole group intact. To accomplish this, I wrote a function delete_rows.

Grouped_object = df.groupby(['col1', 'col2']) 

def delete_rows(group):
  pos_min_notna = group[group['cumsum'].notna()].index[0]
  return group[pos_min_notna:]

new_df = Grouped_object.apply(delete_rows)

However, this function seems to only do the "job" for the first group in the DataFrameGroupBy object. What am I missing, so it does this for all the groups and "glues" the subsets together?

The function delete_rows was edited following the logic provided by Laurens Koppenol.

Tags: python, pandas, dataframe, subset, slice

Solution


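In pandas you have to be careful to distinguish index labels (loc) from integer positions (iloc). It is always a good idea to make explicit which one you mean. In the question, .index[0] returns a label, but the plain slice group[pos_min_notna:] treats it as a position. If df has the default integer index, the two only coincide in the first group, which would explain why only that group looks right.

This answer has a great overview of the differences.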

Grouped_object = df.groupby(['col1', 'col2'])

def delete_rows(group):
  # .index[0] returns an index label (what loc expects), not a position
  pos_min_notna = group[group['cumsum'].notna()].index[0]
  return group.loc[pos_min_notna:]  # make the label-based slice explicit

# By default the result carries the group keys as extra index levels
# on top of the original index
new_df = Grouped_object.apply(delete_rows)
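To make the difference concrete, here is a minimal sketch on made-up data (the values in col1, col2 and cumsum are assumptions, not the asker's data). It iterates over the groups and glues the pieces together with pd.concat instead of calling apply, so the result keeps the original index and does not depend on how a given pandas version treats the grouping columns inside apply. The guard for a group without any non-NaN value is also an addition.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups, each starting with NaN rows in 'cumsum'
df = pd.DataFrame({
    "col1": ["x", "x", "x", "y", "y", "y"],
    "col2": [1, 1, 1, 2, 2, 2],
    "cumsum": [np.nan, 1.0, 3.0, np.nan, np.nan, 5.0],
})

def delete_rows(group):
    not_na = group[group["cumsum"].notna()]
    if not_na.empty:
        # Assumption: a group with only NaNs is dropped entirely
        return group.iloc[0:0]
    first_label = not_na.index[0]    # an index label, not a position
    return group.loc[first_label:]   # label-based slice

# The second group has index labels 3, 4, 5 but positions 0, 1, 2
group_y = df[df["col1"] == "y"]
by_label = delete_rows(group_y)      # keeps the row with label 5
by_position = group_y[5:]            # positional slice past the end: empty

# Apply the function per group and glue the subsets together
pieces = [delete_rows(g) for _, g in df.groupby(["col1", "col2"])]
new_df = pd.concat(pieces)
print(new_df)
```

With this data, the plain slice silently returns nothing for the second group, while the loc version keeps the rows from the first non-NaN value onward.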

Minimal example showing the unwanted behavior:

import pandas as pd

df = pd.DataFrame([[1,2,3], [2,4,6], [2,4,6]], columns=['a', 'b', 'c'])

# Drop the first row of every group (by position)
df.groupby('a').apply(lambda g: g.iloc[1:])

# Same result: a plain integer slice is positional too
df.groupby('a').apply(lambda g: g[1:])

# Label-based: keep the rows of each group whose index label is 1 or higher.
# This is rarely what you want with a default index on a sorted df;
# it is only here to show the contrast.
df.groupby('a').apply(lambda g: g.loc[1:])
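To see which rows each variant keeps, the sketch below walks over the same groups by hand and records the surviving index labels. It iterates over the groupby object rather than using apply, purely to keep the output easy to read; the dictionary names are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6], [2, 4, 6]], columns=["a", "b", "c"])

positional = {}  # index labels kept by g.iloc[1:]
by_label = {}    # index labels kept by g.loc[1:]
for key, g in df.groupby("a"):
    positional[int(key)] = list(g.iloc[1:].index)
    by_label[int(key)] = list(g.loc[1:].index)

print(positional)  # group a=2 loses its first row (label 1)
print(by_label)    # group a=2 keeps both rows (labels 1 and 2)
```

Group a=2 holds the rows labelled 1 and 2. The positional slice drops the first of them, while the label slice keeps both, because both labels are 1 or higher.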


