python - Add a certain number of objects from one group to another group
Problem description
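I have a pandas DataFrame in which I chunk the object values that share the same type into groups of a fixed size (for example, 3). For instance, group ball_1 contains 3 unique objects of the same type: soccer, basket, and bouncy. The remaining objects go into group ball_2, which in this case holds only 1 object, tennis.
For groups with fewer than 3 unique objects, I want to fill them with the first k unique objects from the first group. For example, group ball_2 would be padded so that tennis is followed by soccer and basket from group ball_1. The goal is for every group to end up with the same number of unique objects.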
import pandas as pd

# chunk into groups of 3 unique objects per type
N = 3
g = df.groupby('type')['object'].transform(lambda x: pd.factorize(x)[0]) // N + 1
df['group'] = df['type'].str.cat(g.astype(str), sep='_')

# identify which groups need more objects
for name, batch in df.groupby('group'):
    need = N - batch['object'].nunique()
    if need <= 0:
        continue
    print('{} needs {} more objects'.format(name, need))
Current df (this toy dataset shows selected columns; the real dataset has many more)
type object index group
0 ball soccer 1 ball_1
1 ball soccer 2 ball_1
2 ball basket 1 ball_1
3 ball bouncy 1 ball_1
4 ball tennis 1 ball_2
5 ball tennis 2 ball_2
6 chair office 1 chair_1
7 chair office 2 chair_1
8 chair office 3 chair_1
9 chair lounge 1 chair_1
10 chair dining 1 chair_1
... ... ... ......
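To make the question reproducible, the toy frame can be rebuilt roughly as follows. This is a minimal sketch that uses only the rows visible in the table above (the elided rows are not reproduced); the names codes, counts, and short are introduced here for illustration and are not part of the original post.

```python
import pandas as pd

# Rows copied from the table above; the elided rows ("...") are left out.
df = pd.DataFrame({
    "type":   ["ball"] * 6 + ["chair"] * 5,
    "object": ["soccer", "soccer", "basket", "bouncy", "tennis", "tennis",
               "office", "office", "office", "lounge", "dining"],
    "index":  [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1],
})

N = 3  # unique objects per group

# factorize numbers the unique objects within each type in order of appearance
codes = df.groupby("type")["object"].transform(lambda s: pd.factorize(s)[0])
df["group"] = df["type"] + "_" + (codes // N + 1).astype(str)

# Count unique objects per group; groups below N are the ones to pad.
counts = df.groupby("group")["object"].nunique()
short = counts[counts < N]
print(short)
```

With these rows, only ball_2 falls short of N unique objects, which matches the situation described in the question.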
Desired df (objects have been added to group ball_2)
type object index group
0 ball soccer 1 ball_1
1 ball soccer 2 ball_1
2 ball basket 1 ball_1
3 ball bouncy 1 ball_1
4 ball tennis 1 ball_2
5 ball tennis 2 ball_2
6 ball soccer 1 ball_2
7 ball soccer 2 ball_2
8 ball basket 1 ball_2
9 chair office 1 chair_1
10 chair office 2 chair_1
11 chair office 3 chair_1
12 chair lounge 1 chair_1
13 chair dining 1 chair_1
... ... ... ......
Solution
You can try this:
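import numpy as np
import pandas as pd

N = 3  # required number of unique objects per group

def addfirstgroup(x):
    # codes 0..k-1 select the first k unique objects of the type's first group
    missing = np.arange(N - x['object'].nunique())
    typegroup = x['type'].iloc[0]
    first = df.loc[df.group.eq(f'{typegroup}_1')]
    msk = np.isin(first['object'].factorize()[0], missing)
    return pd.concat([x, first[msk]])

# relies on apply() passing the 'group' column to the function;
# recent pandas versions may warn about this behaviour
temp = (df.groupby('group')
          .apply(lambda x: addfirstgroup(x) if x['object'].nunique() < N else x)
          .drop(columns='group'))
groups = temp.index.get_level_values(0).to_frame().reset_index(drop=True)
pd.concat([temp.reset_index(drop=True), groups], axis=1)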
Output:
type object index group
0 ball soccer 1 ball_1
1 ball soccer 2 ball_1
2 ball basket 1 ball_1
3 ball bouncy 1 ball_1
4 ball tennis 1 ball_2
5 ball tennis 2 ball_2
6 ball soccer 1 ball_2
7 ball soccer 2 ball_2
8 ball basket 1 ball_2
9 chair office 1 chair_1
10 chair office 2 chair_1
11 chair office 3 chair_1
12 chair lounge 1 chair_1
13 chair dining 1 chair_1
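As an alternative, the same padding can be sketched with a plain loop over the groups instead of groupby.apply, which avoids depending on how apply treats the grouping column. This is not the answer's method, only an illustrative variant: the function name pad_groups and its parameter n are hypothetical, and it assumes the first group of every type is labelled <type>_1, as in the question.

```python
import pandas as pd

def pad_groups(df, n=3):
    """Pad every group with fewer than n unique objects using rows for the
    first k unique objects of the same type's first group (<type>_1)."""
    pieces = []
    for name, batch in df.groupby("group", sort=False):
        pieces.append(batch)
        k = n - batch["object"].nunique()
        first_name = f"{batch['type'].iloc[0]}_1"
        # Nothing to do if the group is full or is itself the donor group.
        if k <= 0 or name == first_name:
            continue
        first = df[df["group"] == first_name]
        donors = first["object"].drop_duplicates().head(k)
        # Copy all rows of the donor objects and relabel them with this group.
        pieces.append(first[first["object"].isin(donors)].assign(group=name))
    return pd.concat(pieces, ignore_index=True)

# Toy rows from the question, with the group labels already assigned.
df = pd.DataFrame({
    "type":   ["ball"] * 6 + ["chair"] * 5,
    "object": ["soccer", "soccer", "basket", "bouncy", "tennis", "tennis",
               "office", "office", "office", "lounge", "dining"],
    "index":  [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1],
    "group":  ["ball_1"] * 4 + ["ball_2"] * 2 + ["chair_1"] * 5,
})

out = pad_groups(df)
print(out)
```

The loop trades some speed for readability; on a frame with many groups the apply-based version may be preferable.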