Add a blank row after each unique column value

Problem description

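I am trying to add a blank row after each unique value in the Salary column (rows with the same value should stay together, with no blank row between them).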

Current input:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000
2   Jack     France   Marketing   13000
3   Geroge   UK       Accounts    11000
4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

Expected output:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000

2   Jack     France   Marketing   13000

3   Geroge   UK       Accounts    11000

4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

What I tried:

import pandas as pd

df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
                            3: 'Geroge',4: 'Steven',5: 'Mohammed'},
                   'Country': {0: 'USA',1: 'Egypt',2: 'France',
                               3: 'UK',4: 'India',5: 'Jordan'},
                   'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
                                  3: 'Accounts',4: 'Data',5: 'IT'},
                   'Salary': {0: 12000, 1: 12000, 2: 13000, 
                              3: 11000, 4: 10000, 5: 10000}})

df.index = range(0, 2*len(df), 2)
df2 = df.reindex(index=range(2*len(df)))

What I got (which is incorrect):

    Name      Country   Department  Salary
0   John      USA       Finance     12000.0
1   NaN       NaN       NaN         NaN
2   John      Egypt     Finance     12000.0
3   NaN       NaN       NaN         NaN
4   Jack      France    Marketing   13000.0
5   NaN       NaN       NaN         NaN
6   Geroge    UK        Accounts    11000.0
7   NaN       NaN       NaN         NaN
8   Steven    India     Data        10000.0
9   NaN       NaN       NaN         NaN
10  Mohammed  Jordan    IT          10000.0
11  NaN       NaN       NaN         NaN

Any help would be much appreciated.

Tags: python, pandas

Solution


国际大学联盟:

Try appending an empty DataFrame after each group while iterating over groupby():

I grouped by 'Department' here, but you can group by 'Salary' (which is what the question asks for) or any other column as needed.

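# Collect each group followed by a single all-NaN separator row
pieces = []
for _, group in df.groupby('Department', sort=False):
    pieces.append(group)
    pieces.append(pd.DataFrame([[float('NaN')] * len(group.columns)], columns=group.columns))

# Concatenate, then drop the trailing separator after the last group
df = pd.concat(pieces, ignore_index=True).iloc[:-1]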

Output df:

    Name        Country     Department  Salary
0   John        USA         Finance     12000.0
1   John        Egypt       Finance     12000.0
2   NaN         NaN         NaN         NaN
3   Jack        France      Marketing   13000.0
4   NaN         NaN         NaN         NaN
5   Geroge      UK          Accounts    11000.0
6   NaN         NaN         NaN         NaN
7   Steven      India       Data        10000.0
8   NaN         NaN         NaN         NaN
9   Mohammed    Jordan      IT          10000.0
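
Because the output above is grouped by Department, Steven and Mohammed (both with Salary 10000) end up separated, which differs from the expected output in the question. The following is a minimal sketch of the same groupby approach applied to the Salary column, assuming the sample data from the question. The helper name add_blank_rows is hypothetical and not part of pandas.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})


def add_blank_rows(frame, column):
    """Hypothetical helper: insert one all-NaN row after each group of `column`."""
    blank = pd.DataFrame([[float('nan')] * len(frame.columns)],
                         columns=frame.columns)
    pieces = []
    # sort=False keeps groups in order of first appearance
    for _, group in frame.groupby(column, sort=False):
        pieces.append(group)
        pieces.append(blank)
    # Skip the trailing blank row after the last group
    return pd.concat(pieces[:-1], ignore_index=True)


result = add_blank_rows(df, 'Salary')
print(result)
```

With this grouping, the blank rows fall after the two 12000 rows, after the 13000 row, and after the 11000 row, and the two 10000 rows stay together. As in the answer's output, the Salary column becomes float because it now has to hold NaN.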
