How to sort a DataFrame by two columns in Python?

Problem description

I have a dataframe df (TCP packets) with four columns: server, client, seq, and ack. For example:

server    client    seq         ack
A         B         207876062   2372538506
A         B         207876089   2372538616
B         A         2372538590  207876089
A         B         207876062   2372538590
B         A         2372538506  207876062

I want to sort by the columns seq and ack, one after the other:

server    client    seq         ack
A         B         207876062   2372538506
B         A         2372538506  207876062
A         B         207876062   2372538590
B         A         2372538590  207876089
A         B         207876089   2372538616

Is there a way to sort the rows into the correct order?

Thanks

Tags: python, pandas, dataframe

Solution


Assuming df is the dataframe you want to process, I would do it like this:

import pandas as pd

# Step 1: split the dataframe into two sub-dataframes, one per server
# (.copy() avoids SettingWithCopyWarning when a column is added later)
df_a = df[df['server'] == 'A'].copy()
df_b = df[df['server'] == 'B'].copy()

# Step 2: sort each sub-dataframe by seq, then ack
df_a = df_a.sort_values(by=['seq', 'ack'])
df_b = df_b.sort_values(by=['seq', 'ack'])

# Step 3: add a sorting key (1, 2, 3, ...) to each sub-dataframe
df_a['sorting_key'] = range(1, df_a.shape[0] + 1)
df_b['sorting_key'] = range(1, df_b.shape[0] + 1)

# Step 4: shift the key of the second dataframe by 0.5 so that its rows
# fall between consecutive rows of the first one
df_b['sorting_key'] = df_b['sorting_key'] + 0.5

# Step 5: concatenate the two dataframes and sort them by the sorting key
df_c = pd.concat([df_a, df_b]).sort_values(by=['sorting_key'])

# Step 6: reset the index and drop the helper column
df_c = df_c.reset_index(drop=True).drop(['sorting_key'], axis=1)
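To check the steps above against the sample data from the question, here is a minimal, self-contained sketch. The helper name interleave_two_servers and its parameters first and second are my own, chosen only for illustration; the logic is the same split, sort, key, shift, and concatenate sequence.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "server": ["A", "A", "B", "A", "B"],
    "client": ["B", "B", "A", "B", "A"],
    "seq": [207876062, 207876089, 2372538590, 207876062, 2372538506],
    "ack": [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})


def interleave_two_servers(df, first="A", second="B"):
    """Hypothetical helper wrapping steps 1 to 6 above."""
    # Steps 1 and 2: split by server, then sort each part by seq and ack.
    df_a = df[df["server"] == first].sort_values(by=["seq", "ack"]).copy()
    df_b = df[df["server"] == second].sort_values(by=["seq", "ack"]).copy()
    # Step 3: number the rows of each part 1, 2, 3, ...
    df_a["sorting_key"] = range(1, len(df_a) + 1)
    # Step 4: offset by 0.5 so each row of `second` lands right after
    # the row of `first` that has the same position.
    df_b["sorting_key"] = [k + 0.5 for k in range(1, len(df_b) + 1)]
    # Steps 5 and 6: merge, order by the key, and drop the helper column.
    out = pd.concat([df_a, df_b]).sort_values(by="sorting_key")
    return out.reset_index(drop=True).drop(columns="sorting_key")


result = interleave_two_servers(df)
print(result)
```

On this sample the rows come out alternating A, B, A, B, A, which is the order asked for in the question.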

Update

If you do not know how many servers there are, just add a loop like the following:


# Step 1: split the dataframe into one sub-dataframe per server
# (unique() keeps the order of first appearance; set() order is arbitrary)
sub_df = []
for e in df['server'].unique():
    sub_df.append(df[df['server'] == e])

# Step 2: sort each sub-dataframe by seq, then ack, and add a sorting key
sub_df_1 = []
for tdf in sub_df:
    tdf = tdf.sort_values(by=['seq', 'ack'])
    tdf['sorting_key'] = range(1, tdf.shape[0] + 1)
    sub_df_1.append(tdf)

# Step 3: shift the sorting key of the i-th dataframe by i / n, so the
# offsets are distinct and stay below 1 for any number of servers
n = len(sub_df_1)
sub_df_2 = []
for i, tdf in enumerate(sub_df_1):
    tdf['sorting_key'] = tdf['sorting_key'] + i / n
    sub_df_2.append(tdf)

# Step 4: concatenate all dataframes and sort them by the sorting key
df_c = pd.concat(sub_df_2).sort_values(by=['sorting_key'])

# Step 5: reset the index and drop the helper column
df_c = df_c.reset_index(drop=True).drop(['sorting_key'], axis=1)
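As a possible shortcut, the same interleaving can probably be done without explicit loops by numbering the rows of each server with groupby(...).cumcount(). This is only a sketch of an alternative, not the method above: the helper name interleave_by_server and the temporary column _rank are made up for illustration, and servers are ordered alphabetically within each rank rather than by order of appearance.

```python
import pandas as pd


def interleave_by_server(df):
    """Hypothetical helper: round-robin rows across any number of servers."""
    # Order rows by seq, then ack, within each server.
    ordered = df.sort_values(by=["server", "seq", "ack"])
    # cumcount() numbers the rows of each server 0, 1, 2, ... in that order.
    rank = ordered.groupby("server").cumcount()
    # Sort by rank first, then by server name, so rows with the same
    # position in their group sit next to each other.
    ordered = ordered.assign(_rank=rank).sort_values(by=["_rank", "server"])
    return ordered.drop(columns="_rank").reset_index(drop=True)


# Sample data copied from the question.
df = pd.DataFrame({
    "server": ["A", "A", "B", "A", "B"],
    "client": ["B", "B", "A", "B", "A"],
    "seq": [207876062, 207876089, 2372538590, 207876062, 2372538506],
    "ack": [2372538506, 2372538616, 207876089, 2372538590, 207876062],
})

df_c = interleave_by_server(df)
print(df_c)
```

Because the rank is an integer per group, there is no fractional offset to tune, so this version should behave the same however many servers appear in the data.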

Good luck

