How to add an index for each column/group

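Problem description

I need to reshape a DataFrame from the first format below into the second, where every nesting level gets its own row together with a count of the rows beneath it: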

| country   | county   | city   | street    |
|-----------|----------|--------|-----------|
| country 1 | county 1 | city 1 | street 1  |
| country 1 | county 1 | city 1 | street 2  |
| country 1 | county 1 | city 2 | street 3  |
| country 2 | county 2 | city 3 | street 4  |
| country 2 | county 2 | city 3 | street 5  |
| country 3 | county 3 | city 4 | street 6  |
| country 3 | county 4 | city 5 | street 7  |
| country 3 | county 4 | city 6 | street 8  |
| country 3 | county 4 | city 6 | street 9  |
| country 3 | county 4 | city 6 | street 10 |

| country   | county   | city   | street    | count |
|-----------|----------|--------|-----------|-------|
| country 1 |          |        |           | 3     |
|           | county 1 |        |           | 3     |
|           |          | city 1 |           | 2     |
|           |          |        | street 1  | 1     |
|           |          |        | street 2  | 1     |
|           |          | city 2 |           | 1     |
|           |          |        | street 3  | 1     |
| country 2 |          |        |           | 2     |
|           | county 2 |        |           | 2     |
|           |          | city 3 |           | 2     |
|           |          |        | street 4  | 1     |
|           |          |        | street 5  | 1     |
| country 3 |          |        |           | 5     |
|           | county 3 |        |           | 1     |
|           |          | city 4 |           | 1     |
|           |          |        | street 6  | 1     |
|           | county 4 |        |           | 4     |
|           |          | city 5 |           | 1     |
|           |          |        | street 7  | 1     |
|           |          | city 6 |           | 3     |
|           |          |        | street 8  | 1     |
|           |          |        | street 9  | 1     |
|           |          |        | street 10 | 1     |

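The number of columns may vary.

I have been handling the counts with multiple groupby calls and trying to do the formatting in plain Python, without success. Is there a way to do this with pandas alone?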

Tags: pandas, dataframe, pandas-groupby

Solution

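You can iterate over the columns themselves and rely on DataFrame.value_counts() to count at each nesting level. You need to work with the index while doing this so that everything re-aligns correctly later, but in the end you simply use pd.concat to glue the chunks together: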

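import pandas as pd

# test_df is the input DataFrame from the question, with its columns ordered
# from the outermost level (country) to the innermost one (street).
chunk_counts = []

for col in test_df.columns:
    # Count rows per unique combination of all columns up to and including `col`.
    counts = test_df.loc[:, :col].value_counts()

    # Pad the deeper levels with a single empty label so every chunk has the same depth.
    n_empty_levels = test_df.columns.size - test_df.columns.get_loc(col) - 1
    empty_levels = [[""]] * n_empty_levels

    new_levels = [*counts.index.levels, *empty_levels]
    new_index = pd.MultiIndex.from_product(new_levels, names=test_df.columns)

    # from_product also yields combinations that never occur; they become NaN
    # after reindexing and are removed by dropna() below.
    chunk_counts.append(counts.reindex(new_index))

final_series = (pd.concat(chunk_counts)
                .sort_index()
                .dropna()
                .astype(int)
                .rename("count"))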
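The loop above assumes that a DataFrame named test_df already exists; the answer does not show how it was built. The following is a minimal sketch that reconstructs it from the question's table (the `rows` list is an assumption based on that table) and shows what a single iteration of the loop counts, namely the rows under each distinct prefix of columns:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data; not part of the original answer.
rows = [
    ("country 1", "county 1", "city 1", "street 1"),
    ("country 1", "county 1", "city 1", "street 2"),
    ("country 1", "county 1", "city 2", "street 3"),
    ("country 2", "county 2", "city 3", "street 4"),
    ("country 2", "county 2", "city 3", "street 5"),
    ("country 3", "county 3", "city 4", "street 6"),
    ("country 3", "county 4", "city 5", "street 7"),
    ("country 3", "county 4", "city 6", "street 8"),
    ("country 3", "county 4", "city 6", "street 9"),
    ("country 3", "county 4", "city 6", "street 10"),
]
test_df = pd.DataFrame(rows, columns=["country", "county", "city", "street"])

# Slicing the columns up to "county" and counting gives the number of rows
# under each (country, county) pair.
county_counts = test_df.loc[:, :"county"].value_counts()
print(county_counts)

# At the deepest level every full row is its own group, so each count is 1.
street_counts = test_df.value_counts()
```

Because the slice `:col` is label-based, this relies on the column labels being unique and already ordered from the outermost to the innermost level.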

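If you print(final_series), the repr looks fine, but the MultiIndex does not actually contain empty entries beneath each nesting level (that is just how a MultiIndex is displayed). This becomes apparent once we call reset_index. To turn the Series back into a DataFrame while keeping the format the OP asked for, a few more adjustments are needed: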

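index_cols = list(final_series.index.names)
final_df = final_series.reset_index()
# Keep each label only on the first row where it appears; repeats become NaN.
final_df[index_cols] = final_df[index_cols].where(~final_df[index_cols].apply(pd.Series.duplicated))
# Replace the NaN placeholders with empty strings for display.
final_df = final_df.fillna("")

print(final_df)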

      country    county    city     street  count
0   country 1                                   3
1              county 1                         3
2                        city 1                 2
3                                 street 1      1
4                                 street 2      1
5                        city 2                 1
6                                 street 3      1
7   country 2                                   2
8              county 2                         2
9                        city 3                 2
10                                street 4      1
11                                street 5      1
12  country 3                                   5
13             county 3                         1
14                       city 4                 1
15                                street 6      1
16             county 4                         4
17                       city 5                 1
18                                street 7      1
19                       city 6                 3
20                               street 10      1
21                                street 8      1
22                                street 9      1
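One caveat: masking with pd.Series.duplicated works column by column, so it assumes every label occurs under a single parent only. If the same city name appeared in two different counties, its second occurrence would be blanked out as well. As a hedged alternative, not the approach used in the answer, the sketch below counts with groupby(...).size() for each column prefix and then keeps only the deepest non-empty label in each row. The function name `nested_counts` and the `sample` data are hypothetical, and the sketch assumes that no label is an empty string or missing and that no input column is called "count".

```python
import numpy as np
import pandas as pd


def nested_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per nesting level with the number of input rows beneath it."""
    cols = list(df.columns)
    chunks = []
    for depth in range(1, len(cols) + 1):
        # Number of rows under each distinct prefix made of the first `depth` columns.
        sizes = df.groupby(cols[:depth]).size().reset_index(name="count")
        chunks.append(sizes)

    # Columns missing from the shallower chunks come out of concat as NaN;
    # replace them with "" so the parent rows sort ahead of their children.
    stacked = pd.concat(chunks, ignore_index=True)
    stacked[cols] = stacked[cols].fillna("")
    stacked = stacked[cols + ["count"]].sort_values(cols, ignore_index=True)

    # Keep only the deepest non-empty label of each row and blank out its ancestors.
    depth_of_row = stacked[cols].ne("").sum(axis=1).to_numpy()
    keep = np.arange(len(cols)) == (depth_of_row[:, None] - 1)
    stacked[cols] = stacked[cols].where(keep, "")
    return stacked


# Hypothetical sample in which city "x" exists under two different counties.
sample = pd.DataFrame(
    [
        ("A", "a1", "x", "s1"),
        ("A", "a1", "x", "s2"),
        ("B", "b1", "x", "s3"),
    ],
    columns=["country", "county", "city", "street"],
)
result = nested_counts(sample)
print(result)
```

Since the function loops over however many columns the frame has, it also covers the case where the number of columns varies. The output ordering is plain lexicographic sorting, so "street 10" would still sort before "street 8", as in the output above.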
