Grouping a pandas DataFrame by multiple columns

Problem description

Suppose I have this in a pandas DataFrame:

+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Family            | Genus           | Species  | hasHair | laysEggs | canFly | hasLongHorns |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Bovidae           | Ovis            | Sheep    |    1    |     0    |    0   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Passeroidea       | Passeridae      | Sparrow  |    0    |     1    |    1   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Ornithorhynchidae | Ornithorhynchus | Platypus |    1    |     1    |    0   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Bovidae           | Ovis            | Mouflon  |    1    |     0    |    0   |       1      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Passeroidea       | Passeridae      | Passer   |    0    |     1    |    1   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+

I want to "summarize" the data to get the following:

+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Family            | Genus           | Species  | hasHair | laysEggs | canFly | hasLongHorns |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Bovidae           | Ovis            | Sheep    |    1    |     0    |    0   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
|                   |                 | Mouflon  |    1    |     0    |    0   |       1      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Ornithorhynchidae | Ornithorhynchus | Platypus |    1    |     1    |    0   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
| Passeroidea       | Passeridae      | Sparrow  |    0    |     1    |    1   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+
|                   |                 | Passer   |    0    |     1    |    1   |       0      |
+-------------------+-----------------+----------+---------+----------+--------+--------------+

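As you can see, this is a layout change to improve readability rather than actual data processing: the attribute values are unchanged. I just want to produce a report that is easier to read.

Right now, I'm not sure how to approach this. Can anyone offer some pointers?

Thanks!

R.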

Tags: python, pandas

Solution

To make it easier to read, you can create a MultiIndex and sort it:

df = df.set_index(['Family','Genus', 'Species']).sort_index()
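
For a fuller picture, here is a minimal sketch that rebuilds the sample data from the question and applies the same call. The variable name `report` is only illustrative. It relies on pandas' default display behavior, which leaves repeated outer index labels blank when printing a MultiIndex. Note that `sort_index()` sorts every index level, so species appear alphabetically within each genus (Mouflon before Sheep), which differs slightly from the order in the desired table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "Family": ["Bovidae", "Passeroidea", "Ornithorhynchidae", "Bovidae", "Passeroidea"],
        "Genus": ["Ovis", "Passeridae", "Ornithorhynchus", "Ovis", "Passeridae"],
        "Species": ["Sheep", "Sparrow", "Platypus", "Mouflon", "Passer"],
        "hasHair": [1, 0, 1, 1, 0],
        "laysEggs": [0, 1, 1, 0, 1],
        "canFly": [0, 1, 0, 0, 1],
        "hasLongHorns": [0, 0, 0, 1, 0],
    }
)

# Move the three label columns into a MultiIndex, then sort by all levels.
# The attribute values themselves are not changed, only the row order.
report = df.set_index(["Family", "Genus", "Species"]).sort_index()

# When printed, repeated Family/Genus labels are shown only once per group.
print(report)
```

Individual rows can still be looked up by the full key, for example `report.loc[("Bovidae", "Ovis", "Mouflon")]`.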
