python - Creating an uneven MultiIndex in pandas
Problem description
I have the following code:
import pandas as pd
IDX_VALS_BANKNOTER_PATRIMONY = [['PATRIMONY'],['GOLD']]
IDX_VALS_BANKNOTER_ASSETS = [['ASSETS'],['DEPOSITS', 'ADVANCES']]
IDX_VALS_BANKNOTER_LIABILITIES = [['LIABILITIES'], ['CLIENTS', 'SUPPLIERS']]
IDX_BANKNOTER_PATRIMONY = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_PATRIMONY)
IDX_BANKNOTER_ASSETS = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_ASSETS)
IDX_BANKNOTER_LIABILITIES = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_LIABILITIES)
IDX_BANKNOTER = IDX_BANKNOTER_PATRIMONY.append(IDX_BANKNOTER_ASSETS).append(IDX_BANKNOTER_LIABILITIES)
print(IDX_BANKNOTER)
which prints the following index:
MultiIndex([( 'PATRIMONY', 'GOLD'),
( 'ASSETS', 'DEPOSITS'),
( 'ASSETS', 'ADVANCES'),
('LIABILITIES', 'CLIENTS'),
('LIABILITIES', 'SUPPLIERS')],
)
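(I use .from_product() because I want to add more labels eventually.) My question is this: I would like to extend this MultiIndex with a third level, so that I get a MultiIndex that looks like this: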
'PATRIMONY', 'GOLD',
'ASSETS', 'DEPOSITS',
'ASSETS', 'ADVANCES',
'LIABILITIES', 'CLIENTS', 'Dr. Foo'
'LIABILITIES', 'CLIENTS', 'Dr. House'
'LIABILITIES', 'CLIENTS', 'Richard'
'LIABILITIES', 'SUPPLIERS', 'PORT1',
'LIABILITIES', 'SUPPLIERS', 'PORT2'
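This means the MultiIndex would be uneven: the third level is used only by 'LIABILITIES', and its labels differ between CLIENTS and SUPPLIERS, depending on the client or supplier names. I tried appending the following indexes: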
IDX_FIRST_EXTENSION_NAMES = [['LIABILITIES'], ['CLIENTS'], ['Dr. Foo', 'Dr. House', 'Richard']]
IDX_FIRST_EXTENSION = pd.MultiIndex.from_product(IDX_FIRST_EXTENSION_NAMES)
IDX_SECOND_EXTENSION_NAMES = [['LIABILITIES'], ['SUPPLIERS'], ['PORT1', 'PORT2']]
IDX_SECOND_EXTENSION = pd.MultiIndex.from_product(IDX_SECOND_EXTENSION_NAMES)
DESIRED_RESULT = IDX_BANKNOTER.append(IDX_FIRST_EXTENSION).append(IDX_SECOND_EXTENSION)
But what I get back is:
MultiIndex([( 'PATRIMONY', 'GOLD'),
( 'ASSETS', 'DEPOSITS'),
( 'ASSETS', 'ADVANCES'),
('LIABILITIES', 'CLIENTS'),
('LIABILITIES', 'CLIENTS'),
('LIABILITIES', 'CLIENTS'),
('LIABILITIES', 'SUPPLIERS'),
('LIABILITIES', 'SUPPLIERS')],
)
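I am quite new to pandas, and the documentation on MultiIndex has not helped (it has only a few examples of initializing a MultiIndex, and none of an uneven one). Does anyone have pointers? I am building this MultiIndex to make the corresponding data easier to work with, for example to access a specific client account with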
df['LIABILITIES']['CLIENTS']['(CLIENT NAME)']
or to get the sum of all values under ['CLIENTS']. Ideally, I would like to keep the DataFrame's columns as time labels.
Any help is appreciated, thanks.
Solution
Code:
import pandas as pd
IDX_VALS_BANKNOTER_PATRIMONY = [['PATRIMONY'],['GOLD'], ['']]
IDX_VALS_BANKNOTER_ASSETS = [['ASSETS'],['DEPOSITS', 'ADVANCES'], ['']]
IDX_BANKNOTER_PATRIMONY = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_PATRIMONY)
IDX_BANKNOTER_ASSETS = pd.MultiIndex.from_product(IDX_VALS_BANKNOTER_ASSETS)
IDX_BANKNOTER = IDX_BANKNOTER_PATRIMONY.append(IDX_BANKNOTER_ASSETS)
IDX_FIRST_EXTENSION_NAMES = [['LIABILITIES'], ['CLIENTS'], ['Dr. Foo', 'Dr. House', 'Richard']]
IDX_FIRST_EXTENSION = pd.MultiIndex.from_product(IDX_FIRST_EXTENSION_NAMES)
IDX_SECOND_EXTENSION_NAMES = [['LIABILITIES'], ['SUPPLIERS'], ['PORT1', 'PORT2']]
IDX_SECOND_EXTENSION = pd.MultiIndex.from_product(IDX_SECOND_EXTENSION_NAMES)
WANTED_RESULT = IDX_BANKNOTER.append(IDX_FIRST_EXTENSION).append(IDX_SECOND_EXTENSION)
print(WANTED_RESULT)
Output:
MultiIndex([( 'PATRIMONY', 'GOLD', ''),
( 'ASSETS', 'DEPOSITS', ''),
( 'ASSETS', 'ADVANCES', ''),
('LIABILITIES', 'CLIENTS', 'Dr. Foo'),
('LIABILITIES', 'CLIENTS', 'Dr. House'),
('LIABILITIES', 'CLIENTS', 'Richard'),
('LIABILITIES', 'SUPPLIERS', 'PORT1'),
('LIABILITIES', 'SUPPLIERS', 'PORT2')],
)
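The approach above works because every row of a MultiIndex has the same number of levels, so the two-level rows are padded with an empty string at the third level. In the question's attempt, three-level indexes were appended to a two-level one, and as the output there shows, the third level was dropped. The sketch below pads ragged tuples programmatically and then shows one possible way to use the result as the question describes. The level names, the time-label columns, and the numeric values are all assumptions made up for illustration; they do not come from the original post.

```python
import pandas as pd

# Ragged label tuples: only the LIABILITIES rows carry a third label.
ragged = [
    ("PATRIMONY", "GOLD"),
    ("ASSETS", "DEPOSITS"),
    ("ASSETS", "ADVANCES"),
    ("LIABILITIES", "CLIENTS", "Dr. Foo"),
    ("LIABILITIES", "CLIENTS", "Dr. House"),
    ("LIABILITIES", "CLIENTS", "Richard"),
    ("LIABILITIES", "SUPPLIERS", "PORT1"),
    ("LIABILITIES", "SUPPLIERS", "PORT2"),
]

# Pad every tuple with '' up to the deepest tuple, so all rows have equal length.
depth = max(len(t) for t in ragged)
padded = [t + ("",) * (depth - len(t)) for t in ragged]

# Level names are hypothetical; they only make the later lookups easier to read.
index = pd.MultiIndex.from_tuples(padded, names=["category", "account", "holder"])

# Hypothetical time labels as columns, with made-up values (row i holds [i, 10 * i]).
columns = ["2021-01", "2021-02"]
values = [[i, i * 10] for i in range(1, len(index) + 1)]

# Sorting the index keeps label-based lookups on the MultiIndex efficient.
df = pd.DataFrame(values, index=index, columns=columns).sort_index()

# One specific client account, returned as a Series over the time columns.
one_client = df.loc[("LIABILITIES", "CLIENTS", "Dr. Foo"), :]

# Sum of everything under CLIENTS, one total per time column.
clients_total = df.xs(
    ("LIABILITIES", "CLIENTS"), level=["category", "account"]
).sum()

print(index.nlevels)   # 3
print(one_client)
print(clients_total)
```

Selecting with .loc and a full tuple, rather than chaining df['LIABILITIES']['CLIENTS'][...], addresses the row index directly; chained square brackets on a DataFrame look up columns first, which would not match an index laid out along the rows as it is here.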