python - Accessing one level of a multi-index in Pandas
Problem description
I have a dataframe that seems like a simple use case for a MultiIndex: I have ISO week numbers and dates as the index, and I'd like to filter by a specific week. Following the instructions in the docs, it looks like I ought to be able to index just by passing a string of the week number. However, this raises a KeyError.
MCVE:
import numpy as np
import pandas as pd

data = {'foo': {('2016_32', '2016-08-07'): 0.14285714285714285,
('2016_32', '2016-08-08'): 0.14285714285714285,
('2016_32', '2016-08-09'): 0.14285714285714285,
('2016_32', '2016-08-10'): 0.14285714285714285,
('2016_32', '2016-08-11'): 0.14285714285714285,
('2016_32', '2016-08-12'): 0.14285714285714285,
('2016_32', '2016-08-13'): 0.14285714285714285,
('2016_36', '2016-09-04'): 0.14285714285714285,
('2016_36', '2016-09-05'): 0.14285714285714285,
('2016_36', '2016-09-06'): 0.14285714285714285,
('2016_36', '2016-09-07'): 0.14285714285714285,
('2016_36', '2016-09-08'): 0.14285714285714285,
('2016_36', '2016-09-09'): 0.14285714285714285},
'bar': {('2016_32', '2016-08-07'): np.nan,
('2016_32', '2016-08-08'): np.nan,
('2016_32', '2016-08-09'): np.nan,
('2016_32', '2016-08-10'): np.nan,
('2016_32', '2016-08-11'): np.nan,
('2016_32', '2016-08-12'): np.nan,
('2016_32', '2016-08-13'): np.nan,
('2016_36', '2016-09-04'): 0.0,
('2016_36', '2016-09-05'): 0.0,
('2016_36', '2016-09-06'): 0.0,
('2016_36', '2016-09-07'): 0.0,
('2016_36', '2016-09-08'): 0.0,
('2016_36', '2016-09-09'): 0.0}}
df = pd.DataFrame(data)
df['2016_32']
KeyError: '2016_32'
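The KeyError happens because on a DataFrame, df[key] looks up column labels, while '2016_32' is a row label in the first level of the index. A minimal sketch on a cut-down version of the MCVE frame (the smaller data set and its values are an assumption for brevity) that contrasts the column lookup with a row lookup through .loc:

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the MCVE: (week, date) tuples as the row MultiIndex
index = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-05')])
df = pd.DataFrame({'foo': [0.5, 0.5, 0.25, 0.25],
                   'bar': [np.nan, np.nan, 0.0, 0.0]}, index=index)

# df[...] searches the column labels ('foo', 'bar'), so a row label fails
try:
    df['2016_32']
except KeyError as err:
    print('KeyError:', err)

# .loc searches the row labels; a partial key matches the first index level
week = df.loc['2016_32']
print(week)
```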
Solution
In general, to select from a MultiIndex, use DataFrame.xs:
# level defaults to the first level, so it can be omitted
print (df.xs('2016_32'))
#select by second level
#print (df.xs('2016-09-07', level=1))
foo bar
2016-08-07 0.142857 NaN
2016-08-08 0.142857 NaN
2016-08-09 0.142857 NaN
2016-08-10 0.142857 NaN
2016-08-11 0.142857 NaN
2016-08-12 0.142857 NaN
2016-08-13 0.142857 NaN
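By default xs drops the level it matched, which is why the result above is indexed by date only. A sketch of keeping that level with the drop_level=False argument; it also names the levels so they can be selected by name instead of position (the names 'week' and 'date' and the small frame are illustrative assumptions, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical level names 'week' and 'date' added for illustration
index = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-05')],
    names=['week', 'date'])
df = pd.DataFrame({'foo': [0.5, 0.5, 0.25, 0.25],
                   'bar': [np.nan, np.nan, 0.0, 0.0]}, index=index)

# Default behaviour: the matched 'week' level is removed from the result
dropped = df.xs('2016_32', level='week')
# drop_level=False keeps both levels in the result's index
kept = df.xs('2016_32', level='week', drop_level=False)
# Select on the second level by its name
one_day = df.xs('2016-09-04', level='date')

print(dropped.index.nlevels, kept.index.nlevels)
print(one_day)
```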
Or use loc:
# no extra arguments are needed to select from the first level
print (df.loc['2016_32'])
# to select from the second level, pass axis=0 and use : to take all values of the first level
print (df.loc(axis=0)[:, '2016-09-07'])
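Two other ways to select on the second level, sketched on a small stand-in frame: pd.IndexSlice, which is a more readable spelling of the (slice(None), label) tuple passed to loc, and a boolean mask built from index.get_level_values, which keeps both levels and accepts any condition (the isin filter and the dates used are illustrative choices):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('2016_32', '2016-08-07'), ('2016_32', '2016-08-08'),
     ('2016_36', '2016-09-04'), ('2016_36', '2016-09-05')])
df = pd.DataFrame({'foo': [0.5, 0.5, 0.25, 0.25],
                   'bar': [np.nan, np.nan, 0.0, 0.0]}, index=index)

idx = pd.IndexSlice
# All weeks (:) on level 0, one date on level 1; the trailing : keeps all columns
by_slice = df.loc[idx[:, '2016-09-04'], :]

# Boolean mask over the second level's values; both index levels are kept
mask = df.index.get_level_values(1).isin(['2016-08-08', '2016-09-05'])
by_mask = df[mask]

print(by_slice)
print(by_mask)
```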
Differences between MultiIndex selection in columns and in rows:
np.random.seed(235)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E','F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print (df)
    bar     baz     foo     qux
    one two one two one two one two
A E   8   1   5   8   3   5   3   3
  F   3   1   3   6   6   1   0   2
B E   0   3   1   7   0   0   8   2
  F   6   7   7   4   2   7   7   5
C E   7   3   1   7   3   9   7   3
  F   8   2   0   8   5   2   2   0
# select the first-level column label 'bar'
print (df['bar'])
    one two
A E   8   1
  F   3   1
B E   0   3
  F   6   7
C E   7   3
  F   8   2
# select column 'bar', then 'one' from the result (chained indexing)
print (df['bar']['one'])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: one, dtype: int32
# select a single column with a (first-level, second-level) tuple
print (df[('bar', 'one')])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: (bar, one), dtype: int32
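Chaining df['bar']['one'] works for reading, but each [] builds an intermediate object; a single .loc call with the column tuple does the same lookup in one step. To pick a second-level column label across every first-level group, xs accepts axis=1. A sketch on a small stand-in frame (the deterministic values 0..15 are an assumption, not the random data above):

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B'], ['E', 'F']])
# Deterministic values 0..15 instead of random integers
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# One .loc call: all rows, the single column ('bar', 'one')
bar_one = df.loc[:, ('bar', 'one')]

# xs on axis=1: every column whose second level is 'one'; that level is dropped
all_one = df.xs('one', axis=1, level=1)

print(bar_one)
print(all_one)
```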
For selecting rows (a MultiIndex in the index), use loc:
print (df.loc['A'])
  bar     baz     foo     qux
  one two one two one two one two
E   8   1   5   8   3   5   3   3
F   3   1   3   6   6   1   0   2
print (df.loc['A'].loc['F'])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: F, dtype: int32
print (df.loc[('A', 'F')])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: (A, F), dtype: int32
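A row tuple and a column tuple can be combined in one .loc call to reach a single cell, and xs with level=1 selects a second-level row label (here 'F') across all first-level groups without writing a slice(None). A sketch on the same kind of deterministic stand-in frame (values 0..15 are an assumption):

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B'], ['E', 'F']])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Row tuple and column tuple together select one scalar
cell = df.loc[('A', 'F'), ('bar', 'one')]

# Every row whose second index level is 'F'; the matched level is dropped
f_rows = df.xs('F', level=1)

print(cell)
print(f_rows)
```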