python - Reindex a DataFrame indexed by bins
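Problem description
After applying groupby, some bins do not appear in the index or columns. I want to add zero-filled rows and columns for those missing bins: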
import numpy as np
import pandas as pd
bins = np.arange(-0.1, 2, 0.1)
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 101
names = np.random.randint(0, 101, 1000)
a = np.random.random(1000)
b = np.random.random(1000)
matrix = pd.DataFrame([names, pd.cut(a, bins), pd.cut(b, bins)]).T
matrix.columns = ['names', 'a', 'b']
matrix = matrix.groupby(['a', 'b']).count()
matrix.reset_index(inplace=True)
matrix = matrix.pivot(index='a', columns='b', values='names').fillna(0)
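Solution
Assign the output of pandas.cut to a variable so that you can access its categories attribute, which lists every bin, including the empty ones: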
bins = np.arange(-0.1, 2, 0.1)
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 101
names = np.random.randint(0, 101, 1000)
a = np.random.random(1000)
b = np.random.random(1000)
##############################
# Use pd.cut like this
a_bins = pd.cut(a, bins)
b_bins = pd.cut(b, bins)
##############################
matrix = pd.DataFrame([names, a_bins, b_bins]).T
matrix.columns = ['names', 'a', 'b']
matrix = matrix.groupby(['a', 'b']).count()
matrix.reset_index(inplace=True)
matrix = matrix.pivot(index='a', columns='b', values='names').fillna(0)
##################################################
# Reindex with this
matrix = matrix.reindex(index=a_bins.categories,
                        columns=b_bins.categories,
                        fill_value=0)
##################################################
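As a possible alternative to reindexing afterwards, the sketch below keeps the binned columns as categoricals and asks groupby to include unobserved categories. This is not the approach from the answer above. The tiny arrays and the names `df` and `counts` are made up for illustration, and it assumes a pandas version that supports the `observed` argument of `groupby`.

```python
import numpy as np
import pandas as pd

# Small, hand-picked illustrative data (hypothetical, not from the question)
bins = np.array([0.0, 0.5, 1.0, 1.5])
a = np.array([0.1, 0.2, 0.7])
b = np.array([0.3, 0.8, 0.9])

# pd.cut returns a Categorical; .categories holds every bin, even empty ones
a_bins = pd.cut(a, bins)
b_bins = pd.cut(b, bins)

# Building the frame from a dict keeps the categorical dtype of each column
df = pd.DataFrame({"a": a_bins, "b": b_bins})

# observed=False keeps unobserved category combinations, with a size of 0
counts = df.groupby(["a", "b"], observed=False).size().unstack(fill_value=0)

print(counts)
```

Here the bin (1.0, 1.5] contains no data, yet it still shows up as a row and a column of zeros, so no separate reindex step is needed in this variant.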