python - 附加不存在的行数据时,来自 pd.Crosstab 的 Multiindex 问题
问题描述
大家早,
星期五快乐。我有一些 excel 输出,通过交叉表命令显示Client
它们Sector
和它们的计数。Result
这很好用:
dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = { 'Client': ['AAA', 'AAA', 'AAA',
'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
'CCC',
'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
'Sector': ['GOV', 'GOV', 'CORP',
'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
'GOV',
'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
'No RFQ',
'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
}
df = pd.DataFrame.from_dict(df)
# print(df)
vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']
df = (pd.crosstab([df.Client,
df.Sector],
df.Result,
margins=True,
margins_name='Total_Result_Per_Client')
.drop('Total_Result_Per_Client')
.reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:\Temp\Out_Data_EOM_Key_Clients_Corp.xlsx')
excel提取符合要求。
一个额外的请求已经出现,以添加当前月份数据中不存在但可能在未来月份中的所有其他可能的客户。对于Client
没有数据的每个在交叉表中添加一行,并为每个字段插入;#NA
. 最终输出将是:
我希望通过以下方式添加这些行:
# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)
问题是交叉表的结果是多索引的。我尝试使用扁平化交叉表输出,df = df.stack([0]).reset_index()
但这完全改变了结构并完全偏离了最终输出。我现在得到TypeError: Expected tuple, got str
任何帮助,将不胜感激。
解决方案
你可以试试reindex
#here E and F (l) you can get it by
cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))
推荐阅读
- python - 如何评估具有可变内积分上限的双积分
- node.js - 猫鼬模型限制
- segmentation-fault - Fortran 中的段错误的玩具示例
- c++ - 如何只读取最后一行的开头?
- discord.js - 我如何设置发送模因的限制(discord.js 机器人)
- c - 删除C中特殊指标之间的子字符串
- php - 从图像 URL WordPress 中删除图像大小
- azure-application-insights - 我不理解带有“sum”函数的 Azure App Insights Metrics 图?
- git - github 是否也会自动将 PR 合并到父分支?
- flutter - 在非相邻选项卡上使用 TabBar 进行导航时的奇怪行为