Calculating from segments of a DataFrame (loop)

Problem description

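I have two DataFrames, one short and one long. I want to split the long one into chunks and compare each chunk with the short one using the correlation coefficient.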

The splitting works fine. However, when I use the chunks in the calculation, it returns NaN.

import pandas as pd

data_a = {'ID': ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"], 
'Unit_Weight': [178,153,193,195,214,157,205,212,219,166,217,186,170,207,204]}

df_a = pd.DataFrame(data_a)

data_b = {'ID': ["b1","b2","b3","b4","b5"], 
'Unit_Weight': [128,123,123,125,204]}

df_b = pd.DataFrame(data_b)

size = 5      # number of rows per chunk of the long DataFrame
list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].corr(df_b['Unit_Weight'])
    print(corr_e)

Output:

0.6797202605786716
nan
nan

What is going wrong, and how can I fix it? Thank you.

P.S. These are the results calculated by hand:

0.6797202605786716
-0.5501914564062937
0.2653370297540246

   ID  Unit_Weight
0  a1          178
1  a2          153
2  a3          193
3  a4          195
4  a5          214
    ID  Unit_Weight
5   a6          157
6   a7          205
7   a8          212
8   a9          219
9  a10          166
     ID  Unit_Weight
10  a11          217
11  a12          186
12  a13          170
13  a14          207
14  a15          204

Tags: python, pandas, loops, dataframe

Solution


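Series.corr aligns the two Series on their index labels before computing the coefficient. Only the first chunk shares the labels 0 to 4 with df_b; the other chunks are labelled 5 to 9 and 10 to 14, so nothing overlaps and the result is NaN. Both Series need the same index, so call reset_index with drop=True on each chunk: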

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight'])
    print(corr_e)

0.6797202605786716
-0.5501914564062937
0.26533702975402457
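
As an alternative sketch, the comparison can be made purely positional by dropping the index altogether. The example below assumes NumPy is available and uses np.corrcoef on the raw values; the names results and overlaps are illustrative only and not part of the original question. It also records how many index labels each chunk shares with df_b, which shows why the second and third chunks produced NaN.

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212,
                                     219, 166, 217, 186, 170, 207, 204]})
df_b = pd.DataFrame({'Unit_Weight': [128, 123, 123, 125, 204]})

size = 5          # rows per chunk
results = []      # one correlation coefficient per chunk (illustrative name)
overlaps = []     # number of index labels shared with df_b (illustrative name)

for start in range(0, len(df_a), size):
    chunk = df_a['Unit_Weight'].iloc[start:start + size]
    # Labels present in both the chunk and df_b; Series.corr only uses these
    overlaps.append(len(chunk.index.intersection(df_b.index)))
    # NumPy arrays carry no index labels, so values are paired by position
    r = np.corrcoef(chunk.to_numpy(), df_b['Unit_Weight'].to_numpy())[0, 1]
    results.append(r)

print(overlaps)
print(results)
```

This relies on every chunk having the same length as df_b; if the long DataFrame's length is not a multiple of size, the last chunk would be shorter and np.corrcoef would raise an error.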
