Problem getting a Pandas correlation

Problem description

I have this code:

import pandas as pd

data = pd.read_csv("out.csv")
df = data[['created_at', 'ticker', 'close']]
print(df)
print(df.corr())

out.csv looks like this:

created_at,ticker,adj_close,close,high,low,open,volume
2020-06-02 09:30:00-04:00,A,90.33000183105469,90.33000183105469,90.41000366210938,89.94999694824219,90.0,45326.0
2020-06-02 09:31:00-04:00,A,90.2300033569336,90.2300033569336,90.2300033569336,90.22000122070312,90.22000122070312,709.0
2020-06-08 15:56:00-04:00,ZYXI,22.899900436401367,22.899900436401367,22.959999084472656,22.829999923706055,22.959999084472656,5304.0
2020-06-08 15:57:00-04:00,ZYXI,22.920000076293945,22.920000076293945,22.950000762939453,22.889999389648438,22.899999618530273,5317.0
2020-06-08 15:58:00-04:00,ZYXI,22.860000610351562,22.860000610351562,22.93000030517578,22.860000610351562,22.90999984741211,10357.0

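I want to see a correlation matrix between tickers, based on how their closing prices change over time, which is why I included the created_at column. However, when I run print(df.corr()), I only see the result below, and I don't know why.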

       close
close    1.0

Tags: python, pandas

Solution


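Found the answer at https://www.interviewqs.com/blog/py_stock_correlation: the data has to be pivoted so that each ticker becomes its own column of closing prices before calling corr().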
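The reason the original call returned only close is that corr() works on numeric columns, and created_at and ticker are read from the CSV as text. Older pandas versions silently drop such columns; pandas 2.0 and later raise an error instead unless the non-numeric columns are excluded. The following is a minimal sketch of that diagnosis, assuming a hypothetical two-row excerpt in the same layout as out.csv (values shortened for readability):

```python
import io
import pandas as pd

# Hypothetical two-row excerpt in the same layout as out.csv (values shortened).
csv_text = """created_at,ticker,close
2020-06-02 09:30:00-04:00,A,90.33
2020-06-02 09:31:00-04:00,A,90.23
"""
df = pd.read_csv(io.StringIO(csv_text))

# Only numeric columns can take part in a Pearson correlation.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# Correlating just the numeric part gives the 1x1 matrix from the question.
print(df[numeric_cols].corr())
```

In the long format, all tickers share the single close column, so there is nothing for corr() to compare it against. Reshaping the data so that each ticker is its own numeric column, as in the code that follows, fixes that.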

import pandas as pd

data = pd.read_csv("out.csv")
dfdata = data[['created_at', 'ticker', 'close']]
# Reshape so each ticker becomes its own column of closing prices, indexed by timestamp
df_pivot = dfdata.pivot(index='created_at', columns='ticker', values='close')
print("loaded df")
# Pairwise Pearson correlation between the ticker columns
corr_df = df_pivot.corr(method='pearson')
# Drop the 'ticker' axis labels so the matrix prints cleanly
corr_df.index.name = None
corr_df.columns.name = None
print(corr_df.head(10))
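To try the pivot approach without out.csv, here is a small self-contained sketch. The tickers AAA and BBB and their prices are made up for illustration; the only assumption carried over from the question is the created_at, ticker, close layout.

```python
import io
import pandas as pd

# Hypothetical data: two made-up tickers that share the same three timestamps.
csv_text = """created_at,ticker,close
2020-06-02 09:30:00-04:00,AAA,10.0
2020-06-02 09:31:00-04:00,AAA,11.0
2020-06-02 09:32:00-04:00,AAA,12.0
2020-06-02 09:30:00-04:00,BBB,24.0
2020-06-02 09:31:00-04:00,BBB,22.0
2020-06-02 09:32:00-04:00,BBB,20.0
"""
data = pd.read_csv(io.StringIO(csv_text))

# Long -> wide: one row per timestamp, one column per ticker.
wide = data.pivot(index="created_at", columns="ticker", values="close")

# Pairwise Pearson correlation between the ticker columns.
corr = wide.corr(method="pearson")
print(corr)
```

Here AAA rises while BBB falls in lockstep, so their off-diagonal correlation is -1. Note that corr() only uses timestamps where both tickers have a value. In the five rows of out.csv shown in the question, A and ZYXI have no timestamps in common, so for that excerpt alone their pairwise correlation would come out as NaN.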
