Python R^2 calculation error

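Problem description

I'm trying to do some basic analysis on data in a csv file. Each row has a timestamp and a value for both "Test A" and "Test B". [Image: sample of the csv data]

I have the averages for Test A and Test B, and the difference between them. What I really need is the r^2 value, to see how well the two tests correlate. I know a very simple way to do this in Excel, but I have a lot of data, so it needs to be done in code. The part of my code that calculates r^2 returns this error:

TypeError: unsupported operand type(s) for ** or pow(): 'LinregressResult' and 'int'

Could this be because the column data is in float64 format? [Image: TypeError message]

Ideally, I'm also looking for a way to analyze only part of the data: I'd like to analyze it hour by hour (45 data points per hour). Does anyone have a way to include only a specific range of rows?

Many thanks!!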

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print('Test A Average:', round(Test_A, 4))

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print('Test B Average:', round(Test_B, 4))

# Calculate the difference between the tests
Error = Test_A - Test_B
print('Error (difference between averages):', round(abs(Error), 4))

# Work out the r2 value between the two tests
r_value = stats.linregress(data_A, data_B)
print("r_value:", r_value)
print("R-squared:", r_value**2)  # this line raises the TypeError

print(data_input['Test A'].dtypes)

Tags: python, excel, pandas, dataframe, statistics

Solution


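I've copied your snippet with the fix applied, so it should work for you now. The problem is that linregress returns several values bundled together (slope, intercept, r-value, p-value and standard error), so the result cannot be squared directly. Unpack them into a comma-separated list of names on the left of the equals sign, even if you don't use them all; alternatively, keep the result object and read its rvalue attribute. The float64 dtype of the columns is not the cause.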
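Before the full script, here is a minimal sketch of the same point on its own. The `x` and `y` values are made up for illustration and are not from your csv; it assumes a SciPy version whose result object exposes the `rvalue` attribute.

```python
from scipy import stats

# Made-up sample data, for illustration only (not from the question's csv)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# linregress returns one result object holding several values
result = stats.linregress(x, y)

# Option 1: unpack the five values in order
slope, intercept, r_value, p_value, std_err = result

# Option 2: read the correlation coefficient by attribute name
r_squared = result.rvalue ** 2

print("R-squared:", r_value ** 2)

# Squaring the whole result object reproduces the TypeError from the question
try:
    result ** 2
except TypeError as exc:
    print("TypeError:", exc)
```

Either option gives the same number; unpacking is what the corrected script uses.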

import pandas as pd
from scipy import stats

# Read the file in csv 
data_input = pd.read_csv("StackOF_r2.csv", low_memory=False)

#Output the number of rows
print("Total rows: {0}".format(len(data_input)))

# See which headers are available
print(list(data_input))

# Get the data from the data columns
data_A = data_input['Test A']
data_B = data_input['Test B']

# Average the data for Test A
Test_A = data_input['Test A'].mean()
print('Test A Average:', round(Test_A, 4))

# Average the data for Test B
Test_B = data_input['Test B'].mean()
print('Test B Average:', round(Test_B, 4))

# Calculate the difference between the tests
Error = Test_A - Test_B
print('Error (difference between averages):', round(abs(Error), 4))

# Work out the r2 value between the two tests
##### This is the correction #####
# linregress returns five values; unpack them so r_value is a plain float
slope, intercept, r_value, p_value, std_err = stats.linregress(data_A, data_B)
print("r_value:", r_value)
print("R-squared:", r_value**2)

print(data_input['Test A'].dtypes)
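
For the second part of the question (analyzing the data hour by hour), one possible approach is to split the rows into consecutive blocks of 45 and run linregress on each block. This is only a sketch: it assumes the rows are in time order with exactly 45 rows per hour and no gaps, it uses a synthetic DataFrame in place of the csv, and the names `POINTS_PER_HOUR`, `block`, `r_squared` and `hourly_r2` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

POINTS_PER_HOUR = 45  # figure taken from the question; adjust to your data

# Synthetic stand-in for the csv (two hours of data), for illustration only
rng = np.random.default_rng(0)
test_a = rng.normal(10.0, 1.0, size=2 * POINTS_PER_HOUR)
data_input = pd.DataFrame({
    "Test A": test_a,
    "Test B": test_a + rng.normal(0.0, 0.5, size=test_a.size),
})

# Label each row with the index of the 45-row block it falls in
block = np.arange(len(data_input)) // POINTS_PER_HOUR


def r_squared(chunk):
    """Return r^2 between Test A and Test B for one block of rows."""
    return stats.linregress(chunk["Test A"], chunk["Test B"]).rvalue ** 2


# One r^2 value per block, keyed by block index (0 = first hour)
hourly_r2 = {
    hour: r_squared(chunk) for hour, chunk in data_input.groupby(block)
}
for hour, value in hourly_r2.items():
    print("Hour block {0}: R-squared = {1:.4f}".format(hour, value))
```

To look at a single range of rows instead, slicing by position works too, for example `data_input.iloc[0:45]` for the first 45 rows. If some hours have missing rows, grouping on the timestamp column rather than on the row position would be the safer choice.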
