python - Statsmodels OLS terms undefined
Problem description
I'm trying to do a basic linear regression example, and I have an example dataset with the columns sepal_length, sepal_width, petal_length, and petal_width. However, if my R-style formula has more terms than "sepal_length ~ petal_length", for example "sepal_length ~ petal_length + sepal_width + petal_width", I get the error NameError: name 'sepal_width' is not defined. This happens with any term where I use the + operator to add another column from the dataset. The columns work if I use them on their own. What am I doing wrong?
Here is the code:
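import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
irises = pd.read_csv("data/iris.csv")
model1 = sm.OLS.from_formula("sepal_length ~ petal_length", data=irises).fit()
print(model1.summary())
# Predict over an evenly spaced grid of petal_length values
xs = pd.DataFrame({'petal_length': np.linspace(irises.petal_length.min(), irises.petal_length.max(), 100)})
ys = model1.predict(xs)
# Scatter the raw data and overlay the fitted line
sns.scatterplot(x='petal_length', y='sepal_length', data=irises)
plt.plot(xs, ys, color='black', linewidth=4)
plt.show()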
For example, this works:
model1 = sm.OLS.from_formula("sepal_length ~ petal_length", data=irises).fit()
this doesn't work:
model1 = sm.OLS.from_formula("sepal_length ~ petal_length + sepal_width", data=irises).fit()
I get the error that sepal_width is not defined, and I get the same error for any term I add like this.
But this does work:
model1 = sm.OLS.from_formula("sepal_length ~ sepal_width", data=irises).fit()
and so does this:
model1 = sm.OLS.from_formula("sepal_length ~ petal_length + np.power(petal_length, 2)", data=irises).fit()
In essence, I'm trying to use more than one independent variable in sm.OLS.from_formula.
Solution