Statsmodels OLS terms undefined

Problem description

I'm trying to do a basic linear regression example, and I have an example dataset with the columns sepal_length, sepal_width, petal_length, and petal_width. However, if my R-style formula contains anything more than "sepal_length ~ petal_length", such as "sepal_length ~ petal_length + sepal_width + petal_width", I get the error NameError: name 'sepal_width' is not defined. This happens with any term where I use the + operator to add a third column from the dataset. The columns work if I add them independently. What am I doing wrong?

Here is the code:

irises = pd.read_csv("data/iris.csv")
model1 = sm.OLS.from_formula("sepal_length ~ petal_length", data=irises).fit()
print(model1.summary())
xs = pd.DataFrame({'petal_length': np.linspace(irises.petal_length.min(), irises.petal_length.max(), 100)})
ys = model1.predict(xs)
sns.scatterplot(x='petal_length', y='sepal_length', data=irises)
plt.plot(xs, ys, color='black', linewidth=4)
plt.show()

For example,

this works:

model1 = sm.OLS.from_formula("sepal_length ~ petal_length", data=irises).fit()

this doesn't work:

model1 = sm.OLS.from_formula("sepal_length ~ petal_length + sepal_width", data=irises).fit()

I get the error NameError: name 'sepal_width' is not defined, and I get the same error for any term I add this way.

but this does work:

model1 = sm.OLS.from_formula("sepal_length ~ sepal_width", data=irises).fit()

and so does this:

model1 = sm.OLS.from_formula("sepal_length ~ petal_length + np.power(petal_length, 2)", data=irises).fit()

In essence I'm trying to use more than two independent variables in sm.OLS.from_formula.

Tags: python, statsmodels

Solution


Try import statsmodels.api as sm; that got it working for me.

