pyGAM: `y data is not in domain of logit link function`

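Problem description

I am trying to find out how strongly the chemical properties in the wine dataset influence its quality attribute.

Error:

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]

Code: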

import pandas as pd
from pygam import LogisticGAM

white_data = pd.read_csv("winequality-white.csv", sep=';')

X = white_data[[
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
    "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"
]]

# describe is referenced without parentheses, so this prints the bound method
print(X.describe)

y = pd.Series(white_data["quality"])

# white_quality is not defined in this snippet; the output below shows it holds the "quality" column
print(white_quality.describe)

white_gam = LogisticGAM().fit(X, y)

Output of the code above:

<bound method NDFrame.describe of       fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0               7.0              0.27         0.36            20.7      0.045   
1               6.3              0.30         0.34             1.6      0.049   
2               8.1              0.28         0.40             6.9      0.050   
3               7.2              0.23         0.32             8.5      0.058   
4               7.2              0.23         0.32             8.5      0.058   
...             ...               ...          ...             ...        ...   
4893            6.2              0.21         0.29             1.6      0.039   
4894            6.6              0.32         0.36             8.0      0.047   
4895            6.5              0.24         0.19             1.2      0.041   
4896            5.5              0.29         0.30             1.1      0.022   
4897            6.0              0.21         0.38             0.8      0.020   

      free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                    45.0                 170.0  1.00100  3.00       0.45   
1                    14.0                 132.0  0.99400  3.30       0.49   
2                    30.0                  97.0  0.99510  3.26       0.44   
3                    47.0                 186.0  0.99560  3.19       0.40   
4                    47.0                 186.0  0.99560  3.19       0.40   
...                   ...                   ...      ...   ...        ...   
4893                 24.0                  92.0  0.99114  3.27       0.50   
4894                 57.0                 168.0  0.99490  3.15       0.46   
4895                 30.0                 111.0  0.99254  2.99       0.46   
4896                 20.0                 110.0  0.98869  3.34       0.38   
4897                 22.0                  98.0  0.98941  3.26       0.32   

      alcohol  
0         8.8  
1         9.5  
2        10.1  
3         9.9  
4         9.9  
...       ...  
4893     11.2  
4894      9.6  
4895      9.4  
4896     12.8  
4897     11.8  

[4898 rows x 11 columns]>
<bound method NDFrame.describe of 0       6
1       6
2       6
3       6
4       6
       ..
4893    6
4894    5
4895    6
4896    7
4897    6
Name: quality, Length: 4898, dtype: int64>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-e1c5720823a6> in <module>
     16 print(white_quality.describe)
     17 
---> 18 white_gam = LogisticGAM().fit(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/pygam.py in fit(self, X, y, weights)
    893 
    894         # validate data
--> 895         y = check_y(y, self.link, self.distribution, verbose=self.verbose)
    896         X = check_X(X, verbose=self.verbose)
    897         check_X_y(X, y)

~/miniconda3/lib/python3.7/site-packages/pygam/utils.py in check_y(y, link, dist, min_samples, verbose)
    227                              .format(link, get_link_domain(link, dist),
    228                                      [float('%.2f'%np.min(y)),
--> 229                                       float('%.2f'%np.max(y))]))
    230     return y
    231 

ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]

Files (I am using a Jupyter Notebook, but I don't think you need it): https://drive.google.com/drive/folders/1RAj2Gh6WfdzpwtgbMaFVuvBVIWwoTUW5?usp=sharing

Tags: python, pygam

Solution


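You probably want to use LinearGAM here. LogisticGAM is meant for classification tasks: its logit link expects y in [0, 1], while quality ranges from 3 to 9.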
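As a rough illustration of that suggestion, the sketch below shows two possible routes: fitting a regression GAM to the raw scores, or turning the scores into 0/1 labels first if a classifier is really what is wanted. The helper names `fit_quality_gam` and `binarize_quality` and the cut-off of 7 are assumptions made up for this example; they are not part of pyGAM or of the original question.

```python
import numpy as np


def fit_quality_gam(X, y):
    """Fit a regression GAM to a numeric target such as the 3-9 quality score."""
    # Imported lazily so the helper below can be used without pygam installed.
    from pygam import LinearGAM

    # LinearGAM models a normally distributed target with an identity link,
    # so y is not restricted to [0, 1].
    return LinearGAM().fit(np.asarray(X, dtype=float), np.asarray(y, dtype=float))


def binarize_quality(y, threshold=7):
    """Turn scores into 0/1 labels (1 = score >= threshold).

    The threshold of 7 is an arbitrary, hypothetical choice for "good" wine.
    """
    return (np.asarray(y) >= threshold).astype(int)


# Small stand-in for the "quality" column; the real values come from the CSV.
scores = np.array([6, 6, 5, 7, 9, 3])
labels = binarize_quality(scores)

# The labels now lie inside [0, 1], the domain the logit link expects.
print(labels.min(), labels.max())
```

With the variables from the question, the regression route would be `white_gam = fit_quality_gam(X, y)`, after which `white_gam.summary()` reports the fitted terms. If a good/bad classifier is the goal instead, `LogisticGAM().fit(X, binarize_quality(y))` should pass the domain check, at the cost of discarding the ordering among the individual scores.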

