python - Why do different encoders give me the same result?
Problem description
I am using the California Housing Price dataset, and this is what I did:
import pandas as pd
from sklearn.model_selection import train_test_split
housing = pd.read_csv("housing.csv")
X = housing.drop(["longitude", "latitude", "median_house_value"], axis=1)
y = housing["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
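import category_encoders as ce
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Note: ce.WOEEncoder is intended for binary targets, so it may not suit this
# regression target; it is kept here as posted.
encoder_list = [ce.WOEEncoder(), ce.OneHotEncoder()]

for encoder in encoder_list:
    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="constant")),
            ("encoder", encoder),
        ]
    )
    # The post does not show how `preprocessor` is defined. This
    # ColumnTransformer, which selects columns by dtype, is an assumption.
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, make_column_selector(dtype_include="number")),
            ("cat", categorical_transformer, make_column_selector(dtype_exclude="number")),
        ]
    )
    pipe = Pipeline(
        steps=[("preprocessor", preprocessor), ("regressor", LinearRegression())]
    )
    pipe.fit(X_train, y_train)
    pipe.predict(X_test)
    print(encoder)
    print(pipe.score(X_test, y_test))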
Why does this produce two similar results? Shouldn't they be different? The same thing happens when I try different scalers.
Solution
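No answer is recorded on this page. As a partial illustration of the scaler half of the question only, the sketch below uses synthetic data (not the housing dataset) to show a general property of ordinary least squares: with an intercept, the fit is unchanged by per-feature shifting and rescaling, so swapping scalers in front of LinearRegression should leave the R^2 score essentially identical. The data, the helper name fit_and_score, and the choice of scalers are all assumptions made for illustration; the encoder case is not covered here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic regression data with features on very different scales (assumed, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
y = X @ np.array([2.0, -0.1, 0.003]) + rng.normal(scale=0.5, size=200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def fit_and_score(scaler):
    """Fit LinearRegression, optionally behind a scaler, and return test R^2."""
    steps = [LinearRegression()] if scaler is None else [scaler, LinearRegression()]
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


scores = {
    "no scaler": fit_and_score(None),
    "StandardScaler": fit_and_score(StandardScaler()),
    "MinMaxScaler": fit_and_score(MinMaxScaler()),
}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.6f}")
```

The three scores should agree up to floating-point error, because each scaler applies an invertible affine map per feature and the linear model absorbs that map into its coefficients and intercept. A regularized model such as Ridge or Lasso would not share this invariance.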