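python - I'm not sure why my decision tree and random forest show 100% accuracy?
Problem description
I'm currently working on a model that reads structured data and determines whether a person has a disease. I think the problem is that the data isn't being split into training data and test data, but I don't know how to do that.
I'm not sure what to try.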
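With scikit-learn, the usual tool for this split is `sklearn.model_selection.train_test_split`. The following is a minimal sketch, assuming a DataFrame with a `target` label column like the one used below. The feature columns `feature_1` and `feature_2` and the synthetic rows are hypothetical stand-ins for `cardio_train.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for cardio_train.csv: two numeric features plus a binary 'target'.
rng = np.random.default_rng(0)
heart_data = pd.DataFrame({
    "feature_1": rng.integers(30, 70, size=100),
    "feature_2": rng.integers(100, 180, size=100),
    "target": np.tile([0, 1], 50),
})

predictors = heart_data.drop(columns="target")  # every column except the label
target = heart_data["target"]

# Hold out 30% of the rows for testing; stratify keeps the 0/1 ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    predictors, target, test_size=0.3, random_state=1, stratify=target
)

print(X_train.shape, X_test.shape)
```

Each classifier would then be fit on `X_train, y_train` only and scored on `X_test, y_test`, which it has never seen during fitting.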
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
heart_data = pd.read_csv('cardio_train.csv')
heart_data.head()
heart_data.shape
heart_data.describe()
heart_data.isnull().sum()
heart_data_columns = heart_data.columns
predictors = heart_data[heart_data_columns[heart_data_columns != 'target']]  # all columns except target
target = heart_data['target']  # target column (whether the person has the disease)
#This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type
predictors.head()
target.head()
#normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
n_cols = predictors_norm.shape[1] # number of predictors
def regression_model():
    # create model
    model = Sequential()
    # input layer: one input per predictor column
    model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(50, activation='relu'))  # hidden layer with ReLU activation
    model.add(Dense(1))  # single linear output
    # compile model
    # the loss measures how far the predictions are from the targets; the optimizer updates the weights to reduce it
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
# build the model
model = regression_model()
print(model)
# fit the model
history = model.fit(predictors_norm, target, validation_split=0.3, epochs=10, verbose=2)
#Decision Tree
print("Processing Decision Tree")
dtc = DecisionTreeClassifier()
dtc.fit(predictors_norm, target)
print("Decision Tree Test Accuracy {:.2f}%".format(dtc.score(predictors_norm, target)*100))
#Support Vector Machine
print("Processing Support Vector Machine")
svm = SVC(random_state=1)
svm.fit(predictors_norm, target)
print("Test Accuracy of SVM Algorithm: {:.2f}%".format(svm.score(predictors_norm, target)*100))
#Random Forest
print("Processing Random Forest")
rf = RandomForestClassifier(n_estimators=1000, random_state=1)
rf.fit(predictors_norm, target)
print("Random Forest Algorithm Accuracy Score : {:.2f}%".format(rf.score(predictors_norm, target)*100))
The message I get is "Decision Tree Test Accuracy 100.00%", but the Support Vector Machine gets 73.37%.
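In the code above, every `score()` call uses the same rows that were passed to `fit()`, so it reports training accuracy. (The Keras call is different because `validation_split=0.3` holds out part of the data.) A `DecisionTreeClassifier` with default settings (`max_depth=None`, `min_samples_leaf=1`) keeps splitting until every leaf is pure, so it can reproduce the training labels exactly. A forest of such trees usually comes close to that as well. `SVC` with the default `C=1.0` is regularized and does not memorize noisy labels, which is consistent with its lower score.

The sketch below compares training and held-out accuracy. It uses synthetic data from `make_classification` in place of `cardio_train.csv`, and 200 trees instead of 1000 to keep it fast. The exact numbers are illustrative only and will not match the question's. The scaler sits inside a pipeline, so its mean and standard deviation are learned from the training rows only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data: flip_y=0.2 assigns random classes to 20% of rows,
# so no model can honestly reach 100% accuracy on unseen rows.
X, y = make_classification(n_samples=2000, n_features=10, flip_y=0.2, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=1),
    "SVM": SVC(random_state=1),
}

scores = {}
for name, clf in models.items():
    # StandardScaler learns mean/std from X_train during fit and reuses them on X_test,
    # so no statistics from the test rows leak into training.
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    train_acc = pipe.score(X_train, y_train)  # accuracy on rows the model was fit on
    test_acc = pipe.score(X_test, y_test)     # accuracy on held-out rows
    scores[name] = (train_acc, test_acc)
    print(f"{name}: train {train_acc:.2%}, test {test_acc:.2%}")
```

The gap between the two columns shows how much of the 100% comes from memorization. Scaling does not change tree-based models; it is included only to mirror the normalization step in the question.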