Numpy.where does not return the expected output in the predicted dataset

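Problem description

I have trained this model on a dataset. The accuracy is low, but that is not my concern right now. My question is: when I add a new column named df['predict'] (at the end), why does the dataset not contain the predicted output, even though I do see output when I run the df['predict'] assignment?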

import numpy as np
import pandas as pd 

df1 = pd.DataFrame(np.random.randint(1,33,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df2 = pd.DataFrame(np.random.randint(34,41,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df3 = pd.DataFrame(np.random.randint(42,53,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df4 = pd.DataFrame(np.random.randint(54,66,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df5 = pd.DataFrame(np.random.randint(67,88,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df6 = pd.DataFrame(np.random.randint(89,100,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])
df7 = pd.DataFrame(np.random.randint(90,100,  size =(10000,5)), columns = ['s1','s2','s3','s4','s5'])

df = pd.concat([df1,df2,df3,df4,df5,df6,df7])

df['marks obtained'] = df.sum(axis = 1)

df['Total'] = 500

df['percentage'] = (df['marks obtained']/df['Total'])*100

def grade(x):
    if x >= 80:
        return 'A+'
    if x >= 70:
        return 'A'
    if x >= 60:
        return 'B'
    if x >= 50:
        return 'C'
    if x >= 40:
        return 'D'
    if x >= 33:
        return 'E'
    else:
        return 'fail'

df['grade'] = df['percentage'].apply(grade)

dic = {'A+': 1, 'A': 2, 'B': 3, 'C': 4, 'D':5, 'E': 6, 'fail': 7}

df['grade1'] = df['grade'].map(dic)

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam, SGD

x = df.loc[:,'s1':'s5'].to_numpy()
y = df['grade']

y = pd.get_dummies(y).to_numpy()

model = Sequential()
model.add(Dense(5, activation = 'relu', input_shape =(5,)))
model.add(Dense(7, activation = 'softmax'))

model.compile(optimizer = SGD(learning_rate=0.8), loss = 'categorical_crossentropy', metrics = ['acc'])

model.fit(x,y,epochs = 30)

def true(q):
    for i in q:
        if i == 1:
            print('A+')
        if i == 2:
            print('A')
        if i == 3:
            print('B')
        if i == 4:
            print('C')
        if i == 5:
            print('D')
        if i == 6:
            print('E')
        
w = np.argmax(model.predict(x), axis = 1)

df['predict'] = np.where(np.argmax(model.predict(x), axis = 1), true(w), 'fail')

Output:

df['predict'] = np.where(np.argmax(model.predict(x), axis = 1), true(w), 'fail') prints the following:

A+
A+
A+
A+
A+
.
.
.
.

But when I print the dataset, it shows:

       s1   s2   s3   s4   s5  marks obtained  Total  percentage  grade  grade1  predict
0       2   23    9   23    2              59    500        11.8   fail       7     None
1       1    4    6   12    5              28    500         5.6   fail       7     None
2      17   20   26   24   13             100    500        20.0   fail       7     None
3      18   16    4   19   13              70    500        14.0   fail       7     None
4      22   30   21   19    9             101    500        20.2   fail       7     None
...   ...  ...  ...  ...  ...             ...    ...         ...    ...     ...      ...
9995   90   94   97   91   91             463    500        92.6     A+       1     None
9996   90   94   96   90   96             466    500        93.2     A+       1     None
9997   93   92   99   93   92             469    500        93.8     A+       1     None
9998   98   98   99   93   92             480    500        96.0     A+       1     None
9999   93   95   97   93   97             475    500        95.0     A+       1     None
70000 rows × 11 columns

Tags: pandas, numpy, deep-learning, prediction, sequential

Solution


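The function true should return its result instead of printing it. A function that only calls print returns None, so np.where receives None as its second argument, which is why the predict column is filled with None.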

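def true(q):
    # Build one label per prediction; a return inside the loop would stop at the first element
    labels = []
    for i in q:
        if i == 1:
            labels.append('A+')
        elif i == 2:
            labels.append('A')
        elif i == 3:
            labels.append('B')
        elif i == 4:
            labels.append('C')
        elif i == 5:
            labels.append('D')
        elif i == 6:
            labels.append('E')
        else:
            labels.append('fail')
    # The index-to-grade mapping is kept from the question; it has to match the column order of pd.get_dummies
    return np.array(labels)

df['predict'] = np.where(np.argmax(model.predict(x), axis = 1), true(w), 'fail')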
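As a small illustration of why a printing version of true fills the column with None, the sketch below compares a function that only prints with one that returns an array. The names show and collect are hypothetical stand-ins for the two styles of function, and no model is involved.

```python
import numpy as np

def show(q):
    # Prints each value but has no return statement, so the call evaluates to None
    for i in q:
        print(i)

def collect(q):
    # Returns one string per input element
    return np.array([str(i) for i in q])

w = np.array([0, 1, 2])

# show(w) is evaluated first (printing as a side effect) and np.where then sees None
printed = np.where(w, show(w), 'fail')

# collect(w) hands np.where a real array, so each nonzero position gets its own value
returned = np.where(w, collect(w), 'fail')
```

In both calls the positions where w is 0 take the 'fail' branch; only the second call has actual labels to place everywhere else.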

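In addition, a softmax layer converts the outputs into probability values. The model in the question already ends with one, so np.argmax over each row of model.predict(x) picks the most probable class.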
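A minimal sketch of going from softmax probabilities to grade labels without a loop, under the assumption that the class order of the network output is the column order produced by pd.get_dummies. The toy grades and the logits array are hypothetical stand-ins for df['grade'] and for the network's raw scores, so the example runs without TensorFlow.

```python
import numpy as np
import pandas as pd

# Toy grades standing in for df['grade'] from the question
grades = pd.Series(['fail', 'A+', 'B', 'A', 'E', 'D', 'C'])
dummies = pd.get_dummies(grades)

# Column order of the one-hot encoding: output index k corresponds to labels[k]
labels = dummies.columns.to_numpy()

# Hypothetical raw scores standing in for the network's pre-softmax output
logits = dummies.to_numpy().astype(float) * 5.0

# Softmax: exponentiate and normalise each row so that it sums to 1
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Fancy indexing maps every predicted class index to its label in one step
predicted = labels[np.argmax(probs, axis=1)]
```

With the question's objects this would likely reduce to df['predict'] = labels[np.argmax(model.predict(x), axis=1)], assuming labels is taken from pd.get_dummies(df['grade']).columns. Note that the columns come out sorted, so index 0 is 'A' and index 1 is 'A+', which differs from the numbering in dic.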

