Building an Autoencoder for Tabular Data

Problem Description

I'm trying to build an autoencoder for the tabular dataset used in this tutorial: https://pierpaolo28.github.io/blog/blog29/. I used the house prices dataset. I imputed the missing values, normalized the data, and dummy-encoded the categorical features, but I get terrible loss results when training my autoencoder.
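Roughly, the preprocessing looks like this (a simplified sketch, not my exact code; it assumes the Kaggle train.csv for the house prices dataset, and the imputation rules shown are illustrative):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("train.csv")  # Kaggle house prices training data

# Impute missing values: median for numeric columns, mode for categoricals
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# Dummy-encode the categoricals, then normalize everything to [0, 1]
X = pd.get_dummies(df, columns=list(cat_cols))
X = MinMaxScaler().fit_transform(X).astype("float32")  # shape roughly (1460, 276)

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)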

Here is the autoencoder code:

import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras import regularizers

input_dim = X.shape[1]  # number of features after preprocessing (276)
encoding_dim = 30       # size of the bottleneck

input_layer = Input(shape=(input_dim, ))

# Encoder: compress input_dim -> input_dim/2 -> input_dim/4 -> encoding_dim
encoder = Dense(int(input_dim / 2), activation="tanh",
                activity_regularizer=regularizers.l1(10e-5))(input_layer)
encoder = Dense(int(input_dim / 2), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder)
encoder = Dense(int(input_dim / 4), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder)
encoder = Dense(int(input_dim / 4), activation="tanh")(encoder)
encoder = Dense(encoding_dim, activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder) # the bottleneck layer

# Decoder: mirror the encoder back up to input_dim
decoder = Dense(int(input_dim / 4), activation="tanh")(encoder)
decoder = Dense(int(input_dim / 4), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)
decoder = Dense(int(input_dim / 2), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)
decoder = Dense(int(input_dim / 2), activation='tanh')(decoder)

output_layer = Dense(input_dim, activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)

autoencoder = Model(inputs=input_layer, outputs=output_layer)

autoencoder.compile(optimizer='adam',
                    loss=tf.keras.losses.MeanAbsolutePercentageError(),
                    metrics=["mse", "mae"])

history = autoencoder.fit(X_train, X_train,
                          epochs=100,
                          batch_size=3000,
                          shuffle=True,
                          validation_data=(X_test, X_test),
                          verbose=1).history

Here are the first three and the last three training epochs:

Epoch 1/100
2/3 [===================>..........] - ETA: 0s - loss: 150489.0781 - mse: 0.1862 - mae: 0.1872WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0110s vs `on_train_batch_end` time: 0.8014s). Check your callbacks.
INFO:tensorflow:Assets written to: model.z1\assets
3/3 [==============================] - 3s 976ms/step - loss: 144704.7656 - mse: 0.1862 - mae: 0.1872 - val_loss: 52245.2031 - val_mse: 0.1862 - val_mae: 0.1871
Epoch 2/100
3/3 [==============================] - 0s 11ms/step - loss: 52511.4844 - mse: 0.1862 - mae: 0.1871 - val_loss: 61640.7930 - val_mse: 0.1861 - val_mae: 0.1871
Epoch 3/100
3/3 [==============================] - 0s 11ms/step - loss: 56115.3867 - mse: 0.1861 - mae: 0.1871 - val_loss: 80056.0391 - val_mse: 0.1861 - val_mae: 0.1871
...
Epoch 97/100
3/3 [==============================] - 0s 9ms/step - loss: 42958.1523 - mse: 0.1824 - mae: 0.1845 - val_loss: 42867.2852 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 98/100
3/3 [==============================] - 0s 9ms/step - loss: 44007.1992 - mse: 0.1824 - mae: 0.1845 - val_loss: 46307.1250 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 99/100
3/3 [==============================] - 0s 11ms/step - loss: 45433.2266 - mse: 0.1824 - mae: 0.1845 - val_loss: 46171.5547 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 100/100
3/3 [==============================] - 0s 9ms/step - loss: 45709.4219 - mse: 0.1824 - mae: 0.1845 - val_loss: 47696.0117 - val_mse: 0.1824 - val_mae: 0.1845

What might I be doing wrong? I've tried adding and removing hidden layers to adjust the model's complexity, but nothing helps. Is it simply not possible to train an autoencoder on a tabular dataset? The dataset is reasonably large, (1460, 276), so why is the autoencoder underfitting?

Tags: python, tensorflow, machine-learning, deep-learning, autoencoder

Solution
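The training log itself points at the problem: `loss` sits in the tens of thousands while `mse` and `mae` hover around 0.18 and barely move. That pattern is characteristic of MAPE on this kind of data, not of underfitting. Keras' `MeanAbsolutePercentageError` divides the absolute error by `abs(y_true)` (clipped to a tiny epsilon to avoid division by zero), and after normalization and dummy encoding many of the 276 target values are exactly or nearly 0, so those terms are divided by a tiny denominator and explode. The loss (and its gradient) is dominated by these degenerate terms, so the network hardly learns.

The usual reconstruction loss over mixed normalized/one-hot features is plain MSE (or MAE). Also worth checking: a batch size of 3000 on a dataset of about 1460 rows leaves very few weight updates per epoch, so a smaller batch should train faster. A minimal sketch of both changes, reusing the model defined in the question:

# Recompile with a loss that stays well-defined when targets are zero.
autoencoder.compile(optimizer='adam',
                    loss='mse',
                    metrics=['mae'])

# Smaller batches -> many more gradient updates per epoch on ~1460 rows.
history = autoencoder.fit(X_train, X_train,
                          epochs=100,
                          batch_size=32,
                          shuffle=True,
                          validation_data=(X_test, X_test),
                          verbose=1).history

With MSE, the reported loss lands on the same scale as the `mse` metric above and should decrease steadily. If a relative-error objective is genuinely required, restrict MAPE to features that are bounded away from zero; for dummy variables a percentage error is not meaningful in the first place.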

