python - Building an autoencoder for tabular data
Problem description
I am trying to build an autoencoder for the tabular dataset from this tutorial: https://pierpaolo28.github.io/blog/blog29/. I used the house-prices dataset. I imputed the missing values, normalized the data, and dummy-encoded the categorical features, but when I train my autoencoder I get terrible loss values.
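The preprocessing steps mentioned above (imputation, normalization, dummy encoding) are not shown in the post; a minimal pandas sketch of what they might look like, using hypothetical column names rather than the actual dataset fields:

```python
import numpy as np
import pandas as pd

# Hypothetical illustration of the preprocessing described above
df = pd.DataFrame({
    "LotArea": [8450.0, np.nan, 11250.0, 9550.0],
    "Neighborhood": ["CollgCr", "Veenker", None, "Crawfor"],
})

# 1) Impute missing values (median for numeric, mode for categorical)
df["LotArea"] = df["LotArea"].fillna(df["LotArea"].median())
df["Neighborhood"] = df["Neighborhood"].fillna(df["Neighborhood"].mode()[0])

# 2) Min-max normalize numeric columns to [0, 1]
df["LotArea"] = (df["LotArea"] - df["LotArea"].min()) / \
                (df["LotArea"].max() - df["LotArea"].min())

# 3) One-hot ("dummy") encode categorical columns
X = pd.get_dummies(df)
```

Note that after steps 2 and 3, most entries in `X` are zero or very close to it, which matters for the choice of loss below.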
Here is the autoencoder code:
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras import regularizers

input_dim = X.shape[1]
encoding_dim = 30

input_layer = Input(shape=(input_dim,))
encoder = Dense(int(input_dim / 2), activation="tanh",
                activity_regularizer=regularizers.l1(10e-5))(input_layer)
encoder = Dense(int(input_dim / 2), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder)
encoder = Dense(int(input_dim / 4), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder)
encoder = Dense(int(input_dim / 4), activation="tanh")(encoder)
encoder = Dense(encoding_dim, activation=tf.keras.layers.LeakyReLU(alpha=0.01))(encoder)  # the bottleneck layer
decoder = Dense(int(input_dim / 4), activation="tanh")(encoder)
decoder = Dense(int(input_dim / 4), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)
decoder = Dense(int(input_dim / 2), activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)
decoder = Dense(int(input_dim / 2), activation="tanh")(decoder)
output_layer = Dense(input_dim, activation=tf.keras.layers.LeakyReLU(alpha=0.01))(decoder)

autoencoder = Model(inputs=input_layer, outputs=output_layer)

autoencoder.compile(optimizer='adam',
                    loss=tf.keras.losses.MeanAbsolutePercentageError(),
                    metrics=["mse", "mae"])

history = autoencoder.fit(X_train, X_train,
                          epochs=100,
                          batch_size=3000,
                          shuffle=True,
                          validation_data=(X_test, X_test),
                          verbose=1).history
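One detail worth flagging in the compile call (my observation, not from the tutorial): MeanAbsolutePercentageError divides each error by the corresponding target value, so targets that are exactly zero, which min-max-normalized and dummy-encoded columns produce in abundance, get clamped to a tiny epsilon and make the loss explode even for a good reconstruction. A simplified numpy sketch of the formula Keras computes:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-7):
    # Simplified form of Keras MAPE: 100 * mean(|y - y_hat| / max(|y|, eps))
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# Targets away from zero: MAPE behaves
print(mape(np.array([1.0, 2.0]), np.array([1.1, 1.9])))  # ~7.5

# A single zero target: the clamped denominator makes that term explode
print(mape(np.array([0.0, 2.0]), np.array([0.1, 1.9])))
```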
Here are the first three and the last three training epochs:
Epoch 1/100
2/3 [===================>..........] - ETA: 0s - loss: 150489.0781 - mse: 0.1862 - mae: 0.1872WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0110s vs `on_train_batch_end` time: 0.8014s). Check your callbacks.
INFO:tensorflow:Assets written to: model.z1\assets
3/3 [==============================] - 3s 976ms/step - loss: 144704.7656 - mse: 0.1862 - mae: 0.1872 - val_loss: 52245.2031 - val_mse: 0.1862 - val_mae: 0.1871
Epoch 2/100
3/3 [==============================] - 0s 11ms/step - loss: 52511.4844 - mse: 0.1862 - mae: 0.1871 - val_loss: 61640.7930 - val_mse: 0.1861 - val_mae: 0.1871
Epoch 3/100
3/3 [==============================] - 0s 11ms/step - loss: 56115.3867 - mse: 0.1861 - mae: 0.1871 - val_loss: 80056.0391 - val_mse: 0.1861 - val_mae: 0.1871
...
Epoch 97/100
3/3 [==============================] - 0s 9ms/step - loss: 42958.1523 - mse: 0.1824 - mae: 0.1845 - val_loss: 42867.2852 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 98/100
3/3 [==============================] - 0s 9ms/step - loss: 44007.1992 - mse: 0.1824 - mae: 0.1845 - val_loss: 46307.1250 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 99/100
3/3 [==============================] - 0s 11ms/step - loss: 45433.2266 - mse: 0.1824 - mae: 0.1845 - val_loss: 46171.5547 - val_mse: 0.1824 - val_mae: 0.1845
Epoch 100/100
3/3 [==============================] - 0s 9ms/step - loss: 45709.4219 - mse: 0.1824 - mae: 0.1845 - val_loss: 47696.0117 - val_mse: 0.1824 - val_mae: 0.1845
What might I be doing wrong? I have tried adding and removing hidden layers to adjust the model's capacity, but nothing helped. Is it simply not possible to train an autoencoder on a tabular dataset? The dataset is fairly large, (1460, 276), so why is the autoencoder underfitting?
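The training log itself hints at an answer: mse and mae are small and nearly flat while the MAPE loss sits in the tens of thousands, i.e. the reconstruction is not that bad, the loss is just ill-suited to zero-heavy targets. A quick numpy comparison on one hypothetical dummy-encoded column illustrates this:

```python
import numpy as np

# One hypothetical dummy-encoded column and a reasonable reconstruction of it
y_true = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
y_pred = np.array([0.05, 0.95, 0.02, 0.01, 0.90])

mse = np.mean((y_true - y_pred) ** 2)
mape = np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), 1e-7)) * 100

print(f"mse:  {mse:.4f}")   # well-behaved
print(f"mape: {mape:.0f}")  # dominated by the zero targets
```

If that is what is happening here, compiling with `loss='mse'` (or `'mae'`) instead of MAPE would be the first thing to try.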
Solution