R keras model on tfdatasets, repeated columns

Problem description

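I am trying to figure out how to use a tfdataset with my model, for the reason described here: Keras predict repeat columns. However, I have been struggling to get the code to run. Below is a piece of code I wrote to test how tfdatasets works, but it does not work. It simply tries to use a keras model to predict (or fit) on a dataset with 3 columns (c1, c2, c3) and a target column y (named target in the code).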

library(keras)
library(tfaddons)
library(tfdatasets)

alldata <- data.frame(c1=rnorm(100),c2=rnorm(100),c3=rnorm(100), target=rnorm(100))
alldata_xvar <- c("c1","c2","c3")

write.csv(alldata,file = "alldata.csv", row.names=F)

alldata_spec <- csv_record_spec("alldata.csv")

xfeatures <- alldata_xvar
model <- keras_model_sequential()
model %>% 
  layer_dense(units = length(xfeatures), activation = "relu", input_shape = length(xfeatures)) %>% 
  layer_dense(units = 1) 


dataset <- text_line_dataset("alldata.csv", record_spec = alldata_spec) 
dataset %>% dataset_prepare(x = xfeatures,y="target")

predicted <- model %>%predict(dataset) 

The error message is:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: in user code:

    File "/.local/share/r-miniconda/envs/r-reticulate/lib/python3.7/site-packages/keras/engine/training.py", line 1621, in predict_function  *
        return step_function(self, iterator)
    File "/.local/share/r-miniconda/envs/r-reticulate/lib/python3.7/site-packages/keras/engine/training.py", line 1611, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/.local/share/r-miniconda/envs/r-reticulate/lib/python3.7/site-packages/keras/engine/training.py", line 1604, in run_step  **
        outputs = model.predict_step(data)
    File "/.local/share/r-miniconda/envs/r-reticulate/lib/python3.7/site-packages/keras/engine/training.py", line 1572, in predict_step
        return self(x, training=False)
    File "/.local/share/r-miniconda/envs/r-reticulate/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handl

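My ultimate goal is to run predictions when the dataset has 3 columns (c1, c2, c3) plus 100 extra columns (RegionVar1, ..., RegionVar100) that have the same values in every row. I do not want to include all 100 of those columns in my file alldata.csv (that would make the file too large), so I thought I could use the dataset_map function to add the extra information to the data read from alldata.csv. In the code below, region[regionvar,] is a vector of length 100 holding the values of (RegionVar1, ..., RegionVar100), which I want to be the same for all records (rows).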

dataset <- dataset %>% 
  dataset_map(function(record) {
    record$Region <- region[regionvar,]
    record
  })

Below is my code, which obviously does not work (since the first piece of code does not work):

library(keras)
library(tfaddons)
library(tfdatasets)
numregion <- 50
numvarregion <-  100
alldata <- data.frame(c1=rnorm(100),c2=rnorm(100),c3=rnorm(100), target=rnorm(100))
alldata_xvar <- c("c1","c2","c3")
region <- matrix( rnorm(50*numvarregion), 50, numvarregion) 
regionvar_name1 <- rep("RegionVar",numvarregion)
regionvar_name2 <- seq(1,numvarregion)
regionvar_name <- cbind(regionvar_name1,regionvar_name2)
regionvar_name <- apply(regionvar_name,1,paste,collapse="")
region <- as.data.frame(region)
names(region) <- regionvar_name



region <- array_reshape(as.matrix(region), c(nrow(region), ncol(region)))


write.csv(alldata,file = "alldata.csv", row.names=F)
alldata_spec <- csv_record_spec("alldata.csv")

xfeatures <- c(alldata_xvar,names(region)) 
model <- keras_model_sequential()
model %>% 
  layer_dense(units = length(xfeatures), activation = "relu", input_shape = length(xfeatures)) %>% 
  layer_dense(units = 1) 


regionvar <- 1
dataset <- text_line_dataset("alldata.csv", record_spec = alldata_spec) 
dataset <- dataset %>% 
  dataset_map(function(record) {
    record$Region <- region[regionvar,]
    record
  })
dataset %>% dataset_prepare(x = xfeatures,y="target")
predicted <- model %>%predict(dataset) 

Could you suggest how I can get this code to run, or whether there is another way to solve the repeated-columns problem from the original link?

Tags: r, tensorflow, keras

Solution
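
One possible way to avoid materializing the repeated columns, sketched here without tfdatasets: predict in chunks and attach the single region row to each chunk just before calling the model. This is a minimal sketch under stated assumptions, not a verified fix for the error in the question. The names predict_in_chunks, predict_fn and chunk_size are hypothetical, and the stand-in predict_fn below only sums each row so that the example runs without Keras. With a real model, predict_fn would presumably be something like function(m) predict(model, m). Separately, in the code as posted the result of dataset %>% dataset_prepare(...) is never assigned back to dataset, so predict() receives the unprepared records. Whether that alone explains the error has not been checked here.

```r
# Predict in chunks, appending one constant region row to every chunk, so the
# full (nrow x 103) input matrix is never held in memory at once.
# `predict_fn` (hypothetical) is any function that maps a numeric matrix to
# one prediction per row.
predict_in_chunks <- function(predict_fn, x, region_row, chunk_size = 32) {
  x <- as.matrix(x)
  region_row <- as.numeric(region_row)
  n <- nrow(x)
  if (n == 0) return(numeric(0))
  out <- numeric(n)
  for (s in seq(1, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, n)
    # Repeat the region row once per record, for this chunk only.
    rep_block <- matrix(region_row, nrow = length(idx),
                        ncol = length(region_row), byrow = TRUE)
    chunk <- cbind(x[idx, , drop = FALSE], rep_block)
    out[idx] <- as.numeric(predict_fn(chunk))
  }
  out
}

set.seed(1)
alldata <- data.frame(c1 = rnorm(100), c2 = rnorm(100), c3 = rnorm(100))
region <- matrix(rnorm(50 * 100), 50, 100)
regionvar <- 1

# Stand-in for the model (assumption): sums the 103 inputs of each row.
predict_fn <- function(m) rowSums(m)

predicted <- predict_in_chunks(predict_fn, alldata, region[regionvar, ])
length(predicted)
```

Peak memory here is bounded by chunk_size rows of 103 columns rather than by the size of the whole file, at the cost of one predict call per chunk.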

