com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT at cross_validation_metrics_summary

Problem description

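I am using the H2ODRF and H2OGridSearch models to build a random forest pipeline with random discrete grid search for hyperparameter optimization. However, when I set nfolds to any number greater than 1 and call fit(), I get an error. My code looks like this: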

import ai.h2o.sparkling.ml.algos.{H2ODRF, H2OGridSearch}

// Random forest with 4-fold cross-validation
val drf = new H2ODRF()
    .setFeaturesCols(featuresCols)
    .setLabelCol(labelCol)
    .setColumnsToCategorical(categoricalCols)
    .setSplitRatio(splitRatio)
    .setNfolds(4)

// Hyperparameter grid; values are boxed as AnyRef
val hyperParams = Map(
    "ntrees" -> Array(10, 50).map(_.asInstanceOf[AnyRef]))

val search = new H2OGridSearch()
    .setHyperParameters(hyperParams)
    .setAlgo(drf)

val model = search.fit(data) // data is a Spark DataFrame
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:224)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
  at com.google.gson.Gson.fromJson(Gson.java:887)
  at com.google.gson.Gson.fromJson(Gson.java:852)
  at com.google.gson.Gson.fromJson(Gson.java:801)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.ai$h2o$sparkling$backend$utils$RestCommunication$$deserialize(RestCommunication.scala:164)
  at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:147)
  at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:145)
  at ai.h2o.sparkling.utils.ScalaUtils$.withResource(ScalaUtils.scala:28)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.request(RestCommunication.scala:145)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.request(H2OGridSearch.scala:46)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.query(RestCommunication.scala:54)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.query(H2OGridSearch.scala:46)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.getGridModels(H2OGridSearch.scala:129)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:163)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:46)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
  at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
  at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
  ... 59 elided
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
  at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:385)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:213)
  ... 90 more

The error appears to be caused by the cross_validation_metrics_summary field, which is only returned when nfolds is greater than 1. Is there a way to work around this?

Edit: I am using the Prostate data set with Spark 2.4.4 and Scala 2.11.12, and the following Sparkling Water version: ai.h2o:sparkling-water-package_2.11:3.30.0.4-1-2.4

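Edit: After searching through the Sparkling Water source code, it is starting to look like the problem lies in GridSchemaV99. Is there a setting or configuration I should update so that a different schema is used?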
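
To illustrate what the message means, here is a minimal sketch that reproduces the same kind of Gson failure outside of Sparkling Water. The classes `Cell` and `Table` and the JSON string are hypothetical stand-ins, not the actual H2O schema classes; the assumption is only that the client-side type declares a two-dimensional array of objects while the JSON carries plain strings in those positions, which is what the path `$.cross_validation_metrics_summary[0].data[0][0]` in the stack trace suggests.

```scala
import com.google.gson.{Gson, JsonSyntaxException}

// Hypothetical stand-ins for a schema whose table cells are declared as objects.
class Cell { var value: String = _ }
class Table { var data: Array[Array[Cell]] = _ }

object GsonMismatchDemo {
  // Returns the exception message Gson produces for the mismatched payload.
  def reproduce(): String = {
    // The cell at data[0][0] is a JSON string, but Table expects a JSON object there.
    val json = """{"data":[["mean"]]}"""
    try {
      new Gson().fromJson(json, classOf[Table])
      "no error"
    } catch {
      case e: JsonSyntaxException => e.getMessage
    }
  }

  def main(args: Array[String]): Unit =
    println(reproduce())
}
```

Gson walks the declared field types reflectively, so it calls beginObject() for each `Cell` and fails as soon as it meets a string instead. If the same thing is happening here, the mismatch would be between the declared type of the table data in the deserialized schema and what the H2O backend actually returns when cross-validation is enabled.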

Tags: apache-spark, h2o, sparkling-water

Solution

