XGBoost4j-spark: predicting on sparse vectors with a locally trained model

Problem description

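I am running on Databricks. I am trying to use a model trained locally in R to make distributed predictions with xgboost4j-spark in Scala. The data is in a DataFrame with a features column of sparse vectors created with org.apache.spark.ml.linalg.Vectors.sparse. I have already successfully trained an unrelated model on data in this format.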

The data looks like this:

train_sparse.filter("ID == 1").show(false)
+-----------+------------------------------------------+
|ID         |feature_vector                            |
+-----------+------------------------------------------+
|1          |(4056,[0,1,1097,2250],[26.0,1.0,1.0,57.0])|
+-----------+------------------------------------------+
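
For readers unfamiliar with the notation in the feature_vector column, the following is a minimal sketch of how a row like this could be built. It assumes a Databricks notebook where the SparkSession is available as spark; the name example is hypothetical and is not part of the original question.

```scala
import org.apache.spark.ml.linalg.Vectors

// (size, indices, values): 4056 features in total, of which only four are stored.
// Every index that is not listed is an implicit 0.0.
val v = Vectors.sparse(4056, Array(0, 1, 1097, 2250), Array(26.0, 1.0, 1.0, 57.0))

// One-row DataFrame with the same column names as train_sparse.
// `spark` is assumed to be the notebook's SparkSession.
val example = spark.createDataFrame(Seq((1, v))).toDF("ID", "feature_vector")
example.show(false)
```

The point that matters later is that zeros are simply not stored in a sparse vector, so the consumer of the vector has to decide how to interpret the absent entries.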

A bridge class must first be created in order to load the local model.

%scala
package ml.dmlc.xgboost4j.scala.spark2
import ml.dmlc.xgboost4j.scala.Booster
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel
// Bridge that wraps a locally loaded Booster in a Spark XGBoostRegressionModel.
class XGBoostRegBridge(
    uid: String,
    _booster: Booster) {
  val xgbRegressionModel = new XGBoostRegressionModel(uid, _booster)
}

import ml.dmlc.xgboost4j.scala.spark2._
import ml.dmlc.xgboost4j.scala.XGBoost
// Load the model file produced by R, then wrap it for use with Spark.
val model = XGBoost.loadModel("/dbfs/FileStore/tmp/xgb53.model")
val bri = new XGBoostRegBridge("uid", model)
bri.xgbRegressionModel.setFeaturesCol("feature_vector")
// Run distributed prediction on the sparse feature column.
val pred = bri.xgbRegressionModel.transform(train_sparse)
pred.show()

Job aborted due to stage failure.
Caused by: XGBoostError: [17:36:06] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:159: [17:36:06] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:78: Check failed: jenv->ExceptionOccurred(): 
Stack trace:
  [bt] (0) /local_disk0/tmp/libxgboost4j3687488462117693459.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x53) [0x7f0ff8810843]
  [bt] (1) /local_disk0/tmp/libxgboost4j3687488462117693459.so(XGBoost4jCallbackDataIterNext+0xd10) [0x7f0ff880d960]
  [bt] (2) /local_disk0/tmp/libxgboost4j3687488462117693459.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR> >(xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR>*, float, int)+0x2f8) [0x7f0ff8902268]
  [bt] (3) /local_disk0/tmp/libxgboost4j3687488462117693459.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR> >(xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR>*, float, int, std::string const&, unsigned long)+0x45) [0x7f0ff88f79b5]
  [bt] (4) /local_disk0/tmp/libxgboost4j3687488462117693459.so(XGDMatrixCreateFromDataIter+0x152) [0x7f0ff881e682]
  [bt] (5) /local_disk0/tmp/libxgboost4j3687488462117693459.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f0ff880b7b6]
  [bt] (6) [0x7f1020017ee7]


Stack trace:
  [bt] (0) /local_disk0/tmp/libxgboost4j3687488462117693459.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x53) [0x7f0ff8810843]
  [bt] (1) /local_disk0/tmp/libxgboost4j3687488462117693459.so(XGBoost4jCallbackDataIterNext+0xdc4) [0x7f0ff880da14]
  [bt] (2) /local_disk0/tmp/libxgboost4j3687488462117693459.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR> >(xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR>*, float, int)+0x2f8) [0x7f0ff8902268]
  [bt] (3) /local_disk0/tmp/libxgboost4j3687488462117693459.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR> >(xgboost::data::IteratorAdapter<void*, int (void*, int (*)(void*, XGBoostBatchCSR), void*), XGBoostBatchCSR>*, float, int, std::string const&, unsigned long)+0x45) [0x7f0ff88f79b5]
  [bt] (4) /local_disk0/tmp/libxgboost4j3687488462117693459.so(XGDMatrixCreateFromDataIter+0x152) [0x7f0ff881e682]
  [bt] (5) /local_disk0/tmp/libxgboost4j3687488462117693459.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f0ff880b7b6]
  [bt] (6) [0x7f1020017ee7]

This looks like some kind of iterator error, but I am not using a custom iterator.

Tags: scala, apache-spark, xgboost

Solution


I just needed to call bri.xgbRegressionModel.setMissing(0.0F); it works now.
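
Combined with the code from the question, the fix can be sketched as follows. This assumes the XGBoostRegBridge class, the model path, and the train_sparse DataFrame exactly as given in the question. A likely explanation, not confirmed by the original answer, is that xgboost4j-spark's missing value defaults to NaN while Spark sparse vectors leave zeros unstored, so the library rejects sparse input on the JVM side unless missing is set to 0.0, and that exception surfaces through the native iterator callback seen in the stack trace.

```scala
import ml.dmlc.xgboost4j.scala.spark2._
import ml.dmlc.xgboost4j.scala.XGBoost

// Load the locally trained model and wrap it, as in the question.
val model = XGBoost.loadModel("/dbfs/FileStore/tmp/xgb53.model")
val bri = new XGBoostRegBridge("uid", model)

bri.xgbRegressionModel.setFeaturesCol("feature_vector")
// The fix from the answer: treat 0.0 as the missing value,
// which matches the entries a sparse vector does not store.
bri.xgbRegressionModel.setMissing(0.0F)

val pred = bri.xgbRegressionModel.transform(train_sparse)
pred.show()
```

Note that with this setting every zero is treated as missing, so the predictions only match the R model if it was trained with the same convention; that is worth checking on a few rows.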

