apache-spark-sql - HBase-Spark connector issue on Cloudera: java.lang.AbstractMethodError
Problem description
I am trying to write a Spark DataFrame to HBase, but whenever I perform any action on the DataFrame, or call a write/save method, it throws the following exception:
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.log(HBaseFilter.scala:121)
at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.buildFilters(HBaseFilter.scala:124)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
----------
Here is my code:
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}

// Catalog mapping the DataFrame schema onto the HBase table "Contacts":
// a string row key plus the Office and Personal column families.
def catalog = s"""{
  |"table":{"namespace":"default", "name":"Contacts"},
  |"rowkey":"key",
  |"columns":{
  |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
  |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
  |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
  |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
  |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
  |}
  |}""".stripMargin

// Build a DataFrame over the HBase table described by the catalog.
def withCatalog(cat: String): DataFrame = {
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(catalog)
I was able to create the DataFrame, but as soon as I call
df.show()
it gives me this error:
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.log(HBaseFilter.scala:121)
at org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.buildFilters(HBaseFilter.scala:124)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:60)
Please give me some advice: I am importing the table from HBase, creating the catalog, and building the DataFrame on top of it, using Spark 1.6 and HBase 1.2.0-cdh5.13.3 (Cloudera).
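For reference, the write/save call mentioned above fails the same way; it follows the connector's documented save() pattern (a sketch reusing the catalog and df defined earlier; HBaseTableCatalog.newTable is the number of regions to pre-split if the table has to be created):

// Sketch of the write path that triggers the same error, following the
// connector's documented save() pattern. Reuses `catalog` and `df` from above.
df.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()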
Solution
I ran into the same problem; I was using hbase-spark 1.2.0-cdh5.8.4.
I recompiled it against 1.2.0-cdh5.13.0, and after that the error was gone. You should try recompiling the source against your cluster's CDH version, or use a newer release. An AbstractMethodError thrown from org.apache.spark.Logging is typically a binary-compatibility problem: the connector jar was compiled against a different Spark version than the one running on the cluster (this trait changed across Spark releases), so the connector and the runtime must match.
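As a quick sanity check before recompiling, you can confirm what the cluster's Spark runtime actually is (a minimal sketch, assuming you run it in the same spark-shell session that loads the connector):

// Diagnostic sketch: print the runtime Spark version and check whether the
// Spark 1.x-era org.apache.spark.Logging trait is on the classpath. In Spark
// 2.x it was moved to org.apache.spark.internal.Logging, so a connector built
// against the wrong line surfaces exactly this kind of mismatch.
println(s"Runtime Spark version: ${org.apache.spark.SPARK_VERSION}")

try {
  Class.forName("org.apache.spark.Logging")
  println("org.apache.spark.Logging is on the classpath (Spark 1.x-era API)")
} catch {
  case _: ClassNotFoundException =>
    println("org.apache.spark.Logging not found (Spark 2.x+ runtime)")
}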