Spark Cassandra CassandraSourceRelation directJoinSetting exception

Problem Description

    

    // Input identifiers to look up in the Cassandra table
    val ids = List("4723847392423894", "4329479647236423", "42348726782684")

    import spark.implicits._

    // Read the Cassandra table as a DataFrame via the Spark Cassandra Connector
    val settings = Map("table" -> "table_name", "keyspace" -> "keyspace_name")
    val tableDF = spark.read.format("org.apache.spark.sql.cassandra").options(settings).load()

    // Convert the ID list to a single-column DataFrame; the asInstanceOf cast
    // in the original is redundant, since ids is already a List[String]
    val idsListDF = ids.toDF("id").persist()

    // Inner join the ID list against the Cassandra table
    idsListDF.join(tableDF, tableDF.col("id") === idsListDF.col("id"), "inner").persist()



Exception

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.cassandra.CassandraSourceRelation.directJoinSetting()Lorg/apache/spark/sql/cassandra/DirectJoinSetting;
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy$.containsSafePlans(CassandraDirectJoinStrategy.scala:333)
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy$.validJoinBranch(CassandraDirectJoinStrategy.scala:283)
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.rightValid(CassandraDirectJoinStrategy.scala:139)
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.hasValidDirectJoin(CassandraDirectJoinStrategy.scala:87)
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.apply(CassandraDirectJoinStrategy.scala:30)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
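A NoSuchMethodError on a connector class usually means the version the code was compiled against differs from the version found at runtime, for example two different spark-cassandra-connector jars on the classpath. As an illustration only, pinning a single connector version in build.sbt might look like the sketch below; the Scala and artifact versions are placeholders, not the versions the original poster used, and should be matched to your own Spark distribution:

    // build.sbt - minimal sketch; version numbers below are illustrative
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "2.4.8" % Provided,
      // exactly one connector version; a second copy on the classpath causes
      // NoSuchMethodError-style binary incompatibilities
      "com.datastax.spark" %% "spark-cassandra-connector" % "2.5.2"
    )

The same idea applies when submitting with spark-submit --packages: supply the connector coordinate once, and make sure no older copy is also shipped via --jars or already sitting in the cluster's jars directory.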



Can you help me figure out what is wrong with the code?

I have already tried directJoin with Automatic, AlwaysOn, and AlwaysOff, but still no luck:

    idsListDF.join(tableDF.directJoin(Automatic), tableDF.col("batch_id") === idsListDF.col("id"), "inner").persist()
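For reference, the directJoin setting is applied through the implicits in org.apache.spark.sql.cassandra (available in connector 2.5+); a minimal sketch of the three variants, reusing the tableDF and idsListDF definitions from above:

    import org.apache.spark.sql.cassandra._ // brings directJoin and the setting values into scope

    // Let the optimizer decide whether to use the direct join (default)
    idsListDF.join(tableDF.directJoin(Automatic), tableDF.col("id") === idsListDF.col("id"), "inner")

    // Force the direct join on, or switch it off entirely
    idsListDF.join(tableDF.directJoin(AlwaysOn), tableDF.col("id") === idsListDF.col("id"), "inner")
    idsListDF.join(tableDF.directJoin(AlwaysOff), tableDF.col("id") === idsListDF.col("id"), "inner")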

FYI - I am using the Spark Cassandra Connector jar - https://github.com/datastax/spark-cassandra-connector

Tags: dataframe, apache-spark, cassandra, apache-spark-sql, spark-cassandra-connector

Solution


Although I cannot pinpoint the exact cause, this looks like an environment problem: a NoSuchMethodError at runtime, on a method that clearly exists at compile time, typically points to mismatched jar versions on the classpath rather than to the query code itself.
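One quick way to test that theory is to ask the JVM where CassandraSourceRelation is actually loaded from; a minimal diagnostic sketch, run inside the same Spark application that throws the error:

    // Print the jar that provides CassandraSourceRelation at runtime; if it is
    // not the connector version you compiled against, the classpath is mixing versions
    val source = Class
      .forName("org.apache.spark.sql.cassandra.CassandraSourceRelation")
      .getProtectionDomain
      .getCodeSource
    println(if (source != null) source.getLocation else "loaded from bootstrap classpath")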

I have engaged the Analytics team at DataStax and will post an update as soon as I hear back. Cheers!

PS: Thanks for posting the Spark + connector versions. I recommend updating your original question with those details, so it is easier for other contributors to help you.

