Spark Streaming Kafka Integration Problem

问题描述

I'm using Docker on a Windows machine to run my sample Spark + Kafka project. I'm running into the following error:

Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;

build.sbt

lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      version := "0.1.0",
      scalaVersion := "2.12.2",
      assemblyJarName in assembly := "sparktest.jar"
    )),
    name := "sparktest",
    libraryDependencies ++= List(
      "org.apache.spark" %% "spark-core" % "2.4.0",
      "org.apache.spark" %% "spark-sql" % "2.4.0",
      "org.apache.spark" %% "spark-streaming" % "2.4.0",
      "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0" % "provided",
      "org.apache.kafka" %% "kafka" % "2.1.0",
      "org.scalatest" %% "scalatest" % "3.0.5",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.9.0"
    ),
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.8",
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.8",
    dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.9.8"
  )

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs@_*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
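
As a point of reference, a blanket MergeStrategy.discard on META-INF also drops the META-INF/services/ registration files through which Spark discovers DataSourceRegister implementations such as the kafka source. A commonly seen variant that preserves those files is sketched below; it is an illustration, not the configuration from this project:

assemblyMergeStrategy in assembly := {
  // Keep ServiceLoader registrations so data source discovery still works
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  // Discard the rest of META-INF (signatures, manifests, ...)
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}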

Code

val inputStreamDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092")
  .option("subscribe", "test1")
  .option("startingOffsets", "earliest")
  .load()
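
For context, the DataFrame returned by load() exposes the Kafka records as binary key and value columns. A minimal continuation might cast them to strings and write them to the console sink; the snippet below is an illustrative sketch and not part of the original code:

// Illustrative sketch: cast the Kafka key/value bytes to strings
// and stream the records to the console for inspection.
val query = inputStreamDF
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("truncate", "false")
  .start()

query.awaitTermination()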

Has anyone run into a similar problem, and how did you solve it?

Tags: docker, apache-spark, apache-kafka

Solution
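
Judging from the build file above, a likely cause is that spark-sql-kafka-0-10 is scoped as "provided", so the connector classes (and their data source registration) never make it into the assembly jar that gets submitted. One sketch of a fix, under that assumption, is simply to drop the "provided" qualifier so sbt-assembly packages the connector:

// Sketch (assumption): package the Kafka connector into the assembly jar
// by removing the "provided" scope from the dependency.
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0"

Alternatively, the connector can be supplied at submit time with --packages org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.0, which is the deployment route the quoted error message and the integration guide's deployment section point to.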

