scala - Spark Scala: "cannot resolve symbol saveAsTextFile (reduceByKey)" - IntelliJ Idea
Problem description
I suspect some dependencies are missing from my build.sbt file.
I've added library dependencies to build.sbt, but I still get the error mentioned in the title of this question. I searched Google for a solution but couldn't find one.
My Spark Scala source code (filterEventId100.scala):
package com.projects.setTopBoxDataAnalysis
import java.lang.System._
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.SparkSession
object filterEventId100 extends App {
  if (args.length < 2) {
    println("Usage: JavaWordCount <Input-File> <Output-file>")
    exit(1)
  }
  val spark = SparkSession
    .builder
    .appName("FilterEvent100")
    .getOrCreate()
  val data = spark.read.textFile(args(0)).rdd
  val result = data.flatMap{line: String => line.split("\n")}
    .map{serverData =>
      val serverDataArray = serverData.replace("^", "::")split("::")
      val evenId = serverDataArray(2)
      if (evenId.equals("100")) {
        val serverId = serverDataArray(0)
        val timestempTo = serverDataArray(3)
        val timestempFrom = serverDataArray(6)
        val server = new Servers(serverId, timestempFrom, timestempTo)
        val res = (serverId, server.dateDiff(server.timestampFrom, server.timestampTo))
        res
      }
    }.reduceByKey{
      case(x: Long, y: Long) => if ((x, y) != null) {
        if (x > y) x else y
      }
    }
  result.saveAsTextFile(args(1))
  spark.stop
}
class Servers(val serverId: String, val timestampFrom: String, val timestampTo: String) {
  val DATE_FORMAT = "yyyy-MM-dd hh:mm:ss.SSS"
  private def convertStringToDate(s: String): Date = {
    val dateFormat = new SimpleDateFormat(DATE_FORMAT)
    dateFormat.parse(s)
  }
  private def convertDateStringToLong(dateAsString: String): Long = {
    convertStringToDate(dateAsString).getTime
  }
  def dateDiff(tFrom: String, tTo: String): Long = {
    val dDiff = convertDateStringToLong(tTo) - tFrom.toLong
    dDiff
  }
}
My build.sbt file:
name := "SetTopProject"
version := "0.1"
scalaVersion := "2.12.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.hadoop" %% "hadoop-common" % "3.2.0" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-hive_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-yarn_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy")
)
I expected everything to work, because
val spark = SparkSession
  .builder
  .appName("FilterEvent100")
  .getOrCreate()
is defined without any compiler errors, and I use the spark value to define the data value:
val data = spark.read.textFile(args(0)).rdd
on which the reduceByKey and saveAsTextFile functions are then called:
val result = data.flatMap{line: String => line.split("\n")}...
}.reduceByKey {case(x: Long, y: Long) => if ((x, y) != null) {
    if (x > y) x else y
  }
}
result.saveAsTextFile(args(1))
What should I do to remove the compiler errors for the saveAsTextFile and reduceByKey function calls?
Solution
Replace
val spark = SparkSession
  .builder
  .appName("FilterEvent100")
  .getOrCreate()
val data = spark.read.textFile(args(0)).rdd
with
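// SparkConf and SparkContext are not imported in the question's code
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("FilterEvent100")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
// textFile on the SparkContext returns an RDD[String] directly
val data = sc.textFile(args(0))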
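A further point that may be worth checking, independent of how the context is created: in the question's code the function passed to map ends in an if with no else, so Scala infers its result type as Any rather than (String, Long). reduceByKey is only available on an RDD of key/value pairs, which would explain why the IDE cannot resolve it, and the unresolved result value then takes saveAsTextFile with it. The function passed to reduceByKey has the same if-without-else shape. Below is a minimal sketch of a typed version, not the original author's code: the names FilterEvent100Sketch, parseLine and maxDiffPerServer are made up for illustration, and the field-count guard is an added assumption about the input.

```scala
import java.text.SimpleDateFormat

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object FilterEvent100Sketch {
  // Date pattern copied from the Servers class in the question.
  private val DateFormat = "yyyy-MM-dd hh:mm:ss.SSS"

  // Hypothetical helper: yields Some((serverId, diff)) for event id 100
  // and None for every other line, so the result type is never Any.
  def parseLine(line: String): Option[(String, Long)] = {
    val fields = line.replace("^", "::").split("::")
    // Assumption: lines with fewer than 7 fields are skipped.
    if (fields.length > 6 && fields(2) == "100") {
      // Same arithmetic as dateDiff in the question:
      // parsed date in field 3 minus the numeric value in field 6.
      val to = new SimpleDateFormat(DateFormat).parse(fields(3)).getTime
      val from = fields(6).toLong
      Some((fields(0), to - from))
    } else None
  }

  // flatMap drops the None cases, leaving an RDD[(String, Long)],
  // on which reduceByKey is available.
  def maxDiffPerServer(lines: RDD[String]): RDD[(String, Long)] =
    lines
      .flatMap(line => parseLine(line).iterator)
      .reduceByKey((x, y) => math.max(x, y))

  def main(args: Array[String]): Unit = {
    // No master is set here; it is assumed to come from spark-submit.
    val conf = new SparkConf().setAppName("FilterEvent100")
    val sc = new SparkContext(conf)
    try maxDiffPerServer(sc.textFile(args(0))).saveAsTextFile(args(1))
    finally sc.stop()
  }
}
```

With every branch returning a value of the same type, the compiler can infer RDD[(String, Long)] for the whole chain, and both reduceByKey and saveAsTextFile should resolve.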