AnalysisException: Found duplicate column(s) in the data schema: `hour`, `eventTime`

Problem description

I want to load data from JSON files, but I get this exception: AnalysisException: Found duplicate column(s) in the data schema: hour, eventTime. Here is my code:

import org.apache.spark.sql.Column

// Make column name resolution case-sensitive before reading
ss.sqlContext.setConf("spark.sql.caseSensitive", "true")

val pathList = buildFilePath(eid, url, startTime, endTime)
println(pathList)
val writePath = "/result/" + id + "/" + eid

// Read all JSON files, keep the requested columns, and write a single CSV
ss.read
  .json(pathList: _*)
  .select(columns.split(",").map(m => new Column(m.trim)): _*)
  .repartition(1)
  .write.option("header", "true").csv(writePath)

ss.close()

// Build the list of HDFS file paths for the given event id, URLs and time range
def buildFilePath(eid: String, urls: String, startTime: String, endTime: String): List[String] = {
  var eventPath = ""
  if (eid.equals("1")) {
    eventPath = basePath + "/event1"
  } else if (eid.equals("2")) {
    eventPath = basePath + "/event2"
  }
  urls
    .split(",")
    .flatMap(url => {
      val dateList = getTimeRange(startTime, endTime, "yyyy-MM-dd")
      dateList
        .par
        .map(date => eventPath + "/" + url.trim + "/" + date)
        .flatMap(p => Hdfs.files(p).flatMap(f => Hdfs.files(f)))
    })
    .map(m => m.toString)
    .toList
}

Tags: scala, apache-spark

Solution

The problem is solved. Because multiple files are being loaded, the read has to be done like this: .json(ss.read.textFile(pathList: _*)) — that is, load the files as text first and then parse the resulting Dataset[String] as JSON, as sketched below.
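A minimal sketch of the corrected read, assuming the same ss (SparkSession), pathList, columns and writePath as in the question; the files are first loaded as a Dataset[String] with textFile and then parsed with the Dataset overload of json():

import org.apache.spark.sql.Column

// Read every file as plain text first; each line is one JSON record
val jsonLines = ss.read.textFile(pathList: _*)

// Parse the text Dataset as JSON and keep the rest of the original pipeline
ss.read
  .json(jsonLines)
  .select(columns.split(",").map(m => new Column(m.trim)): _*)
  .repartition(1)
  .write.option("header", "true").csv(writePath)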

