Exception when running in Scala-Spark: java.lang.NumberFormatException: For input string: "volume"

Problem description

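I get an exception when running the code snippet below. The dataset I am using is "stocks.csv", which contains the columns date, symbol, volume, open, close, high, low, and adjclose.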

    val stock = sc.textFile("C:/Users/kondr/Desktop/stocks/stocks.csv")
    val splits = stock.map(record => record.split(","))
    val symvol = splits.map(arr => (arr(1), arr(2).toInt))
    val maxvol = symvol.reduceByKey((vol1, vol2) => Math.max(vol1, vol2), 1)
    maxvol.collect().foreach(println)

Error message

21/05/05 14:09:31 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) java.lang.NumberFormatException: For input string: "volume" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

Tags: scala, apache-spark, numberformatexception

Solution


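The first line of the file is the header row, so arr(2) is the literal string "volume", which toInt cannot parse. Here is how to skip the first line: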

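// Pair each line with its index and drop index 0, the header row
stock.zipWithIndex().filter(_._2 != 0)
  .map(_._1)
  // This answer splits on " "; use "," for a comma-separated file such as stocks.csv
  .map(record => record.split(" "))
  .map(arr => (arr(1),arr(2).toInt))
  // Keep the largest volume seen for each symbol, in a single partition
  .reduceByKey((vol1,vol2) => Math.max(vol1,vol2),1)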
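Skipping by index assumes the header is the only unparsable row. As a rough alternative sketch (not part of the original answer), the parse itself can be made defensive so that any row whose volume field is not an integer is dropped. The names MaxVolume, parse and maxVolumeBySymbol are hypothetical, and the comma delimiter is assumed from the question's stocks.csv. The example uses plain Scala collections so it runs without a Spark context.

```scala
import scala.util.Try

// Hypothetical helper; names and the "," delimiter are assumptions for illustration.
object MaxVolume {
  // Turn one CSV line into Some((symbol, volume)).
  // Returns None for the header row or any row whose third field is not an Int.
  def parse(line: String): Option[(String, Int)] = {
    val arr = line.split(",")
    if (arr.length < 3) None
    else Try(arr(2).trim.toInt).toOption.map(vol => (arr(1), vol))
  }

  // Plain-collections counterpart of reduceByKey(Math.max): highest volume per symbol.
  def maxVolumeBySymbol(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(parse)
      .groupBy(_._1)
      .map { case (symbol, rows) => symbol -> rows.map(_._2).max }
}
```

With an RDD, the same parse function could presumably be plugged in as stock.flatMap(MaxVolume.parse).reduceByKey((a, b) => Math.max(a, b)), which removes the need for zipWithIndex. The trade-off is that malformed data rows are silently dropped instead of failing the job.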

Alternatively, you can read the file directly into a DataFrame, as follows:

val csvDF = spark.read
  .option("header", true)
  .option("delimiter", " ")
  .csv("stock.txt")

csvDF.show(false)

Output:

+----------+------+-------+-----------+-----------+-----------+-----------+-----------+
|date      |symbol|volume |open       |close      |high       |low        |adjclose   |
+----------+------+-------+-----------+-----------+-----------+-----------+-----------+
|18-04-2019|A     |2874100|75.73000336|76.16999817|76.54000092|75.30999756|76.16999817|
|17-04-2019|A     |4472000|78.15000153|75.43000031|78.31999969|74.45999908|75.43000031|
|16-04-2019|A     |3441500|80.81999969|77.55000305|80.95999908|77.19000244|77.55000305|
|15-04-2019|A     |1627300|81         |80.40000153|81.12999725|79.91000366|80.40000153|
|12-04-2019|A     |1249300|81.43000031|80.98000336|82.05999756|80.90000153|80.98000336|
+----------+------+-------+-----------+-----------+-----------+-----------+-----------+
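The DataFrame above only loads the data; the question's goal was the highest volume per symbol. A minimal sketch of that aggregation follows, assuming a DataFrame with symbol and volume columns like the one shown. The object name MaxVolumeDF and the function maxVolumeBySymbol are hypothetical, and the rows are built in memory from the first three rows of the output above so that the example does not depend on a file. Because the CSV reader yields string columns unless inferSchema is enabled, volume is cast to long before taking the maximum; otherwise the comparison would be lexicographic rather than numeric.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

// Hypothetical names; a sketch under the assumptions stated above.
object MaxVolumeDF {
  // Highest volume per symbol; the cast makes the comparison numeric.
  def maxVolumeBySymbol(df: DataFrame): DataFrame =
    df.groupBy("symbol")
      .agg(max(col("volume").cast("long")).as("max_volume"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("max-volume")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for csvDF: string columns, as the CSV reader returns them.
    val df = Seq(
      ("18-04-2019", "A", "2874100"),
      ("17-04-2019", "A", "4472000"),
      ("16-04-2019", "A", "3441500")
    ).toDF("date", "symbol", "volume")

    maxVolumeBySymbol(df).show(false)
    spark.stop()
  }
}
```

Applied to the csvDF from the answer, the call would be MaxVolumeDF.maxVolumeBySymbol(csvDF).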
