Spark Scala error: overloaded method value foreach

Problem description

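  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, expr, lit, when}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  def main(args: Array[String]): Unit = {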

    val conf = new SparkConf().setAppName("Spardsl").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    val sparksession= SparkSession.builder().getOrCreate()//spark session initialization
    val struct =
      StructType(
        StructField("txnno", StringType, true) ::
          StructField("txndate", StringType, false) ::
          StructField("custno", StringType, true)::
          StructField("amount", StringType, true)::
          StructField("category", StringType, false)::
          StructField("product", StringType, true)::
          StructField("city", StringType, true)::
          StructField("state", StringType, true)::
          StructField("spendby", StringType, false):: Nil)

    val txndf = sparksession.read.format("csv").schema(struct).load("file:///D:/bigdata_tasks/txns.csv")
    println("=============Normal txndf=======================")
    txndf.show()
    println("===================with column========================")
    val withcolumnMonth=txndf.withColumn("col_check",lit("Y")).withColumn("col_month",expr("split(txndate,'-')[0]"))
    withcolumnMonth.show()
    println("===================with column========================")
    val spenbyCol= when(col("spendby")==="credit",0)
      .when(col("spendby")==="cash",1)
      .otherwise(3)

    val withcolumnspeendBy=txndf.withColumn("col_check",lit("Y"))
      .withColumn("col_month",expr("split(txndate,'-')[0]"))
      .withColumn("col_spend",spenbyCol)
    //withcolumnspeendBy.show()
    withcolumnspeendBy.foreach(println)
  }

When I run this in IntelliJ, it fails with the following error:

Error:(43, 24) overloaded method value foreach with alternatives:
  (func: org.apache.spark.api.java.function.ForeachFunction[org.apache.spark.sql.Row])Unit <and>
  (f: org.apache.spark.sql.Row => Unit)Unit
 cannot be applied to (Unit)
    withcolumnspeendBy.foreach(println)

What could be causing this error?

Tags: scala, apache-spark

Solution


There are several ways to do this:

withcolumnspeendBy.collect.foreach(println)

withcolumnspeendBy.rdd.foreach(println)

withcolumnspeendBy.foreach(println(_))

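Each of these behaves differently. collect brings every row back to the driver and prints it there, while rdd.foreach and foreach(println(_)) run on the executors, so on a cluster the output goes to the executors' stdout rather than the driver console, and the order of the rows is not guaranteed.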

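The println(_) form is needed so that a function of the correct type, Row => Unit, is passed to foreach. Because foreach is overloaded, a bare println is not converted into a function; the compiler treats it as the call println(), whose result is Unit, which is why the error says foreach cannot be applied to (Unit).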
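
As a minimal sketch of the point above, the function can also be given an explicit Row => Unit type before it is passed to foreach, which leaves only one matching overload. The object name ForeachDemo, the value printRow, the app name, and the small in-memory DataFrame are assumptions for illustration and stand in for the txns.csv data from the question.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object ForeachDemo {
  // Explicitly typed function value: only the (Row => Unit) overload of foreach accepts it.
  val printRow: Row => Unit = row => println(row)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ForeachDemo") // hypothetical app name
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the CSV file in the question.
    val df = Seq(("1", "credit"), ("2", "cash"), ("3", "cheque"))
      .toDF("txnno", "spendby")

    // Runs on the executors; on a cluster the output ends up in the executor logs.
    df.foreach(printRow)

    // Brings every row to the driver first, then prints there.
    df.collect().foreach(println)

    spark.stop()
  }
}
```

For anything larger than a small test file, show() or take(n).foreach(println) is usually a safer way to inspect rows than collect, since collect loads the whole result into driver memory.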

