toDF is not a member of org.apache.spark.rdd.RDD

Problem description

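I have been trying to run this Spark program in spark-shell, but it throws the error below. I have already imported the implicits, but nothing changed.

I want to use the toDF method to convert an RDD to a DataFrame, but I can't figure out what is causing the error.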

Code:

scala> {
     |   case class HService(
     |                        uhid:String,
     |                        locationid:String,
     |                        doctorid:String,
     |                        billdate: String,
     |                        servicename: String,
     |                        servicequantity: String,
     |                        starttime: String,
     |                        endtime: String,
     |                        servicetype: String,
     |                        servicecategory: String,
     |                        deptname: String
     |                      )
     | 
     |   def main(args: Array[String])
     |   {
     | 
     |     val conf = new SparkConf().setAppName("HHService") // Configuration conf = new Configuration();
     | 
     |     val sc = new SparkContext(conf) // Job job = Job.getInstance(conf, "word count");
     | 
     |     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
     | 
     |     import sqlContext.implicits._
     | 
     |     val hospitalDataText = sc.textFile("/home/training/Desktop/Data/services.csv")
     |     val header = hospitalDataText.first()
     |     val hospitalData = hospitalDataText.filter(a => a != header)
     |     val hData = hospitalData.map(_.split(",")).map(p => HService(p(0),p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10)))
     |     hData.take(4).foreach(println)
     |     val hosService = hData.toDF()
     |     hosService.registerTempTable("HService")
     |     val results =sqlContext.sql("SELECT doctorid, count(uhid) as visits FROM HService GROUP BY doctorid order by visits desc")
     |     results.collect().foreach(println)
     |   }
     | }

Error:

<console>:61: error: value toDF is not a member of org.apache.spark.rdd.RDD[HService]
           val hosService = hData.toDF()
                                  ^

Tags: scala, apache-spark, spark-dataframe

Solution


It looks like you are not using SparkSession. The following sample code works:

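import org.apache.spark.sql.SparkSession

// HService as in the question, but declared at the top level of the shell
case class HService(uhid: String, locationid: String, doctorid: String, billdate: String,
  servicename: String, servicequantity: String, starttime: String, endtime: String,
  servicetype: String, servicecategory: String, deptname: String)

val spark = SparkSession.builder.master("local[4]").getOrCreate
import spark.implicits._
val hospitalDataText = spark.read.textFile("/tmp/services.csv")
val hData = hospitalDataText.map(_.split(",")).map(p => HService(p(0),p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10)))
val hosService = hData.toDF()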


hData: org.apache.spark.sql.Dataset[HService] = [uhid: string, locationid: string ... 9 more fields]
hosService: org.apache.spark.sql.DataFrame = [uhid: string, locationid: string ... 9 more fields]
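For reference, here is a minimal sketch of the same approach written as a standalone application instead of a shell session. It assumes Spark 2.0 or later and a launch through spark-submit, which supplies the master URL. It also assumes the CSV from the question: a header line, 11 columns, and no quoted commas. The usual explanation for this exact error is that HService in the question is defined inside a block, which makes it a local class. For a local class the compiler cannot materialize the TypeTag that the implicit toDF conversion needs to build an Encoder. For that reason the sketch declares the case class at the top level. The `parse` helper and the optional path argument are hypothetical additions for illustration. `createOrReplaceTempView` replaces `registerTempTable`, which has been deprecated since Spark 2.0.

```scala
import org.apache.spark.sql.SparkSession

// Declared at the top level (outside any method or block), so the compiler can
// materialize the TypeTag that Spark needs to derive an Encoder for it.
case class HService(
  uhid: String, locationid: String, doctorid: String, billdate: String,
  servicename: String, servicequantity: String, starttime: String,
  endtime: String, servicetype: String, servicecategory: String,
  deptname: String
)

object HHService {
  // Hypothetical helper: turns one comma-separated line into an HService.
  // Assumes 11 fields and no quoted commas; the -1 limit keeps trailing
  // empty fields, so p(10) exists even when deptname is blank.
  def parse(line: String): HService = {
    val p = line.split(",", -1)
    HService(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10))
  }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val spark = SparkSession.builder().appName("HHService").getOrCreate()
    import spark.implicits._

    // Default path taken from the question; an optional first argument overrides it.
    val path = if (args.nonEmpty) args(0) else "/home/training/Desktop/Data/services.csv"

    val lines = spark.sparkContext.textFile(path)
    val header = lines.first()
    val hData = lines.filter(_ != header).map(parse) // RDD[HService], header skipped

    // Compiles because spark.implicits._ can now find an Encoder[HService].
    val hosService = hData.toDF()
    hosService.createOrReplaceTempView("HService")

    val results = spark.sql(
      "SELECT doctorid, count(uhid) AS visits FROM HService GROUP BY doctorid ORDER BY visits DESC")
    results.show()

    spark.stop()
  }
}
```

After packaging it into a jar, you could launch it with something like `spark-submit --class HHService hhservice.jar /path/to/services.csv`. The jar name here is only a placeholder.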
