Generate multiple returns under Spark

Problem description

I am using the ALS library from Spark and am having trouble generating multiple results from one row. Say I have a file in which the fields on each line are separated by '#'. Here is what I have so far:

val ratings : RDD[Rating] = data.map(_.split('#')).map(items => {
   for (i <- 1 until items.length) 
      if ( items(i).length() > 2)
         Rating(items(0).toInt, i.toInt, items(i).toDouble)    
})

Ideally, I would like to generate data of type Rating, but the compiler reports the error "type mismatched: found Unit, required: org.apache.spark.mllib.recommendation.Rating".

Is there a way to create multiple rows from one row in Spark using Scala? Any thoughts?

I am using Spark 2.1.x and Scala 2.11.

Tags: scala, apache-spark, apache-spark-mllib

Solution


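The for loop in the question has no yield, so it evaluates to Unit, which is what the compiler is complaining about. Add yield so that the loop returns a value for every item, and return a placeholder (Rating(-1, -1, -1.0)) when the condition items(i).length() > 2 is not satisfied. Because each line now produces a collection of ratings rather than a single Rating, use flatMap instead of map, and then filter out the placeholder values.

Example

import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

val ratings : RDD[Rating] = data.map(_.split('#')).flatMap(items =>
{
   // yield makes the loop return one Rating per item instead of Unit
   for (i <- 1 until items.length) yield {
       if ( items(i).length() > 2)
          Rating(items(0).toInt, i, items(i).toDouble)
       else
          Rating(-1, -1, -1.0)   // placeholder for items that fail the check
   }
}
).filter(_ != Rating(-1, -1, -1.0))   // drop the placeholders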
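
A possible alternative that avoids the placeholder altogether is to put the condition into the for comprehension as a guard and let flatMap flatten the per-line results. The following is only a sketch under a few assumptions: data is taken to be an RDD[String], and the names parseLine and toRatings are hypothetical helpers introduced here for illustration, not part of Spark.

```scala
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// Hypothetical helper: turns one '#'-separated line into zero or more ratings.
def parseLine(line: String): Seq[Rating] = {
  val items = line.split('#')
  for {
    i <- 1 until items.length
    if items(i).length > 2   // same check as in the question, used as a guard
  } yield Rating(items(0).toInt, i, items(i).toDouble)
}

// Hypothetical wrapper: flatMap flattens the per-line sequences into a single RDD[Rating].
// Assumes data holds the raw lines of the file.
def toRatings(data: RDD[String]): RDD[Rating] = data.flatMap(parseLine)
```

With this sketch, a line such as "1#4.55#3#2.50" should give two ratings, one for column 1 and one for column 3, while the short entry "3" is skipped by the guard. Since nothing is emitted for rejected entries, no sentinel value has to be filtered out afterwards.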
