scala - generate multiple returns under spark
Problem description
I am using the ALS library from Spark, and I am having trouble generating multiple records from one row. Say I have a file where the separator within a line is '#'. Here is what I have so far:
val ratings : RDD[Rating] = data.map(_.split('#')).map(items => {
  for (i <- 1 until items.length)
    if (items(i).length() > 2)
      Rating(items(0).toInt, i.toInt, items(i).toDouble)
})
Ideally, I would like to generate the data with the Rating type, but I get the error "type mismatch: found Unit, required: org.apache.spark.mllib.recommendation.Rating".
Is there a way to create multiple rows from one row in Spark using Scala? Any thoughts?
I am using Spark 2.1.x and Scala 2.11.
Solution
You should also return something (e.g. Rating(-1, -1, -1.0)) for the case where the condition if (items(i).length() > 2) is not satisfied, and then filter out the Rating(-1, -1, -1.0) placeholder values afterwards.
Example:
val ratings : RDD[Rating] = data
  .map(_.split('#'))
  .flatMap { items =>
    // `yield` makes the for-comprehension produce a collection of Ratings
    // instead of Unit, and flatMap flattens one collection per input row
    for (i <- 1 until items.length) yield {
      if (items(i).length() > 2)
        Rating(items(0).toInt, i, items(i).toDouble)
      else
        Rating(-1, -1, -1.0) // placeholder, filtered out below
    }
  }
  .filter(_.user != -1)
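The same transformation can be sketched without Spark at all, which makes the core idea easier to see: a for-comprehension with an `if` guard yields only the valid entries, so no placeholder or filter step is needed. In this sketch a local `Seq` stands in for the RDD, and the `Rating` case class is a hypothetical stand-in for org.apache.spark.mllib.recommendation.Rating (which has the same three fields).

```scala
// Stand-in for Spark MLlib's Rating(user, product, rating)
case class Rating(user: Int, product: Int, rating: Double)

// Two example '#'-separated lines; "x" is too short and should be skipped
val data = Seq("7#3.5#x#4.0", "9#2.5#5.0")

// flatMap flattens the per-line collections into one flat sequence;
// the `if` guard inside the for-comprehension drops short entries directly
val ratings: Seq[Rating] = data.map(_.split('#')).flatMap { items =>
  for (i <- 1 until items.length if items(i).length > 2)
    yield Rating(items(0).toInt, i, items(i).toDouble)
}
// → Seq(Rating(7,1,3.5), Rating(7,3,4.0), Rating(9,1,2.5), Rating(9,2,5.0))
```

With an RDD the shape is identical: replace `map` on the `Seq` with `RDD.map` and the outer `flatMap` with `RDD.flatMap`, and the guard replaces the sentinel-plus-filter step entirely.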