Joining RDDs and mapping values - constructor cannot be instantiated to expected type

Question

I'm new to the world of Scala and Spark, so I need a little help.

I have two vals, both of type RDD[((String, String), Double)],

with the following values:

RDD1 = 
((a, b), 10)
((c, d), 20)
((g, h),50)

RDD2 = 
((a, b), 20)
((e, f), 30)
((g, h), 10)

The desired output is:

(a, b, 30)
(c, d, 20)
(e, f, 30)
(g, h, 60)

Because of certain policies I can only post mock data, sorry about that, but any help is much appreciated.

I tried:

    val joined = rdd1.fullOuterJoin(rdd2).map{case(x, y, z) => (x._1, x._2, y+z)}

but it seems I'm making a mistake. Compilation fails with this error:
[error] ...../class.scala:59: constructor cannot be instantiated to expected type;
[error]  found   : (T1, T2, T3)
[error]  required: ((String, String), (Option[Double], Option[Double]))
[error]       val joined = rdd1.fullOuterJoin(rdd2).map{case(x, y, z) => (x._1, x._2, y._1+z._1)}
[error]                                                                    ^
[error] ...../class.scala:59: not found: value x
[error]       val joined = rdd1.fullOuterJoin(rdd2).map{case(x, y, z) => (x._1, x._2, y._1+z._1)}
[error]                                                                                  ^
[error] ...../class.scala:59: not found: value x
[error]       val joined = rdd1.fullOuterJoin(rdd2).map{case(x, y, z) => (x._1, x._2, y._1+z._1)}
[error]                                                                                        ^
[error] ...../class.scala:59: not found: value y
[error]       val joined = rdd1.fullOuterJoin(rdd2).map{case(x, y, z) => (x._1, x._2, y._1+z._1)}
[error]                                                                                              ^
[error] four errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed 25 Jul, 2018 6:54:09 PM

Any help would be appreciated.

Tags: scala, apache-spark, rdd

Solution


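Try this. As the error message says, fullOuterJoin yields elements of type ((String, String), (Option[Double], Option[Double])), that is, a pair of (key, (left, right)) rather than a 3-tuple, so the pattern has to mirror that nesting. The Double values are Options, so each one needs a default: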

val rdd1 = sc.parallelize(Seq((("a","b"),10.0),
                          (("c","d"),20.0),
                          (("g","h"),50.0)))

val rdd2 = sc.parallelize(Seq((("a","b"),20.0),
                          (("e","f"),30.0),
                          (("g","h"),10.0)))

rdd1.fullOuterJoin(rdd2).map {case ((x1, x2), (y1, y2))  => (x1,x2,y1.getOrElse(0.0) + y2.getOrElse(0.0))}.collect.foreach(println)

// Prints (order may vary):
// (g,h,60.0)
// (c,d,20.0)
// (e,f,30.0)
// (a,b,30.0)
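
To see why the nested pattern is needed without starting Spark, here is a minimal sketch that uses plain Scala Maps in place of the RDDs. That substitution is an assumption made only so the example runs on its own; fullOuterJoinLocal and sumJoined are hypothetical helpers, not part of Spark. It reproduces the element shape from the error message and applies the same pattern match as the answer above.

```scala
// Plain Scala collections stand in for the RDDs (an assumption, so that the
// sketch runs without a SparkContext). fullOuterJoinLocal is a hypothetical helper.
object JoinShapeSketch {
  type Key = (String, String)

  // Mimics the element type a full outer join produces on pair data:
  // (key, (Option[left value], Option[right value])).
  def fullOuterJoinLocal(
      left: Map[Key, Double],
      right: Map[Key, Double]
  ): Seq[(Key, (Option[Double], Option[Double]))] =
    (left.keySet ++ right.keySet).toSeq.sorted.map { k =>
      (k, (left.get(k), right.get(k)))
    }

  def sumJoined(
      left: Map[Key, Double],
      right: Map[Key, Double]
  ): Seq[(String, String, Double)] =
    fullOuterJoinLocal(left, right).map {
      // The pattern mirrors the nesting: a pair whose first part is the key
      // pair and whose second part is a pair of Options.
      case ((a, b), (l, r)) => (a, b, l.getOrElse(0.0) + r.getOrElse(0.0))
    }

  def main(args: Array[String]): Unit = {
    val m1 = Map(("a", "b") -> 10.0, ("c", "d") -> 20.0, ("g", "h") -> 50.0)
    val m2 = Map(("a", "b") -> 20.0, ("e", "f") -> 30.0, ("g", "h") -> 10.0)
    // Keys are sorted in the helper, so the rows print in key order.
    sumJoined(m1, m2).foreach(println)
  }
}
```

A pattern like case (x, y, z) cannot match here because every element is a Tuple2, which is exactly what the compiler reports. If only the per-key sums are needed, rdd1.union(rdd2).reduceByKey(_ + _) might be a simpler route on real RDDs, since it avoids the Options altogether.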
