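scala - Question about combineByKey
Question
Given the following input, I want to process it with combineByKey:
Input => [('B', 1), ('B', 2), ('A', 3), ('A', 4), ('A', 5)]
After processing, I expect the following output:
Expected output => [('A', [(3, 9), (4, 16), (5, 25)]), ('B', [(1, 1), (2, 4)])]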
scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
x: org.apache.spark.rdd.RDD[(Char, Int)] = ParallelCollectionRDD[46] at parallelize at <console>:24
scala> def createCombiner (element:Int) :String = (element.toString + "," + Math.pow(element,2).toInt)
createCombiner: (element: Int)String
scala> def mergeValue (accumlator:String, element:Int) : String = (accumlator + (element.toString + Math.pow(element,2).toInt))
mergeValue: (accumlator: String, element: Int)String
scala> def mergeComb (accumlator:String ,accumlator1:String):String = (accumlator + accumlator1)
mergeComb: (accumlator: String, accumlator1: String)String
scala> val combRDD = x.map(t => (t._1, (t._2))).combineByKey(createCombiner, mergeValue, mergeComb)
combRDD: org.apache.spark.rdd.RDD[(Char, String)] = ShuffledRDD[48] at combineByKey at <console>:31
scala> combRDD.collect
res39: Array[(Char, String)] = Array((A,3,94,165,25), (B,1,12,4))
I am unable to get the expected output. I am new to Spark, so I would appreciate some input.
Answer
How about:
scala> val x = sc.parallelize(Array(('B',1),('B',2),('A',3),('A',4),('A',5)))
scala> def createCombiner(element:Int) : List[(Int, Int)] = List(element -> element * element)
scala> def mergeValue (accumulator: List[(Int, Int)], element:Int) : List[(Int, Int)] = accumulator ++ createCombiner(element)
scala> def mergeComb (accumulator: List[(Int, Int)], accumulator1: List[(Int, Int)]): List[(Int, Int)] = (accumulator ++ accumulator1)
scala> val combRDD = x.combineByKey(createCombiner, mergeValue, mergeComb)
scala> combRDD.collect
// res0: Array[(Char, List[(Int, Int)])] = Array((A,List((3,9), (4,16), (5,25))), (B,List((1,1), (2,4))))
// Or
scala> combRDD.mapValues(_.mkString("[", ", ", "]")).collect
res1: Array[(Char, String)] = Array((A,[(3,9), (4,16), (5,25)]), (B,[(1,1), (2,4)]))
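To see why a List combiner works where the String one did not, it may help to model combineByKey without Spark. The sketch below is plain Scala and is not Spark's implementation: `combineByKeyLocal`, `squaresByKey`, and `originalAttempt` are hypothetical names, and a `Seq[Seq[...]]` stands in for the RDD's partitions. Within a partition, the first value seen for a key goes through createCombiner and later values through mergeValue; the per-partition results are then joined with the third function. The output in the question, `3,94,165,25`, is consistent with each value sitting in its own partition, so only createCombiner and mergeComb ran and nothing separated the pairs.

```scala
object CombineByKeySketch {
  // Hypothetical helper: an in-memory model of combineByKey, not Spark's code.
  def combineByKeyLocal[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C
  ): Map[K, C] = {
    // Step 1: fold every partition on its own.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v)) // key already seen here
          case None    => acc.updated(k, createCombiner(v)) // first value for this key
        }
      }
    }
    // Step 2: merge the per-partition combiners key by key.
    perPartition.foldLeft(Map.empty[K, C]) { (merged, partial) =>
      partial.foldLeft(merged) { case (acc, (k, c)) =>
        acc.get(k) match {
          case Some(existing) => acc.updated(k, mergeCombiners(existing, c))
          case None           => acc.updated(k, c)
        }
      }
    }
  }

  // The three functions from the answer above.
  def createCombiner(element: Int): List[(Int, Int)] = List(element -> element * element)
  def mergeValue(acc: List[(Int, Int)], element: Int): List[(Int, Int)] = acc ++ createCombiner(element)
  def mergeComb(a: List[(Int, Int)], b: List[(Int, Int)]): List[(Int, Int)] = a ++ b

  def squaresByKey(partitions: Seq[Seq[(Char, Int)]]): Map[Char, List[(Int, Int)]] =
    combineByKeyLocal[Char, Int, List[(Int, Int)]](partitions, createCombiner, mergeValue, mergeComb)

  // The String-based functions from the question, for comparison.
  def originalAttempt(partitions: Seq[Seq[(Char, Int)]]): Map[Char, String] =
    combineByKeyLocal[Char, Int, String](
      partitions,
      e => e.toString + "," + (e * e),
      (acc, e) => acc + (e.toString + (e * e)),
      (a, b) => a + b
    )

  def main(args: Array[String]): Unit = {
    // Assumed partitioning, chosen only for illustration.
    val twoPartitions = Seq(Seq('B' -> 1, 'B' -> 2, 'A' -> 3), Seq('A' -> 4, 'A' -> 5))
    println(squaresByKey(twoPartitions).toList.sortBy(_._1))
    // List((A,List((3,9), (4,16), (5,25))), (B,List((1,1), (2,4))))

    // One value per partition reproduces the string from the question.
    val onePerPartition = Seq(Seq('A' -> 3), Seq('A' -> 4), Seq('A' -> 5))
    println(originalAttempt(onePerPartition)('A'))
    // 3,94,165,25
  }
}
```

Because the list version keeps each pair as a separate element, it gives the same pairs however the values are split across partitions. The order of pairs within a list still depends on the partitioning.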