How to reduceByKey on a transformation of the key and return the whole record

Problem description

I have an RDD whose elements are pairs of type (String, Int).

The RDD contains the following values:

("A x",3)
("A y",4)
("A z",1)
("B y",2)
("C w",5)
("C y",2)
("E x",1)
("E z",3)

What I want to end up with is an RDD of (String, Int) like this:

("A y",4) # among the keys that contain y, (A y) has the max value
("A x",3) # among the keys that contain x, (A x) has the max value
("E z",3) # among the keys that contain z, (E z) has the max value
("C w",5) # among the keys that contain w, (C w) has the max value

I tried a loop-based approach (using a counter) together with flatMap, but it did not work. Is there a simpler way?

Tags: python, pyspark, rdd, reduce

Solution
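One possible approach, offered as a minimal sketch rather than a definitive answer: re-key each record by the second token of its key while carrying the whole original record along as the value, then let reduceByKey keep the record with the larger number, and finally drop the temporary key. The helper names rekey, pick_max, and max_per_suffix are hypothetical, and the sketch assumes every key has the form "<prefix> <suffix>" separated by whitespace.

```python
def rekey(record):
    """Map ("A y", 4) to ("y", ("A y", 4)): group by suffix, keep the whole record."""
    key, _value = record
    suffix = key.split()[1]  # assumption: keys look like "<prefix> <suffix>"
    return suffix, record


def pick_max(a, b):
    """Given two whole records, return the one with the larger Int value."""
    return a if a[1] >= b[1] else b


def max_per_suffix(rdd):
    """Take a PySpark RDD of (str, int) pairs and return an RDD of the winning records."""
    return (
        rdd.map(rekey)            # (suffix, (original_key, value))
           .reduceByKey(pick_max) # one surviving record per suffix
           .values()              # drop the temporary suffix key
    )
```

With an existing SparkContext `sc`, this could be used as `max_per_suffix(sc.parallelize(data)).collect()`, where `data` is the list of pairs from the question. The order of the collected records is not guaranteed, and if two records with the same suffix tie on the value, which one survives depends on the order in which Spark combines them.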

