scala - reduce by key using a different map
Problem description
I have a map keyed by timestamp, where each value is an (IP, seconds) pair:
(1421927423,(59.166.0.9,0.011))
(1421927423,(59.166.0.3,0.011))
(1421927423,(59.45.0.2,27.203556))
(1421927423,(59.166.0.8,0.018))
(1421927423,(59.166.0.8,1.256667))
(1421927423,(175.45.176.2,27.203556))
(1421927424,(59.166.0.8,0.018))
(1421927426,(59.166.0.8,0.018))
and then a second map that finds the max of the seconds in x._2:
(1421927423,(175.45.176.2,27.203556))
(1421927426,(59.166.0.8,0.018))
I then want to reduce map one based on map two: if an entry's key and its seconds match the max seconds for that key, it should be added to a new map.
Solution
So I went a different route...
val file = sc.textFile("UNSW-NB15_1.csv")
val splitfile = file.map(_.split(","))
// Key = start time, Value = (IP, best arrival time); the best arrival time is the larger of columns 30 and 31
val keyval = splitfile.map(x => (x(28), (x(0), math.max(x(30).toDouble, x(31).toDouble))))
// Key = start time, Value = best arrival time (used to find the max)
val newResult = keyval.mapValues(_._2)
// Find the max time for each key: Key = start time, Value = max time
val findMax = newResult.reduceByKey((a, b) => math.max(a, b))
// Join the two key/value RDDs: Key = start time, Value = ((IP, arrival time), max time)
val data = keyval.join(findMax)
// Compare the arrival time with the max time for each key and keep only the entries that match
val findIPs = data.filter { case (_, ((_, arrival), maxTime)) => arrival == maxTime }
findIPs.collect.foreach(println)
It's silly, but it got me there...
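To check the logic without a Spark cluster, the same idea can be sketched on plain Scala collections. This is only an illustration under a few assumptions: the names `MaxPerKey`, `Entry`, `keepMaxPerKey` and `sample` are made up here, the timestamps are typed as `Long` (the Spark code keeps them as strings), and `groupBy` plus `max` stands in for `reduceByKey`. The sample rows are the ones from the question.

```scala
object MaxPerKey {
  // (timestamp, (IP, seconds)); Long timestamps are an assumption for this sketch
  type Entry = (Long, (String, Double))

  // Keep every entry whose seconds equal the maximum seconds for its key.
  def keepMaxPerKey(entries: Seq[Entry]): Seq[Entry] = {
    // Key -> max seconds: the local counterpart of reduceByKey with max
    val maxByKey: Map[Long, Double] =
      entries.groupBy(_._1).map { case (k, es) => k -> es.map(_._2._2).max }
    // Look up each key's max and filter: the local counterpart of join + filter
    entries.filter { case (k, (_, secs)) => secs == maxByKey(k) }
  }

  // Sample rows copied from the question
  val sample: Seq[Entry] = Seq(
    (1421927423L, ("59.166.0.9", 0.011)),
    (1421927423L, ("59.166.0.3", 0.011)),
    (1421927423L, ("59.45.0.2", 27.203556)),
    (1421927423L, ("59.166.0.8", 0.018)),
    (1421927423L, ("59.166.0.8", 1.256667)),
    (1421927423L, ("175.45.176.2", 27.203556)),
    (1421927424L, ("59.166.0.8", 0.018)),
    (1421927426L, ("59.166.0.8", 0.018))
  )

  def main(args: Array[String]): Unit =
    keepMaxPerKey(sample).foreach(println)
}
```

On the sample rows this keeps four entries. Both 59.45.0.2 and 175.45.176.2 survive for key 1421927423 because they tie at 27.203556. That is the same tie behavior as the join-and-filter approach, and it is why reducing straight to a single (IP, seconds) pair per key would lose one of them.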