Scala group on named Tuple to get min value

Problem description

I have a Seq of named tuples (case class instances) as follows:

Seq[GeoDetails], where GeoDetails is defined as GeoDetails(geo_name: String, first_geo_time: Long)

The sequence can contain multiple records for a single geo, and I want to write a function that groups by geo_name and takes the minimum first_geo_time for each group. For example:

Input:

Seq(GeoDetails("cn", 1111111111111L), GeoDetails("mx", 2222222222222L), GeoDetails("mx", 3333333333333L), GeoDetails("cn", 4444444444444L))

Desired Output:

Seq(GeoDetails("cn", 1111111111111L), GeoDetails("mx", 2222222222222L))

I think groupBy and foldLeft can do the job, but I'm new to Scala and would appreciate some help with this. The output should keep the GeoDetails case class rather than turn into plain tuples.

Tags: scala, apache-spark

Solution


Something like this (Scala 2.13):

 val list = Seq(GeoDetails("cn", 1111111111111L), GeoDetails("mx", 2222222222222L), GeoDetails("mx", 3333333333333L), GeoDetails("cn", 4444444444444L))

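 // minBy picks the earliest record per group without sorting; .values drops the map keys so the result is a List[GeoDetails]
 list.groupBy(_.geo_name).view.mapValues(_.minBy(_.first_geo_time)).values.toList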
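
For reference, here is a self-contained sketch of the same idea, plus the foldLeft variant the question mentions. The case class definition is assumed from the fields described in the question, and the object and method names (GeoMin, minPerGeo, minPerGeoFold) are hypothetical. Since groupBy returns an unordered Map, both versions sort by geo_name to give a deterministic result; drop that step if order does not matter.

```scala
// Assumed definition, reconstructed from the fields named in the question.
final case class GeoDetails(geo_name: String, first_geo_time: Long)

// Hypothetical helper object; the names are for illustration only.
object GeoMin {

  // groupBy + minBy: group records by geo_name, then keep the record
  // with the smallest first_geo_time in each group.
  def minPerGeo(records: Seq[GeoDetails]): Seq[GeoDetails] =
    records
      .groupBy(_.geo_name)
      .values
      .map(_.minBy(_.first_geo_time))
      .toSeq
      .sortBy(_.geo_name) // groupBy gives no ordering guarantee

  // foldLeft alternative: a single pass that keeps the best record
  // seen so far for each geo_name in an accumulator Map.
  def minPerGeoFold(records: Seq[GeoDetails]): Seq[GeoDetails] =
    records
      .foldLeft(Map.empty[String, GeoDetails]) { (acc, r) =>
        acc.get(r.geo_name) match {
          case Some(best) if best.first_geo_time <= r.first_geo_time => acc
          case _ => acc.updated(r.geo_name, r)
        }
      }
      .values
      .toSeq
      .sortBy(_.geo_name)
}
```

Both functions return Seq[GeoDetails], so the case class is preserved. The foldLeft version avoids building the intermediate per-group sequences, which may matter for large inputs.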
