Scala: compare two delimited strings and generate a third delimited string

Problem description

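I have two strings, str1=A#2021-04-02,B#2021-04-01,C#2021-04-02 and str2=A#2021-04-02#60.0,B#2021-04-02#80.0,C#2021-04-01#60.0.

The first part of each entry is the group and the second part is the date; str2 has an additional percentage field. I want to generate a third string by comparing the two: when the group parts match, check whether the date in str2 is greater than the date in str1 and whether the percentage in str2 is >= 75. If both conditions hold, the output takes the date from str2; otherwise it keeps the date from str1.

The output string should be str3=A#2021-04-02,B#2021-04-02,C#2021-04-02, because for group B the date in str2 is greater than the date in str1 and the percentage is >= 75.

If str1=A#2021-04-02,B#2021-04-01,C#2021-04-02 and str2=A#2021-04-02#60.0,B#2021-04-02#60.0,C#2021-04-01#60.0, then str3 will be A#2021-04-02,B#2021-04-01,C#2021-04-02, because the percentage is not >= 75.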

Tags: scala, apache-spark

Solution


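// Split "g#d,g#d,..." into one array of '#'-separated fields per entry.
def parseString(s: String): Array[Array[String]] = s.split(',').map(_.split('#'))

val str1: String = ???
val str2: String = ???

// Notes:
// 1. `sc` is the SparkContext (predefined in spark-shell).
// 2. collect silently drops entries that do not match the pattern.
// 3. Dates are not parsed and stay as strings for simplicity, i.e. we assume all dates are valid.
//    Comparing them with `>` works because yyyy-MM-dd strings sort chronologically.
// 4. `p.toDouble` throws if p is not a valid double.
val rdd1 = sc.parallelize(parseString(str1)).collect { case Array(g, d, _*) => g -> d }
val rdd2 = sc.parallelize(parseString(str2)).collect { case Array(g, d, p, _*) => g -> (d, p.toDouble) }

// 5. A left outer join is assumed, based on your requirement to default to the left (str1) date
//    when the condition fails; it also covers a group that is missing from str2.
// 6. The join does not guarantee the order of the result; sort before mkString if the order matters.
val str3 = rdd1.leftOuterJoin(rdd2).map {
  case (g, (d1, Some((d2, p)))) if d2 > d1 && p >= 75 => s"$g#$d2"
  case (g, (d1, _)) => s"$g#$d1"
}.collect.mkString(",")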
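
If the two strings are small enough to handle on the driver, the same rule can be sketched with plain Scala collections and no Spark at all. This is only an illustration under a few assumptions: the object name `MergeDelimited`, the function `merge` and its `threshold` parameter are hypothetical names, dates are assumed to be in ISO yyyy-MM-dd format so that `java.time.LocalDate.parse` accepts them, and malformed entries are passed through or skipped rather than reported. Unlike the join above, it keeps the entry order of str1.

```scala
import java.time.LocalDate
import scala.util.Try

object MergeDelimited {
  // Hypothetical helper: returns str1 with each group's date replaced by the
  // str2 date when that date is strictly later and the percentage >= threshold.
  def merge(str1: String, str2: String, threshold: Double = 75.0): String = {
    // group -> (date, percentage) from str2; entries that fail to parse are skipped.
    val updates: Map[String, (LocalDate, Double)] =
      str2.split(',').toSeq.flatMap { entry =>
        entry.split('#') match {
          case Array(g, d, p, _*) =>
            Try((g, (LocalDate.parse(d), p.toDouble))).toOption
          case _ => None
        }
      }.toMap

    // Walk str1 in its original order and decide entry by entry.
    str1.split(',').toSeq.map { entry =>
      entry.split('#') match {
        case Array(g, d, _*) =>
          val replaced = for {
            d1 <- Try(LocalDate.parse(d)).toOption
            (d2, p) <- updates.get(g)
            if d2.isAfter(d1) && p >= threshold
          } yield s"$g#$d2"
          // Fall back to the unchanged str1 entry when any condition fails.
          replaced.getOrElse(entry)
        case _ => entry // not in "group#date" form: leave as-is
      }
    }.mkString(",")
  }
}
```

Calling `MergeDelimited.merge(str1, str2)` with the two example pairs from the question should reproduce the two expected str3 values, since only group B in the first pair has both a later date and a percentage of at least 75.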
