首页 > 解决方案 > PATINDEX in spark sql

问题描述

I have this statement in sql

Case WHEN AAAA is not null then AAAA
     Else RTRIM(LEFT(BBBB, PATINDEX('%[0-9]%', BBBB) - 1))
     END as NAME. 

I need to convert this to spark sql. I tried using indexOf, but it doesn't take the string '%[0-9]%. How do i convert the above statement to spark SQL. please help

Thanks !

标签: apache-spark

解决方案


我在 scala spark 中执行此操作的代码。我用 udf 来做到这一点。编辑:假设字符串需要从第一次出现的数字中删除。

import spark.implicits._
val df = Seq(("SOUTH TEXAS SYNDICATE 454C"),
  ("SANDERS 34-27 #3TF"),
  ("K. R. BRACKEN B 3H"))
  .toDF("name")

df.createOrReplaceTempView("temp")
val getIndexOfFirstNumber = (s: String) => {
      val str = s.split("\\D+").filter(_.nonEmpty).toList
      s.indexOf(str(0))
    }
spark.udf.register("getIndexOfFirstNumber", getIndexOfFirstNumber)

spark.sql("""
select name,substr(name, 0, getIndexOfFirstNumber(name) -1) as final_name
from temp
""").show(20,false)

结果 ::

   +------------------------------------+----------------------+
   |name                                |final_name            |
   +------------------------------------+----------------------+
   |SOUTH TEXAS SYNDICATE 454C          |SOUTH TEXAS SYNDICATE |
   |SANDERS 34-27 #3TF                  |SANDERS               |
   |K. R. BRACKEN B 3H                  |K. R. BRACKEN B       |
   |ALEXANDER-WESSENDORFF 1 (SA) A5 A 5H|ALEXANDER-WESSENDORFF |
   |USZYNSKI-FURLOW (SA) B 3H           |USZYNSKI-FURLOW (SA) B|
   +------------------------------------+----------------------+

推荐阅读