How to emulate the array_join() method in Spark 2.2

Problem description

For example, if I have a dataframe like this:

|sex|        state_name| salary| www|
|---|------------------|-------|----|
|  M|   Ohio,California|    400|3000|
|  M|           Oakland|     70| 300|
|  M|DF,Tbilisi,Calgary|    200|3500|
|  M|            Belice|    200|3000|
|  m|    Sofia,Helsinki|    800|7000|

I need to join the comma-separated values in the "state_name" column into a single String, using a delimiter of my choosing. I also need to prepend and append a fixed string to the result (the opposite of a strip() method or function).

For example, if I want an output like this:

|cool_city                       |
|--------------------------------|
|[***Ohio<-->California***]      |
|[***Oakland***]                 |
|[***DF<-->Tbilisi<-->Calgary***]|
|[***Belice***]                  |
|[***Sofia<-->Helsinki***]       |

The solution that I've already coded with Spark 3.1.1 is this:

    df.select(concat(lit("[***"),
                     array_join(split(col("state_name"), ","), "<-->"),
                     lit("***]")).as("cool_city")).show()

The problem is that the computer where this will be running uses Spark 2.1.1, and the array_join() method isn't supported in that version (it's a pretty big project, and upgrading the Spark version isn't on the table). I'm pretty new to Scala/Spark, and I don't know whether there's another function that could emulate array_join(), or how to write a UDF with the same behavior.

I would greatly appreciate your help!

Tags: scala, apache-spark, apache-spark-sql

Solution


I don't know Scala, but try this:

    df.select(concat(lit("[***"),
                     concat_ws("<-->", split(col("state_name"), ",")),
                     lit("***]")).as("cool_city")).show()

UPDATE

A variant that avoids splitting the column, by replacing the commas directly:

    df.select(concat(lit("[***"),
                     regexp_replace(col("state_name"), ",", "<-->"),
                     lit("***]")).as("cool_city")).show()
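If a UDF is preferred, as the question suggests, the per-row logic is plain string splitting and joining, so it can be written and tested without any Spark-specific functions. A minimal sketch (the `coolCity` name is mine, and the Spark registration is shown in comments since `df` comes from your context):

```scala
// Core logic: split on commas, re-join with the custom delimiter,
// then wrap the result in the prefix/suffix markers.
def coolCity(s: String): String =
  "[***" + s.split(",").mkString("<-->") + "***]"

// On Spark 2.1.x (where array_join is unavailable), register it as a UDF:
//   import org.apache.spark.sql.functions.{udf, col}
//   val coolCityUdf = udf(coolCity _)
//   df.select(coolCityUdf(col("state_name")).as("cool_city")).show(false)
```

Note that `concat_ws` itself has been available since Spark 1.5, so the UDF is only needed if you want the whole transformation in one reusable function.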
