scala - How to emulate the array_join() method in spark 2.2
Problem description
Suppose I have a DataFrame like this:
|sex| state_name| salary| www|
|---|------------------|-------|----|
| M| Ohio,California| 400|3000|
| M| Oakland| 70| 300|
| M|DF,Tbilisi,Calgary| 200|3500|
| M| Belice| 200|3000|
| m| Sofia,Helsinki| 800|7000|
I need to join the comma-separated values in the "state_name" column into a single string, using a delimiter that I specify. I also need to add a string at the beginning and at the end of the generated string (the opposite of a strip() method or function).
For example, I want an output like this:
|cool_city |
|--------------------------------|
|[***Ohio<-->California***] |
|[***Oakland***] |
|[***DF<-->Tbilisi<-->Calgary***]|
|[***Belice***] |
|[***Sofia<-->Helsinki***] |
The solution that I've already coded with Spark 3.1.1 is this:
df.select(concat(lit("[***"),
  array_join(split(col("state_name"), ","), "<-->"),
  lit("***]")).as("cool_city")).show()
The problem is that the computer where this will run uses Spark 2.1.1, and the array_join() method isn't supported in that version (it's a pretty big project, and upgrading the Spark version isn't on the table). I'm pretty new to Scala/Spark, and I don't know whether another function could help me emulate array_join(), or whether someone knows how to write a UDF that does the same thing.
I would greatly appreciate your help!
Solution
I don't know Scala, but try this:
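import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, split}
// concat_ws joins the elements of the split array with the given separator
df.select(concat(lit("[***"),
  concat_ws("<-->", split(col("state_name"), ",")),
  lit("***]")).as("cool_city")).show()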
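Since the question also asks about a UDF, here is a minimal sketch of one that mimics the two-argument form of array_join(). Treat it as an illustration under assumptions rather than a tested drop-in: the names ArrayJoinUdfSketch, joinParts and arrayJoinUdf are made up for this example, the sample rows are copied from the question, and null elements are skipped, which is how I understand array_join() to behave when no null replacement is given. The built-in concat_ws should normally be preferred, because a UDF is opaque to the query optimizer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, split, udf}

object ArrayJoinUdfSketch {
  // Hypothetical helper: joins the non-null parts with the delimiter.
  // Returns null for a null array so the UDF passes missing values through.
  def joinParts(parts: Seq[String], delimiter: String): String =
    if (parts == null) null else parts.filter(_ != null).mkString(delimiter)

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-join-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample values taken from the question's "state_name" column.
    val df = Seq("Ohio,California", "Oakland", "DF,Tbilisi,Calgary").toDF("state_name")

    // Wrap the plain Scala function as a two-argument UDF (array, delimiter).
    val arrayJoinUdf = udf((parts: Seq[String], delimiter: String) => joinParts(parts, delimiter))

    df.select(
      concat(
        lit("[***"),
        arrayJoinUdf(split(col("state_name"), ","), lit("<-->")),
        lit("***]")
      ).as("cool_city")
    ).show(false) // truncate = false, so long strings are printed in full

    spark.stop()
  }
}
```

Keeping the joining logic in a plain function such as joinParts means it can be unit-tested without starting a Spark session.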
UPDATE
Avoiding column split:
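import org.apache.spark.sql.functions.{col, concat, lit, regexp_replace}
// Replace every comma directly, without building an intermediate array
df.select(concat(lit("[***"),
  regexp_replace(col("state_name"), ",", "<-->"),
  lit("***]")).as("cool_city")).show()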
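One caveat: split and regexp_replace treat their pattern argument as a Java regular expression, so a delimiter such as | or . would need escaping. The sketch below shows the idea in plain Scala, without Spark. swapDelimiter is a hypothetical helper name, and I am assuming that Spark's regexp_replace applies the same java.util.regex rules to the replacement string, so it is worth checking on your version.

```scala
import java.util.regex.{Matcher, Pattern}

object DelimiterEscapingSketch {
  // Hypothetical helper: replaces one literal delimiter with another.
  // Pattern.quote makes the search string literal; Matcher.quoteReplacement
  // does the same for the replacement, where '$' and '\' are otherwise special.
  def swapDelimiter(value: String, from: String, to: String): String =
    value.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to))

  def main(args: Array[String]): Unit = {
    println(swapDelimiter("Ohio,California", ",", "<-->"))   // prints Ohio<-->California
    println(swapDelimiter("DF|Tbilisi|Calgary", "|", "<-->")) // prints DF<-->Tbilisi<-->Calgary
  }
}
```

For the comma and "<-->" used in the question, no escaping is needed, so the snippets work as written.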