Case statement with a regular expression in PySpark SQL

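Problem description

I have the PySpark code below. In it, I create a DataFrame from another DataFrame that has been registered as a temporary view, and then use a SQL query to create a new field in the final query. The expression for the field I am trying to create originally comes from PostgreSQL, and I would like to know the correct PySpark SQL version of this case statement and regular expression: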

case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end

Do I just use cast(field2 as string)?

What is the correct PySpark SQL version of the regular expression test?

Code:

import datetime
import math
import sys
import time
import traceback

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext, Window
from pyspark.sql.types import *
from pyspark.sql.functions import (
    UserDefinedFunction, coalesce, col, date_add, date_format, date_trunc,
    datediff, dayofmonth, dayofyear, first, format_number, hour, lag, length,
    lit, min, month, substring, to_date, trim, udf, unix_timestamp, upper,
    weekofyear, when, year,
)



table_df.createOrReplaceTempView("table")


query="""select
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end as field1

from table a"""


df=spark.sql(query)

Tags: python, sql, apache-spark, pyspark, apache-spark-sql

Solution


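Spark SQL uses rlike in place of PostgreSQL's ~ operator for regular-expression matching. If field2 is not already a string, cast(a.field2 as string) is the Spark SQL spelling of field2::varchar. You can try: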

query = """
    select
        case when a.field2 rlike '^[0-9]+$' 
             then a.field2
             else '0' 
             end as field1
    from table a
"""

df = spark.sql(query)

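One more note on the pattern itself: rlike succeeds if the pattern is found anywhere in the string, so the ^ and $ anchors are what make this an "all digits" test. Spark evaluates the pattern with Java's regex engine, but for a pattern this simple Python's re module behaves the same way, so a quick plain-Python sketch can be used to sanity-check it. The helper name field1 and the sample values below are made up for illustration.

```python
import re

DIGITS_ANCHORED = re.compile(r"^[0-9]+$")
DIGITS_UNANCHORED = re.compile(r"[0-9]+")


def field1(value):
    """Mirror the CASE expression: keep all-digit strings, otherwise return '0'."""
    # re.search looks for a match anywhere in the string, like rlike does.
    if value is not None and DIGITS_ANCHORED.search(value):
        return value
    return "0"


samples = ["123", "12a", "", None]
results = [field1(v) for v in samples]

# Without the anchors, "12a" would count as a match because it contains digits.
unanchored_hit = DIGITS_UNANCHORED.search("12a") is not None
```

Here results keeps only "123", while unanchored_hit shows that dropping the anchors would let "12a" through.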