PySpark: find a pattern from one column inside another column

Problem description

I have a DataFrame with two columns, address and street_name.

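from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit

# Reuse the active session if one exists (e.g. in a notebook or shell)
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [
        ['108 badajoz road north ryde 2113, nsw, australia', 'north ryde'],
        ['25 smart street fairfield 2165, nsw, australia', 'smart street'],
    ],
    ['address', 'street_name'],
)

df.show(2, False)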

+------------------------------------------------+------------+
|address                                         |street_name |
+------------------------------------------------+------------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde  |
|25 smart street fairfield 2165, nsw, australia  |smart street|
+------------------------------------------------+------------+

I want to check whether street_name occurs in address and return a boolean in a new column. I can search for a pattern manually, as shown below.

df.withColumn("new col", col("address").rlike('.*north ryde.*')).show(20, False)

+------------------------------------------------+------------+-------+
|address                                         |street_name |new col|
+------------------------------------------------+------------+-------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde  |true   |
|25 smart street fairfield 2165, nsw, australia  |smart street|false  |
+------------------------------------------------+------------+-------+

But I want to replace the hard-coded value with the street_name column, along the lines of this attempt:

df.withColumn("new col", col("address")\
  .rlike(concat(lit('.*'), col('street_name'), lit('.*'))))\
  .show(20, False)

Tags: python, regex, apache-spark, dataframe, pyspark

Solution


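You can do this simply with the contains function, which accepts another column as its argument and performs a literal substring match rather than a regex match. For more details, see the PySpark documentation for Column.contains.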

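from pyspark.sql.functions import col, when

# contains() tests, row by row, whether address includes street_name as a literal substring;
# when/otherwise turns a null result (null input) into False
df = df.withColumn('new_Col', when(col('address').contains(col('street_name')), True).otherwise(False))
df.show(truncate=False)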

+------------------------------------------------+------------+-------+
|address                                         |street_name |new_Col|
+------------------------------------------------+------------+-------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde  |true   |
|25 smart street fairfield 2165, nsw, australia  |smart street|true   |
+------------------------------------------------+------------+-------+
