python - How to apply function to Pyspark dataframe column?
Problem description
I have a dataframe that looks like this:
+-----------+-------+-----------------+
|          A|      B|              Num|
+-----------+-------+-----------------+
|      BAKEL|  BAKEL| 1 341 2323 01415|
|      BAKEL|  BAKEL| 2 272 7729 00307|
|      BAKEL|  BAKEL| 2 341 1224 00549|
|      BAKEL|  BAKEL| 2 341 1200 01194|
|      BAKEL|  BAKEL|1 845 0112 101159|
+-----------+-------+-----------------+
And I want an output like this:
+-----------+-------+---------------+
|          A|      B|            Num|
+-----------+-------+---------------+
|      BAKEL|  BAKEL|  1341232301415|
|      BAKEL|  BAKEL|  2272772900307|
|      BAKEL|  BAKEL|  2341122400549|
|      BAKEL|  BAKEL|  2341120001194|
|      BAKEL|  BAKEL| 18450112101159|
+-----------+-------+---------------+
where the spaces in the values of the last column have been removed.
How can I do that with PySpark?
Solution
Use the regexp_replace() function to solve this:
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace
# Reuse the active session or create one (used here in place of the legacy sqlContext)
spark = SparkSession.builder.getOrCreate()
myValues = [('BAKEL','BAKEL','1 341 2323 01415'),('BAKEL','BAKEL','2 272 7729 00307'),
            ('BAKEL','BAKEL','2 341 1224 00549'),('BAKEL','BAKEL','2 341 1200 01194'),
            ('BAKEL','BAKEL','1 845 0112 101159')]
df = spark.createDataFrame(myValues, ['A','B','Num'])
# Remove every space character from the Num column
df = df.withColumn('Num', regexp_replace('Num', ' ', ''))
# Convert String to Long (integral value)
df = df.withColumn('Num', df['Num'].cast("long"))
df.show()
+-----+-----+--------------+
| A| B| Num|
+-----+-----+--------------+
|BAKEL|BAKEL| 1341232301415|
|BAKEL|BAKEL| 2272772900307|
|BAKEL|BAKEL| 2341122400549|
|BAKEL|BAKEL| 2341120001194|
|BAKEL|BAKEL|18450112101159|
+-----+-----+--------------+
df.printSchema()
root
|-- A: string (nullable = true)
|-- B: string (nullable = true)
|-- Num: long (nullable = true)
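To sanity-check what the two steps do to each value without starting Spark, here is a minimal plain-Python sketch of the same per-row logic. It is an illustration, not part of the answer above: `strip_spaces_to_int` is a hypothetical helper name, `re.sub` stands in for `regexp_replace`, and `int()` is only a rough stand-in for the cast to long (Python integers are unbounded, while a Spark long is 64-bit).

```python
import re

def strip_spaces_to_int(value):
    """Hypothetical helper: remove spaces, then parse the digits as an integer."""
    # Same pattern and replacement as regexp_replace('Num', ' ', '')
    digits = re.sub(' ', '', value)
    # Rough stand-in for the cast to long
    return int(digits)

# The Num values from the question
nums = ['1 341 2323 01415', '2 272 7729 00307', '2 341 1224 00549',
        '2 341 1200 01194', '1 845 0112 101159']

cleaned = [strip_spaces_to_int(n) for n in nums]
print(cleaned)
```

The printed values should match the Num column in the df.show() output above.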