python - 'if' inside a UDF
Problem description
I have a table like this:
+----------------------+
|Country_state |
+----------------------+
| Virginia |
| New Jersey |
| British Columbia |
|Over the North Sea |
| Germany |
| Belgium |
| Germany |
| Bulgeria |
| England |
| England |
| Germany |
| England |
| Belgium |
...
I need to get the country, so I wrote a simple UDF:
def USA(co):
    states = ["Alaska", "Alabama", "Arkansas", "American Samoa", "Arizona", "California", "Colorado", "Connecticut", "District of Columbia", "Delaware", "Florida", "Georgia", "Guam", "Hawaii", "Iowa", "Idaho", "Illinois", "Indiana", "Kansas", "Kentucky", "Louisiana", "Massachusetts", "Maryland", "Maine", "Michigan", "Minnesota", "Missouri", "Mississippi", "Montana", "North Carolina", "North Dakota", "Nebraska", "New Hampshire", "New Jersey", "New Mexico", "Nevada", "New York", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Virginia", "Virgin Islands", "Vermont", "Washington", "Wisconsin", "West Virginia", "Wyoming"]
    if co in states:
        return "USA"
    else:
        return co
But the condition always seems to evaluate to false, and I don't know why.
This is how I call it:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
usa = udf(USA, StringType())
finalCountry = c.withColumn("CountryFINAL", usa(c.Country_state))
Solution
You don't need a UDF here; use .isin together with when/otherwise.
from pyspark.sql import functions as F
states = ["Alaska", "Alabama", "Arkansas", "American Samoa", "Arizona", "California", "Colorado", "Connecticut", "District of Columbia", "Delaware", "Florida", "Georgia", "Guam", "Hawaii", "Iowa", "Idaho", "Illinois", "Indiana", "Kansas", "Kentucky", "Louisiana", "Massachusetts", "Maryland", "Maine", "Michigan", "Minnesota", "Missouri", "Mississippi", "Montana", "North Carolina", "North Dakota", "Nebraska", "New Hampshire", "New Jersey", "New Mexico", "Nevada", "New York", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Virginia", "Virgin Islands", "Vermont", "Washington", "Wisconsin", "West Virginia", "Wyoming"]
# df is the input DataFrame (called c in the question)
df.withColumn(
    "countryFINAL",
    F.when(F.col("Country_state").isin(states), F.lit("USA"))
    .otherwise(F.col("Country_state")),
).show()
#+------------------+------------------+
#| Country_state| countryFINAL|
#+------------------+------------------+
#| Virginia| USA|
#| New Jersey| USA|
#| British Columbia| British Columbia|
#|Over the North Sea|Over the North Sea|
#| Germany| Germany|
#| Belgium| Belgium|
#| Germany| Germany|
#| Bulgeria| Bulgeria|
#| England| England|
#| England| England|
#| Germany| Germany|
#| England| England|
#| Belgium| Belgium|
#+------------------+------------------+
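If the UDF from the question still seems to return the wrong value, one way to narrow it down is to run the same membership test as plain Python, outside Spark. The sketch below is only an illustration: the name to_country, the shortened STATES set, and the sample values are hypothetical, and the strip() call assumes that stray whitespace around the names might be what prevents a match, which the question does not confirm.

```python
# Shortened set for illustration; substitute the full list of states.
STATES = {"Virginia", "New Jersey", "District of Columbia"}


def to_country(value):
    """Return "USA" for a known state name, otherwise the input unchanged."""
    if value is None:
        # Null cells reach a Python UDF as None, so pass them through.
        return None
    # Assumption: surrounding whitespace should not prevent a match.
    if value.strip() in STATES:
        return "USA"
    return value


# Hypothetical sample values, including a padded name and a null.
samples = ["Virginia", " New Jersey ", "British Columbia", "Germany", None]
results = [to_country(s) for s in samples]
print(results)
```

If the plain function behaves as expected on values copied from the column, the logic itself is fine and the input data or the way the function is registered is the more likely place to look.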