python - How to append a column to a DataFrame in pyspark?
Question
I have a pyspark DataFrame that is the output of a machine-learning prediction, as shown below:
predictions = model.transform(test_data)
+-----------------+-----------------+-----+------------------+-------+--------------------+--------------------+----------+
|col1_imputed |col2_imputed |label| features|row_num| rawPrediction| probability|prediction|
+-----------------+-----------------+-----+------------------+-------+--------------------+--------------------+----------+
| -0.002353| 0.9762| 0|[-0.002353,0.9762]| 1|[-0.8726465863653...|[0.29470390100153...| 1.0|
| -0.08637| 0.06524| 0|[-0.08637,0.06524]| 3|[-0.6029409441836...|[0.35367114067727...|
root
|-- col1_imputed: double (nullable = true)
|-- col2_imputed: double (nullable = true)
|-- label: integer (nullable = true)
|-- features: vector (nullable = true)
|-- row_num: integer (nullable = true)
|-- rawPrediction: vector (nullable = true)
|-- probability: vector (nullable = true)
|-- prediction: double (nullable = false)
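I transformed the probability column to select only the positive-class probability from its vector. I want to append this new column to the DataFrame above (or replace the current probability column with it), but I get an error when I try.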
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
secondelement=udf(lambda v:float(v[1]),FloatType())
pos_prob = predictions.select(secondelement('probability')) #selects second element in probability column
#trying to add the new pos_prob column and naming it 'prob' to the dataframe:
df = predictions.withColumn('prob', predictions.select(secondelement('probability'))).collect()
AssertionError: col should be Column
After reading similar questions I also tried wrapping it in lit(), but that gives a different error:
df = all_preds.withColumn('prob', lit(all_preds.select(secondelement('probability')))).collect()
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
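Answer
withColumn expects a Column expression as its second argument, not a DataFrame, which is why passing the result of select() fails. Apply the UDF directly to the column inside withColumn, for example: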
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
secondelement = udf(lambda v: float(v[1]), FloatType())
df = predictions.withColumn('prob', secondelement('probability'))