python - 根据字典的值将字典列表的值分配给数据帧,具体取决于数据帧的列值
问题描述
我想将取决于字典字段(localSymbol_aux)的字典列表的值分配给数据帧,具体取决于火花结构化流(pyspark)中数据帧(localSymbol)的列值
我有以下字典列表
level_1_batch_list_of_dict = [
{
'localSymbol_aux': 'EUR.USD',
'level_1_precio_max_sesion_aux': 4,
'level_1_precio_min_sesion_aux': 0
},
{
'localSymbol_aux': 'USD.JPY',
'level_1_precio_max_sesion_aux': 6,
'level_1_precio_min_sesion_aux': 0
}
]
我有以下数据框:
+------------------------------------------+-----------+-----------------+
|ventana |localSymbol|precio_max_sesion|
+------------------------------------------+-----------+-----------------+
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |0 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |0 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |0 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |0 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |0 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |0 |
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |0 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |0 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |0 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |0 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |0 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |0 |
+------------------------------------------+-----------+-----------------+
但我想要以下内容:
+------------------------------------------+-----------+-----------------+
|ventana |localSymbol|precio_max_sesion|
+------------------------------------------+-----------+-----------------+
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |4 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |4 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |4 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |4 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |4 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |4 |
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |6 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |6 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |6 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |6 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |6 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |6 |
+------------------------------------------+-----------+-----------------+
有任何想法吗?
谢谢!!
解决方案
您可以将上面的字典转换为 PySpark Dataframe 然后做一个简单的连接
# this is the reference dataframe
ref = spark.createDataFrame(level_1_batch_list_of_dict)
ref.show(10, False)
# +-----------------------------+-----------------------------+---------------+
# |level_1_precio_max_sesion_aux|level_1_precio_min_sesion_aux|localSymbol_aux|
# +-----------------------------+-----------------------------+---------------+
# |4 |0 |EUR.USD |
# |6 |0 |USD.JPY |
# +-----------------------------+-----------------------------+---------------+
# this is your main dataframe
df.show(2, False)
# +------------------------------------------+-----------+-----------------+
# |ventana |localSymbol|precio_max_sesion|
# +------------------------------------------+-----------+-----------------+
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |0 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |0 |
# +------------------------------------------+-----------+-----------------+
# join them together and get correct values
(df
.join(ref, on=[df.localSymbol == ref.localSymbol_aux])
.select(df.ventana, df.localSymbol, ref.level_1_precio_max_sesion_aux.alias('precio_max_sesion'))
.show(100, False)
)
# +------------------------------------------+-----------+-----------------+
# |ventana |localSymbol|precio_max_sesion|
# +------------------------------------------+-----------+-----------------+
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |4 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |4 |
# |{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |4 |
# |{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |4 |
# |{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |4 |
# |{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |4 |
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |6 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |6 |
# |{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |6 |
# |{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |6 |
# |{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |6 |
# |{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |6 |
# +------------------------------------------+-----------+-----------------+
推荐阅读
- python-3.x - 在 Azure ML Studio 中执行 Python 脚本模块失败
- json - 使用 activeroute 传递对象数组的最佳方法 - Angular
- angular - 将元素属性值绑定到对象
- rest - 用于云设备通信的 MQTT 或 REST
- lisp - Common Lisp - type checking two variables
- c# - 是否有可能摆脱这种 Nuget Hell 情景?
- java - 与替换 Java 的组合
- netbeans - 如何让 NetBeans 8.0.2 拼写检查器提出更正建议?
- javascript - Uploading CVS file to server along with user input data
- visual-studio - 能否将 Service Fabric 应用程序从 Windows Docker 容器内部署到群集?