pyspark - Join on 1 = 1 on pyspark
Problem description
Good day,
I wish to check whether my variable exists in either of two tables and get the result in a single table for further processing. I figured that a simple query would do:
'''
select concent, concent_big from
(select count(*) as concent from concent where core_id = "{}" ) as a
left join
(select count(*) as concent_big from concent_2 where core_id = "{}" ) as b
on 1 = 1
'''
However, this does not seem to be allowed. It's a bit confusing, since I've done similar things in SQL before. Now PySpark is giving me a hard time. I came up with a workaround, but it's (imho) silly:
'''
select concent, concent_big from
(select count(*) as concent, 1 as tmp_key from concent where core_id = "{}" ) as a
left join
(select count(*) as concent_big , 1 as tmp_key from concent_2 where core_id = "{}" ) as b
on a.tmp_key = b.tmp_key
'''
Any ideas how to do this more elegantly?
Solution
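Why not just use crossJoin? One caveat: this produces a full Cartesian product, so your table may explode in size, but that seems to be the effect you want. You can read about it here: https://spark.apache.org/docs/2.4.3/api/python/pyspark.sql.html#pyspark.sql.DataFrame.crossJoin
Edit: when using Spark SQL, the language follows the ANSI SQL standard, so the command becomes CROSS JOIN.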
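For illustration, here is a minimal sketch of both variants, assuming the tables `concent` and `concent_2` from the question are registered in the session catalog. The helper names `count_matches` and `both_counts` and the constant `CROSS_JOIN_SQL` are made up for this example and are not part of any API. Since each side is a single-row aggregate, the Cartesian product here is just one row.

```python
# Spark SQL variant: the question's query with "left join ... on 1 = 1"
# replaced by "cross join". The "{}" placeholders are filled with
# str.format, as in the question.
CROSS_JOIN_SQL = """
select concent, concent_big from
(select count(*) as concent from concent where core_id = "{}" ) as a
cross join
(select count(*) as concent_big from concent_2 where core_id = "{}" ) as b
"""


def count_matches(spark, table_name, core_id, alias):
    """Return a one-row DataFrame with the number of rows matching core_id.

    `spark` is assumed to be an existing SparkSession.
    """
    df = spark.table(table_name)
    return (
        df.where(df["core_id"] == core_id)
        .groupBy()  # no grouping columns: aggregate over the whole table
        .count()  # yields a single column named "count"
        .withColumnRenamed("count", alias)
    )


def both_counts(spark, core_id):
    """DataFrame API variant: cross-join the two one-row counts."""
    a = count_matches(spark, "concent", core_id, "concent")
    b = count_matches(spark, "concent_2", core_id, "concent_big")
    return a.crossJoin(b)
```

With a session in hand, either `spark.sql(CROSS_JOIN_SQL.format(core_id, core_id))` or `both_counts(spark, core_id)` should give a single row with the columns `concent` and `concent_big`.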
Hope this helps.