mysql - Generate a hierarchy sequence for the following Hive joins using the Spark GraphX lib
Problem description
Below is a sample dataset of transactions, where "t_id" and "parent_id" carry a dependency relationship.
t_id , first_name , parent_id , amount , dept_id , sal , datetime_updated
1 Jared None 1000 5 4088908 13/10/2017
2 Jared 1 -5000 1 8033313 17/10/2018
3 Jared 2 1000 5 17373148 23/07/2018
4 Tucker None 10000 3 16320817 08/09/2018
5 Tucker 4 -10000 2 5094970 24/08/2017
6 Tucker 5 5000 1 7435169 09/11/2018
7 Tucker 5 -2500 5 7859621 21/12/2018
8 Tucker 4 3000 2 5639934 14/07/2018
The query used is below:
select
t1.t_id ,
t1.first_name,
t1.amount,
t1.parent_id,
t2.t_id ,
t2.first_name,
t2.amount,
t2.parent_id,
t3.t_id ,
t3.first_name,
t3.amount,
t3.parent_id,
t4.t_id ,
t4.first_name,
t4.amount,
t4.parent_id
from Transactions t1
left join Transactions t2
on t1.parent_id = t2.t_id
left join Transactions t3
on t2.parent_id = t3.t_id
left join Transactions t4
on t3.parent_id = t4.t_id;
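The query above walks at most four ancestor levels by self-joining the table once per level, so a deeper hierarchy forces yet another join. The same result for each row can be expressed as a simple parent-pointer lookup. A minimal Python sketch over the sample rows (illustrative only; it is not the asker's Spark pipeline, and the column layout follows the question's Transactions table):

```python
# Illustrative sketch: resolve each transaction's ancestor chain with a
# dict lookup instead of fixed-depth self-joins. Data copied from the
# sample dataset in the question (dept_id/sal/date columns omitted).
rows = [
    # (t_id, first_name, parent_id, amount)
    (1, "Jared", None, 1000),
    (2, "Jared", 1, -5000),
    (3, "Jared", 2, 1000),
    (4, "Tucker", None, 10000),
    (5, "Tucker", 4, -10000),
    (6, "Tucker", 5, 5000),
    (7, "Tucker", 5, -2500),
    (8, "Tucker", 4, 3000),
]

by_id = {t_id: (first_name, parent_id, amount)
         for t_id, first_name, parent_id, amount in rows}

def ancestor_chain(t_id, max_depth=4):
    """Return up to max_depth (t_id, first_name, amount, parent_id)
    tuples: the row itself, then its parent, grandparent, ..."""
    chain, current = [], t_id
    while current is not None and len(chain) < max_depth:
        first_name, parent_id, amount = by_id[current]
        chain.append((current, first_name, amount, parent_id))
        current = parent_id
    return chain

for t_id, *_ in rows:
    print(ancestor_chain(t_id))
```

Each printed list corresponds to one output row of the four-way join, with `None` padding implied where the chain is shorter than four levels.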
Output of the above query:
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
| t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
| 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 2 | Jared | -5000 | 1 | 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3 | Jared | 1000 | 2 | 2 | Jared | -5000 | 1 | 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL |
| 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 6 | Tucker | 5000 | 5 | 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL |
| 7 | Tucker | -2500 | 5 | 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL |
| 8 | Thane | 3000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 9 | Nicholas | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 10 | Mason | 2000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 11 | Noah | 5000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
Question
I want to generate the same output as the results shown above, but I cannot use the join approach above because it fails on larger data sets when run through Spark SQL. Is there any other way to optimise the query to generate the same kind of data?
Solution
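The question's title points at Spark GraphX, and a Pregel-style computation is one natural fit: treat each parent_id link as an edge and propagate the root-to-node path one level per superstep, so hierarchy depth is handled by iteration count rather than by widening the join. Below is a hedged, plain-Python sketch of that iteration over the sample data (no Spark; the `edges`/`paths` names are illustrative, not a GraphX API):

```python
# Pregel-style sketch (plain Python, no Spark): each pass extends the
# known root-to-node paths by one level, mimicking one superstep of a
# GraphX Pregel computation over parent_id edges.
edges = {2: 1, 3: 2, 5: 4, 6: 5, 7: 5, 8: 4}   # child t_id -> parent t_id
roots = [1, 4]                                   # rows with parent_id = None

# paths[v] = list of t_ids from the root down to v
paths = {r: [r] for r in roots}
frontier = set(roots)
while frontier:                 # one iteration per hierarchy level
    nxt = set()
    for child, parent in edges.items():
        if parent in frontier and child not in paths:
            paths[child] = paths[parent] + [child]
            nxt.add(child)
    frontier = nxt              # stop when no new vertices were reached

print(paths)
```

In Spark this same loop could be written either with GraphX's Pregel operator or as an iterative DataFrame self-join that stops when the frontier is empty; either way the number of passes equals the hierarchy depth, not a hard-coded four.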