jdbc - Column name is not passed to PostgreSQL on JDBC Scan in Apache Drill
问题描述
While trying to run SQL query for PostgreSQL, instead of the column names from the table referred it it pushing down * to the database.
select
m.m_id,
cnt_c_no
from (
select
m_id
from pg_test_main.test1.table1
where
last_date >= '2019-01-01 00:00:00'
) as m
left join (
select
ci.m_id,
count(ci.c_no) as cnt_c_no
from (
select
m_id,
c_no
from pg_test.public.table2
) as ci
inner join (
select
c_no
from pg_test.public.table3
where
is_del = 'F'
) as c on ci.c_no = c.c_no
group by
ci.m_id
) as join1 on m.m_id = join1.m_id;
00-00 Screen
00-01 Project(m_id=[$0], cnt_c_no=[$1])
00-02 Project(m_id=[$0], cnt_c_no=[$2])
00-03 HashJoin(condition=[=($0, $1)], joinType=[left], semi-join: =[false])
00-05 Jdbc(sql=[SELECT "m_id" FROM "test1"."table1" WHERE "last_date" >= '2019-01-01 00:00:00' ])
00-04 Project(m_id0=[$0], cnt_c_no=[$1])
00-06 HashAgg(group=[{0}], cnt_c_no=[COUNT($1)])
00-07 Project(m_id=[$0], c_no=[$1])
00-08 HashJoin(condition=[=($1, $2)], joinType=[inner], semi-join: =[false])
00-10 Project(m_id=[$3], c_no=[$1])
00-12 Jdbc(sql=[SELECT * FROM "public"."table2" ])
00-09 Project(c_no0=[$0])
00-11 Project(c_no=[$0])
00-13 SelectionVectorRemover
00-14 Filter(condition=[=($60, 'F')])
00-15 Jdbc(sql=[SELECT * FROM "public"."table3" ])
As you can see, Jdbc Scan for table1 was using column names.
but, Jdbc Scan for table2 and table3 was not using column names. It pushed down * to the database.
How can I control jdbc scan so that it can push down colume names?
Apache Drill version is 1.16.0 (embedded-mode)
解决方案
I tried to reproduce it with MySQL on both Drill 1.17 and Drill 1.15, but for the query, similar to the query you have specified, all the query is pushed into the JDBC storage:
SELECT m.person_id,
cnt_c_no
FROM
(SELECT person_id
FROM mysql.`drill_mysql_test1`.person1
WHERE date_field >= '2019-01-01 00:00:00') AS m
LEFT JOIN
(SELECT ci.person_id,
count(ci.last_name) AS cnt_c_no
FROM
(SELECT person_id,
last_name
FROM mysql.`drill_mysql_test`.person) AS ci
INNER JOIN
(SELECT last_name
FROM mysql.`drill_mysql_test`.person2
WHERE boolean_field = 'F' ) AS c ON ci.last_name = c.last_name
GROUP BY ci.person_id) AS join1 ON m.person_id = join1.person_id
The plan for this query:
00-00 Screen
00-01 Project(person_id=[$0], cnt_c_no=[$1])
00-02 Jdbc(sql=[SELECT `t0`.`person_id`, `t5`.`cnt_c_no` FROM (SELECT `person_id` FROM `drill_mysql_test1`.`person1` WHERE `date_field` >= '2019-01-01 00:00:00') AS `t0` LEFT JOIN (SELECT `t1`.`person_id`, COUNT(`t1`.`last_name`) AS `cnt_c_no` FROM (SELECT `person_id`, `last_name` FROM `drill_mysql_test`.`person`) AS `t1` INNER JOIN (SELECT `last_name` FROM `drill_mysql_test`.`person2` WHERE `boolean_field` = 'F') AS `t3` ON `t1`.`last_name` = `t3`.`last_name` GROUP BY `t1`.`person_id`) AS `t5` ON `t0`.`person_id` = `t5`.`person_id` ])
Could you please provide CTAS for Postgres tables, so I will try to reproduce it again with specific data types. Also, if possible, please check whether this issue is still reproduced on Drill 1.17.
UPD: Comment under this answer helped to discover that this problem was caused by the following issue: https://issues.apache.org/jira/browse/DRILL-7340 and it will be resolved in Drill 1.18.0.
推荐阅读
- php - Laravel 8 - 如何实现简单的 API 密钥认证
- python - 加载预训练模型后与可变输入尺寸相关的错误
- python - 如何根据不同df中提供的索引值对数据框中的特定列进行求和。返回总和和布尔 T/F
- python - 如何忽略 pd.read_html 中的“未找到表”错误?
- html - 如何在点击时重置我在 Angular 中创建的计时器?
- bash - sed 替换:替换字符串是需要扩展的变量并且包含斜杠
- excel - Excel VBA:在开关情况下使用字符+整数
- amazon-web-services - 有没有办法在 AppStream 2.0 中触发主机用户浏览器上的操作?
- javascript - Draggable JS 中的固定元素
- javascript - 在 javascript 槽代码中的数组中添加对象?