首页 > 解决方案 > Column name is not passed to PostgreSQL on JDBC Scan in Apache Drill

问题描述

While trying to run SQL query for PostgreSQL, instead of the column names from the table referred it it pushing down * to the database.

select
  m.m_id,
  cnt_c_no
from (
    select
      m_id
    from pg_test_main.test1.table1
    where
      last_date >= '2019-01-01 00:00:00'
  ) as m
left join (
    select
      ci.m_id,
      count(ci.c_no) as cnt_c_no
    from (
        select
          m_id,
          c_no
        from pg_test.public.table2
      ) as ci
    inner join (
        select
          c_no
        from pg_test.public.table3
        where
          is_del = 'F'
      ) as c on ci.c_no = c.c_no
    group by
      ci.m_id
  ) as join1 on m.m_id = join1.m_id;
00-00    Screen
00-01      Project(m_id=[$0], cnt_c_no=[$1])
00-02        Project(m_id=[$0], cnt_c_no=[$2])
00-03          HashJoin(condition=[=($0, $1)], joinType=[left], semi-join: =[false])
00-05            Jdbc(sql=[SELECT "m_id" FROM "test1"."table1" WHERE "last_date" >= '2019-01-01 00:00:00' ])
00-04            Project(m_id0=[$0], cnt_c_no=[$1])
00-06              HashAgg(group=[{0}], cnt_c_no=[COUNT($1)])
00-07                Project(m_id=[$0], c_no=[$1])
00-08                  HashJoin(condition=[=($1, $2)], joinType=[inner], semi-join: =[false])
00-10                    Project(m_id=[$3], c_no=[$1])
00-12                      Jdbc(sql=[SELECT * FROM "public"."table2" ])
00-09                    Project(c_no0=[$0])
00-11                      Project(c_no=[$0])
00-13                        SelectionVectorRemover
00-14                          Filter(condition=[=($60, 'F')])
00-15                            Jdbc(sql=[SELECT * FROM "public"."table3" ])

As you can see, Jdbc Scan for table1 was using column names.

but, Jdbc Scan for table2 and table3 was not using column names. It pushed down * to the database.

How can I control jdbc scan so that it can push down colume names?

Apache Drill version is 1.16.0 (embedded-mode)

标签: jdbcapache-drill

解决方案


I tried to reproduce it with MySQL on both Drill 1.17 and Drill 1.15, but for the query, similar to the query you have specified, all the query is pushed into the JDBC storage:

SELECT m.person_id,
       cnt_c_no
FROM
  (SELECT person_id
   FROM mysql.`drill_mysql_test1`.person1
   WHERE date_field >= '2019-01-01 00:00:00') AS m
LEFT JOIN
  (SELECT ci.person_id,
          count(ci.last_name) AS cnt_c_no
   FROM
     (SELECT person_id,
             last_name
      FROM mysql.`drill_mysql_test`.person) AS ci
   INNER JOIN
     (SELECT last_name
      FROM mysql.`drill_mysql_test`.person2
      WHERE boolean_field = 'F' ) AS c ON ci.last_name = c.last_name
   GROUP BY ci.person_id) AS join1 ON m.person_id = join1.person_id

The plan for this query:

00-00    Screen
00-01      Project(person_id=[$0], cnt_c_no=[$1])
00-02        Jdbc(sql=[SELECT `t0`.`person_id`, `t5`.`cnt_c_no` FROM (SELECT `person_id` FROM `drill_mysql_test1`.`person1` WHERE `date_field` >= '2019-01-01 00:00:00') AS `t0` LEFT JOIN (SELECT `t1`.`person_id`, COUNT(`t1`.`last_name`) AS `cnt_c_no` FROM (SELECT `person_id`, `last_name` FROM `drill_mysql_test`.`person`) AS `t1` INNER JOIN (SELECT `last_name` FROM `drill_mysql_test`.`person2` WHERE `boolean_field` = 'F') AS `t3` ON `t1`.`last_name` = `t3`.`last_name` GROUP BY `t1`.`person_id`) AS `t5` ON `t0`.`person_id` = `t5`.`person_id` ])

Could you please provide CTAS for Postgres tables, so I will try to reproduce it again with specific data types. Also, if possible, please check whether this issue is still reproduced on Drill 1.17.

UPD: Comment under this answer helped to discover that this problem was caused by the following issue: https://issues.apache.org/jira/browse/DRILL-7340 and it will be resolved in Drill 1.18.0.


推荐阅读