ClickHouse: CREATE DATABASE ON CLUSTER ends with a timeout

Problem description

I have a cluster consisting of two ClickHouse nodes. Both instances run in Docker containers. All communication between the hosts has been verified successfully - ping, telnet, and wget work fine. In ZooKeeper, I can see the queries I triggered under the ddl branch.

Every execution of a "create database on cluster" statement ends with a timeout. What is the problem? Does anyone have any ideas?

Here are fragments of the configuration file.

Version 20.10.3.30

<remote_servers>
    <history_cluster>
        <shard>
            <replica>
                <host>10.3.194.104</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.3.194.105</host>
                <port>9000</port>
            </replica>
        </shard>
    </history_cluster>
</remote_servers>
<zookeeper>
    <node index="1">
        <host>10.3.194.106</host>
        <port>2181</port>
    </node>
</zookeeper>

The "macros" section

    <macros incl="macros" optional="true" />
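The `incl="macros"` attribute means the actual substitution values are pulled from an external include file (by default `/etc/metrika.xml`, or a `config.d` override). For reference, such a file might look like the following sketch for the first node; the `shard`/`replica` names are the conventional ones used in ReplicatedMergeTree ZooKeeper paths, and the values here are illustrative, not taken from the question:

```xml
<!-- Hypothetical /etc/clickhouse-server/config.d/macros.xml on 10.3.194.104.
     Substitution values are examples only; each replica gets its own. -->
<yandex>
    <macros>
        <shard>01</shard>
        <replica>10.3.194.104</replica>
    </macros>
</yandex>
```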

Log fragment

2020.11.20 22:38:44.104001 [ 90 ] {68062325-a6cf-4ac3-a355-c2159c66ae8b} <Error> executeQuery: Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000013 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently active), they are going to execute the query in background (version 20.10.3.30 (official build)) (from 172.17.0.1:51272) (in query: create database event_history on cluster history_cluster;), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&) @ 0xd8dcc75 in /usr/bin/clickhouse
1. DB::DDLQueryStatusInputStream::readImpl() @ 0xd8dc84d in /usr/bin/clickhouse
2. DB::IBlockInputStream::read() @ 0xd71b1a5 in /usr/bin/clickhouse
3. DB::AsynchronousBlockInputStream::calculate() @ 0xd71761d in /usr/bin/clickhouse
4. ? @ 0xd717db8 in /usr/bin/clickhouse
5. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x7b8c17d in /usr/bin/clickhouse
6. std::__1::__function::__func<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'(), std::__1::allocator<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()>, void ()>::operator()() @ 0x7b8e67a in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x7b8963d in /usr/bin/clickhouse
8. ? @ 0x7b8d153 in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
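The timeout in this message is governed by the `distributed_ddl_task_timeout` setting (180 seconds by default). Note the "0 of them are currently active" part: the replicas never pick the task up at all, so raising the timeout only makes the client wait longer. Still, for genuinely slow clusters it can be raised per profile; a sketch, assuming the default `users.xml` layout:

```xml
<!-- users.xml (or a config.d override): raise the ON CLUSTER wait from 180 s.
     This does not fix tasks that are never picked up; it only waits longer. -->
<yandex>
    <profiles>
        <default>
            <distributed_ddl_task_timeout>600</distributed_ddl_task_timeout>
        </default>
    </profiles>
</yandex>
```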

Tags: clickhouse

Solution


The most likely problem is the nodes' Docker-internal IPs/hostnames.

The initiator node (the one where "on cluster" is executed) puts tasks for 10.3.194.104 and 10.3.194.105 into ZK. All nodes continuously poll the task queue and pull the tasks addressed to them. If a node's own IP/hostname is 127.0.0.1 / localhost, it will never find its tasks, because 10.3.194.104 != 127.0.0.1.
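One way to make each container identify itself by the address written into the DDL task is to give it the host's network identity. A sketch for the node at 10.3.194.104, assuming Docker Compose (service name, image tag, and volume paths are illustrative); with `network_mode: host`, the server binds directly on the host IP, so the DDL worker recognizes queue entries addressed to 10.3.194.104 as its own:

```yaml
# Hypothetical docker-compose.yml for the node at 10.3.194.104.
# Host networking lets ClickHouse see 10.3.194.104 as one of its own
# addresses, so it picks up DDL queue tasks addressed to it.
version: "3"
services:
  clickhouse:
    image: yandex/clickhouse-server:20.10.3.30
    network_mode: host    # container shares the host's IP stack
    volumes:
      - ./config.xml:/etc/clickhouse-server/config.xml
```

An alternative is to keep bridged networking but use resolvable hostnames in `remote_servers` and start each container with a matching `--hostname`, so that the host listed in the task resolves to the container itself.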
