首页 > 解决方案 > 为什么 Cassandra TableWriter 写入 0 条记录以及如何解决?

问题描述

我正在尝试将 RDD 写入 Cassandra 表。如下图 TableWriter 多次写入 0 行,最后写入 Cassandra。

18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.171 s.
18/10/22 07:15:50 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.220 s.
18/10/22 07:15:50 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 665 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.194 s.
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.224 s.
18/10/22 07:15:50 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.231 s.
18/10/22 07:15:50 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 622 bytes result sent to driver
18/10/22 07:15:50 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.246 s.
18/10/22 07:15:50 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 418 ms on localhost (executor driver) (1/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 433 ms on localhost (executor driver) (2/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 426 ms on localhost (executor driver) (3/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 433 ms on localhost (executor driver) (4/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 456 ms on localhost (executor driver) (5/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 436 ms on localhost (executor driver) (6/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 424 ms on localhost (executor driver) (7/8)
18/10/22 07:15:50 INFO **TableWriter: Wrote 1 rows to log_by_date in 0.342 s.**

为什么它之前多次保存失败,如何调整它以用于生产?

标签: apache-sparkdatastaxcassandra-3.0

解决方案


这不是用户 10465355 指出的故障。当 Spark 将一项工作分解为多个任务时,可能是工作分布不均,或者没有足够的工作让每个任务都有工作要做。这会导致某些任务为空,因此当它们由 Spark Cassandra 连接器处理时,它们会写入 0 行。

例如说;

  1. 您将 100 条记录读入 10 个 Spark 分区/任务
  2. 你做了一个过滤器,用过滤器消除值,所以现在 5 个任务中只剩下 30 条记录。其他 5 个是空的。
  3. 当您编写时,您现在只会看到为 5 个任务写入的记录,并且 5 个任务将报告它们没有写入任何行。

推荐阅读