java - Why is saveAll() so slow with Cassandra and Spring Boot?
Problem Description
I am trying to insert in batches: objects are stored in an ArrayList, and whenever count is divisible by 10000, I insert all of them into my table. But this takes more than 4 minutes. Is there a faster approach?
arr.add(new Car(name, count, type));
if (count % 10000 == 0) {
    repository.saveAll(arr);
    arr.clear();
}
Solution
So here is what is happening. I am most curious to see the table definition inside Cassandra, but given your Car constructor, new Car(name, count, type), and those column names, I'm guessing that name is the partition key.
That is significant because the hash of the partition key column is what Cassandra uses to figure out which node (token range) the data should be written to.
When you saveAll on 10000 Cars at once, there is no way you can guarantee that all 10000 of those are going to the same node. To deal with this, Spring Data Cassandra must be using a BATCH (or something like it) behind the scenes. If it is a BATCH, that essentially designates one Cassandra node as a "coordinator" to route writes to the required nodes. Due to Cassandra's distributed nature, that is never going to be fast.
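To make the point about partition keys concrete, here is a rough sketch of how a set of names scatters across nodes. It assumes, as guessed above, that name is the partition key. The class PartitionSpread and its methods are hypothetical, and String.hashCode with a modulo is only a stand-in: a real cluster uses the Murmur3 partitioner and token ranges, so the actual placement would differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSpread {

    /**
     * Simplified stand-in for Cassandra's partitioner: maps a partition key
     * to one of nodeCount nodes. Real clusters use Murmur3 tokens and token
     * ranges; hashCode with a modulo is used here only to keep the sketch short.
     */
    static int nodeFor(String partitionKey, int nodeCount) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    /** Counts how many of the given partition keys land on each node. */
    static Map<Integer, Integer> spread(Iterable<String> keys, int nodeCount) {
        Map<Integer, Integer> perNode = new TreeMap<>();
        for (String key : keys) {
            perNode.merge(nodeFor(key, nodeCount), 1, Integer::sum);
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Hypothetical car names standing in for the 10000 Cars in the question.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            names.add("car-" + i);
        }
        // Prints a map of node index -> number of rows owned by that node.
        System.out.println(spread(names, 3));
    }
}
```

With any reasonable hash, the 10000 rows end up spread over every node, which is why a single saveAll of that size cannot be served by one replica set.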
If you really need to store 10000 of them, the best way would be to send one write at a time, asynchronously. Of course, you won't want 10000 threads all writing concurrently, so you'll want to throttle (limit) the number of active threads in your code. DataStax's Ryan Svihla has written a couple of articles detailing how to do this. I recommend this one: "Cassandra: Batch Loading Without the Batch - The Nuanced Edition".
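A minimal sketch of the throttling idea, not taken from the linked article: a Semaphore caps how many asynchronous writes are outstanding at once. The names ThrottledWriter, writeAll, asyncWrite and maxInFlight are hypothetical, and the write itself is simulated with CompletableFuture.runAsync. In a real application asyncWrite would wrap whatever asynchronous insert your driver or Spring Data version provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ThrottledWriter {

    /**
     * Issues one asynchronous write per item while keeping at most
     * maxInFlight writes outstanding. Returns once every write has finished;
     * join() rethrows (wrapped in a CompletionException) if any write failed.
     */
    static <T> void writeAll(List<T> items,
                             Function<T, CompletableFuture<?>> asyncWrite,
                             int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<?>> pending = new ArrayList<>(items.size());
        for (T item : items) {
            // Blocks the submitting thread while maxInFlight writes are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<?> write;
            try {
                write = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // the write never started, so free its slot
                throw e;
            }
            // Free the slot when the write completes, whether it succeeded or failed.
            pending.add(write.whenComplete((result, error) -> permits.release()));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        // Stand-in for a real asynchronous insert: it only counts completed writes.
        AtomicInteger written = new AtomicInteger();
        List<Integer> cars = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            cars.add(i);
        }
        writeAll(cars, car -> CompletableFuture.runAsync(written::incrementAndGet), 64);
        System.out.println("writes completed: " + written.get());
    }
}
```

The value 64 for maxInFlight is arbitrary. A sensible limit depends on cluster size and driver settings, so it is worth measuring rather than guessing.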
tl;dr;
Spring Data Cassandra's saveAll really shouldn't be used to persist several thousand writes. If I were using Spring Data Cassandra, I wouldn't even go beyond double digits with saveAll, TBH.
Edit
Check out this answer for details on how to use Spring Boot/Data with Cassandra asynchronously: AsyncCassandraOperations examples