Performance occasionally degrades when copying data across projects in BigQuery

Problem description

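While copying data from one project to another project in the same data location in BigQuery, I ran into a very slow move: it took up to 2 minutes to move about 100,000 records. By comparison, other copy operations we run on BigQuery handle hundreds of millions of records in just a few seconds. I would like to find out why such a small dataset moves so abnormally slowly. Has anyone run into a similar problem and knows the reason behind it?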

Thanks.

Regards,

Tags: google-bigquery

Solution


The slow copy could stem from how your source table was created. For example, it may have been built up by several separate import jobs, which can leave the table's storage fragmented.

So the difference in time comes not from the amount of data stored in your table but from how that data is fragmented internally.

Although the running time is quite reasonable, if you want to speed it up further, you can try to coalesce (merge) the fragments of your table. One way of doing this is to export the table to Google Cloud Storage and re-import it, overwriting the table rather than appending to it. This should reduce the fragmentation and can help if you want to optimize your operations and save a few seconds.
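The export and re-import described above could be scripted roughly as follows. This is only a minimal sketch that drives the `bq` command-line tool from Python. The table name, bucket path, and the helper names `extract_cmd`, `load_cmd`, and `rewrite_table` are hypothetical, and Avro is just one possible export format. Not every column type or table setting necessarily survives such a round trip unchanged, so it would be safer to try this on a copy of the table first.

```python
import subprocess
from typing import Callable, List


def extract_cmd(table: str, staging_uri: str) -> List[str]:
    """Build the `bq extract` call that exports `table` to Cloud Storage as Avro."""
    return ["bq", "extract", "--destination_format=AVRO", table, staging_uri]


def load_cmd(table: str, staging_uri: str) -> List[str]:
    """Build the `bq load` call that reloads `table` from the exported files."""
    # --replace overwrites the existing table contents instead of appending.
    return ["bq", "load", "--replace", "--source_format=AVRO", table, staging_uri]


def rewrite_table(
    table: str,
    staging_uri: str,
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Export `table` to `staging_uri`, then load it back over itself.

    `run` is injectable so the command sequence can be inspected without
    actually invoking the `bq` tool.
    """
    commands = [extract_cmd(table, staging_uri), load_cmd(table, staging_uri)]
    for cmd in commands:
        # check=True stops the sequence if the export fails, so the table is
        # never overwritten from an incomplete export.
        run(cmd, check=True)
    return commands
```

Calling `rewrite_table("my-project:my_dataset.orders", "gs://my-bucket/staging/orders-*.avro")` with real names (the ones here are placeholders) would run the export and then the overwriting load. The wildcard in the URI lets BigQuery split a large export across several files.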

A running time of a few minutes is considered internally to be absolutely normal for a table copy job, and it does not count as a BigQuery deficiency.

Refer to the official documentation. If you want to know more about fragmentation in BigQuery, I recommend the O'Reilly book "Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale".

I hope you find the above pieces of information useful.

