google-bigquery - Is it possible to stream to a BigQuery partitioned table while preserving caching?
Problem description
I have a single table in BigQuery, time-partitioned by day. A Dataflow job uses the streaming API to insert new records continuously, but only into the newest partition (or the two newest in the corner case where data arrives slightly out of order around the day boundary).
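To illustrate the corner case, here is a minimal Python sketch that maps streamed rows to day partitions. It assumes the table is partitioned on a TIMESTAMP column (the question does not say whether it uses ingestion-time or column partitioning); for such columns BigQuery computes day boundaries in UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone


def partition_id(ts: datetime) -> str:
    """Return the day-partition id (YYYYMMDD) for a timezone-aware timestamp.

    Assumes partitioning on a TIMESTAMP column, whose day boundaries
    BigQuery evaluates in UTC.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def partitions_touched(timestamps) -> list:
    """Sorted, de-duplicated partition ids hit by one batch of streamed rows."""
    return sorted({partition_id(ts) for ts in timestamps})


# A batch arriving just after midnight UTC, with one late row from the previous day.
batch = [
    datetime(2024, 3, 1, 23, 59, 58, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 0, 0, 1, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 0, 0, 3, tzinfo=timezone.utc),
]
print(partitions_touched(batch))  # ['20240301', '20240302']
```

Outside that short window around midnight each batch maps to a single partition, so partitions older than yesterday are no longer written to by the streaming job.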
On the other side, I query the table heavily, aggregating over historical months and never touching the most recent days, i.e. never touching the streaming buffer either.
I would like to take advantage of caching for the results of such queries. Unfortunately, streaming to the table disables the cache, even though in theory the cached results are not affected by the streamed rows.
How can I use caching on the historical partitions while still being able to stream into the newest partitions?
If this is not possible out of the box, is the following a good design?
- Manage two tables: "recent" and "historical".
- Stream into "recent".
- Periodically merge "recent" into "historical", then purge "recent".
- Query through some kind of view over "historical" and "recent".
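The periodic merge step could look roughly like the following Python sketch, which only builds the SQL text. The dataset name, the `event_ts` column and the choice of keeping today and yesterday in "recent" are assumptions, not part of the question. The script wraps the copy and the purge in a BigQuery multi-statement transaction so that either both happen or neither does. As far as I know, BigQuery rejects DELETE statements that would affect rows still in the streaming buffer, so the cutoff has to leave enough margin for that.

```python
from datetime import date, timedelta


def default_cutoff(today: date, streaming_days: int = 2) -> date:
    """First day that stays in "recent".

    Assumption: by default today and yesterday, i.e. the partitions that
    may still receive streamed rows.
    """
    return today - timedelta(days=streaming_days - 1)


def merge_script(dataset: str, cutoff: date, ts_column: str = "event_ts") -> str:
    """Build one run of the "recent" -> "historical" move as a SQL script.

    Rows dated strictly before `cutoff` are copied into "historical" and
    then deleted from "recent", inside a single transaction. Assumes both
    tables share the same schema, so SELECT * lines up column by column.
    """
    predicate = f"DATE({ts_column}) < DATE '{cutoff.isoformat()}'"
    return "\n".join([
        "BEGIN TRANSACTION;",
        f"INSERT INTO `{dataset}.historical` SELECT * FROM `{dataset}.recent` WHERE {predicate};",
        f"DELETE FROM `{dataset}.recent` WHERE {predicate};",
        "COMMIT TRANSACTION;",
    ])


print(merge_script("my_dataset", default_cutoff(date(2024, 3, 10))))
```

Running this on a schedule keeps "recent" small, so queries that do need the newest days scan little extra data.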
If this design is reasonable, how would I define such a view so that caching is used when only "historical" data is queried? Or would I need my own query-rewriting tool?
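One way to approach the last question is a thin query-rewriting layer. The Python sketch below assumes that any query referencing the streamed table, even only through a view, bypasses the cache, so the view alone would not be enough. It defines a hypothetical `all_rows` view for queries that need recent data and routes every other query straight to "historical". `merged_before` is assumed to be the cutoff of the last successful merge, i.e. every row dated before it is already in "historical"; the `event_ts` column is also hypothetical.

```python
from datetime import date


def view_ddl(dataset: str) -> str:
    """DDL for a hypothetical view exposing both tables as one.

    Assumes "historical" and "recent" have identical schemas.
    """
    return (
        f"CREATE OR REPLACE VIEW `{dataset}.all_rows` AS\n"
        f"SELECT * FROM `{dataset}.historical`\n"
        f"UNION ALL\n"
        f"SELECT * FROM `{dataset}.recent`"
    )


def pick_source(dataset: str, range_end: date, merged_before: date) -> str:
    """Pick the table for a query whose date range ends on `range_end`.

    If the whole range predates the last merge cutoff, "historical" alone
    holds the data and the query never references the streamed table.
    """
    if range_end < merged_before:
        return f"`{dataset}.historical`"
    return f"`{dataset}.all_rows`"


def rewrite(template: str, dataset: str, range_end: date, merged_before: date) -> str:
    """Fill the `{source}` placeholder of a query template."""
    return template.format(source=pick_source(dataset, range_end, merged_before))


monthly = (
    "SELECT COUNT(*) FROM {source} "
    "WHERE DATE(event_ts) BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'"
)
print(rewrite(monthly, "my_dataset", date(2024, 1, 31), merged_before=date(2024, 3, 9)))
```

Routing on the date range keeps the text of historical queries stable, which is what result caching keys on. Each merge writes to "historical", though, which presumably invalidates cached results over it, so the merge frequency also bounds how long those cached results stay useful.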
Maybe you have other ideas?