python - 在 c 之后使用来自 Python 生成器的大型结果集时,Snowflake 会抛出 403 Forbidden。6个小时
问题描述
我有一个返回数百万个结果的查询,但我不想将整个结果集存储在内存中,所以我使用以下代码批量检索记录:
def batch_query_results(sql_filename: str, num_rows: int) -> list[tuple]:
"""
Executes arbitrary query from SQL file and returns results as a batch-result
generator
"""
with get_connection() as conn:
with open(sql_filename, "r", encoding="utf-8") as f:
query = f.read()
cur = conn.cursor().execute(query)
logger.info("Executed Snowflake query!")
unfetched_results = True
while unfetched_results:
results = cur.fetchmany(num_rows)
if len(results) > 0:
logger.info(f"Fetched {len(results)} rows from query results...")
yield results
else:
unfetched_results = False
当我设置时num_row = 10_000
,这工作得很好,直到我到达第 980001 条记录(所以 c.第 98 次调用生成器以产生下一批),这是在执行开始后大约 6 小时,因为我的速率受限于使用每个结果的 API。
然后我收到以下错误:
Traceback (most recent call last):
File "/Users/.../lib/python3.9/site-packages/snowflake/connector/result_batch.py", line 315, in _download
raise RetryRequest(error)
snowflake.connector.network.RetryRequest: 000403: 403: HTTP 403: Forbidden
这就像发条一样发生。我查看了源代码,并注意到了这个文档字符串:
class ResultBatch(abc.ABC):
"""Represents what the back-end calls a result chunk.
These are parts of a result set of a query. They each know how to retrieve their
own results and convert them into Python native formats.
As you are iterating through a ResultBatch you should check whether the yielded
value is an ``Exception`` in case there was some error parsing the current row
we might yield one of these to allow iteration to continue instead of raising the
``Exception`` when it occurs.
These objects are pickleable for easy distribution and replication.
Please note that the URLs stored in these do expire. The lifetime is dictated by the
Snowflake back-end, at the time of writing this this is 6 hours.
They can be iterated over multiple times and in different ways. Please follow the
code in ``cursor.py`` to make sure that you are using this class correctly.
"""
这是我收到此错误的原因吗?如果是这样,除了将我的工作分成 980000 行增量之外,我还能如何解决这个问题?
{注意:这与这个未回答的问题相似,但并不完全相同。}
解决方案
推荐阅读
- c# - Task.Run() 访问方法范围变量是否安全?
- javascript - 变量从 Flask 发送到 JS 后仍然未定义
- c++ - DX11 CreateDepthStencil 视图 - nullptr
- node.js - 是否可以通过单击加载页面的 DOM 来触发 puppeteer 的屏幕截图?
- r - 在 dplyr 函数调用中使用 = inside bquote
- amazon-web-services - Flutter:如何为 AWS S3 编写 PUT 请求
- c++ - 用于从多个线程一致地存储在原子变量中的 C++ 内存顺序
- ruby-on-rails - 如何在 Rails 测试中要求 gem?
- python - 使用熊猫图的不同列的不同线宽
- matrix - 在 Maple 中对具有函数的矩阵求导。想要将函数保留为素数(f' 或 f'')而不评估