首页 > 解决方案 > 在 c 之后使用来自 Python 生成器的大型结果集时,Snowflake 会抛出 403 Forbidden。6个小时

问题描述

我有一个返回数百万个结果的查询,但我不想将整个结果集存储在内存中,所以我使用以下代码批量检索记录:

def batch_query_results(sql_filename: str, num_rows: int) -> list[tuple]:
    """
    Executes arbitrary query from SQL file and returns results as a batch-result
    generator
    """
    with get_connection() as conn:
        with open(sql_filename, "r", encoding="utf-8") as f:
            query = f.read()
            cur = conn.cursor().execute(query)
            logger.info("Executed Snowflake query!")
            unfetched_results = True
            while unfetched_results:
                results = cur.fetchmany(num_rows)
                if len(results) > 0:
                    logger.info(f"Fetched {len(results)} rows from query results...")
                    yield results
                else:
                    unfetched_results = False

当我设置时num_row = 10_000,这工作得很好,直到我到达第 980001 条记录(所以 c.第 98 次调用生成器以产生下一批),这是在执行开始后大约 6 小时,因为我的速率受限于使用每个结果的 API。

然后我收到以下错误:

Traceback (most recent call last):
  File "/Users/.../lib/python3.9/site-packages/snowflake/connector/result_batch.py", line 315, in _download
    raise RetryRequest(error)
snowflake.connector.network.RetryRequest: 000403: 403: HTTP 403: Forbidden

这就像发条一样发生。我查看了源代码,并注意到了这个文档字符串:

class ResultBatch(abc.ABC):
    """Represents what the back-end calls a result chunk.

    These are parts of a result set of a query. They each know how to retrieve their
    own results and convert them into Python native formats.

    As you are iterating through a ResultBatch you should check whether the yielded
    value is an ``Exception`` in case there was some error parsing the current row
    we might yield one of these to allow iteration to continue instead of raising the
    ``Exception`` when it occurs.

    These objects are pickleable for easy distribution and replication.

    Please note that the URLs stored in these do expire. The lifetime is dictated by the
    Snowflake back-end, at the time of writing this this is 6 hours.

    They can be iterated over multiple times and in different ways. Please follow the
    code in ``cursor.py`` to make sure that you are using this class correctly.
    """

这是我收到此错误的原因吗?如果是这样,除了将我的工作分成 980000 行增量之外,我还能如何解决这个问题?


{注意:这与这个未回答的问题相似,但并不完全相同。}

标签: pythongeneratorsnowflake-cloud-data-platform

解决方案


推荐阅读