Resource limits using dataset to access sqlite

Problem description

How can I map a function that internally accesses data in SQLite over a large array?

It seems to fail on me after applying it ~128 times.

The code runs if next(iter(tbl))['value'] is replaced by a fixed value. So it does not appear to be a resource problem with constructing the connection (dataset.connect(...)) or table (tbl = c['table']) objects, but rather some leak in fetching the value from the database.
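One way to test the leaking-handles suspicion (a diagnostic sketch of my own, Linux-specific, not part of the original post) is to count the process's open file descriptors around each call:

```python
import os

def open_fd_count():
    # Linux-specific: /proc/self/fd holds one entry per open descriptor
    # of the current process.
    return len(os.listdir('/proc/self/fd'))

before = open_fd_count()
# ... call modify(...) here ...
after = open_fd_count()
print(f"file descriptors: {before} -> {after}")
```

If the count keeps growing with every call, the sqlite file handles are not being released, which would eventually hit the per-process descriptor limit.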

注意:我使用的是奇数list(map(modify, data))构造,因为我的实际用例是应用这个访问 Spark RDD 上的数据库的函数。这是我的问题的“普通python”等价物。

Test case:

import os  # needed by ensure_db below; missing in the original listing

import dataset
import numpy


fname = '/tmp/test.db'
def ensure_db():
    if not os.path.exists(fname):
        c = ez_connection()
        tbl = c['table']
        tbl.insert({'value':1.0})
        os.chmod(fname, 0o777) # left-over from when I thought file permissions might be the problem
    assert(os.path.exists(fname))

def ez_connection():
    return dataset.connect('sqlite:///'+fname)


def modify(value):
    with ez_connection() as c:
        tbl = c['table']
        val = next(iter(tbl))['value'] # easy way to get the value out
    return val+value

if __name__ == "__main__":
    ensure_db()
    for i in range(4, 2048, 4):
        data = numpy.arange(i)
        print(f"about to map {i} items ...", end=' ')
        res = list(map(modify, data))
        print('OK')

Produces the output:

about to map 4 items ... OK
about to map 8 items ... OK
about to map 12 items ... OK
about to map 16 items ... OK
about to map 20 items ... OK
about to map 24 items ... OK
about to map 28 items ... OK
about to map 32 items ... OK
about to map 36 items ... OK
about to map 40 items ... OK
about to map 44 items ... OK
about to map 48 items ... OK
about to map 52 items ... OK
about to map 56 items ... OK
about to map 60 items ... OK
about to map 64 items ... Traceback (most recent call last):
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3212, in _wrap_pool_connect
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 307, in connect
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 767, in _checkout
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 425, in checkout
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 253, in _create_connection
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 368, in __init__
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 611, in __connect
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/engine/create.py", line 578, in connect
  File "/home/Dave/pyspark-env/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 584, in connect
sqlite3.OperationalError: unable to open database file

Tags: python, sqlite, python-dataset

Solution

