Firestore database write performance?

Problem description

Operating system: macOS Catalina 10.15.1

Python version: Python 3.7.1

I'm using Firestore as the database for a personal project, via the Python SDK. I'm currently trying to optimize my backend, and I've noticed that writes to Firestore are slow. Take the following sample code:

import firebase_admin
from firebase_admin import credentials
from firebase_admin import firestore
import time


cred = credentials.Certificate("./path/to/adminsdk.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

test_data = {f"test_field_{i}":f"test_value_{i}" for i in range(20)}

now = time.time()
db.collection(u'latency_test_collection').document(u'latency_test_document').set(test_data)
print(f"Total time: {time.time()-now}")

The code above takes over 300 ms to run, which seems slow, especially since I have many writes that are much larger than this example. I've checked my internet connection, and performance hovers around this value regardless of the connection. Is this kind of performance expected for Firestore writes, or is there a way to optimize my code?

Tags: python, firebase, google-cloud-platform, google-cloud-firestore

Solution


As @Nebulastic said, batches are much more efficient than one-by-one operations. I just ran a test from my laptop in Europe against a Firestore database located in us-west2 (Los Angeles). Here are the actual results for one-by-one deletions versus batch deletions:

$ python firestore_test.py 
Creating 10 documents
Wrote 10 documents in 1.80 seconds.
Deleting documents one by one
Deleted 10 documents in 7.97 seconds.
###
Creating 10 documents
Wrote 10 documents in 0.92 seconds.
Deleting documents in batch
Deleted 10 documents in 1.71 seconds.
###
Creating 2000 documents
Wrote 2000 documents in 6.27 seconds.
Deleting documents in batch
Deleted 2000 documents in 9.80 seconds.

Here's the test code:

from time import time
from uuid import uuid4
from google.cloud import firestore

DB = firestore.Client()

def generate_user_data(entries = 10):
    print('Creating {} documents'.format(entries))
    now = time()
    batch = DB.batch()
    for counter in range(entries):
        # Each transaction or batch of writes can write to a maximum of 500 documents.
        # https://cloud.google.com/firestore/quotas#writes_and_transactions
        if counter % 500 == 0 and counter > 0:
            batch.commit()
            batch = DB.batch()

        user_id = str(uuid4())
        data = {
            "some_data": str(uuid4()),
            "expires_at": int(now)
            }
        user_ref = DB.collection(u'users').document(user_id)
        batch.set(user_ref, data)
    batch.commit()
    print('Wrote {} documents in {:.2f} seconds.'.format(entries, time() - now))

def delete_one_by_one():
    print('Deleting documents one by one')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    counter = 0
    for doc in docs:
        doc.reference.delete()
        counter = counter + 1
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))

def delete_in_batch():
    print('Deleting documents in batch')
    now = time()
    docs = DB.collection(u'users').where(u'expires_at', u'<=', int(now)).stream()
    batch = DB.batch()
    counter = 0
    for doc in docs:
        counter = counter + 1
        if counter % 500 == 0:
            # Each batch can contain at most 500 writes, so commit
            # periodically and start a fresh batch, mirroring generate_user_data.
            batch.commit()
            batch = DB.batch()
        batch.delete(doc.reference)
    batch.commit()
    print('Deleted {} documents in {:.2f} seconds.'.format(counter, time() - now))


generate_user_data(10)
delete_one_by_one()
print('###')
generate_user_data(10)
delete_in_batch()
print('###')
generate_user_data(2000)
delete_in_batch()
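
Note: for larger workloads, newer versions of the google-cloud-firestore client (2.1 and later) also ship a BulkWriter, which queues writes and sends them in parallel while handling batching and retries internally, so you don't have to manage the 500-writes-per-batch limit yourself. Unlike a batch, its writes are not atomic; each one succeeds or fails independently, which is the trade-off for throughput. A minimal sketch, assuming the 2.x client and the same users collection as above (generate_user_data_bulk is a hypothetical helper, not part of the original test):

from time import time
from uuid import uuid4
from google.cloud import firestore

DB = firestore.Client()

def generate_user_data_bulk(entries=2000):
    # Hypothetical helper: same data shape as generate_user_data above,
    # but written through BulkWriter (google-cloud-firestore >= 2.1).
    print('Creating {} documents with BulkWriter'.format(entries))
    now = time()
    bulk = DB.bulk_writer()
    for _ in range(entries):
        user_ref = DB.collection(u'users').document(str(uuid4()))
        bulk.set(user_ref, {"some_data": str(uuid4()), "expires_at": int(now)})
    # close() flushes all queued writes and blocks until they complete.
    bulk.close()
    print('Wrote {} documents in {:.2f} seconds.'.format(entries, time() - now))

generate_user_data_bulk(2000)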
