How to automate multiple MongoDB queries from Python lists

Problem description

Here is my query:

%%time
from pymongo import MongoClient
import datetime as dt
import pandas as pd

mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']
test = cvsms.find({'customer_id': {'$in': list1}}, {'customer_id': 1, 'timestamp': 1})
df1 = pd.DataFrame(list(test))

What I do is copy the last two lines, change list1 to list2 and df1 to df2, so it becomes:

test = cvsms.find({'customer_id': {'$in': list2}}, {'customer_id': 1, 'timestamp': 1})
df2 = pd.DataFrame(list(test))

Then I do the same for list3 and df3, and so on. How can I automate this for 48 lists? A single query takes 4 minutes to run in my Jupyter notebook.

Tags: python, mongodb, pandas, dataframe

Solution


You can loop over all the lists, build a DataFrame from each query result, and append each DataFrame to a list, like this:

from pymongo import MongoClient
import datetime as dt
import pandas as pd

mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']

list1 = [1,2,3,4,5] # list of values to search
list2 = [6,7,8,9,10] # list of values to search
lists = [list1,list2]
df_list = []

for lst in lists:
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    df = pd.DataFrame(list(test))
    df_list.append(df)


# To access each DataFrame separately, index into the list
df1 = df_list[0]
df2 = df_list[1]


full_df = pd.concat(df_list)
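If you would rather look results up by name than by position, a dict keyed df1, df2, ... replaces the separate variables. This is a minimal sketch: FakeCollection is a hypothetical stand-in that imitates only the $in filter and field projection of find(), so the example runs without a MongoDB server; in practice you would pass the real cvsms collection to query_df.

```python
import pandas as pd


class FakeCollection:
    """Hypothetical stand-in for a pymongo Collection (no server needed).

    Imitates only find({'customer_id': {'$in': ids}}, projection).
    """

    def __init__(self, docs):
        self.docs = docs

    def find(self, query, projection):
        ids = set(query['customer_id']['$in'])
        for doc in self.docs:
            if doc['customer_id'] in ids:
                # Keep only the projected fields (real pymongo also returns _id).
                yield {k: doc[k] for k in projection if k in doc}


def query_df(collection, ids):
    """Run one $in query and turn the cursor into a DataFrame."""
    cursor = collection.find({'customer_id': {'$in': ids}},
                             {'customer_id': 1, 'timestamp': 1})
    return pd.DataFrame(list(cursor))


# Made-up sample documents standing in for the flat_sms collection.
docs = [{'customer_id': i, 'timestamp': 1000 + i, 'body': 'x'} for i in range(1, 11)]
cvsms = FakeCollection(docs)

lists = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# One entry per input list: 'df1', 'df2', ... instead of separate variables.
dfs = {f'df{i}': query_df(cvsms, lst) for i, lst in enumerate(lists, start=1)}
```

With 48 lists this yields dfs['df1'] through dfs['df48'] without writing any of the names by hand.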

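If you want to speed this up, you can try the concurrent.futures module with a ThreadPoolExecutor or a ProcessPoolExecutor. Because each query spends most of its time waiting on the database, threads are usually the simpler fit: a MongoClient is thread-safe and can be shared between threads, whereas PyMongo is not fork-safe, so with a ProcessPoolExecutor each worker process must create its own MongoClient.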

from concurrent import futures

def query_df(lst):
    # Run one query and turn the cursor into a DataFrame
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    return pd.DataFrame(list(test))

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns results in the same order as `lists`; list() collects them
    df_list = list(executor.map(query_df, lists))

full_df = pd.concat(df_list)
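To check that executor.map hands back results in input order even when threads finish out of order, here is a self-contained sketch. The function slow_query is hypothetical: it sleeps for a random interval to imitate a slow MongoDB round trip instead of querying a server, and the 48 ID lists are made up.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def slow_query(ids):
    """Hypothetical stand-in for query_df: sleeps to mimic network latency."""
    time.sleep(random.uniform(0.01, 0.05))
    return pd.DataFrame({'customer_id': ids, 'timestamp': [1000 + i for i in ids]})


# 48 made-up ID lists of five IDs each, as in the question.
lists = [list(range(n * 5 + 1, n * 5 + 6)) for n in range(48)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    # map() yields results in input order, regardless of completion order.
    df_list = list(executor.map(slow_query, lists))
elapsed = time.perf_counter() - start

# ignore_index=True renumbers rows 0..N-1 instead of repeating each frame's index.
full_df = pd.concat(df_list, ignore_index=True)
print(f'{len(df_list)} queries, {len(full_df)} rows, {elapsed:.2f}s')
```

With 8 workers the waits overlap, so the wall time is roughly that of 48 / 8 sleeps rather than 48. How much a real workload gains depends on how many concurrent queries the MongoDB server can serve.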

Finally, pd.concat combines the list of smaller DataFrames into one large DataFrame.
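If you still need to know which list each row came from after concatenating, pd.concat accepts a keys argument that adds an outer index level. A small sketch; the two DataFrames and the labels 'list1' and 'list2' are made-up sample data standing in for df_list:

```python
import pandas as pd

# Made-up per-list results standing in for df_list.
df_list = [
    pd.DataFrame({'customer_id': [1, 2], 'timestamp': [1001, 1002]}),
    pd.DataFrame({'customer_id': [6], 'timestamp': [1006]}),
]

# keys= labels each block; names= names the two resulting index levels.
full_df = pd.concat(df_list, keys=['list1', 'list2'], names=['source', 'row'])

# Move the 'source' level into an ordinary column and renumber the rows.
full_df = full_df.reset_index(level='source').reset_index(drop=True)
print(full_df)
```

The result has a source column alongside customer_id and timestamp, so you can still filter rows for one list, e.g. full_df[full_df['source'] == 'list2'].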

