Importing MongoDB data into Azure ML Studio from a Python script

Problem description

I am currently running the Python script below in Azure ML (Python 2.7.11). It uses pyMongo to fetch results from MongoDB and tries to return them in a DataFrame.

I get an error like this:

"C:\pyhome\lib\site-packages\pymongo\topology.py", line 97, in select_servers
        self._error_message(selector))
    ServerSelectionTimeoutError: ... ('The write operation timed out',)

If you know the cause of the error and what needs to be changed, please let me know.

My source code:

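import pymongo as m
import pandas as pd

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Connection string (credentials, host and port redacted)
    uri = "mongodb://xxxxx:yyyyyyyyyyyyyyy@zzz.mongodb.net:xxxxx/?ssl=true&replicaSet=globaldb"
    # connect=False defers the connection until the first operation
    client = m.MongoClient(uri, connect=False)
    db = client['dbName']
    coll = db['colectionName']
    # Fetch every document in the collection and load it into a DataFrame
    cursor = coll.find()
    df = pd.DataFrame(list(cursor))
    # Azure ML expects a sequence of DataFrames, hence the trailing comma
    return df,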

Error details:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 199, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\55a174d8dc584942908423ebc0bac110.py", line 32, in azureml_main
    result =  pd.DataFrame(list(cursor))
  File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 977, in next
    if len(self.__data) or self._refresh():
  File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 902, in _refresh
    self.__read_preference))
  File "C:\pyhome\lib\site-packages\pymongo\cursor.py", line 813, in __send_message
    **kwargs)
  File "C:\pyhome\lib\site-packages\pymongo\mongo_client.py", line 728, in _send_message_with_response
    server = topology.select_server(selector)
  File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 121, in select_server
    address))
  File "C:\pyhome\lib\site-packages\pymongo\topology.py", line 97, in select_servers
    self._error_message(selector))
ServerSelectionTimeoutError: xxxxx-xxx.mongodb.net:xxxxx: ('The write operation timed out',)
Process returned with non-zero exit code 1

Tags: python, mongodb, azure, pymongo, azure-machine-learning-studio

Solution


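As far as I know, a limitation of the Execute Python Script module causes this issue; see the Limitations section below.

Limitations

The Execute Python Script module currently has the following limitations:

  1. Sandboxed execution. The Python runtime is currently sandboxed and, as a result, does not allow access to the network or to the local file system in a persistent manner. All files saved locally are isolated and deleted once the module finishes. The Python code cannot access most directories on the machine it runs on, the exception being the current directory and its subdirectories.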
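One way to check whether this restriction is what a script is running into is to probe the target endpoint from inside the script before involving any database driver. The following is only a minimal sketch using the standard library; the helper name can_reach is hypothetical, and how the sandbox reports a blocked connection (refusal, timeout, DNS failure) is an assumption to verify in your own workspace.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical diagnostic helper, not part of pymongo or Azure ML.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Refused, unreachable, DNS failure or timeout: treat as not reachable.
        return False
    conn.close()
    return True
```

Calling can_reach with the host and port from the connection string and writing the result to the output log would show whether the failure happens at the network level, before pymongo's server selection is involved.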

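Because of this sandboxing, you cannot import data online directly from Azure Cosmos DB with the pymongo driver inside the Execute Python Script module. However, you can use the Import Data module with the connection and parameter information for Azure Cosmos DB, and connect its output to an input of Execute Python Script to get the data, as shown in the figure below.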

[Figure: the output of the Import Data module connected to an input of Execute Python Script]
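With that wiring, the script no longer opens a connection itself; it only receives the imported rows on its first input port. A minimal sketch of what the script body might then look like follows. It assumes the imported documents arrive as dataframe1 and may contain an "_id" column; both the column handling and the empty-input fallback are illustrative choices, not something the module requires.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is assumed to hold the rows delivered by the Import Data
    # module; no network call is made from inside the script.
    if dataframe1 is None:
        # Nothing is connected to the first input port: return an empty frame.
        return pd.DataFrame(),

    # Work on a copy so the input frame is left unchanged.
    df = dataframe1.copy()

    # Assumption: the documents may carry an "_id" column; casting it to
    # strings keeps the column a plain text type.
    if "_id" in df.columns:
        df["_id"] = df["_id"].astype(str)

    # As in the original script, the result is returned as a one-element tuple.
    return df,
```

The trailing comma mirrors the return statement in the question, so downstream modules receive the same shape of output as before.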

For more information about importing data online, see the "Import from online data sources" section of the official document "Import your training data into Azure Machine Learning Studio from various data sources".

