首页 > 解决方案 > 从 URL 读取 Zip 文件并将 Ms Access 文件转换为数据框

问题描述

我写这条消息是为了寻求帮助,以便为这个 python 代码找到解决方案。与我公司的零件数据库相比,我正在尝试生成 FAA PMA 零件的可视化。我已经使用 Power Bi、SQL 查询和 Excel 文件完成了它,但它不是在线可视化。每个月我都必须从 URL ( https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip ) 下载 FAA PMA 部件以更新此可视化。

话虽如此,我已经尝试执行这些 python 代码,以便从上述 Url 读取 Ms Access 文件并将其转换为 Pandas DataFrame。

但是,由于以下原因,我没有成功完成这项任务:

首先,我尝试使用 sqlalchemy,但出现了以下错误。

    from zipfile import ZipFile
    from io import BytesIO
    import urllib
    from urllib.request import urlopen
    from sqlalchemy import create_engine
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                         r'DBQ=pma_accdb;'
                         r'ExtendedAnsiSQL=1;'
                                              )
    connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
    engine = create_engine(connection_url)
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

错误:回溯(最近一次调用最后):文件“C:/Users/thiago.ribeiro/.PyCharmCE2019.1/config/scratches/scratch.py​​”,第 17 行,在 engine = create_engine(connection_url) 文件“C:\用户\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\ engine_init_.py”,第 500 行,在 create_engine 返回 strategy.create(*args, **kwargs) 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy \engine\strategies.py”,第 61 行,在创建入口点 = u._get_entrypoint() 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\engine \url.py”,第 172 行,在 _get_entrypoint cls = registry.load(name) 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python\Python38\lib\site-packages\sqlalchemy\util\ langhelpers.py”,第 267 行,在加载中引发 exc.NoSuchModuleError(sqlalchemy.exc.NoSuchModuleError:无法加载插件:sqlalchemy.dialects:access.pyodbc

之后我尝试使用pyodbc,但也出现了以下错误。

    from zipfile import ZipFile
    from io import BytesIO
    from urllib.request import urlopen
    import pyodbc
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    filepath = pma_accdb
    
    myDataSources = pyodbc.dataSources()
    access_driver = myDataSources['MS Access Database']
    
    cnxn = pyodbc.connect(driver=access_driver, dbq=filepath, autocommit=True)
    crsr = cnxn.cursor()
    crsr.execute("SELECT * FROM Parts")
    
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

错误:回溯(最近一次调用最后一次):第 17 行,在 cnxn = pyodbc.connect(driver=access_driver, dbq=filepath, autocommit=True) pyodbc.Error: ('HY000', '[HY000] [Microsoft][Driver ODBC Microsoft Access] 一般错误无法打开注册表项临时(易失性)Ace DSN 进程 0x2efc 线程 0x4a20 DBC 0xd39ea788

综上所述,能否请您帮助我了解我的代码哪里出了问题,以及如何改进它以便从 url 读取 MS Access 并将其转换为 pandas 数据框?

标签: pythonms-accesssqlalchemypyodbcsqlalchemy-access

解决方案


我得到了朋友的帮助。这是解决方案:


        from zipfile import ZipFile
        from io import BytesIO
        import urllib
        from urllib.request import urlopen
        from sqlalchemy import create_engine
        import pandas as pd
        
        resp = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
        zipfile = ZipFile(BytesIO(resp))
        
        connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
        r'DBQ=' + zipfile.extract('PMA.accdb') + ';'
        r'ExtendedAnsiSQL=1;'
        )
        connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
        engine = create_engine(connection_url)
        sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)


推荐阅读