FastAPI | Downloading compressed Feather files (lz4 and zstd)

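Problem description

I am building a simple web application.

The back end is FastAPI and the front end is React.

For security reasons, the server must work only in memory, never on disk.

The workflow is as follows:

For zstd, this is the server-side code. It triggers an automatic download on the client, because I call it from a native HTML form.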

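import timeit
from io import BytesIO

import feather  # assumed: the feather-format package, a thin wrapper over pyarrow.feather
import pandas as pd
from fastapi import APIRouter, File, Form
from fastapi.responses import StreamingResponse

router = APIRouter()  # assumed: the post does not show how `router` is created

# anonymize() is the author's own function and is not shown in the post.


@router.post("/files", response_class=StreamingResponse)
async def anonymization(file: bytes = File(...), config: str = Form(...)):
    # start the timer
    tic = timeit.default_timer()
    # wrap the uploaded bytes in an in-memory file
    inMemoryFile = BytesIO(file)
    # load the upload as a dataframe
    df = pd.read_feather(inMemoryFile)
    # send to function to handle anonymization
    results_df = anonymize(df, config)
    inMemoryFile.close()
    # write the result to an in-memory output file
    outMemoryFile = BytesIO()
    feather.write_dataframe(results_df, outMemoryFile, compression='zstd')
    # build the response from a copy of the buffer contents
    response = StreamingResponse(
        iter([outMemoryFile.getvalue()]),
        media_type='application/zstandard',
        headers={
            'Content-Disposition': 'attachment; filename=dataset.feather.zst',
            'Access-Control-Expose-Headers': 'Content-Disposition'
        }
    )
    # release the output buffer; getvalue() above already copied the bytes
    outMemoryFile.close()
    # stop the timer and report
    toc = timeit.default_timer()
    elapsed = toc - tic
    print(f'Time elapsed is approximately {elapsed} seconds or {elapsed/60} minutes. For n rows {len(df.index)}')
    return response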

When I decompress the downloaded file with the zstandard Python library, using the code below, I get an error.

import tarfile
import tempfile
from pathlib import Path

import zstandard


def extract_zst(archive: Path, out_path: Path):
    archive = Path(archive).expanduser()
    # need .resolve() in case intermediate relative dir doesn't exist
    out_path = Path(out_path).expanduser().resolve()

    dctx = zstandard.ZstdDecompressor()

    with tempfile.TemporaryFile(suffix=".feather") as ofh:
        # stream-decompress the archive into the temporary file
        with archive.open("rb") as ifh:
            dctx.copy_stream(ifh, ofh)
        ofh.seek(0)
        # treat the decompressed data as a tar archive and unpack it
        with tarfile.open(fileobj=ofh) as z:
            z.extractall(out_path)


Traceback (most recent call last):
    extract_zst(inputPath, outputPath)
  File "c:\Users\david\OneDrive\Escritorio\David USFQ\11vo_semestre\titulacion\prototipo\anonymity_ultima_version\back_end\fastapi_server\temp_conversions\decompress.py", line 27, in extract_zst
    dctx.copy_stream(ifh, ofh)
zstd.ZstdError: zstd decompressor error: Unknown frame descriptor

For lz4 compression, this is the server-side code.

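import timeit
from io import BytesIO

import feather  # assumed: the feather-format package, a thin wrapper over pyarrow.feather
import pandas as pd
from fastapi import APIRouter, File, Form
from fastapi.responses import StreamingResponse

router = APIRouter()  # assumed: the post does not show how `router` is created

# anonymize() is the author's own function and is not shown in the post.


@router.post("/files", response_class=StreamingResponse)
async def anonymization(file: bytes = File(...), config: str = Form(...)):
    # start the timer
    tic = timeit.default_timer()
    # wrap the uploaded bytes in an in-memory file
    inMemoryFile = BytesIO(file)
    # load the upload as a dataframe
    df = pd.read_feather(inMemoryFile)
    # send to function to handle anonymization
    results_df = anonymize(df, config)
    inMemoryFile.close()
    # write the result to an in-memory output file
    outMemoryFile = BytesIO()
    feather.write_dataframe(results_df, outMemoryFile, compression='lz4')
    # build the response from a copy of the buffer contents
    response = StreamingResponse(
        iter([outMemoryFile.getvalue()]),
        media_type='application/octet-stream',
        headers={
            'Content-Disposition': 'attachment; filename=dataset.feather.lz4',
            'Access-Control-Expose-Headers': 'Content-Disposition'
        }
    )
    # release the output buffer; getvalue() above already copied the bytes
    outMemoryFile.close()
    # stop the timer and report
    toc = timeit.default_timer()
    elapsed = toc - tic
    print(f'Time elapsed is approximately {elapsed} seconds or {elapsed/60} minutes. For n rows {len(df.index)}')
    return response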

This is the error I get when decompressing the lz4 file with the Linux command:

lz4 dataset.feather.lz4
Error 44 : Unrecognized header : file cannot be decoded
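
Both tools complain about the header, so a useful first check is to look at the first bytes of the downloaded file and see which container format it really is. The sketch below uses only the standard library and compares against the published magic numbers for zstd frames, lz4 frames, and Arrow IPC (Feather V2) files. The helper name sniff_container and its return strings are made up for this illustration.

```python
# Magic bytes found at the very start of each container format.
ZSTD_FRAME_MAGIC = bytes.fromhex("28b52ffd")  # zstd frame (0xFD2FB528, little-endian)
LZ4_FRAME_MAGIC = bytes.fromhex("04224d18")   # lz4 frame (0x184D2204, little-endian)
ARROW_FILE_MAGIC = b"ARROW1"                  # Arrow IPC file, which is what Feather V2 is
FEATHER_V1_MAGIC = b"FEA1"                    # legacy Feather V1 file


def sniff_container(head: bytes) -> str:
    """Guess the container format from the first few bytes of a file.

    Hypothetical helper for illustration; pass it at least 6 bytes.
    """
    if head.startswith(ZSTD_FRAME_MAGIC):
        return "zstd frame"
    if head.startswith(LZ4_FRAME_MAGIC):
        return "lz4 frame"
    if head.startswith(ARROW_FILE_MAGIC):
        return "feather v2 (arrow ipc file)"
    if head.startswith(FEATHER_V1_MAGIC):
        return "feather v1"
    return "unknown"
```

For example, `sniff_container(open("dataset.feather.zst", "rb").read(8))` reads the first 8 bytes of the download. If the result is the Feather V2 case, the file is a Feather file as a whole, and a standalone zstd or lz4 decompressor would be expected to reject it.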

Thanks in advance. I have spent the whole afternoon trying to solve this.

Tags: python, download, compression, fastapi, feather

Solution

