How do I retrieve a file name when it is not present in Content-Disposition or in the URL itself?

Problem description

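I am trying to get a file name from the Content-Disposition header using requests in Python, but the header is not present, so I also tried to derive a name from the URL itself. For some URLs, however, that does not work either, for example: https://www.seedr.cc/zip/88714186?st=fa176033e056f391a766486e690bbcf0b2720842c31cac289a91738304636bac&e=1589129102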

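I cannot get the file name from the URL, and there is no Content-Disposition header. Yet when I use a download manager such as IDM, or any browser, I get the file name without any problem.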

For the URL above, IDM produces the name "8. Post Interview.zip", whereas my code produces the file name "88714186.zip".

My code snippet is:

import mimetypes
import os
import re
from urllib.parse import unquote, urlparse

import requests

useragent = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686 on x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2820.59 Safari/537.36'}

def fix_fileName(response, fileName):
    # Append an extension guessed from Content-Type when the name has none.
    name, extension = os.path.splitext(fileName)
    if extension:
        return fileName
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = response.headers.get('Content-Type', '').split(';')[0].strip()
    if mime and mime != 'application/octet-stream':
        # guess_extension() returns None for unknown types.
        extension = mimetypes.guess_extension(mime) or ''
    return name + extension

def downloader(url):
    with requests.get(url, stream=True, headers=useragent) as response:
        # raise_for_status is a method; it must be called to raise on 4xx/5xx.
        response.raise_for_status()
        print(response.headers)
        # .get() avoids a KeyError when the header is absent.
        disposition = response.headers.get('Content-Disposition', '')
        match = re.search(r'filename="?([^";]+)"?', disposition)
        if match:
            fileName = match.group(1).strip()
        else:
            # Fall back to the last path segment of the URL.
            fileName = unquote(os.path.basename(urlparse(url).path))
        fileName = fix_fileName(response, fileName)

        with open(fileName, 'wb') as output_file:
            # Write in chunks instead of loading the whole body into memory.
            for chunk in response.iter_content(chunk_size=8192):
                output_file.write(chunk)

def main():
    url = 'https://www.seedr.cc/zip/88714707?st=01607f3f1b4adac3f8bf6292fdbac137207de1defb75646daafc9781dda8dc26&e=1589129561'
    downloader(url)

if __name__ == "__main__":
    main()

How can I achieve this in Python? Please help me solve this.

Tags: python, python-requests, python-3.7, content-disposition

Solution
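
The post does not say how IDM or the browser obtains "8. Post Interview.zip", so the following is only a minimal sketch of one approach worth trying, not a confirmed fix. It assumes the name is available either in a Content-Disposition header (possibly in the percent-encoded `filename*=` form, which a plain `filename=(.+)` regex misses) or in the URL the request ends at after redirects. The helper names `filename_from_disposition`, `filename_from_url` and `pick_filename`, and the `"download"` default, are hypothetical and introduced here for illustration.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlparse


def filename_from_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() handles quoted values and the RFC 2231
    # filename*=charset''percent-encoded form.
    name = msg.get_filename()
    # Keep only the last path component so a header cannot inject directories.
    return posixpath.basename(name) if name else None


def filename_from_url(url):
    """Return the percent-decoded last path segment of url, or None if empty."""
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or None


def pick_filename(headers, final_url, default="download"):
    """Prefer the header, then the final URL, then a hypothetical default."""
    return (
        filename_from_disposition(headers.get("Content-Disposition"))
        or filename_from_url(final_url)
        or default
    )
```

With requests, this could be called as `pick_filename(response.headers, response.url)`: `response.url` is the URL after any redirects, and `response.history` lists the intermediate responses, so printing both shows whether the server redirected to a URL that carries the real name. If neither the header nor the final URL contains it, the server may be responding differently to the request the browser sends, which this sketch does not address.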


