Scrapy returns errors for the "Last-Modified" date: "KeyError: 'last-modified'" / "ValueError: year 1610477971 is out of range"

Problem description

I am trying to use Scrapy to return the "Last-Modified" date for a set of URLs. However, I get the error KeyError: 'last-modified'. The traceback is:

  File "C:\spider.py", line 460, in fetch_dates
    url_time = r.headers['last-modified']
  File "C:\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'last-modified'

The code I am using is:

    def fetch_dates(self, response):
        url = response.url
        r = requests.head(response.url)
        url_time = r.headers['last-modified']
        url_date = parsedate(url_time)
        for url in url_date:
            if os.path.exists('1url-to-date.csv'):
                append_write = 'a'
            else:
                append_write = 'w'
            
            with open('1url-to-date.csv', append_write) as url_f:
                url_f.write(url_time + "&,&" + url + "\n")
        
        return Item()

This code also does not create my CSV file or return the information I need. Any suggestions? Thanks!

Edit: I changed the code to the following:

    def fetch_dates(self, response):
        url = response.url
        r = requests.head(response.url)
        url_time = r.headers.get("last-modified", str(time.time()))
        url_date = parsedate(url_time)
        for url in url_date:
            if os.path.exists('1url-to-date.csv'):
                append_write = 'a'
            else:
                append_write = 'w'
            
            with open('1url-to-date.csv', append_write) as url_f:
                url_f.write(url_time + "&,&" + url + "\n")
        
        return Item()

However, I now get a new error: "ValueError: year 1610477971 is out of range". Any suggestions would be very helpful. Thanks!

Tags: python, datetime, web-scraping, scrapy, web-crawler

Solution
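
The question does not show where `parsedate` is imported from, so what follows is only a sketch under stated assumptions. The `KeyError` indicates that the server's response had no `Last-Modified` header, which is optional in HTTP. The `ValueError` most likely arises because the fallback `str(time.time())` is a Unix timestamp such as `1610477971.38` rather than an HTTP date string, so the parser reads the number as a year. In addition, the loop `for url in url_date` rebinds `url` to elements of the parsed date instead of the page URL. The helper names `last_modified_iso` and `append_url_date` below are hypothetical, and the sketch uses the standard library's `email.utils.parsedate_to_datetime` in place of the original `parsedate`.

```python
import csv
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Union


def last_modified_iso(headers: Mapping[str, Union[str, bytes]]) -> Optional[str]:
    """Return the Last-Modified header as an ISO 8601 string, or None.

    Hypothetical helper. None is returned when the header is missing or
    cannot be parsed, instead of substituting the current time.
    """
    # A plain dict lookup is case-sensitive; the header mappings in requests
    # and Scrapy are case-insensitive.
    raw = headers.get("Last-Modified")
    if raw is None:
        return None
    if isinstance(raw, bytes):  # Scrapy header values are bytes
        raw = raw.decode("latin-1")
    try:
        return parsedate_to_datetime(raw).isoformat()
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable input,
        # newer ones raise ValueError.
        return None


def append_url_date(path: str, url: str, modified: Optional[str]) -> None:
    """Append one (url, modified) row to a CSV file.

    Hypothetical helper. Mode "a" creates the file if it does not exist,
    so no os.path.exists check is needed.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([url, modified or ""])
```

Inside the spider callback this might be used as `append_url_date('1url-to-date.csv', response.url, last_modified_iso(response.headers))`. Scrapy's `response.headers` already carries the headers of the fetched page, so the extra `requests.head` call may not be needed. For URLs whose server sends no `Last-Modified` header, the date column is left empty.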

