Why doesn't the function I modified in my Scrapy project work?

Problem description

I wrote `def convert_date` because the dates I scrape from the site look like `2021-05-06T23:59:59`. That value is the expiry date, so I have to subtract 30 days from it to produce the publication date. I am trying to find out why my converting function has no effect: I have declared the input and output processors, but I still get the unchanged string in the database. Can anyone give me a hint?

# items.py

from scrapy.item import Item, Field
from datetime import timedelta, date
# Note: in current Scrapy versions these processors live in itemloaders.processors
from scrapy.loader.processors import MapCompose, TakeFirst


def convert_date(text):
    # "2021-05-06T23:59:59" -> date(2021, 5, 6)
    expiry = date.fromisoformat(text.split('T')[0])
    # The scraped value is the expiry date; the offer was published 30 days earlier.
    return expiry - timedelta(days=30)


class JobOfferItem(Item):
    title = Field()
    employer = Field()
    country = Field()
    employer_page = Field()
    pub_date = Field(
        input_processor=MapCompose(convert_date),
        output_processor=TakeFirst()
        )
    salary = Field()
    seniority = Field()
    offer_id = Field()
    region = Field()
    city = Field()
    url = Field()
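One thing worth checking first: the `input_processor` and `output_processor` declared on a `Field` are applied only when the item is populated through `scrapy.loader.ItemLoader` (`loader.add_value(...)` / `loader.add_xpath(...)` followed by `loader.load_item()`). If the spider assigns `item['pub_date'] = ...` directly, the processors never run and the raw string reaches the pipeline unchanged. The processor chain itself is easy to reproduce in plain Python; this sketch, reusing the `convert_date` logic above with an illustrative scraped value, shows what the loader would do:

```python
from datetime import date, timedelta

def convert_date(text):
    # Same logic as in items.py above.
    expiry = date.fromisoformat(text.split('T')[0])
    return expiry - timedelta(days=30)

raw_values = ['2021-05-06T23:59:59']            # what the spider scrapes
mapped = [convert_date(v) for v in raw_values]  # MapCompose(convert_date) step
pub_date = mapped[0] if mapped else None        # TakeFirst() step
print(pub_date)  # 2021-04-06
```

Note also that once the loader is used, `TakeFirst()` leaves a single `date` object in `item['pub_date']`, so the `item['pub_date'][0]` indexing in the pipeline below would then need to become `item['pub_date']`.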

# pipelines.py

import sqlite3


class OfferscraperPipeline:
    def __init__(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = sqlite3.connect('joboffer.db')
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("""create table if not exists offer_tb(
                            title text,
                            employer text,
                            country text,
                            employer_page text,
                            pub_date text,
                            salary text,
                            seniority text,
                            offer_id text unique,
                            region text,
                            city text,
                            url text
                            )""")

    def store_db(self, item):
        self.curr.execute("""insert into offer_tb values (?,?,?,?,?,?,?,?,?,?,?)""", (
            item['title'][0],
            item['employer'][0],
            item['country'][0],
            item['employer_page'][0],
            item['pub_date'][0],
            item['salary'][0],
            item['seniority'][0],
            item['offer_id'][0],
            item['region'][0],
            item['city'][0],
            item['url'],
        ))
        self.conn.commit()

    def process_item(self, item, spider):
        self.store_db(item)
        return item
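On the storage side, a minimal self-contained sketch with an in-memory database and a reduced schema (the table and column names mirror the pipeline above; the values are illustrative) shows one robust way to persist the converted date: serialize it to an ISO string explicitly rather than relying on sqlite3's implicit date adapter, which is deprecated since Python 3.12:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(':memory:')
curr = conn.cursor()
curr.execute("""create table if not exists offer_tb(
                    title text,
                    pub_date text,
                    offer_id text unique
                    )""")

# Convert the date object to text explicitly before inserting.
pub = date(2021, 4, 6)
curr.execute("insert into offer_tb values (?, ?, ?)",
             ('Example offer', pub.isoformat(), 'offer-1'))
conn.commit()

row = curr.execute("select pub_date from offer_tb").fetchone()
print(row[0])  # 2021-04-06
```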

Tags: python, web-scraping, scrapy
