Scrapy: How to replace missing values with other items in a dictionary

Problem description

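I am currently scraping prices from a website. Most products have both a maximum and a minimum price, but not all of them have a minimum price. For the products without a minimum, I have been replacing the missing values with an empty string "", but I would like to replace those empty values with the maximum price instead (basically because, if the price has not changed, the minimum and the maximum are the same).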

The code is extensive, so here are the libraries I imported:

import os
import scrapy
from ..items import TutorialItem
import pandas as pd
from scrapy.http import Request
from scrapy.http import FormRequest
from scrapy.selector import Selector
from scrapy.utils.response import open_in_browser
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

class KikoSpider2(scrapy.Spider):
    name = "kiko2"

    login_page = 'https://www.kikowireless.com/login'
    formdata = {'email': 'thisisan@email.com',
                 'password': 'QQntDXqK9'}

The code continues...

Here comes the important part:

def parse_products(self, response):
        items = TutorialItem()
        category = response.meta['category']

        article_name = response.css('#content .name a::text').extract()
        article_price = [ x.replace('$', '').replace('\n', '').replace('\t', '').replace(' ', '') for x in response.css('.price::text').extract()]
        article_price_min = [x.replace('\t', '').replace(
        '$', '').replace('\n', 'n').split()[-1].replace('n', '') for x in response.css('.discount::text').extract()] 

        items['article_name'] = article_name
        items['article_price'] = article_price
        items['article_price_min'] = article_price_min
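        # zip() yields three values per product: name, max price, min price
        for name, max_price, min_price in zip(article_name, article_price, article_price_min):
            scraped_info = {'supplier_url': response.url,  # assumption: URL of the listing page
                            'supplier_item_name': name,
                            'max_price': max_price,
                            'min_price': min_price,
                            }
            # print(scraped_info)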
            df_result = pd.DataFrame.from_dict(scraped_info.items())
            print(df_result)
            yield scraped_info

One line of code extracts the minimum price of each article. What can I do to fill the blanks in it with the article_price value that corresponds to the same item?

Tags: python, scrapy, web-crawler

Solution


This is simple.

# Use the scraped minimum prices if the list is non-empty;
# otherwise fall back to the maximum prices
if article_price_min:
    items['article_price_min'] = article_price_min
else:
    items['article_price_min'] = article_price
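
Note that this check applies to the list as a whole: it only falls back to the maximum prices when no minimum price was found at all. If only some products lack a minimum, a per-item fallback may be closer to what the question asks for. The following is a minimal sketch, not part of the original answer. It assumes both lists have the same length and order, with "" or None stored wherever a product has no minimum price. The helper name fill_missing_min and the sample values are hypothetical.

```python
def fill_missing_min(max_prices, min_prices):
    """Return min_prices with empty or None entries replaced by the matching max price.

    Assumes both lists are aligned: index i in each list refers to the same product.
    """
    return [
        min_price if min_price else max_price
        for max_price, min_price in zip(max_prices, min_prices)
    ]


# Hypothetical sample data standing in for the scraped lists
article_price = ["10.00", "25.50", "7.99"]
article_price_min = ["8.00", "", None]  # "" or None where no minimum was found

filled = fill_missing_min(article_price, article_price_min)
print(filled)  # → ['8.00', '25.50', '7.99']
```

In the spider, the result could be assigned to items['article_price_min'] before the zip() loop. If the two selectors return lists of different lengths, zip() will pair prices with the wrong products, so the alignment assumption needs to be checked against the actual page structure.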
