How to recursively get categories and their contents with Scrapy

Problem description

When I run my scraping code, it crawls https://www.hurriyetemlak.com/istanbul-adalar-maden-satilik/daire/82579-379. The listing information (ilan_bilgileri) ends up in a single column of my CSV file. I would like to get the information categories and their contents into separate columns recursively (every ad has different categories in different positions). What is the best way to do this? I am new to Scrapy and Python, so I hope someone can point me in the right direction. I am not allowed to attach images to the post, so here is a link to the CSV result: https://i.stack.imgur.com/XppT5.png . This is my spider code:

import scrapy


class HurriyetEmlak(scrapy.Spider):
    name = 'hurriyetspider'
    start_urls = ['https://www.hurriyetemlak.com/istanbul-adalar-maden-satilik/daire/82579-379']



    custom_settings = {
        'FEED_URI': 'hurriyet_son.csv',
        'FEED_FORMAT': 'csv',
    }

    def parse(self, response):
        # Location fields from the "short-info-list": province, district, neighbourhood
        il = response.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "short-info-list", " " ))]//li[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]/text()').extract()
        ilce = response.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "short-info-list", " " ))]//li[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]/text()').extract()
        mahalle = response.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "short-info-list", " " ))]//li[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]/text()').extract()
        # Price, plus the "ilan bilgileri" category titles and their values
        fiyat = response.xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "price", " " ))]/text()').extract()
        baslik = response.css('.txt::text').extract()
        deger = response.css('.adv-info-list div span , .txt+ span::text').extract()

        scraped_info = {
            'İl': il,
            'İlçe' : ilce,
            'Mahalle' : mahalle,
            'Fiyat' : fiyat,
            'İlan Bilgileri - Başlık': baslik,
            'İlan Bilgileri - Değer' : deger
        }
        yield scraped_info


Tags: python, web-scraping, scrapy

Solution


I guess you are getting all of the column information written into a single column instead of one row per listing. If you run the crawl with the default CSV feed export, e.g.

scrapy crawl hurriyetspider -o hurriyet_son.csv

it will write all of the information on a single row, which is what you are seeing. I think the csv library will help you here: you can write the category names as a header row and the values underneath, along the lines of the code below.

import csv

# Collect the category names to use as the CSV header row
news_titles = []
for title in scraped_info:
    news_titles.append(title)
print(news_titles)

# 'w' opens the file for writing; the with-block closes it automatically
with open('hurriyet_son.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(news_titles)              # header: one column per category
    writer.writerow(scraped_info.values())    # matching values on the next row
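
Alternatively, you can get each category into its own column straight from the spider, without post-processing the CSV. Below is a minimal, untested sketch: it pairs the category titles with their values inside parse() and yields a flat dict, so every title becomes its own field. It assumes that .txt::text and .txt + span::text return the titles and values in matching order, one value per title, which I have not verified against the live page.

import scrapy


class HurriyetEmlak(scrapy.Spider):
    name = 'hurriyetspider'
    start_urls = ['https://www.hurriyetemlak.com/istanbul-adalar-maden-satilik/daire/82579-379']

    custom_settings = {
        'FEED_URI': 'hurriyet_son.csv',
        'FEED_FORMAT': 'csv',
    }

    def parse(self, response):
        # Category titles (e.g. "Oda Sayısı") and their values, whitespace stripped
        baslik = [t.strip() for t in response.css('.txt::text').extract()]
        deger = [d.strip() for d in response.css('.txt + span::text').extract()]

        # Pair each title with its value; ads that expose different categories
        # simply produce items with different keys
        scraped_info = dict(zip(baslik, deger))

        # Keep a fixed field or two as well, reusing the idea of your selectors
        scraped_info['Fiyat'] = response.css('.price::text').get()

        yield scraped_info

One caveat: Scrapy's CSV exporter takes its column names from the first item it exports, so if different ads expose different categories you may want to list the columns explicitly with the FEED_EXPORT_FIELDS setting, or post-process the items with csv.DictWriter, to get a stable header across all rows.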

Let me know how it goes.

