Scrapy command to extract specific fields from a web page

Problem description

I want to build a Python dictionary from web data that is published in JSON format. I am trying to use Scrapy, but nothing happens.

I tried this:

from scrapy.crawler import CrawlerProcess

class my_spider(scrapy.Spider):

    name = "my_spider"

    start_urls = ["https://meteo.cat/observacions/llistat-xema"]

    def parse(self, response):

        for slug in response.xpath('//td[@headers="metacom"]'):
            yield {'title': slug.extract()}
if __name__ == "__main__":

    process = CrawlerProcess({
         'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0',
         'DOWNLOAD_HANDLERS': {'s3': None},
         'LOG_ENABLED': True})

    process.crawl(my_spider) 

    process.start()

Any suggestions, please?

Tags: python, scrapy

Solution


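You can try running the following code. The script in the question refers to scrapy.Spider without ever importing scrapy (only CrawlerProcess is imported), so this version imports Spider explicitly: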

from scrapy import Spider
from scrapy.crawler import CrawlerProcess

# Python class names conventionally use CapWords (PEP 8).
class Myspider(Spider):
    name = "my_spider"

    start_urls = ["https://meteo.cat/observacions/llistat-xema"]

    def parse(self, response):
        for slug in response.xpath('//td[@headers="metacom"]'):
            yield {'title': slug.extract()}


if __name__ == "__main__":
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0',
        'DOWNLOAD_HANDLERS': {'s3': None},
        'LOG_ENABLED': True})

    process.crawl(Myspider)

    process.start()
