Steam Scrapy problem

Problem description

Here is my code:

# -*- coding: utf-8 -*-
import scrapy

class GameSpider(scrapy.Spider):
    name = 'game'
    allowed_domains = ['store.steampowered.com']
    start_urls = ['https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1']

    def parse(self, response):

        print(response.body)
        game_href = str(response.xpath(".//@href").extract())
        
        print(game_href)

My problem is that when I run scrapy, I only get 17 links (out of 50 in total). I checked response.body and it is correct.

Tags: python, scrapy, steam

Solution


The page returns JSON data, yet you are parsing it as HTML.

If you parse only the actual HTML part (the results_html field of the JSON response), you will get all of the links:

>>> fetch('https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1')
2020-11-04 07:13:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1> (referer: None)
>>> data = response.json()
>>> sel = scrapy.Selector(text=data['results_html'])
>>> game_href = sel.xpath('//@href').getall()
>>> len(game_href)
50
