python - Steam Scrapy problem
Problem description
This is my code:
# -*- coding: utf-8 -*-
import scrapy


class GameSpider(scrapy.Spider):
    name = 'game'
    allowed_domains = ['store.steampowered.com']
    start_urls = ['https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1']

    def parse(self, response):
        print(response.body)
        game_href = str(response.xpath(".//@href").extract())
        print(game_href)
My problem is that when I run scrapy, I only get 17 links (out of 50 in total). I checked response.body, and it is correct.
Solution
The page returns JSON data, yet you are parsing it as HTML.
The actual HTML is stored in the results_html field of that JSON. If you parse only that part, you will get all of the links:
>>> fetch('https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1')
2020-11-04 07:13:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://store.steampowered.com/search/results/?query&start=0&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_230_7&category1=998&infinite=1> (referer: None)
>>> data = response.json()
>>> sel = scrapy.Selector(text=data['results_html'])
>>> game_href = sel.xpath('//@href').getall()
>>> len(game_href)
50