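python - Scrapy iterator stops immediately after the request completes
Problem description
This is my code, which scans users and outputs their SteamID and inventory value: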
import scrapy

bot_words = [
    "bot",
    "BOT",
    "Bot",
    "[tf2mart]"
]

class AccountSpider(scrapy.Spider):
    name = "accounts"
    start_urls = [
        'file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm'
    ]

    def linkgen(self):
        global steamid
        print("Downloading Page...")
        yield scrapy.Request("http://www.backpack.tf" + steamid, callback=self.parse_accounts)
        print("Page successfully downloaded.")

    def parse(self, response):
        global steamid
        lgen = self.linkgen()
        for tr in response.css("tbody"):
            for user in response.css("span a"):
                if bot_words not in response.css("span a"):
                    print("Parsed info")
                    print("User: " + user.extract())
                    steamid = user.css('::attr(href)').extract()[0]
                    print("Steam ID: " + steamid)
                    lgen.next()

    def parse_accounts(self, response):
        for key in response.css("ul.stats"):
            print("Value finding function activted.")
            value = response.css("span.refined-value::text").extract()
            print(value)
The expected output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
(SOME VALUE)
However, the current output is:
Parsed info
User: <a href="/profiles/76561198017108***">user</a>
Steam ID: /profiles/76561198017108***
Downloading Page...
Parsed info
User: <a href="/profiles/76561198015589***">user</a>
Steam ID: /profiles/76561198015589***
Page successfully downloaded.
2018-06-13 21:42:45 [scrapy.core.scraper] ERROR: Spider error processing <GET file:///Users/max/Documents/promotebot/tutorial/tutorial/TF2ITEMS.htm> (referer: None)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/max/Documents/promotebot/tutorial/tutorial/spiders/accounts_spider.py", line 32, in parse
    lgen.next()
StopIteration
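Even with the multithreading (the link generator downloads the request while the parse function activates it again), the function should still work (?)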
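Solution
I think you should not just call lgen.next(); instead you should yield it, as in yield lgen.next(). lgen is only a generator, and lgen.next() merely retrieves a single Scrapy request; for Scrapy to download that request, you have to yield it.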
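The behaviour described above can be reproduced without Scrapy. The following is only a minimal sketch under stated assumptions: make_request is a hypothetical stand-in for scrapy.Request, the log list replaces the print calls, and the first Steam ID is passed in directly instead of being read from a global. It is written for Python 3, so it uses next(lgen) where the question's Python 2 code uses lgen.next(). It illustrates two things: calling next() only hands the request back to the caller, which then discards it, and a generator with a single yield is exhausted by the second next() call, which is where the StopIteration in the traceback comes from.

```python
def make_request(steamid):
    # Hypothetical stand-in for scrapy.Request(url, callback=...).
    return "http://www.backpack.tf" + steamid


def linkgen(steamid, log):
    # Same shape as linkgen in the question: exactly one yield.
    log.append("Downloading Page...")
    yield make_request(steamid)
    log.append("Page successfully downloaded.")


def parse_calling_next(steamids, log):
    # Mirrors the question: one shared generator, return value of next() discarded.
    lgen = linkgen(steamids[0], log)
    for steamid in steamids:
        log.append("Steam ID: " + steamid)
        # First call runs up to the yield; the second call resumes after it,
        # reaches the end of the generator body, and raises StopIteration.
        next(lgen)


def parse_yielding(steamids):
    # Yields one request per user, so whoever iterates over the result
    # (Scrapy's engine, in a real spider) actually receives each request.
    for steamid in steamids:
        yield make_request(steamid)


if __name__ == "__main__":
    ids = ["/profiles/1", "/profiles/2"]  # made-up IDs for illustration
    log = []
    try:
        parse_calling_next(ids, log)
    except StopIteration:
        log.append("StopIteration")
    print("\n".join(log))
    print(list(parse_yielding(ids)))
```

In the actual spider, the equivalent change would be to yield the request from parse itself, for example yield scrapy.Request("http://www.backpack.tf" + steamid, callback=self.parse_accounts) inside the loop, which also makes the separate linkgen generator and the global variable unnecessary.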