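python - Scrapy XPath selector returns nothing
Problem description
I am trying to rewrite code that I originally wrote with the requests-html library. Because the project needs additional features, I am now using Scrapy.
I'm having a hard time getting the Scrapy/Splash spider to match the XPaths. Every time I run the code, I get nothing back.
With requests-html, the same XPaths return the desired data.
requests-html code: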
from requests_html import AsyncHTMLSession, HTMLSession

asession = AsyncHTMLSession()

async def get_page():
    code = 'NASDAQ-MDB'
    r = await asession.get(f'https://www.tradingview.com/symbols/{code}/')
    await r.html.arender(wait=4)
    return r

results = asession.run(get_page)
for result in results:
    enterprise_value_sel = "(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[2]"
    total_shares_outstanding_sel = "(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[4]"
    enterprise_value = result.html.xpath(enterprise_value_sel, first=True).text
    total_shares_outstanding = result.html.xpath(total_shares_outstanding_sel, first=True).text
    print(enterprise_value, total_shares_outstanding)
scrapy-splash code:
import scrapy
from scrapy_splash import SplashRequest
import json
from tradingview.items import *
import datetime
import os


class TradingviewsigsSpider(scrapy.Spider):
    name = 'tradingviewsigs'

    script = """
        function main(splash, args)
            assert(splash:go(args.url))
            assert(splash:wait(5.5))
            local scroll_to = splash:jsfunc("window.scrollTo")
            scroll_to(0, 800)
            return {
                html = splash:html(),
                png = splash:png(),
                har = splash:har(),
            }
        end
    """

    start_urls = ['https://tradingview.com/symbols/NASDAQ-MDB/']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url=url,
                                callback=self.parse,
                                endpoint='execute',
                                args={'lua_source': self.script})

    def parse(self, response):
        url = response.url
        print('Crawling: < {} >'.format(url))
        financials = TradingviewItem()
        financials['enterprise_val_sel'] = response.xpath("(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[2]/text()").extract_first()
        financials['total_shares_outstanding_sel'] = response.xpath("(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[4]/text()").extract_first()
        yield financials
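What do I need to do to make the XPaths work with Scrapy?
Solution
You need to fix your XPath accordingly (use () and [position] to select what you need). Wrapping the expression in parentheses makes [position] index the whole set of matched nodes instead of each parent's children: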
(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[2]
(//span[@class='tv-widget-fundamentals__value apply-overflow-tooltip'])[4]
Output: 9.334B - 57.566M