Error: Requests and lxml return empty brackets when web scraping

Question

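I'm having a problem scraping a website in Python with the Requests and lxml libraries.

I need to capture the information highlighted in yellow on this page (http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm). However, my code returns: []

Can anyone help me, please?

My code is below: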

from lxml import html
import requests
 
page = requests.get('http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm')
tree = html.fromstring(page.content)
 
cod = tree.xpath('//*[@id="divContainerIframeB3"]/div/div[1]/form/div[2]/div/table/tbody/tr[1]/td[1]')
 
print('The code is : ', cod)

Tags: python, web-scraping, python-requests, lxml

Answer


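The data is loaded by JavaScript from an external source, so it is not in the HTML that Requests downloads and the XPath matches nothing. You can load the JSON data directly with this script: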

import json
import base64
import requests


api_url = "https://sistemaswebb3-listados.b3.com.br/indexProxy/indexCall/GetPortfolioDay/{encoded_string}"

page = 1
index = "IBOV"

s = {
    "language": "pt-br",
    "pageNumber": page,
    "pageSize": 20,
    "index": index,
    "segment": "1",
}

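# The endpoint takes its query parameters as a base64-encoded string in the URL path.
encoded_string = base64.b64encode(str(s).encode("utf-8")).decode("utf-8")

# Note: verify=False disables TLS certificate verification.
data = requests.get(
    api_url.format(encoded_string=encoded_string),
    verify=False,
).json()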

# uncomment this to get all data:
# print(json.dumps(data, indent=4))

for result in data["results"]:
    print(
        "{:<8} {:<15} {:15}".format(
            result["cod"], result["asset"], result["theoricalQty"]
        )
    )

Prints:

ABEV3    AMBEV S/A       4.355.174.839  
ASAI3    ASSAI           157.635.935    
AZUL4    AZUL            327.283.207    
BTOW3    B2W DIGITAL     201.549.295    
B3SA3    B3              1.930.877.944  
BBSE3    BBSEGURIDADE    671.584.841    
BRML3    BR MALLS PAR    843.728.684    
BBDC3    BRADESCO        1.261.986.269  
BBDC4    BRADESCO        4.687.814.597  
BRAP4    BRADESPAR       222.075.664    
BBAS3    BRASIL          1.283.197.221  
BRKM5    BRASKEM         264.640.575    
BRFS3    BRF SA          811.759.800    
BPAC11   BTGP BANCO      263.871.572    
CRFB3    CARREFOUR BR    391.758.726    
CCRO3    CCR SA          1.115.695.556  
CMIG4    CEMIG           969.723.092    
HGTX3    CIA HERING      126.186.408    
CIEL3    CIELO           1.112.196.638  
COGN3    COGNA ON        1.847.994.874  
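
The quantities in the output are strings that use dots as thousands separators, so they cannot be passed to int() directly. A minimal sketch of a converter follows; `parse_quantity` is a hypothetical helper name, and it assumes the values never carry a decimal part, which holds for the rows shown above but is not confirmed for the whole feed.

```python
def parse_quantity(text: str) -> int:
    """Convert a dot-grouped quantity such as '4.355.174.839' to an int."""
    # Assumption: dots are thousands separators only (no decimal part).
    return int(text.strip().replace(".", ""))


print(parse_quantity("4.355.174.839"))  # 4355174839
print(parse_quantity("157.635.935"))    # 157635935
```

In the loop above this could be applied as `parse_quantity(result["theoricalQty"])` before summing or sorting.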

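The script fetches only page 1 (20 rows). The sketch below shows one way to walk through the remaining pages by incrementing `pageNumber`. The names `build_url`, `fetch_page` and `iter_results` are hypothetical helpers, not part of any library. Two assumptions are worth stating: the parameters are serialized with json.dumps instead of str() (the endpoint is assumed to accept JSON, which has not been verified here), and a page past the end is assumed to return an empty "results" list, with `max_pages` as a safety cap in case it does not. Unlike the original, this sketch leaves certificate verification on, which may fail if the server's certificate chain is not trusted locally.

```python
import base64
import json

API_URL = (
    "https://sistemaswebb3-listados.b3.com.br/indexProxy/indexCall/"
    "GetPortfolioDay/{encoded_string}"
)


def build_url(page, index="IBOV", page_size=20):
    """Build the request URL for one page of the portfolio listing."""
    params = {
        "language": "pt-br",
        "pageNumber": page,
        "pageSize": page_size,
        "index": index,
        "segment": "1",
    }
    # Assumption: the endpoint accepts the parameters as JSON text.
    payload = json.dumps(params, separators=(",", ":"))
    encoded = base64.b64encode(payload.encode("utf-8")).decode("utf-8")
    return API_URL.format(encoded_string=encoded)


def fetch_page(page):
    """Download and decode one page; performs a real network request."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(build_url(page), timeout=30)
    response.raise_for_status()
    return response.json()


def iter_results(fetch=fetch_page, max_pages=50):
    """Yield rows page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        results = fetch(page).get("results") or []
        if not results:
            break
        yield from results
```

Calling `list(iter_results())` would then collect every row. Passing the fetch function as a parameter keeps the paging logic testable without network access.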