Scraping data from Yahoo Finance in Python

Problem description

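I want to scrape data for a specific ticker symbol from Yahoo Finance.

I can scrape data that is laid out as a table, but not data that isn't. I applied the same approach to other information on the same page, but got no results.

So far, I can scrape the table at https://finance.yahoo.com/quote/AAPL/profile?p=AAPL

The code I use to scrape the table is: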

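import io

import pandas as pd
import requests
from lxml import etree, html

symbol = 'AAPL'
url = f'https://finance.yahoo.com/quote/{symbol}/profile?p={symbol}'

# Yahoo may reject requests that lack a browser-like User-Agent (assumption)
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
page.raise_for_status()
tree = html.fromstring(page.content)

# The profile page is expected to contain exactly one <table>
table = tree.xpath('//table')
assert len(table) == 1

# Serialize the <table> element back to HTML and let pandas parse it;
# newer pandas versions expect a file-like object rather than a literal string
tstring = etree.tostring(table[0], method='html', encoding='unicode')
df = pd.read_html(io.StringIO(tstring))[0]

print(df)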

I want to scrape the block of details on the right-hand side of the page, which is not an HTML table:

Sector: Consumer Goods
Industry: Electronic Equipment
Full Time Employees: 137,000

I would appreciate any help getting this information, or any tips and suggestions.

Tags: python, yahoo-finance

Solution
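The table approach in the question cannot reach these fields because pd.read_html only parses <table> elements, while Sector, Industry and Full Time Employees are plain <span> elements outside any table. The offline sketch below checks this with lxml on a hypothetical, simplified page: the markup and the placeholder executive row are made up for illustration and are not Yahoo's actual HTML.

```python
from lxml import html

# Hypothetical, simplified page: one <table> plus a non-table profile block
PAGE = """
<html><body>
  <p>
    <span>Sector</span>: <span>Consumer Goods</span><br>
    <span>Industry</span>: <span>Electronic Equipment</span>
  </p>
  <table>
    <tr><th>Name</th><th>Title</th></tr>
    <tr><td>Placeholder Name</td><td>Placeholder Title</td></tr>
  </table>
</body></html>
"""

tree = html.fromstring(PAGE)

# Every <table> on the page: this is all that a table-based approach can see
tables = tree.xpath('//table')

# The 'Sector' label searched only inside tables, then anywhere in the page
labels_in_tables = tree.xpath("//table//span[text()='Sector']")
labels_anywhere = tree.xpath("//span[text()='Sector']")

print(len(tables), len(labels_in_tables), len(labels_anywhere))
```

The label is found in the document but not inside the table, so the fields have to be located relative to their label spans instead.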


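You can use the XPath following-sibling axis: find the span whose text is the label, then take the first span that follows it, which holds the value.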

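import requests
from lxml import html

symbol = 'AAPL'
url = f'https://finance.yahoo.com/quote/{symbol}/profile?p={symbol}'

# Yahoo may reject requests that lack a browser-like User-Agent (assumption)
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
page.raise_for_status()
tree = html.fromstring(page.content)

d = {}
for label in ['Sector', 'Industry', 'Full Time Employees']:
    # Match the <span> whose text is the label, then take the first
    # <span> sibling after it: that sibling holds the value
    xp = f"//span[text()='{label}']/following-sibling::span[1]"
    matches = tree.xpath(xp)
    # Store None instead of raising IndexError if the page markup has changed
    d[label] = matches[0].text_content() if matches else None

print(d['Full Time Employees'])
print(d['Industry'])
print(d['Sector'])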
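To try the following-sibling lookup without hitting the network, here is a minimal sketch run against a hypothetical, simplified snippet modelled on the profile block (the markup is an assumption, not Yahoo's real HTML, which may have changed since this answer was written). The helper name profile_fields is hypothetical, and 'CEO' is included only as a label that is missing from the snippet. Note that the employee count comes back as a display string with a thousands separator, so it needs converting before use as a number.

```python
from lxml import html

# Hypothetical, simplified markup: label span, ': ', then a value span
SNIPPET = """
<div>
  <p>
    <span>Sector</span>: <span class="value">Consumer Goods</span><br>
    <span>Industry</span>: <span class="value">Electronic Equipment</span><br>
    <span>Full Time Employees</span>: <span class="value"><span>137,000</span></span>
  </p>
</div>
"""


def profile_fields(tree, labels):
    """Hypothetical helper: map each label to the text of the first <span>
    sibling that follows the label's own <span>, or None if not found."""
    result = {}
    for label in labels:
        matches = tree.xpath(f"//span[text()='{label}']/following-sibling::span[1]")
        # text_content() also collects text from nested spans
        result[label] = matches[0].text_content().strip() if matches else None
    return result


tree = html.fromstring(SNIPPET)
fields = profile_fields(tree, ['Sector', 'Industry', 'Full Time Employees', 'CEO'])
print(fields)

# '137,000' is display text; strip the separator before converting to int
employees = int(fields['Full Time Employees'].replace(',', ''))
print(employees)
```

Returning None for a missing label keeps one changed field from aborting the whole scrape, which matters here because the page layout is not under your control.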
