Getting a specific string from HTML for web scraping

Problem description

I'm trying to get the stock names that are hyperlinked on a web page. For reproducibility:

import requests
from bs4 import BeautifulSoup

URL = 'https://seekingalpha.com/news/3592559-nvax-nbl-among-premarket-gainers'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='bullets_ul')

stock_elems = results.find_all('span', class_='ticker-hover-wrapper')


I want to collect the underlined names (the hyperlinked stock names) into a list.

I tried some variations of the following code, without success:

for stock_elem in stock_elems:
    stock_name = stock_elem.find('href', class_='*')
    print(symbol_name.text.strip())

Any help would be appreciated.

Tags: python, html, web-scraping, beautifulsoup

Solution

Try this:

import requests
from bs4 import BeautifulSoup

URL = 'https://seekingalpha.com/news/3592559-nvax-nbl-among-premarket-gainers'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='bullets_ul')

stock_elems = results.find_all('span', class_='ticker-hover-wrapper')
# Each wrapper span contains an <a> tag whose text is the ticker symbol
ls = [i.find('a').text for i in stock_elems]

Output:

ls
['DPW',
 'IMRN',
 'BTAI',
 'SONN',
 'VOLT',
 'IBIO',
 'AIKI',
 'DGLY',
 'IDRA',
 'HTBX',
 'JOB',
 'NAK',
 'VBIV',
 'NBL',
 'OGEN',
 'ANVS',
 'XBIO',
 'BNTX',
 'CKPT',
 'FIXX',
 'FLDM',
 'PDSB',
 'CFRX',
 'MVIS',
 'NVAX']
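
For context, the attempt in the question probably fails for three reasons: `find('href', ...)` looks for a tag named `href`, while `href` is an attribute of the `<a>` tag; `class_='*'` is not a wildcard and only matches a class literally named `*`; and the loop assigns `stock_name` but prints `symbol_name`. Below is a minimal sketch of a more defensive version of the same approach. It runs on an inline HTML sample so it works offline; the sample markup, the `/symbol/...` links, and the helper name `extract_tickers` are assumptions made for illustration, not the real page's content.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the structure described in the question;
# the real page may differ.
SAMPLE_HTML = """
<ul id="bullets_ul">
  <li>Gainers:
    <span class="ticker-hover-wrapper"><a href="/symbol/NVAX">NVAX</a></span> +12%,
    <span class="ticker-hover-wrapper"><a href="/symbol/NBL">NBL</a></span> +9%,
    <span class="ticker-hover-wrapper">no link here</span>
  </li>
</ul>
"""


def extract_tickers(html):
    """Return (ticker, href) pairs for every linked ticker in the bullet list."""
    soup = BeautifulSoup(html, 'html.parser')
    results = soup.find(id='bullets_ul')
    if results is None:
        # The container is missing, e.g. the site served a different page
        return []
    pairs = []
    for span in results.find_all('span', class_='ticker-hover-wrapper'):
        link = span.find('a')  # find() takes a tag name; href is an attribute
        if link is None:
            continue  # skip wrappers that contain no hyperlink
        pairs.append((link.get_text(strip=True), link.get('href')))
    return pairs


tickers = extract_tickers(SAMPLE_HTML)
print(tickers)  # [('NVAX', '/symbol/NVAX'), ('NBL', '/symbol/NBL')]
```

To use this against the live site, pass `page.content` from `requests.get(URL)` to `extract_tickers`. Checking for `None` matters because `soup.find` returns `None` when the element is absent, and calling `find_all` on `None` raises an `AttributeError`.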
