Unable to get a specific tag when using Beautiful Soup

Problem description

I want to extract information from the Stack Overflow website, specifically the text of each question:

import requests
from bs4 import BeautifulSoup
response=requests.get("https://stackoverflow.com/")
soup=BeautifulSoup(response.text,"html.parser",multi_valued_attributes=None)

for tag in soup.find_all('a',class_='question-hyperlink'):
    print(tag)

This produces no output at all. I think something is wrong with how I filter by class, but I'm not sure what.

This works fine:

import requests  
from bs4 import BeautifulSoup
response=requests.get("https://stackoverflow.com/questions")
soup=BeautifulSoup(response.text,"html.parser")
question=soup.select(".question-summary")

for a in question:
    print(a.select_one(".question-hyperlink").getText())

But what is wrong with the first one?

Tags: web-scraping, beautifulsoup, html-parsing

Solution


The path "questions" is missing from the URL on this line of the first code snippet:

response=requests.get("https://stackoverflow.com/")

This works fine:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://stackoverflow.com/questions")
soup = BeautifulSoup(response.text, "html.parser")

for tag in soup.find_all('a', class_='question-hyperlink'):
    print(tag.getText(strip=True))

Output:

Pass a json object in function as a variable
iPhone Application Development in Windows 10 Platform
Jetty Websocket API Session
Exit from a multiprocessing Pool for loop using apply_async and terminate
bootstrap 5 grid layout col-md-6 not working correctly
R comparison (1) is possible only for atomic and list types
NeutralinoJS: error: missing required argument 'name'
Formatting text editor with Elementor

and so on ...

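Without "questions" in the URL, the page that comes back has no anchor tags with this class, so find_all returns an empty list.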
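One way to confirm this kind of problem is to list which classes the anchors on the fetched page actually carry. The following is a minimal sketch of that check, not a definitive diagnostic. HOME_HTML and QUESTIONS_HTML are made-up stand-ins (the real Stack Overflow markup differs), and the helper names question_titles and anchor_classes are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two pages; the real markup is different.
HOME_HTML = '<div><a class="nav-link" href="/questions">Questions</a></div>'
QUESTIONS_HTML = """
<div class="question-summary">
  <a class="question-hyperlink" href="/q/1"> First question </a>
</div>
<div class="question-summary">
  <a class="question-hyperlink" href="/q/2">Second question</a>
</div>
"""


def question_titles(html):
    """Return the stripped text of every <a class="question-hyperlink"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True)
            for tag in soup.find_all("a", class_="question-hyperlink")]


def anchor_classes(html):
    """Collect the class names used on <a> tags, to see what is really there."""
    soup = BeautifulSoup(html, "html.parser")
    classes = set()
    for tag in soup.find_all("a"):
        # With the default parser settings, "class" is a list of names.
        classes.update(tag.get("class", []))
    return classes


if __name__ == "__main__":
    print(question_titles(HOME_HTML))       # nothing matches on this page
    print(anchor_classes(HOME_HTML))        # shows which classes do exist
    print(question_titles(QUESTIONS_HTML))
```

With a live page, you would pass response.text to the same helpers. If the class you are filtering on does not appear in the set returned by anchor_classes, the filter is fine and the page itself is the wrong one (or its markup has changed).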

