python - Scraping Stackoverflow hyperlinks with Beautifulsoup
Problem description
I'm learning scraping with Beautifulsoup and am using Stackoverflow's interesting questions section ("https://stackoverflow.com/?tab=interesting") for practice.
I want to extract the hyperlinks for the top 5 questions that are tagged 'java' AND have at least one answer (an accepted answer is fine, but not a requirement).
I've looked at the Beautifulsoup documentation, but I can't get it to come together.
Thanks for any help!
CODE:
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://stackoverflow.com/?tab=interesting")
content = html.read()
soup = BeautifulSoup(content)
soup.findAll('a',{'class':'question-hyperlink'}, href = True , limit=5) # question link
soup.findAll('div', {'class':'status answered'}, limit=5) # question answer
soup.findAll('a',{'class':'post-tag'}, rel ='tag' , text = 'java', limit=5) # question user tag
DESIRED OUTPUT (as hyperlinks):
https://stackoverflow.com/questions/number/first-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/second-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/third-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/fourth-question-to-meet-the-criteria
https://stackoverflow.com/questions/number/fifth-question-to-meet-the-criteria
Solution
Try this:
from bs4 import BeautifulSoup
import requests

html = requests.get("https://stackoverflow.com/?tab=interesting")
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html.content, "html.parser")

# find and iterate over all parent divs of questions
for elem in soup.findAll('div', {'class': 'question-summary narrow'}):
    # get count of answers
    answer = elem.find("div", {"class": "mini-counts"})
    if answer is not None and answer.text.strip() != "0":
        # check if question is tagged with "Java"
        tags = elem.find("div", {"class": "t-java"})
        if tags is not None:
            # print link
            print(elem.find("a")["href"])
If you don't get any printed output, try changing the tag to, for example, t-python.
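To try the filtering logic without depending on the live page, here is a minimal sketch that runs against a small inline HTML sample. The markup in SAMPLE_HTML is hypothetical: it is only modeled on the class names mentioned in this thread (question-summary, status, mini-counts, question-hyperlink, post-tag), and the real page may be structured differently or may have changed. The function name answered_links and its parameters are my own. The sketch reads the answer count from the mini-counts element inside the status block, on the assumption that a question summary holds several mini-counts elements (votes, answers, views), and it uses urljoin to turn the relative href into a full URL as in the desired output.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com/"

# Hypothetical markup, modeled on the class names used in this thread.
SAMPLE_HTML = """
<div class="question-summary narrow">
  <div class="votes"><div class="mini-counts">3</div></div>
  <div class="status answered"><div class="mini-counts">2</div></div>
  <h3><a class="question-hyperlink" href="/questions/1/java-with-answers">Q1</a></h3>
  <div class="tags t-java"><a class="post-tag" rel="tag">java</a></div>
</div>
<div class="question-summary narrow">
  <div class="votes"><div class="mini-counts">1</div></div>
  <div class="status unanswered"><div class="mini-counts">0</div></div>
  <h3><a class="question-hyperlink" href="/questions/2/java-no-answers">Q2</a></h3>
  <div class="tags t-java"><a class="post-tag" rel="tag">java</a></div>
</div>
<div class="question-summary narrow">
  <div class="votes"><div class="mini-counts">0</div></div>
  <div class="status answered"><div class="mini-counts">1</div></div>
  <h3><a class="question-hyperlink" href="/questions/3/python-with-answer">Q3</a></h3>
  <div class="tags t-python"><a class="post-tag" rel="tag">python</a></div>
</div>
<div class="question-summary narrow">
  <div class="votes"><div class="mini-counts">0</div></div>
  <div class="status answered-accepted"><div class="mini-counts">1</div></div>
  <h3><a class="question-hyperlink" href="/questions/4/java-accepted">Q4</a></h3>
  <div class="tags t-java t-spring">
    <a class="post-tag" rel="tag">java</a>
    <a class="post-tag" rel="tag">spring</a>
  </div>
</div>
"""


def answered_links(html, tag="java", limit=5):
    """Return up to `limit` absolute links to questions with the tag and >= 1 answer."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for summary in soup.select("div.question-summary"):
        # Answer count: the mini-counts element inside the status block (assumed layout).
        status = summary.select_one("div.status div.mini-counts")
        link = summary.select_one("a.question-hyperlink[href]")
        if status is None or link is None:
            continue
        digits = "".join(ch for ch in status.get_text() if ch.isdigit())
        if not digits or int(digits) == 0:
            continue
        # Compare against the visible tag names of this question only.
        tags = {a.get_text(strip=True) for a in summary.select("a.post-tag")}
        if tag not in tags:
            continue
        # The href is relative, so join it with the site root.
        links.append(urljoin(BASE_URL, link["href"]))
        if len(links) == limit:
            break
    return links


for url in answered_links(SAMPLE_HTML):
    print(url)
```

With the sample above this prints the links for questions 1 and 4. To run it against the real page, pass requests.get(...).content instead of SAMPLE_HTML and adjust the selectors to whatever the page actually serves.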