python - Parsing LeetCode problem content with requests and BeautifulSoup
Question
I am trying to parse the content of interview questions on LeetCode.
For example, on https://leetcode.com/problems/two-sum/,
I want to get
Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
This doesn't seem hard. I used requests and BeautifulSoup to do it:
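import requests
from bs4 import BeautifulSoup

def fetch_page(url):
    try:
        # requests applies no timeout by default, so set one for the handlers below to be reachable
        page = requests.get(url, timeout=10)
    except (requests.exceptions.ReadTimeout, requests.exceptions.ConnectTimeout):
        print('time out')
        return 'time out'
    soup = BeautifulSoup(page.content, 'html.parser')
    print(soup.prettify())

fetch_page('https://leetcode.com/problems/two-sum/')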
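However, as you can see from the page response in the developer console (F12), the response does not include the content displayed on the page.
Is there a way to get this content?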
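Answer
You don't need Selenium. The page loads its dynamic content with a POST request: it sends a GraphQL query to the backend at https://leetcode.com/graphql, and the problem text comes back as JSON. Sending that request yourself is much faster than driving a browser: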
import requests
from bs4 import BeautifulSoup as bs
data = {"operationName":"questionData","variables":{"titleSlug":"two-sum"},"query":"query questionData($titleSlug: String!) {\n question(titleSlug: $titleSlug) {\n questionId\n questionFrontendId\n boundTopicId\n title\n titleSlug\n content\n translatedTitle\n translatedContent\n isPaidOnly\n difficulty\n likes\n dislikes\n isLiked\n similarQuestions\n contributors {\n username\n profileUrl\n avatarUrl\n __typename\n }\n langToValidPlayground\n topicTags {\n name\n slug\n translatedName\n __typename\n }\n companyTagStats\n codeSnippets {\n lang\n langSlug\n code\n __typename\n }\n stats\n hints\n solution {\n id\n canSeeDetail\n __typename\n }\n status\n sampleTestCase\n metaData\n judgerAvailable\n judgeType\n mysqlSchemas\n enableRunCode\n enableTestMode\n envInfo\n libraryUrl\n __typename\n }\n}\n"}
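# POST the GraphQL query as a JSON body and decode the JSON reply
r = requests.post('https://leetcode.com/graphql', json=data).json()
# The problem statement is an HTML fragment, so parse it to strip the tags
# ('lxml' needs the lxml package installed; 'html.parser' works without it)
soup = bs(r['data']['question']['content'], 'lxml')
title = r['data']['question']['title']
question = soup.get_text().replace('\n', ' ')
print(title, '\n', question)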
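If only the title and the statement are needed, the query can probably be trimmed, since GraphQL lets a client ask for a subset of fields. The following is a minimal sketch under that assumption, not a tested client for LeetCode's API. The helper names (build_payload, html_to_text, extract_question, fetch_question) are hypothetical, and the tag stripping uses the standard library's html.parser in place of BeautifulSoup so the helpers have no extra dependencies. It also assumes content may come back as null (for example on paid-only problems) and treats that as empty text.

```python
from html.parser import HTMLParser


def build_payload(title_slug):
    """Build a GraphQL request body asking only for title and content."""
    query = (
        "query questionData($titleSlug: String!) {\n"
        "  question(titleSlug: $titleSlug) {\n"
        "    title\n"
        "    content\n"
        "  }\n"
        "}\n"
    )
    return {
        "operationName": "questionData",
        "variables": {"titleSlug": title_slug},
        "query": query,
    }


class _TextCollector(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &nbsp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Strip tags and collapse every run of whitespace to a single space."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return " ".join("".join(collector.parts).split())


def extract_question(response_json):
    """Return (title, plain_text) from a decoded GraphQL response."""
    question = response_json["data"]["question"]
    # Assumption: content can be None, so fall back to an empty string.
    content = question.get("content") or ""
    return question["title"], html_to_text(content)


def fetch_question(title_slug, timeout=10):
    """POST the trimmed query to the endpoint used in the answer above."""
    # Third-party import kept local so the helpers above work without it.
    import requests

    response = requests.post(
        "https://leetcode.com/graphql",
        json=build_payload(title_slug),
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError later
    return extract_question(response.json())
```

Calling fetch_question("two-sum") would then return the title and the statement as plain text, provided the endpoint still accepts this query shape. Keeping the payload building and the parsing separate from the network call makes both easy to check against a saved response.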