beautifulsoup - 使用 BeautifulSoup 抓取论坛并以表格形式显示
问题描述
如何编码 BeautifulSoup 以表格格式显示结果?像这样的东西:
Topic | Views | Replies
---------------------------------------
XPS 7590 problems | 557 | 8
SSD not working | 76 | 3
我的代码是:
import requests, re
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get("https://www.dell.com/community/XPS/bd-p/XPS")
soup = BeautifulSoup(r.content)
g_data = soup.find_all("div", {"class": "lia-component-messages-column-thread-info"})
for item in g_data:
print (item.find_all("h2", {"class": "message-subject"})[0].text)
print (item.find_all("span", {"class": "lia-message-stats-count"})[0].text) #replies
print (item.find_all("span", {"class": "lia-message-stats-count"})[1].text) #views
解决方案
只需通过初始化一个空的数据框并将每个“行”附加到其中来构造一个数据框:
import requests, re
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get("https://www.dell.com/community/XPS/bd-p/XPS")
soup = BeautifulSoup(r.content)
g_data = soup.find_all("div", {"class": "lia-component-messages-column-thread-info"})
df = pd.DataFrame()
for item in g_data:
topic = item.find_all("h2", {"class": "message-subject"})[0].text.strip()
replies = item.find_all("span", {"class": "lia-message-stats-count"})[0].text.strip() #replies
views = item.find_all("span", {"class": "lia-message-stats-count"})[1].text.strip() #views
df = df.append(pd.DataFrame([[topic, views, replies]], columns=['Topic','Views','Replies']), sort=False).reset_index(drop=True)
输出:
print (df)
Topic Views Replies
0 FAQ Modern Standby 1057 0
1 FAQ XPS Laptops 4315 0
2 Where is the Precision Laptops Forum board? 624 0
3 XPS 15-9570, color banding issue 5880 192
4 XPS 7590 problems.. 565 9
5 XPS 13 7390 2-in-1 Display and Touchscreen issues 17 2
6 Dell XPS 9570 I7-8750H video display issues 9 0
7 XPS 9360 Fn lock for PgUp PgDn 12 0
8 Dell XPS DPC Latency Fix 1724 4
9 XPS 13 7390 2-in-1, Realtek drivers lead to fr... 253 11
10 XPS 12 9q23 Touch screen firmware update fix 36 1
11 Dell XPS 15 9570 when HDMI plugged in, screen ... 17 0
12 XPS 13 7390 2 in 1 bluetooth keyboard and mous... 259 10
13 xps15 7590 wifi problem 46 1
14 Unable to update Windows from 1803 to 1909 - X... 52 5
15 Dell XPS 9300 - Thunderbolt 3 Power Delivery I... 28 0
16 Dell XPS 15 9560, right arrow key or right of ... 26 0
17 XPS 13 2020 (9300) Ubuntu sudden shut down 24 0
18 Dell XPS 15 9750 won’t login 26 0
19 XPS 13 9360 Windows Hello Face - reconfigurati... 29 2
20 Enclosure for Dell XPS 13 9360 512 GB pcie nvm... 181 7
21 XPS 13 7390 Firmware 1.3.1 Issue - Bluetooth /... 119 2
22 SSD Onboard? 77 3
23 XPS 13 9350 only turns on when charger connected 4090 11
24 Integrated webcam not working 45 1
25 Docking station for XPS 15 9570, Dell TB16 not... 53 4
26 Dell XPS 13 9370 34 1
27 XPS 13 9380 overheat while charging 602 3
28 DELL XPS 13 (9300) REALTEK AUDIO DRIVER PROBLEM 214 2
29 XPS 15 9570 freezing Windows 10 222 6
30 XPS 13 (9300) - Speaker Vibration 40 2
31 Dell XPS 15 9570 Fingerprint reader not workin... 158 2
32 XPS 9570 Intel 9260 No Bluetooth 34 0
推荐阅读
- r - R - ggplot 将二进制标志变量 (0/1) 随时间的分布显示为标准化条形图 (%)
- python - View.py 不会得到 ajax 但我可以通过直接在 URL 中传递 JSON 来更新数据库,Django
- python - Flask SQLAlchemy 无法在 venv 中设置属性错误
- javascript - 如何使用 socket.io JavaScript 库在 Angular 和 Express.js 之间建立单一连接
- powerbi - 自动导出 PowerBi 可视化数据?
- pytorch - 具有不同时间步长的 PyTorch LSTM
- multithreading - Flutter 中哪里可以定义 Isolate?如何在 Flutter 中启动线程?
- arrays - 使用嵌套循环多次重复一个函数
- android - AVD 的模拟器进程被杀死
- swift - 发送图像后如何修复 UI 聊天表格视图