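python - Beautiful Soup's find function returns a NoneType value
Problem description
I am using Beautiful Soup to extract tables from a website. The find function returns a NoneType value, and I don't know how to go on to extract all the tables into pandas DataFrames.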
import pandas as pd
import datetime as dt
import pandas_datareader as web
import matplotlib.pyplot as plt
from matplotlib import style
import matplotlib.ticker as ticker
from bs4 import BeautifulSoup
import requests
url='https://www.federalreserve.gov/monetarypolicy/bst_recenttrends_accessible.htm'
html_content=requests.get(url).content
soup = BeautifulSoup(html_content, "html.parser")
get_table = soup.find("table", class_='pubtables')
get_table_data = get_table.find_all("tr")
print(type(get_table_data))
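For reference, `find` returns `None` whenever no element in the parsed markup matches, and the chained `.find_all` call then raises `AttributeError`. The sketch below reproduces that without touching the network. The HTML string and the `load.js` name are made up for illustration and only stand in for a page whose table is filled in by a script after loading.

```python
from bs4 import BeautifulSoup

# Hypothetical static markup (an assumption): no <table> is present,
# because a script would insert it in the browser after the page loads.
html = '<html><body><div id="content"><script src="load.js"></script></div></body></html>'
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="pubtables")
if table is None:
    # find() found nothing; calling table.find_all(...) here would raise
    # AttributeError: 'NoneType' object has no attribute 'find_all'
    rows = []
    print("No table with class 'pubtables' in the downloaded HTML")
else:
    rows = table.find_all("tr")

print(len(rows))
```

Checking for `None` before chaining makes the failure explicit and suggests looking for the real data source instead.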
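Solution
The data you see in the tables is loaded from an external URL, so it is not in the HTML that requests downloads and find returns None. You can use this example to load the tables into separate DataFrames: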
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.federalreserve.gov/data.xml'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Each <chart> element becomes one DataFrame
for chart in soup.select('chart'):
    series = {}
    index = []
    for s in chart.select('series'):
        series[s['description']] = []
        temp_index = []
        for o in s.select('observation'):
            temp_index.append(o['index'])
            series[s['description']].append(o['value'])
        # Keep the longest list of dates as the index
        if len(temp_index) > len(index):
            index = temp_index

    series['index'] = index

    # Pad shorter series with 'No Data' so all columns have equal length
    max_len = len(max(series.values(), key=len))
    for k in series:
        series[k] = series[k] + ['No Data'] * (max_len - len(series[k]))

    df = pd.DataFrame(series).set_index('index')
    print(df)
    print('-' * 80)
Prints:
Total Assets
index
1-Aug-07 870261.00
8-Aug-07 865453.00
15-Aug-07 864931.00
22-Aug-07 862775.00
29-Aug-07 872873.00
... ...
12-Aug-20 6957277.00
19-Aug-20 7010637.00
26-Aug-20 6990418.00
2-Sep-20 7017492.00
9-Sep-20 7010614.00
[685 rows x 1 columns]
--------------------------------------------------------------------------------
Total Assets ... Support for Specific Institutions**
index ...
1-Aug-07 870261.00 ... 0
8-Aug-07 865453.00 ... 0
15-Aug-07 864931.00 ... 0
22-Aug-07 862775.00 ... 0
29-Aug-07 872873.00 ... 0
... ... ... ...
12-Aug-20 6957277.00 ... No Data
19-Aug-20 7010637.00 ... No Data
26-Aug-20 6990418.00 ... No Data
2-Sep-20 7017492.00 ... No Data
9-Sep-20 7010614.00 ... No Data
[685 rows x 4 columns]
--------------------------------------------------------------------------------
All Liquidity Facilities* ... Term Asset-Backed Securities Loan Facility
index ...
1-Aug-07 235.00 ... 0
8-Aug-07 255.00 ... 0
15-Aug-07 264.00 ... 0
22-Aug-07 2262.00 ... 0
29-Aug-07 1358.00 ... 0
... ... ... ...
12-Aug-20 116308.00 ... 1619.00
19-Aug-20 112435.00 ... 2266.00
26-Aug-20 107342.00 ... 2256.00
2-Sep-20 103978.00 ... 2639.00
9-Sep-20 85581.00 ... 2639.00
[685 rows x 5 columns]
--------------------------------------------------------------------------------
Total Support to AIG*** ... Maiden Lane II LLC Maiden Lane III LLC
index ...
1-Aug-07 0 0 ... 0 0
8-Aug-07 0 0 ... 0 0
15-Aug-07 0 0 ... 0 0
22-Aug-07 0 0 ... 0 0
29-Aug-07 0 0 ... 0 0
... ... ... ... ... ...
12-Feb-20 0 No Data ... 0 0
19-Feb-20 0 No Data ... 0 0
26-Feb-20 0 No Data ... 0 0
4-Mar-20 0 No Data ... 0 0
11-Mar-20 0 No Data ... 0 0
[659 rows x 5 columns]
--------------------------------------------------------------------------------
Currency in Circulation ... Treasury Balance
index ...
1-Aug-07 814159.00 ... 4769.00
8-Aug-07 814587.00 ... 4670.00
15-Aug-07 813042.00 ... 5109.00
22-Aug-07 811795.00 ... 5329.00
29-Aug-07 812431.00 ... 4924.00
... ... ... ...
12-Aug-20 2006160.00 ... 1635143.00
19-Aug-20 2009610.00 ... 1636393.00
26-Aug-20 2013933.00 ... 1607449.00
2-Sep-20 2021810.00 ... 1651823.00
9-Sep-20 2030151.00 ... 1570533.00
[685 rows x 3 columns]
--------------------------------------------------------------------------------
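The values above are strings, and missing entries are the literal text 'No Data'. If numeric columns are needed, one possible variant (a sketch, not part of the original answer) converts each series with `pd.to_numeric` and aligns series by their date labels, so gaps become `NaN`. The XML sample and the `chart_to_frame` helper are hypothetical. The sample only imitates the chart/series/observation layout that the code above reads, and the date format `%d-%b-%y` is inferred from the printed index.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample (an assumption) imitating the layout read by the answer's code.
xml = """
<charts>
  <chart>
    <series description="Total Assets">
      <observation index="1-Aug-07" value="870261.00"/>
      <observation index="8-Aug-07" value="865453.00"/>
    </series>
    <series description="Other">
      <observation index="1-Aug-07" value="0"/>
    </series>
  </chart>
</charts>
"""


def chart_to_frame(chart):
    """Hypothetical helper: build a numeric, date-indexed DataFrame from one <chart>."""
    columns = {}
    for s in chart.select("series"):
        # Key each value by its date label so series of different lengths
        # are aligned by date; repeated labels would keep only the last value.
        columns[s["description"]] = pd.Series(
            {o["index"]: o["value"] for o in s.select("observation")}
        )
    df = pd.DataFrame(columns)
    # Strings become floats; anything unparseable or missing becomes NaN.
    df = df.apply(pd.to_numeric, errors="coerce")
    df.index = pd.to_datetime(df.index, format="%d-%b-%y")
    return df.sort_index()


soup = BeautifulSoup(xml, "html.parser")
df = chart_to_frame(soup.select_one("chart"))
print(df)
```

Aligning by date label differs from the padding approach above, which appends 'No Data' at the end of shorter series regardless of which dates are missing.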