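python - Pandas read_html produces an empty df with tuple column names
Question
I want to retrieve the tables on the following page and store them in a pandas DataFrame: https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs
However, the third table on the page comes back as an empty DataFrame, with all of the table's data stored in tuples as the column headers: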
Empty DataFrame
Columns: [(Service Providers, State of Colorado), (Cuban - Haitian Program, $0), (Refugee Preventive Health Program, $150,000.00), (Refugee School Impact, $450,000), (Services to Older Refugees Program, $0), (Targeted Assistance - Discretionary, $0), (Total FY, $600,000)]
Index: []
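Is there a way to "flatten" the tuple headers into header + value pairs and then append the result to a DataFrame made up of all four tables? My code is below. It has worked on other, similar pages, but it keeps breaking because of this table's formatting. Thanks!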
import pandas as pd
import requests
from bs4 import BeautifulSoup

funds_df = pd.DataFrame()
url = 'https://www.acf.hhs.gov/programs/orr/resource/ffy-2011-12-state-of-colorado-orr-funded-programs'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
year = url.split('ffy-')[1].split('-orr')[0]
tables = page.content
df_list = pd.read_html(tables)
for df in df_list:
    df['URL'] = url
    df['YEAR'] = year
    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    funds_df = pd.concat([funds_df, df])
Answer
- For this site, there is no need for beautifulsoup or requests.
- Calling pandas.read_html on the URL returns a list of DataFrames, one for each <table> on the page.
import pandas as pd

url = 'https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs'

# read every <table> at the URL into a list of DataFrames
dfl = pd.read_html(url)

# show each DataFrame in the list; there are 4 in this case
for i, d in enumerate(dfl):
    print(i)
    display(d)  # display works in Jupyter; otherwise use print
    print('\n')
dfl[0]
   Service Providers  Cash and Medical Assistance*  Refugee Social Services Program  Targeted Assistance Program  TOTAL
0  State of Colorado  $7,140,000                    $1,896,854                       $503,424                     $9,540,278
dfl[1]
   WF-CMA 2    RSS         TAG-F     CMA Mandatory 3  TOTAL
0  $3,309,953  $1,896,854  $503,424  $7,140,000       $9,540,278
dfl[2]
   Service Providers  Refugee School Impact  Targeted Assistance - Discretionary  Services to Older Refugees Program  Refugee Preventive Health Program  Cuban - Haitian Program  Total
0  State of Colorado  $430,000               $0                                   $100,000                            $150,000                           $0                       $680,000
dfl[3]
   Volag  Affiliate Name                             Projected ORR MG Funding  Director
0  CWS    Ecumenical Refugee & Immigration Services  $127,600                  Ferdi Mevlani 1600 Downing St., Suite 400 Denver, CO 80218 303-860-0128
1  ECDC   ECDC African Community Center              $308,000                  Jennifer Guddiche 5250 Leetsdale Drive Denver, CO 80246 303-399-4500
2  EMM    Ecumenical Refugee Services                $191,400                  Ferdi Mevlani 1600 Downing St., Suite 400 Denver, CO 80218 303-860-0128
3  LIRS   Lutheran Family Services Rocky Mountains   $121,000                  Floyd Preston 132 E Las Animas Colorado Springs, CO 80903 719-314-0223
4  LIRS   Lutheran Family Services Rocky Mountains   $365,200                  James Horan 1600 Downing Street, Suite 600 Denver, CO 80218 303-980-5400
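If a table still comes back as an empty DataFrame with (header, value) tuples as column names, as in the question, one possible workaround is to split the two column levels into a header row and a single data row before combining the tables. The sketch below is not part of the original answer. It assumes the problem frame has a two-level MultiIndex for its columns, and it rebuilds that frame by hand from the output shown in the question rather than fetching the page. The helper name flatten_tuple_header and the abbreviated normal_df are hypothetical, introduced only for illustration.

```python
import pandas as pd


def flatten_tuple_header(df):
    """Hypothetical helper: if df is empty and has two-level (header, value)
    columns, return a one-row frame built from those levels; otherwise
    return df unchanged."""
    if df.empty and isinstance(df.columns, pd.MultiIndex) and df.columns.nlevels == 2:
        headers = list(df.columns.get_level_values(0))
        values = list(df.columns.get_level_values(1))
        return pd.DataFrame([values], columns=headers)
    return df


# Stand-in for the problem table, rebuilt from the output in the question
# (assumption: read_html produced a two-level MultiIndex for the columns).
cols = pd.MultiIndex.from_tuples([
    ("Service Providers", "State of Colorado"),
    ("Cuban - Haitian Program", "$0"),
    ("Refugee Preventive Health Program", "$150,000.00"),
    ("Refugee School Impact", "$450,000"),
    ("Services to Older Refugees Program", "$0"),
    ("Targeted Assistance - Discretionary", "$0"),
    ("Total FY", "$600,000"),
])
empty_df = pd.DataFrame(columns=cols)

# Stand-in for a table that parsed normally (abbreviated, for illustration only).
normal_df = pd.DataFrame({
    "Service Providers": ["State of Colorado"],
    "TOTAL": ["$9,540,278"],
})

url = "https://www.acf.hhs.gov/programs/orr/resource/ffy-2011-12-state-of-colorado-orr-funded-programs"

# Flatten where needed, tag each table with its source URL, then combine.
frames = [flatten_tuple_header(df).assign(URL=url) for df in [normal_df, empty_df]]
funds_df = pd.concat(frames, ignore_index=True)
print(funds_df)
```

With real data, the list passed to the comprehension would be the result of pd.read_html. Columns that exist in only one table are filled with NaN in the rows from the other tables, which is how pd.concat aligns frames with different headers.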