pandas - How to use pd.DataFrame method to manually create a dataframe from info scraped using beautifulsoup4
Problem description
I got to the point where all the tr data has been scraped and I am able to get a nice printout. But when I try to build a pd.DataFrame, as in df = pd.DataFrame({"A": a}) etc., I get a syntax error.
Here is a list of my imported libraries in the Jupyter Notebook:
import pandas as pd
import numpy as np
import bs4 as bs
import requests
import urllib.request
import csv
import html5lib
from pandas.io.html import read_html
import re
Here is my code:
source = urllib.request.urlopen('https://www.zipcodestogo.com/Texas/').read()
soup = bs.BeautifulSoup(source,'html.parser')
table_rows = soup.find_all('tr')
table_rows
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
texas_info = pd.DataFrame({
"title": Texas
"Zip Code" : [Zip Code],
"City" :[City],
})
texas_info.head()
I expect to get a dataframe with two columns, one being 'Zip Code' and the other 'City'.
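Solution
Loop over the table rows, collect each one as a dict, and build the DataFrame from the collected list. (DataFrame.append, which older answers use for this, was removed in pandas 2.0.)
records = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text.strip() for i in td]
    if len(row) < 2:  # skip header or layout rows without enough <td> cells
        continue
    zip_code = row[0]  # assuming first column
    city = row[1]  # assuming second column
    records.append({"Zip Code": zip_code, "City": city})
df = pd.DataFrame(records)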
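If you only need these two columns, do not include title in the DataFrame (it would create another column). That line is also where the syntax error occurs, because the comma after it is missing. Even with the comma added, Texas would still need quotes to be a string, and [Zip Code] is not a valid expression because a variable name cannot contain a space.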
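For comparison, here is a minimal sketch of the dict-of-lists form the question was aiming for. It runs on a small inline HTML sample instead of the live page, so SAMPLE_HTML, the helper name rows_to_frame, and the cell order (zip code first, city second) are all assumptions made for illustration and may not match the real table.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the scraped page.
SAMPLE_HTML = """
<table>
  <tr><th>Zip Code</th><th>City</th></tr>
  <tr><td>75001</td><td>Addison</td></tr>
  <tr><td>73301</td><td>Austin</td></tr>
</table>
"""


def rows_to_frame(html):
    """Build a two-column DataFrame from every <tr> with at least two <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    zip_codes, cities = [], []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 2:  # the header row uses <th>, so it has no <td> cells
            continue
        zip_codes.append(cells[0])  # assumption: zip code is the first cell
        cities.append(cells[1])     # assumption: city is the second cell
    # Keys are quoted strings, values are lists, and entries are comma-separated.
    return pd.DataFrame({"Zip Code": zip_codes, "City": cities})


texas_info = rows_to_frame(SAMPLE_HTML)
print(texas_info.head())
```

Building the two lists first and calling pd.DataFrame once avoids growing the frame row by row, and the length check keeps rows without data cells from raising an IndexError.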