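python-3.x - Python BeautifulSoup: building and combining lists and removing clutter such as \n
Question
How can I combine the complete lists into a DataFrame? When I print it, it seems to print only the first record, and the output also contains \n and other clutter such as ' characters.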
import requests
from requests_html import HTML, HTMLSession
from bs4 import BeautifulSoup
import pandas as pd
import csv
import json

url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
lehigh = requests.get(url).text
soup = BeautifulSoup(lehigh, 'lxml')

for opp in soup.find_all('div', class_="sidearm-schedule-game-opponent-text"):
    opp_list = []
    opp_list.append(opp.text)
# print(opp_list)

for conf in soup.find_all('div', class_="sidearm-schedule-game-conference-conference"):
    conf_list = []
    conf_list.append(conf.text)
# print(conf_list)

dict = {'opponent': [opp_list], 'conference': [conf_list]}
df = pd.DataFrame(dict)
print(df)
Answer
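You are resetting opp_list and conf_list to [] on every iteration, so each list only ever holds a single item. Initialize them once, before the loops. You also don't need the extra brackets when building the dictionary: {'opponent': opp_list, 'conference': conf_list} is enough, because wrapping each list in another list makes pandas treat the whole list as one cell.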
To remove the whitespace, you can use the .get_text() method with the strip=True and separator= parameters.
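To see what those two parameters do in isolation, here is a minimal offline sketch. The markup and the class name "opponent" are made up for illustration and are not the real page's HTML; html.parser is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on a schedule entry (not the real site's HTML).
html = """
<div class="opponent">
    <span>at</span>
    <a href="#">Army</a>
</div>
"""

div = BeautifulSoup(html, "html.parser").find("div", class_="opponent")

raw = div.text                                   # keeps newlines and indentation
clean = div.get_text(strip=True, separator=" ")  # strips each piece, joins with a space
joined = div.get_text(strip=True)                # strips each piece, joins with nothing

print(repr(raw))
print(repr(clean))
print(repr(joined))
```

Without a separator the stripped pieces are glued together, which is why the opponent column passes separator=' ' while a single-string cell does not need it.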
For example, the full script becomes:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
lehigh = requests.get(url).text
soup = BeautifulSoup(lehigh, 'lxml')

# Create each list once, outside its loop, so items accumulate.
opp_list = []
for opp in soup.find_all('div', class_="sidearm-schedule-game-opponent-text"):
    # separator=' ' keeps a space between the stripped text pieces.
    opp_list.append(opp.get_text(strip=True, separator=' '))

conf_list = []
for conf in soup.find_all('div', class_="sidearm-schedule-game-conference-conference"):
    conf_list.append(conf.get_text(strip=True))

# Named data rather than dict to avoid shadowing the built-in.
data = {'opponent': opp_list, 'conference': conf_list}
df = pd.DataFrame(data)
print(df)
Prints:
                         opponent       conference
0                        at UConn
1                       vs Drexel
2            at George Washington
3                   at St. John's
4                   vs Binghamton
5                        at Rider
6                         vs Penn
7                         at Army  Patriot League*
8                      vs Cornell
9                     at Boston U  Patriot League*
10                 vs #20 Colgate  Patriot League*
11                        vs Navy  Patriot League*
12                   at Lafayette  Patriot League*
13                   at Dartmouth
14                    vs American  Patriot League*
15                    at Bucknell  Patriot League*
16                at Loyola (Md.)  Patriot League*
17     vs Holy Cross Senior Night  Patriot League*
18  vs No. 3 Colgate (Semifinals)
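The same idea can also be written with list comprehensions, which removes the separate initialization step that caused the original bug. The sketch below is one possible variant, not part of the answer above: the helper name column is hypothetical, and the inline HTML is a made-up stand-in for the downloaded page that reuses only the class names from the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; only the class names come from the question.
html = """
<div class="sidearm-schedule-game-opponent-text"> <span>at</span> <a>Army</a> </div>
<div class="sidearm-schedule-game-conference-conference"> Patriot League* </div>
<div class="sidearm-schedule-game-opponent-text"> <span>vs</span> <a>Cornell</a> </div>
<div class="sidearm-schedule-game-conference-conference"></div>
"""
soup = BeautifulSoup(html, "html.parser")


def column(soup, css_class, separator=""):
    """Return the stripped text of every <div> that has the given class."""
    return [
        div.get_text(strip=True, separator=separator)
        for div in soup.find_all("div", class_=css_class)
    ]


data = {
    "opponent": column(soup, "sidearm-schedule-game-opponent-text", separator=" "),
    "conference": column(soup, "sidearm-schedule-game-conference-conference"),
}
df = pd.DataFrame(data)
print(df)
```

Note that pd.DataFrame raises a ValueError when the lists have different lengths, so this only works when every game has both divs, even if the conference one is empty.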