python - Can't get the names connected to a particular tab on a webpage
Question
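I've written a script in Python using the requests module and the BeautifulSoup library to get the names of the different people listed under the heading `Browse Our Offices` on a website. The problem is that when I run the script, it returns the random names the page fills in automatically, that is, the names shown when no tab has been selected.
When you visit the page, it shows a set of tabs. To be clearer: I want to select the `United States` tab and then each of the `states` tabs, and parse the names connected to them. That's it.
I've tried: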
import requests
from bs4 import BeautifulSoup

link = "https://www.schooleymitchell.com/offices/"
res = requests.get(link)
soup = BeautifulSoup(res.text, "lxml")
for item in soup.select("#consultant_info > strong"):
    print(item.text)
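The script above produces random names, but I want the names connected to the `United States` tab.
How can I get all the names that appear when `United States` and its different `states` tabs are selected, without using Selenium?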
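Answer
The important data is inside the `<div>` tag with `id="office_box"`. You are only interested in the consultants inside the inner `<div>` elements whose `id` ends with `-usa` (the selector in the code matches any `id` that contains `-usa`). In each row, the first column holds the name and the second holds the city and state: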
import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.schooleymitchell.com/offices/'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Every consultant block that sits inside a "-usa" <div> under #office_box
for div in soup.select('#office_box div[id*="-usa"] div.consultant_info_container'):
    # Drop the links so their text does not end up in the output
    for a in div.select('a'):
        a.extract()
    info = div.get_text(separator=" ").strip()
    # Fields are separated by two whitespace characters
    info = re.split(r'\s{2}', info)
    for data in info:
        print('{: ^45}'.format(data), end='|')
    print()
Prints:
Steven Bremer | Gadsden, Alabama | Voice: 256-328-2485 |
David George | Montgomery, Alabama | Voice: 334-649-7535 |
Zachary G. Madrigal, MBA | Phoenix, Arizona | Voice: 602-677-7804 |
Richard E. Perraut Jr. | Phoenix, Arizona | Voice: 480-659-3831 |
Stephen Moore B.A. | Scottsdale, Arizona | Voice: 480-354-3423 Toll-Free: 866-213-5141 |
Danny Caballes | Tempe, Arizona | Voice: 480-592-0776 |
Brian Lutz | Tucson, Arizona | Voice: 520-447-7921 Toll-Free: 888-633-1451 |
Travis McElroy | Bakersfield, California | Voice: 800-361-4578 |
Matt Denburg | Orange County, California | Affiliated Office | Bottomline Consulting Group, Inc.| Voice: 714-482-6025 |
Pete Craigmile | San Diego, California | Voice: |
Greg Lowry | San Francisco, California | Affiliated Office | DBA Lowry Telecom Consultant| Voice: 415-692-0708 Ext 1 |
Dave Tankersley | Colorado Springs, Colorado | Voice: 719-266-1098 |
Sanjay Tyagi | Denver, Colorado | Voice: 303-317-3110 |
Richard Ray | Highlands Ranch, Colorado | Voice: 303-306-8568 |
Richard Norlin | Highlands Ranch, Colorado | Voice: 612-309-5451 |
Dave Dellacato | Bridgeport, Connecticut | Voice: 203-442-1311 |
Patrick Delehanty | Brookfield, Connecticut | Voice: 475-289-2325 |
Greg Wisz | Fairfield County, Connecticut | Voice: 616-884-0058 |
Jack McCullough | Fairfield County, Connecticut | Voice: 203-767-5551 |
Matthew McCarthy | Hartford, Connecticut | Voice: 203-304-9886 |
Paul Nelson BS CHE, MBA | Hartford, Connecticut | Voice: 860-926-4260 |
...and so on.
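Since the question asks for the names under each state, the rows can also be grouped by the state in the second column. The following is only a sketch, not part of the original answer: `group_by_state` is a hypothetical helper, the sample strings reuse three rows from the output above, and the double-space separators in them are an assumption about what `div.get_text(separator=" ")` returns. It splits on `\s{2,}` rather than `\s{2}` so that longer runs of whitespace do not create empty fields.

```python
import re
from collections import defaultdict


def group_by_state(rows):
    """Group consultant names by state.

    Each row is a list of fields: [name, "City, State", ...].
    (Hypothetical helper, not from the original answer.)
    """
    by_state = defaultdict(list)
    for fields in rows:
        if len(fields) < 2:
            continue  # no location column, nothing to group on
        name, location = fields[0], fields[1]
        # The state is whatever follows the last comma in "City, State"
        state = location.rsplit(",", 1)[-1].strip()
        by_state[state].append(name)
    return dict(by_state)


# Assumed shape of the text extracted from each consultant block
samples = [
    "Steven Bremer  Gadsden, Alabama  Voice: 256-328-2485",
    "David George  Montgomery, Alabama  Voice: 334-649-7535",
    "Zachary G. Madrigal, MBA  Phoenix, Arizona  Voice: 602-677-7804",
]

# Split each line into fields on runs of two or more whitespace characters
rows = [re.split(r"\s{2,}", s.strip()) for s in samples]
grouped = group_by_state(rows)

for state, names in grouped.items():
    print(state, names)
```

In the scraping loop, the same helper could be fed the `info` lists collected for each consultant instead of the sample strings.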