Can't get the names connected to a specific tab on a webpage

Question

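I've written a script in Python using the requests module and the BeautifulSoup library to get the names of the different people listed under the heading Browse Our Offices on a website. The problem is that when I run my script, it returns the seemingly random names that the page populates by default, that is, the ones shown when no tab has been selected.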

Website link: https://www.schooleymitchell.com/offices/

When you visit that page, you can see a set of tabs.

To be clear about the selection I want to make: I want to choose the United States tab and then each of the states tabs in turn, and parse the names connected to them. That's all.

I've tried:

import requests
from bs4 import BeautifulSoup

link = "https://www.schooleymitchell.com/offices/"

res = requests.get(link)
soup = BeautifulSoup(res.text,"lxml")
for item in soup.select("#consultant_info > strong"):
    print(item.text)

The script above produces random names, but I want the names connected to the United States tab.

How can I get all the names that are populated when the United States tab and its different states tabs are selected, without using Selenium?

Tags: python, python-3.x, web-scraping

Answer


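The important data is inside the <div> tag with id="office_box". You are only interested in the consultants inside the inner <div> elements whose id ends with -usa (the code below matches them with the substring selector [id*="-usa"]). In the output, the first column contains the name, the second the city and state, and the remaining columns the phone numbers and any affiliation details: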

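import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.schooleymitchell.com/offices/'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Each US consultant sits in a container under a <div> whose id contains "-usa"
for div in soup.select('#office_box div[id*="-usa"] div.consultant_info_container'):
    # Remove the <a> elements so that link text is excluded from the output
    for a in div.select('a'):
        a.extract()
    info = div.get_text(separator=" ").strip()
    # Split the flattened text into fields on pairs of whitespace characters
    info = re.split(r'\s{2}', info)
    for data in info:
        # Centre each field in a 45-character column
        print('{: ^45}'.format(data), end='|')
    print()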

Prints:

        Steven Bremer                |              Gadsden, Alabama               |             Voice: 256-328-2485             |
        David George                 |             Montgomery, Alabama             |             Voice: 334-649-7535             |
  Zachary G. Madrigal, MBA           |              Phoenix, Arizona               |             Voice: 602-677-7804             |
   Richard E. Perraut Jr.            |              Phoenix, Arizona               |             Voice: 480-659-3831             |
     Stephen Moore B.A.              |             Scottsdale, Arizona             | Voice: 480-354-3423 Toll-Free: 866-213-5141 |
       Danny Caballes                |               Tempe, Arizona                |             Voice: 480-592-0776             |
         Brian Lutz                  |               Tucson, Arizona               | Voice: 520-447-7921 Toll-Free: 888-633-1451 |
       Travis McElroy                |           Bakersfield, California           |             Voice: 800-361-4578             |
        Matt Denburg                 |          Orange County, California          | Affiliated Office | Bottomline Consulting Group, Inc.|             Voice: 714-482-6025             |
       Pete Craigmile                |            San Diego, California            |                   Voice:                    |
         Greg Lowry                  |          San Francisco, California          | Affiliated Office | DBA Lowry Telecom Consultant|          Voice: 415-692-0708 Ext 1          |
       Dave Tankersley               |         Colorado Springs, Colorado          |             Voice: 719-266-1098             |
        Sanjay Tyagi                 |              Denver, Colorado               |             Voice: 303-317-3110             |
         Richard Ray                 |          Highlands Ranch, Colorado          |             Voice: 303-306-8568             |
       Richard Norlin                |          Highlands Ranch, Colorado          |             Voice: 612-309-5451             |
       Dave Dellacato                |           Bridgeport, Connecticut           |             Voice: 203-442-1311             |
      Patrick Delehanty              |           Brookfield, Connecticut           |             Voice: 475-289-2325             |
          Greg Wisz                  |        Fairfield County, Connecticut        |             Voice: 616-884-0058             |
       Jack McCullough               |        Fairfield County, Connecticut        |             Voice: 203-767-5551             |
      Matthew McCarthy               |            Hartford, Connecticut            |             Voice: 203-304-9886             |
   Paul Nelson BS CHE, MBA           |            Hartford, Connecticut            |             Voice: 860-926-4260             |

...and so on.
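Since the question asks for the names under each state, the flattened text can also be turned into per-state lists instead of a printed table. The following is only a minimal sketch under stated assumptions: `parse_record` and `group_by_state` are hypothetical helper names, the sample strings are modelled on the output above rather than fetched from the site, and the fields are assumed to be separated by runs of two or more whitespace characters (so it splits on `\s{2,}` rather than the `\s{2}` used in the answer).

```python
import re
from collections import defaultdict


def parse_record(text):
    """Split one consultant's flattened text into (name, city, state, extras).

    Assumption: fields are separated by two or more whitespace characters,
    and the second field looks like "City, State".
    """
    fields = [f.strip() for f in re.split(r"\s{2,}", text.strip()) if f.strip()]
    # Raises ValueError if fewer than two fields are present
    name, location, *extras = fields
    # Split on the last ", " so that the state is whatever follows it
    city, _, state = location.rpartition(", ")
    return name, city, state, extras


def group_by_state(texts):
    """Map each state to the list of consultant names found for it."""
    by_state = defaultdict(list)
    for text in texts:
        name, _city, state, _extras = parse_record(text)
        by_state[state].append(name)
    return dict(by_state)


# Hypothetical inputs, shaped like div.get_text(separator=" ").strip()
samples = [
    "Steven Bremer  Gadsden, Alabama  Voice: 256-328-2485",
    "David George  Montgomery, Alabama  Voice: 334-649-7535",
    "Danny Caballes  Tempe, Arizona  Voice: 480-592-0776",
]

for state, names in group_by_state(samples).items():
    print(state, names)
```

In the scraping loop, each `info` string (before it is split) could be collected into a list and passed to `group_by_state`. Entries whose text does not follow the assumed layout would need separate handling.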
