Web scraping problem: I am using Beautiful Soup and Python

Problem description

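I need the title, address, phone number, and description of each listing on this page. The code below is as far as I have gotten. Now I am stuck; please help, I am new to web scraping.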

from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

from bs4 import BeautifulSoup as soup
import urllib.request
import pandas as pd

with urllib.request.urlopen("http://buildingcongress.org/list/category/architects-6") as url:
    s = url.read()

page_soup = soup(s, 'html.parser')
listings = []

for rows in page_soup.find_all("div"):
    if ("mn-list-item-odd" in rows["mn-listing mn-nonsponsor mn-search-result-priority-highlight-30"]) or ("mn-list-item-even" in rows["mn-listing mn-nonsponsor mn-search-result-priority-highlight-30"]):
        name = rows.find("div", class_="mn-title").a.get_text()

My for loop raises an error. I am stuck; please help.

Tags: python, html, web-scraping, beautifulsoup

Solution


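The error in your loop comes from indexing the tag with a class string: rows["..."] looks up an HTML attribute with that name, and no such attribute exists, so it raises a KeyError. Instead, use a regular expression to match the class in find_all, then iterate over the results.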

import re
import requests
from bs4 import BeautifulSoup

url = "http://buildingcongress.org/list/category/architects-6"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")  # the "lxml" parser requires the lxml package
# A compiled regex passed to class_ matches divs that carry either class.
for rows in soup.find_all('div', class_=re.compile('mn-list-item-odd|mn-list-item-even')):
    name = rows.find("div", class_="mn-title").find('a').text
    print(name)
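
The loop above only prints the title, while the question also asks for the address, phone number, and description. The following is a minimal sketch of how the same approach might be extended. Only the class mn-title comes from the original code; the class names mn-address, mn-phone, and mn-desc, the helper names text_or_none and parse_listings, and the sample HTML are all assumptions for illustration, so inspect the real page and adjust them. It uses the built-in html.parser so it runs without lxml.

```python
import re
from bs4 import BeautifulSoup

# Matches either listing class used in the answer above.
LISTING_CLASS = re.compile(r"mn-list-item-(odd|even)")

# Field name -> CSS class. Only "mn-title" appears in the original code;
# the other three class names are assumptions and may differ on the real page.
FIELD_CLASSES = {
    "title": "mn-title",
    "address": "mn-address",    # assumed
    "phone": "mn-phone",        # assumed
    "description": "mn-desc",   # assumed
}


def text_or_none(parent, css_class):
    """Return the stripped text of the first descendant with css_class, or None."""
    node = parent.find(class_=css_class)
    return node.get_text(" ", strip=True) if node else None


def parse_listings(html):
    """Parse listing rows from an HTML string into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for row in soup.find_all("div", class_=LISTING_CLASS):
        listings.append(
            {field: text_or_none(row, cls) for field, cls in FIELD_CLASSES.items()}
        )
    return listings


# Made-up sample markup so the sketch runs without network access.
SAMPLE_HTML = """
<div class="mn-list-item-odd">
  <div class="mn-title"><a href="#">Acme Architects</a></div>
  <div class="mn-address">1 Main St</div>
  <div class="mn-phone">(555) 010-0000</div>
  <div class="mn-desc">Design firm.</div>
</div>
<div class="mn-list-item-even">
  <div class="mn-title"><a href="#">Beta Studio</a></div>
</div>
"""

for listing in parse_listings(SAMPLE_HTML):
    print(listing)
```

To try this on the live page, pass requests.get(url).text to parse_listings. Missing fields come back as None instead of raising an AttributeError, and the resulting list of dicts can be handed to pandas.DataFrame if a table is wanted.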
