I can't get all the data when web scraping

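Problem description

I am trying to scrape this URL: https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php. I need to collect the values of both the "Nº cole." column and the "Nombre Colegiado" column.

I am using BeautifulSoup, but I only get the values of the "Nº cole." column. How can I fix this?

Thanks!

Here is my code: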

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

page = requests.get('https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php')
soup = BeautifulSoup(page.text, 'html.parser')
data = soup.find_all("span",{'class':'colColegiado'})
numero_col = []
for i in data:
    data_num = i.text.strip()
    numero_col.append(data_num)
numero_col
['Nº cole.',
 '6478',
 '13107',
 '7341',
 '12110',
 '5625',
 '4877',
 '4700',
 '9126',
 '8444',
 '13120',
 '5023',
 '12235',
 '7747',
 '17701',
 '17391',
 '17944',
 '17772',
 '7230',
 '11729',
 '17275']

Tags: python, web-scraping

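Solution


You are currently reading the values from the wrong HTML element. The span with class colColegiado holds only the number; both values sit inside the <p> elements with class resalto, so iterate over those instead.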

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php')
soup = BeautifulSoup(page.text, 'html.parser')
# Each result row is a <p class="resalto"> that holds both the number and the name.
data = soup.find_all("p", {'class': 'resalto'})
schools = []
for result in data:
    # First child of the row: the "Nº cole." value.
    data_num = result.contents[0].text.strip()
    # Second child of the row: the "Nombre Colegiado" value.
    data_name = str(result.contents[1]).strip()
    schools.append((data_num, data_name))
print(schools)
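
Indexing into result.contents depends on the exact order of the child nodes, so it may break if a row contains extra whitespace or markup. The sketch below is one possible, slightly more defensive variant. It runs on a small made-up HTML fragment: SAMPLE_HTML, the placeholder names in it, and the helper extract_rows are assumptions for illustration, not taken from the live page, whose real markup may differ.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical fragment: an assumption about the row layout implied by the
# answer above, not a copy of the real page.
SAMPLE_HTML = """
<p class="resalto"><span class="colColegiado">6478</span> EXAMPLE NAME ONE </p>
<p class="resalto"><span class="colColegiado">13107</span>EXAMPLE NAME TWO</p>
<p class="resalto">row without a number cell</p>
"""


def extract_rows(html):
    """Return a list of (number, name) tuples, one per complete row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for p in soup.find_all("p", class_="resalto"):
        span = p.find("span", class_="colColegiado")
        if span is None:
            continue  # skip rows that have no number cell
        number = span.get_text(strip=True)
        # The name is the text that sits directly inside the <p>,
        # outside the span, so collect only the direct text nodes.
        name = "".join(
            str(child) for child in p.children
            if isinstance(child, NavigableString)
        ).strip()
        rows.append((number, name))
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

To try this against the real site, pass page.text from the request above to extract_rows instead of SAMPLE_HTML, and check the result against the page, since the layout assumed here is unverified.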
