Beautiful Soup returns the same output over and over

Problem description

I'm new to web scraping. I want the scraper to return every paragraph containing the keyword "neuro", but when I run the code it seems to return the same output on every iteration. Can you point out my mistake?

import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re 

from time import sleep
from random import randint

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info =[]  
page_number = np.arange(1,1219)
soup = BeautifulSoup(results.text, "html.parser")

for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string =re.compile('neuro'))

sleep(randint(2,10))

masters = pd.DataFrame({
    'info': div})
masters.to_csv('masters.csv')

But the only output I get is:

<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
....

Tags: python, web, web-scraping, beautifulsoup

Solution


Here is your problem: the BeautifulSoup object is built from results.text, and results always comes from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900". Inside the loop you request each page but never re-parse the response, so soup.find() keeps searching that same first page on every iteration.

So change the code as follows.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

from time import sleep
from random import randint

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
info = []
page_number = np.arange(1, 1219)

for page in page_number:
    # Fetch each page and parse *that* response, so soup changes on every iteration
    results = requests.get(url + "&PG=" + str(page))
    soup = BeautifulSoup(results.text, "html.parser")

    # Keep the first <p> on this page whose text matches "neuro"
    div = soup.find("p", string=re.compile('neuro'))
    if div is not None:
        info.append(div.get_text(strip=True))

    # Random pause between requests so pages are not fetched back to back
    sleep(randint(2, 10))

masters = pd.DataFrame({
    'info': info})
masters.to_csv('masters.csv')
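
If you want every matching paragraph rather than only the first one per page, a sketch along the lines below may help. It assumes the site keeps the ?40w900&PG=n paging scheme; the scrape_paragraphs name, the 30-second timeout, and the two-page test run are illustrative choices, not part of the original answer. It also filters with get_text() instead of string=re.compile(...), since the string filter only matches tags whose content is a single string and silently skips paragraphs with nested tags.

import re
from random import randint
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"

def scrape_paragraphs(keyword="neuro", last_page=1218):
    """Collect the text of every <p> mentioning `keyword` across the paginated listing."""
    session = requests.Session()      # reuse one connection for all page requests
    collected = []

    for page in range(1, last_page + 1):
        # requests appends PG=<page> to the existing ?40w900 query string
        response = session.get(BASE_URL, params={"PG": page}, timeout=30)
        response.raise_for_status()   # fail fast on HTTP errors instead of parsing error pages

        soup = BeautifulSoup(response.text, "html.parser")

        # get_text() also catches paragraphs that contain nested tags
        for p in soup.find_all("p"):
            text = p.get_text(strip=True)
            if re.search(keyword, text, re.IGNORECASE):
                collected.append(text)

        sleep(randint(2, 10))         # polite delay between requests

    return collected

if __name__ == "__main__":
    paragraphs = scrape_paragraphs(last_page=2)   # small page count just for a quick test
    pd.DataFrame({"info": paragraphs}).to_csv("masters.csv", index=False)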
