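python - Beautiful Soup returns the same output over and over
Question
I'm new to web scraping. I want the scraper to return every paragraph that contains the keyword "neuro", but when I run the code it seems to return the same output on every iteration. Can you point out my mistake?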
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
from time import sleep
from random import randint
url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
results = requests.get(url)
info =[]
page_number = np.arange(1,1219)
soup = BeautifulSoup(results.text, "html.parser")
for page in page_number:
    page = requests.get("https://www.findamasters.com/masters-degrees/united-kingdom/?40w900&PG=" + str(page))
    div = soup.find("p", string =re.compile('neuro'))
    sleep(randint(2,10))
masters = pd.DataFrame({
'info': div})
masters.to_csv('masters.csv')
But the only output I get is:
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
<p>It’s our mission to prolong and improve the lives of patients, and we seek to do this by conducting world-leading research in areas such as neuroscience, oncology, infectious diseases and more.</p>
....
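Answer
Here is your problem: BeautifulSoup is given results.text as its argument, and results comes from the fixed URL "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900". The soup is built once, before the loop, so every iteration searches the same parsed page; the response fetched inside the loop is never parsed.
So fetch and parse each page inside the loop, as follows.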
import re
from random import randint
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.findamasters.com/masters-degrees/united-kingdom/?40w900"
info = []
for page in range(1, 1219):
    # Fetch and parse the current page inside the loop,
    # so that soup always reflects the page just downloaded.
    results = requests.get(url + "&PG=" + str(page))
    soup = BeautifulSoup(results.text, "html.parser")
    # find() returns only the first match; find_all() returns every matching <p>.
    for p in soup.find_all("p", string=re.compile("neuro")):
        info.append(p.get_text())
    # Pause between requests to avoid hammering the server.
    sleep(randint(2, 10))
# Build the table from the accumulated list, one row per matching paragraph.
masters = pd.DataFrame({"info": info})
masters.to_csv("masters.csv")
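To see the difference without contacting the site, here is a minimal offline sketch. It assumes a hypothetical FAKE_PAGES dictionary and fetch helper standing in for the real HTTP requests; neither is part of the original code. It contrasts parsing once before the loop with parsing inside the loop.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the paginated site: page number -> HTML.
FAKE_PAGES = {
    1: "<p>MSc in neuroscience</p><p>MSc in history</p>",
    2: "<p>MSc in law</p>",
    3: "<p>Clinical neuropsychology</p><p>Applied neuroimaging</p>",
}


def fetch(page):
    """Pretend to download one results page (no network access)."""
    return FAKE_PAGES[page]


def scrape_stale(pages):
    """Buggy pattern: parse once before the loop, then reuse the same soup."""
    soup = BeautifulSoup(fetch(pages[0]), "html.parser")
    found = []
    for page in pages:
        fetch(page)  # downloaded, but never parsed
        match = soup.find("p", string=re.compile("neuro"))
        found.append(match.get_text() if match else None)
    return found


def scrape_fresh(pages):
    """Fixed pattern: parse each page inside the loop and keep every match."""
    found = []
    for page in pages:
        soup = BeautifulSoup(fetch(page), "html.parser")
        for p in soup.find_all("p", string=re.compile("neuro")):
            found.append(p.get_text())
    return found


pages = sorted(FAKE_PAGES)
stale = scrape_stale(pages)
fresh = scrape_fresh(pages)
print(stale)  # the same paragraph repeated once per page
print(fresh)  # each matching paragraph from each page, once
```

The stale version reproduces the symptom from the question: one paragraph repeated for every page. The fresh version returns each match from each page exactly once.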