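python - Exception has occurred: TypeError in Python
Problem description
I'm very new to coding, so apologies if this is a silly question. Every time I try to run this code for a Python scraper, I get an error message. Any help would be great.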
Exception has occurred: TypeError
'module' object is not callable
File "C:\Users\quawee\OneDrive\seaporn.org-scraper\seaporn.org-scraper.py", line 33, in <module>
articles = requests(x)
From this code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
articlelist = []

def request(x):
    url = f'https://www.seaporn.org/category/hevc/page/{x}/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, features='lxml')
    return soup.find_all('article', class_ = 'post-summary')

def parse(articles):
    for item in articles:
        link = item.find({'a': 'entry-link'})
        article = {
            'link': link['href']
        }
        articlelist.append(article)

def output():
    df = pd.DataFrame(articlelist)
    df.to_excel('articlelist.xlsx', index=False)
    print('Saved to xlsx.')

x = 5000
while True:
    print(f'Page {x}')
    articles = requests(x)
    x = x + 1
    time.sleep(3)
    if len(articles) != 0:
        parse(articles)
    else:
        break
print('Completed, total articles is', len(articlelist))
output()
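Solution
The function you defined is named request(x), but inside the while loop you call requests(x). The name requests refers to the imported module, and a module cannot be called like a function, which is why Python raises TypeError: 'module' object is not callable. This should work; I only corrected the spelling: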
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
articlelist = []

# Fetch one listing page and return its 'post-summary' article elements
def request(x):
    url = f'https://www.seaporn.org/category/hevc/page/{x}/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, features='lxml')
    return soup.find_all('article', class_ = 'post-summary')

# Collect each article's link URL into articlelist
def parse(articles):
    for item in articles:
        link = item.find({'a': 'entry-link'})
        article = {
            'link': link['href']
        }
        articlelist.append(article)

# Write the collected links to an Excel file
def output():
    df = pd.DataFrame(articlelist)
    df.to_excel('articlelist.xlsx', index=False)
    print('Saved to xlsx.')

x = 5000
# Walk through pages until one comes back with no articles
while True:
    print(f'Page {x}')
    articles = request(x)  # fixed: call request(), not the requests module
    x = x + 1
    time.sleep(3)
    if len(articles) != 0:
        parse(articles)
    else:
        break
print('Completed, total articles is', len(articlelist))
output()
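To see the error in isolation, here is a minimal sketch that reproduces it without any network access. It uses the standard-library math module as a stand-in for requests (an assumption made only so the example has no third-party dependencies); describe_call and sqrt_of_four are hypothetical helper names that do not appear in the original code.

```python
import math


def describe_call(obj):
    """Call obj with no arguments and report whether the call worked."""
    try:
        obj()
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "call succeeded"


def sqrt_of_four():
    # An ordinary function that uses the module the intended way
    return math.sqrt(4)


# Calling the module object itself fails, just like requests(x) did
module_result = describe_call(math)
# Calling a function that uses the module works, just like request(x) does
function_result = describe_call(sqrt_of_four)

print(module_result)
print(function_result)
```

The same thing happens whenever a name bound by an import statement is followed by parentheses, so a quick check when this error appears is to compare the called name against the names in the import lines.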