Scraping data from a web page using Python requests

Problem description

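I am trying to scrape a domain search page (where you enter a keyword and get some random results). In the browser's Network tab I found this API URL, https://api.leandomainsearch.com/search?query=computer&count=all (for the keyword "computer"), but I get this error: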

{'error': True, 'message': 'Invalid API Credentials'}

Here is the code:

import requests

r = requests.get("https://api.leandomainsearch.com/search?query=computer&count=all")
print(r.json())

Tags: python, python-3.x, web-scraping, python-requests

Solution


The site requires you to set the Authorization and Referer HTTP headers.

For example:

import re
import json
import requests


kw = 'computer'

url = 'https://leandomainsearch.com/search/'
api_url = 'https://api.leandomainsearch.com/search'

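# Fetch the search page and pull the API key out of its HTML
api_key = re.search(r'"apiKey":"(.*?)"', requests.get(url, params={'q': kw}).text)[1]
# Send the key in the Authorization header, with a Referer pointing at the search page
headers = {'Authorization': 'Key ' + api_key, 'Referer': 'https://leandomainsearch.com/search/?q={}'.format(kw)}
# Query the API for the same keyword
data = requests.get(api_url, params={'query': kw, 'count': 'all'}, headers=headers).json()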

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data['domains']:
    print(d['name'])

print()
print('Total:', data['_meta']['total_records'])

Prints:

...

blackopscomputer.com
allegiancecomputer.com
northpolecomputer.com
monumentalcomputer.com
fissioncomputer.com
hedgehogcomputer.com
blackwellcomputer.com
reflectionscomputer.com
towerscomputer.com
offgridcomputer.com
redefinecomputer.com
quantumleapcomputer.com

Total: 1727
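
If the page layout changes, the regular expression in the answer may stop matching, and indexing the result with [1] would then fail with a TypeError on None. The following is a minimal sketch of a more defensive variant of the key extraction and header construction, not the site's documented API. The helper names extract_api_key and build_headers, and the sample HTML fragment with its placeholder key, are assumptions made for illustration. It runs offline so the parsing logic can be checked without calling the site.

```python
import re
from typing import Dict

# Same pattern as in the answer above: the key appears as "apiKey":"..." in the page HTML
API_KEY_PATTERN = re.compile(r'"apiKey":"(.*?)"')


def extract_api_key(html: str) -> str:
    """Return the API key found in the page HTML, or raise ValueError if absent."""
    match = API_KEY_PATTERN.search(html)
    if match is None:
        raise ValueError("apiKey not found; the page layout may have changed")
    return match.group(1)


def build_headers(api_key: str, keyword: str) -> Dict[str, str]:
    """Build the Authorization and Referer headers used in the answer above."""
    return {
        "Authorization": "Key " + api_key,
        "Referer": "https://leandomainsearch.com/search/?q={}".format(keyword),
    }


# Offline check with a made-up HTML fragment (the key is a placeholder, not a real one)
sample_html = '<script>window.config = {"apiKey":"abc123","other":1}</script>'
key = extract_api_key(sample_html)
headers = build_headers(key, "computer")
print(headers)
```

In a real run, the HTML passed to extract_api_key would be the text of the search page response, and the resulting headers would be passed to requests.get as in the answer.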
