urllib.request returns empty data, while the same request in Postman returns correct data

Problem description

My URL:

https://www.grants.gov/grantsws/rest/opportunities/search/

Request payload:

payload = {"startRecordNum": 0,
           "sortBy": "openDate|desc",
           "oppStatuses": "forecasted|posted"}

Request headers:

headers = {'Accept': 'application/json, text/javascript, */*; q=0.01',
           'Content-Type': 'application/json; charset=UTF-8',
           'Origin': 'https://www.grants.gov',
           'Accept-Language': 'en-US,en;q=0.9,fa-AF;q=0.8,fa;q=0.7,ru;q=0.6'}

My Python code:

import json
import urllib.parse
import urllib.request

req = urllib.request.Request('https://www.grants.gov/grantsws/rest/opportunities/search/')
req.add_header('Accept', 'application/json, text/javascript, */*; q=0.01')
req.add_header('Content-Type', 'application/json; charset=UTF-8')
req.add_header('Origin', 'https://www.grants.gov')
req.add_header('Accept-Language', 'en-US,en;q=0.9,fa-AF;q=0.8,fa;q=0.7,ru;q=0.6')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36')
payload = {"startRecordNum": 0,
           "sortBy": "openDate|desc",
           "oppStatuses": "forecasted|posted"}
# urlencode turns the payload into form data (key=value&key=value), not JSON
data = urllib.parse.urlencode(payload).encode()

# Passing data makes urlopen send a POST request
with urllib.request.urlopen(req, data) as response:
    print(type(response))
    dataList = json.load(response)
    print(dataList)

My result:

{'hitCount': 0, 'startRecord': 0, 'oppHits': [], 'oppStatusOptions': [], 'dateRangeOptions': [], 'suggestion': '', 'eligibilities': [], 'fundingCategories': [], 'fundingInstruments': [], 'agencies': [], 'accessKey': '', 'errorMsgs': []}

I expect these keys to contain data, because the same POST request sent from Postman returns the correct results.

What should I do to get the correct output?
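One way to narrow the problem down is to look at the body the code above actually sends, which needs no network access. The sketch below compares the output of urllib.parse.urlencode with json.dumps for the same payload. It assumes, without confirmation from the post, that Postman was sending the payload as a raw JSON body.

```python
import json
import urllib.parse

payload = {
    "startRecordNum": 0,
    "sortBy": "openDate|desc",
    "oppStatuses": "forecasted|posted",
}

# What the question's code sends: form-encoded data ('|' is escaped as %7C)
form_body = urllib.parse.urlencode(payload).encode()
# What the Content-Type header promises (assumed to match Postman's raw body): JSON
json_body = json.dumps(payload).encode()

print(form_body)  # b'startRecordNum=0&sortBy=openDate%7Cdesc&oppStatuses=forecasted%7Cposted'
print(json_body)  # b'{"startRecordNum": 0, "sortBy": "openDate|desc", "oppStatuses": "forecasted|posted"}'

# The form-encoded body is not valid JSON, so a JSON endpoint cannot parse it
try:
    json.loads(form_body)
    form_is_json = True
except json.JSONDecodeError:
    form_is_json = False
print(form_is_json)  # False
```

The mismatch between the declared Content-Type and the actual body is a plausible explanation for the empty result, since the server would then see no search parameters.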

Tags: python, web-scraping, python-requests, urllib

Solution


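You need to encode the payload as JSON with json.dumps(payload). urllib.parse.urlencode produces form data (startRecordNum=0&sortBy=openDate%7Cdesc&...), which does not match the Content-Type: application/json header, so the server most likely cannot read the search parameters and returns an empty result.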

import json
import urllib.request

req = urllib.request.Request('https://www.grants.gov/grantsws/rest/opportunities/search/')
# Tell the server that the request body is JSON
req.add_header('Content-Type', 'application/json; charset=UTF-8')
payload = {
    "startRecordNum": 0,
    "sortBy": "openDate|desc",
    "oppStatuses": "forecasted|posted"
}
# Serialize the payload as JSON (not form data) and encode it to bytes
data = json.dumps(payload).encode()

# Supplying data makes urlopen send a POST request
with urllib.request.urlopen(req, data) as response:
    dataList = json.load(response)
    print(dataList)
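If the search is run repeatedly, the same fix can be wrapped in a small helper. This is a sketch, not part of the original answer: the function names build_json_request and post_json are hypothetical, and SEARCH_URL is the endpoint from the question, which may have changed or been retired since. Building the request is kept separate from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

# Endpoint taken from the question; it may no longer be live
SEARCH_URL = "https://www.grants.gov/grantsws/rest/opportunities/search/"


def build_json_request(url, payload, extra_headers=None):
    """Hypothetical helper: build a POST request whose body is JSON-encoded payload."""
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        "Accept": "application/json",
    }
    if extra_headers:
        headers.update(extra_headers)
    body = json.dumps(payload).encode("utf-8")
    # method="POST" makes the verb explicit; urllib would also choose POST because data is set
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_json(url, payload, extra_headers=None, timeout=30):
    """Hypothetical helper: send the request and decode the JSON response body."""
    req = build_json_request(url, payload, extra_headers)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.load(response)
```

Calling post_json(SEARCH_URL, payload) with the payload from the question sends the request and returns the decoded dictionary; extra headers such as Origin or User-Agent can be passed through extra_headers if the server turns out to require them.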
