Python aiohttp - Log in to a page and fetch its content

Problem description

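For several days I have been trying to use aiohttp to log in to a website and then navigate to the admin area to fetch its content. After posting the login data through the session, I am not sure how to get the content of the admin page. I also tried grabbing the cookies from the session, but I am not sure what to do with them afterwards. That part of the code is commented out because it is not my preferred approach.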

import asyncio

import aiohttp

async def do_task(session, credentials):
    try:
     async with session.get(credentials['domain']) as r:
         url = r.url #follow redirect to login page
         login_data = {"log": credentials['username'], "pwd": credentials['password']}

         # Please help with below
         await session.post(url, json=login_data)
         return await r.get(f'{url}admin').text()

        # #Attempt with cookies
        #  async with session.post(url, json=login_data) as login:
        #      session.cookie_jar.update_cookies(login.cookies)
        #      return await login.get(f'{url}admin').text()

    except Exception as e:
     print(e)

async def tasks(session, dict_list):
    tasks = []
    for credentials in dict_list:
        task = asyncio.create_task(do_task(session, credentials))
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results

async def main(x):
    async with aiohttp.ClientSession() as session:
        data = await tasks(session, x)
        return data

if __name__ == '__main__':
    dict_list = ({
        "username": 'test',
        "domain": 'http://url.com/admin',
        "password": 'enter'
    },
    )

    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) #for windows
    results = asyncio.run(main(dict_list))

The error message I get is: 'ClientResponse' object has no attribute 'get'

I previously did the same thing with the requests library using the code below, but I am trying to speed it up with aiohttp.

            with requests.Session() as login_request:
                login_data = {"log": x['username'], "pwd": x['password']
                              }
                login_request.post(url, data=login_data)
                source_code = login_request.get(url).content

Tags: python, aiohttp

Solution

The main problem is this line:

r.get(f'{url}admin').text()

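In your case r is a ClientResponse, not a ClientSession: it is the result of async with session.get(credentials['domain']). A response object has no get method, which is why the error is raised; further requests have to be made through session.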
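
A minimal sketch of that fix follows. It assumes the site accepts form-encoded log and pwd fields (as in the requests version from the question, which used data= rather than json=). The names fetch_admin_page, login_url and admin_url are hypothetical and not part of the original code. Both requests go through session, so cookies set by the login response are kept in the session's cookie jar and sent with the GET that follows.

```python
import asyncio


async def fetch_admin_page(session, login_url, admin_url, username, password):
    """Log in, then fetch the admin page through the same session."""
    login_data = {"log": username, "pwd": password}
    # POST the form through the session (not through a response object);
    # the session's cookie jar keeps any cookies the server sets here.
    async with session.post(login_url, data=login_data) as login_resp:
        login_resp.raise_for_status()
    # get() is also a method of the session; stored cookies are sent along.
    async with session.get(admin_url) as admin_resp:
        admin_resp.raise_for_status()
        return await admin_resp.text()


async def main(login_url, admin_url, username, password):
    import aiohttp  # third-party; only needed here to create the real session

    async with aiohttp.ClientSession() as session:
        return await fetch_admin_page(
            session, login_url, admin_url, username, password
        )
```

It could be run with asyncio.run(main(login_url, admin_url, 'test', 'enter')), with the two URLs replaced by the real login and admin addresses of the site.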

You can also collect the cookies asynchronously and then use them for asynchronous scraping. It should look something like this:

import asyncio
from http.cookies import SimpleCookie

async def login(session, domain: str, login_data: dict):
    async with session.post(domain + '/admin', data=login_data) as resp:
        # How a successful login is detected depends on the site, e.g.:
        # data = await resp.json()  # or: data = await resp.text()
        # if data ...: handle a failed login here
        return [domain, resp.cookies]

async def process_page(session, url: str, cookie: SimpleCookie):
    # Per-request cookies are passed with cookies=; cookie_jar is a ClientSession argument.
    async with session.get(url, cookies=cookie) as resp:
        content = await resp.text()
        # do something + return...

# The snippets below must run inside a coroutine, with session an open aiohttp.ClientSession.
# example: logins = [{'domain': '...', 'login_data': {...}}, ...]
# get cookies for all users and domains
cookies = await asyncio.gather(*[
    login(session, l['domain'], l['login_data'])
    for l in logins
])

# processing '/page1' for all domains and users
result = await asyncio.gather(*[
    process_page(session, c[0] + '/page1', c[1])
    for c in cookies
])
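
The two gather calls above use await at top level, so they need a surrounding coroutine and an open session. The sketch below shows one way to wire them together. The scrape and main helpers are assumptions added for illustration, and the sketch further assumes the site sets its session cookie on the response to the POST itself (cookies set on an intermediate redirect would not show up in resp.cookies). Because each request carries its own cookies, the session is created with aiohttp.DummyCookieJar, which stores nothing, so several accounts on the same domain cannot overwrite each other's cookies in a shared jar.

```python
import asyncio


async def login(session, domain, login_data):
    """Log in one account and return (domain, cookies from the response)."""
    async with session.post(domain + "/admin", data=login_data) as resp:
        return domain, resp.cookies


async def process_page(session, url, cookies):
    """Fetch one page, sending that account's cookies explicitly."""
    async with session.get(url, cookies=cookies) as resp:
        return await resp.text()


async def scrape(session, logins, path="/page1"):
    # Log in to every account concurrently...
    logged_in = await asyncio.gather(
        *(login(session, entry["domain"], entry["login_data"]) for entry in logins)
    )
    # ...then fetch the same path for each account with its own cookies.
    return await asyncio.gather(
        *(process_page(session, domain + path, jar) for domain, jar in logged_in)
    )


async def main(logins):
    import aiohttp  # third-party; only needed here to create the real session

    # DummyCookieJar keeps no state, so only the explicit cookies= are sent.
    jar = aiohttp.DummyCookieJar()
    async with aiohttp.ClientSession(cookie_jar=jar) as session:
        return await scrape(session, logins)
```

With logins shaped as in the comment above, asyncio.run(main(logins)) returns the page texts in the same order as the logins list, since asyncio.gather preserves input order.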
