How to download a file after completing a login form on a redirected link

Problem description

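I want to download some .tgz files from a website using Python code. When I click a file link, it goes to another page that asks me to fill in a form (for login); after I fill in the form, it returns to the file link and the download starts. I tried requests with Python 3, but without success: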

My code:

import requests
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

payload={'username':'salvandi69@gmail.com','password':'123asdzxc'}

myurl="https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"
myurl2="https://eogauth.mines.edu/auth/realms/master/protocol/openid-connect/auth?response_type=code&scope=email%20openid&client_id=eogdata_oidc&state=VyIetf3UzkQbxOjX-jJ-ae5lMaM&redirect_uri=https%3A%2F%2Feogdata.mines.edu%2Feog%2Foauth2callback&nonce=DRL2KruY5oxbgo2G6HxNHX-CgiMoxfF6FdGOV-FK65o"

r = requests.post(myurl2, verify=False, data=payload, timeout=6)

print(r.text)

myurl is the file link and myurl2 is the login link it redirects to. Result:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" class="login-pf">

<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="robots" content="noindex, nofollow">

            <meta name="viewport" content="width=device-width,initial-scale=1"/>
    <title>Log in to Earth Observation Group Login</title>
    <link rel="icon" href="/auth/resources/afx5f/login/eog/img/favicon.ico" />
            <link href="/auth/resources/afx5f/common/keycloak/node_modules/patternfly/dist/css/patternfly.min.css" rel="stylesheet" />
            <link href="/auth/resources/afx5f/common/keycloak/node_modules/patternfly/dist/css/patternfly-additions.min.css" rel="stylesheet" />
            <link href="/auth/resources/afx5f/common/keycloak/lib/zocial/zocial.css" rel="stylesheet" />
            <link href="/auth/resources/afx5f/login/eog/css/login.css" rel="stylesheet" />
</head>

<body class="">
  <div class="login-pf-page">
    <div id="kc-header" class="login-pf-page-header">
      <div id="kc-header-wrapper" class=""><div class="kc-logo-text"><span>EOG</span></div></div>
    </div>
    <div class="card-pf ">
      <header class="login-pf-header">
            <div id="kc-locale">
                <div id="kc-locale-wrapper" class="">
                    <div class="kc-dropdown" id="kc-locale-dropdown">
                        <a href="#" id="kc-current-locale-link">English</a>
                        <ul>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=de">Deutsch</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=no">Norsk</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ru">Русский</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=sv">Svenska</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=pt-BR">Português (Brasil)</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=lt">Lietuvių</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=en">English</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=it">Italiano</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=fr">Français</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=zh-CN">中文简体</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=es">Español</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=cs">Čeština</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ja">日本語</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=sk">Slovenčina</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=pl">Polish</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ca">Català</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=nl">Nederlands</a></li>
                                <li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=tr">tr</a></li>
                        </ul>
                    </div>
                </div>
            </div>
                <h1 id="kc-page-title">        We are sorry...
</h1>
      </header>
      <div id="kc-content">
        <div id="kc-content-wrapper">


        <div id="kc-error-message">
            <p class="instruction">Invalid Request</p>
        </div>


        </div>
      </div>

    </div>
  </div>
</body>
</html>

Tags: python, python-requests

Solution


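The main problem: you send the POST to the URL of the page that shows the login form, but the form does not have to submit to that URL. You should check <form action=...> to get the correct URL for the POST.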
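
To illustrate the point about <form action=...>, here is a minimal sketch that uses only the standard library's html.parser instead of BeautifulSoup. The sample page URL, the form markup and the helper names (LoginFormParser, build_login_request) are made up for the example; the real login page may contain different fields. It also copies every <input> of the form into the payload, on the assumption that the server may expect hidden fields to be sent back:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action URL and the <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None   # value of <form action=...>
        self.fields = {}     # input name -> default value
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True   # ignore any later forms


def build_login_request(page_url, html, username, password):
    """Return (post_url, payload) for the first form found in `html`."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found in the page")
    payload = dict(parser.fields)
    payload.update({"username": username, "password": password})
    # urljoin leaves an absolute action as-is and resolves a relative one
    return urljoin(page_url, parser.action), payload


# Hypothetical login page, only for demonstration
SAMPLE_PAGE_URL = "https://auth.example.org/auth/realms/demo/protocol/openid-connect/auth?client_id=demo"
SAMPLE_HTML = """
<form id="kc-form-login" action="/auth/realms/demo/login-actions/authenticate?session_code=abc&amp;tab_id=xyz" method="post">
  <input id="username" name="username" type="text" value="" />
  <input id="password" name="password" type="password" />
  <input type="hidden" name="credentialId" />
</form>
"""

post_url, payload = build_login_request(
    SAMPLE_PAGE_URL, SAMPLE_HTML, "user@example.org", "secret"
)
print(post_url)
print(payload)
```

Note that the URL to POST to differs from the URL of the page itself, which is exactly why posting to the page URL fails.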

I use BeautifulSoup to search for it in the HTML.

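I don't have a working username and password to test everything, but at least now the POST gets a page with the login form and the message "Invalid username or password." instead of the page with "Invalid Request".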

import requests
from bs4 import BeautifulSoup as BS

s = requests.Session()
#s.headers.update({'User-Agent': 'Mozilla/5.0'})

# --- use tgz to get login page -------

url_tgz = "https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"

r = s.get(url_tgz)
#print(r.status_code)
#print(r.history)
print('\n--- url page ---\n')
print(r.url)

# --- find url in form ---

soup = BS(r.text, 'html.parser')
item = soup.find('form') 
url = item['action']

print('\n--- url form ---\n')
print(url)

print('\n--- url page == url in form ---\n')
print( r.url == url )

# --- login ---

payload = {
    'username': 'salvandi69@gmail.com',
    'password': '123asdzxc',
    'credentialId': '',
}

r = s.post(url, data=payload)
#print(r.status_code)
#print(r.history)
#print(r.url)
#print(r.text)

# --- result ---

print('\n--- login ---\n')
soup = BS(r.text, 'html.parser')
item = soup.find('span', {'class': 'kc-feedback-text'})
if item:
    print('Message:', item.text)
else:
    print("Can't see error message")

print('\n--- end ---\n')
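
The code above stops at the login step. Once the login succeeds, the file still has to be written to disk. I could not test this against the real site, so the following is only a sketch: save_download is a hypothetical helper, and the check on Content-Type assumes that a failed login returns an HTML page while a successful request returns the archive. FakeResponse is a stand-in so the example runs without network access; it mimics only the two members of a requests response that the helper uses (headers and iter_content).

```python
import os
import tempfile


def save_download(response, path, chunk_size=1 << 20):
    """Write a streamed response body to `path` and return the byte count.

    Refuses HTML responses, which most likely mean the login page
    was returned instead of the file.
    """
    content_type = response.headers.get("Content-Type", "")
    if "text/html" in content_type.lower():
        raise RuntimeError("got an HTML page instead of the file; login probably failed")
    written = 0
    with open(path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:   # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written


# Intended use with the session `s` and `url_tgz` from the code above (not run here):
#   with s.get(url_tgz, stream=True) as r:
#       r.raise_for_status()
#       save_download(r, "file.tgz")


class FakeResponse:
    """Offline stand-in for a requests response (demo only)."""

    def __init__(self, content_type, chunks):
        self.headers = {"Content-Type": content_type}
        self._chunks = chunks

    def iter_content(self, chunk_size=1):
        return iter(self._chunks)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.tgz")
size = save_download(FakeResponse("application/x-gzip", [b"abc", b"", b"de"]), demo_path)
print(size)
```

Because the same Session keeps the cookies set during login, a second s.get(url_tgz, stream=True) should reach the file directly. It is also possible that the response to the login POST, after following redirects, already is the file; which of the two happens on this site is something to check with r.history and r.url.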
