What are the Python 3 equivalents of Perl's UserAgent and HTTP libraries?

Problem description

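I am new to both Perl and Python. I have to convert some old functions written in Perl to Python, and I am struggling to find Python equivalents for calls such as HTML::Form->parse() and $self->{ua}->simple_request().

I have already used modules such as BeautifulSoup, which are very handy for parsing data out of HTML pages.

But I need to keep using the user agent object throughout the code, and I have not been able to find a good replacement for it in Python.

The initialization code in Perl looks like this: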

sub new {
    my ($class, %args) = @_;
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    my $self = { # default args:
#                 ip        => '10.10.10.10',
                port        => 443,
        transparent => 0,
#       logger      => 
        user_agent  => "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
#       user_agent  => "mybrowser",
        ssl_ver         => '23',
                %args,
               };

    unlink "cookies.txt";
    $self->{ua} = LWP::UserAgent->new(keep_alive => 10);
    $self->{ua}->agent($self->{user_agent});
    Net::SSL::send_useragent_to_proxy(1);
    $self->{ua}->timeout(90 * 1);
#   $self->{ua}->timeout(200 * 1);
    $ENV{'HTTPS_VERSION'} = $self->{ssl_ver};
    my $cookie_jar = HTTP::Cookies->new(
        file        => "cookies.txt",
        hide_cookie2    => 1,
#           autosave    => 1,
    );

    $self->{ua}->cookie_jar($cookie_jar);

    # Set proxy
    if (! $self->{transparent}) {
        my $proxy = 'http://' . $self->{ip} . ':' . $self->{port};  # don't add .'/' !
        $self->{logger}->Log("Set UA proxy: $proxy", 4);
        $self->{ua}->proxy('http', $proxy);
        $self->{ua}->proxy('https', $proxy);
#       $ua->proxy('https', $proxy);    # break authentication
        $ENV{'HTTPS_PROXY'} = $proxy;
        $self->{logger}->Log("Set HTTPS proxy: $ENV{'HTTPS_PROXY'}", 4);
        $self->{proxy} = $proxy;
    }

=head
    my $context = new IO::Socket::SSL::SSL_Context(
          SSL_version => 'TLSv1',
          SSL_verify_mode => Net::SSLeay::VERIFY_NONE(),
          );
        IO::Socket::SSL::set_default_context($context);
=cut
    @LWP::Protocol::http::EXTRA_SOCK_OPTS = (LocalAddr => $self->{init}->{client_ip},
                        SSL_version => $self->{ssl_ver},
                        SSL_cipher_list => $self->{ssl_cipher});

        bless $self, $class or die "Can't bless $class: $!";
        return $self;
}

That covers the initialization part, but the main problem comes up with calls like the following:

my $form = HTML::Form->parse($res);
if (condition){
      $post = $form->make_request;
}
$res = $self->{ua}->simple_request($post);
$self->{ua}->no_proxy("10.x.x.x", "test.com", "10.x.x.x", "10.x.x.x", "10.x.x.x", "tests.com", "dummy.com");

...
$req->authorization_basic($login,$password);
$res = $self->{ua}->simple_request($req);


....

$req = $self->GetCommonRequest( $url );
        $req->authorization_basic($login,$password);
        $req->header(Content_Type => 'application/x-www-form-urlencoded',
            Accept => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding' => 'gzip, deflate',
            Host => $host);
...

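These are the places where methods of the {ua} object, such as simple_request and no_proxy, and request methods such as authorization_basic are used. I have not been able to find Python equivalents for them.

I would be grateful if someone could tell me the Python equivalents of these modules.

Thanks a lot in advance.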

Tags: python, python-3.x, code-conversion

Solution


Try something like this:

import json
import logging
import traceback
from http.client import BadStatusLine, IncompleteRead
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

log = logging.getLogger(__name__)

def fetch_json(url, access_token, data=None):
    # url -- the URL you're trying to access
    # access_token -- bearer token (self.auth[nickname]['access_token'] in the original snippet)
    # data -- some params you want to POST (None means GET)
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11',
        'Accept': 'application/json',
        'Authorization': 'Bearer %s' % access_token,
    }
    try:
        if data is None:    # GET method
            req = Request(url, None, headers)
        else:   # POST method
            headers['Content-Type'] = 'application/json'
            data = json.dumps(data).encode('utf-8')
            req = Request(url, data, headers)

        result = urlopen(req).read()
        print(result)
        return json.loads(result)

    except HTTPError as e:  # subclass of URLError, so it has to be caught first
        log.error('HTTP error: %s', e.code)
        result = e.read()   # the error response may still carry a JSON body
        print(result)
        return json.loads(result)
    except URLError as e:
        log.error('unable to reach a server: %s', e.reason)
    except BadStatusLine:
        log.error('Bad Status Line')
    except IncompleteRead as e:
        log.error('IncompleteRead: %s', e)
    except Exception as e:
        log.error('%s: %s', e, url)
        log.error(traceback.format_exc())
    return None  # reached only when one of the non-HTTP errors above was logged
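
For the LWP::UserAgent pieces in the question (agent string, cookie jar, proxy and no_proxy, disabled certificate checks, authorization_basic, simple_request), the Python 3 standard library is probably enough. Below is a rough sketch of one possible mapping, not a drop-in port. The names make_opener, basic_auth_request, simple_request and NoRedirect are made up for this example, and the assumption is that simple_request is wanted because it sends a single request without following redirects.

```python
import base64
import ssl
import urllib.error
import urllib.request
from http.cookiejar import LWPCookieJar


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, similar in spirit to LWP's simple_request()."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def make_opener(user_agent, proxy=None, verify_tls=True):
    """Return (opener, cookie_jar); hypothetical stand-in for the Perl constructor."""
    # LWPCookieJar reads/writes libwww-perl's Set-Cookie3 file format.
    cookie_jar = LWPCookieJar("cookies.txt")
    context = ssl.create_default_context()
    if not verify_tls:  # counterpart of PERL_LWP_SSL_VERIFY_HOSTNAME = 0
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    # An empty dict means "no proxy". Hosts listed in the no_proxy
    # environment variable bypass the proxy, much like $ua->no_proxy(...).
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies),
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPCookieProcessor(cookie_jar),
        NoRedirect(),
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar


def basic_auth_request(url, login, password, data=None, headers=None):
    """Build a Request with a Basic Authorization header (authorization_basic)."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=data, headers=headers or {})
    req.add_header("Authorization", "Basic " + token)
    return req


def simple_request(opener, req, timeout=90):
    """Send req once; return (status, headers, body) instead of raising on 3xx/4xx/5xx."""
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status, resp.headers, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.headers, err.read()
```

Usage would look like opener, jar = make_opener(user_agent, proxy="http://" + ip + ":" + str(port), verify_tls=False), then status, headers, body = simple_request(opener, basic_auth_request(url, login, password)). urllib opens a new connection for every request, so there is no counterpart to keep_alive here. If a third-party dependency is acceptable, requests.Session covers the same ground with headers, proxies, cookies, auth and allow_redirects=False.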

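For HTML::Form->parse($res) followed by $form->make_request, there is no direct standard-library counterpart, but a small parser gets close for simple forms. The sketch below is only an approximation under stated assumptions: it looks at the first form on the page, handles input elements only (no select or textarea), and the names FirstFormParser and form_to_request are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit
from urllib.request import Request

SKIPPED_TYPES = {"submit", "button", "image", "reset", "file"}


class FirstFormParser(HTMLParser):
    """Collect action, method and input values of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "GET"
        self.fields = []         # (name, value) pairs in document order
        self._state = "before"   # before -> inside -> done

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self._state == "before":
            self._state = "inside"
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._state == "inside" and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind in SKIPPED_TYPES:
                return
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return  # unchecked boxes are not submitted
            self.fields.append((attrs["name"], attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._state == "inside":
            self._state = "done"


def form_to_request(html, base_url, overrides=None):
    """Build a urllib Request that submits the first form found in html."""
    parser = FirstFormParser()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(overrides or {})   # values the caller wants to fill in
    url = urljoin(base_url, parser.action)
    query = urlencode(fields)
    if parser.method == "POST":
        return Request(
            url,
            data=query.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    # GET: the form data replaces the query string of the action URL
    return Request(urlsplit(url)._replace(query=query).geturl())
```

The returned Request can be sent with urllib.request.urlopen or with a custom opener. Since BeautifulSoup is already in use in the question, the same field collection could also be done with it instead of HTMLParser.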