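python - Special characters in the URL when fetching data with requests.get
Problem description
I am running into problems when fetching content from IRIs that contain special characters. I am using the requests module exclusively. Here are some of the URLs that cause problems: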
https://cwur.org/2018-19/King's-College-London.php
https://cwur.org/2018-19/University-of-Wisconsin–Madison.php
import requests
res = requests.get('https://cwur.org/2018-19/University-of-São-Paulo.php')
res.text
Solution
To get a 200 response, pass a User-Agent in the request headers.
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
res = requests.get('https://cwur.org/2018-19/University-of-São-Paulo.php', headers=headers)
print(res.status_code)
print("---" * 10)
print(res.text)
Output:
200
------------------------------
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="The Center for World University Rankings (CWUR) is a leading consulting organization and publisher of the largest academic ranking of global universities.">
<meta name="keywords" content="ranking, rankings, university, universities, college, colleges, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, world, top, best, global, Ranking universitario mundial, Classement mondial des universités , Weltweites Universitätsranking, Zentrum für weltweite Universitätsrankings , ××ר×× ×××× ××רס××××ת ××¢××××, ××ר×× ×××ר×× ×××× ××רס××××ת ××¢××××, ì¸ê³ ëíìì, ãä¸çã®å¤§å¦ããã, ä¸ç大å¸æåä¸å¿, ì¸ê³ëíëí¹ì¼í°,ä¸ç大å¦ã©ã³ãã³ã°ã»ã³ã¿ã¼, Ranking mundial universitário, РейÑинг ÑнивеÑÑиÑеÑов миÑа , ÑазÑабоÑки ÑейÑинга ÑнивеÑÑиÑеÑов миÑа, ÙرÙز ,تصÙÙ٠اÙجاÙعات اÙعاÙÙÙØ© ,تصÙÙÙ, اÙجاÙعات, جاÙعات, اÙعاÙÙ, تصÙÙ٠اÙجاÙعات, ÙرÙز تصÙÙ٠اÙجاÙعات اÙعاÙÙÙØ©, Ranking de universidades del mundo, subject, subjects, journal, journals, ranking by subjects, country ranking, country rankings">
<link rel="icon" type="image/png" href="../../favicon.png" />
<!-- Bootstrap core CSS -->
<link href="../../dist/css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link href="../../assets/css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="../../starter-template.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<style type="text/css">
/* CSS used here will be applied after bootstrap.css */
.navbar-custom {
color: #FFFFFF;
background-color: #222222;
border-color: #222222;
}
</style>
<title> University of São Paulo Ranking | CWUR World University Rankings 2018-2019</title>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="http://cwur.org"><img src="../images/logo_944_400.png" height="50"></a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="../about.php" style="color:white">About</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="color:white">World University Rankings <span class="caret"></span></a>
<ul class="dropdown-menu">
<li class="dropdown-header">World University Rankings</li>
<li><a href="../2020-21.php">2020-21</a></li>
<li><a href="../2019-20.php">2019-20</a></li>
<li><a href="../2018-19.php">2018-19</a></li>
<li><a href="../2017.php">2017</a></li>
<li><a href="../2016.php">2016</a></li>
<li><a href="../2015.php">2015</a></li>
<li><a href="../2014.php">2014</a></li>
<li><a href="../2013.php">2013</a></li>
<li><a href="../2012.php">2012</a></li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">University Rankings by Country</li>
<li><a href="../2018-19/country.php">2018-19</a></li>
<li><a href="../2017/country.php">2017</a></li>
<li><a href="../2016/country.php">2016</a></li>
<li><a href="../2015/country.php">2015</a></li>
<li><a href="../2014/country.php">2014</a></li>
<li role="separator" class="divider"></li>
<li><a href="../2017/subjects.php">Rankings by Subject</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false" style="color:white">Methodology <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../methodology/world-university-rankings.php">World University Rankings</a></li>
<li><a href="../methodology/subject-rankings.php">Subject Rankings</a></li>
</ul>
</li>
<li><a href="../media.php" style="color:white">Media</a></li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="page-header">
<h4> University of São Paulo Ranking - CWUR World University Rankings 2018-2019</h4>
<!-- Go to www.addthis.com/dashboard to customize your tools -->
<div class="addthis_toolbox addthis_default_style addthis_32x32_style"> <a class="addthis_button_preferred_1"></a> <a class="addthis_button_preferred_2"></a> <a class="addthis_button_preferred_3"></a> <a class="addthis_button_preferred_4"></a><a class="addthis_button_compact"></a></div> </div>
<div class="row">
<div class="col-md-8">
<table class="table table-bordered table-hover">
<tr><td><b>Institution Name</b></td><td>University of São Paulo </td></tr>
<tr><td><b>Native Name</b></td><td>Universidade de São Paulo </td></tr>
<tr><td><b>Location</b></td><td>Brazil</td></tr>
<tr><td><b>World Rank</b></td><td>77</td></tr>
<tr><td><b>National Rank</b></td><td>1</td></tr>
<tr><td><b>Quality of Education Rank</b></td><td>583</td></tr>
<tr><td><b>Alumni Employment Rank</b></td><td>256</td></tr>
<tr><td><b>Quality of Faculty Rank</b></td><td>109</td></tr>
<tr><td><b>Research Output Rank</b></td><td>4</td></tr>
<tr><td><b>Quality Publications Rank</b></td><td>60</td></tr>
<tr><td><b>Influence Rank</b></td><td>162</td></tr>
<tr><td><b>Citations Rank</b></td><td>139</td></tr>
<tr><td><b>Overall Score</b></td><td>82.6</td></tr>
<tr><td><b>Domain</b></td><td>usp.br</td></tr>
</table>
</div>
<div class="col-md-4">
<div class="table-responsive">
<table class="table table-bordered table-hover">
<tr><td><a href="http://cwur.org/2020-21.php">Top 2000 Universities (2020-21)</a></td></tr>
<tr><td><a href="http://cwur.org/2019-20.php">Top 2000 Universities (2019-20)</a></td></tr>
<tr><td><a href="http://cwur.org/2018-19.php">Top 1000 Universities (2018-19)</a></td></tr>
<tr><td><a href="http://cwur.org/2018-19/country.php">Ranking by Country (2018-2019)</a></td></tr>
<tr><td><a href="http://cwur.org/2017.php">Top 1000 Universities (2017)</a></td></tr>
<tr><td><a href="http://cwur.org/2017/country.php">Ranking by Country (2017)</a></td></tr>
<tr><td><a href="http://cwur.org/2017/subjects.php">Rankings by Subject</a></td></tr>
<tr><td><a href="http://cwur.org/2016.php">Top 1000 Universities (2016)</a></td></tr>
<tr><td><a href="http://cwur.org/2016/country.php">Ranking by Country (2016)</a></td></tr>
<tr><td><a href="http://cwur.org/2015.php">Top 1000 Universities (2015)</a></td></tr>
<tr><td><a href="http://cwur.org/2015/country.php">Ranking by Country (2015)</a></td></tr>
<tr><td><a href="http://cwur.org/2014.php">Top 1000 Universities (2014)</a></td></tr>
<tr><td><a href="http://cwur.org/2014/country.php">Ranking by Country (2014)</a></td></tr>
</table>
</div>
</div>
</div>
<p>Copyright © 2012-2020 Center for World University Rankings</p>
</div>
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="../../assets/js/vendor/jquery.min.js"><\/script>')</script>
<script src="../../dist/js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="../../assets/js/ie10-viewport-bug-workaround.js"></script>
<!-- Go to www.addthis.com/dashboard to customize your tools -->
<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-5316b43f5ee1fc57"></script>
</body>
</html>
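As a side note on the special characters themselves: requests normally percent-encodes non-ASCII characters in a URL on its own, which is consistent with the request above succeeding once the User-Agent was added. To inspect what such an IRI looks like in percent-encoded form, a minimal sketch using only the standard library follows. The helper name iri_to_uri is hypothetical (it is not part of requests), and the exact set of characters requests escapes may differ slightly from urllib.parse.quote.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the path and query of an IRI (hypothetical helper)."""
    parts = urlsplit(iri)
    # Keep "/" as the path separator and "%" so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# The three URLs from the question.
for iri in (
    "https://cwur.org/2018-19/King's-College-London.php",
    "https://cwur.org/2018-19/University-of-Wisconsin–Madison.php",
    "https://cwur.org/2018-19/University-of-São-Paulo.php",
):
    print(iri_to_uri(iri))
```

The encoded form can be passed to requests.get together with the same headers dictionary if you prefer to control the escaping yourself.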
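Update:
If the URL reaches you as mis-decoded text (the UTF-8 bytes of a character showing up as separate characters, e.g. "\xc3\xa3" instead of "ã"), you can convert it back to a proper string: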
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
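# The literal below is mojibake: the UTF-8 bytes of "ã" (0xC3 0xA3) appear as two separate characters.
url = "https://cwur.org/2018-19/University-of-S\xc3\xa3o-Paulo.php"
# Map each character back to its original byte, then decode the bytes as UTF-8 (the default for bytes.decode()).
new_url = url.encode("iso-8859-1").decode()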
res = requests.get(new_url, headers=headers)
print(res.status_code)
print("---" * 10)
print(res.text)
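The conversion above raises an exception when the URL is not mis-decoded in this way (for example, when it already contains a real "ã" or an en dash). A minimal sketch of a more defensive variant follows. The function name repair_mojibake is hypothetical, and the fallback rests on the assumption that a string which cannot be round-tripped was already correct.

```python
def repair_mojibake(url: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as ISO-8859-1."""
    try:
        return url.encode("iso-8859-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside ISO-8859-1, or the resulting
        # bytes are not valid UTF-8. Assume the URL was fine and return it as-is.
        return url


print(repair_mojibake("https://cwur.org/2018-19/University-of-S\xc3\xa3o-Paulo.php"))
```

This is a heuristic: a string that legitimately contains a sequence such as "Ã£" would also be converted, so it is best applied only to input known to be affected.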