首页 > 技术文章 > Python3 urlparse

itlqs 2016-11-11 19:11 原文

>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o   
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
           params='', query='', fragment='')
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
           params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='',
           query='', fragment='')
AttributeIndexValueValue if not present
scheme 0 URL scheme specifier scheme parameter
netloc 1 Network location part empty string
path 2 Hierarchical path empty string
params 3 Parameters for last path element empty string
query 4 Query component empty string
fragment 5 Fragment identifier empty string
username   User name None
password   Password None
hostname   Host name (lower case) None
port   Port number as integer, if present None

 

来源:https://docs.python.org/3/library/urllib.parse.html?highlight=urlparse#urllib.parse.urlparse

 

from urllib.parse import urljoin
>>> urljoin("http://www.asite.com/folder/currentpage.html", "anotherpage.html")
'http://www.asite.com/folder/anotherpage.html'
>>> urljoin("http://www.asite.com/folder/currentpage.html", "folder2/anotherpage.html")
'http://www.asite.com/folder/folder2/anotherpage.html'
>>> urljoin("http://www.asite.com/folder/currentpage.html", "/folder3/anotherpage.html")
'http://www.asite.com/folder3/anotherpage.html'
>>> urljoin("http://www.asite.com/folder/currentpage.html", "../finalpage.html")
'http://www.asite.com/finalpage.html'

推荐阅读