Extracting the host name, timestamp, HTTP request method, URI, and protocol

Problem description

I want to extract the host name, timestamp, HTTP request method, URI, and protocol from the log lines below:

unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085

I would like to do this with regular expressions. Please let me know how to go about it.

Tags: python, regex

Solution


I tried the patterns below:

timestamp - \[\d+/\D+/.*\]
host name - (\d+\.\d+\.\d+\.\d+)\s* |(.+)\.(com|info|biz|tv|net)
status code - "\s\d{3}

But I did not get the result I wanted. Python reports "expected string or bytes-like object".
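The message "expected string or bytes-like object" is the TypeError that the `re` functions raise when they are given something other than a `str` or `bytes` value. The post does not show the call that failed, so the following is only a sketch of one plausible cause: it assumes the log lines were held in a list (the `lines` variable and the `timestamp_re` pattern are illustrative, not taken from the original).

```python
import re

# Hypothetical input: the log lines held in a list rather than in one string.
lines = [
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
]

# Illustrative pattern: everything between the first "[" and the next "]".
timestamp_re = re.compile(r"\[([^\]]+)\]")

try:
    timestamp_re.findall(lines)  # a list is neither str nor bytes
    raised = False
except TypeError:
    raised = True

# Applying the pattern to each string in turn avoids the error.
timestamps = [timestamp_re.search(line).group(1) for line in lines]
print(raised, timestamps)
```

If the data really is a list, looping over it (or joining it with newlines first) is usually enough to make the error go away.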

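import re

# The host is the first token on a line: an IPv4 address or a domain name.
regex = r'^(\d+\.\d+\.\d+\.\d+|\S+\.(?:com|info|biz|tv|net))\s'

# A triple-quoted string, so the double quotes inside the log need no escaping.
sample_text = '''unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085'''

# re.findall returns strings or tuples, which have no .group() method;
# re.finditer yields match objects. re.MULTILINE makes ^ match at each line start.
hosts = [match.group(1) for match in re.finditer(regex, sample_text, re.MULTILINE)]
print(hosts)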
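To pull out all five requested fields at once, a single pattern with named groups is often easier to maintain than one pattern per field. The sketch below assumes every line follows the layout of the two samples (host, two placeholder fields, bracketed timestamp, quoted request, status, size). The names `LOG_RE`, `parse_line`, and `sample_lines` are illustrative and do not come from the original post.

```python
import re
from datetime import datetime

# One named group per field; layout assumed from the two sample lines.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"\s]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # %z accepts offsets such as -0400; %b assumes English month abbreviations.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


sample_lines = [
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
]

records = [parse_line(line) for line in sample_lines]
for record in records:
    print(record["host"], record["timestamp"].isoformat(),
          record["method"], record["uri"], record["protocol"])
```

Matching the host as the first whitespace-delimited token avoids having to list top-level domains. Lines whose request part lacks a protocol token do not match this pattern, so `parse_line` returns None for them and the caller should be prepared to skip or log those lines.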
