Trying to make a dictionary using re.VERBOSE

Problem description

I'm trying to write a pattern that turns a string like this:

string= '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'

into a dictionary that looks like this:

dic= {"host":"30.95.91.251", 
      "user_name":"larson8319", 
      "time":"21/Jun/2019:16:02:02 -0700",
      "request":"PUT /one-to-one/whiteboard HTTP/1.0"}

using this code:

pattern = '''
(?P<host>.*)
(-\ )
(?P<user_name>\w*)
(?P<time>\W.+)
(?P<request>\w+)
'''
for item in re.finditer(pattern, logdata, re.VERBOSE):
    print(item.groupdict())

But I can't get rid of the brackets around the time, and I can't get the request part to match correctly.

Tags: python, regex, verbose

Solution


Be more specific and use character classes ([...]):

(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w+)\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"

See a demo on regex101.com.
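
To tie the pattern back to the question, here is a minimal sketch that compiles it with re.VERBOSE and applies it with finditer. The name logdata comes from the question's loop and is assumed here to hold the log text (just the sample line in this sketch); the inline comments are additions for illustration. The pattern is written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# Sample line from the question; `logdata` is assumed to hold the log text.
logdata = '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'

# In verbose mode, whitespace outside character classes is ignored,
# so each named group can sit on its own line with a comment.
pattern = re.compile(r'''
    (?P<host>[\d.]+)[-\s]+        # IP address, then the dash separator
    (?P<user_name>\w+)\s+         # user name
    \[(?P<time>[^][]+)\]\s+       # timestamp, brackets left outside the group
    "(?P<request>[^"]+)"          # request line, quotes left outside the group
''', re.VERBOSE)

# One dictionary per matched log entry.
entries = [match.groupdict() for match in pattern.finditer(logdata)]
print(entries[0])
```

Because the square brackets and the double quotes sit outside the named groups, they are matched but not captured, which is what removes them from the resulting dictionary.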


Alternatively, use a full parser:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

string = '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'


class LogVisitor(NodeVisitor):
    grammar = Grammar(
        r"""
        entry   = ip user time request rest
        ip      = ~"[\d.]+" junk
        user    = ~"\w+" junk
        time    = "[" ~"[^][]+" "]" junk
        request = '"' ~"[^\"]+" '"'
        junk    = ~"[-\s]*"
        rest    = ~".*"
        """
    )

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_entry(self, node, visited_children):
        ip, user, time, request, *_ = visited_children
        return dict([ip, user, time, request])

    def __clean__(self, node, visited_children, first=False):
        if first:
            _, what, *_ = visited_children
        else:
            what, *_ = visited_children
        return what.text

    def visit_ip(self, node, visited_children):
        return ('ip', self.__clean__(node, visited_children))

    def visit_user(self, node, visited_children):
        return ('user', self.__clean__(node, visited_children))

    def visit_time(self, node, visited_children):
        return ('time', self.__clean__(node, visited_children, True))

    def visit_request(self, node, visited_children):
        return ('request', self.__clean__(node, visited_children, True))


lv = LogVisitor()
result = lv.parse(string)
print(result)

This yields

{'ip': '30.95.91.251', 'user': 'larson8319', 'time': '21/Jun/2019:16:02:02 -0700', 'request': 'PUT /one-to-one/whiteboard HTTP/1.0'}
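
Note that the parser's output uses the keys ip and user, while the question asked for host and user_name. One possible way to bridge that, sketched below, is to rename the keys after parsing; the key_map name is hypothetical and not part of the original answer, and result is copied from the output above so the snippet runs on its own.

```python
# Output of the parser, copied from the printed result above.
result = {
    'ip': '30.95.91.251',
    'user': 'larson8319',
    'time': '21/Jun/2019:16:02:02 -0700',
    'request': 'PUT /one-to-one/whiteboard HTTP/1.0',
}

# Hypothetical mapping from the parser's keys to the keys the question asked for.
key_map = {'ip': 'host', 'user': 'user_name'}

# Keys without an entry in key_map are kept unchanged.
dic = {key_map.get(key, key): value for key, value in result.items()}
print(dic)
```

Changing the tuple labels returned by visit_ip and visit_user would achieve the same thing directly inside the visitor.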
