python - re ignores some lines
Problem description
I am trying to convert my data into a list of dictionaries, for example:
example_dict = {"host":"146.204.224.152",
"user_name":"feest6811", # note: sometimes the user name is missing! In this case, use '-' as the value for the username.
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"} #note: not everything is a POST
My data:
86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
76.72.133.93 - carroll1056 [21/Jun/2019:15:46:05 -0700] "POST /morph/optimize/plug-and-play HTTP/2.0" 400 27172
73.162.151.229 - dubuque3528 [21/Jun/2019:15:46:08 -0700] "DELETE /transition/holistic/e-business HTTP/2.0" 301 13923
13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149
219.194.113.255 - - [21/Jun/2019:15:46:12 -0700] "PATCH /next-generation/niches/mindshare HTTP/1.0" 503 20246
59.101.239.174 - brekke3293 [21/Jun/2019:15:46:13 -0700] "DELETE /ubiquitous/seize/web-enabled HTTP/2.0" 302 14017
My code:
import re

# logdata holds the log text shown above as a single string
pattern = r"""
(?P<host>.*) # user host
(-\ ) # separator
(?P<user_name>\w*) # user name
(\ \[) # space and opening bracket
(?P<time>\S*\ -0700) # time
(\]\ ) # closing bracket and space
(?P<request>.*")
"""
for user in re.finditer(pattern, logdata, re.VERBOSE):
    print(user.groupdict())
Output:
{'host': '86.187.99.249 ', 'user_name': 'tillman6650', 'time': '21/Jun/2019:15:46:03 -0700', 'request': '"POST /efficient/unleash HTTP/1.1"'}
{'host': '76.72.133.93 ', 'user_name': 'carroll1056', 'time': '21/Jun/2019:15:46:05 -0700', 'request': '"POST /morph/optimize/plug-and-play HTTP/2.0"'}
{'host': '73.162.151.229 ', 'user_name': 'dubuque3528', 'time': '21/Jun/2019:15:46:08 -0700', 'request': '"DELETE /transition/holistic/e-business HTTP/2.0"'}
{'host': '13.112.8.80 ', 'user_name': 'rau5026', 'time': '21/Jun/2019:15:46:09 -0700', 'request': '"HEAD /ubiquitous/transparent HTTP/1.1"'}
{'host': '136.195.158.6 ', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': '"HEAD /open-source/markets HTTP/2.0"'}
{'host': '59.101.239.174 ', 'user_name': 'brekke3293', 'time': '21/Jun/2019:15:46:13 -0700', 'request': '"DELETE /ubiquitous/seize/web-enabled HTTP/2.0"'}
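In the given data, some user names are "-", and my code simply skips those lines. I need to include those lines as well, using "-" as the value for the user name.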
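Solution
You can change the current user_name regex to
(?P<user_name>[\w\-]*)
Inside a character class, the - symbol has a special meaning: it denotes a range (for example, 0-9 matches any digit from 0 to 9). To match it literally, escape it with \. A hyphen placed last in the class, as it is here, is also treated literally without the escape, so the backslash is harmless but optional.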
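To see the change in context, here is a minimal sketch that applies the modified pattern to two of the sample lines from the question. It assumes logdata is a single multi-line string; the records list is a name introduced here only for illustration.

```python
import re

# Two sample lines copied from the question; logdata is assumed
# to be the whole log as one string.
logdata = """\
86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
"""

pattern = r"""
(?P<host>.*)             # user host
(-\ )                    # separator
(?P<user_name>[\w\-]*)   # user name, or a literal hyphen when it is missing
(\ \[)                   # space and opening bracket
(?P<time>\S*\ -0700)     # time
(\]\ )                   # closing bracket and space
(?P<request>.*")         # request, including the surrounding quotes
"""

# records is a hypothetical name for the resulting list of dictionaries.
records = [m.groupdict() for m in re.finditer(pattern, logdata, re.VERBOSE)]
for record in records:
    print(record)
```

With this pattern the second line is no longer skipped, and its user_name comes out as '-'. The host group still keeps its trailing space, as in the question's output.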