How do I read a log file with pandas?

Problem description

Some of the entries in my log file look like this:

1. IP428702 - - [02/Sep/2017:18:44:27 +0200] "GET /?ln=de HTTP/1.1" 200 4858 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" 122026 0 NOSSL

2. 22354 - - [01/Sep/2017:07:12:06 +0200] "GET / HTTP/1.1" 200 18359 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1" 131909 0 NOSSL

3. IP428702 - - [02/Sep/2017:18:42:14 +0200] "GET /search?ln=en&sc=1&p=1&action_search=1 HTTP/1.1" 200 9490 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\"'`--" 2155371 2 NOSSL

4. IP428702 - - [02/Sep/2017:18:42:43 +0200] "GET /search?ln=en&sc=1&p=&action_search= HTTP/1.1" 200 9796 "http://doc.rero.ch/search?l...\"'`--" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" 5776261 5 NOSSL

5. IP173839 - - [02/Sep/2017:12:09:55 +0200] "GET /server/document/get_indexing?page_nr=16&from=&to=&url=http://doc.rero.ch/record/1... HTTP/1.1" 200 131113 "http://doc.rero.ch/client/fr//" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112

6. IP423766 - - [01/Sep/2017:14:30:25 +0200] "GET /record/11876/files/bulletin_vals_asla_2007_085.pdf?version=1'\" HTTP/1.1" 200 6847339 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; iebar; acc=none; SV1; snprtz|S04087544802137; .NET CLR 1.1.4322)" 241381 0 NOSSL
IP427 - - [01/Sep/2017:14:30:25 +0200] "GET /record/258826/export/xd?ln=en HTTP/1.1" 200 441 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search..." 114963 0 NOSSL

The code I use to read the log entries is:

    import pandas as pd

    data = pd.read_csv(
        'path_to logfile',  # placeholder path to the log file
        sep=r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
        engine='python',
        names=["ip", "time", "request", "status", "size", "referer", "user_agent"],
        skipfooter=1,
        usecols=[0, 3, 4, 5, 6, 7, 8],
    )

It returns:

"IP423766 - - [01/Sep/2017:14:30:25 +0200] "GET  "  

How can I get all of the fields from each entry?

Tags: python, pandas

Solution
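A likely cause is that the `sep` pattern decides whether a space is "outside quotes" by counting the double quotes that follow it. Entries such as 3, 4 and 6 contain a backslash-escaped quote (`\"`) inside a quoted field, so the count is odd and the split lands in the wrong places. One possible workaround, sketched below, is to parse each line with a regular expression whose quoted fields tolerate escaped characters, and only then build the DataFrame. The names `LOG_PATTERN`, `COLUMNS`, `parse_lines` and `read_access_log` are made up for this sketch, and it assumes every complete entry follows the layout shown in the question (the three trailing fields are ignored, as in the original `usecols`).

```python
import re

COLUMNS = ["ip", "time", "request", "status", "size", "referer", "user_agent"]

# A quoted field body: any character except " or \, or a backslash escape such as \"
_QUOTED = r'(?:[^"\\]|\\.)*'

# Assumed layout: ip - - [time] "request" status size "referer" "user_agent" ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+\S+\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>' + _QUOTED + r')"\s+'
    r'(?P<status>\d{3})\s+'
    r'(?P<size>\d+|-)\s+'
    r'"(?P<referer>' + _QUOTED + r')"\s+'
    r'"(?P<user_agent>' + _QUOTED + r')"'
)


def parse_lines(lines):
    """Return (rows, rejected): parsed dicts and the lines that did not match."""
    rows, rejected = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = LOG_PATTERN.match(line)
        if match:
            rows.append(match.groupdict())
        else:
            rejected.append(line)  # e.g. truncated entries like entry 5
    return rows, rejected


def read_access_log(path):
    """Parse the file at `path` into a DataFrame plus the list of rejected lines."""
    # pandas is imported here so the parsing helpers above work without it
    import pandas as pd

    with open(path, encoding="utf-8", errors="replace") as handle:
        rows, rejected = parse_lines(handle)
    return pd.DataFrame(rows, columns=COLUMNS), rejected
```

Calling `frame, rejected = read_access_log('path_to logfile')` should give one row per complete entry, with all values as strings. Truncated lines such as entry 5 end up in `rejected` so they can be inspected instead of silently dropped. If needed, the `time` column can then be converted with `pd.to_datetime(frame["time"], format="%d/%b/%Y:%H:%M:%S %z")`.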

