Get timestamps using regex in Python 3.x

Problem description

I want to separate every timestamp from the other content in a text file. For example:

a.txt

2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart

"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart

17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
"mgremove datestring"     asfasnfs: remove datepart check the value
                         "mgremove datestring"     asfasnfs: remove datepart check the value

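My solution does this for the first 4 lines of the text, but it is not generic. I want to make it generic so that it automatically detects the timestamp at the start of each line.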

with open("a.txt") as f:
    for line in f:
        # Joins the first four whitespace-separated tokens, so it is not generic.
        date_string = " ".join(line.strip().split()[:4])
        print(date_string, line)

Expected output:

date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = 2019/01/31-11:56:23.288258 line = 2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = "2019-07-17T07:11:14.894Z" line = "2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line = 17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart
date_string = 17 Jul 2019 07:01:10 line =  asfasnfs: remove datepart

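The text file may also contain other timestamp patterns. Is there a way to detect a timestamp at the start of a line and extract it? If a line does not start with a date, use the date from the previous line.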

Tags: regex, python-3.x, timestamp, pattern-matching

Solution


Given a.txt with the following contents:

2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart

"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
"2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart

17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
asfasnfs: remove datepart
                               asfasnfs: remove datepart

This script:

def get_date_string(line):
    # Heuristic: collect leading words until the result is longer than 18 characters.
    rv = ''
    words = line.split()
    while words:
        rv += words.pop(0) + ' '
        if len(rv) > 18:
            break
    return rv.strip()

with open('a.txt', 'r') as f_in:
    last_date_string = ''

    for line in f_in:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        date_part = get_date_string(line)
        if date_part == line:
            # The whole line was consumed, so treat it as having no timestamp
            # and reuse the last one seen.
            print('date string={: <30} line={}'.format(last_date_string, line))
        else:
            print('date string={: <30} line={}'.format(date_part, line))
            last_date_string = date_part

Prints:

date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string=2019/01/31-11:56:23.288258     line=2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"     "mgremove datestring"     asfasnfs: remove datepart
date string="2019-07-17T07:11:14.894Z"     line="2019-07-17T07:11:14.894Z"      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
date string=17 Jul 2019 07:01:10           line=asfasnfs: remove datepart
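
Since the question asks for regex-based detection, here is a minimal sketch of that alternative to the length heuristic in the script above. The names `TIMESTAMP_RE` and `split_timestamps` are hypothetical, and the three alternatives in the pattern are assumptions that only cover the formats shown in a.txt; any other timestamp format would need its own branch.

```python
import re

# Hypothetical pattern: one alternative per timestamp format seen in a.txt.
TIMESTAMP_RE = re.compile(
    r'''^\s*(
        "?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"?   # "2019-07-17T07:11:14.894Z"
      | \d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}(?:\.\d+)?          # 2019/01/31-11:56:23.288258
      | \d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{2}:\d{2}:\d{2}       # 17 Jul 2019 07:01:10
    )''',
    re.VERBOSE,
)


def split_timestamps(lines):
    """Yield (timestamp, line) pairs; lines without a leading timestamp
    reuse the most recent one (empty string if none has been seen yet)."""
    last = ''
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines, as the script above does
        match = TIMESTAMP_RE.match(line)
        if match:
            last = match.group(1)
        yield last, line


# Sample lines copied from a.txt.
sample = [
    '2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart',
    '"2019-07-17T07:11:14.894Z" "mgremove datestring"    asfasnfs: remove datepart',
    '17 Jul 2019 07:01:10      "mgremove datestring"     asfasnfs: remove datepart',
    '                               asfasnfs: remove datepart',
]

for ts, line in split_timestamps(sample):
    print('date string={: <30} line={}'.format(ts, line))
```

Unlike the length check, this does not mistake a long non-date prefix for a timestamp: a line such as `"mgremove datestring"     asfasnfs: remove datepart check the value` from the question falls back to the previous timestamp, whereas `get_date_string` would return `"mgremove datestring"` for it.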
