Python scripts for parsing time formats

junrong624 2015-11-02 17:58

For a timestamp in this format: 發表於: 星期一 五月 28, 2012 6:59 am

import re

INPUT = "發表於: 星期一 五月 28, 2012 6:59 am    文章主題: 對《大話新聞》改組的誠心思考/蔬菜麵"
# All digit runs in order: day, year, hour, minute -> ['28', '2012', '6', '59']
b = re.findall(r'\d+', INPUT)
# Space-separated fields: a[2] is the month name, a[4] the year, a[6] am/pm
a = INPUT.split(' ')
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04", "五月": "05", "六月": "06",
             "七月": "07", "八月": "08", "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}
year = a[4]
month = monthdict[a[2]]
day = b[0].zfill(2)
# Convert the 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12, other pm hours + 12
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]
OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)

For a regular timestamp format like this: http://www.cdnews.com.tw 2015-11-02 17:33:55

import re

INPUT = "http://www.cdnews.com.tw 2015-11-02 17:33:55"
# The URL contains no digits, so the six digit runs are exactly the date and time fields
a = re.findall(r'\d+', INPUT)
year, month, day, hour, minute, second = a[:6]
OUTPUT = "%s-%s-%s %s:%s:%s" % (year, month, day, hour, minute, second)
print(OUTPUT)
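
The same string could also be parsed with the standard library's datetime.strptime, which additionally rejects impossible values such as month 13. This is only a sketch: the helper name parse_plain_timestamp and the regular expression are assumptions made for illustration, not part of the original script.

```python
import re
from datetime import datetime


def parse_plain_timestamp(text):
    """Find a 'YYYY-MM-DD HH:MM:SS' timestamp in text and return a datetime.

    Hypothetical helper for illustration; raises ValueError if nothing matches.
    """
    match = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', text)
    if match is None:
        raise ValueError("no timestamp found in %r" % text)
    # strptime validates the fields while converting them
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")


parsed = parse_plain_timestamp("http://www.cdnews.com.tw 2015-11-02 17:33:55")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))
```

Returning a datetime object instead of a string makes later comparisons and arithmetic easier; strftime converts it back when a string is needed.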

For a timestamp in this format: 發表於: 星期三 十二月 14, 2016 6:45 pm

import re

INPUT = "發表於: 星期三 十二月 14, 2016 6:45 pm"
# All digit runs in order: day, year, hour, minute -> ['14', '2016', '6', '45']
b = re.findall(r'\d+', INPUT)
# Space-separated fields: a[2] is the month name, a[4] the year, a[6] am/pm
a = INPUT.split(' ')
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04", "五月": "05", "六月": "06",
             "七月": "07", "八月": "08", "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}
year = a[4]
month = monthdict[a[2]]
day = b[0].zfill(2)
# Convert the 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12, other pm hours + 12
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]
# %02d zero-pads single-digit hours, so am and pm need no separate formatting branch
OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)
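
The forum-style format could also be handled by one reusable function that captures every field with a single regular expression instead of relying on fixed positions after split(' '). This is a rough sketch: the names MONTHS, STAMP and parse_forum_timestamp are hypothetical, and the pattern assumes the layout is always "month day, year hour:minute am/pm".

```python
import re

# Traditional Chinese month names as they appear in the forum timestamps
MONTHS = {"一月": 1, "二月": 2, "三月": 3, "四月": 4, "五月": 5, "六月": 6,
          "七月": 7, "八月": 8, "九月": 9, "十月": 10, "十一月": 11, "十二月": 12}

# Captures: month name, day, year, hour, minute, am/pm (assumed layout)
STAMP = re.compile(r'(\S+月) (\d{1,2}), (\d{4}) (\d{1,2}):(\d{2}) (am|pm)')


def parse_forum_timestamp(text):
    """Return 'YYYY-MM-DD HH:MM:00' for a forum-style timestamp found in text."""
    match = STAMP.search(text)
    if match is None:
        raise ValueError("no forum timestamp found in %r" % text)
    month_name, day, year, hour, minute, meridiem = match.groups()
    # 12-hour to 24-hour: 12 am -> 0, 12 pm -> 12, other pm hours + 12
    hour = int(hour) % 12
    if meridiem == "pm":
        hour += 12
    return "%s-%02d-%02d %02d:%s:00" % (year, MONTHS[month_name], int(day), hour, minute)


print(parse_forum_timestamp("發表於: 星期三 十二月 14, 2016 6:45 pm"))
print(parse_forum_timestamp("發表於: 星期一 五月 28, 2012 6:59 am"))
```

Because the fields are matched by pattern, any text after the timestamp (such as the 文章主題 part in the first example) does not shift the indices.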

 
