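python - Converting Timex strings to datetime in Python
Problem description
I want to convert a list of Timex date-format strings (from SUTime) into a normal datetime format. The problem is that I have many different types: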
dates = ['2018-07-09',
         '2018-W15',
         '2018-02',
         '2018-04-06',
         '2018-W15',
         '2018-02',
         '2015-09',
         '2018-09-27 INTERSECT P5D',
         'FUTURE_REF',
         'FUTURE_REF',
         'PXY',
         'THIS P1D INTERSECT 2018-09-28',
         {'end': 'XXXX-06', 'begin': 'XXXX-04'},
         '2014-03-19',
         '2018-08-02']
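I have two goals:
- Ignore all entries that do not directly indicate a specific date
- Convert all other formats to 'yyyy-mm-dd' format, always referring to the first day of the year, month, week, etc. For example, '2018-02' should become '2018-02-01', and '2018-W15' should become '2018-04-09'.
I tried pandas' pd.to_datetime function, but it does not convert weeks to dates.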
Solution
It's a bit of a challenge when the data collection isn't uniform. I am unfamiliar with Timex and was unable to find any packages that might help.
This might help you out. I wrote some functions that handle each particular case.
import datetime
from pprint import pprint

dates = ['2018-07-09',
         '2018-W15',
         '2018-02',
         '2018-04-06',
         '2018-W15',
         '2018-02',
         '2015-09',
         '2018-09-27 INTERSECT P5D',
         'FUTURE_REF',
         'FUTURE_REF',
         'PXY',
         'THIS P1D INTERSECT 2018-09-28',
         {'end': 'XXXX-06', 'begin': 'XXXX-04'},
         '2014-03-19',
         '2018-08-02']

FORMAT = '%Y-%m-%d'


def get_simple_date(item, strformat=FORMAT):
    # Full 'yyyy-mm-dd' dates parse directly; non-strings (the dict) raise TypeError.
    try:
        return (True, datetime.datetime.strptime(item, strformat))
    except (ValueError, TypeError):
        return (False, item)


def get_from_split(is_resolved, item):
    # Handles entries such as '2018-09-27 INTERSECT P5D' by trying each token.
    if is_resolved:
        return (is_resolved, item)
    try:
        tokens = item.split(' ')
        are_resolved, items = zip(*(get_simple_date(token) for token in tokens))
        if any(are_resolved):
            # assume one valid token
            result, = (item for item in items if isinstance(item, datetime.datetime))
            return (True, result)
    except (ValueError, AttributeError):
        pass
    return (False, item)


def get_from_no_day(is_resolved, item):
    # Year-month entries such as '2018-02' resolve to the first day of the month.
    if is_resolved:
        return (is_resolved, item)
    if 'W' not in item:
        try:
            return (True, datetime.datetime.strptime(f'{item}-01', FORMAT))
        except ValueError:
            pass
    return (False, item)


def get_from_w_date(is_resolved, item):
    # Week entries such as '2018-W15' resolve to the Monday of that week.
    if is_resolved:
        return (is_resolved, item)
    if 'W' in item:
        try:
            # Assumes the week labels are ISO 8601 weeks, hence the ISO
            # directives %G/%V/%u (Python 3.6+) rather than %Y/%W/%w.
            return (True, datetime.datetime.strptime(f'{item}-1', '%G-W%V-%u'))
        except ValueError:
            pass
    return (is_resolved, item)


# Each stage passes (is_resolved, value) pairs on to the next one.
collection1 = (get_simple_date(item) for item in dates)
collection2 = (get_from_split(*args) for args in collection1)
collection3 = (get_from_no_day(*args) for args in collection2)
collection4 = (get_from_w_date(*args) for args in collection3)

pprint([d for is_resolved, d in collection4 if is_resolved], indent=4)
OUTPUT:
[   datetime.datetime(2018, 7, 9, 0, 0),
    datetime.datetime(2018, 4, 9, 0, 0),
    datetime.datetime(2018, 2, 1, 0, 0),
    datetime.datetime(2018, 4, 6, 0, 0),
    datetime.datetime(2018, 4, 9, 0, 0),
    datetime.datetime(2018, 2, 1, 0, 0),
    datetime.datetime(2015, 9, 1, 0, 0),
    datetime.datetime(2018, 9, 27, 0, 0),
    datetime.datetime(2018, 9, 28, 0, 0),
    datetime.datetime(2014, 3, 19, 0, 0),
    datetime.datetime(2018, 8, 2, 0, 0)]
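Two follow-up points may be worth a quick sketch. The question asks for 'yyyy-mm-dd' strings, while the list above holds datetime objects, so a final strftime step is still needed. Also, week labels like '2018-W15' look like ISO 8601 week dates, and strptime's %W directive counts weeks from the first Monday of the year instead; the two numberings agree for 2018 but not for every year. The helper names iso_week_start and to_date_string below are hypothetical, and the sketch assumes Python 3.6+ and that SUTime's week labels really are ISO weeks.

```python
import datetime


def iso_week_start(label):
    """Return the Monday of an ISO 8601 week label such as '2018-W15'."""
    # %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday).
    return datetime.datetime.strptime(f'{label}-1', '%G-W%V-%u')


def to_date_string(dt, strformat='%Y-%m-%d'):
    """Format a datetime as a 'yyyy-mm-dd' string."""
    return dt.strftime(strformat)


print(to_date_string(iso_week_start('2018-W15')))   # 2018-04-09

# 2019 starts on a Tuesday, so %W and ISO week numbering disagree there.
naive = datetime.datetime.strptime('2019-W01-1', '%Y-W%W-%w')
print(to_date_string(naive))                        # 2019-01-07
print(to_date_string(iso_week_start('2019-W01')))   # 2018-12-31
```

If the labels turn out not to be ISO weeks, only the format string in iso_week_start would need to change; the strftime step stays the same.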