Python: vectorize a string-to-datetime conversion loop

Question

I convert datetime strings to the datetime type with the following loop:

import pandas as pd
import datetime as dt
from dateutil import parser

t1 = '2015-01-23T00:00:00+01:00'
t2 = '2015-05-08T00:00:00+02:00'
T = [t1, t2]
X = ['x1', 'x2'] # some other data
data = {'T': T, 'X': X}
df = pd.DataFrame(data)

for i in range(len(df['T'])):
    df.loc[i,'T'] = parser.parse(df['T'][i]).strftime('%Y-%m-%d %H:%M:%S')
    df.loc[i,'T'] = dt.datetime.strptime(df['T'][i], '%Y-%m-%d %H:%M:%S')

However, it is really slow. Is it possible to vectorize this operation?

Tags: python, list, datetime, for-loop, type-conversion

Answer


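The problem here is the mixed UTC offsets in your input (+01:00 and +02:00). pd.to_datetime cannot store values with different offsets in a single tz-aware datetime64 column, so without utc=True it falls back to an object column of individual timestamps (recent pandas versions also emit a FutureWarning here, announcing that this will become an error unless utc=True is passed). You can still use the built-in pd.to_datetime and strip the time zone from each element: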

pd.to_datetime(df['T']).apply(lambda t: t.replace(tzinfo=None)).dt.strftime('%Y-%m-%d %H:%M:%S')

Out[18]: 
0    2015-01-23 00:00:00
1    2015-05-08 00:00:00
Name: T, dtype: object
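The output above is strings (dtype object) because of the final .dt.strftime; if you want the datetime type the original loop produces, drop that step. A further sketch, assuming every string has exactly the fixed-width layout YYYY-MM-DDTHH:MM:SS±HH:MM shown in the question: slicing off the offset reproduces what the loop does (keep the local wall-clock time, discard the offset) in one vectorized call and never goes through mixed-offset parsing.

```python
import pandas as pd

# The question's example data; other layouts or lengths are not handled.
df = pd.DataFrame({
    "T": ["2015-01-23T00:00:00+01:00", "2015-05-08T00:00:00+02:00"],
    "X": ["x1", "x2"],
})

# Assumption: a fixed-width "YYYY-MM-DDTHH:MM:SS" prefix followed by the offset.
# Keeping the first 19 characters drops "+01:00"/"+02:00", so the local
# wall-clock time is kept and the UTC offset is discarded, as in the loop.
local_naive = pd.to_datetime(df["T"].str.slice(0, 19), format="%Y-%m-%dT%H:%M:%S")

print(local_naive.dtype)
print(local_naive.tolist())
```

Because the offset is thrown away, values from different offsets are no longer comparable as instants; this matches the loop's behaviour but is only appropriate if that is what you want.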

If you can work in UTC only, you avoid this problem:

pd.to_datetime(df['T'], utc=True).dt.strftime('%Y-%m-%d %H:%M:%S')

Out[19]: 
0    2015-01-22 23:00:00
1    2015-05-07 22:00:00
Name: T, dtype: object
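To keep a datetime dtype instead of formatted strings, .dt.tz_localize(None) can remove the time zone after the conversion to UTC, leaving naive timestamps expressed in UTC. A minimal sketch on the question's two strings:

```python
import pandas as pd

s = pd.Series(["2015-01-23T00:00:00+01:00", "2015-05-08T00:00:00+02:00"], name="T")

# utc=True shifts every value by its own offset into UTC, which yields one
# tz-aware column even though the input offsets differ.
as_utc = pd.to_datetime(s, utc=True)

# tz_localize(None) drops the time zone without shifting the clock,
# so the values stay at their UTC wall-clock time.
naive_utc = as_utc.dt.tz_localize(None)
print(naive_utc.tolist())
```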

...or convert to the appropriate time zone:

pd.to_datetime(df['T'], utc=True).dt.tz_convert('Europe/Berlin').dt.strftime('%Y-%m-%d %H:%M:%S')

Out[20]: 
0    2015-01-23 00:00:00
1    2015-05-08 00:00:00
Name: T, dtype: object
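This variant only gives back the original wall-clock times if all inputs are local times of one zone; Europe/Berlin is the answer's assumption here, and it matches +01:00 in winter and +02:00 in summer. A sketch that keeps the result as datetimes, either tz-aware or, after tz_localize(None), naive local times like the question's loop:

```python
import pandas as pd

s = pd.Series(["2015-01-23T00:00:00+01:00", "2015-05-08T00:00:00+02:00"], name="T")

# Parse into a single UTC column, then express it in Berlin local time.
# Assumption: every input string is a Europe/Berlin local time.
berlin = pd.to_datetime(s, utc=True).dt.tz_convert("Europe/Berlin")

# Drop the time zone to get naive local wall-clock datetimes, as the loop produced.
berlin_naive = berlin.dt.tz_localize(None)
print(berlin.tolist())
print(berlin_naive.tolist())
```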

Some timings (measured with %timeit on the two-row example frame):

%timeit pd.to_datetime(df['T']).apply(lambda t: t.replace(tzinfo=None)).dt.strftime('%Y-%m-%d %H:%M:%S')
1.26 ms ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.to_datetime(df['T'], utc=True).dt.strftime('%Y-%m-%d %H:%M:%S')
631 µs ± 7.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.to_datetime(df['T'], utc=True).dt.tz_convert('Europe/Berlin').dt.strftime('%Y-%m-%d %H:%M:%S')
784 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
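These timings use only two rows, so they say little about how the approaches scale. A rough benchmark sketch, assuming a hypothetical 10,000-row input made by repeating the question's two strings, compares the per-element logic of the loop with the answer's tz_convert approach and first checks that both give the same datetimes. The loop version here omits the per-row df.loc writes, so it probably understates the cost of the original loop; no timing results are claimed, since they depend on the machine and pandas version.

```python
import datetime as dt
import timeit

import pandas as pd
from dateutil import parser

# Hypothetical benchmark input: the question's two strings repeated to 10,000 rows.
strings = pd.Series(["2015-01-23T00:00:00+01:00", "2015-05-08T00:00:00+02:00"] * 5000, name="T")


def loop_version(values):
    """Per-element logic of the question's loop: parse, format, re-parse."""
    out = []
    for v in values:
        text = parser.parse(v).strftime("%Y-%m-%d %H:%M:%S")
        out.append(dt.datetime.strptime(text, "%Y-%m-%d %H:%M:%S"))
    return out


def vectorized_version(values):
    """Answer's tz_convert approach, kept as naive datetimes instead of strings."""
    return (
        pd.to_datetime(values, utc=True)
        .dt.tz_convert("Europe/Berlin")
        .dt.tz_localize(None)
    )


# Sanity check: both produce the same local wall-clock datetimes for this input.
assert vectorized_version(strings).tolist() == loop_version(strings)

for fn in (loop_version, vectorized_version):
    seconds = timeit.timeit(lambda: fn(strings), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```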
