Parse a Python list of dates into a pandas DataFrame

Problem description

I need some help/advice on converting dates into a pandas DataFrame. My Python list looks like this:

['',
 '20180715:1700-20180716:1600',
 '20180716:1700-20180717:1600',
 '20180717:1700-20180718:1600',
 '20180718:1700-20180719:1600',
 '20180719:1700-20180720:1600',
 '20180721:CLOSED',
 '20180722:1700-20180723:1600',
 '20180723:1700-20180724:1600',
 '20180724:1700-20180725:1600',
 '20180725:1700-20180726:1600',
 '20180726:1700-20180727:1600',
 '20180728:CLOSED']

Is there a simple way to convert this into a pandas DataFrame with two columns (start time and end time)?

Tags: pandas, datetime, python-3.6

Solution


Sample:

import pandas as pd

L = ['',
 '20180715:1700-20180716:1600',
 '20180716:1700-20180717:1600',
 '20180717:1700-20180718:1600',
 '20180718:1700-20180719:1600',
 '20180719:1700-20180720:1600',
 '20180721:CLOSED',
 '20180722:1700-20180723:1600',
 '20180723:1700-20180724:1600',
 '20180724:1700-20180725:1600',
 '20180725:1700-20180726:1600',
 '20180726:1700-20180727:1600',
 '20180728:CLOSED']

I think the best approach is a list comprehension: split each value on the separator and filter out values that do not contain it:

df = pd.DataFrame([x.split('-') for x in L if '-' in x], columns=['start', 'end'])
print(df)
           start            end
0  20180715:1700  20180716:1600
1  20180716:1700  20180717:1600
2  20180717:1700  20180718:1600
3  20180718:1700  20180719:1600
4  20180719:1700  20180720:1600
5  20180722:1700  20180723:1600
6  20180723:1700  20180724:1600
7  20180724:1700  20180725:1600
8  20180725:1700  20180726:1600
9  20180726:1700  20180727:1600
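The columns produced above still hold strings such as '20180715:1700'. Since the question asks for start and end times, a minimal sketch follows, assuming every value uses the YYYYMMDD:HHMM layout seen in the sample (format string '%Y%m%d:%H%M'), that parses both columns into real datetimes with pd.to_datetime. It uses a shortened excerpt of the list, and the duration column is an illustrative addition, not part of the original answer.

```python
import pandas as pd

# Excerpt of the list from the question (includes an empty string and a CLOSED day)
L = ['',
     '20180715:1700-20180716:1600',
     '20180716:1700-20180717:1600',
     '20180721:CLOSED',
     '20180722:1700-20180723:1600']

# Split on '-' and keep only entries that contain it (drops '' and CLOSED days)
df = pd.DataFrame([x.split('-') for x in L if '-' in x], columns=['start', 'end'])

# Assumption: every value follows the 'YYYYMMDD:HHMM' layout of the sample data
for col in ['start', 'end']:
    df[col] = pd.to_datetime(df[col], format='%Y%m%d:%H%M')

# Illustrative extra column: length of each session as a Timedelta
df['duration'] = df['end'] - df['start']

print(df)
print(df.dtypes)
```

With real datetime64 columns, comparisons, sorting, and date arithmetic (like the duration above) work directly instead of on strings.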

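A pandas solution is also possible, which is especially useful if you already have a Series to process: use str.split with expand=True, then dropna to drop the rows that have no end value: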

s = pd.Series(L)

df = s.str.split('-', expand=True).dropna(subset=[1])
df.columns = ['start','end']
print(df)
            start            end
1   20180715:1700  20180716:1600
2   20180716:1700  20180717:1600
3   20180717:1700  20180718:1600
4   20180718:1700  20180719:1600
5   20180719:1700  20180720:1600
7   20180722:1700  20180723:1600
8   20180723:1700  20180724:1600
9   20180724:1700  20180725:1600
10  20180725:1700  20180726:1600
11  20180726:1700  20180727:1600
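Note that the Series version keeps the original index (1 through 11, with gaps where rows were dropped). As an alternative sketch that swaps in a different technique, str.extract with named regex groups can produce the start and end columns directly and also rejects any entry that does not fully match the expected pattern. The pattern below is an assumption based on the sample data, and reset_index(drop=True) renumbers the rows from 0 like the list-comprehension result.

```python
import pandas as pd

# Excerpt of the list from the question
s = pd.Series(['',
               '20180715:1700-20180716:1600',
               '20180721:CLOSED',
               '20180722:1700-20180723:1600'])

# Named groups become column names; rows that do not match the pattern
# (empty strings, CLOSED days) come back as all-NaN and are dropped.
pattern = r'^(?P<start>\d{8}:\d{4})-(?P<end>\d{8}:\d{4})$'
df = s.str.extract(pattern).dropna().reset_index(drop=True)

print(df)
```

Unlike splitting on '-', the anchored regex will not silently accept malformed entries, at the cost of having to keep the pattern in sync with the data format.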
