Reading an Excel sheet with stacked headers in Python

Problem description

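I have an Excel sheet in which the headers (TradeDate, Value) are stacked one above another, with each block separated by a type label (ABS, MBS). Sample layout: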

ABS,
TradeDate,Value
2019-01-21,21
2019-01-22,22
MBS,
TradeDate,Value
2019-01-21,11
2019-01-22,12
2019-01-23,13

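How can I load this into Python, preferably with pandas or another package, so that each header block is loaded separately? The headers are the same for every type, but the row at which each header appears can change. With the example above, I would like to get back two separate DataFrames or objects: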

df_abs

TradeDate,Value
2019-01-21,21
2019-01-22,22

df_mbs

TradeDate,Value
2019-01-21,11
2019-01-22,12
2019-01-23,13

Tags: python excel pandas

Solution


This may be a bit over-engineered, but I couldn't find a simpler solution:

import numpy as np
import pandas as pd

# `df` is the raw sheet read with header=None (see the reproducible example below)

# Mask all the rows whose first cell is a YYYY-MM-DD date
m = df[0].str.match(r'[12]\d{3}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])', na=False)

# Give every run of consecutive date rows the same id (a new id starts whenever the mask changes);
# non-date rows get 0 so they can be filtered out before the groupby
df['ind'] = np.where(~m, 0, (m != m.shift(1)).astype(int).cumsum())

# Extract separate dataframes into a list
dfs = [d for _, d in df[df['ind'].ne(0)].groupby('ind')]

# Rename columns to the expected output
dfs = [d.reset_index(drop=True).rename(columns={0: 'TradeDate', 1: 'Value'}) for d in dfs]

Output

for d in dfs:
    print(d,'\n')

    TradeDate Value  ind
0  2019-01-21    21    2
1  2019-01-22    22    2 

    TradeDate Value  ind
0  2019-01-21    11    4
1  2019-01-22    12    4
2  2019-01-23    13    4 
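The `ind` line is the least obvious step. The following is a minimal sketch of the same "run id" trick on a hand-written boolean mask. It assumes the mask has the same True/False pattern that `m` has for the sample data. Comparing each value with the previous one marks where a new run starts, and `cumsum` turns those starts into increasing run numbers:

```python
import numpy as np
import pandas as pd

# Hand-written mask with the same pattern as `m` for the sample data:
# True where the row holds a date, False for label and header rows
m = pd.Series([False, False, True, True, False, False, True, True, True])

# A new run starts wherever the mask differs from the previous row
# (the first row always counts as a start because shift(1) yields NaN there);
# cumsum turns those run starts into increasing run numbers
run_id = (m != m.shift(1)).astype(int).cumsum()

# Keep the run number only on date rows; everything else becomes 0
ind = np.where(m, run_id, 0)

print(run_id.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 4]
print(ind.tolist())     # [0, 0, 2, 2, 0, 0, 4, 4, 4]
```

This explains why the two blocks end up with ids 2 and 4 rather than 1 and 2: the label and header rows also form runs and consume ids.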

Reproducible example

from io import StringIO

import pandas as pd

a = StringIO('''
ABS,
TradeDate,Value
2019-01-21,21
2019-01-22,22
MBS,
TradeDate,Value
2019-01-21,11
2019-01-22,12
2019-01-23,13
''')

df = pd.read_csv(a, header=None)

# Out
            0      1
0         ABS    NaN
1   TradeDate  Value
2  2019-01-21     21
3  2019-01-22     22
4         MBS    NaN
5   TradeDate  Value
6  2019-01-21     11
7  2019-01-22     12
8  2019-01-23     13
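The following is an alternative sketch, not part of the original answer. Instead of detecting dates, it splits on the type label rows, so the results can be keyed by type name ('ABS', 'MBS'). It makes several assumptions. First, every label row has text in the first cell and an empty second cell. Second, the row right after each label is the header row. Third, the columns are named TradeDate and Value, as in the question. For a real workbook, the same raw frame should come from `pd.read_excel(path, header=None)` rather than `read_csv`.

```python
from io import StringIO

import pandas as pd

raw = StringIO('''
ABS,
TradeDate,Value
2019-01-21,21
2019-01-22,22
MBS,
TradeDate,Value
2019-01-21,11
2019-01-22,12
2019-01-23,13
''')
df = pd.read_csv(raw, header=None)

# Assumption: a type label row has text in column 0 and nothing in column 1
is_label = df[1].isna()

# Each label row starts a new section; cumsum numbers the sections 1, 2, ...
section = is_label.cumsum()

frames = {}
for _, block in df.groupby(section):
    name = block.iloc[0, 0]           # type label, e.g. 'ABS'
    header = block.iloc[1].tolist()   # header row, e.g. ['TradeDate', 'Value']
    body = block.iloc[2:].copy()      # data rows below the header
    body.columns = header
    body = body.reset_index(drop=True)
    # The embedded header rows made read_csv keep these columns as strings,
    # so convert them (assumes the TradeDate/Value names from the question)
    body['TradeDate'] = pd.to_datetime(body['TradeDate'])
    body['Value'] = pd.to_numeric(body['Value'])
    frames[name] = body

df_abs = frames['ABS']
df_mbs = frames['MBS']
print(df_abs)
print(df_mbs)
```

Keying on the label row keeps the type name attached to each frame, which the date-based version above loses. The trade-off is that this version depends on the label rows really having an empty second cell.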
